- Fix missing closing code block tag
- SOR fix + minor fix to template
- Minor typo fixes
- Make no_cmdline_in_header arg visible in docs
- Caveats for UG-only args that are difficult to move out of common collection
- Reference -L and -XL in -gvcf documentation
- Annotation doc fixes (dsde-docs#1289)
- Unhide some ContEst args and make Advanced
- Fix inputPrior error message
- Update web urls
- Add JEXL doc link
- Fixed displaying of default values
- Removed code cruft
- Reorganized tooldoc categories and improved names
- Reorganized tools within categories where applicable
- Touched up various tool docs
- Switched default gatkdocs output to html
- Added parameter in agrregator pom to control output type
- Set gatkdocs publishing script to output php
- Deprecated GenotypeAndValidate walker
- Added back PhoneHome arguments with @Deprecated annotations
- Edited intervals merging docs for correctness & clarity
- Edited VQSR arg docs and made mode required (+added -mode SNP to VQSR tests)
- Moved PaperGenotyper to Toy Walkers to declutter the actually useful docs
- Moved GenotypeGVCFs to Variant Discovery category and clarified a few points
- Clarified that the -resource argument depends on using the -V:tag format
- Clarified how the pcr indel model works
- Added caveat for -U ALLOW_N_CIGAR_READS
- Added MathJax support for displaying equations in GATKDocs
- Updated HC example commands and caveats
-Added docs for ERC mode in HC
-Move RecalibrationPerformance walker since to private since it is experimental and unsupported
-Updated VR docs and restored percentBad/numBad (but @Hidden) to enable deprecation alert if users try to use them
-Improved error msg for conflict between per-interval aggregation and -nt
-Minor clean up in exception docs
-Added Toy Walkers category for devs and dev supercat (to build out docs for developers)
-Added more detailed info to GenotypeConcordance doc based on Chris forum post
-Added system to include min/max argument values in gatkdocs (build gatkdocs with 'ant gatkdocs' to test it, see engine and DoC args for in situ examples)
-Added tentative min/max argument annotations to DepthOfCoverage and CommandLineGATK arguments (and improved docs while at it)
-Added gotoDev annotation to GATKDocumentedFeature to track who is the go-to person in GSA for questions & issues about specific walkers/tools (now discreetly indicated in each gatkdoc)
The update contains:
1. documentation changes for VariantContext and Allele (which used to discuss the now obsolete null allele)
2. better error messages for VCFs containing complex rearrangements with breakends
3. instead of failing badly on format field lists with '.'s, just ignore them
Also, there is a trivial change to use a more efficient method to remove a bunch of attributes from a VC.
Delivers PT#s 59675378, 59496612, and 60524016.
* Refactoring implementations of readHeader(LineReader) -> readActualHeader(LineIterator), including nullary implementations where applicable.
* Galvanizing fo generic types.
* Test fixups, mostly to pass around LineIterators instead of LineReaders.
* New rev of tribble, which incorporates a fix that addresses a problem with TribbleIndexedFeatureReader reading a header twice in some instances.
* New rev of sam, to make AbstractIterator visible (was moved from picard -> sam in Tribble API refactor).
-- The previous version of TribbleIndexedFeatureReader.query() would open a RandomAccessFile each time the GATK crossed a shard boundary. When running with -L wex.intervals (or any time there were lots of intervals) we'd be opening and closing enormous numbers of files, radically slowing down the GATK. With these patched versions of Tribble we see something like the following performance improvements:
SelectVariants with -L wex.intervals on my local machine against non-local file
pre-patch => 3 hours
post-patch => 30 seconds
* As reported here: http://gatkforums.broadinstitute.org/discussion/comment/4270#Comment_4270
* This was a commit into the variant.jar; the changes here are a rev of that jar and handling of errors in VF
* Added integration test to confirm failure with User Error
* Removed illegal header line in KB test VCF that was causing related tests to fail.
GATK-73 updated docs for bqsr args
GATK-9 differentiate CountRODs from CountRODsByRef
GATK-76 generate GATKDoc for CatVariants
GATK-4 made resource arg required
GATK-10 added -o, some docs to CountMales; some docs to CountLoci
GATK-11 fixed by MC's -o change; straightened out the docs.
GATK-77 fixed references to wiki
GATK-76 Added Ami's doc block
GATK-14 Added note that these annotations can only be used with VariantAnnotator
GATK-15 specified required=false for two arguments
GATK-23 Added documentation block
GATK-33 Added documentation
GATK-34 Added documentation
GATK-32 Corrected arg name and docstring in DiffObjects
GATK-32 Added note to DO doc about reference (required but unused)
GATK-29 Added doc block to CountIntervals
GATK-31 Added @Output PrintStream to enable -o
GATK-35 Touched up docs
GATK-36 Touched up docs, specified verbosity is optional
GATK-60 Corrected GContent annot module location in gatkdocs
GATK-68 touched up docs and arg docstrings
GATK-16 Added note of caution about calling RODRequiringAnnotations as a group
GATK-61 Added run requirements (num samples, min genotype quality)
Tweaked template and generic doc block formatting (h2 to h3 titles)
GATK-62 Added a caveat to HR annot
Made experimental annotation hidden
GATK-75 Added setup info regarding BWA
GATK-22 Clarified some argument requirements
GATK-48 Clarified -G doc comments
GATK-67 Added arg requirement
GATK-58 Added annotation and usage docs
GSATDG-96 Corrected doc
Updated MD5 for DiffObjectsIntegrationTests (only change is link in table title)
These 2 changes improve runtime performance almost as much as Ryan's previous attempt (with ID-based comparisons):
* Don't unnecessarily overload Allele.getBases() in the Haplotype class.
* Haplotype.getBases() was calling clone() on the byte array.
* Added a constructor to Allele (and Haplotype) that takes in an Allele as input.
* It makes a copy of he given allele without having to go through the validation of the bases (since the Allele has already been validated).
* Rev'ed the variant jar accordingly.
For the reviewer: all tests passed before rebasing, so this should be good to go as far as correctness.
-- top-level walker type (locus, read etc)
-- parallelism options (nt or nct)
-- annotation type (for Variant Annotations)
-- downsampling settings that override engine defaults
-- reference window size
-- active region settings
-- partitionBy info
-- Added getClazzAnnotations() as hub to retrieve various annotations values and class properties through reflection
-- Added getReadFilters() method to retrieve Read Filter annotations
-- getReadFilters() uses recursion to walk up the inheritance to also capture superclass annotations
-- getClazzAnnotations() stores collected info in doc handler root, which is unit.forTemplate in Doclet
-- Modified FreeMarker template to use the Readfilters info (displayed after arg table, before additional capabilities)
-- Tadaaa :-) #GSATDG-63 resolve
-- Renamed ValidatePileup to CheckPileup since validation is reserved word
-- Renamed AlignmentValidation to CheckAlignment (same as above)
-- Refactored category definitions to use constants defined in HelpConstants
-- Fixed a couple of minor typos and an example error
-- Reorganized the GATKDocs index template to use supercategories
-- Refactored integration tests for renamed walkers (my earlier refactoring had screwed them up or not carried over)
The migration of org.broadinstitute.variant into the Picard repo is
complete. This commit deletes the org.broadinstitute.variant sources
from our repo and replaces it with a jar built from a checkout of the
latest Picard-public svn revision.
This is a necessary prerequisite for the org.broadinstitute.variant migration.
-Picard and sam-jdk go from version 1.67.1197 to 1.84.1337
-Picard-private goes from version 2375 to 2662
-Tribble goes from version 119 to 1.84.1337
-RADICALLY trimmed down the list of classes we extract from Picard-private
(jar goes from 326993 bytes to 6445 bytes!)
-- New tribble library now uses 64 bit sizes. The 26K VCF has so much data that low-level tribble block indices where overflowing their int size values. This includes a to-be-committed tribble jar that fixes this problem
-- See https://jira.broadinstitute.org/browse/GSA-652
-- Minor cleanup of error messages that were useful on the way to solving this monster problem
Added targets to build.xml to effectively 'mvn install' packaged GATK/Queue from ant.
TODO: Versions during 'mvn install' are hardcoded at 0.0.1 until a better versioning scheme that works with maven dependencies has been identified.
We will use DocumentedGATKFeatures to create categories in our documentation. Eric I guess will be in charge of this. We need to remove walkers and think how to categorize everything.
Tools can be hidden from GATKdocs with the @Hidden annotation
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
GATKDocs looks for a key on gsa4, and updates the forum with new walker if it exists.
More changes were made to the GATKDocs. Works nicely with bootstrap on and offline.
Cleaned up the code as well
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
* Did not touch archived walkers... those can be named whatever.
* Kept abstract classes that end in Walker untouched (e.g. LocusWalker, ReadWalker, ...)
* Renamed a few inner classes due to conflict when stripping off Walker from their outer classes: ContigStats, FlagStats and FastaStats.
Optimization for PositionalBufferedStream with specialized read(byte, int, int) method
-- For binary codecs having an efficient reader of lots of bytes that doesn't fall back into read() itself vastly improves performance. The old version was 10x slower than InputStream, while the new version is +30%.
-- Generalize PositionalBufferedStream main() method for performance testing, now accepts cmdline arguments for the file to read, how many iterations, etc
Generalize AsciiLineReader main() method for performance testing
-- Now accepts cmdline arguments for the file to read, how many iterations, etc
AsciiLineReaderTest and PositionBufferedStreamTest were in src not test/src
Scalability bugfixes; can issues tens of thousands of queries to an reader
without opening too many files
-- Fixed missing close() statement in TribbleIndexedFeatureReader
-- Fixed NPE in TabixIteratorLineReader
-- Added scalability test that confirms .query() failure and subsequent fix
Note this actually fixes a tested and reproducible scability issue. Might not be the only one but I believe it should do the trick. Sorry everyone for the inconvenience. Note that we now have a test in Tribble to ensure this doesn't happen again.
From tribble logs:
Binary feature support in tribble
-- Massive refactoring and cleanup
-- Many bug fixes throughout
-- FeatureCodec is now general, with decode etc. taking a PositionBufferedStream
as an argument not a String
-- See ExampleBinaryCodec for an example binary codec
-- AbstractAsciiFeatureCodec provides to its subclass the same String decode,
readHeader functionality before. Old ASCII codecs should inherit from this base
class, and will work without additional modifications
-- Split AsciiLineReader into a position tracking stream
(PositionalBufferedStream). The new AsciiLineReader takes as an argument a
PositionalBufferedStream and provides the readLine() functionality of before.
Could potentially use optimizations (its a TODO in the code)
-- The Positional interface includes some more functionality that's now
necessary to support the more general decoding of binary features
-- FeatureReaders now work using the general FeatureCodec interface, so they can
index binary features
-- Bugfixes to LinearIndexCreator off by 1 error in setting the end block
position
-- Deleted VariantType, since this wasn't used anywhere and it's a particularly
clean why of thinking about the problem
-- Moved DiploidGenotype, which is specific to Gelitext, to the gelitext package
-- TabixReader requires an AsciiFeatureCodec as it's currently only implemented
to handle line oriented records
-- Renamed AsciiFeatureReader to TribbleIndexedFeatureReader now that it handles
Ascii and binary features
-- Removed unused functions here and there as encountered
-- Fixed build.xml to be truly headless
-- FeatureCodec readHeader returns a FeatureCodecHeader obtain that contains a
value and the position in the file where the header ends (not inclusive).
TribbleReaders now skip the header if the position is set, so its no longer
necessary, if one implements the general readHeader(PositionalBufferedStream)
version to see header lines in the decode functions. Necessary for binary
codecs but a nice side benefit for ascii codecs as well
-- Cleaned up the IndexFactory interface so there's a truly general createIndex
function that takes the enumerated index type. Added a writeIndex() function
that writes an index to disk.
-- Vastly expanded the index unit tests and reader tests to really test linear,
interval, and tabix indexed files. Updated test.bed, and created a tabix
version of it as well.
-- Significant BinaryFeaturesTest suite.
-- Some test files have indent changes