* This argument forces GATK to always write every record in the VCF format field, even if some records at the end are missing and could be removed
* Revved htsjdk and picard
* PT 70993484
Changes:
-------
* Updated the current unit and integration tests to use the new API components.
* Added unit tests for new classes AFPriorProvider and AFCalculatorProviders.
* Added integration test for mixed ploidy GenotypeGVCFs and CombineGVCFs
Changes:
-------
* GenotypingEngine now uses an AFCalc provider instead of
its own thread-local, one-time-initialized, fixed
AF calculator.
* All walkers that use a GenotypingEngine now pass
the appropriate AF calculator provider. For now most
just use a fixed calculator (FixedAFCalculatorProvider),
except GenotypeGVCFs, which can now cope with a
mixture of ploidies by failing over to a general-ploidy
calculator when the preferred implementation is not
capable of handling a site's analysis.
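The fail-over behavior described above could be sketched roughly as follows. This is only an illustration: `FailOverProviderSketch`, its nested `Calculator` interface, and the `canHandle` check are hypothetical names and are not the actual GATK classes or signatures.

```java
// Hypothetical sketch of a provider that fails over to a general-ploidy
// calculator; none of these names are taken from the GATK code base.
final class FailOverProviderSketch {

    // Stand-in for an AF calculator (assumption, not the GATK interface).
    interface Calculator {
        String name();
        boolean canHandle(int ploidy);
    }

    private final Calculator preferred;
    private final Calculator generalPloidy;

    FailOverProviderSketch(final Calculator preferred, final Calculator generalPloidy) {
        this.preferred = preferred;
        this.generalPloidy = generalPloidy;
    }

    // Returns the preferred calculator when it can handle the site's ploidy,
    // otherwise the general-ploidy one.
    Calculator getInstance(final int ploidy) {
        return preferred.canHandle(ploidy) ? preferred : generalPloidy;
    }

    // Example calculator that only supports diploid sites.
    static Calculator diploidOnly() {
        return new Calculator() {
            @Override public String name() { return "diploid"; }
            @Override public boolean canHandle(final int ploidy) { return ploidy == 2; }
        };
    }

    // Example calculator that accepts any positive ploidy.
    static Calculator general() {
        return new Calculator() {
            @Override public String name() { return "general-ploidy"; }
            @Override public boolean canHandle(final int ploidy) { return ploidy > 0; }
        };
    }
}
```

A fixed provider would simply always return the same calculator; the fail-over variant only differs in the per-site check.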
* Arguments involved are --no_cmdline_in_header, --sites_only, and --bcf for VCF files and --bam_compression, --simplifyBAM, --disable_bam_indexing, and --generate_md5 for BAM files
* PT 52740563
* Removed ReadUtils.createSAMFileWriterWithCompression(), replaced with ReadUtils.createSAMFileWriter(), which applies all appropriate engine-level arguments
* Replaced hard-coded field names in ArgumentDefinitionField (Queue extension generator) with a Reflections-based lookup that will fail noisily during extension generation if there's an error
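A lookup that fails noisily, as opposed to a hard-coded name that silently goes stale, might look like the sketch below. It uses plain `java.lang.reflect` as an assumption; the actual generator may rely on a different reflection mechanism, and `FieldLookupSketch` and `ExampleArgs` are hypothetical names.

```java
// Hypothetical sketch: resolve a field name through reflection so that a
// renamed or removed field causes an immediate, descriptive failure.
final class FieldLookupSketch {

    // Illustrative argument holder; not a real GATK class.
    static final class ExampleArgs {
        public boolean sitesOnly;
    }

    private FieldLookupSketch() {}

    // Returns the field name if the field exists on the given class,
    // otherwise throws with a message naming the class and the field.
    static String requireFieldName(final Class<?> type, final String name) {
        try {
            return type.getDeclaredField(name).getName();
        } catch (final NoSuchFieldException e) {
            throw new IllegalStateException(
                    "No field '" + name + "' in " + type.getName(), e);
        }
    }
}
```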
Explicitly including gatk/queue test-jar artifacts in package test classpaths.
SelectVariantsIntegrationTest#testInvalidJexl now resets the JexlEngine silent flag that VariantFiltration.initialize() toggles.
External example no longer tries to unpack nonexistent gatk artifact jars during package tests.
Same changes fixed the problem for GenotypeGVCFs and CombineGVCFs.
Stories:
- https://www.pivotaltracker.com/story/show/77626044
- https://www.pivotaltracker.com/story/show/77626854
Changes:
- Generalized the code for the merging in GATKVariantContextUtils to cope
with ploidy != 2.
- GenotypeGVCFs now checks that the input's ploidy conforms to the '-ploidy'
argument.
- Moved the Reference Confidence VC merging code out of GATKVariantContextUtils
so that we can keep new code in protected.
Caveats:
- GenotypeGVCFs can only deal with input files that have the same ploidy at
all positions; the user MUST indicate that ploidy with the -ploidy argument
(if different from the default of 2).
- CombineGVCFs won't necessarily complain if it is passed mixed-ploidy
inputs, but you won't be able to genotype its output with GenotypeGVCFs.
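The kind of conformance check implied by the caveats above can be illustrated with a small sketch. The class name, the map-based input, and the exception type are assumptions made for illustration; they do not reflect how GenotypeGVCFs actually represents genotypes.

```java
import java.util.Map;

// Hypothetical sketch of a ploidy conformance check; not GATK code.
final class PloidyCheckSketch {

    private PloidyCheckSketch() {}

    // Throws if any sample's ploidy at a site differs from the expected one
    // (the value the user would pass with -ploidy, default 2).
    static void checkPloidy(final Map<String, Integer> ploidyBySample, final int expectedPloidy) {
        for (final Map.Entry<String, Integer> entry : ploidyBySample.entrySet()) {
            if (entry.getValue() != expectedPloidy) {
                throw new IllegalArgumentException("sample " + entry.getKey()
                        + " has ploidy " + entry.getValue()
                        + " but " + expectedPloidy + " was expected");
            }
        }
    }
}
```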
Test:
- Removed deprecated unit tests for GATKVariantContextUtils.
- Moved unit-tests regarding GVCF merging from GATKVariantContextUtilsUnitTest
to ReferenceConfidenceVariantContextUtilsUnitTest.
- Added unit test for new code for mapping genotype indices between allele
index encoding in GenotypeLikelihoodCalculator.
- The original GenotypeGVCFs and CombineGVCFs integration tests are unaffected
by the change.
- Added tetraploid run integration tests to check on non-diploid execution
of GenotypeGVCFs and CombineGVCFs.
Changed tests and scripts to use gatkdir full path instead of relative testdata/qscripts symbolic links.
Although the symlinks are no longer created, left the symlink deletion script execution in place with a comment about future removal.
Re-enabled example UG pipeline queue test.
Replaced all hardcoded strings of {public,private}/testdata with BaseTest variables.
Refactored temp list creation method from ListFileUtilsUnitTest to BaseTest.createTempListFile.
Removed list files with hardcoded paths, now using createTempListFile instead with private test dir variable.
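A temp-list helper of the kind mentioned above might look like this sketch. The signature and the `.list` suffix are assumptions for illustration and may differ from the real BaseTest.createTempListFile.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical sketch of a test helper that writes a temporary list file.
final class TempListFileSketch {

    private TempListFileSketch() {}

    // Creates a temporary ".list" file containing one entry per line.
    // The prefix must be at least three characters (File.createTempFile rule).
    static File createTempListFile(final String prefix, final String... lines) {
        try {
            final File file = File.createTempFile(prefix, ".list");
            file.deleteOnExit(); // clean up when the JVM exits
            Files.write(file.toPath(), Arrays.asList(lines), StandardCharsets.UTF_8);
            return file;
        } catch (final IOException e) {
            throw new UncheckedIOException("could not create temp list file", e);
        }
    }
}
```

Building the list at test time from a test-directory variable avoids checking in list files that embed machine-specific paths.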
Story:
https://www.pivotaltracker.com/story/show/77250524
Changes:
- Remove the annotating code in GeneralPloidyExactAFCalc (GPEAFC) class.
- Added the asAlleleList method to the GenotypeAlleleCounts class and made GPEAFC use that instead of implementing its own (nicer and more reusable code).
- Removed the explicit addition of AlleleCountBySample fields to the VCF header by the walker initialize
- Added utility methods in Utils to wrap an int[] array into a List<Integer>, and a double[] array into a List<Double>, efficiently.
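One common way to wrap a primitive array as a List without copying is an AbstractList view. The sketch below shows that idea; the class name is hypothetical and the real Utils.asList implementations may differ.

```java
import java.util.AbstractList;
import java.util.List;

// Hypothetical sketch of array-backed list views; not the GATK Utils class.
final class ArrayViewSketch {

    private ArrayViewSketch() {}

    // Unmodifiable view over an int[]; elements are boxed only on access.
    static List<Integer> asList(final int[] values) {
        if (values == null) {
            throw new IllegalArgumentException("the input array cannot be null");
        }
        return new AbstractList<Integer>() {
            @Override public Integer get(final int index) { return values[index]; }
            @Override public int size() { return values.length; }
        };
    }

    // Unmodifiable view over a double[]; elements are boxed only on access.
    static List<Double> asList(final double[] values) {
        if (values == null) {
            throw new IllegalArgumentException("the input array cannot be null");
        }
        return new AbstractList<Double>() {
            @Override public Double get(final int index) { return values[index]; }
            @Override public int size() { return values.length; }
        };
    }
}
```

Because the list is a view, creating it is constant-time and later changes to the array are visible through the list.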
Test:
- Added unit-testing for asAlleleList in GenotypeAlleleCountsUnitTest (within testFirst and testNext).
- Added unit-testing for new methods in Utils : asList(int[]) and asList(double[])
- Changed UG General Ploidy test to add explicitly those annotations.
- Non-trivial changes in integration tests involving non-diploid runs (namely haploid and tetraploid), as they no longer show
those annotations, so the MD5s have been changed accordingly.
Changes in several walkers to use the new sample and allele closed lists and the new GenotypingEngine constructor signatures
Rebase adoption of new calculation system in walkers
If any pair of variants occurs on all used haplotypes together, then we propagate that information into the gVCF.
Can be enabled with the --tryPhysicalPhasing argument.
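One way to read the co-occurrence rule is sketched below: two variants are treated as always occurring together when the sets of haplotypes carrying them are identical and non-empty. This is an interpretation for illustration only; the class name and the BitSet representation are assumptions, not the actual implementation.

```java
import java.util.BitSet;

// Hypothetical sketch of a haplotype co-occurrence test; not GATK code.
final class PhasingSketch {

    private PhasingSketch() {}

    // Each BitSet marks the indices of the used haplotypes that carry a variant.
    // Returns true when every haplotype carrying either variant carries both.
    static boolean alwaysTogether(final BitSet haplotypesWithA, final BitSet haplotypesWithB) {
        return !haplotypesWithA.isEmpty() && haplotypesWithA.equals(haplotypesWithB);
    }
}
```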
- Read groups that are excluded by sample_name, platform, or read_group arguments no longer appear in the header
- The performance penalty associated with filtering by read group has been essentially eliminated
- Partial fulfillment of PT 73075482
Stories:
https://www.pivotaltracker.com/story/show/70222086
https://www.pivotaltracker.com/story/show/67961652
Changes:
Made some changes that I had missed to ensure that all PairHMM implementations use the same interface; because of that omission we were always running the standard PairHMM.
Fixed some additional bugs detected when running on full WGS single-sample and exome multi-sample data sets.
Updated some integration test md5s.
Fixing GraphBased bugs with new master code
Fixed a difficult-to-spot bug in ReadLikelihoods.changeReads.
Changed PairHMM interface to fix a bug
Fixed missing changes for various PairHMM implementations to get them to use the new structure.
Fixed various bugs only detectable when running with full sample(s).
I believe this fixes the lack of annotations in UG runs
Fixed integration test MD5s
Updating some md5s
Fixed yet another md5 probably left out by mistake
The array structure should be faster to populate and query (not properly benchmarked) and should reduce the memory footprint considerably.
Nevertheless, with the PairHMM factor removed (using likelihoodEngine Random), it only achieves a speed-up of 15% on an example WGS dataset,
i.e. there are other, bigger bottlenecks in the system. Bamboo tests also seem to run significantly faster with this change.
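As a rough illustration of the array-based idea, likelihoods can be stored in a dense array indexed by sample, allele and read instead of nested maps. The layout and the names in this sketch are assumptions and do not describe the real ReadLikelihoods internals.

```java
// Hypothetical sketch of a dense likelihood store; not the GATK ReadLikelihoods class.
final class LikelihoodMatrixSketch {

    // values[sample][allele][read]; primitives avoid per-entry map and boxing overhead.
    private final double[][][] values;

    LikelihoodMatrixSketch(final int alleleCount, final int[] readCountsBySample) {
        values = new double[readCountsBySample.length][alleleCount][];
        for (int s = 0; s < readCountsBySample.length; s++) {
            for (int a = 0; a < alleleCount; a++) {
                values[s][a] = new double[readCountsBySample[s]];
            }
        }
    }

    double get(final int sample, final int allele, final int read) {
        return values[sample][allele][read];
    }

    void set(final int sample, final int allele, final int read, final double likelihood) {
        values[sample][allele][read] = likelihood;
    }

    int readCount(final int sample) {
        return values[sample].length == 0 ? 0 : values[sample][0].length;
    }
}
```

Lookups become plain index arithmetic, which is the reason such a structure is expected to be faster and smaller than a map keyed by sample and read.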
Stories:
https://www.pivotaltracker.com/story/show/70222086
https://www.pivotaltracker.com/story/show/67961652
Changes:
- ReadLikelihoods added to replace Map<String,PerSampleReadLikelihoods>
- Operations that involve changes to full sets of ReadLikelihoods have been moved into that class.
- Simplified a bit the code that handles the downsampling of reads based on contamination
Caveats:
- We still keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators; I didn't feel like changing the interface of so many public classes in this pull request.
In particular, it was possible to specify arguments for Files or Compound types without values
Added a special "none" value for annotations, since a bare "-A" is no longer allowed
Delivers PT 71792842 and 59360374
ValidationStringency was moved from htsjdk.samtools.SAMFileReader to htsjdk.samtools
The samtools method for finding a BAM index file was also moved (and made public!)
- Edited intervals merging docs for correctness & clarity
- Edited VQSR arg docs and made mode required (+added -mode SNP to VQSR tests)
- Moved PaperGenotyper to Toy Walkers to declutter the actually useful docs
- Moved GenotypeGVCFs to Variant Discovery category and clarified a few points
- Clarified that the -resource argument depends on using the -V:tag format
- Clarified how the pcr indel model works
- Added caveat for -U ALLOW_N_CIGAR_READS
- Added MathJax support for displaying equations in GATKDocs
- Updated HC example commands and caveats