-- If the VariantContext is a bi-allelic variant already, don't split up the VC (it doesn't do anything) and then combine it back together. This saves us a lot of work on average
-- Be more protective of calls to AFCalc with a VariantContext that might only have ref allele, throwing an exception
- Throws user exception if it is.
- Can be turned off with --allow_bqsr_on_reduced_bams_despite_repeated_warnings argument.
- Added test to check this is working.
- Added docs to BQSRReadTransformer explaining why this check is not performed on PrintReads end.
- Added small bug fix to GenomeAnalysisEngine that I uncovered in this process.
- Added comment about not changing the program record name, as per reviewer comments.
- Removed unused variable.
- I had added the framework in the VA engine but should not have hooked it up to the HC yet since the RefMetaDataTracker is always null.
- Added contracts and docs to the relevant methods in the VA engine so that this doesn't happen in the future.
- It's now written into the recal report so that it can be used in the PrintReads step.
- Note that we also now write the --deletions_default_quality value which accidentally wasn't being written before!
- Added tests to make sure that the value of the --maximum_cycle_value is being used properly by PR with -BQSR.
(This is my last non-branch commit; all future pushes will follow new GATK practices)
The migration of org.broadinstitute.variant into the Picard repo is
complete. This commit deletes the org.broadinstitute.variant sources
from our repo and replaces it with a jar built from a checkout of the
latest Picard-public svn revision.
- Uncovered small bug in the fix that I added yesterday, which is now fixed properly.
- Uncovered massive general bug: polyploid consensus is totally busted for deletions (because of call to read.getReadBases()[readPos]).
- Need to consult Mauricio on what to do here (are we supporting het compression for deletions? (Insertions are definitely not supported)
contain two columns, Sample (String) and Fraction (Double) that form the Sample-Fraction map for the per-sample AlleleBiasedDownsampling.
-Integration tests to UnifiedGenotyper (Using artificially contaminated BAMs created from a mixure of two broadly concented samples) were added
-includes throwing an exception in HC if called using per-sample contamination file (not implemented); tested in a new integration test.
-(Note: HaplotypeCaller already has "Flat" contamination--using the same fraction for all samples--what it doesn't have is
_per-sample_ AlleleBiasedDownsampling, which is what has been added here to the UnifiedGenotyper.
-New class: DefaultHashMap (a Defaulting HashMap...) and new function: loadContaminationFile (which reads a Sample-Fraction file and returns a map).
-Unit tests to the new class and function are provided.
-Added tests to see that malformed contamination files are found and that spaces and tabs are now read properly.
-Merged the integration tests that pertain to biased downsampling, whether HaplotypeCaller or unifiedGenotyper, into a new IntegrationTest class.
* Fixed implementation of polyploid (het) compression in RR.
* The test for a usable site was all wrong. Worked out details with Mauricio to get it right.
* Added comprehensive unit tests in HeaderElement class to make sure this is done right.
* Still need to add tests for the actual polyploid compression.
* No longer allow non-diploid het compression; I don't want to test/handle it, do you?
* Added nearly full coverage of tests for the BaseCounts class.
-- Testing that cycles in the reference graph fail graph construction appropriately.
-- Minor bug fix in assembly with reduced reads.
Added some docs and contracts to SimpleDeBruijnAssembler
Added a unit test to SimpleDeBruijnAssembler
Part 1 of Variant Annotator Unit tests: PerReadAlleleLikelihoodMap
- Added contract enforcement for public methods
- Refactored the conversion from read -> (allele -> likelihood) to allele -> list[read] into its own method
- added method documentation for non getters/setters
- finals, finals everywhere
- Add in a unit test for the PerReadAlleleLikelihoodMap. Complete coverage except for .clear() and a method that is a straight call into a separately-tested utility class.
- ReduceReads by default now sets up-front ReadWalker downsampling to 40x per start position.
- This is the value I used in my tests with Picard to show that memory issues pretty much disappeared.
- This should hopefully take care of the memory issues being reported on the forum.
- Added javadocs to SlidingWindow (the main RR class) to follow GATK conventions.
- Added more unit tests to increase coverage of BaseCounts class.
- Added more unit tests to test I/D operators in the SlidingWindow class.
- Added RR qual correctness tests (note that this is a case where we don't add code coverage but still need to test critical infrastructure).
- Also added minor cleanup of BaseUtils
I've confirmed via a script that all of these differences only
involve the version number bump in the BAM headers and nothing
else:
< @HD VN:1.0 GO:none SO:coordinate
---
> @HD VN:1.4 GO:none SO:coordinate
testing the adding of reads into the SlidingWindow plus consensus creation. Will flesh these out more after I take care of
some other items on my plate.
-- All functions tested. In the testing / review I discovered several bugs in the ActiveRegion routines that manipulate reads. New version should be correct
-- Enforce correct ordering of supporting states in constructor
-- Enforce read ordering when adding reads to an active region in add
-- Fix bug in HaplotypeCaller map with new updating read spans. Now get the full span before clipping down reads in map, so that variants are correctly placed w.r.t. the full reference sequence
-- Encapsulate isActive field with an accessor function
-- Make sure that all state lists are unmodifiable, and that the docs are clear about this
-- ActiveRegion equalsExceptReads is for testing only, so make it package protected
-- ActiveRegion.hardClipToRegion must resort reads as they can become out of order
-- Previous version of HC clipped reads but, due to clipping, these reads could no longer overlap the active region. The old version of HC kept these reads, while the enforced contracts on the ActiveRegion detected this was a problem and those reads are removed. Has a minor impact on PLs and RankSumTest values
-- Updating HaplotypeCaller MD5s to reflect changes to ActiveRegions read inclusion policy
Please check that your commit hook is properly pointing at ../../private/shell/pre-commit
Conflicts:
public/java/test/org/broadinstitute/variant/VariantBaseTest.java
-Moved some of the more specialized / complex VariantContext and VCF utility
methods back to the GATK.
-Due to this re-shuffling, was able to return things like the Pair class back
to the GATK as well.
One of the fixes was critical: SlidingWindow was not converting between global and relative positions correctly.
Besides not being correct, it was resulting in a massive slow down of the RR traversal.
That fix definitely breaks at least one of the integration tests, but it's not worth changing md5s now because I'll be
changing things all over RR for the next few days, so I am going to let that test fail indefinitely until I can confirm
general correctness of the tool.