When aggregating raw BAM file spans into shards, the IntervalSharder tries to combine
file spans when it can. Unfortunately, the method that combines two BAM file
spans was seriously flawed, and would produce a truncated union if the file spans
overlapped in certain ways. This could cause entire regions of the BAM file containing
reads within the requested intervals to be dropped.
Modified GATKBAMFileSpan.union() to correct this problem, and added unit tests
to verify that the correct union is produced regardless of how the file spans
happen to overlap.
Thanks to Khalid, who did at least as much work on this bug as I did.
"ant alltests" is now "ant committests"
"ant alltests.public" is now "ant committests.public"
"ant alltests.gatk.packagejar" is now "ant releasetests.gatk.packagejar"
"ant alltests.queue.packagejar" is now "ant releasetests.queue.packagejar"
This is going into both Stable + Unstable so that all Bamboo
plans can be properly updated at the same time.
-- Missing BAMs were appearing as StingExceptions
-- Missing VCFs were showing up as CommandLineErrors, but it's clearer for them to be CouldNotReadInputFile exceptions
-- Added integration tests to ensure missing BAMs, VCFs, and -L files are properly thrown as CouldNotReadInputFile exceptions
-- Added path to standard b37 BAM to BaseTest
-- Cleaned up code in SAMDataSource, removing my parallel loading code as this just didn't prove to be useful.
When run without an explicit -onlyCoding option, as we've been doing up to
now, snpEff automatically sets -onlyCoding to "true" provided that there is
at least one transcript marked as "protein_coding", which will always be the
case for us in practice (and indeed, all pipeline runs so far with snpEff
2.0.5 have run with -onlyCoding auto-set to "true").
However, given the disastrous effect on annotation quality setting
"-onlyCoding false" has, we wish to be explicit with this option
rather than relying on snpEff's auto-detection logic.
-- Call sets with indels > 50 bp in length are tagged as CNVs in the tag (following the 1000 Genomes convention) and were unconditionally checking whether the CNV is already known, by looking at the known cnvs file, which is optional. Fixed. Has the annoying side effect that indels > 50bp in size are not counted as indels, and so are substrated from both the novel and known counts for indels. C'est la vie
-- Added integration test to check for this case, using Mauricio's most recent VCF file for NA12878 which has many large indels. Using this more recent and representative file probably a good idea for more future tests in VE and other tools. File is NA12878.HiSeq.WGS.b37_decoy.indel.recalibrated.vcf in Validation_Data
This error was due to the ReadClipper change of contract. Before the read utils would return null if a read was entirely clipped, now it returns an empty (safe) GATKSAMRecord.
Some tests in this class were intermittently not being executed due
to being randomly scheduled before tests whose results they depend on.
Now the serial dependencies are enforced to avoid problematic orderings.
Turns out that because the RTC is the first walker to 'correctly' tree reduce according to functional programming
standards, the RTC has revealed a few problems with the tree reducer holding on to too much data. This is the first
and smaller of two commits to reduce memory consumption. The second commit will likely be pushed after GATK1.4 is
released.
- CoverageByRG now uses a hashmap for its value instead of a list. It runs about 4 times faster.
- Cleaned up some of the code
- CoverageByRG now calculates GCContent
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
WIll take any number of input bams and intervals
Returns a ReportTable with Average Coverage of each Read Group per Interval
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
* Creates an empty GATKSAMRecord with empty (not null) Cigar, bases and quals. Allows empty reads to be probed without breaking.
* All ReadClipper utilities now emit empty reads for fully clipped reads
* added javadoc to all methods
* added GATKDocs style documentation to the ReduceReadsWalker
* revised contracts and made explicit in the documentation