* outputs only the necessary groups of read groups, avoiding the creation of multiple pileups on every call to map
* now also counts the number of variants associated with a given ROD (dbSNP) that exist in the interval
* new column: interval size
* -g takes a string of read groups separated by space " "
* multiple -g creates multiple sum columns in the table
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
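A minimal sketch of how the -g behavior described above might work: each -g argument is split on spaces into one group of read groups, and each group yields one sum column. The names ReadGroupSumSketch, parseGroups and sumColumns are hypothetical, and the per-read-group coverage map is an assumed input; none of this is taken from the walker itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not the walker's actual code.
final class ReadGroupSumSketch {
    private ReadGroupSumSketch() {}

    /** Turns each -g argument ("rg1 rg2 ...") into one set of read group ids. */
    static List<Set<String>> parseGroups(List<String> gArguments) {
        List<Set<String>> groups = new ArrayList<>();
        for (String arg : gArguments) {
            // Split on one or more spaces; LinkedHashSet keeps the order and drops repeats.
            groups.add(new LinkedHashSet<>(Arrays.asList(arg.trim().split(" +"))));
        }
        return groups;
    }

    /** Returns one sum per group: the total coverage of the read groups it names. */
    static long[] sumColumns(List<Set<String>> groups, Map<String, Long> coverageByReadGroup) {
        long[] sums = new long[groups.size()];
        for (int i = 0; i < groups.size(); i++) {
            for (String readGroup : groups.get(i)) {
                // Read groups with no recorded coverage contribute zero.
                sums[i] += coverageByReadGroup.getOrDefault(readGroup, 0L);
            }
        }
        return sums;
    }
}
```

With two -g arguments, "rg1 rg2" and "rg3", this produces two sum columns: one for rg1 plus rg2, and one for rg3 alone.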
* Downsampling is now a parameter to the walker with default value of 0 (no downsampling)
* Downsampling selects reads at random within the variant region window and strives for uniform coverage, where possible, close to the desired downsampling value.
* Added integration test
* The Knuth shuffle is a simple yet effective array permuter (hope this is good English).
* added a simple randomSubset that returns a random subset of any given array, without repeats, with the same probability for every permutation.
* added unit tests to both functions
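For illustration, the Knuth (Fisher-Yates) shuffle and a randomSubset built on top of it can be sketched as below. This is a sketch under assumptions, not the committed code: the class name ShuffleSketch, the int[] element type and the explicit Random parameter are all hypothetical choices made to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch, not the utility class from the commit.
final class ShuffleSketch {
    private ShuffleSketch() {}

    /** Returns a shuffled copy of the input; every permutation is equally likely. */
    static int[] knuthShuffle(int[] array, Random rng) {
        int[] shuffled = Arrays.copyOf(array, array.length);
        // Walk from the last index down, swapping each slot with a
        // uniformly chosen slot at or below it.
        for (int i = shuffled.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform in [0, i]
            int tmp = shuffled[i];
            shuffled[i] = shuffled[j];
            shuffled[j] = tmp;
        }
        return shuffled;
    }

    /** Returns n elements drawn from distinct positions of the array. */
    static int[] randomSubset(int[] array, int n, Random rng) {
        if (n < 0 || n > array.length) {
            throw new IllegalArgumentException("n must be between 0 and array.length");
        }
        // The first n slots of a uniform shuffle form a uniform random subset.
        return Arrays.copyOf(knuthShuffle(array, rng), n);
    }
}
```

Shuffling a copy leaves the caller's array untouched, which keeps the helper safe to call repeatedly on the same input. Passing the Random in makes the result reproducible in unit tests by fixing the seed.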
* Modified cleanCigarShift to allow insertions at the beginning and end of the read
* Allowed cigars starting/ending in insertions in the systematic ReadClipper tests
* Updated all ReadClipper unit tests
* ReduceReads no longer hard clips leading insertions by default
* SlidingWindow adjusts start location if read starts with insertion
* SlidingWindow creates an empty element with insertions to the right
* Fixed all potential divide-by-zero errors involving totalCount() (from BaseCounts)
* Updated all Integration tests
* Added new integration test for multiple interval reducing
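The divide-by-zero fix mentioned above comes down to checking the total before dividing by it. A minimal sketch of that guard follows, assuming a simplified counter over four bases. BaseCountsSketch and its methods are hypothetical stand-ins and do not reproduce the real BaseCounts class; returning 0.0 for an empty counter is also an assumption.

```java
// Hypothetical stand-in for a per-base counter, not the real BaseCounts.
final class BaseCountsSketch {
    private final int[] counts = new int[4]; // assumed order: A, C, G, T

    void increment(int baseIndex) {
        counts[baseIndex]++;
    }

    int totalCount() {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    /** Fraction of observations at baseIndex, or 0.0 when nothing has been counted. */
    double proportion(int baseIndex) {
        int total = totalCount();
        // Guard first: int division by zero throws ArithmeticException,
        // and 0.0 / 0 in floating point would yield NaN.
        return total == 0 ? 0.0 : (double) counts[baseIndex] / total;
    }
}
```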
-- Don't try to do nt 16, it's just too painful as the threading doesn't work well and it consumes a large chunk of our available slots on gsa4
-- bugfix: only do multi-threaded test for each iteration, not expanding by subiterations, so we no longer try to do 3x3 nt 16 runs
-- Automatic detection of most recent version of GATK release (just tell the script now to use 1.2, 1.3, and 1.4)
-- Uses 1.4 now
-- By default we do 9 runs of each non-parallel test
-- In PathUtils added convenience utility to find most recent release GATK jar with a specific release number
Turns out that because the RTC is the first walker to 'correctly' tree reduce according to functional programming
standards, the RTC has revealed a few problems with the tree reducer holding on to too much data. This is the first
and smaller of two commits to reduce memory consumption. The second commit will likely be pushed after GATK 1.4 is
released.
- CoverageByRG now uses a hashmap for its value instead of a list. It runs about 4 times faster.
- Cleaned up some of the code
- CoverageByRG now calculates GCContent
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
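As a rough illustration of the two ideas in the commit above, the sketch below keys per-read-group counts in a HashMap and computes GC content as the fraction of G or C bases. It is an assumption-laden sketch, not CoverageByRG itself: the class and method names are hypothetical, and counting every base (including N) in the denominator is an assumed definition of GC content.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the CoverageByRG walker.
final class CoverageSketch {
    private CoverageSketch() {}

    /** Fraction of bases that are G or C (case-insensitive); 0.0 for an empty sequence. */
    static double gcContent(byte[] bases) {
        if (bases.length == 0) {
            return 0.0;
        }
        int gc = 0;
        for (byte b : bases) {
            char base = Character.toUpperCase((char) b);
            if (base == 'G' || base == 'C') {
                gc++;
            }
        }
        return (double) gc / bases.length;
    }

    /** Counts reads per read group id, with constant-time lookup by id. */
    static Map<String, Integer> countByReadGroup(Iterable<String> readGroupIds) {
        Map<String, Integer> counts = new HashMap<>();
        for (String readGroup : readGroupIds) {
            counts.merge(readGroup, 1, Integer::sum);
        }
        return counts;
    }
}
```

Looking a read group up by key avoids scanning a list for every read, which is one plausible source of the speedup the commit reports.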
Will take any number of input BAMs and intervals
Returns a ReportTable with Average Coverage of each Read Group per Interval
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
* Creates an empty GATKSAMRecord with empty (not null) Cigar, bases and quals. Allows empty reads to be probed without breaking.
* All ReadClipper utilities now emit empty reads for fully clipped reads
* added javadoc to all methods
* added GATKDocs style documentation to the ReduceReadsWalker
* revised contracts and made explicit in the documentation
* Described the ReadClipper contract at the top of the class
* Added contracts where applicable
* Added descriptive information to all tools in the read clipper
* Organized public members and static methods together with the same javadoc
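The empty-read behavior described earlier (empty, not null, fields for fully clipped reads) can be illustrated with a small sketch. ReadSketch is a hypothetical stand-in for GATKSAMRecord that models only bases and quals; Cigar handling is deliberately left out, and hardClipStart is an assumed name rather than a ReadClipper method.

```java
import java.util.Arrays;

// Hypothetical stand-in for a read; models bases and quals only, no Cigar.
final class ReadSketch {
    final byte[] bases;
    final byte[] quals;

    ReadSketch(byte[] bases, byte[] quals) {
        if (bases == null || quals == null || bases.length != quals.length) {
            throw new IllegalArgumentException("bases and quals must be non-null and of equal length");
        }
        this.bases = bases;
        this.quals = quals;
    }

    /** An empty read: zero-length arrays instead of nulls, so callers can probe it safely. */
    static ReadSketch empty() {
        return new ReadSketch(new byte[0], new byte[0]);
    }

    boolean isEmpty() {
        return bases.length == 0;
    }

    /** Removes the first n bases; a fully clipped read comes back empty, never null. */
    ReadSketch hardClipStart(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must not be negative");
        }
        if (n >= bases.length) {
            return empty();
        }
        return new ReadSketch(
                Arrays.copyOfRange(bases, n, bases.length),
                Arrays.copyOfRange(quals, n, quals.length));
    }
}
```

Because a fully clipped read still has non-null arrays, downstream code can ask for its length or iterate over its bases without a null check.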