Commit Graph

13 Commits (c103623cf69b272706da0b3748c3be6a2425c4fb)

Author SHA1 Message Date
Mauricio Carneiro 2a4ccfe6fd Updated all JAVA file licenses accordingly
GSATDG-5
2013-01-10 17:06:41 -05:00
Ryan Poplin 57d997f06f Fixing bug from when FragmentUtils merging function moved over to the soft clipped start instead of the unclipped start 2012-08-30 10:10:43 -04:00
Guillermo del Angel 9e25b209e0 First pass of implementation of Reduced Reads with HaplotypeCaller. Main changes: a) Active region: scale PL's by representative count to determine whether region is active. b) Scale per-read, per-haplotype likelihoods by read representative counts. A read representative count is (temporarily) defined as the average representative count over all bases in read, TBD whether this is good enough to avoid biases in GL's. c) DeBruijn assembler inserts kmers N times in graph, where N is min representative count of read over kmer span - TBD again whether this is the best approach. d) Bug fixes in FragmentUtils: logic to merge fragments was wrong in cases where there is discrepancy of overlaps between unclipped/soft clipped bases. Didn't affect things before but RR makes prevalence of hard-clipped bases in CIGARs more prevalent so this was exposed. e) Cache read representative counts along with read likelihoods associated with a Haplotype. Code can/should be cleaned up and unified with PairHMMIndelErrorModelCode, as well as refactored to support arbitrary ploidy in HaplotypeCaller 2012-08-03 12:24:23 -04:00
Mark DePristo e00ed8bc5e Cleanup BQSR classes
-- Moved most of BQSR classes (which are used throughout the codebase) to utils.recalibration.  It's better in my opinion to keep commonly used code in utils, and only specialized code in walkers.  As code becomes embedded throughout GATK its should be refactored to live in utils
-- Removed unncessary imports of BQSR in VQSR v3
-- Now ready to refactor QualQuantizer and unit test into a subclass of RecalDatum, refactor unit tests into RecalDatum unit tests, and generalize into hierarchical recal datum that can be used in QualQuantizer and the analysis of adaptive context covariate
-- Update PluginManager to sort the plugins and interfaces.  This allows us to have a deterministic order in which the plugin classes come back, which caused BQSR integration tests to temporarily change because I moved my classes around a bit.
2012-07-31 08:11:03 -04:00
Ryan Poplin 683d4b508e Bug fix in fragment utils: the read name wasn't being set in the merged read. Misc minor updates to the HaplotypeCaller. 2012-06-11 10:38:35 -04:00
Ryan Poplin 0cdadffe14 Committing the best of the frantic pre-CSHL experiments: Better algorithm for partioning reads amongst the alleles they support. Require the read's original alignment to actually overlap the variant. QD uses the non-informative reads when calculating D. More HC-specific annotations for potential use in a statistical filtering strategy. Increasing the minimum kmer length in the assembly graphs. Misc minor bug fixes. 2012-05-10 13:09:03 -04:00
Mauricio Carneiro 3bfca0ccfd BitSet implementation of the on-the-fly recalibration using the CSV format file.
Infrastructure:
   * Added static interface to all different clipping algorithms of low quality tail clipping
   * Added reverse direction pileup element event lookup (indels) to the PileupElement and LocusIteratorByState
   * Complete refactor of the KeyManager. Much cleaner implementation that handles keys with no optional covariates (necessary for on-the-fly recalibration)
   * EventType is now an independent enum with added capabilities. All functionality is now centralized.

 BQSR and RecalibrateBases:
   * On-the-fly recalibration is now generic and uses the same bit set structure as BQSR for a reduced memory footprint
   * Refactored the object creation to take advantage of the compact key structure
   * Replaced nested hash maps with single hash maps indexed by bitsets
   * Eliminated low quality tails from the context covariate (using ReadClipper's write N's algorithm).
   * Excluded contexts with N's from the output file.
   * Fixed cycle covariate for discrete platforms (need to check flow cycle platforms now!)
   * Redfined error for indels to look at the previous base in negative strand reads (using new PE functionality)
   * Added the covariate ID (for optional covariates) to the output for disambiguation purposes
   * Refactored CovariateKeySet -- eventType functionality is now handled by the EventType enum.
   * Reduced memory usage of the BQSR script to 4

 Tests:
   * Refactored BQSRKeyManagerUnitTest to handle the new implementation of the key manager
   * Added tests for keys without optional covariates
   * Added tests for on-the-fly recalibration (but more tests are necessary)
2012-03-16 13:02:15 -04:00
Ryan Poplin c6ded4d23c Bug fix for hard clipping reads when base insertion and base deletion qualities are present in the read. Updating HaplotypeCaller integration tests to reflect all the recent changes. 2012-03-05 17:54:42 -05:00
Ryan Poplin 41ffd08d53 On the fly base quality score recalibration now happens up front in a SAMIterator on input instead of in a lazy-loading fashion if the BQSR table is provided as an engine argument. On the fly recalibration is now completely hooked up and live. 2012-02-13 12:35:09 -05:00
Ryan Poplin 4d6312d4ea HaplotypeCaller is now an ActiveRegionWalker. 2012-01-22 14:31:01 -05:00
Mauricio Carneiro e89ff063fc GATKSAMRecord refactor
The GATK engine will now provide a GATKSAMRecord to all tools which incorporates the functionality used by the GATK to the bam file (ReadGroups, Reduced Reads, ...).

* No tools should create SAMRecord anymore, use GATKSAMRecord instead *
2011-11-03 15:43:26 -04:00
Eric Banks 54331b44e9 New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths. 2011-11-02 12:47:30 -04:00
Mark DePristo 034a997d07 Generalized Reads -> Fragment calculation
-- Supports ReadBackedPileup -> FragmentCollection as before
-- Added support for List<SAMRecord> -> FragmentCollection for Ryan's haplotype caller
-- General cleanup, renaming, move to separate package, more extensive unit tests, etc.
-- Added toFragment() function to ReadBackedPileup interface
2011-10-26 15:54:38 -04:00