Commit Graph

744 Commits (d18dbcbac103c0ce8f0480e04efcdd00a50f3394)

Author SHA1 Message Date
Mark DePristo 3c37ea014b Retire original TraverseActiveRegion, leaving only the new optimized version
-- Required some updates to MD5s, which was unexpected, and will be sorted out later with more detailed unit tests
2013-01-15 10:24:45 -05:00
Mark DePristo 94cb50d3d6 Retire LegacyLocusIteratorByState
-- Left in the remaining infrastructure for David to remove, but the legacy downsampler is no longer a functional option in the GATK
2013-01-11 15:17:18 -05:00
Mark DePristo cc0c1b752a Delete old LocusIteratorByState, leaving only new LIBS and legacy 2013-01-11 15:17:18 -05:00
Mark DePristo b9a33d3c66 Split original and optimized ART into largely independent pieces
-- Allows us to cleanly run old and new art, which now have different traversal behavior (on purpose).  Split unit tests as well.
2013-01-11 15:17:18 -05:00
Mark DePristo 02130dfde7 Cleanup ART
-- Initialize routine captures essential information for running the traversal
2013-01-11 15:17:17 -05:00
Mark DePristo 9b2be795a7 Initial working version of new ActiveRegionTraversal based on the LocusIteratorByState read stream
-- Implemented as a subclass of TraverseActiveRegions
-- Passes all unit tests
-- Will be very slow -- needs logical fixes
2013-01-11 15:17:17 -05:00
Mark DePristo 0ac4352614 LIBS can now (optionally) track the unique reads it uses from the underlying read iterator
-- This capability is essential to provide an ordered set of used reads to downstream users of LIBS, such as ART, who want an efficient way to get the reads used in LIBS
-- Vastly expanded the multi-read, multi-sample LIBS unit tests to make sure this capability is working
-- Added createReadStream to ArtificialSAMUtils that makes it relatively easy to create multi-read, multi-sample read streams for testing
2013-01-11 15:17:16 -05:00
Mark DePristo b3ecfbfce8 Refactor LIBS into component parts, expand unit tests, some code cleanup
-- Split out all of the inner classes of LIBS into separate independent classes
-- Split / add unit tests for many of these components.
-- Radically expand unit tests for SAMRecordAlignmentState (the lowest level piece of code) making sure at least some of it works
-- No need to change unit tests or integration tests.  No change in functionality.
-- Added (currently disabled) code to track all submitted reads to LIBS, but this isn't accessible or tested
2013-01-11 15:17:16 -05:00
Mark DePristo b2990497e2 Refactor LIBS into utils.locusiterator before refactoring 2013-01-11 15:17:16 -05:00
Mauricio Carneiro 2a4ccfe6fd Updated all JAVA file licenses accordingly
GSATDG-5
2013-01-10 17:06:41 -05:00
Eric Banks 4fa439d89e Move some classes back to public because they are used in the engine. Move some test classes to protected. We should have no more public->protected dependancies now 2013-01-09 11:06:10 -05:00
Eric Banks b099e2b4ae Moving integration tests to protected 2013-01-08 09:34:08 -05:00
Eric Banks 35d9bd377c Moved (nearly) all Walkers from public to protected and removed GATKLite utils 2013-01-07 14:42:40 -05:00
Eric Banks b4e7b3d691 Fixed precision problem in the Bayesian calculation of Qemp: we need to cap below max integer because the MathUtils code add +1.
Added unit tests for handling large number of observations.
2013-01-07 13:07:36 -05:00
Eric Banks ef638489d5 Fixing BQSR gatherer test to keep up to date with latest changes 2013-01-06 14:07:59 -05:00
Chris Hartl 9df30880cb Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2013-01-04 17:15:22 -05:00
Joel Thibault 01738e70c3 Archive the experimental Active Region Traversals 2013-01-04 17:05:31 -05:00
Chris Hartl 41bc416b65 Remove AAL and update MD5s. 2013-01-04 16:46:14 -05:00
Joel Thibault ab5526b372 More TODOs 2013-01-04 14:09:02 -05:00
Tad Jordan fe06912a87 Removed sorting by row from walkers 2013-01-04 11:52:33 -05:00
Mark DePristo 810e2da1d4 Cleanup and unit tests for EventType and ReadRecalibrationInfo in BQSR
-- Added unit tests for EventType and ReadRecalibrationInfo
-- Simplified interface of EventType.  Previously this enum carried an index with it, but this is redundant with the enum.ordinal function.  Now just using that function instead.
2013-01-04 11:39:25 -05:00
Joel Thibault 319d651e4a Initial updates for ActiveRegionShard 2013-01-03 17:00:13 -05:00
Joel Thibault e7553545ef Initial updates for ReadShard 2013-01-03 17:00:13 -05:00
Joel Thibault 14a3ac0e3c Enable the use of alternate shards 2013-01-03 17:00:13 -05:00
Joel Thibault 47e620dfbc Create BAM index to test shard boundaries 2013-01-03 17:00:12 -05:00
Tad Jordan c1ba12d71a Added unit test for outputting sorted GATKReport Tables
- Made few small modifications to code
- Replaced the two arguments in GATKReportTable constructor with an enum used to specify way of sorting the table
2013-01-03 16:53:59 -05:00
Joel Thibault dcb7735d3c Active Region extensions must stay on contig 2013-01-02 14:46:24 -05:00
Chris Hartl 09199366b7 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2013-01-02 14:44:49 -05:00
Chris Hartl e1d09ab0db QD is now divided by the average length of the alternate allele (weighted by the allele count). The average length is stored in a related annotation, "AAL", which can be used to re-compute the "old" QD by simple multiplication. Integration tests *should* all pass. 2013-01-02 14:41:29 -05:00
Joel Thibault a15f368bdc Re-enable testIsActiveRangeLow/High 2013-01-02 11:57:50 -05:00
Joel Thibault 429567cd3f Rename to TraverseActiveRegionsUnitTest 2013-01-01 19:20:30 -05:00
Joel Thibault 57d38aac8a Temporarily disable due to unknown contracts problem 2013-01-01 19:20:04 -05:00
Joel Thibault 7748b3816f Delete the test BAI file as well as the BAM 2013-01-01 19:20:02 -05:00
Joel Thibault 5afeb465aa TODOs 2013-01-01 19:19:17 -05:00
Mark DePristo 7d250a789a ArtificialReadPileupTestProvider now creates GATKSamRecords with good header values 2012-12-24 13:35:57 -05:00
Tad Jordan b491c177ff Added functionality of outputting sorted GATKReport Tables
- Added an optional argument to BaseRecalibrator to produce sorted GATKReport Tables
- Modified BSQR Integration Tests to include the optional argument. Tests now produce sorted tables
2012-12-20 14:02:21 -05:00
David Roazen 07b369ca7e Move VCF/BCF2/VariantContext to new standalone org.broadinstitute.variant package
This is an intermediate commit so that there is a record of these changes in our
commit history. Next step is to isolate the test classes as well, and then move
the entire package to the Picard repository and replace it with a jar in our repo.

-Removed all dependencies on org.broadinstitute.sting (still need to do the test classes,
though)

-Had to split some of the utility classes into "GATK-specific" vs generic methods
(eg., GATKVCFUtils vs. VCFUtils)

-Placement of some methods and choice of exception classes to replace the StingExceptions
and UserExceptions may need to be tweaked until everyone is happy, but this can be
done after the move.
2012-12-19 10:25:22 -05:00
Joel Thibault a29df3e094 oops 2012-12-18 19:03:12 -05:00
Joel Thibault ee22c1bf44 More TODOs 2012-12-18 18:47:43 -05:00
Joel Thibault 2b1db519d7 Add reads which overstep a boundary by a single base 2012-12-18 18:47:43 -05:00
Joel Thibault 9828b2990f Reads off the end of a contig fail SAM validation when using actual BAMs 2012-12-18 18:47:43 -05:00
Joel Thibault 72e2394b26 Create actual BAM 2012-12-18 18:47:43 -05:00
Joel Thibault d69d1f8988 Fun with varargs 2012-12-18 18:47:42 -05:00
Joel Thibault 1158c1529f Refactor region/read comparisons 2012-12-18 18:47:42 -05:00
Yossi Farjoun 19dd2d628a some changes.
some changes.
2012-12-14 17:21:32 -05:00
Eric Banks 696bf95fba Fix for PBT bug reported on the forum: the AD is actually output correctly now (rather than with 'null' or some gibberish memory pointer). 2012-12-13 23:28:30 +00:00
Ami Levy-Moonshine 2f99569dda change the md5 in one of the CV intergration tests, since it wasn't use the priority list when printing the origin of the annotation (the setValue field) 2012-12-10 22:48:15 -05:00
David Roazen 46edab6d6a Use the new downsampling implementation by default
-Switch back to the old implementation, if needed, with --use_legacy_downsampler

-LocusIteratorByStateExperimental becomes the new LocusIteratorByState, and
the original LocusIteratorByState becomes LegacyLocusIteratorByState

-Similarly, the ExperimentalReadShardBalancer becomes the new ReadShardBalancer,
with the old one renamed to LegacyReadShardBalancer

-Performance improvements: locus traversals used to be 20% slower in the new
downsampling implementation, now they are roughly the same speed.

-Tests show a very high level of concordance with UG calls from the previous
implementation, with some new calls and edge cases that still require more examination.

-With the new implementation, can now use -dcov with ReadWalkers to set a limit
on the max # of reads per alignment start position per sample. Appropriate value
for ReadWalker dcov may be in the single digits for some tools, but this too
requires more investigation.
2012-12-10 09:44:50 -05:00
Eric Banks 574d5b467f Bug fix for indel HMM: protect against situation where long reads (e.g. Sanger) in a pileup can lead to a read starting after the haplotype end for a given haplotype. 2012-12-09 02:09:34 -05:00
Mark DePristo dbf721968d PrintReads large-scale test to protect against another major low-level performance issue 2012-12-05 21:36:27 -05:00