Commit Graph

9688 Commits (c8cda209b53233f34f719d4e65dfaa929b7fa44a)

Author SHA1 Message Date
Joel Thibault c8cda209b5 Make intersection easy to change
Write each client's output to a different file
2012-06-12 15:46:34 -04:00
Eric Banks 1da3e43679 Wow, apparently it's way, way less efficient to iterate over Java Lists than native arrays. With this change and the bit fiddling, Ryan's 10-day test case now runs in 1 day. More to come. 2012-06-12 13:32:56 -04:00
Eric Banks a96c5da884 Oops, forgot to push the unit tests 2012-06-12 11:38:30 -04:00
Eric Banks a057cf31b3 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-06-12 11:02:50 -04:00
Eric Banks fec0bd5e11 Fixing UG argument docs 2012-06-12 09:46:16 -04:00
Eric Banks a4defdfb29 Adding a GT header line to SomaticIndelDetector output 2012-06-12 09:39:17 -04:00
Eric Banks 891ce51908 Refactoring of BQSRv2 to use longs (and standard bit fiddling techniques) instead of Java BitSets for performance improvements. 2012-06-12 09:19:36 -04:00
Mauricio Carneiro 0c98c34f5f Fixing bugs caught by Eric
* if a read starts with insertion, and is part of a read that overlaps both the consensus and the variant region, and the insertion is the first base of this read. Don't try to remove it.
* keep the last header element in a variant region to prevent adjacent reads starting with insertions from breaking when trying to remove the header element before the insertion.
2012-06-11 17:32:49 -04:00
Mauricio Carneiro e0fbffac3a Fixing bugs caught by Eric
* if a read starts with insertion, and is part of a read that overlaps both the consensus and the variant region, and the insertion is the first base of this read. Don't try to remove it.
   *
2012-06-11 17:32:49 -04:00
Eric Banks ff5749599d Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-06-11 15:46:17 -04:00
Eric Banks fea625632f Don't use asList because it maintains an iterator to the original list and then the result can't be used to create a new one 2012-06-11 15:45:58 -04:00
Eric Banks 41ef22fe7c Re-enabling BQSR integration tests 2012-06-11 15:44:55 -04:00
Ryan Poplin e4d371dc80 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-06-11 10:38:50 -04:00
Ryan Poplin 683d4b508e Bug fix in fragment utils: the read name wasn't being set in the merged read. Misc minor updates to the HaplotypeCaller. 2012-06-11 10:38:35 -04:00
Mauricio Carneiro 4aad7e23ef New ReduceReads v2 with unclipped variant regions and soft-clipped bases
* Re-wrote the sliding window approach to allow the variant region not to clip the reads that overlap it.
* Updated consensus to include only reads that were not passed on by the variant region, header counts are updated on the fly to avoid recompute
* Added soft clipped bases to ReduceReads analysis by unclipping high quality soft-clips then re-clipping after reduce reads
* Updated all integration tests
2012-06-08 14:58:31 -04:00
Mauricio Carneiro edf08ca928 Updates to ReduceReads
* Turn off the post-downsampler by default until new implementation.
* optimized read clipping in consensus regions -- only clip once instead of every time the window slides.
* extracted HeaderElement into its own class.
* moved RFA to archive
* unified the context size of indels and mismatches
2012-06-08 14:57:34 -04:00
Eric Banks afa9b2718a Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-06-08 13:54:48 -04:00
Eric Banks 92280b4068 BQSR optimization: cache the BitSetUtils.bitSetFrom() calls since they are called over and over again with the same values. Another 10% reduction in runtime. 2012-06-08 13:54:37 -04:00
Eric Banks 898a0e6161 Minor optimizations 2012-06-08 12:07:58 -04:00
Ryan Poplin 0a37e19998 Bug fix in VQSR so that the VCF index will be created for the recalFile. 2012-06-08 11:51:28 -04:00
Eric Banks d463ab2cbf BQSR optimization: String manipulation is extremely expensive in Java (accounts for 8% of BQSR runtime). Instead use byte[] and StringBuilder when possible. 2012-06-08 10:42:42 -04:00
Eric Banks 2bd48a7351 Bad comments made it into the previous commit 2012-06-07 23:12:56 -04:00
Eric Banks 31c3a6be48 BQSR optimization: getRequiredCovariates() and getOptionalCovariates() were creating a new List every time they were being called, and unfortunately getRequiredCovariates().size() is used as the stop condition in for-loops throughout the code. Just maintaining the original list of covariates results in a 15% reduction in runtime for BQSR. 2012-06-07 20:04:10 -04:00
Eric Banks 0fb9179f76 BQSR optimization: don't clone the original quals for each read, we can just overwrite the original array 2012-06-07 19:41:03 -04:00
Joel Thibault 32e4fe5c87 Enable zero-sample testing 2012-06-07 14:43:13 -04:00
Joel Thibault fe9dc0cc3f Explicitly set intervals in new test
mem limit 4g
2012-06-07 14:43:13 -04:00
Ryan Poplin 0ac4ba9ad3 unintended change crept in 2012-06-07 10:59:04 -04:00
Ryan Poplin d449f169d3 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-06-07 10:56:55 -04:00
Ryan Poplin 0b4281fdd0 misc minor update to HC debug output for when there are a lot of samples 2012-06-07 10:56:41 -04:00
Eric Banks bad50a1b05 Fix docs 2012-06-06 22:45:38 -04:00
Eric Banks b093ba9dcc Stabilized NGSPlatform code: don't assume all reads have read groups (e.g. artificial SAM records) 2012-06-06 15:17:30 -04:00
Eric Banks 54f682a99c Unify to NGSPlatform framework. TechnologyComposition annotation now generalizes to Illumina and not just SLX. 2012-06-06 11:44:37 -04:00
Eric Banks dd46d843fb IR should skip Ion reads just like it does with 454 reads; Tim has confirmed that official platform name for Ion. 2012-06-06 11:04:55 -04:00
David Roazen b6a7c3f780 Merged bug fix from Stable into Unstable
Resolved merge conflict

Conflicts:
	public/java/test/org/broadinstitute/sting/BaseTest.java
2012-06-05 17:44:44 -04:00
David Roazen 3b2fab9a37 Update stable for shptmp -> hptmp migration 2012-06-05 17:38:06 -04:00
Guillermo del Angel 2cbd6e5f90 Merged bug fix from Stable into Unstable 2012-06-05 15:58:23 -04:00
Guillermo del Angel ce4dc2128d Adding minor clarification to -mbq argument documentation 2012-06-05 15:17:56 -04:00
Eric Banks e02ec8c8b6 Don't update the record ID unless we are actually going to emit the record 2012-06-04 14:58:50 -04:00
Eric Banks 8405156ae1 Refactored VariantsToTable so that 1) genotype-level fields can be specified (stabilized and supported code) and 2) the --moltenize argument could be supported to produce molten output of the data. Added tests that cover these capabilities. 2012-06-04 14:28:32 -04:00
Ryan Poplin f11e7ebc3a Fixing the previous fix related to clipping. Adding extra reference padding in the HaplotypeCaller to get those larger alleles during GGA. 2012-06-04 12:49:36 -04:00
Ryan Poplin 320956ee4b Bug fix in clipping function in ReadUtils for when the read ends at exactly the clipping boundary. Bug fixes in HaplotypeCaller GGA mode for when Smith-Waterman produces a different allele than what was given in the input alleles VCF. GGA mode now works with multiallelic records. Adding min pruning factor argument which is combined with the pruning factor that is determined dynamically by the coverage. 2012-06-04 10:55:36 -04:00
Guillermo del Angel 7a54baf08c Merged bug fix from Stable into Unstable 2012-06-03 08:42:08 -04:00
Guillermo del Angel 47df7bbc14 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2012-06-03 08:38:54 -04:00
Guillermo del Angel 2ddbdee3bc Fixed broken VariantEval stratifications VariantType and IndelSize - integration tests to follow 2012-06-03 08:38:38 -04:00
Mauricio Carneiro 12a8c54f9a Fixing VCF header for filter elements (thanks Eric) 2012-06-01 15:45:15 -04:00
David Roazen 3b20b4615d ant fasttest: for when you just can't wait
Cuts major corners for speed. Tests start in SECONDS instead of minutes.
SIGNIFICANT limitations (see below!)

Usage: ant fasttest -Dsingle=TestClass

The idea is that you do a regular "ant test -Dsingle=TestClass" (or "ant committests")
FIRST, then do "ant fasttest -Dsingle=TestClass" for all subsequent runs until
satisfied.

LIMITATIONS:

-REQUIRES that a full test build has already been done (using one of the
 test targets like committests, or a manual "ant test.compile").
-Java only
-Single test class only
-No contracts
-Build jars in dist/ not updated, only classes in build/
-Version number output at runtime may be incorrect
2012-06-01 15:17:18 -04:00
David Roazen 2e3af1a739 Enable contracts for the release jar tests (whoops...) 2012-06-01 13:08:46 -04:00
Guillermo del Angel fdf607c3a9 One-off walker but techdev may find it useful and expand on it: given a BAM, output series of intervals to file (as interval_list [default] or BED [if -bed used]) where there is coverage. Additionally, print out to stdout at the end total covered bases and total covered bases including soft-clips 2012-05-31 20:47:37 -04:00
Eric Banks 3a15ba2102 Malformed VCF headers should be User Errors 2012-05-31 16:05:53 -04:00
David Roazen d1822c926c Temporary fake release directories for Bamboo testing purposes 2012-05-31 16:04:07 -04:00
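
Note on commit 31c3a6be48 (2012-06-07): the message says that getRequiredCovariates() built a new List on every call and that getRequiredCovariates().size() was used as a for-loop stop condition. The sketch below illustrates that pattern and the fix of keeping one list. It is not the GATK code: the class CovariateLoopSketch, the nested Covariates class, the method names getRequiredCopy/getRequired, and the placeholder covariate names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CovariateLoopSketch {

    // Hypothetical stand-in for an object that owns a list of covariates.
    static final class Covariates {
        private final List<String> required;
        int copiesMade = 0; // counts allocations made by the slow accessor

        Covariates(String... names) {
            this.required = new ArrayList<>(Arrays.asList(names));
        }

        // Slow variant: allocates a fresh list on every call.
        List<String> getRequiredCopy() {
            copiesMade++;
            return new ArrayList<>(required);
        }

        // Fast variant: returns the single list built in the constructor.
        List<String> getRequired() {
            return required;
        }
    }

    // The stop condition runs once per iteration plus once to exit,
    // so a list of n elements triggers n + 1 allocations.
    static int visitWithCopies(Covariates c) {
        int visited = 0;
        for (int i = 0; i < c.getRequiredCopy().size(); i++) {
            visited++;
        }
        return visited;
    }

    // Same loop, but the list is fetched once and its size read once.
    static int visitWithCachedList(Covariates c) {
        final List<String> required = c.getRequired();
        int visited = 0;
        for (int i = 0, n = required.size(); i < n; i++) {
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        Covariates c = new Covariates("covA", "covB", "covC"); // placeholder names
        int slow = visitWithCopies(c);
        System.out.println(slow + " visited, " + c.copiesMade + " lists allocated");
        int fast = visitWithCachedList(c);
        System.out.println(fast + " visited, " + c.copiesMade + " lists allocated in total");
    }
}
```

Both loops visit the same elements; only the number of allocations differs. The 15% runtime reduction quoted in the commit message is specific to BQSR and is not something this sketch measures.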