Commit Graph

8462 Commits (c1eaf7cf81e8606a43b3cc43b2d2b4eb255667ea)

Author SHA1 Message Date
Mauricio Carneiro c1eaf7cf81 ReduceReads will allows different context sizes for different events
* Rename contextSize to contextSizeMismatches
 * Indel context size is now different from mismatches context size
2011-12-26 21:17:29 -05:00
Mauricio Carneiro 4633637af6 Moved ReduceReads to static ReadClipper
* all clipping done in ReduceReads is done using the static methods of the ReadClipper now.
2011-12-26 21:14:40 -05:00
Mauricio Carneiro 9aa1c0c6e5 Better documentation and contracts for ReduceReads
* added javadoc to all methods
  * added GATKDocs style documentation to the ReduceReadsWalker
  * revised contracts and made explicit in the documentation
2011-12-26 21:12:23 -05:00
Mauricio Carneiro 3051cdf9c5 fixed reduced reads integration tests 2011-12-26 21:12:22 -05:00
Mauricio Carneiro 256a7d8bd2 fixing the arguments for RRead script 2011-12-26 21:12:22 -05:00
Eric Banks dd990061f6 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-26 14:45:35 -05:00
Eric Banks 2130b39f33 Found the bug in the engine: RodLocusView was using the wrong seek method so that it would only move to the first locus of a shard (and with multi-locus shards, this meant that we never processed RODs from the other positions). In fact, because the seek(Shard) method is extremely misleading and now no longer used, I think it's safer to delete it and make everyone use the much more transparent seek(GenomeLoc). Note that I have not re-enabled my improvements to the intervals accumulation of ReferenceDataSource because that inefficiency is still present downstream in RodLocusView; need to discuss those changes with Matt. 2011-12-26 14:45:19 -05:00
Mauricio Carneiro 02495a5fd5 renaming script, once more 2011-12-23 20:01:25 -05:00
Mauricio Carneiro afc58b81b2 changing permissions on the scala script 2011-12-23 19:47:48 -05:00
Mauricio Carneiro 5198f3a287 Making -e optional and renaming script
* Expanding intervals should be optional, not mandatory
2011-12-23 19:36:57 -05:00
Mauricio Carneiro 35c41409a1 Better contracts and docs for the ReadClipper
* Described the ReadClipper contract in the top of the class
  * Added contracts where applicable
  * Added descriptive information to all tools in the read clipper
  * Organized public members and static methods together with the same javadoc
2011-12-23 19:36:57 -05:00
David Roazen 506c0e9c97 Disabling SnpEff support in the GATK and SnpEff annotation in the HybridSelectionPipeline
SnpEff support will remain disabled until SnpEff 2.0.4 has been officially released
and we've verified the quality of its annotations.
2011-12-23 19:12:57 -05:00
Eric Banks 24c84da60d 'Fixing' the changes in ReferenceDataSource so that a shard properly contains a list of GenomeLocs instead of a single merged one. However, that uncovered a probable bug in the engine, so instead of letting this code fester unfixed in the build (affecting everyone in the group) I've decided to revert the previous (slow, but working) version and fix the engine in my own branch. 2011-12-23 15:39:12 -05:00
Eric Banks 8762313a0d Better TODO message 2011-12-22 20:54:35 -05:00
Eric Banks a815e875a8 Removing debugging output 2011-12-22 15:49:11 -05:00
Eric Banks deef542a38 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-22 15:44:58 -05:00
Eric Banks 6d260ec6ae Start printing traversal stats after 30 seconds. I can't stand waiting 2 minutes. 2011-12-22 15:40:59 -05:00
David Roazen 510c71158c Merged bug fix from Stable into Unstable 2011-12-22 10:49:52 -05:00
David Roazen 32cdef9682 Rename *PerformanceTest test classes to *LargeScaleTest
This is in preparation for the installation of the new performance test suite in Bamboo.

Note that "ant performancetest" is now "ant largescaletest"
2011-12-22 10:38:49 -05:00
Mauricio Carneiro 473af102c1 Added 'expand intervals' option to reduce reads scala script
This allows generating reduce reads with off-target regions. Default is not to use it.
2011-12-21 15:15:45 -05:00
Mauricio Carneiro 3358c132a8 Updating the MD5s
Clipping adaptor boundaries changed the results of CountCovariates which affected the PPP output.
a few more loci were visible to locus walkers.
2011-12-21 15:14:05 -05:00
Mauricio Carneiro a333144aaf more verbose output for updateSampleList.lua 2011-12-21 13:30:35 -05:00
Mauricio Carneiro 2e232e26da New name for ReduceReads scala script 2011-12-21 13:13:07 -05:00
Mauricio Carneiro 98f4cdecc8 Renaming ReduceReads script
name was confusing with the walker
2011-12-21 13:11:34 -05:00
Matt Hanna 4d65aefc7b Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-20 21:43:57 -05:00
Matt Hanna d50f9b98bb Make sure that the temporary ReadWalker performance improvement hack
works well in the binary release, jic GATK 1.4 arrives before I get
a Picard patch.
2011-12-20 21:42:30 -05:00
Mauricio Carneiro 731a463415 Updated IntegrationTests with new adaptor clipper
phew!
2011-12-20 17:48:52 -05:00
Mauricio Carneiro cadff40247 getRefCoordSoftUnclippedStart and End refactor
These functions are methods of the read, and supplement getAlignmentStart() and getUnclippedStart() by calculating the unclipped start counting only soft clips.

* Removed from ReadUtils
* Added to GATKSAMRecord
* Changed name to getSoftStart() and getSoftEnd
* Updated third party code accordingly.
2011-12-20 17:48:51 -05:00
Mauricio Carneiro 07128a2ad2 ReadUtils cleanup
* Removed all clipping functionality from ReadUtils (it should all be done using the ReadClipper now)
 * Cleaned up functionality that wasn't being used or had been superseded by other code (in an effort to reduce multiple unsupported implementations)
 * Made all meaningful functions public and added better comments/explanation to the headers
2011-12-20 17:48:40 -05:00
Mauricio Carneiro 1c4774c475 Static versions of the hard clipping utilities
For simplified access to the hard clipping utilities. No need to create a ReadClipper object if you are not doing multiple complicated clipping operations, just use the static methods.

 examples:
   ReadClipper.hardClipLowQualEnds(2);
   ReadClipper.hardClipAdaptorSequence();
2011-12-20 17:48:39 -05:00
Mauricio Carneiro f73ad1c2e2 Bugfix/Rewrite: Algorithm to determine adaptor boundaries
The algorithm wasn't accounting for the case where the read is the reverse strand and the insert size is negative.

    * Fixed and rewrote for more clarity (with Ryan, Mark and Eric).
    * Restructured the code to handle GATKSAMRecords only
    * Cleaned up the other structures and functions around it to minimize clutter and potential for error.
    * Added unit tests for all 4 cases of adaptor boundaries.
2011-12-20 17:48:39 -05:00
Mark DePristo 2e88c7fe61 Refactored forest.R to enable loading and testing of goNL and Autism data. RF looks really good on these data sets! 2011-12-20 16:41:07 -05:00
Mark DePristo b472bab54e Minor changes in default thread values
-- By default, now that we can run effectively on gsa4, we test up to nt 12
-- We now run 6 jobs by default, not 3
2011-12-20 14:05:48 -05:00
Mark DePristo 992d2fa632 Don't print unnecessary debugging info in GATKPerformanceOverTime.R 2011-12-20 14:05:48 -05:00
Mark DePristo a7891b7a2e Enabling VariantEval reports 2011-12-20 14:05:48 -05:00
Mark DePristo 0cc5c3d799 General improvements to Queue
-- Support for collecting resources info from DRMAA runners
-- Disabled the non-standard mem_free argument so that we can actually use our own SGE cluster gsa4
-- NCoresRequest is a testing queue script for this.
-- Added two command line arguments:
  -- multiCoreJerk: don't request multiple cores for jobs with nt > 1.  This was the old behavior but it's really not the best way to run parallel jobs.  Now with queue if you run nt = 4 the system requests 4 cores on your host.  If this flag is thrown, though, it will only request 1 and you'll just use 4, like a jerk
  -- job_parallel_env: parallel environment named used with SGE to request multicore jobs.  Equivalent to -pe job_parallel_env NT for NT > 1 jobs
2011-12-20 14:05:09 -05:00
Eric Banks 7204fcc2c3 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-20 12:59:11 -05:00
Eric Banks 8ade2d6ac2 max_alternate_alleles also ready to be made public 2011-12-20 12:59:02 -05:00
Eric Banks 6f52bd580b --multiallelic mode is not hidden anymore (but it is annotated as advanced); added docs 2011-12-20 12:47:38 -05:00
Mauricio Carneiro 37e0044c48 Removing unclipSoftClipBases from ReadUtils
* it was buggy and dangerous.
 * Updated Chris' code to use the ReadClipper.
2011-12-20 00:11:26 -05:00
Mauricio Carneiro 78d9bf7196 Added REVERT_SOFTCLIPPED_BASES capability to ReadClipper
* New ClippingOp REVERT_SOFTCLIPPED_BASES turns soft clipped bases into matches.
    * Added functionality to clipping op to revert all soft clip bases in a read into matches
    * Added revertSoftClipBases function to the ReadClipper for public use
    * Wrote systematic unit tests
2011-12-20 00:04:30 -05:00
Christopher Hartl 24585062f8 Merge branch 'incoming' 2011-12-19 23:16:36 -05:00
Christopher Hartl 67298f8a11 AFCR made public (for use in VSS)
Minor changes to ValidationSiteSelector logic (SampleSelectors determine whether a site is valid for output, no actual subset context need be operated on beyond that determination). Implementation of GL-based site selection. Minor changes to EJG.
2011-12-19 23:14:26 -05:00
Eric Banks 2d619d8633 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-19 15:38:02 -05:00
Eric Banks 06d385e619 Simplifying the interface a bit 2011-12-19 15:29:46 -05:00
Mauricio Carneiro 52b8d74e49 Ooops, turning it off until I choose better BAM files to test 2011-12-19 14:41:46 -05:00
Christopher Hartl 339ef92eac Goodbye SW by default. Now aligned reads that overlap intron-exon junctions are scored where they are by default, but warns the user (and flags the record in the VCF) if there's evidence to suggest that there is an indel throwing off the scoring (e.g. if the best score of a realigned unmapped read is >5 log orders better than the best score of a scored mapped read). Unmapped reads are still SW-aligned to the junction-junction sequence. This should result in a rather massive speedup, so far untested.
UGBoundAF has to go in at some point. In the process of rewriting the math for bounding the allele frequency (it was assuming uniform tails, which is silly since i derived the posterior distribution in closed form sometime back, just need to find it)
2011-12-19 12:18:18 -05:00
Christopher Hartl 8d55c5ea14 Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable 2011-12-19 12:17:55 -05:00
Mauricio Carneiro fb9950f3ad Integration tests for ReduceReads 2011-12-19 11:54:18 -05:00
Guillermo del Angel 758b672c43 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-19 17:06:26 +01:00