
8492 Commits (caa5da2fd2ec51aaf9aa43d1651eefdabd662e00)

Author SHA1 Message Date
Mauricio Carneiro 3358c132a8 Updating the MD5s
Clipping adaptor boundaries changed the results of CountCovariates, which affected the PPP output; a few more loci were visible to locus walkers.
2011-12-21 15:14:05 -05:00
Mauricio Carneiro a333144aaf more verbose output for updateSampleList.lua 2011-12-21 13:30:35 -05:00
Mauricio Carneiro 2e232e26da New name for ReduceReads scala script 2011-12-21 13:13:07 -05:00
Mauricio Carneiro 98f4cdecc8 Renaming ReduceReads script
The name was easily confused with the walker's.
2011-12-21 13:11:34 -05:00
Matt Hanna 4d65aefc7b Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-20 21:43:57 -05:00
Matt Hanna d50f9b98bb Make sure that the temporary ReadWalker performance improvement hack
works well in the binary release, jic GATK 1.4 arrives before I get
a Picard patch.
2011-12-20 21:42:30 -05:00
Mauricio Carneiro 731a463415 Updated IntegrationTests with new adaptor clipper
phew!
2011-12-20 17:48:52 -05:00
Mauricio Carneiro cadff40247 getRefCoordSoftUnclippedStart and End refactor
These functions are now methods of the read. They supplement getAlignmentStart() and getUnclippedStart() by calculating the unclipped start (and end) counting only soft clips.

* Removed from ReadUtils
* Added to GATKSAMRecord
* Changed the names to getSoftStart() and getSoftEnd()
* Updated third party code accordingly.
2011-12-20 17:48:51 -05:00
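A minimal sketch of what "the unclipped start counting only soft clips" means, working from a 1-based alignment start/end and a CIGAR string. The class `SoftClipCoords` and its methods are hypothetical illustrations, not the GATKSAMRecord API: hard clips are skipped because those bases are no longer in the read, while leading (or trailing) soft clips move the coordinate outward.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not GATK code): soft-clip-aware read coordinates. */
final class SoftClipCoords {
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Parses e.g. "2H3S10M" into (length, operator) pairs. */
    private static List<int[]> parse(String cigar) {
        List<int[]> elements = new ArrayList<>();
        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            elements.add(new int[] {Integer.parseInt(m.group(1)), m.group(2).charAt(0)});
        }
        return elements;
    }

    /** Alignment start moved left by the leading soft clips; hard clips are ignored. */
    static int softStart(int alignmentStart, String cigar) {
        int start = alignmentStart;
        for (int[] e : parse(cigar)) {
            if (e[1] == 'H') continue;   // hard-clipped bases are not in the read
            if (e[1] != 'S') break;      // first non-clip operator ends the clipped prefix
            start -= e[0];
        }
        return start;
    }

    /** Alignment end moved right by the trailing soft clips; hard clips are ignored. */
    static int softEnd(int alignmentEnd, String cigar) {
        List<int[]> elements = parse(cigar);
        int end = alignmentEnd;
        for (int i = elements.size() - 1; i >= 0; i--) {
            int[] e = elements.get(i);
            if (e[1] == 'H') continue;
            if (e[1] != 'S') break;
            end += e[0];
        }
        return end;
    }
}
```

For a read aligned at 100 with CIGAR `2H3S10M`, the soft start is 97: the three soft-clipped bases count, the two hard-clipped ones do not.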
Mauricio Carneiro 07128a2ad2 ReadUtils cleanup
* Removed all clipping functionality from ReadUtils (it should all be done using the ReadClipper now)
* Cleaned up functionality that wasn't being used or had been superseded by other code (in an effort to reduce multiple unsupported implementations)
* Made all meaningful functions public and added better comments/explanation to the headers
2011-12-20 17:48:40 -05:00
Mauricio Carneiro 1c4774c475 Static versions of the hard clipping utilities
Simplifies access to the hard clipping utilities: there is no need to create a ReadClipper object if you are not doing multiple complicated clipping operations; just use the static methods.

 Examples:
   ReadClipper.hardClipLowQualEnds(2);
   ReadClipper.hardClipAdaptorSequence();
2011-12-20 17:48:39 -05:00
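The pattern described here, instance methods for multi-step clipping plus static one-shot wrappers, can be sketched as follows. `SimpleRead` and `SimpleClipper` are hypothetical stand-ins for the real classes, and this sketch passes the read to the static method explicitly; "low-quality end" is assumed to mean any base at either end whose quality is at or below the threshold.

```java
import java.util.Arrays;

/** Hypothetical minimal read: bases plus per-base qualities. */
final class SimpleRead {
    final String bases;
    final byte[] quals;
    SimpleRead(String bases, byte[] quals) { this.bases = bases; this.quals = quals; }
}

/** Hypothetical clipper: instance form for chained work, static form for one-offs. */
final class SimpleClipper {
    private final SimpleRead read;
    SimpleClipper(SimpleRead read) { this.read = read; }

    /** Removes bases at either end whose quality is <= lowQual. */
    SimpleRead clipLowQualEnds(byte lowQual) {
        int start = 0, end = read.quals.length;          // half-open range [start, end)
        while (start < end && read.quals[start] <= lowQual) start++;
        while (end > start && read.quals[end - 1] <= lowQual) end--;
        return new SimpleRead(read.bases.substring(start, end),
                              Arrays.copyOfRange(read.quals, start, end));
    }

    /** Static convenience: builds a throwaway clipper for a single operation. */
    static SimpleRead hardClipLowQualEnds(SimpleRead read, byte lowQual) {
        return new SimpleClipper(read).clipLowQualEnds(lowQual);
    }
}
```

The static wrapper adds no logic of its own; it only hides the object construction, so both entry points always produce the same result.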
Mauricio Carneiro f73ad1c2e2 Bugfix/Rewrite: Algorithm to determine adaptor boundaries
The algorithm wasn't accounting for the case where the read is on the reverse strand and the insert size is negative.

    * Fixed and rewrote it for clarity (with Ryan, Mark and Eric).
    * Restructured the code to handle GATKSAMRecords only
    * Cleaned up the other structures and functions around it to minimize clutter and potential for error.
    * Added unit tests for all 4 cases of adaptor boundaries.
2011-12-20 17:48:39 -05:00
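One way to express the adaptor-boundary cases is sketched below, assuming a standard forward/reverse read pair and 1-based reference coordinates. It is not the GATK implementation; `AdaptorBoundary` and `NO_BOUNDARY` are hypothetical names. Taking the absolute value of the insert size is what makes a negative insert size harmless: for a forward read the adaptor begins right after the fragment's end, and for a reverse read it sits right before the fragment's start, which is the mate's alignment start.

```java
/** Hypothetical sketch: adaptor position adjacent to the sequenced fragment. */
final class AdaptorBoundary {
    static final int NO_BOUNDARY = Integer.MIN_VALUE;

    static int boundary(boolean negativeStrand, int alignmentStart,
                        int mateAlignmentStart, int insertSize) {
        if (insertSize == 0) return NO_BOUNDARY;      // size unknown: cannot compute
        int fragmentLength = Math.abs(insertSize);    // sign depends on pair orientation
        if (negativeStrand) {
            // Reverse read: the fragment starts at the mate's start, so the
            // adaptor occupies positions up to and including mateAlignmentStart - 1.
            return mateAlignmentStart - 1;
        }
        // Forward read: the fragment covers [alignmentStart, alignmentStart + length - 1],
        // so the adaptor begins at the next position.
        return alignmentStart + fragmentLength;
    }
}
```

With this convention, bases at or beyond the boundary of a forward read, and at or before the boundary of a reverse read, would be treated as adaptor.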
Mark DePristo 2e88c7fe61 Refactored forest.R to enable loading and testing of goNL and Autism data. RF looks really good on these data sets! 2011-12-20 16:41:07 -05:00
Mark DePristo b472bab54e Minor changes in default thread values
-- By default, now that we can run effectively on gsa4, we test up to nt 12
-- We now run 6 jobs by default, not 3
2011-12-20 14:05:48 -05:00
Mark DePristo 992d2fa632 Don't print unnecessary debugging info in GATKPerformanceOverTime.R 2011-12-20 14:05:48 -05:00
Mark DePristo a7891b7a2e Enabling VariantEval reports 2011-12-20 14:05:48 -05:00
Mark DePristo 0cc5c3d799 General improvements to Queue
-- Support for collecting resources info from DRMAA runners
-- Disabled the non-standard mem_free argument so that we can actually use our own SGE cluster gsa4
-- NCoresRequest is a Queue test script for this.
-- Added two command line arguments:
  -- multiCoreJerk: don't request multiple cores for jobs with nt > 1. This was the old behavior, but it's really not the best way to run parallel jobs. Now with Queue, if you run nt = 4 the system requests 4 cores on your host. If this flag is set, though, it will request only 1 core and you'll just use 4, like a jerk.
  -- job_parallel_env: name of the parallel environment used with SGE to request multicore jobs. Equivalent to -pe job_parallel_env NT for jobs with NT > 1.
2011-12-20 14:05:09 -05:00
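The core-request rule described above can be sketched as a small function that builds the SGE `-pe` arguments. `SgeCoreRequest.nativeArgs` is a hypothetical helper, not Queue code, and the parallel environment name (`smp` in the usage below) is an assumed example; the real name depends on the cluster configuration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: derive "-pe <env> <nt>" from the thread count. */
final class SgeCoreRequest {
    static List<String> nativeArgs(String parallelEnv, int nt, boolean multiCoreJerk) {
        List<String> args = new ArrayList<>();
        // Single-threaded jobs, and jobs run with multiCoreJerk, request no parallel
        // environment (one slot), even though the job may still use nt threads.
        if (nt > 1 && !multiCoreJerk) {
            args.add("-pe");
            args.add(parallelEnv);
            args.add(Integer.toString(nt));
        }
        return args;
    }
}
```

For example, `nativeArgs("smp", 4, false)` yields `-pe smp 4`, while the same call with `multiCoreJerk = true` yields no extra arguments.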
Eric Banks 7204fcc2c3 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-20 12:59:11 -05:00
Eric Banks 8ade2d6ac2 max_alternate_alleles also ready to be made public 2011-12-20 12:59:02 -05:00
Eric Banks 6f52bd580b --multiallelic mode is not hidden anymore (but it is annotated as advanced); added docs 2011-12-20 12:47:38 -05:00
Mauricio Carneiro 37e0044c48 Removing unclipSoftClipBases from ReadUtils
* It was buggy and dangerous.
* Updated Chris' code to use the ReadClipper.
2011-12-20 00:11:26 -05:00
Mauricio Carneiro 78d9bf7196 Added REVERT_SOFTCLIPPED_BASES capability to ReadClipper
* New ClippingOp REVERT_SOFTCLIPPED_BASES turns soft clipped bases into matches.
    * Added functionality to the clipping op to revert all soft-clipped bases in a read into matches
    * Added revertSoftClipBases function to the ReadClipper for public use
    * Wrote systematic unit tests
2011-12-20 00:04:30 -05:00
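At the CIGAR level, turning soft-clipped bases into matches amounts to rewriting every `S` as `M` and merging the resulting adjacent `M` runs. The sketch below shows only that string transformation; `RevertSoftClips` is a hypothetical class, and a full revert would also move the alignment start left by the leading soft-clip length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: CIGAR rewrite for reverting soft clips into matches. */
final class RevertSoftClips {
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Rewrites every S as M and merges runs of adjacent M operators. */
    static String revertCigar(String cigar) {
        StringBuilder out = new StringBuilder();
        int pendingMatch = 0;                      // length of the current M run
        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            int length = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            if (op == 'S' || op == 'M') {
                pendingMatch += length;            // soft clips join the match run
            } else {
                if (pendingMatch > 0) {            // flush the run before other operators
                    out.append(pendingMatch).append('M');
                    pendingMatch = 0;
                }
                out.append(length).append(op);     // H, I, D, etc. are kept as-is
            }
        }
        if (pendingMatch > 0) out.append(pendingMatch).append('M');
        return out.toString();
    }
}
```

For instance, `3S10M2S` becomes `15M`, and `2H3S5M1I4M` becomes `2H8M1I4M`; hard clips stay, since those bases are gone from the read.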
Christopher Hartl 24585062f8 Merge branch 'incoming' 2011-12-19 23:16:36 -05:00
Christopher Hartl 67298f8a11 AFCR made public (for use in VSS)
Minor changes to ValidationSiteSelector logic (SampleSelectors determine whether a site is valid for output; no subset context needs to be operated on beyond that determination). Implementation of GL-based site selection. Minor changes to EJG.
2011-12-19 23:14:26 -05:00
Eric Banks 2d619d8633 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-19 15:38:02 -05:00
Eric Banks 06d385e619 Simplifying the interface a bit 2011-12-19 15:29:46 -05:00
Mauricio Carneiro 52b8d74e49 Oops, turning it off until I choose better BAM files to test 2011-12-19 14:41:46 -05:00
Christopher Hartl 339ef92eac Goodbye SW by default. Aligned reads that overlap intron-exon junctions are now scored where they are by default, but the user is warned (and the record is flagged in the VCF) if there is evidence that an indel is throwing off the scoring (e.g. if the best score of a realigned unmapped read is >5 log orders better than the best score of a scored mapped read). Unmapped reads are still SW-aligned to the junction-junction sequence. This should result in a rather massive speedup, though it is so far untested.
UGBoundAF has to go in at some point. I'm in the process of rewriting the math for bounding the allele frequency (it was assuming uniform tails, which is silly since I derived the posterior distribution in closed form some time back; I just need to find it).
2011-12-19 12:18:18 -05:00
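The warning condition described above reduces to comparing two maxima against a margin. The sketch assumes the scores are log10 likelihoods (higher is better), so "5 log orders" becomes a difference of 5.0; `IndelEvidenceCheck` is a hypothetical name, not the walker's code.

```java
/** Hypothetical sketch of the ">5 log orders better" indel-evidence check. */
final class IndelEvidenceCheck {
    // Margin from the commit message; scores are assumed to be log10 likelihoods.
    static final double LOG10_MARGIN = 5.0;

    private static double max(double[] scores) {
        double best = Double.NEGATIVE_INFINITY;
        for (double s : scores) best = Math.max(best, s);
        return best;
    }

    /** True when the best realigned unmapped read beats the best mapped read by more than the margin. */
    static boolean suggestsIndel(double[] mappedScores, double[] realignedUnmappedScores) {
        if (mappedScores.length == 0 || realignedUnmappedScores.length == 0) return false;
        return max(realignedUnmappedScores) - max(mappedScores) > LOG10_MARGIN;
    }
}
```

With a best mapped score of -12 and a best realigned unmapped score of -6, the gap is 6 log orders and the record would be flagged; a gap of 4 would not.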
Christopher Hartl 8d55c5ea14 Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable 2011-12-19 12:17:55 -05:00
Mauricio Carneiro fb9950f3ad Integration tests for ReduceReads 2011-12-19 11:54:18 -05:00
Guillermo del Angel 758b672c43 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-19 17:06:26 +01:00
Christopher Hartl 418d22b67e Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable
Conflicts:
	private/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/IntronLossGenotyperV2.java
2011-12-19 10:59:18 -05:00
Christopher Hartl 69661da37d Moving ValidationSiteSelector to the validation package in public, under my ownership. JunctionGenotyper was added and modified several times; this commit is due to merge-conflict fixes. 2011-12-19 10:57:28 -05:00
Guillermo del Angel 85dc2239ad Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-19 16:53:07 +01:00
Mark DePristo cd28390f52 Merge branch 'yamagishi' 2011-12-19 10:52:57 -05:00
Mark DePristo 31ac6b063b Complete restructuring of performance script to use local (SSD) cached data on gsa4
-- New resources argument used to prefix all data files (ref, rods, bams)
-- Workaround for the CommandLineProgram double-add bug in Queue (KS knows about it)
-- New test mode that runs just a tiny set of intervals (can be run single-threaded on the command line)
2011-12-19 10:51:26 -05:00
Guillermo del Angel d5d0e94a11 a) Updated, for posterity, the original scripts for the VQSR project consensus (SNP and indel) so that they use the newest GATK syntax (they were not run in the end). b) New scripts for redoing indel GLs; jobs are split per sample and per chromosome.
Another script (PhaseIndelRedoCombining.scala) combines all the pieces: not the best solution, but for development/debugging it is best to have two separate entities.
c) Experimental walker to assess the GL concordance of two callsets: it just accumulates the squared error between the two callsets' GLs for each sample and site, and outputs the total at the end.
d) Another one-off to fill a callset of given GLs with greedy genotypes (the most likely genotype, i.e. the one with the lowest PL) to get more meaningful metrics and QC for a given GL file.
2011-12-19 16:50:52 +01:00
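Items c) and d) both reduce to small array computations, sketched below. `GlUtils` is a hypothetical class; GLs are assumed to be indexed `[sample][genotype]` in the same order in both callsets, and PLs are phred-scaled so the most likely genotype has the smallest PL.

```java
/** Hypothetical sketch of GL concordance and greedy genotype selection. */
final class GlUtils {
    /** Sum over samples and genotypes of squared differences between two callsets' GLs. */
    static double squaredError(double[][] glsA, double[][] glsB) {
        double total = 0.0;
        for (int s = 0; s < glsA.length; s++) {
            for (int g = 0; g < glsA[s].length; g++) {
                double d = glsA[s][g] - glsB[s][g];
                total += d * d;
            }
        }
        return total;
    }

    /** Index of the most likely genotype: the smallest PL (PL = 0 for the best). */
    static int greedyGenotype(int[] pls) {
        int best = 0;
        for (int i = 1; i < pls.length; i++) {
            if (pls[i] < pls[best]) best = i;
        }
        return best;
    }
}
```

For a biallelic site with PLs `30,0,45` (hom-ref, het, hom-var), the greedy genotype is index 1, the heterozygote.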
Mark DePristo b0c2f223ab Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-19 10:45:27 -05:00
Laurent Francioli 16cc2b864e - Corrected a bug that caused cases where both parents are HET to be counted twice in the TDT calculation
- Adapted the TDT integration test to the corrected version of TDT
Signed-off-by: Ryan Poplin <rpoplin@broadinstitute.org>
2011-12-19 10:30:59 -05:00
Eric Banks 5fd19ae734 Commented exactly how the results are represented from the exact model so developers can know how to use them. 2011-12-19 10:19:00 -05:00
Mark DePristo 5383c50654 Protect ourselves when iteration is present but there's only a single iteration in queueJobReport.R 2011-12-19 10:08:38 -05:00
Eric Banks 3069a689fe Bug fix: if there are multiple records at a given position, SelectVariants would drop every variant following one that fails filters (instead of dropping just the failing one). Added an integration test to cover this case. 2011-12-19 10:04:33 -05:00
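The intended behavior, judging each record at a locus independently, is sketched below. `SamePositionFilter` and its `Rec` type are hypothetical; the sketch illustrates the expected outcome and does not claim to reproduce the original bug's cause, although skipping with `continue` rather than stopping the loop is the kind of detail that separates the two behaviors.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: filter co-located records one by one. */
final class SamePositionFilter {
    /** Hypothetical record: an ID plus whether it failed filters. */
    static final class Rec {
        final String id;
        final boolean failsFilters;
        Rec(String id, boolean failsFilters) { this.id = id; this.failsFilters = failsFilters; }
    }

    /** Returns the IDs of records that pass, keeping their original order. */
    static List<String> keep(List<Rec> recordsAtLocus) {
        List<String> kept = new ArrayList<>();
        for (Rec r : recordsAtLocus) {
            if (r.failsFilters) continue;   // skip only this record; stopping the loop
                                            // here would drop every later record too
            kept.add(r.id);
        }
        return kept;
    }
}
```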
Mauricio Carneiro 728d66cca4 Adding Picard imports to the Haplotype Caller
Not sure how this passed my tests before, but clearly these imports got deleted by extra-aggressive 'unused imports cleanup' by IntelliJ.
2011-12-19 09:47:48 -05:00
Mauricio Carneiro 5b678e3b94 Remove ClippingOp UnitTests
* all testing functionality is in the ReadClipperUnitTest, no need to double test.
* class and package naming cleanup
2011-12-19 07:49:26 -05:00
Matt Hanna 1ead00cac5 New fork of SamFileHeaderMerger should be cached at the thread level to enable fast (and valid) thread lookups. 2011-12-18 19:04:26 -05:00
Ryan Poplin bc842ab3a5 Adding option to VariantAnnotator to do strict allele matching when annotating with comp track concordance. 2011-12-18 15:27:23 -05:00
Ryan Poplin 953998dcd0 Now that getSampleDB is public in the walker base class this override in VariantAnnotator isn't necessary. 2011-12-18 14:38:59 -05:00
Eric Banks 76bd13a1ed Forgot to update the unit test 2011-12-18 01:13:49 -05:00
Eric Banks 07f9d14d9f Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-18 00:43:15 -05:00
Eric Banks 4d11b20118 updating HC test too 2011-12-18 00:43:01 -05:00
Eric Banks c5ffe0ab04 No reason to sum the normalized posteriors array to get Pr(AF>0) given that we can just compute 1.0 - array[0]. Integration tests change only because of trivial precision artifacts for reference calls using EMIT_ALL_SITES. 2011-12-18 00:31:47 -05:00
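The shortcut works because normalized posteriors over allele-frequency bins sum to 1, so Pr(AF>0) = 1 - Pr(AF=0). The sketch below, using a hypothetical `AfPosterior` class and assuming bin 0 holds AF = 0, computes the quantity both ways. In floating point the two results can differ in the last few bits, which is consistent with the trivial precision artifacts mentioned above.

```java
/** Hypothetical sketch: two ways to compute Pr(AF > 0) from normalized posteriors. */
final class AfPosterior {
    /** Sums every non-zero allele-frequency bin. */
    static double byLoop(double[] normalizedPosteriors) {
        double sum = 0.0;
        for (int i = 1; i < normalizedPosteriors.length; i++) sum += normalizedPosteriors[i];
        return sum;
    }

    /** Uses the complement: the bins sum to 1, so Pr(AF>0) = 1 - Pr(AF=0). */
    static double byComplement(double[] normalizedPosteriors) {
        return 1.0 - normalizedPosteriors[0];
    }
}
```

The complement form does constant work regardless of the number of bins, instead of a pass over the whole array.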