8426 Commits (7204fcc2c353315f4c990a91761ba0a6afabdbf9)

Author SHA1 Message Date
Eric Banks 7204fcc2c3 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-20 12:59:11 -05:00
Eric Banks 8ade2d6ac2 max_alternate_alleles also ready to be made public 2011-12-20 12:59:02 -05:00
Eric Banks 6f52bd580b --multiallelic mode is not hidden anymore (but it is annotated as advanced); added docs 2011-12-20 12:47:38 -05:00
Mauricio Carneiro 37e0044c48 Removing unclipSoftClipBases from ReadUtils
* it was buggy and dangerous.
* Updated Chris' code to use the ReadClipper.
2011-12-20 00:11:26 -05:00
Mauricio Carneiro 78d9bf7196 Added REVERT_SOFTCLIPPED_BASES capability to ReadClipper
* New ClippingOp REVERT_SOFTCLIPPED_BASES turns soft clipped bases into matches.
    * Added functionality to clipping op to revert all soft clip bases in a read into matches
    * Added revertSoftClipBases function to the ReadClipper for public use
    * Wrote systematic unit tests
2011-12-20 00:04:30 -05:00
Christopher Hartl 24585062f8 Merge branch 'incoming' 2011-12-19 23:16:36 -05:00
Christopher Hartl 67298f8a11 AFCR made public (for use in VSS)
Minor changes to ValidationSiteSelector logic (SampleSelectors determine whether a site is valid for output, no actual subset context need be operated on beyond that determination). Implementation of GL-based site selection. Minor changes to EJG.
2011-12-19 23:14:26 -05:00
Eric Banks 2d619d8633 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-19 15:38:02 -05:00
Eric Banks 06d385e619 Simplifying the interface a bit 2011-12-19 15:29:46 -05:00
Mauricio Carneiro 52b8d74e49 Oops, turning it off until I choose better BAM files to test 2011-12-19 14:41:46 -05:00
Christopher Hartl 339ef92eac Goodbye SW by default. Aligned reads that overlap intron-exon junctions are now scored where they are by default, but the user is warned (and the record is flagged in the VCF) if there's evidence to suggest that an indel is throwing off the scoring (e.g. if the best score of a realigned unmapped read is >5 log orders better than the best score of a scored mapped read). Unmapped reads are still SW-aligned to the junction-junction sequence. This should result in a rather massive speedup, so far untested.
UGBoundAF has to go in at some point. In the process of rewriting the math for bounding the allele frequency (it was assuming uniform tails, which is silly since I derived the posterior distribution in closed form some time back; just need to find it).
2011-12-19 12:18:18 -05:00
Christopher Hartl 8d55c5ea14 Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable 2011-12-19 12:17:55 -05:00
Mauricio Carneiro fb9950f3ad Integration tests for ReduceReads 2011-12-19 11:54:18 -05:00
Guillermo del Angel 758b672c43 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-19 17:06:26 +01:00
Christopher Hartl 418d22b67e Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable
Conflicts:
	private/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/IntronLossGenotyperV2.java
2011-12-19 10:59:18 -05:00
Christopher Hartl 69661da37d Moving ValidationSiteSelector to validation package in public under my ownership. JunctionGenotyper added and modified several times; this commit is due to merge conflict fixes. 2011-12-19 10:57:28 -05:00
Guillermo del Angel 85dc2239ad Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-19 16:53:07 +01:00
Mark DePristo cd28390f52 Merge branch 'yamagishi' 2011-12-19 10:52:57 -05:00
Mark DePristo 31ac6b063b Complete restructuring of performance script to use local (SSD) cached data on gsa4
-- New resources argument used to prefix all data files (ref, rods, bams)
-- Workaround for CommandLineProgram double add bug in Queue (KS knows about it)
-- New test mode to just run a tiny set of intervals (can be run single threaded on cmdline)
2011-12-19 10:51:26 -05:00
Guillermo del Angel d5d0e94a11 a) Updating for posterity original scripts for VQSR project consensus (snp and indel) so that they use newest GATK syntax (also they were not run in the end). b) New scripts for redoing indel GL's, jobs are split per sample and per chromosome.
Another script (PhaseIndelRedoCombining.scala) does the combining of all pieces - not the best solution, but best for development/debugging to have two separate entities.
c) Experimental walker to assess GL concordance of two callsets - just accumulates squared error between each sample and site GL's and outputs at the end.
d) Another one-off to fill a callset of given GL's with greedy genotypes (genotype that maximizes PL) to have more meaningful metrics and QC of a given GL file.
2011-12-19 16:50:52 +01:00
Mark DePristo b0c2f223ab Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-19 10:45:27 -05:00
Laurent Francioli 16cc2b864e - Corrected bug causing cases where both parents are HET to be counted twice in the TDT calculation
- Adapted TDT integration test to the corrected version of TDT
Signed-off-by: Ryan Poplin <rpoplin@broadinstitute.org>
2011-12-19 10:30:59 -05:00
Eric Banks 5fd19ae734 Commented exactly how the results are represented from the exact model so developers can know how to use them. 2011-12-19 10:19:00 -05:00
Mark DePristo 5383c50654 Protect ourselves when iteration is present but there's only a single iteration in queueJobReport.R 2011-12-19 10:08:38 -05:00
Eric Banks 3069a689fe Bug fix: if there are multiple records at a given position, it turns out that SelectVariants would drop all variants that follow after one that fails filters (instead of dropping just the failing one). Added an integration test to cover this case. 2011-12-19 10:04:33 -05:00
Mauricio Carneiro 728d66cca4 Adding Picard imports to the Haplotype Caller
Not sure how this passed my tests before, but clearly these imports got deleted by extra-aggressive 'unused imports cleanup' by IntelliJ.
2011-12-19 09:47:48 -05:00
Mauricio Carneiro 5b678e3b94 Remove ClippingOp UnitTests
* all testing functionality is in the ReadClipperUnitTest, no need to double test.
* class and package naming cleanup
2011-12-19 07:49:26 -05:00
Matt Hanna 1ead00cac5 New fork of SamFileHeaderMerger should be cached at the thread level to enable fast (and valid) thread lookups. 2011-12-18 19:04:26 -05:00
Ryan Poplin bc842ab3a5 Adding option to VariantAnnotator to do strict allele matching when annotating with comp track concordance. 2011-12-18 15:27:23 -05:00
Ryan Poplin 953998dcd0 Now that getSampleDB is public in the walker base class this override in VariantAnnotator isn't necessary. 2011-12-18 14:38:59 -05:00
Eric Banks 76bd13a1ed Forgot to update the unit test 2011-12-18 01:13:49 -05:00
Eric Banks 07f9d14d9f Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-18 00:43:15 -05:00
Eric Banks 4d11b20118 updating HC test too 2011-12-18 00:43:01 -05:00
Eric Banks c5ffe0ab04 No reason to sum the normalized posteriors array to get Pr(AF>0) given that we can just compute 1.0 - array[0]. Integration tests change only because of trivial precision artifacts for reference calls using EMIT_ALL_SITES. 2011-12-18 00:31:47 -05:00
Eric Banks 6dc52d42bf Implemented the proper QUAL calculation for multi-allelic calls. Integration tests pass except for the ones making multi-allelic calls (duh) and one of the SLOD tests (which used to print 0 when one of the LODs was NaN but now we just don't print the SB annotation for that record). 2011-12-18 00:01:42 -05:00
Khalid Shakir 6059ca76e8 Removing cruft that snuck in last commit. 2011-12-16 23:00:16 -05:00
Khalid Shakir 7486696c07 When using bam list mode in HSP, deriving VCF name from bam list instead of requiring an additional parameter.
Creating a single temporary directory per ant test run instead of putting temp files across all runs in the same directory.
Updated various tests for above items and other small fixes.
2011-12-16 18:09:25 -05:00
Mauricio Carneiro e5df9e0684 cleaner test output
cleaned up the debug "pass" messages in the unit tests
2011-12-16 18:04:00 -05:00
Mauricio Carneiro fcc21180e8 Added hardClipLeadingInsertions UnitTest for the ReadClipper
fixed issue where a read starting with an insertion followed by a deletion would break; the clipper can now safely clip both the insertion and the deletion in that case.

note: test is turned off until contract changes to allow hanging insertions (left/right).
2011-12-16 18:02:47 -05:00
Mauricio Carneiro 075be52adc Added hardClipByReferenceCoordinates (left and right tails) UnitTest for the ReadClipper 2011-12-16 18:01:33 -05:00
Mauricio Carneiro 5bba44d693 Added hardClipByReferenceCoordinates UnitTest for the ReadClipper
* fixed edge case when requested to hard clip beginning of a read that had hanging soft clipped bases on the left tail.
* fixed edge case when requested to hard clip end of a read that had hanging soft clipped bases on the right tail.
* fixed AlignmentStart of a clipped read that results in only hard clips and soft clips

note: added tests to all these beautiful cases...
2011-12-16 18:01:33 -05:00
Mauricio Carneiro 5838ba529d Added hardClipByReadCoordinates UnitTest for the ReadClipper 2011-12-16 18:01:33 -05:00
Mauricio Carneiro c26295919e Added hardClipBothEndsByReferenceCoordinates UnitTest for the ReadClipper 2011-12-16 18:01:33 -05:00
Mark DePristo 1994c3e3bc Only print warning about allele incompatibility when there are genotypes in the file in CombineVariants 2011-12-16 16:50:51 -05:00
Mark DePristo 1863da4d18 Spawn a more reasonable number of jobs in GATKPerformanceOverTime 2011-12-16 16:50:40 -05:00
Mark DePristo b6067be952 Support for selecting only variants with specific IDs from a file in SelectVariants
-- Cleaned up unused variables as well
2011-12-16 16:50:39 -05:00
Mark DePristo d6d2f49c88 Don't print log if there are no BAMs 2011-12-16 16:50:36 -05:00
Mark DePristo dbc2ed2887 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-16 15:12:22 -05:00
Mark DePristo 1179588475 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-16 15:11:59 -05:00
Mark DePristo 0e0d022e58 RandomForest.R now caches trees to disk to save cpu and storage costs
-- Vastly more efficient to write out trees to disk than recompute them all of the time
2011-12-16 15:10:51 -05:00