Eric Banks
7204fcc2c3
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-20 12:59:11 -05:00
Eric Banks
8ade2d6ac2
max_alternate_alleles also ready to be made public
2011-12-20 12:59:02 -05:00
Eric Banks
6f52bd580b
--multiallelic mode is not hidden anymore (but it is annotated as advanced); added docs
2011-12-20 12:47:38 -05:00
Mauricio Carneiro
37e0044c48
Removing unclipSoftClipBases from ReadUtils
...
* it was buggy and dangerous.
* Updated Chris' code to use the ReadClipper.
2011-12-20 00:11:26 -05:00
Mauricio Carneiro
78d9bf7196
Added REVERT_SOFTCLIPPED_BASES capability to ReadClipper
...
* New ClippingOp REVERT_SOFTCLIPPED_BASES turns soft clipped bases into matches.
* Added functionality to clipping op to revert all soft clip bases in a read into matches
* Added revertSoftClipBases function to the ReadClipper for public use
* Wrote systematic unit tests
2011-12-20 00:04:30 -05:00
Christopher Hartl
24585062f8
Merge branch 'incoming'
2011-12-19 23:16:36 -05:00
Christopher Hartl
67298f8a11
AFCR made public (for use in VSS)
...
Minor changes to ValidationSiteSelector logic (SampleSelectors determine whether a site is valid for output, no actual subset context need be operated on beyond that determination). Implementation of GL-based site selection. Minor changes to EJG.
2011-12-19 23:14:26 -05:00
Eric Banks
2d619d8633
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-19 15:38:02 -05:00
Eric Banks
06d385e619
Simplifying the interface a bit
2011-12-19 15:29:46 -05:00
Mauricio Carneiro
52b8d74e49
Ooops, turning it off until I choose better BAM files to test
2011-12-19 14:41:46 -05:00
Christopher Hartl
339ef92eac
Goodbye SW by default. Aligned reads that overlap intron-exon junctions are now scored where they are by default, but the tool warns the user (and flags the record in the VCF) if there is evidence that an indel is throwing off the scoring (e.g. if the best score of a realigned unmapped read is >5 log orders better than the best score of a scored mapped read). Unmapped reads are still SW-aligned to the junction-junction sequence. This should result in a rather massive speedup, so far untested.
...
UGBoundAF has to go in at some point. In the process of rewriting the math for bounding the allele frequency (it was assuming uniform tails, which is silly since I derived the posterior distribution in closed form some time back, just need to find it)
2011-12-19 12:18:18 -05:00
Christopher Hartl
8d55c5ea14
Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable
2011-12-19 12:17:55 -05:00
Mauricio Carneiro
fb9950f3ad
Integration tests for ReduceReads
2011-12-19 11:54:18 -05:00
Guillermo del Angel
758b672c43
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-19 17:06:26 +01:00
Christopher Hartl
418d22b67e
Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable
...
Conflicts:
private/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/IntronLossGenotyperV2.java
2011-12-19 10:59:18 -05:00
Christopher Hartl
69661da37d
Moving ValidationSiteSelector to validation package in public under my ownership. JunctionGenotyper added and modified several times; this commit is due to merge conflict fixes.
2011-12-19 10:57:28 -05:00
Guillermo del Angel
85dc2239ad
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-19 16:53:07 +01:00
Mark DePristo
cd28390f52
Merge branch 'yamagishi'
2011-12-19 10:52:57 -05:00
Mark DePristo
31ac6b063b
Complete restructuring of performance script to use local (SSD) cached data on gsa4
...
-- New resources argument used to prefix all data files (ref, rods, bams)
-- Work around for CommandLineProgram double add bug in Queue (KS knows about it)
-- New test mode to just run a tiny set of intervals (can be run single-threaded on cmdline)
2011-12-19 10:51:26 -05:00
Guillermo del Angel
d5d0e94a11
a) Updating for posterity original scripts for VQSR project consensus (snp and indel) so that they use newest GATK syntax (also they were not run in the end). b) New scripts for redoing indel GL's, jobs are split per sample and per chromosome.
...
Another script (PhaseIndelRedoCombining.scala) does the combining of all pieces - not the best solution, but best for development/debugging to have two separate entities.
c) Experimental walker to assess GL concordance of two callsets - just accumulates squared error between each sample and site GL's and outputs at the end.
d) Another one-off to fill a callset of given GL's with greedy genotypes (genotype that maximizes PL) to have more meaningful metrics and QC of a given GL file.
2011-12-19 16:50:52 +01:00
Mark DePristo
b0c2f223ab
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-19 10:45:27 -05:00
Laurent Francioli
16cc2b864e
- Corrected bug causing cases where both parents are HET to be accounted twice in the TDT calculation
- Adapted TDT Integration test to corrected version of TDT
...
Signed-off-by: Ryan Poplin <rpoplin@broadinstitute.org>
2011-12-19 10:30:59 -05:00
Eric Banks
5fd19ae734
Commented exactly how the results are represented from the exact model so developers can know how to use them.
2011-12-19 10:19:00 -05:00
Mark DePristo
5383c50654
Protect ourselves when iteration is present but there's only a single iteration in queueJobReport.R
2011-12-19 10:08:38 -05:00
Eric Banks
3069a689fe
Bug fix: if there are multiple records at a given position, it turns out that SelectVariants would drop all variants that follow after one that fails filters (instead of dropping just the failing one). Added an integration test to cover this case.
2011-12-19 10:04:33 -05:00
Mauricio Carneiro
728d66cca4
Adding Picard imports to the Haplotype Caller
...
Not sure how this passed my tests before, but clearly these imports got deleted by extra-aggressive 'unused imports cleanup' by IntelliJ.
2011-12-19 09:47:48 -05:00
Mauricio Carneiro
5b678e3b94
Remove ClippingOp UnitTests
...
* all testing functionality is in the ReadClipperUnitTest, no need to double test.
* class and package naming cleanup
2011-12-19 07:49:26 -05:00
Matt Hanna
1ead00cac5
New fork of SamFileHeaderMerger should be cached at the thread level to enable fast (and valid) thread lookups.
2011-12-18 19:04:26 -05:00
Ryan Poplin
bc842ab3a5
Adding option to VariantAnnotator to do strict allele matching when annotating with comp track concordance.
2011-12-18 15:27:23 -05:00
Ryan Poplin
953998dcd0
Now that getSampleDB is public in the walker base class this override in VariantAnnotator isn't necessary.
2011-12-18 14:38:59 -05:00
Eric Banks
76bd13a1ed
Forgot to update the unit test
2011-12-18 01:13:49 -05:00
Eric Banks
07f9d14d9f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-18 00:43:15 -05:00
Eric Banks
4d11b20118
updating HC test too
2011-12-18 00:43:01 -05:00
Eric Banks
c5ffe0ab04
No reason to sum the normalized posteriors array to get Pr(AF>0) given that we can just compute 1.0 - array[0]. Integration tests change only because of trivial precision artifacts for reference calls using EMIT_ALL_SITES.
2011-12-18 00:31:47 -05:00
Eric Banks
6dc52d42bf
Implemented the proper QUAL calculation for multi-allelic calls. Integration tests pass except for the ones making multi-allelic calls (duh) and one of the SLOD tests (which used to print 0 when one of the LODs was NaN but now we just don't print the SB annotation for that record).
2011-12-18 00:01:42 -05:00
Khalid Shakir
6059ca76e8
Removing cruft that snuck in last commit.
2011-12-16 23:00:16 -05:00
Khalid Shakir
7486696c07
When using bam list mode in HSP, deriving VCF name from bam list instead of requiring an additional parameter.
...
Creating a single temporary directory per ant test run instead of putting temp files across all runs in the same directory.
Updated various tests for above items and other small fixes.
2011-12-16 18:09:25 -05:00
Mauricio Carneiro
e5df9e0684
cleaner test output
...
cleaned up the debug "pass" messages in the unit tests
2011-12-16 18:04:00 -05:00
Mauricio Carneiro
fcc21180e8
Added hardClipLeadingInsertions UnitTest for the ReadClipper
...
fixed issue where a read starting with an insertion followed by a deletion would break, clipper can now safely clip the insertion and the deletion if that's the case.
note: test is turned off until contract changes to allow hanging insertions (left/right).
2011-12-16 18:02:47 -05:00
Mauricio Carneiro
075be52adc
Added hardClipByReferenceCoordinates (left and right tails) UnitTest for the ReadClipper
2011-12-16 18:01:33 -05:00
Mauricio Carneiro
5bba44d693
Added hardClipByReferenceCoordinates UnitTest for the ReadClipper
...
* fixed edge case when requested to hard clip beginning of a read that had hanging soft clipped bases on the left tail.
* fixed edge case when requested to hard clip end of a read that had hanging soft clipped bases on the right tail.
* fixed AlignmentStart of a clipped read that results in only hard clips and soft clips
note: added tests to all these beautiful cases...
2011-12-16 18:01:33 -05:00
Mauricio Carneiro
5838ba529d
Added hardClipByReadCoordinates UnitTest for the ReadClipper
2011-12-16 18:01:33 -05:00
Mauricio Carneiro
c26295919e
Added hardClipBothEndsByReferenceCoordinates UnitTest for the ReadClipper
2011-12-16 18:01:33 -05:00
Mark DePristo
1994c3e3bc
Only print warning about allele incompatibility when there are genotypes in the file in CombineVariants
2011-12-16 16:50:51 -05:00
Mark DePristo
1863da4d18
Spawn a more reasonable number of jobs in GATKPerformanceOverTime
2011-12-16 16:50:40 -05:00
Mark DePristo
b6067be952
Support for selecting only variants with specific IDs from a file in SelectVariants
...
-- Cleaned up unused variables as well
2011-12-16 16:50:39 -05:00
Mark DePristo
d6d2f49c88
Don't print log if there are no BAMs
2011-12-16 16:50:36 -05:00
Mark DePristo
dbc2ed2887
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-16 15:12:22 -05:00
Mark DePristo
1179588475
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-16 15:11:59 -05:00
Mark DePristo
0e0d022e58
RandomForest.R now caches trees to disk to save cpu and storage costs
...
-- Vastly more efficient to write out trees to disk than recompute them all of the time
2011-12-16 15:10:51 -05:00