Guillermo del Angel
b399424a9c
Fix integration test affected by no-calling samples with all-zero PLs, and add a more complex multi-sample integration test from a phase 1 case (GBR, with mixed technologies and complex input alleles)
2011-09-09 20:44:47 -04:00
Guillermo del Angel
e95d484757
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-09 18:31:14 -04:00
Guillermo del Angel
a807205fc3
a) Minor optimization to the softMax() computation to avoid redundant operations, resulting in a roughly 5-10% speedup in indel calling.
b) Added a fix to per-sample DP reporting so that deletions are included in the count, but left it commented out, since it may affect integration tests and should be isolated in its own commit.
c) Bug fix to avoid assigning non-reference genotypes to samples with PL=0,0,0. The correct behavior is to no-call these samples and to ignore them when computing the AC distribution, since their likelihoods are not informative.
2011-09-09 18:00:23 -04:00
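The no-call rule in item c) can be sketched in Python. This is a minimal illustration, not GATK code: it assumes phred-scaled PLs ordered hom-ref, het, hom-var for a biallelic site, and the function names (call_genotype, informative_samples, called_alt_count) are hypothetical. The real AC computation is a likelihood-based distribution; summing hard calls here is only a simplified stand-in.

```python
from typing import Dict, List, Optional, Sequence

# Assumed genotype ordering for a biallelic site (hypothetical for this sketch):
# index 0 = hom-ref, 1 = het, 2 = hom-var; ALT_COPIES[i] = alt alleles carried.
ALT_COPIES = (0, 1, 2)


def call_genotype(pls: Sequence[int]) -> Optional[int]:
    """Return the index of the most likely genotype, or None (a no-call)
    when every PL is zero, i.e. the likelihoods are uninformative."""
    if all(pl == 0 for pl in pls):
        return None
    # PLs are phred-scaled, so the smallest value is the most likely genotype.
    return min(range(len(pls)), key=lambda i: pls[i])


def informative_samples(pls_by_sample: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Drop samples whose PLs are all zero before any AC computation."""
    return {s: pls for s, pls in pls_by_sample.items() if any(pl != 0 for pl in pls)}


def called_alt_count(pls_by_sample: Dict[str, List[int]]) -> int:
    """Simplified stand-in for the AC computation: sum alt copies over the
    best genotypes of informative samples only."""
    total = 0
    for pls in informative_samples(pls_by_sample).values():
        best = call_genotype(pls)
        if best is not None:
            total += ALT_COPIES[best]
    return total
```

With samples {"A": [0, 0, 0], "B": [50, 20, 0]}, sample A is no-called and excluded, so only B's hom-var call contributes.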
Mauricio Carneiro
9e650dfc17
Fixing SelectVariants documentation
Removed messages telling users to use the YAML file, since the idea is to no longer support these files.
2011-09-09 16:25:31 -04:00
Ryan Poplin
1953edcd2d
Updating the ValidateVariants deletion integration test
2011-09-09 13:39:08 -04:00
Ryan Poplin
9ada9b3ed4
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-09 13:15:36 -04:00
Ryan Poplin
354529bff3
Adding a ValidateVariants integration test with a deletion
2011-09-09 13:15:24 -04:00
Ryan Poplin
91c949db74
Fixing ValidateVariants so that it validates deletion records. Fixing GATKdocs.
2011-09-09 12:57:14 -04:00
Eric Banks
51eb95d638
Missed these tests before
2011-09-09 11:46:37 -04:00
Eric Banks
6ad8943ca0
CompOverlap no longer tracks the number of comp sites, since it was not (and cannot be) counting them correctly.
2011-09-09 09:45:24 -04:00
Eric Banks
eaaba6eb51
Confirming that, when stratifying by sample in VE, the monomorphic sites for a given sample are not counted toward the relevant metrics. Adding an integration test to cover it.
2011-09-08 13:17:34 -04:00
Ryan Poplin
2636d216de
Adding an indel VQSR integration test
2011-09-08 10:38:13 -04:00
Ryan Poplin
9cba1019c8
Another fix for GENOTYPE_GIVEN_ALLELES mode with indels. Expanding the indel integration tests to include multiallelic records and overlapping indel records.
2011-09-08 09:25:13 -04:00
Ryan Poplin
e0020b2b29
Fixing PrintRODs. It now has an input and prints only one copy of each record.
2011-09-08 08:58:37 -04:00
Ryan Poplin
29c968ab60
clean up
2011-09-08 08:42:43 -04:00
Ryan Poplin
59841f8232
Fixing GENOTYPE_GIVEN_ALLELES mode for indels: only take the records that start at this locus.
2011-09-08 08:41:16 -04:00
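The "only take the records that start at this locus" rule can be illustrated with a small sketch. It assumes, hypothetically, a locus-by-locus traversal in which every record overlapping the current position is visible, so a multi-base indel would otherwise be picked up once per base it spans. Record, records_starting_here and walk are illustrative names, not GATK classes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal variant record (1-based, inclusive span)."""
    contig: str
    start: int
    end: int
    alleles: tuple


def records_starting_here(overlapping: Iterable[Record], contig: str, pos: int) -> List[Record]:
    """Keep only records whose start is the current locus."""
    return [r for r in overlapping if r.contig == contig and r.start == pos]


def walk(records: List[Record], contig: str, positions: Iterable[int]) -> List[Record]:
    """Simulate a traversal where every record overlapping pos is visible,
    using each record only at the locus where it starts."""
    used: List[Record] = []
    for pos in positions:
        overlapping = [r for r in records if r.contig == contig and r.start <= pos <= r.end]
        used.extend(records_starting_here(overlapping, contig, pos))
    return used
```

A deletion spanning positions 100-102 is visible at all three loci but is used only once, at 100.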
Guillermo del Angel
45d54f6258
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 16:49:49 -04:00
Guillermo del Angel
9604fb2ba3
Necessary but not sufficient step toward fixing GenotypeGivenAlleles mode in UG, which is currently broken
2011-09-07 16:49:16 -04:00
Mark DePristo
2ded027762
Removed dysfunctional tranches support from VariantEval
2011-09-07 16:09:24 -04:00
Eric Banks
aa9e32f2f1
Reverting Mark's previous commit, as per the open discussion. The eval modules now check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module so that it no longer errors out if the indel isn't simple (that would have been bad). The only integration test that needed updating was the tranches one, based on a separate commit from Mark.
2011-09-07 15:48:06 -04:00
Mark DePristo
d7e355b4b6
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 14:54:16 -04:00
Mark DePristo
9127849f5d
BugFix for unit test
2011-09-07 14:54:10 -04:00
Eric Banks
3a04955a30
We already had isPolymorphic and isMonomorphic in the VariantContext, but the implementation was incorrect for many edge cases (e.g. sites-only files and sites with no-called samples). Fixing. Moving on to VE now.
2011-09-07 14:01:42 -04:00
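A sketch of a monomorphic/polymorphic test covering the two edge cases named above. The rules are an assumption for illustration: a record with no alt allele is monomorphic, a sites-only record with an alt allele is polymorphic, and otherwise a site is monomorphic when every called chromosome carries the reference allele, with no-called alleles (".") ignored. The function names are hypothetical.

```python
from typing import List, Sequence

NO_CALL = "."


def is_monomorphic(alleles: Sequence[str], genotypes: List[Sequence[str]]) -> bool:
    """alleles[0] is the reference; genotypes holds one allele list per sample
    (an empty list means a sites-only record)."""
    ref = alleles[0]
    if len(alleles) == 1:
        return True  # only the reference allele is present
    if not genotypes:
        return False  # sites-only record: the listed alt allele makes it polymorphic
    called = [a for gt in genotypes for a in gt if a != NO_CALL]
    # Monomorphic when every *called* chromosome carries the reference allele;
    # no-called chromosomes count neither as ref nor as alt.
    return all(a == ref for a in called)


def is_polymorphic(alleles: Sequence[str], genotypes: List[Sequence[str]]) -> bool:
    return not is_monomorphic(alleles, genotypes)
```

Note that under these rules a site where every sample is no-called is treated as monomorphic, since no called chromosome carries the alt allele.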
Guillermo del Angel
743bf7784c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 13:21:26 -04:00
Guillermo del Angel
5f22ef9a8c
Added missing javadoc info to Beagle arguments
2011-09-07 13:21:11 -04:00
Mark DePristo
3bcbfa6e06
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 13:13:17 -04:00
Mark DePristo
430da23446
At least 2 minutes must pass before a status message is printed, further stabilizing time estimates
2011-09-07 13:13:07 -04:00
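A minimal throttle illustrating the 2-minute rule. StatusThrottle is a hypothetical helper, and measuring the interval from the previous message (or from construction) is an assumption about how the rule applies; the optional explicit timestamps exist only to make it testable.

```python
import time
from typing import Optional


class StatusThrottle:
    """Hypothetical helper: permits a status message only once at least
    min_interval seconds have passed since the previous one (or since start)."""

    def __init__(self, min_interval: float = 120.0, start: Optional[float] = None):
        self.min_interval = min_interval
        self.last = time.monotonic() if start is None else start

    def should_print(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if now - self.last >= self.min_interval:
            self.last = now  # reset the window so the next message waits again
            return True
        return False
```

In a traversal loop one would call should_print() after each processed unit and emit the progress line only when it returns True.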
Mauricio Carneiro
6857d0324e
Merge branch 'master' into rr
2011-09-07 12:59:08 -04:00
Mark DePristo
7e9e20fed0
Forgot to delete previous call
2011-09-07 12:54:52 -04:00
Mark DePristo
d23d620494
Moving the traversal engine timer start as close to the actual start as possible
-- Should make initial timings more accurate
2011-09-07 12:52:33 -04:00
Mark DePristo
6ff432e1f2
BugFix for TF argument to VariantEval, actually making it work properly
2011-09-07 12:50:17 -04:00
Mauricio Carneiro
131cb7effd
Bringing Reduce Reads bug fixes to the main repository
2011-09-07 12:25:53 -04:00
Mark DePristo
a1920397e8
Major bugfix for per sample VariantEval
-- Per-sample stratification was not being calculated correctly: the alt allele always remained, even when the sample's genotype was hom-ref. Although conceptually fine, this breaks the assumptions of all the eval modules, so per-sample stratifications actually included every variant. Eric is going to fix the system in general, so this commit may break the build.
2011-09-07 12:18:11 -04:00
Mark DePristo
d5641cfac5
Merge branch 'variantEvalST'
2011-09-07 10:44:23 -04:00
Mark DePristo
2f4cf82e3b
VariantEval cleanup. Added VariantType Stratification
-- ArrayLists changed to Lists where possible
-- states refactored into the VariantStratifier base class (eliminates many lines of duplicate code)
-- Added VariantType stratification that partitions report by VariantContext.Type
2011-09-07 10:43:53 -04:00
Christopher Hartl
436f6eb52b
Reverting Eric's change and pushing in some command-line-option documentation.
2011-09-07 08:53:30 -04:00
Eric Banks
1ef8a1750a
I asked nicely and got nothing. Then I threatened and still got nothing. So I am carrying through on my threats. Guillermo, you have a short reprieve because you were away on vacation, but let's get yours done tomorrow afternoon.
2011-09-06 21:07:49 -04:00
Eric Banks
da9c8ab386
Revving the Tribble jar, in which the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly.
2011-09-06 20:39:42 -04:00
Mark DePristo
3db7ecb920
ReducedRead flag cached in GATKSAMRecord. 20% performance improvement
2011-09-06 15:11:38 -04:00
Roger Zurawicki
47607a7eff
Fixed a bug where deletions broke interval clipping
- Instead of using readLength, the ReadUtil functions are used to get the proper read coordinate
- Added debug info to interval clipping (with -dl)
NOTE: this method might not be safe for production; checks still need to be added to the ClippingOp code
2011-09-06 14:25:57 -04:00
Khalid Shakir
0adb388dee
Fixed a bug in SelectVariants that annotated sample_file / exclude_sample_file as @Argument instead of @Input, meaning they weren't tracked by Queue.
Updates for HybridSelectionPipeline:
- Use VQSR on SNPs for projects using bait set whole_exome_agilent_1, applying a cut at 98.5.
- If a whole_exome_agilent_1 project has fewer than 50 samples, also mix in 1000G samples to reach VQSR thresholds.
- Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches.
- Removed GSA_PRODUCTION_ONLY flag from indel caller.
- Updated indel hard filters based on delangel's analysis.
- Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.
2011-09-06 12:41:46 -04:00
Mauricio Carneiro
08ae6c0c61
ReadClipper is now handling unmapped reads
2011-09-02 11:32:30 -04:00
Eric Banks
d241f0e903
Adding docs for the PCR error rate argument.
2011-09-01 21:57:02 -04:00
Eric Banks
827fe6130c
Adding a hidden printing option. Also, always run UG in GENOTYPE_GIVEN_ALLELES mode, given that we don't actually test for the correct alleles (otherwise UG might choose a different allele and we could falsely validate the wrong one).
2011-09-01 11:40:35 -04:00
Mark DePristo
1aa4b12ff0
Reduced the number of combinations being tested here, which was overkill
2011-09-01 10:42:43 -04:00
Mark DePristo
ac49b8d26b
Conditional support for PerformanceTrackingQuerySource to measure Tribble / GATK bridge performance
-- Removed DEBUG option, instead use MEASURE_TRIBBLE_QUERY_PERFORMANCE in RMDTrackerBuilder
2011-09-01 10:41:55 -04:00
Mauricio Carneiro
4b5a7046c5
Making ReadLengthDistribution Public
Found this neat little walker that Kiran wrote, stashed in the private tree. Very useful. Generalized it a bit, added GATKDocs and moved it to public. I might include it as a QC step in the PacBio processing pipeline.
* generalized it to work with non-paired-end reads
* generalized it to work without read group information
2011-08-31 15:52:28 -04:00
Mauricio Carneiro
7d79de91c5
Merge branch 'master' into rr
2011-08-30 02:50:19 -04:00
Mauricio Carneiro
0cd9438ac2
fixed soft unclipped calculation
* getRefCoordSoftUnclippedEnd was not resetting the shift when hitting insertions. Fixed.
* getReadCoordinateForReferenceCoordinateBeforeAlignmentEnd was returning the wrong read coordinate position. Fixed.
2011-08-30 02:45:29 -04:00
Mauricio Carneiro
fd540592ab
Added RMS calculation for consensus MQ
Consensus MQ is now the average of the RMS of the mapping qualities of the reads that make up each site.
2011-08-30 02:45:20 -04:00
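One reading of the consensus MQ rule, sketched in Python under the assumption that the RMS is taken per site over the mapping qualities of the reads contributing to that site, and the consensus MQ is the plain mean of those per-site values. consensus_mq and rms are hypothetical names, and skipping sites without reads is an assumption of this sketch.

```python
import math
from typing import List, Sequence


def rms(values: Sequence[float]) -> float:
    """Root mean square: sqrt(mean(v^2))."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def consensus_mq(mqs_per_site: List[Sequence[int]]) -> float:
    """mqs_per_site[i] holds the mapping qualities of the reads making up site i."""
    site_rms = [rms(mqs) for mqs in mqs_per_site if mqs]  # skip sites with no reads
    if not site_rms:
        raise ValueError("no reads contribute to any site")
    return sum(site_rms) / len(site_rms)
```

Compared with a plain mean, the RMS weights high-MQ reads more heavily within each site: for MQs 30 and 40 the mean is 35 while the RMS is about 35.36.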