7340 Commits (cd2c511c4ae8a7d13ca6fe3604308ca5fdea5c00)

Author SHA1 Message Date
Mark DePristo cd2c511c4a GCF improvements
-- Support for streaming VCF writing via the VCFWriter interface
-- GCF now has a header and a footer.  The header is minimal, and contains a forward pointer to the position of the footer in the file.
-- Readers now read the header, and then jump to the footer to get the rest of the "header" information
-- Version now a field in GCF
2011-09-07 23:28:46 -04:00
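The header/footer arrangement described in this commit (a minimal header holding a forward pointer to the footer, so the body can be streamed before the full metadata is known) can be sketched roughly as follows. This is not the actual GCF layout: the class name `FooterPointerSketch`, the magic number, the field order, and the use of strings for records and footer are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.List;

/** Hypothetical layout, NOT the real GCF format: [magic][version][footerOffset][records...][footer]. */
class FooterPointerSketch {
    static final int MAGIC = 0x47434600; // hypothetical magic number
    static final int VERSION = 1;        // hypothetical version field

    /** Streams the records, then appends the footer and back-patches its offset into the header. */
    static void write(File file, List<String> records, String footer) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(file, "rw")) {
            out.setLength(0);                    // start from an empty file
            out.writeInt(MAGIC);
            out.writeInt(VERSION);
            long pointerPos = out.getFilePointer();
            out.writeLong(-1L);                  // placeholder for the footer offset
            for (String record : records) {
                out.writeUTF(record);            // body is written in streaming order
            }
            long footerPos = out.getFilePointer();
            out.writeUTF(footer);                // metadata only known once streaming ends
            out.seek(pointerPos);
            out.writeLong(footerPos);            // fill in the forward pointer
        }
    }

    /** Reads the minimal header, then jumps straight to the footer. */
    static String readFooter(File file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            if (in.readInt() != MAGIC) {
                throw new IOException("unexpected magic number");
            }
            int version = in.readInt();
            if (version != VERSION) {
                throw new IOException("unsupported version " + version);
            }
            long footerPos = in.readLong();
            in.seek(footerPos);
            return in.readUTF();
        }
    }
}
```

A reader calls `readFooter(file)` first and only then seeks back to the records, which matches the "read the header, then jump to the footer" order in the commit message.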
Mark DePristo fe5724b6ea Refactored indexing part of StandardVCFWriter into superclass
-- Now other implementations of the VCFWriter can easily share common functions, such as writing an index on the fly
2011-09-07 23:27:08 -04:00
Mark DePristo 01b6177ce1 Renaming GVCF -> GCF 2011-09-07 17:10:56 -04:00
Mark DePristo b220ed0d75 Merge branch 'master' into rodrewrite 2011-09-07 17:05:35 -04:00
Guillermo del Angel 45d54f6258 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-07 16:49:49 -04:00
Guillermo del Angel 9604fb2ba3 Necessary but not sufficient step to fix GenotypeGivenAlleles mode in UG which is now busted 2011-09-07 16:49:16 -04:00
Mark DePristo 2ded027762 Removed dysfunctional tranches support from VariantEval 2011-09-07 16:09:24 -04:00
Eric Banks aa9e32f2f1 Reverting Mark's previous commit as per the open discussion. Now the eval modules check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module not to error out if the indel isn't simple (that would have been bad). Only integration test that needed to be updated was the tranches one based on a separate commit from Mark. 2011-09-07 15:48:06 -04:00
Mark DePristo d7e355b4b6 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-07 14:54:16 -04:00
Mark DePristo 9127849f5d BugFix for unit test 2011-09-07 14:54:10 -04:00
Mark DePristo 0037b61e5d Class of scala file should be close to filename, not MDP 2011-09-07 14:48:54 -04:00
Eric Banks 3a04955a30 We already had isPolymorphic and isMonomorphic in the VariantContext, but the implementation was incorrect for many edge cases (e.g. sites-only files, sites with samples who were no-called). Fixing. Moving on to VE now. 2011-09-07 14:01:42 -04:00
Mauricio Carneiro ee9d599558 Just cleaning up
Cleaned up old commented-out code from the data processing pipeline.
2011-09-07 13:32:40 -04:00
Guillermo del Angel 743bf7784c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-07 13:21:26 -04:00
Guillermo del Angel 5f22ef9a8c Added missing javadoc info to Beagle arguments 2011-09-07 13:21:11 -04:00
Mark DePristo 3bcbfa6e06 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-07 13:13:17 -04:00
Mark DePristo 430da23446 At least 2 minutes must pass before a status message is printed, further stabilizing time estimates 2011-09-07 13:13:07 -04:00
Mauricio Carneiro 6857d0324e Merge branch 'master' into rr 2011-09-07 12:59:08 -04:00
Mark DePristo 7e9e20fed0 Forgot to delete previous call 2011-09-07 12:54:52 -04:00
Mark DePristo d23d620494 Pushing traversal engine timer start to as close to actual start as possible
-- Should make initial timings more accurate
2011-09-07 12:52:33 -04:00
Mark DePristo 6ff432e1f2 BugFix for TF argument to VariantEval, actually making it work properly 2011-09-07 12:50:17 -04:00
Mauricio Carneiro 131cb7effd Bringing Reduce Reads bug fixes to the main repository 2011-09-07 12:25:53 -04:00
Mark DePristo a1920397e8 Major bugfix for per sample VariantEval
-- per sample stratification was not being calculated correctly.  The alt allele was always remaining, even if the genotype of the sample was hom-ref.  Although conceptually fine, this breaks the assumptions of all of the eval modules, so per sample stratifications actually included all variants for everything.  Eric is going to fix the system in general, so this commit may break the build.
2011-09-07 12:18:11 -04:00
Mark DePristo a02636a1ac Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/ebanks/Sting_rodrefactor into rodrewrite 2011-09-07 10:50:00 -04:00
Mark DePristo d5641cfac5 Merge branch 'variantEvalST' 2011-09-07 10:44:23 -04:00
Mark DePristo 2f4cf82e3b VariantEval cleanup. Added VariantType Stratification
-- ArrayLists are now declared as List where possible
-- states refactored into VariantStratifier base class (reduces many lines of duplicate code)
-- Added VariantType stratification that partitions report by VariantContext.Type
2011-09-07 10:43:53 -04:00
Christopher Hartl 436f6eb52b Reverting Eric's change and pushing in some command-line-option documentation. 2011-09-07 08:53:30 -04:00
Eric Banks 1ef8a1750a I asked nicely and got nothing. Then I threatened and still got nothing. So I am carrying through on my threats. Guillermo, you have a short reprieve because you were away on vacation, but let's get yours done tomorrow afternoon. 2011-09-06 21:07:49 -04:00
Eric Banks da9c8ab386 Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly. 2011-09-06 20:39:42 -04:00
Mark DePristo 9559115ad5 Bugfix for singleton runs. Now with histograms where possible 2011-09-06 16:54:01 -04:00
Mark DePristo 388d377677 Merge branch 'rrOpt' 2011-09-06 15:11:49 -04:00
Mark DePristo 3db7ecb920 ReducedRead flag cached in GATKSAMRecord. 20% performance improvement 2011-09-06 15:11:38 -04:00
Mark DePristo 2d5509e8a6 Now includes the RQ flag in the consensus reads 2011-09-06 15:11:38 -04:00
Mark DePristo 284f83469b ReducedRead flag cached in GATKSAMRecord. 20% performance improvement 2011-09-06 15:09:37 -04:00
Mark DePristo f3ab7d7c0d Now includes the RQ flag in the consensus reads 2011-09-06 14:42:27 -04:00
Roger Zurawicki 47607a7eff Fixed bug where deletions messed up interval clipping
- Instead of using readLength, the ReadUtil functions are used to get a proper read coordinate
- Added debug info in interval clipping (with -dl)

  NOTE: method might not be safe for production and checks need to be added to the ClippingOp code
2011-09-06 14:25:57 -04:00
Mark DePristo b0b803aa00 Fix for default value of maximum_consensus_base_qual, which can be at most 94 2011-09-06 14:12:54 -04:00
Khalid Shakir 0adb388dee Fixed bug in SelectVariants that was annotating sample_file / exclude_sample_file as @Argument instead of @Input meaning they weren't tracked in Queue.
Updates for HybridSelectionPipeline:
- Use VQSR on SNPs for projects using bait set whole_exome_agilent_1, applying the cut at 98.5.
- If a whole_exome_agilent_1 project has fewer than 50 samples, also mix in 1000G samples to reach VQSR thresholds.
- Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches.
- Removed GSA_PRODUCTION_ONLY flag from indel caller.
- Updated indel hard filters based on delangel's analysis.
- Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.
2011-09-06 12:41:46 -04:00
Mark DePristo d471617c65 GATK binary VCF (gvcf) prototype format for efficiency testing
-- Very minimal working version that can read / write binary VCFs with genotypes
-- Already 10x faster for sites, 5x for fully parsed genotypes, and 1000x for skipping genotypes when reading
2011-09-02 21:15:19 -04:00
Mark DePristo 048202d18e Bugfix for cached quals 2011-09-02 21:13:28 -04:00
Mark DePristo 03aa04e37c Simple refactoring to make formatting functions public 2011-09-02 21:13:08 -04:00
Mark DePristo 124ef6c483 MISSING_VALUE now gets defaultValue in getAttribute functions 2011-09-02 21:12:28 -04:00
Mauricio Carneiro 28d782b4c7 Allowing multiple dbsnp and indel files in the DPP 2011-09-02 13:38:47 -04:00
Ryan Poplin 8da36a965e Moving the ReadClipping further upstream into the HaplotypeCaller 2011-09-02 13:27:47 -04:00
Mark DePristo 82f2131777 Simplified getAttributeAsX interfaces
-- Removed the getAttributeAsX(key) versions that throw an exception when the value is missing.
-- Removed the getAttributeAsXNoException(key) versions.
-- The only available accessors are now getAttributeAsX(key, default).
-- These accessors properly handle their argument types, so if the value is a double it is returned directly for getAttributeAsDouble(), or if it's a string it's converted to a double.  If the key isn't found, default is returned.
2011-09-02 12:27:11 -04:00
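The accessor behaviour described in this commit can be illustrated with a minimal sketch. The class `AttributeSketch` and its backing map are hypothetical stand-ins, not the actual VariantContext code; only the method name `getAttributeAsDouble(key, default)` and the described behaviour come from the commit message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an attribute holder; not the real VariantContext class. */
class AttributeSketch {
    private final Map<String, Object> attributes = new HashMap<>();

    void put(String key, Object value) {
        attributes.put(key, value);
    }

    /**
     * Returns the value directly if it is already a Double, converts it from its
     * string form otherwise, and falls back to defaultValue when the key is absent.
     * A value that cannot be parsed as a number still raises NumberFormatException.
     */
    double getAttributeAsDouble(String key, double defaultValue) {
        Object value = attributes.get(key);
        if (value == null) {
            return defaultValue;
        }
        if (value instanceof Double) {
            return (Double) value;
        }
        return Double.parseDouble(String.valueOf(value));
    }
}
```

With a single accessor of this shape, callers no longer choose between an exception-throwing variant and a no-exception variant; the default argument makes the missing-key case explicit at every call site.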
Mauricio Carneiro 08ae6c0c61 ReadClipper is now handling unmapped reads 2011-09-02 11:32:30 -04:00
Mark DePristo c57198a1b9 Optimizations in VCFCodec
-- Don't create an empty LinkedHashSet() for PASS fields.   Just return Collections.emptySet() instead.
-- For filter fields with actual values, returns an unmodifiableSet instead of one that can be changed
2011-09-02 08:46:17 -04:00
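A rough sketch of the optimization described in this commit, assuming a semicolon-separated filter field: a PASS field shares the immutable empty set instead of allocating a new LinkedHashSet, and real filter values are wrapped so callers cannot modify them. The class and method names are hypothetical, and the handling of other special values in the real VCFCodec is not shown.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper; not the actual VCFCodec implementation. */
class FilterFieldSketch {
    static Set<String> parseFilters(String filterField) {
        if (filterField.equals("PASS")) {
            // No allocation: every PASS record shares the same immutable empty set.
            return Collections.emptySet();
        }
        // Keep the filters in file order, then expose a read-only view.
        Set<String> filters = new LinkedHashSet<>(Arrays.asList(filterField.split(";")));
        return Collections.unmodifiableSet(filters);
    }
}
```

Since most records in a typical file pass all filters, avoiding one set allocation per record is where the saving would come from; the read-only wrapper guards against a caller mutating state it does not own.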
Mark DePristo c3ea96d856 Removing many unused functions of unquestionable purpose 2011-09-02 08:42:01 -04:00
Eric Banks d241f0e903 Adding docs for the pcr error rate argument. 2011-09-01 21:57:02 -04:00
Mauricio Carneiro ad4ea0b80b Merged bug fix from Stable into Unstable 2011-09-01 18:14:45 -04:00