Commit Graph

9897 Commits (344c3aeb1d89165afa75d3bce7c19df222fd10bf)

Author SHA1 Message Date
Eric Banks 344c3aeb1d Cleanup from previous commit 2012-07-03 14:42:44 -04:00
Eric Banks 0b37d44b0d Optimizations for the RecalDatum to make BQSR (Count Covariates) much faster. Needs some cleanup. 2012-07-03 13:05:11 -04:00
Eric Banks 031322ff00 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-03 00:12:59 -04:00
Eric Banks a4670113bd Refactored/renamed the nested integer array; cleaned up code a bit. 2012-07-03 00:12:33 -04:00
Ryan Poplin f92139dd82 Oops, UG VA path for rank sum tests isn't happy with empty lists. Disabling clipping rank sum test for now. 2012-07-02 21:12:42 -04:00
Ryan Poplin 7aa3f04402 Updating HC integration tests. 2012-07-02 20:43:52 -04:00
Ryan Poplin 7e7b4cd1b9 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-02 16:37:54 -04:00
Ryan Poplin b807ff63ef HaplotypeCaller now creates MNP and complex substitutions by using LD information to decide if events segregate together on haplotypes. Added unit test. 2012-07-02 16:37:39 -04:00
Mauricio Carneiro 3cea080aa8 Cache SoftStart() and SoftEnd() in the GATKSAMRecord
these are costly operations when done repeatedly on the same read.
2012-07-02 16:22:00 -04:00
Mauricio Carneiro 88a02fa2cb Fixing bug for reads with cigars like 9S54H
When hard-clipping, predict when the read is going to be fully hard-clipped to the point where only soft/hard clips are left in the read, and preemptively eliminate the read before the SAMRecord mathematics on malformed cigars kills the GATK.
2012-07-02 16:22:00 -04:00
Joel Thibault 9ee58d323a Pass the original GATK unsafe parameter to the VcfGatherFunction 2012-07-02 16:03:11 -04:00
Mark DePristo 1b0a775773 Disabling bcf2 reading from samtools because it's 1-based; updating select variants integration test 2012-07-02 15:55:42 -04:00
Eric Banks cac72bce91 Initial version of int indexed mapping for BQSR. Will be cleaned up in a bit. 2012-07-02 14:33:33 -04:00
Mark DePristo 602729c09d Moved parallel tests from SelectVariants to separate SelectVariantsParallelIntegrationTest
-- Enabled previous tests -- all now working
-- Added modern test against new VCF as well
2012-07-02 11:40:28 -04:00
Mark DePristo bcd2e13d8b Adding duplicate header line keys is a logger.debug not logger.warn message now 2012-07-02 11:39:34 -04:00
Mark DePristo 01e04992f8 Fixed compatibility issue in AbstractVCFCodec that resulted in key=; being parsed and written as key; in VCF output 2012-07-02 11:38:59 -04:00
Eric Banks c94c8a9c09 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-02 08:53:01 -04:00
Mark DePristo 337e434c01 Test files for LENIENT_VCF_PROCESSING and repairing VCF headers 2012-07-01 15:52:04 -04:00
Mark DePristo 7aff4446d4 Added unit tests for header repairing capabilities in the GATK engine 2012-07-01 15:38:10 -04:00
Mark DePristo 480b32e759 BCF2 is now officially zero-based open-interval, and that's how the GATK does it now 2012-07-01 14:59:27 -04:00
Ryan Poplin b6093ff02c Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-01 10:32:37 -04:00
Mark DePristo 9b87dcda4f Fixing remaining integration test errors. Adding missing ex2.bcf 2012-06-30 16:23:11 -04:00
David Roazen d7d140eb10 Move archived qscripts from private/scala to private/archive/scala 2012-06-30 13:15:57 -04:00
David Roazen 6dc8f7c049 Revert "Moving scala archive files to the actual archive"
This reverts commit 1ea2cbeda707f3aa72c18221ee97c9b1f3560884.
2012-06-30 13:04:40 -04:00
David Roazen 1ff2384c0c Revert "Removing extra directory level"
This reverts commit f83536ecefb7cc2414bbc5e0381c2208ab4eab68.
2012-06-30 13:04:20 -04:00
Mark DePristo 40c03ab2f5 Removing extra directory level 2012-06-30 12:16:09 -04:00
Mark DePristo 955bab0802 Moving scala archive files to the actual archive 2012-06-30 11:37:27 -04:00
Mark DePristo 5ad9a98a15 Minor bugfixes / consistency fixes to filter strings of Genotypes and AC/AF annotations
-- GenotypeBuilder now sorts the list of filter strings so that the output is in a consistent order
-- calculateChromosomeCounts removes the AC/AF fields entirely when there are no alt alleles, to be on VCF spec for A defined info field values
2012-06-30 11:22:49 -04:00
Mark DePristo 6ebefcc89f Added contig and info field header lines 2012-06-30 11:22:49 -04:00
Mark DePristo 385a3c630f Added check in VariantContext.validate to ensure that getEnd() == END value when present
-- Fixed bug in VariantDataManager that this validation mode was intended to detect going forward
-- Still no VariantRecalibrationWalkersIntegrationTest for indels with BCF2 but that's because LowQual is missing from test VCF
2012-06-30 11:22:48 -04:00
Mark DePristo 893630af53 Enabling symbolic alleles in BCF2
-- Bugfix for VCFDiffableReader: don't add null filters to object
-- BCF2Codec uses new VCFAlleleClipper to handle clipping / unclipping of alleles
-- AbstractVCFCodec: decodeLoc uses full decode() [still doesn't decode genotypes] to avoid dangerous code duplication. Refactored code that clipped alleles and determined end position into updateBuilderAllelesAndStop method that uses new VCFAlleleClipper. Fixed bug by ensuring the VCF codec always uses the END field in the INFO when it's provided, not just in the case where there's a biallelic symbolic allele
-- Brand new home for allele clipping / padding routines in VCFAlleleClipper. Actually documented this code, which results in lots of **** negative comments on the code quality. Eric has promised that he and Ami are going to rethink this code from scratch. Fixed many nasty bugs in here, cleaning up unnecessary branches, etc. Added UnitTests in VCFAlleleClipper that actually test the code fully. In the process of testing I discovered lots of edge cases that don't work, and I've commented out failing tests or manually skipped them, noting how these tests need to be fixed. Even introduced some minor optimizations
-- VariantContext: validateAllele was broken in the case where there were mixed symbolic and concrete alleles, failing validation for no reason.  Fixed.
-- Added computeEndFromAlleles() function to VariantContextUtils and VariantContextBuilder for convenience calculating where the VC really ends given alleles
2012-06-30 11:22:48 -04:00
Mark DePristo 16276f81a1 BCF2 with support for symbolic alleles
-- refactored allele clipping / padding code into VCFAlleleClipping class, and added much needed docs and TODOs for methods dev guys
-- Added real unit tests for (some) clipping operations in VCFUtilsUnitTest
2012-06-30 11:22:48 -04:00
Mark DePristo 86feea917e Updating MD5s to reflect new FT fixed count of 1 not UNBOUNDED 2012-06-30 11:22:47 -04:00
Mark DePristo 6bea28ae6f Genotype filters are now just Strings, not Set<String> 2012-06-30 11:22:47 -04:00
Guillermo del Angel f631be8d80 UnifiedGenotyperEngine.calculateGenotypes() is not only used in UG but in other walkers - vc attributes shouldn't be inherited by default or it may cause undefined behaviour in those walkers, so only inherit attributes from input vc in case of UG calling this function 2012-06-29 23:51:52 -04:00
Guillermo del Angel 14274c43d9 Added integration tests for true pools, using a 1 MB extract from the 93-pool large scale validation data. Test -glm BOTH and -glm INDEL 2012-06-29 14:22:33 -04:00
Guillermo del Angel b945fb22a1 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-06-29 14:21:46 -04:00
Joel Thibault cbc79443b0 Add intervals file option to Tester scripts 2012-06-29 13:49:47 -04:00
Joel Thibault f95b9848b4 Add new scripts for VCF and BCF testing 2012-06-29 13:49:47 -04:00
Joel Thibault 256704c03d Copy intervals/intersection processing to new MongoDBIntersection file 2012-06-29 13:49:45 -04:00
Ryan Poplin 23e13df9d4 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-06-29 11:31:08 -04:00
Guillermo del Angel 65037b87da Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-06-29 11:08:44 -04:00
Guillermo del Angel 5a9a37ba01 Pool caller improvements: a) Log ref sample depth at every called site (will add more ref-related annotations later), b) Make -glm POOLBOTH work in case we want to genotype SNPs and indels together, c) indel bug fix (pool and non-pool): prevent a bad GenomeLoc from being formed if we're running GGA and incoming alleles are larger than ref window size (typically 400 bp) 2012-06-29 11:08:16 -04:00
Eric Banks 96ea334bf2 Disable caching in BQSR for now since it significantly slows down computation; will look into this in a bit. 2012-06-28 15:27:44 -04:00
Joel Thibault e9d750297a Updates for the inline/extended attributes split 2012-06-28 13:38:00 -04:00
Joel Thibault abe74dc32d Navel -> GXDB 2012-06-28 13:38:00 -04:00
Ryan Poplin 05791ebf80 Adding the Clipping rank sum test: If alternate-supporting reads have more hard clipping than reference-supporting reads this is evidence for error. 2012-06-28 13:22:56 -04:00
Ryan Poplin d12ec92a55 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-06-28 12:57:59 -04:00
Ryan Poplin 5bb0693888 Bug fix for HC GGA mode. Shouldn't try to add an indel into the haplotype if that haplotype already contains the event of interest. Misc minor assembly param changes. Turning off capping of base qualities by base indel qualities until we can evaluate that change. 2012-06-28 12:57:51 -04:00
Khalid Shakir 1ce0b9d519 Throwing UnknownTribbleType exception instead of CommandLineException when an unknown tribble type is specified. 2012-06-28 11:28:04 -04:00