Ryan Poplin
9e8e78de15
Adding the model name to the VQSR filter lines so that they don't get clobbered with consecutive VQSR runs for SNPs and then indels.
2012-07-03 14:30:37 -04:00
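The VQSR filter-line fix above can be illustrated with a minimal sketch. It assumes, hypothetically, that header FILTER lines are stored in a map keyed by their ID, and that tranche filter IDs are built from the tranche bounds. The `tranche_filter_id` and `add_filter_lines` helpers and the ID format are illustrative, not GATK's actual code.

```python
def tranche_filter_id(model, lower, upper, include_model=True):
    """Hypothetical FILTER ID, e.g. 'VQSRTrancheSNP99.00to99.90' (model name optional)."""
    model_part = model if include_model else ""
    return f"VQSRTranche{model_part}{lower:.2f}to{upper:.2f}"


def add_filter_lines(header, model, tranches, include_model=True):
    """Add one FILTER entry per (lower, upper) tranche to header (a dict keyed by ID)."""
    for lower, upper in tranches:
        filter_id = tranche_filter_id(model, lower, upper, include_model)
        # Assigning to an existing key silently replaces the earlier run's line.
        header[filter_id] = f"{model} tranche {lower:.2f} to {upper:.2f}"
    return header


# Without the model name, the INDEL run overwrites the SNP run's line.
clobbered = {}
add_filter_lines(clobbered, "SNP", [(99.0, 99.9)], include_model=False)
add_filter_lines(clobbered, "INDEL", [(99.0, 99.9)], include_model=False)

# With the model name, both lines survive.
kept = {}
add_filter_lines(kept, "SNP", [(99.0, 99.9)])
add_filter_lines(kept, "INDEL", [(99.0, 99.9)])
print(sorted(kept))  # ['VQSRTrancheINDEL99.00to99.90', 'VQSRTrancheSNP99.00to99.90']
```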
Eric Banks
0b37d44b0d
Optimizations for the RecalDatum to make BQSR (Count Covariates) much faster. Needs some cleanup.
2012-07-03 13:05:11 -04:00
Eric Banks
031322ff00
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-03 00:12:59 -04:00
Eric Banks
a4670113bd
Refactored/renamed the nested integer array; cleaned up code a bit.
2012-07-03 00:12:33 -04:00
Ryan Poplin
f92139dd82
Oops, the UG VA path for rank sum tests isn't happy with empty lists. Disabling the clipping rank sum test for now.
2012-07-02 21:12:42 -04:00
Ryan Poplin
7aa3f04402
Updating HC integration tests.
2012-07-02 20:43:52 -04:00
Ryan Poplin
7e7b4cd1b9
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-02 16:37:54 -04:00
Ryan Poplin
b807ff63ef
HaplotypeCaller now creates MNP and complex substitutions by using LD information to decide if events segregate together on haplotypes. Added unit test.
2012-07-02 16:37:39 -04:00
Mauricio Carneiro
3cea080aa8
Cache SoftStart() and SoftEnd() in the GATKSAMRecord
these are costly operations when done repeatedly on the same read.
2012-07-02 16:22:00 -04:00
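The SoftStart()/SoftEnd() caching above can be sketched as lazily computed, memoized properties. This is a minimal Python illustration with a hypothetical `Read` class, not GATKSAMRecord; it assumes the standard SAM convention that the soft start is the alignment start minus any leading soft-clipped bases (hard clips are skipped), and that the read's start and CIGAR never change after construction.

```python
import re
from functools import cached_property

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5S10M3S' into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


class Read:
    """Toy read; assumes start and CIGAR never change after construction."""

    def __init__(self, alignment_start, cigar):
        self.alignment_start = alignment_start  # 1-based, first aligned base
        self.cigar = parse_cigar(cigar)

    @cached_property
    def alignment_end(self):
        # M, D, N, = and X consume reference bases.
        ref_len = sum(n for n, op in self.cigar if op in "MDN=X")
        return self.alignment_start + ref_len - 1

    @cached_property
    def soft_start(self):
        # Walk leading clips: hard clips are skipped, soft clips extend the start leftwards.
        start = self.alignment_start
        for n, op in self.cigar:
            if op == "S":
                start -= n
            elif op != "H":
                break
        return start

    @cached_property
    def soft_end(self):
        # Same walk from the right-hand end of the CIGAR.
        end = self.alignment_end
        for n, op in reversed(self.cigar):
            if op == "S":
                end += n
            elif op != "H":
                break
        return end


read = Read(100, "2H5S10M3S")
print(read.soft_start, read.soft_end)  # 95 112
```

`cached_property` computes each value on first access and stores it on the instance, so repeated calls on the same read are cheap. If the CIGAR could be modified (for example by clipping), the cached values would need to be invalidated.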
Mauricio Carneiro
88a02fa2cb
Fixing bug for reads with cigars like 9S54H
When hard-clipping, predict when the read is going to be fully hard clipped to the point where only soft/hard-clips are left in the read, and preemptively eliminate the read before the SAMRecord mathematics on malformed cigars kills the GATK.
2012-07-02 16:22:00 -04:00
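The 9S54H fix above hinges on recognizing a CIGAR that contains nothing but clipping. A minimal sketch of that check, using hypothetical helpers (`is_only_clipped`, `drop_fully_clipped`) rather than GATK's ReadClipper API; the real fix predicts this condition before performing the clip, whereas this sketch only inspects a finished CIGAR string.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CLIP_OPS = {"S", "H"}


def parse_cigar(cigar):
    """Split a CIGAR into (length, op) pairs, rejecting strings with stray characters."""
    ops = CIGAR_RE.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in ops]


def is_only_clipped(cigar):
    """True if the CIGAR has no elements other than soft (S) or hard (H) clips."""
    return all(op in CLIP_OPS for _, op in parse_cigar(cigar))


def drop_fully_clipped(reads):
    """Keep only (name, cigar) pairs that still have at least one non-clip element."""
    return [(name, cigar) for name, cigar in reads if not is_only_clipped(cigar)]


print(drop_fully_clipped([("r1", "9S54H"), ("r2", "50M")]))  # [('r2', '50M')]
```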
Joel Thibault
9ee58d323a
Pass the original GATK unsafe parameter to the VcfGatherFunction
2012-07-02 16:03:11 -04:00
Mark DePristo
1b0a775773
Disabling bcf2 reading from samtools because it's 1-based; updating SelectVariants integration test
2012-07-02 15:55:42 -04:00
Eric Banks
cac72bce91
Initial version of int indexed mapping for BQSR. Will be cleaned up in a bit.
2012-07-02 14:33:33 -04:00
Mark DePristo
602729c09d
Moved parallel tests from SelectVariants to separate SelectVariantsParallelIntegrationTest
-- Enabled previous tests -- all now working
-- Added modern test against new VCF as well
2012-07-02 11:40:28 -04:00
Mark DePristo
bcd2e13d8b
Adding duplicate header line keys now produces a logger.debug rather than a logger.warn message
2012-07-02 11:39:34 -04:00
Mark DePristo
01e04992f8
Fixed incompatibilities in AbstractVCFCodec that resulted in key=; being parsed and written as key; in VCF output
2012-07-02 11:38:59 -04:00
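The AbstractVCFCodec fix above is about keeping `key=` (an explicitly empty value) distinct from a bare flag `key`. A minimal sketch of that round-trip property, assuming INFO fields are parsed into a dict where a flag maps to `True` and `key=` maps to an empty string; `parse_info` and `encode_info` are hypothetical helpers, not the codec's methods.

```python
def parse_info(info):
    """Parse a VCF INFO column into a dict (insertion order preserved)."""
    fields = {}
    if info == ".":
        return fields
    for entry in info.split(";"):
        if not entry:
            continue  # tolerate stray separators such as a trailing ';'
        key, sep, value = entry.partition("=")
        # A flag has no '=' at all; 'key=' carries an explicitly empty value.
        fields[key] = value if sep else True
    return fields


def encode_info(fields):
    """Inverse of parse_info: flags are written bare, everything else as key=value."""
    if not fields:
        return "."
    return ";".join(key if value is True else f"{key}={value}" for key, value in fields.items())


print(encode_info(parse_info("DP=10;key=;DB")))  # DP=10;key=;DB
```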
Eric Banks
c94c8a9c09
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-02 08:53:01 -04:00
Mark DePristo
337e434c01
Test files for LENIENT_VCF_PROCESSING and repairing VCF headers
2012-07-01 15:52:04 -04:00
Mark DePristo
7aff4446d4
Added unit tests for header repairing capabilities in the GATK engine
2012-07-01 15:38:10 -04:00
Mark DePristo
480b32e759
BCF2 is now officially zero-based open-interval, and that's how the GATK does it now
2012-07-01 14:59:27 -04:00
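The coordinate change above can be sketched as a conversion between a zero-based, half-open interval `[start0, end0)` and VCF's one-based, inclusive POS/END. This assumes that is what "zero-based open-interval" means here; the function names are hypothetical.

```python
def zero_based_half_open_to_vcf(start0, end0):
    """[start0, end0) zero-based -> (POS, END) one-based inclusive."""
    if end0 <= start0:
        raise ValueError("empty or inverted interval")
    return start0 + 1, end0


def vcf_to_zero_based_half_open(pos, end):
    """(POS, END) one-based inclusive -> [start0, end0) zero-based."""
    if end < pos:
        raise ValueError("END before POS")
    return pos - 1, end


# A single base at VCF POS 100 is the half-open interval [99, 100).
print(vcf_to_zero_based_half_open(100, 100))  # (99, 100)
```

Only the start shifts; the end value is numerically identical in both conventions, which is why off-by-one errors tend to show up at the start coordinate.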
Ryan Poplin
b6093ff02c
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-01 10:32:37 -04:00
Mark DePristo
9b87dcda4f
Fixing remaining integration test errors. Adding missing ex2.bcf
2012-06-30 16:23:11 -04:00
David Roazen
d7d140eb10
Move archived qscripts from private/scala to private/archive/scala
2012-06-30 13:15:57 -04:00
David Roazen
6dc8f7c049
Revert "Moving scala archive files to the actual archive"
This reverts commit 1ea2cbeda707f3aa72c18221ee97c9b1f3560884.
2012-06-30 13:04:40 -04:00
David Roazen
1ff2384c0c
Revert "Removing extra directory level"
This reverts commit f83536ecefb7cc2414bbc5e0381c2208ab4eab68.
2012-06-30 13:04:20 -04:00
Mark DePristo
40c03ab2f5
Removing extra directory level
2012-06-30 12:16:09 -04:00
Mark DePristo
955bab0802
Moving scala archive files to the actual archive
2012-06-30 11:37:27 -04:00
Mark DePristo
5ad9a98a15
Minor bugfixes / consistency fixes to filter strings of Genotypes and AC/AF annotations
-- GenotypeBuilder now sorts the list of filter strings so that the output is in a consistent order
-- calculateChromosomeCounts removes the AC/AF fields entirely when there are no alt alleles, to conform to the VCF spec for Number=A INFO field values
2012-06-30 11:22:49 -04:00
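Both changes in the entry above can be sketched in a few lines. This is a Python illustration with hypothetical helpers, not GenotypeBuilder or calculateChromosomeCounts themselves. It assumes genotypes are tuples of allele indices (0 = ref, 1.. = alts, `None` = no-call), and relies on the VCF rule that a Number=A field has one value per alt allele, so AC/AF have nothing to hold when there are no alts.

```python
def format_genotype_filters(filters):
    """Join genotype filter names in sorted order so output is deterministic."""
    return ";".join(sorted(filters))


def chromosome_counts(genotypes, n_alt_alleles):
    """Return AN, plus AC/AF only when there is at least one alt allele."""
    called = [a for gt in genotypes for a in gt if a is not None]
    attrs = {"AN": len(called)}
    if n_alt_alleles > 0:
        ac = [called.count(i) for i in range(1, n_alt_alleles + 1)]
        attrs["AC"] = ac
        if called:  # avoid dividing by zero when every allele is a no-call
            attrs["AF"] = [c / len(called) for c in ac]
    return attrs


print(format_genotype_filters({"LowGQ", "LowDP"}))  # LowDP;LowGQ
print(chromosome_counts([(0, 1), (1, 1)], 1))       # {'AN': 4, 'AC': [3], 'AF': [0.75]}
print(chromosome_counts([(0, 0)], 0))               # {'AN': 2}
```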
Mark DePristo
6ebefcc89f
Added contig and info field header lines
2012-06-30 11:22:49 -04:00
Mark DePristo
385a3c630f
Added check in VariantContext.validate to ensure that getEnd() == END value when present
-- Fixed bug in VariantDataManager that this validation mode was intended to detect going forward
-- Still no VariantRecalibrationWalkersIntegrationTest for indels with BCF2 but that's because LowQual is missing from test VCF
2012-06-30 11:22:48 -04:00
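The `getEnd() == END` check above can be sketched with a hypothetical minimal record type. It assumes the record stores its own stop position, computed from the REF allele for concrete alleles or taken from INFO END for symbolic ones; the validation only fails when an END value is present and disagrees with that stop.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical minimal variant record."""
    start: int   # 1-based POS
    ref: str     # reference allele bases
    stop: int    # end position stored on the record
    info: dict = field(default_factory=dict)


def end_from_ref(start, ref):
    """Last reference position covered by a concrete REF allele."""
    return start + len(ref) - 1


def validate_end(rec):
    """Raise if an INFO END value is present and disagrees with the record's stop."""
    if "END" in rec.info and int(rec.info["END"]) != rec.stop:
        raise ValueError(f"END={rec.info['END']} but record stop is {rec.stop}")


snp = Record(100, "ACG", end_from_ref(100, "ACG"))  # stop = 102, no END needed
validate_end(snp)
sv = Record(100, "A", 200, {"END": "200"})          # symbolic allele: stop taken from END
validate_end(sv)
```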
Mark DePristo
893630af53
Enabling symbolic alleles in BCF2
-- Bugfix for VCFDiffableReader: don't add null filters to object
-- BCF2Codec uses new VCFAlleleClipper to handle clipping / unclipping of alleles
-- AbstractVCFCodec: decodeLoc uses full decode() [still doesn't decode genotypes] to avoid dangerous code duplication. Refactored code that clipped alleles and determined end position into updateBuilderAllelesAndStop method that uses new VCFAlleleClipper. Fixed bug by ensuring the VCF codec always uses the END field in the INFO when it's provided, not just when there's a biallelic symbolic allele
-- Brand new home for allele clipping / padding routines in VCFAlleleClipper. Actually documented this code, which results in lots of **** negative comments on the code quality. Eric has promised that he and Ami are going to rethink this code from scratch. Fixed many nasty bugs in here, cleaning up unnecessary branches, etc. Added UnitTests in VCFAlleleClipper that actually test the code fully. In the process of testing I discovered lots of edge cases that don't work, and I've commented out failing tests or manually skipped them, noting how these tests need to be fixed. Even introduced some minor optimizations
-- VariantContext: validateAllele was broken in the case where there were mixed symbolic and concrete alleles, failing validation for no reason. Fixed.
-- Added computeEndFromAlleles() function to VariantContextUtils and VariantContextBuilder for convenience calculating where the VC really ends given alleles
2012-06-30 11:22:48 -04:00
Mark DePristo
16276f81a1
BCF2 with support for symbolic alleles
-- refactored allele clipping / padding code into VCFAlleleClipping class, and added much needed docs and TODOs for methods dev guys
-- Added real unit tests for (some) clipping operations in VCFUtilsUnitTest
2012-06-30 11:22:48 -04:00
Mark DePristo
86feea917e
Updating MD5s to reflect new FT fixed count of 1 not UNBOUNDED
2012-06-30 11:22:47 -04:00
Mark DePristo
6bea28ae6f
Genotype filters are now just Strings, not Set<String>
2012-06-30 11:22:47 -04:00
Guillermo del Angel
f631be8d80
UnifiedGenotyperEngine.calculateGenotypes() is used not only in UG but also in other walkers. VC attributes shouldn't be inherited by default, or they may cause undefined behaviour in those walkers, so attributes are only inherited from the input VC when UG calls this function
2012-06-29 23:51:52 -04:00
Guillermo del Angel
14274c43d9
Added integration tests for true pools, using a 1 MB extract from the 93-pool large-scale validation data. Tests -glm BOTH and -glm INDEL
2012-06-29 14:22:33 -04:00
Guillermo del Angel
b945fb22a1
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-29 14:21:46 -04:00
Joel Thibault
cbc79443b0
Add intervals file option to Tester scripts
2012-06-29 13:49:47 -04:00
Joel Thibault
f95b9848b4
Add new scripts for VCF and BCF testing
2012-06-29 13:49:47 -04:00
Joel Thibault
256704c03d
Copy intervals/intersection processing to new MongoDBIntersection file
2012-06-29 13:49:45 -04:00
Ryan Poplin
23e13df9d4
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-29 11:31:08 -04:00
Guillermo del Angel
65037b87da
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-29 11:08:44 -04:00
Guillermo del Angel
5a9a37ba01
Pool caller improvements: a) Log ref sample depth at every called site (will add more ref-related annotations later), b) Make -glm POOLBOTH work in case we want to genotype SNPs and indels together, c) indel bug fix (pool and non-pool): prevent a bad GenomeLoc from being formed if we're running GGA and incoming alleles are larger than the ref window size (typically 400 bp)
2012-06-29 11:08:16 -04:00
Eric Banks
96ea334bf2
Disable caching in BQSR for now since it significantly slows down computation; will look into this in a bit.
2012-06-28 15:27:44 -04:00
Joel Thibault
e9d750297a
Updates for the inline/extended attributes split
2012-06-28 13:38:00 -04:00
Joel Thibault
abe74dc32d
Navel -> GXDB
2012-06-28 13:38:00 -04:00
Ryan Poplin
05791ebf80
Adding the Clipping rank sum test: if alternate-supporting reads have more hard clipping than reference-supporting reads, this is evidence for error.
2012-06-28 13:22:56 -04:00
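A rank sum test of the kind described above compares per-read clipping counts between alt-supporting and ref-supporting reads. The sketch below uses a plain Mann-Whitney U statistic with a normal-approximation z-score (no tie correction); it is an assumed illustration of the technique, not GATK's RankSumTest code. It returns `None` for an empty group, the same situation that broke the UG VA path in the later commit f92139dd82.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def clipping_rank_sum(alt_clips, ref_clips):
    """Mann-Whitney U for alt vs ref clip counts, plus a normal-approximation z-score.

    Returns None when either group is empty. No tie correction is applied to the variance.
    """
    n1, n2 = len(alt_clips), len(ref_clips)
    if n1 == 0 or n2 == 0:
        return None
    ranks = average_ranks(list(alt_clips) + list(ref_clips))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mean) / sd


# Alt reads carry more clipping than ref reads, so z comes out positive.
print(clipping_rank_sum([3, 4, 5], [0, 1, 2]))
```

A positive z means the alt-supporting reads rank higher on clipping, which is the pattern the commit treats as evidence for error.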
Ryan Poplin
d12ec92a55
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-28 12:57:59 -04:00
Ryan Poplin
5bb0693888
Bug fix for HC GGA mode. Shouldn't try to add an indel into the haplotype if that haplotype already contains the event of interest. Misc minor assembly param changes. Turning off capping of base qualities by base indel qualities until we can evaluate that change.
2012-06-28 12:57:51 -04:00
Khalid Shakir
1ce0b9d519
Throwing UnknownTribbleType exception instead of CommandLineException when an unknown tribble type is specified.
2012-06-28 11:28:04 -04:00