Commit Graph

11422 Commits (ffbd4d85f2e0112b32df0bbba00330b00a0806cf)

Author SHA1 Message Date
depristo 40b3941678 Merge pull request #7 from jsilter/master
Add removeCall to NA12878KnowledgeBase
2012-12-17 08:05:55 -08:00
Eric Banks 762f184262 Bug fix for strict validation: rsID checking wasn't working if there were multiple IDs 2012-12-17 10:32:41 -05:00
Eric Banks 1d18ee26cc Merge remote-tracking branch 'unstable/master' 2012-12-17 09:03:03 -05:00
Yossi Farjoun ea704d688f chose smaller buffer size for the bufferedStream 2012-12-15 13:01:38 -05:00
Yossi Farjoun 6da2338ea7 removed comments and unneeded imports 2012-12-15 12:31:37 -05:00
Yossi Farjoun 19dd2d628a some changes.
2012-12-14 17:21:32 -05:00
Jacob Silterra 8198414352 Add removeCall, which removes all calls matching the provided MongoVariantContext. _id and Date fields are excluded from the match. Generally intended to remove only a single call 2012-12-14 14:54:30 -05:00
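A minimal sketch of the matching rule this commit describes, assuming calls are plain dictionaries. `removal_filter` and `remove_call` are hypothetical helpers, not the NA12878KnowledgeBase API. Every field except `_id` and `Date` must match, which mirrors how a MongoDB filter document matches on the fields it lists.

```python
# Hypothetical sketch: fields assigned per insert are ignored when matching calls.
EXCLUDED_FIELDS = {"_id", "Date"}


def removal_filter(call):
    """Return the fields of `call` that must match, dropping per-insert fields."""
    return {k: v for k, v in call.items() if k not in EXCLUDED_FIELDS}


def remove_call(records, call):
    """Remove every record matching `call` on all non-excluded fields.

    Returns (kept_records, number_removed). Usually exactly one record matches.
    """
    target = removal_filter(call)
    kept = [r for r in records
            if not all(r.get(k) == v for k, v in target.items())]
    return kept, len(records) - len(kept)
```

Because `_id` and `Date` differ between otherwise identical inserts, excluding them is what lets a re-submitted call be found at all; the flip side is that duplicates of the same call are all removed together.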
Mauricio Carneiro 5f1afb4136 Fixing an off-by-one clipping error in ReduceReads for reads off the contig
Reads that are soft-clipped off the contig (before the beginning of the contig) were being soft-clipped to position 0 instead of 1 because of an off-by-one issue. Fixed and included in the integration test.
2012-12-13 22:10:11 -05:00
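The fix comes down to coordinate arithmetic: contig positions are 1-based, so a read that starts before the contig must be clipped so that it starts at 1. The helper below is a hypothetical illustration of that arithmetic, not ReduceReads code.

```python
def clip_to_contig_start(alignment_start, contig_start=1):
    """Return (new_start, overhang) for a read whose 1-based start may precede the contig.

    `overhang` is the number of leading bases lying before `contig_start`, i.e.
    the bases that must be soft-clipped so the read starts at `contig_start`.
    """
    if alignment_start >= contig_start:
        return alignment_start, 0
    overhang = contig_start - alignment_start
    # Clamp to 1 (the first 1-based position), not 0: clamping to 0 would leave
    # one base hanging off the contig, which is the off-by-one in this commit.
    return contig_start, overhang
```

For example, a read starting at -3 covers positions -3, -2, -1 and 0 before the contig, so four bases are clipped and the read starts at 1.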
Mauricio Carneiro 74344a3871 Bringing in the changes from the CMI repo 2012-12-13 21:59:37 -05:00
Eric Banks 696bf95fba Fix for PBT bug reported on the forum: the AD is actually output correctly now (rather than with 'null' or some gibberish memory pointer). 2012-12-13 23:28:30 +00:00
Mark DePristo aeab932c63 Actual working version of unflushing VCFWriter
-- Uses a high-performance local writer backed by a byte array that writes the entire VCF line in a single write operation to the underlying output stream.
-- Fixes problems with indexing of unflushed writes while still allowing efficient block zipping
-- Same (or better) IO performance as previous implementation
-- IndexingVariantContextWriter now properly closes the underlying output stream when it's closed
-- Updated compressed VCF output file
2012-12-13 16:15:08 -05:00
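A minimal Python sketch of the pattern in this commit, assuming a binary output stream: each record is assembled in an in-memory byte buffer and handed to the underlying stream with one write() call and no per-line flush(), and close() also closes the underlying stream. The class name is hypothetical; the GATK writer itself is Java.

```python
import io


class LineBufferedRecordWriter:
    """Hypothetical sketch: build each VCF line in memory, then emit it with one write()."""

    def __init__(self, out):
        self.out = out            # underlying binary stream (file, gzip stream, ...)
        self.buf = io.BytesIO()   # reusable per-line buffer

    def write_record(self, fields):
        self.buf.seek(0)
        self.buf.truncate()
        self.buf.write("\t".join(fields).encode("ascii"))
        self.buf.write(b"\n")
        # One write per complete line and no flush(), so a block-compressing
        # stream underneath is free to fill its blocks.
        self.out.write(self.buf.getvalue())

    def close(self):
        # Close the underlying stream too, as the commit describes.
        self.out.close()
```

Writing whole lines at once also means an index built alongside the output only ever sees complete records, which is the indexing problem the commit mentions.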
Guillermo del Angel 97e880654b Bug fix in setting output file names 2012-12-13 15:08:22 -05:00
Guillermo del Angel cf44ec1ace Experiment: don't compress picard BAM outputs 2012-12-12 17:45:55 -05:00
Guillermo del Angel 021929788c Don't compress intermediate BAMs, will save time 2012-12-12 17:40:25 -05:00
Yossi Farjoun 5e66109268 Replaced a useless getInt with a skipInt to remove 1/4 of the initial seek time in the BAM Index. 2012-12-12 17:08:11 -05:00
Chris Hartl 5780926254 Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable 2012-12-12 15:42:40 -05:00
Eric Banks 62eaffdf0a Fix docs for ReadBackedPhasing 2012-12-12 20:28:04 +00:00
Eric Banks bba63a3b0e Fix for GSA-615: UnifiedGenotyperEngine.getGLModelsToUse takes 5% of the runtime of UG, should be optimized away. 2012-12-12 20:25:45 +00:00
Ryan Poplin 211a6e78ea Further related bug fixes to GGA mode in the HC: some variants (especially MNPs) were causing problems because they don't have to start at the current location to match the allele being genotyped. Fixed. 2012-12-12 14:53:02 -05:00
Mauricio Carneiro 33290bfe0c Added integration test to catch the read off contig in ReduceReads.
So upstream changes won't break it again.
2012-12-12 13:49:54 -05:00
Mauricio Carneiro a52e3c7e15 Revert "Bug fix for RR: don't let the softclip start position be less than 1"
This introduced a bug in ReduceReads by deactivating its hard-clipping of out-of-bounds soft-clips (especially in the MT).
DEV-322 #resolve #time 4m

This reverts commit 42acfd9d0bccfc0411944c342a5b889f5feae736.
2012-12-12 13:09:39 -05:00
Mark DePristo 5632c13bf2 Resolves GSA-681 / Compressed VCF.gz output is too big because of unnecessary call to flush().
-- Now compressed output VCFs are properly block-compressed (i.e., they are actually smaller than the uncompressed VCF)
2012-12-12 10:27:07 -05:00
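The size effect is easy to reproduce with any DEFLATE-based stream. The sketch below uses Python's zlib as a stand-in for the BGZF (block gzip) codec behind .vcf.gz output; that substitution is an assumption of this illustration. Forcing a sync flush after every record ends the current compressed block and appends an empty stored block each time, while writing without intermediate flushes lets the compressor work across many records.

```python
import zlib


def deflate_size(lines, flush_every_line):
    """Total DEFLATE output size for `lines`, optionally sync-flushing after each one."""
    comp = zlib.compressobj()
    chunks = []
    for line in lines:
        chunks.append(comp.compress(line))
        if flush_every_line:
            # Z_SYNC_FLUSH ends the current block and appends an empty stored
            # block so everything so far can be decoded; this costs bytes per call.
            chunks.append(comp.flush(zlib.Z_SYNC_FLUSH))
    chunks.append(comp.flush())  # Z_FINISH: terminate the stream once
    return len(b"".join(chunks))


records = [f"20\t{i}\t.\tA\tG\t50\tPASS\t.\n".encode("ascii") for i in range(1, 1001)]
print(deflate_size(records, flush_every_line=True),
      deflate_size(records, flush_every_line=False))
```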
Guillermo del Angel 216f92276c Disable scatter-gather with PrintReads since we're already setting high nct so it's unnecessary 2012-12-12 09:06:37 -05:00
Kristian Cibulskis 0e5b1093fb initial implementation of contamination estimation; tested on a single gene (which doesn't have enough data), waiting to test on exome/chr20 2012-12-11 15:59:59 -05:00
Mark DePristo dd52a70d45 Fix AFCalcResult unit test
-- I was simply passing in the wrong values into the function.  Fixed the calls, and expanded the docs on what needs to be passed in.
2012-12-11 10:40:12 -05:00
Ami Levy-Moonshine 6bf31065e3 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-11 10:34:50 -05:00
Ami Levy-Moonshine 2f99569dda change the md5 in one of the CV integration tests, since it wasn't using the priority list when printing the origin of the annotation (the setValue field) 2012-12-10 22:48:15 -05:00
Ami Levy-Moonshine 2e3284f306 Continue to fix the case where PRIORITIZE is used but no priority list is given. While fixing that case I also removed unnecessary sorting when the priority list is not provided. When the priority list is not provided, it will continue to be null. Thus, the number of original Variant Contexts should be given as a new parameter to simpleMerge (since priority might be null). This new parameter is used for checking whether there are filtered VCs when annotationOrigin is true. 2012-12-10 22:23:58 -05:00
Mauricio Carneiro 19372225af Merge Broad's GATK and CMI gatk 2012-12-10 15:33:38 -05:00
Mauricio Carneiro 8a115edbaf ReduceReads is now scattered by contig
It's no longer safe to scatter/gather by interval because now we don't hard-clip to the intervals anymore.
2012-12-10 15:25:27 -05:00
Eric Banks bdda63d973 Related bug fixes to GGA mode in the HC: some variants (especially MNPs) were causing problems because they don't have to start at the current location to match the allele being genotyped. Fixed. 2012-12-10 14:47:04 -05:00
Ryan Poplin ceb5431dcb Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-10 12:24:47 -05:00
Ryan Poplin c84ff9d75e Adding explicit true negative assessment category to the AssessNA12878 walker. 2012-12-10 12:24:43 -05:00
Ami Levy-Moonshine 573ace4403 restore the right version of VariantContextUtils.java in my unstable dir 2012-12-10 10:28:56 -05:00
David Roazen 46edab6d6a Use the new downsampling implementation by default
-Switch back to the old implementation, if needed, with --use_legacy_downsampler

-LocusIteratorByStateExperimental becomes the new LocusIteratorByState, and
the original LocusIteratorByState becomes LegacyLocusIteratorByState

-Similarly, the ExperimentalReadShardBalancer becomes the new ReadShardBalancer,
with the old one renamed to LegacyReadShardBalancer

-Performance improvements: locus traversals used to be 20% slower in the new
downsampling implementation, now they are roughly the same speed.

-Tests show a very high level of concordance with UG calls from the previous
implementation, with some new calls and edge cases that still require more examination.

-With the new implementation, can now use -dcov with ReadWalkers to set a limit
on the max # of reads per alignment start position per sample. Appropriate value
for ReadWalker dcov may be in the single digits for some tools, but this too
requires more investigation.
2012-12-10 09:44:50 -05:00
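The per-start cap that -dcov now applies to ReadWalkers can be illustrated with reservoir sampling keyed on (sample, alignment start). This is a sketch under stated assumptions: reads are hypothetical (sample, start, name) tuples, and reservoir sampling (Algorithm R) is chosen here as one uniform way to keep at most dcov reads per key. It is not taken from the GATK downsampler.

```python
import random
from collections import defaultdict


def downsample_by_start(reads, dcov, rng=None):
    """Keep at most `dcov` reads per (sample, alignment_start) key.

    `reads` is an iterable of (sample, alignment_start, read_name) tuples.
    Within each key, every read has the same chance of being kept.
    """
    rng = rng or random.Random(0)
    reservoirs = defaultdict(list)
    seen = defaultdict(int)
    for read in reads:
        key = (read[0], read[1])
        seen[key] += 1
        bucket = reservoirs[key]
        if len(bucket) < dcov:
            bucket.append(read)
        else:
            # Algorithm R: the new read replaces a kept one with probability dcov / seen
            j = rng.randrange(seen[key])
            if j < dcov:
                bucket[j] = read
    return [r for bucket in reservoirs.values() for r in bucket]
```

Keying on sample as well as start means a deeply covered sample cannot crowd out the others at the same position.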
Ami Levy-Moonshine 5460c96137 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-09 23:43:57 -05:00
Ami Levy-Moonshine 3a420d163e (1) changes in catVariants (work still under development) (2) changes to CV to throw an error when GenotypeMergeType is PRIORITIZE but no priority list (rod_priority_list) is given. Reported by TechnicalVault on the forum on Nov 14 2012 2012-12-09 23:40:03 -05:00
Eric Banks 2637f512f8 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-09 02:09:40 -05:00
Eric Banks 574d5b467f Bug fix for indel HMM: protect against situation where long reads (e.g. Sanger) in a pileup can lead to a read starting after the haplotype end for a given haplotype. 2012-12-09 02:09:34 -05:00
Mark DePristo 9b6ee0576f Fix bugs in the consensus genotype creation algorithm for the NA12878 KB
-- Was screwing up mixed reviewed / non-reviewed sites.  Now only considers reviewed calls, if any are present, or all calls if no reviewed sites are found
-- Was just taking the first genotype; now it properly looks at all of the genotype calls and makes a reasonable guess as to what the answer should be
-- Added unit tests for the consensus creation algorithm
2012-12-08 13:18:07 -05:00
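The two rules in this commit (prefer reviewed calls, then look at all genotypes rather than the first) can be sketched as a filtered majority vote. The input shape and the majority-vote rule are assumptions for illustration; the commit only says the algorithm makes "a reasonable guess".

```python
from collections import Counter


def consensus_genotype(calls):
    """calls: list of (genotype, is_reviewed) pairs; returns the consensus genotype or None."""
    if not calls:
        return None
    reviewed = [gt for gt, is_reviewed in calls if is_reviewed]
    # Prefer reviewed calls when any exist; otherwise fall back to every call
    pool = reviewed if reviewed else [gt for gt, _ in calls]
    # Majority vote; ties go to the genotype seen first (most_common keeps first-seen order)
    return Counter(pool).most_common(1)[0][0]
```

A single reviewed call therefore outweighs any number of unreviewed ones, which is the mixed-site behavior the commit fixes.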
Mark DePristo bf8421eeb7 Fixes GSA-671 / AFCalcResult.log10pNonRefByAllele should really be log10pRefByAllele
-- The current implementation of AFCalcResult contains a map from allele -> log10pNonRef. The only use of this field is to support the isPolymorphic function per allele. The call to this function looks like isPolymorphic(allele, QUAL). The QUAL is a phred-scaled threshold where you want to include alleles where the log10pNonRef >= QUAL (appropriately transformed). The problem is that when pNonRef approaches 1, log10pNonRef quickly rounds to exactly 0, while its complementary log10pRef value still has a meaningful log10 value. For example, if log10pRef = -100 (not an uncommonly large value), log10pNonRef = 0.0.
-- In order to preserve precision and allow us to more finely differentiate high QUAL from low QUAL (but still poly) sites we should store log10pRef values instead, and test that log10pRef <= threshold.
-- See https://jira.broadinstitute.org/browse/GSA-671 for more information.
2012-12-07 16:03:40 -05:00
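The precision loss is easy to check in double-precision arithmetic. This worked example reuses the commit's own numbers; `is_polymorphic` here is a hypothetical stand-in that tests log10pRef against a phred-scaled threshold, not the AFCalcResult method itself.

```python
import math

# Suppose a site is very confidently variant: log10 P(ref) = -100.
log10_p_ref = -100.0

# Storing the complement instead: 1 - 1e-100 rounds to exactly 1.0 in double
# precision, so log10 P(non-ref) collapses to 0.0 and the evidence is lost.
log10_p_non_ref = math.log10(1.0 - 10.0 ** log10_p_ref)


def is_polymorphic(log10_p_ref, min_qual):
    """Hypothetical check: phred-scaled QUAL = -10 * log10 P(ref), so
    QUAL >= min_qual is equivalent to log10 P(ref) <= -min_qual / 10."""
    return log10_p_ref <= -min_qual / 10.0


print(log10_p_non_ref)               # 0.0, same as for log10 P(ref) = -200
print(is_polymorphic(-100.0, 30.0))  # True
print(is_polymorphic(-2.0, 30.0))    # False
```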
Ryan Poplin 3355216366 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-07 15:35:17 -05:00
Ryan Poplin 9573648f85 Changes to count the sites which might be present in some of the input rods but not present at all in other rods. Now loop over the input rod names instead of looping over the tracker results. 2012-12-07 15:35:08 -05:00
Joel Thibault 3b0e3767bf Add a test for a read that extends off the end of chr1 2012-12-07 14:07:15 -05:00
Mauricio Carneiro 58e39a8468 Enabling 4-way parallel by default in FastQ2BAM
DEV-317
2012-12-06 17:27:54 -05:00
Joel Thibault cc4e3ec589 Update TODO list 2012-12-06 12:06:47 -05:00
Mark DePristo abd94b2976 Bugfix for handling invalid records in NA12878 KB
-- The previous approach tried to remove the entire MongoVariantContext but when it was malformed was prone to error.  Now just grabs the _id and uses it to remove the bad record.
2012-12-06 10:24:24 -05:00
Eric Banks 406adb8d44 The allele biased downsampling should not abort if there's a reduced read. Rather it should always keep the RR and downsample only original reads in the pileup. 2012-12-05 23:15:36 -05:00
Mauricio Carneiro 6d22f4f737 Bringing latest performance updates from the GATK to CMI 2012-12-05 21:40:03 -05:00
Mark DePristo dbf721968d PrintReads large-scale test to protect against another major low-level performance issue 2012-12-05 21:36:27 -05:00