gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	e6f468b647	Refactored the quasi-useful IndelType annotation into the more useful VariantType. The indels are still annotated as before, but now all other variant types are annotated too. I'm doing this because of requests on the forum but am not making it standard. If we find it to be useful we can turn it on by default later.	2012-12-17 11:54:47 -05:00
Ryan Poplin	6171419e6c	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-17 11:27:52 -05:00
Ryan Poplin	98f18b5f9e	Changing the HC over to using the non-contamination-downsampled read maps for the purposes of annotations. This behavior now matches the UG. There is a new command line option to go back to the older behavior to explore the differences.	2012-12-17 11:27:44 -05:00
depristo	40b3941678	Merge pull request #7 from jsilter/master Add removeCall to NA12878KnowledgeBase	2012-12-17 08:05:55 -08:00
Eric Banks	762f184262	Bug fix for strict validation: rsID checking wasn't working if there were multiple IDs	2012-12-17 10:32:41 -05:00
Eric Banks	1d18ee26cc	Merge remote-tracking branch 'unstable/master'	2012-12-17 09:03:03 -05:00
Jacob Silterra	8198414352	Add removeCall, which removes all calls matching the provided MongoVariantContext. _id and Date fields are excluded. Generally intended to remove only a single call	2012-12-14 14:54:30 -05:00
Mauricio Carneiro	5f1afb4136	Fixing an off-by-one clipping error in ReduceReads for reads off the contig Reads that are soft-clipped off the contig (before the beginning of the contig) were being soft-clipped to position 0 instead of 1 because of an off-by-one issue. Fixed and included in the integration test.	2012-12-13 22:10:11 -05:00
Mauricio Carneiro	74344a3871	Bringing in the changes from the CMI repo	2012-12-13 21:59:37 -05:00
Eric Banks	696bf95fba	Fix for PBT bug reported on the forum: the AD is actually output correctly now (rather than with 'null' or some gibberish memory pointer).	2012-12-13 23:28:30 +00:00
Mark DePristo	aeab932c63	Actual working version of unflushing VCFWriter -- Uses high-performance local writer backed by byte array that writes the entire VCF line in some write operation to the underlying output stream. -- Fixes problems with indexing of unflushed writes while still allowing efficient block zipping -- Same (or better) IO performance as previous implementation -- IndexingVariantContextWriter now properly closes the underlying output stream when it's closed -- Updated compressed VCF output file	2012-12-13 16:15:08 -05:00
Guillermo del Angel	97e880654b	Bug fix in setting output file names	2012-12-13 15:08:22 -05:00
Guillermo del Angel	cf44ec1ace	Experiment: don't compress picard BAM outputs	2012-12-12 17:45:55 -05:00
Guillermo del Angel	021929788c	Don't compress intermediate BAMs, will save time	2012-12-12 17:40:25 -05:00
Yossi Farjoun	5e66109268	Replaced a useless getInt with a skipInt to remove 1/4 of the initial seek time in the BAM Index.	2012-12-12 17:08:11 -05:00
Eric Banks	62eaffdf0a	Fix docs for ReadBackedPhasing	2012-12-12 20:28:04 +00:00
Eric Banks	bba63a3b0e	Fix for GSA-615: UnifiedGenotyperEngine.getGLModelsToUse takes 5% of the runtime of UG, should be optimized away.	2012-12-12 20:25:45 +00:00
Ryan Poplin	211a6e78ea	Further related bug fixes to GGA mode in the HC: some variants (especially MNPs) were causing problems because they don't have to start at the current location to match the allele being genotyped. Fixed.	2012-12-12 14:53:02 -05:00
Mauricio Carneiro	33290bfe0c	Added integration test to catch the read off contig in ReduceReads. So upstream changes won't break it again.	2012-12-12 13:49:54 -05:00
Mauricio Carneiro	a52e3c7e15	Revert "Bug fix for RR: don't let the softclip start position be less than 1" this introduced a bug in reduce reads by de-activating its hard clipping of the out of bounds soft-clips (especially in the MT). DEV-322 #resolve #time 4m This reverts commit 42acfd9d0bccfc0411944c342a5b889f5feae736.	2012-12-12 13:09:39 -05:00
Mark DePristo	5632c13bf2	Resolves GSA-681 / Compressed VCF.gz output is too big because of unnecessary call to flush(). -- Now compressed output VCFs are properly blocked compressed (i.e., they are actually smaller than the uncompressed VCF)	2012-12-12 10:27:07 -05:00
Guillermo del Angel	216f92276c	Disable scatter-gather with PrintReads since we're already setting high nct so it's unnecessary	2012-12-12 09:06:37 -05:00
Kristian Cibulskis	0e5b1093fb	initial implementation of contamination estimation, tested on single gene (which doesn't have enough data) waiting to test on exome/chr20	2012-12-11 15:59:59 -05:00
Mark DePristo	dd52a70d45	Fix AFCalcResult unit test -- I was simply passing in the wrong values into the function. Fixed the calls, and expanded the docs on what needs to be passed in.	2012-12-11 10:40:12 -05:00
Ami Levy-Moonshine	6bf31065e3	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-11 10:34:50 -05:00
Ami Levy-Moonshine	2f99569dda	change the md5 in one of the CV integration tests, since it wasn't using the priority list when printing the origin of the annotation (the setValue field)	2012-12-10 22:48:15 -05:00
Ami Levy-Moonshine	2e3284f306	Continue to fix the case where PRIORITIZE is used but no priority list is given. While fixing that case I also removed unnecessary sorting when the priority list is not provided. When the priority list is not provided, it will continue to be null. Thus, the number of original Variant Contexts should be given as a new parameter to simpleMerge (since priority might be null). This new parameter is used for checking if there are filtered VC, when annotationOrigin is true.	2012-12-10 22:23:58 -05:00
Mauricio Carneiro	19372225af	Merge Broad's GATK and CMI gatk	2012-12-10 15:33:38 -05:00
Mauricio Carneiro	8a115edbaf	ReduceReads is now scattered by contig It's no longer safe to scatter/gather by interval because now we don't hard-clip to the intervals anymore.	2012-12-10 15:25:27 -05:00
Eric Banks	bdda63d973	Related bug fixes to GGA mode in the HC: some variants (especially MNPs) were causing problems because they don't have to start at the current location to match the allele being genotyped. Fixed.	2012-12-10 14:47:04 -05:00
Ryan Poplin	ceb5431dcb	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-10 12:24:47 -05:00
Ryan Poplin	c84ff9d75e	Adding explicit true negative assessment category to the AssessNA12878 walker.	2012-12-10 12:24:43 -05:00
Ami Levy-Moonshine	573ace4403	restore the right version of VariantContextUtils.java in my unstable dir	2012-12-10 10:28:56 -05:00
David Roazen	46edab6d6a	Use the new downsampling implementation by default -Switch back to the old implementation, if needed, with --use_legacy_downsampler -LocusIteratorByStateExperimental becomes the new LocusIteratorByState, and the original LocusIteratorByState becomes LegacyLocusIteratorByState -Similarly, the ExperimentalReadShardBalancer becomes the new ReadShardBalancer, with the old one renamed to LegacyReadShardBalancer -Performance improvements: locus traversals used to be 20% slower in the new downsampling implementation, now they are roughly the same speed. -Tests show a very high level of concordance with UG calls from the previous implementation, with some new calls and edge cases that still require more examination. -With the new implementation, can now use -dcov with ReadWalkers to set a limit on the max # of reads per alignment start position per sample. Appropriate value for ReadWalker dcov may be in the single digits for some tools, but this too requires more investigation.	2012-12-10 09:44:50 -05:00
Ami Levy-Moonshine	5460c96137	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-09 23:43:57 -05:00
Ami Levy-Moonshine	3a420d163e	(1) changes in catVariants (work still under development) (2) changes to CV to throw an error when GenotypeMergeType is PRIORITIZE but no priority list (rod_priority_list) is given. Reported by TechnicalVault on the forum on Nov 14 2012	2012-12-09 23:40:03 -05:00
Eric Banks	2637f512f8	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-09 02:09:40 -05:00
Eric Banks	574d5b467f	Bug fix for indel HMM: protect against situation where long reads (e.g. Sanger) in a pileup can lead to a read starting after the haplotype end for a given haplotype.	2012-12-09 02:09:34 -05:00
Mark DePristo	9b6ee0576f	Fix bugs in the consensus genotype creation algorithm for the NA12878 KB -- Was screwing up mixed reviewed / non-reviewed sites. Now only considered reviewed calls, if any are present, or all calls if no reviewed sites are found -- Was just taking the first genotype, now it properly looks at all of the genotype calls and makes a reasonable guess what the answer should be -- Added unit tests for the consensus creation algorithm	2012-12-08 13:18:07 -05:00
Mark DePristo	bf8421eeb7	Fixes GSA-671 / AFCalcResult.log10pNonRefByAllele should really be log10pRefByAllele -- The current implementation of AFCalcResult contains a map from allele -> log10pNonRef. The only use of this field is to support the isPolymorphic function per allele. The call to this function looks like isPolymorphic(allele, QUAL). The QUAL is a phred-scaled threshold where you want to include alleles where the log10pNonRef >= QUAL (appropriately transformed). The problem is that when log10pNonRef is large, it quickly gets set to 0, while its complementary log10pRef value has a meaningful log10 value. For example, if log10pRef = -100 (not an uncommonly large value), log10pNonRef = 0.0. -- In order to preserve precision and allow us to more finely differentiate high QUAL from low QUAL (but still poly) sites we should store log10pRef values instead, and test that log10pRef <= threshold. -- See https://jira.broadinstitute.org/browse/GSA-671 for more information.	2012-12-07 16:03:40 -05:00
Ryan Poplin	3355216366	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-07 15:35:17 -05:00
Ryan Poplin	9573648f85	Changes to count the sites which might be present in some of the input rods but not present at all in other rods. Now loop over the input rod names instead of looping over the tracker results.	2012-12-07 15:35:08 -05:00
Joel Thibault	3b0e3767bf	Add a test for a read that extends off the end of chr1	2012-12-07 14:07:15 -05:00
Mauricio Carneiro	58e39a8468	Enabling 4-way parallel by default in FastQ2BAM DEV-317	2012-12-06 17:27:54 -05:00
Joel Thibault	cc4e3ec589	Update TODO list	2012-12-06 12:06:47 -05:00
Mark DePristo	abd94b2976	Bugfix for handling invalid records in NA12878 KB -- The previous approach tried to remove the entire MongoVariantContext but when it was malformed was prone to error. Now just grabs the _id and uses it to remove the bad record.	2012-12-06 10:24:24 -05:00
Eric Banks	406adb8d44	The allele biased downsampling should not abort if there's a reduced read. Rather it should always keep the RR and downsample only original reads in the pileup.	2012-12-05 23:15:36 -05:00
Mauricio Carneiro	6d22f4f737	Bringing latest performance updates from the GATK to CMI	2012-12-05 21:40:03 -05:00
Mark DePristo	dbf721968d	PrintReads large-scale test to protect against another major low-level performance issue	2012-12-05 21:36:27 -05:00
Ryan Poplin	00c23bf704	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-05 15:53:05 -05:00
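
Note on bf8421eeb7 (GSA-671): the precision problem described in that commit message can be reproduced with plain double-precision arithmetic. The sketch below is illustrative only and is not GATK code. The function names and the phred conversion (threshold = -QUAL / 10) are assumptions made for the example. It shows that the naively computed log10pNonRef collapses to 0.0 for very confident sites, while log10pRef still distinguishes them.

```python
import math


def naive_log10_p_nonref(log10_p_ref):
    """Hypothetical helper: log10(1 - pRef) computed directly in doubles."""
    return math.log10(1.0 - 10.0 ** log10_p_ref)


def is_polymorphic(log10_p_ref, min_qual):
    """Hypothetical test in the spirit of the commit: log10pRef <= threshold.

    Assumes the phred-scaled QUAL maps to a log10 threshold of -QUAL / 10.
    """
    threshold = -min_qual / 10.0
    return log10_p_ref <= threshold


# Two sites with very different confidence that the site is non-reference.
strong_site = -100.0   # log10pRef value quoted in the commit message
weaker_site = -20.0    # made-up value for comparison

# Stored as log10pNonRef, both round to exactly 0.0 and become indistinguishable.
print(naive_log10_p_nonref(strong_site), naive_log10_p_nonref(weaker_site))

# Stored as log10pRef, the two sites remain distinct and can still be thresholded.
print(is_polymorphic(strong_site, 30.0), is_polymorphic(-2.0, 30.0))
```

Keeping the small-magnitude probability in log space is the general design choice here: the value close to 1 loses its information when converted, whereas its complement does not.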


11305 Commits (e6f468b647518f8e3ef8fb030bb4d4eee95eceea)
