gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	bba63a3b0e	Fix for GSA-615: UnifiedGenotyperEngine.getGLModelsToUse takes 5% of the runtime of UG, should be optimized away.	2012-12-12 20:25:45 +00:00
Ryan Poplin	211a6e78ea	Further related bug fixes to GGA mode in the HC: some variants (especially MNPs) were causing problems because they don't have to start at the current location to match the allele being genotyped. Fixed.	2012-12-12 14:53:02 -05:00
Mark DePristo	5632c13bf2	Resolves GSA-681 / Compressed VCF.gz output is too big because of unnecessary call to flush(). -- Now compressed output VCFs are properly blocked compressed (i.e., they are actually smaller than the uncompressed VCF)	2012-12-12 10:27:07 -05:00
Mark DePristo	dd52a70d45	Fix AFCalcResult unit test -- I was simply passing in the wrong values into the function. Fixed the calls, and expanded the docs on what needs to be passed in.	2012-12-11 10:40:12 -05:00
Ami Levy-Moonshine	6bf31065e3	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-11 10:34:50 -05:00
Ami Levy-Moonshine	2f99569dda	change the md5 in one of the CV integration tests, since it wasn't using the priority list when printing the origin of the annotation (the setValue field)	2012-12-10 22:48:15 -05:00
Ami Levy-Moonshine	2e3284f306	Continue to fix the case where PRIORITIZE is used but no priority list is given. While fixing that case I also removed unnecessary sorting when the priority list is not provided. When the priority list is not provided, it will continue to be null. Thus, the number of original Variant Contexts should be given as a new parameter to simpleMerge (since priority might be null). This new parameter is used for checking if there are filtered VCs when annotationOrigin is true.	2012-12-10 22:23:58 -05:00
Mauricio Carneiro	8a115edbaf	ReduceReads is now scattered by contig. It's no longer safe to scatter/gather by interval because now we don't hard-clip to the intervals anymore.	2012-12-10 15:25:27 -05:00
Eric Banks	bdda63d973	Related bug fixes to GGA mode in the HC: some variants (especially MNPs) were causing problems because they don't have to start at the current location to match the allele being genotyped. Fixed.	2012-12-10 14:47:04 -05:00
Ryan Poplin	ceb5431dcb	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-10 12:24:47 -05:00
Ryan Poplin	c84ff9d75e	Adding explicit true negative assessment category to the AssessNA12878 walker.	2012-12-10 12:24:43 -05:00
Ami Levy-Moonshine	573ace4403	restore the right version of VariantContextUtils.java in my unstable dir	2012-12-10 10:28:56 -05:00
David Roazen	46edab6d6a	Use the new downsampling implementation by default -Switch back to the old implementation, if needed, with --use_legacy_downsampler -LocusIteratorByStateExperimental becomes the new LocusIteratorByState, and the original LocusIteratorByState becomes LegacyLocusIteratorByState -Similarly, the ExperimentalReadShardBalancer becomes the new ReadShardBalancer, with the old one renamed to LegacyReadShardBalancer -Performance improvements: locus traversals used to be 20% slower in the new downsampling implementation, now they are roughly the same speed. -Tests show a very high level of concordance with UG calls from the previous implementation, with some new calls and edge cases that still require more examination. -With the new implementation, can now use -dcov with ReadWalkers to set a limit on the max # of reads per alignment start position per sample. Appropriate value for ReadWalker dcov may be in the single digits for some tools, but this too requires more investigation.	2012-12-10 09:44:50 -05:00
Ami Levy-Moonshine	5460c96137	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-09 23:43:57 -05:00
Ami Levy-Moonshine	3a420d163e	(1) changes in catVariants (work still under development) (2) changes to CV to throw an error when GenotypeMergeType is PRIORITIZE but no priority list (rod_priority_list) is given. Reported by TechnicalVault on the forum on Nov 14 2012	2012-12-09 23:40:03 -05:00
Eric Banks	2637f512f8	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-09 02:09:40 -05:00
Eric Banks	574d5b467f	Bug fix for indel HMM: protect against situation where long reads (e.g. Sanger) in a pileup can lead to a read starting after the haplotype end for a given haplotype.	2012-12-09 02:09:34 -05:00
Mark DePristo	9b6ee0576f	Fix bugs in the consensus genotype creation algorithm for the NA12878 KB -- Was screwing up mixed reviewed / non-reviewed sites. Now only considers reviewed calls, if any are present, or all calls if no reviewed sites are found -- Was just taking the first genotype, now it properly looks at all of the genotype calls and makes a reasonable guess what the answer should be -- Added unit tests for the consensus creation algorithm	2012-12-08 13:18:07 -05:00
Mark DePristo	bf8421eeb7	Fixes GSA-671 / AFCalcResult.log10pNonRefByAllele should really be log10pRefByAllele -- The current implementation of AFCalcResult contains a map from allele -> log10pNonRef. The only use of this field is to support the isPolymorphic function per allele. The call to this function looks like isPolymorphic(allele, QUAL). The QUAL is a phred-scaled threshold where you want to include alleles where the log10pNonRef >= QUAL (appropriately transformed). The problem is that when log10pNonRef is large, it quickly gets set to 0, while its complementary log10pRef value has a meaningful log10 value. For example, if log10pRef = -100 (not an uncommonly large value), log10pNonRef = 0.0. -- In order to preserve precision and allow us to more finely differentiate high QUAL from low QUAL (but still poly) sites we should store log10pRef values instead, and test that log10pRef <= threshold. -- See https://jira.broadinstitute.org/browse/GSA-671 for more information.	2012-12-07 16:03:40 -05:00
Ryan Poplin	3355216366	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-07 15:35:17 -05:00
Ryan Poplin	9573648f85	Changes to count the sites which might be present in some of the input rods but not present at all in other rods. Now loop over the input rod names instead of looping over the tracker results.	2012-12-07 15:35:08 -05:00
Joel Thibault	3b0e3767bf	Add a test for a read that extends off the end of chr1	2012-12-07 14:07:15 -05:00
Joel Thibault	cc4e3ec589	Update TODO list	2012-12-06 12:06:47 -05:00
Mark DePristo	abd94b2976	Bugfix for handling invalid records in NA12878 KB -- The previous approach tried to remove the entire MongoVariantContext but when it was malformed was prone to error. Now just grabs the _id and uses it to remove the bad record.	2012-12-06 10:24:24 -05:00
Eric Banks	406adb8d44	The allele biased downsampling should not abort if there's a reduced read. Rather it should always keep the RR and downsample only original reads in the pileup.	2012-12-05 23:15:36 -05:00
Mark DePristo	dbf721968d	PrintReads large-scale test to protect against another major low-level performance issue	2012-12-05 21:36:27 -05:00
Ryan Poplin	00c23bf704	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-05 15:53:05 -05:00
Ryan Poplin	234ff64556	Changes to AssessNA12878 to allow for 100s of input callsets to assess against the database.	2012-12-05 15:52:57 -05:00
Ami Levy-Moonshine	5d78a61f7a	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-05 15:07:12 -05:00
Mark DePristo	d0cab795b7	Got caught in the middle of a bad integration test, that was fixed in independent push. Moved test bam into testdata.	2012-12-05 14:49:22 -05:00
Mark DePristo	465694078e	Major performance improvement to the GATK engine -- The NanoSchedule timing code (in NSRuntimeProfile) was crazy expensive, but never showed up in the profilers. Removed all of the timing code from the NanoScheduler, the NSRuntimeProfile itself, and updated the unit tests. -- For tools that largely pass through data quickly, this change reduces runtimes by as much as 10x. For the RealignerTargetCreator example, the runtime before this commit was 3 hours, and after is 30 minutes (6x improvement). -- Took this opportunity to improve the GATK ProgressMeter. NotifyOfProgress now just keeps track of the maximum position seen, and a separate daemon thread ProgressMeterDaemon periodically wakes up and prints the current progress. This removes all inner loop calls to the GATK timers. -- The history of the bug started here: http://gatkforums.broadinstitute.org/discussion/comment/2402#Comment_2402	2012-12-05 14:49:22 -05:00
Mark DePristo	2b601571e7	Better error handling in NanoScheduler -- The previous nanoscheduler would deadlock in the case where an Error, not an Exception, was thrown. Errors, like out of memory, would cause the whole system to die. This bugfix resolves that issue	2012-12-05 14:49:22 -05:00
Mark DePristo	51dbb562c9	Reduce amount of debugging information from NA12878KnowledgeBaseServer	2012-12-05 14:49:22 -05:00
Mauricio Carneiro	efe256ec09	binary search implementation to find the minimum coverage speeds up the walker from 7 days to 12 minutes on chr20.	2012-12-05 14:45:57 -05:00
Eric Banks	0c925856cb	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-05 02:00:39 -05:00
Eric Banks	ef87b18e09	In retrospect, it wasn't a good idea to have FisherStrand handle reduced reads since they are always on the forward strand. For now, FS ignores reduced reads but I've added a note (and JIRA) to make this work once the RR het compression is enabled (since we will have directionality in reads then).	2012-12-05 02:00:35 -05:00
Mauricio Carneiro	13896356ad	Added bootstrapping and fixed the GLM model of the FMCC	2012-12-05 01:32:19 -05:00
Mauricio Carneiro	30f013aeb0	Added a copy() method for ReadBackedPileups necessary to create new alignment contexts with hard-copies of the pileup.	2012-12-05 01:32:18 -05:00
Mauricio Carneiro	6feda540a4	Better error message for SimpleGATKReports	2012-12-05 01:32:18 -05:00
Eric Banks	726332db79	Disabling the testNoCmdLineHeaderStdout test in UG because it keeps crashing when I run it locally	2012-12-05 00:54:00 -05:00
Randal Moore	8d2d0253a2	introduce a level of indirection for the forum URLs - this new function will allow me a place to morph the URL into something that is supported by Confluence Signed-off-by: Eric Banks <ebanks@broadinstitute.org>	2012-12-03 22:33:02 -05:00
Eric Banks	1af41754e3	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-03 22:01:11 -05:00
Eric Banks	bca860723a	Updating tests to handle bad validation data files (that used the wrong qual score encoding); overrides push from stable.	2012-12-03 22:01:07 -05:00
Eric Banks	387c0defed	don't change md5 here because I am handling it separately from unstable with a better command-line in the test	2012-12-03 21:49:45 -05:00
Eric Banks	ef95757311	Fix MD5 because of a need to fix a busted bam file in our validation directory (it used the wrong quality score encoding...)	2012-12-03 21:46:46 -05:00
Menachem Fromer	472381245a	Allow for more refined control of memory and queues to run with	2012-12-03 17:07:03 -05:00
Eric Banks	67932b357d	Bug fix for RR: don't let the softclip start position be less than 1	2012-12-03 15:59:14 -05:00
Ryan Poplin	d5ed184691	Updating the HC integration test md5s. According to the NA12878 knowledge base this commit cuts down the FP rate by more than 50 percent with no loss in sensitivity.	2012-12-03 15:38:59 -05:00
Ryan Poplin	a47da9bb2f	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-03 14:30:14 -05:00
Ryan Poplin	156d6a5e0b	misc minor bug fixes to GenotypingEngine.	2012-12-03 12:47:35 -05:00
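
The precision problem described in the GSA-671 commit (bf8421eeb7) can be illustrated with a small sketch. This is not the GATK implementation: the function names are hypothetical, and the mapping from a phred-scaled QUAL to a log10 threshold (threshold = -QUAL / 10) is an assumption made for illustration. It only shows why log10pNonRef collapses to 0.0 in double precision while log10pRef stays informative.

```python
import math


def log10_p_nonref_from_ref(log10_p_ref: float) -> float:
    """Naively compute log10(1 - 10**log10_p_ref) in double precision."""
    return math.log10(1.0 - 10.0 ** log10_p_ref)


def is_polymorphic(log10_p_ref: float, qual: float) -> bool:
    """Hypothetical check: call the allele polymorphic if log10pRef <= threshold.

    Assumes a phred-scaled QUAL maps to a log10 threshold of -QUAL / 10.
    """
    return log10_p_ref <= -qual / 10.0


# Two sites with very different evidence against the reference allele.
strong_site = -100.0  # log10pRef = -100, as in the commit message
weaker_site = -20.0   # log10pRef = -20

# Stored as log10pNonRef, both sites round to exactly 0.0 and become
# indistinguishable from each other.
print(log10_p_nonref_from_ref(strong_site))
print(log10_p_nonref_from_ref(weaker_site))

# Stored as log10pRef, a high QUAL threshold still separates them.
print(is_polymorphic(strong_site, qual=500))
print(is_polymorphic(weaker_site, qual=500))
```

Keeping the small probability in log space avoids the subtraction 1 - p, which is where the information is lost once p falls below double-precision epsilon.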

11274 Commits (bba63a3b0ed94bf4d604d5a7f15e33f0f52fa930)