gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Yossi Farjoun	6ed9eb3da9	GATKBAMIndex now passes unit test! Problem was that SeekableBufferedStream seems to have a bug: it will read beyond the end of a file if asked to.	2012-12-18 17:32:26 -05:00
eitanbanks	002ce9c1d5	Merge pull request #8 from yfarjoun/master Huge speedup in initial traversal of BAM index files (x20 speed!)	2012-12-18 10:16:53 -08:00
Eric Banks	18728ec5bd	Updates to the bundle script: 1. Add the symbolic 'current' link for the new bundle dir 2. Don't gzip and copy .out files 3. Don't call chr20 SNPs on the example BAM because it's now just a few reads on chr1	2012-12-18 11:16:42 -05:00
Mark DePristo	16eb1c5436	Optimization to TraverseReadsNano -- Don't just read all inputs into a list, and then provide an iterator to that list, actually make a real iterator so NanoScheduler input thread can contribute meaningfully to the work load -- Use NanoScheduler progress function, instead of home-grown updater	2012-12-18 10:14:47 -05:00
Mark DePristo	b33f804cdc	Inline increment function in RecalDatum to avoid minor duplication of work and multiple synchronized method calls	2012-12-17 16:47:27 -05:00
Mark DePristo	66d32f646b	Minor cleanup of BAQ calculation (final variables, etc)	2012-12-17 16:47:27 -05:00
Mark DePristo	67fe81391c	ProgressMeter optimization: don't do genome loc formatting, but instead create an object that only formats when printing is actually needed	2012-12-17 16:47:27 -05:00
Mark DePristo	1de2f527b9	Optimization of recalibrateRead -- Refactor calculation so that upfront constant values are pre-computed, and cached, and their values just looked up during application -- Trivial comment on how we might use BAQ better in BaseRecalibrator	2012-12-17 16:47:27 -05:00
Mark DePristo	bd6cda7542	Trivial optimization of TraverseReadsNano -- don't format the shard toString if logger isn't debug enabled	2012-12-17 16:47:27 -05:00
Mark DePristo	a481d006f0	Optimizations for applying BQSR table with PrintReads -- Cleaned up code in updateDataForRead so that constant values were not computed in inner loops -- BaseRecalibrator doesn't create its own fasta index reader, it just piggybacks on the GATK one -- ReadCovariates <init> now uses a thread local cache for its int[][][] keys member variable. This stops us from recreating an expensive array over and over. In order to make this really work, had to update recordValues in ContextCovariate so it writes 0s over base values it's skipping because of low quality base clipping. Previously the values in the ReadCovariates keys were 0 because they were never modified by ContextCovariates. Now these values are actually zeroed out explicitly by the covariates.	2012-12-17 16:47:27 -05:00
Mark DePristo	5ec25797b3	Optimizations for BaseRecalibrator -- No longer computes at each update the overall read group table. Now computes this derived table only at the end of the computation, using the ByQual table as input. Reduces BQSR runtime by 1/3 in my test	2012-12-17 16:47:27 -05:00
Eric Banks	e6f468b647	Refactored the quasi-useful IndelType annotation into the more useful VariantType. The indels are still annotated as before, but now all other variant types are annotated too. I'm doing this because of requests on the forum but am not making it standard. If we find it to be useful we can turn it on by default later.	2012-12-17 11:54:47 -05:00
Ryan Poplin	6171419e6c	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-17 11:27:52 -05:00
Ryan Poplin	98f18b5f9e	Changing the HC over to using the non-contamination-downsampled read maps for the purposes of annotations. This behavior now matches the UG. There is a new command line option to go back to the older behavior to explore the differences.	2012-12-17 11:27:44 -05:00
depristo	40b3941678	Merge pull request #7 from jsilter/master Add removeCall to NA12878KnowledgeBase	2012-12-17 08:05:55 -08:00
Eric Banks	762f184262	Bug fix for strict validation: rsID checking wasn't working if there were multiple IDs	2012-12-17 10:32:41 -05:00
Eric Banks	1d18ee26cc	Merge remote-tracking branch 'unstable/master'	2012-12-17 09:03:03 -05:00
Yossi Farjoun	ea704d688f	chose smaller buffer size for the bufferedStream	2012-12-15 13:01:38 -05:00
Yossi Farjoun	6da2338ea7	removed comments and unneeded imports	2012-12-15 12:31:37 -05:00
Yossi Farjoun	19dd2d628a	some changes. some changes.	2012-12-14 17:21:32 -05:00
Jacob Silterra	8198414352	Add removeCall, which removes all calls matching the provided MongoVariantContext. _id and Date fields are excluded. Generally intended to remove only a single call	2012-12-14 14:54:30 -05:00
Mauricio Carneiro	5f1afb4136	Fixing an off-by-one clipping error in ReduceReads for reads off the contig Reads that are soft-clipped off the contig (before the beginning of the contig) were being soft-clipped to position 0 instead of 1 because of an off-by-one issue. Fixed and included in the integration test.	2012-12-13 22:10:11 -05:00
Mauricio Carneiro	74344a3871	Bringing in the changes from the CMI repo	2012-12-13 21:59:37 -05:00
Eric Banks	696bf95fba	Fix for PBT bug reported on the forum: the AD is actually output correctly now (rather than with 'null' or some gibberish memory pointer).	2012-12-13 23:28:30 +00:00
Mark DePristo	aeab932c63	Actual working version of unflushing VCFWriter -- Uses high-performance local writer backed by byte array that writes the entire VCF line in a single write operation to the underlying output stream. -- Fixes problems with indexing of unflushed writes while still allowing efficient block zipping -- Same (or better) IO performance as previous implementation -- IndexingVariantContextWriter now properly closes the underlying output stream when it's closed -- Updated compressed VCF output file	2012-12-13 16:15:08 -05:00
Guillermo del Angel	97e880654b	Bug fix in setting output file names	2012-12-13 15:08:22 -05:00
Guillermo del Angel	cf44ec1ace	Experiment: don't compress picard BAM outputs	2012-12-12 17:45:55 -05:00
Guillermo del Angel	021929788c	Don't compress intermediate BAMs, will save time	2012-12-12 17:40:25 -05:00
Yossi Farjoun	5e66109268	Replaced a useless getInt with a skipInt to remove 1/4 of the initial seek time in the BAM Index.	2012-12-12 17:08:11 -05:00
Eric Banks	62eaffdf0a	Fix docs for ReadBackedPhasing	2012-12-12 20:28:04 +00:00
Eric Banks	bba63a3b0e	Fix for GSA-615: UnifiedGenotyperEngine.getGLModelsToUse takes 5% of the runtime of UG, should be optimized away.	2012-12-12 20:25:45 +00:00
Ryan Poplin	211a6e78ea	Further related bug fixes to GGA mode in the HC: some variants (especially MNPs) were causing problems because they don't have to start at the current location to match the allele being genotyped. Fixed.	2012-12-12 14:53:02 -05:00
Mauricio Carneiro	33290bfe0c	Added integration test to catch the read off contig in ReduceReads. So upstream changes won't break it again.	2012-12-12 13:49:54 -05:00
Mauricio Carneiro	a52e3c7e15	Revert "Bug fix for RR: don't let the softclip start position be less than 1". This introduced a bug in reduce reads by deactivating its hard clipping of the out of bounds soft-clips (especially in the MT). DEV-322 #resolve #time 4m This reverts commit 42acfd9d0bccfc0411944c342a5b889f5feae736.	2012-12-12 13:09:39 -05:00
Mark DePristo	5632c13bf2	Resolves GSA-681 / Compressed VCF.gz output is too big because of unnecessary call to flush(). -- Now compressed output VCFs are properly blocked compressed (i.e., they are actually smaller than the uncompressed VCF)	2012-12-12 10:27:07 -05:00
Guillermo del Angel	216f92276c	Disable scatter-gather with PrintReads since we're already setting high nct so it's unnecessary	2012-12-12 09:06:37 -05:00
Kristian Cibulskis	0e5b1093fb	initial implementation of contamination estimation; tested on a single gene (which doesn't have enough data), waiting to test on exome/chr20	2012-12-11 15:59:59 -05:00
Mark DePristo	dd52a70d45	Fix AFCalcResult unit test -- I was simply passing in the wrong values into the function. Fixed the calls, and expanded the docs on what needs to be passed in.	2012-12-11 10:40:12 -05:00
Ami Levy-Moonshine	6bf31065e3	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-11 10:34:50 -05:00
Ami Levy-Moonshine	2f99569dda	change the md5 in one of the CV integration tests, since it wasn't using the priority list when printing the origin of the annotation (the setValue field)	2012-12-10 22:48:15 -05:00
Ami Levy-Moonshine	2e3284f306	Continue to fix the case where PRIORITIZE is used but no priority list is given. While fixing that case I also removed unnecessary sorting when the priority list is not provided. When the priority list is not provided, it will continue to be null. Thus, the number of original Variant Contexts should be given as a new parameter to simpleMerge (since priority might be null). This new parameter is used for checking if there are filtered VCs when annotationOrigin is true.	2012-12-10 22:23:58 -05:00
Mauricio Carneiro	19372225af	Merge Broad's GATK and CMI gatk	2012-12-10 15:33:38 -05:00
Mauricio Carneiro	8a115edbaf	ReduceReads is now scattered by contig It's no longer safe to scatter/gather by interval because now we don't hard-clip to the intervals anymore.	2012-12-10 15:25:27 -05:00
Eric Banks	bdda63d973	Related bug fixes to GGA mode in the HC: some variants (especially MNPs) were causing problems because they don't have to start at the current location to match the allele being genotyped. Fixed.	2012-12-10 14:47:04 -05:00
Ryan Poplin	ceb5431dcb	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-10 12:24:47 -05:00
Ryan Poplin	c84ff9d75e	Adding explicit true negative assessment category to the AssessNA12878 walker.	2012-12-10 12:24:43 -05:00
Ami Levy-Moonshine	573ace4403	restore the right version of VariantContextUtils.java in my unstable dir	2012-12-10 10:28:56 -05:00
David Roazen	46edab6d6a	Use the new downsampling implementation by default -Switch back to the old implementation, if needed, with --use_legacy_downsampler -LocusIteratorByStateExperimental becomes the new LocusIteratorByState, and the original LocusIteratorByState becomes LegacyLocusIteratorByState -Similarly, the ExperimentalReadShardBalancer becomes the new ReadShardBalancer, with the old one renamed to LegacyReadShardBalancer -Performance improvements: locus traversals used to be 20% slower in the new downsampling implementation, now they are roughly the same speed. -Tests show a very high level of concordance with UG calls from the previous implementation, with some new calls and edge cases that still require more examination. -With the new implementation, can now use -dcov with ReadWalkers to set a limit on the max # of reads per alignment start position per sample. Appropriate value for ReadWalker dcov may be in the single digits for some tools, but this too requires more investigation.	2012-12-10 09:44:50 -05:00
Ami Levy-Moonshine	5460c96137	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-09 23:43:57 -05:00
Ami Levy-Moonshine	3a420d163e	(1) changes in catVariants (work still under development) (2) changes to CV to throw an error when GenotypeMergeType is PRIORITIZE but no priority list (rod_priority_list) is given. Reported by TechnicalVault on the forum on Nov 14 2012	2012-12-09 23:40:03 -05:00

11319 Commits (6ed9eb3da9eed02e54dd893a4f4fb60b4caa514b)