gatk-3.8

Commit Graph

Author	SHA1	Message	Date
carneiro	5e9a8f9cb3	Implemented a new argument (-DQS --defaultQualityScore) that allows GATK to deal with BAM files missing quality scores. If a value is specified, all reads are filled with the default quality score. Appropriate exception is thrown if -DQS is not provided and BAM file doesn't have quality scores for every base. Adding the first version of the techdev pipeline (tdPipeline) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 22:25:08 +00:00
depristo	5d2c2bd280	Just refactoring into utils/baq directory git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4795 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-06 17:43:43 +00:00
depristo	a5b3aac864	Engine-level BAQ calculation now available in the GATK [totally experimental right now]. -baq argument to disable (NONE), to only use the tags in the BAM (USE_TAG_ONLY), use the tag when present but calculate on the fly as necessary (CALCULATE_AS_NECESSARY), and to always recalculate (RECALCULATE_ALWAYS). BAQ.java contains the complete implementation, for those interested. ValidateBAQWalker is a useful QC tool for verifying the BAQ is correct. BAQSamIterator applies BAQ to reads, as needed, in the engine. Let me know if you encounter any problems. Before prime-time, needs a caching implementation of IndexedFastaReader to avoid loading lots of reference data all of the time git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4787 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-04 20:23:06 +00:00
bthomas	374c0deba2	Updating the core LocusWalker tools to include the Sample infrastructure that I added last month. This commit touches a lot of files, but only significantly changes a few: LocusIteratorByState and ReadBackedPileup and associated classes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4711 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-19 19:59:05 +00:00
hanna	8e36a07bea	Convert GenomeLocParser into an instance variable. This change is required for anything that needs to be simultaneously aware of multiple references, eg Queue's interval sharding code, liftover support, distributed GATK etc. GenomeLocParser instances must now be used to create/parse GenomeLocs. GenomeLocParser instances are available in walkers by calling either getToolkit().getGenomeLocParser() or refContext.getGenomeLocParser(). This is an intermediate change; GenomeLocParser will eventually be merged with the reference, but we're not clear exactly how to do that yet. This will become clearer when contig aliasing is implemented. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4642 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-10 17:59:50 +00:00
hanna	eee134baf2	Chris found a bug in the downsampler where, if the number of reads entering the pileup at the next alignment start is large, we don't add as many of those incoming reads as we should. No integration tests were affected. Thanks, Chris! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4378 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 11:18:12 +00:00
kshakir	edaa278edd	Removed cases where various toolkit functions were accessing GenomeAnalysisEngine.instance. This will allow other programs like Queue to reuse the functionality. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4351 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-25 02:49:30 +00:00
depristo	7880863eb7	Final step in error refactoring. GATK exception is now ReviewedStingException, indicating that this exception is really what one wants. Only use this exception when you have thought about StingException vs. UserException and made a real decision. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4267 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 15:07:38 +00:00
depristo	7ad8fbdd5a	Moved GATKException to exceptions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4266 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:47:19 +00:00
depristo	595907e98e	Moving StingException git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4262 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:34:15 +00:00
depristo	40e6179911	Penultimate step in exception system overhaul. UserError is now UserException. This class should be used for all communication with the USER for problems with their inputs. Engine now validates sequence dictionaries for compatibility, detecting not only lack of overlap but now inconsistent headers (b36 ref with v37 BAM, for example) as well as ref / bam order inconsistency. New -U option to allow users to tolerate dangerous seq dict issues. WalkerTest system now supports testing for exceptions (see email and wiki for docs). Tests for vcf and bam vs. ref incompatibility. Waiting on Tribble seq dict improvements to detect b36 VCF with b37 ref (currently cannot tell this is wrong). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4258 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:02:43 +00:00
depristo	1de713f354	Massive review of maybe 50% of the exceptions in the GATK. GATKException is a tmp. tracker so that I can tell which StingExceptions I've reviewed. Please don't use it. If you are working on new code and are considering throwing exceptions, it's either UserError or StingException, please git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4246 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-09 23:21:17 +00:00
depristo	6a30617a60	Initial implementation of UserError exceptions and error message overhaul. UserErrors and their subclasses UserError.MalFormedBam for example should be used when the GATK detects errors on the part of the user. The output for errors is now much clearer and hopefully will reduce GS posts. Please start using UserError and its subclasses in your code. I've replaced some, but not all, of the StingExceptions in the GATK with UserError where appropriate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4239 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-09 11:32:20 +00:00
hanna	fb177c4fee	If only dcov is specified, assume that selected downsample type is BY_SAMPLE. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4147 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 17:35:41 +00:00
hanna	de5ccfb0b1	Moved hasPileupBeenDownsampled() based on Eric's request. Also eliminated @Deprecated constructors from AlignmentContext. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4142 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 16:12:05 +00:00
hanna	d773b3264b	Eliminated -mrl option. Eliminated -fmq0 option. Eliminated read group hallucination. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4133 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-26 21:38:03 +00:00
hanna	41d57b7139	Massive cleanup of read filtering. - Eliminate redundancy of filter application. - Track filter metrics per-shard to facilitate merging. - Flatten counting iterator hierarchy for easier debugging. - Rename Reads class to ReadProperties and track it outside of the Sting iterators. Note: because shards are currently tied so closely to reads and not the merged triplet of <reads,ref,RODs>, the metrics classes are managed by the SAMDataSource when they should be managed by something more general. For now, we're hacking the reads data source to manage the metrics; in the future, something more general should manage the metrics classes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4015 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 20:17:11 +00:00
kshakir	4f51a02dea	Changed logging level to default at INFO instead of WARN. Changes to StingUtils command line for use in Queue, replacing Queue's use of property files. Updates to walkers used in existing QScripts to add @Input/@Output. RMD used in @Required/@Allows now has a new default equal to "any" type. New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions. Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.) Removed dependency on BroadCore by porting LSF job submitter to scala. Ivy now pulls down module dependencies from maven. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 16:42:48 +00:00
depristo	7c42e6994f	FindBugs fixes throughout the code base git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3823 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-18 16:29:59 +00:00
hanna	96034aee0e	Cleanup for Steve Hershman's issue. In the midst of doing this, I discovered that the semantics for which reads are in an extended event pileup are not clear at this point. Eric and I have planned a future clarification for this and the two of us will discuss who will implement this clarification and when it'll happen. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3809 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 18:57:58 +00:00
hanna	cab8394103	The sharding system now buffers reads, with a size determined by command-line argument. Will investigate whether/how this impacts performance on low-pass data and, if it works well, will create a more automatic version of the tool. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3709 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-01 22:28:55 +00:00
hanna	c9d5345150	Redo StratifiedAlignmentContext to use ReadBackedPileup's stratification options. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3699 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-01 02:46:05 +00:00
hanna	3a9d426ca8	Added hasPileupBeenDownsampled() boolean to ReadBackedPileup, so that a pileup can report whether or not (but not how much) it's been downsampled. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3649 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-28 04:56:33 +00:00
hanna	c806ffba5f	Switching over DownsamplingLocusIteratorByState -> LocusIteratorByState. Some operations will not be as fast as they could be because the workflow is currently merge sam records (sharding) -> split sam records (LocusIteratorByState) -> merge records (LocusIteratorByState) -> split records (StratifiedAlignmentContext), but this will be fixed when StratifiedAlignmentContext is updated to take advantage of the new functionality in ReadBackedPileup. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3599 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-21 02:11:42 +00:00
hanna	1d50fc7087	Misc bug fixes: fix tracking of nInsertions with sample-split pileup constructor. Fix performance issue building up pileups from pileups of individual sample data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3598 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-20 20:32:27 +00:00
hanna	f18ac069e2	A refactoring / unification of ReadBackedPileup and ReadBackedExtendedEventPileup. Provides a cleaner interface with extended events inheriting all of the basic RBP functionality. Implementation is still slightly messy, but should allow users to provide separate implementations of methods for sample split pileups and unsplit pileups for efficiency's sake. Methods not covered by unit/integration tests have not been sufficiently tested yet. Unit tests will follow this week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3597 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-20 04:42:26 +00:00
hanna	5050b19457	We're unable to make the naive deduper more worldly, so we're killing it instead. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3587 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-18 13:54:27 +00:00
hanna	612c3fdd9d	First pass at eliminating the old sharding system. Classes required for the original sharding system are gone where I could identify them, but hierarchies that split to support two sharding systems have not yet been taken apart. @Eric: ~4k lines. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3580 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-17 20:17:31 +00:00
hanna	c1595a383a	More bugfixes for cases where no sample name is present. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3578 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-17 16:46:02 +00:00
hanna	5972ad1199	Fixes to mrl integration. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3573 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-16 20:40:10 +00:00
hanna	e77f76f8e1	Reenabled downsampling by sample after basic sanity testing and fixes of the new implementation. Hard testing and performance enhancements are still pending. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3566 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-16 17:23:27 +00:00
hanna	c3b68cc58d	Rethinking DownsamplingLocusIteratorByState with a flattened read structure. Samples are kept independent while processing, and only merged back in a priority queue if necessary in a special variant of the ReadBackedPileup. This code is not live yet except in the case of naive deduping. Downsampling by sample temporarily disabled, and the ReadBackedPileup variant is sketchy and not well integrated with StratifiedAlignmentContext or the walkers. Cleanup to follow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3540 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-13 01:47:02 +00:00
hanna	f55f32d4ee	Bug fix. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3526 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-10 01:53:26 +00:00
hanna	dbee21a50f	Bugfixes for the case when no read groups / no samples are available. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3523 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 18:47:05 +00:00
hanna	84563b37e5	Partial flattening of the hanger data structure. Hanger data structure is not currently as flat as it could / should be, but it's already comparable to the speed of the reference implementation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3512 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 16:28:49 +00:00
hanna	c2858c8988	Minor performance enhancement. Checkpoint commit before major performance overhaul. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3504 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-08 21:39:39 +00:00
hanna	199e4208cd	Bug fixes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3497 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-08 00:30:33 +00:00
hanna	52ab9f2417	Feature parity between LocusIteratorByState, DownsamplingLocusIteratorByState, including pushing mrl / the LocusOverflowTracker into LocusIteratorByState. Note that the 'Matt Hanna exception', is still enabled because I haven't yet validated the performance of the DownsamplingLocusIteratorByState when running without downsampling. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3496 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-07 22:58:21 +00:00
hanna	5c4d070566	Push Mark's changes in LocusIteratorByState into DownsamplingLocusIteratorByState in preparation for merging the two into one. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3495 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-07 17:29:30 +00:00
depristo	e2b41082af	GATK now does automatic adaptor filtering in locus iterators (but not expt. downsampling iterator). General support for LocusIteratorFilters just like read filters but only applying at particular bases. Updated tools with new MD5 sums due to adaptor bases in their integrationtest data. Note that as a side effect here reads close to each other with odd orientations are also filtered out. Updated minor argument to VariantRecalibrator to change the qStep value on the command line git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3481 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-02 22:26:32 +00:00
depristo	2b02324587	Support for detecting and automatically excluding reads reading into the adaptor sequence and, if desired, also only showing the first pair when two reads overlap in the fragment. Not enabled, an intermediate check in before updating and verifying the impact on locus walkers everywhere. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3465 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-30 18:00:12 +00:00
hanna	b10950c691	Simple performance optimization -- cache the number of reads in the locus hanger. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3417 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-21 19:26:16 +00:00
hanna	388dd8d64d	Fixing bugs in downsampler introduced when I added Ryan's dup eliminator. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3407 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-21 02:53:12 +00:00
hanna	017ab6b690	Experimental versions of downsampler and Ryan's deduper are now available either as walker attributes or from the command-line. Not ready yet! Downsampling/deduping works in a general sense, but this approach has not been completely optimized or validated. Use with caution. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3392 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 05:40:05 +00:00
hanna	0791beab8f	Checking in downsampling iterator alongside LocusIteratorByState, and removing the reference implementation. Also implemented a heap size monitor that can be used to programmatically report the current heap size. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3367 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-17 21:00:44 +00:00
aaron	2c55ac1374	fixes for parallel processing problems with Tribble, a small bug in the resource pool, and some more documentation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3349 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-12 06:13:26 +00:00
hanna	6868ce988f	Fix hanging bug reported by Susanne Pfeifer (tiffy @ get satisfaction) where, if the last read(s) in a shard all have an indel in roughly the same location and that indel isn't covered by any other reads, LocusIteratorByState goes into an infinite loop. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3348 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-11 17:31:19 +00:00
hanna	76efa757f0	Switched over to reviewed version of Picard patch. In process, did some optimization to the IntervalSharder which improved startup time 5-10x when dynamically merging many BAMs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3331 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-08 14:12:22 +00:00
hanna	8bb15ef812	Checking in the reference implementation of the downsampler for back comparison. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3278 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-30 15:41:13 +00:00
hanna	9e107513d0	In the new sharding system, if no read group is present, hallucinate one. Added for test compatibility, but not sure whether we still need this feature. TODO: Poll the group about this feature. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2949 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-07 23:01:34 +00:00


162 Commits (73acfa654a3eb3d7e21e91945739fb8d1ab0452e)