gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	cf91d894e4	Fix build problems with tests	2012-08-31 13:42:41 -04:00
Mark DePristo	817ece37a2	General infrastructure for ReadTransformers -- These are like read filters but can be applied either on input, on output, or handled by the walker -- Previous example of BAQ now uses the general framework -- Resulted in massive conceptual cleanup of SAMDataSource and ReadProperties! Yeah! -- BQSR now uses this framework. We can now do BQSR on input, on output, or within a walker -- PrintReads now handles all read transformers in the walker in map, enabling us to parallelize PrintReads with BAQ and BQSR -- Currently BQSR throws exceptions in parallel, which a subsequent commit will fix -- Removed global variable setting in GenomeAnalysisEngine for BAQ, as command line parameters are cleanly handled by ReadTransformer infrastructure -- In principle ReadFilters are just a special kind of ReadTransformer, but this refactoring is larger than I can do. It's a JIRA entry -- Many files touched simply due to the refactoring and renaming of classes	2012-08-31 13:42:41 -04:00
Christopher Hartl	143fbead03	Adding an experimental format field annotation that calculates the per-sample residual dosage after accounting for LD. It's meant to be run in a single pass over a chromosome, for instance. Currently it does not work due to a bug in the variant annotator engine, see GSA-532. When that's fixed it'll likely reveal broken code.	2012-08-31 04:04:00 -04:00
Eric Banks	ac0c44720b	I started to put together a set of unit tests for the PileupElement creation functionality of LocusIteratorByState and found pretty quickly that it's definitely still busted for indels. The data provider is nowhere near comprehensive yet, but I need to sit back and think about how to really test some of the functionality of LIBS. Committing what I have for now because at the very least it'll be helpful going forward (failing tests are commented out with TODO).	2012-08-30 22:49:13 -04:00
Mark DePristo	39400c56a9	Update md5s for VQSR, as VQSLOD is now a double and gets the standard double precision treatment in VCF	2012-08-30 19:41:49 -04:00
Mark DePristo	2f749b5e52	Added ThreadSafeMapReduce interface, super of TreeReducible -- A higher level interface to declare parallelism capability of a walker. This interface means that the walker can be multi-threaded, but doesn't necessarily support TreeReducible interface, which forces you to have a combine ReduceType operation that isn't appropriate for parallel read walkers -- Updated ReadWalkers to implement ThreadSafeMapReduce not TreeReducible	2012-08-30 19:41:49 -04:00
Mark DePristo	544740d45d	Asking for n threads should give you n threads in NanoScheduler, not n - 1	2012-08-30 19:41:49 -04:00
Mark DePristo	1212dfd2ef	Reduce the number of test combinations in ReadBasedReferenceOrderedView	2012-08-30 19:41:49 -04:00
Mark DePristo	7a462399ce	Fix GSA-529: Fix RODs for parallel read walkers -- TraverseReadsNano modified to read in all input data before invoking maps, so the input to TraverseReadsNano is a MapData object holding the sam record, the ref context, and the refmetadatatracker. -- Update ValidateRODForReads to be tree reducible, using synchronized map and explicitly sort the output map from locations -> counts in onTraversalDone -- Expanded integration tests to test nt 1, 2, 4.	2012-08-30 19:41:49 -04:00
Mark DePristo	7d95176539	Bugfix to compareTo and equals in GenomeLoc -- Yes, GenomeLoc.compareTo was broken. The compareTo function only considered the contig and start position, but not the stop, when comparing genome locs. -- Updated GenomeLoc.compareTo function to account for stop. Updated GATK code where necessary to fix resulting problems that depended on this. -- Added unit tests to ensure that hashcode, equals, and compareTo are all correct for GenomeLocs	2012-08-30 19:41:49 -04:00
Mark DePristo	5a9610d875	ReadShards now default to 10K (up from 1K) reads per samFile up to 250K -- This should help make the inputs for parallel read walkers a little meatier, and avoid spinning the shard creation infrastructure so often	2012-08-30 19:41:49 -04:00
Christopher Hartl	5a142fe265	After discussion with Ryan/Eric, the Structural_Indel variant type is now gone, and has been entirely replaced with the access pattern .isStructuralIndel(). This makes it a strict subtype of indel. I agree that this method is a bit more sensible. In addition, fix for GSA-310. If supplied -rf argument does not match a known read filter, the list of read filters will be printed, and users directed to the documentation for more information.	2012-08-30 17:57:31 -04:00
Mark DePristo	82b2845b9f	Fix: GSA-531 ApplyRecalibration writing to BCF: java.lang.String cannot be cast to java.lang.Double -- LOD must be added as a double to attributes, not as a string, so that it can be written out as BCF	2012-08-30 16:59:57 -04:00
Mark DePristo	76853806b0	Print out the time when downloads finished from S3	2012-08-30 10:15:11 -04:00
Mark DePristo	21dd70ed36	Test to ensure that ReadBasedReferenceOrderedView produces stateless objects -- Stateless objects are required for nano-scheduling. This means you can take the RefMetaDataTracker provided by ReadBasedReferenceOrderedView, store it away, get another from the same view, and the original one behaves the same.	2012-08-30 10:15:11 -04:00
Mark DePristo	ce3d1f89ea	ReadShard are no longer allowed to span multiple contigs -- Previous behavior was unnecessary and causes all sorts of problems with RODs for reads. The old implementation simply failed in this case. The new code handles this correctly by forcing shards to have all of their data on a single contig. -- Added a PrintReads integration test to ensure this behavior is correct -- Adding test BAMs that have < 200 reads and span across contig boundaries	2012-08-30 10:15:11 -04:00
Mark DePristo	a3f443c1cc	Part 4 of GSA-462: Consistent RODBinding access across Ref and Read trackers -- Restoring ValidateRODForReads, which passes without integration test changes	2012-08-30 10:15:11 -04:00
Mark DePristo	53376b9423	Part III of GSA-462: Consistent RODBinding access across Ref and Read trackers -- shardSpan is only calculated when some ROD is live in the GATK. No sense in paying the cost per read when you don't need it -- Update contract to allow null span or unmapped span (good catch unittests!)	2012-08-30 10:15:10 -04:00
Mark DePristo	1200848bbf	Part II of GSA-462: Consistent RODBinding access across Ref and Read trackers -- Deleted ReadMetaDataTracker -- Added function to ReadShard to give us the span from the left most position of the reads in the shard to the right most, which is needed for the new view	2012-08-30 10:15:10 -04:00
Mark DePristo	972be8b4a4	Part I of GSA-462: Consistent RODBinding access across Ref and Read trackers -- ReadMetaDataTracker is dead! Long live the RefMetaDataTracker. Read walkers will soon just take RefMetaDataTracker objects. In this commit they take a class that trivially extends them -- Rewrote ReadBasedReferenceOrderedView to produce RefMetaDataTrackers not the old class. -- This new implementation produces thread-safe objects (i.e., holds no pointers to shared state). Suitable for use (to be tested) with nano scheduling -- Simplified interfaces to use the simplest data structures (PeekableIterator) not the LocusAwareSeekableIterator, since I both hate those classes and this is on the long term trajectory to remove those from the GATK entirely. -- Massively expanded DataProvider unit tests for ReadBasedReferenceOrderedView -- Note that the old implementation of offset -> ROD in ReadRefMetaDataTracker was broken for any read not completely matching the reference. Rather than provide broken code the ReadMetaDataTracker only provides a "bag of RODs" interface. If you want to work with the relationship between the read and the RODs in your tool you need to manage the CIGAR element itself. -- This commit breaks the new read walker BQSR, but Ryan knows this is coming -- Subsequent commit will be retiring / fixing ValidateRODForReads	2012-08-30 10:15:10 -04:00
Mark DePristo	8fc6a0a68b	Cleanup RefMetaDataTracker before refactoring ReadMetaDataTracker	2012-08-30 10:13:06 -04:00
Ryan Poplin	b85ded8389	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-30 10:11:48 -04:00
Ryan Poplin	650ff29e62	oops, I meant to use binary OR here.	2012-08-30 10:11:29 -04:00
Ryan Poplin	57d997f06f	Fixing bug from when FragmentUtils merging function moved over to the soft clipped start instead of the unclipped start	2012-08-30 10:10:43 -04:00
Ryan Poplin	f9bab37015	Merged bug fix from Stable into Unstable	2012-08-30 09:21:24 -04:00
Ryan Poplin	eb63221875	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable	2012-08-30 09:19:35 -04:00
Ryan Poplin	81d5eca975	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-30 09:10:56 -04:00
Ryan Poplin	35baf0b155	This along with Mauricio's previous commit (thanks!) fixes GSA-522. There are no longer any modifications to reads in the map calls of ActiveRegion walkers. Added the bam which identified this error as a new integration test.	2012-08-30 09:07:36 -04:00
Eric Banks	1acf0f0b2c	Fixing bug in fasta .fai generation: trim the contig names to the first whitespace if one appears. We now generate indexes identical to samtools.	2012-08-29 22:36:27 -04:00
Eric Banks	4d38befe86	Merged bug fix from Stable into Unstable	2012-08-29 15:13:56 -04:00
Eric Banks	150a969279	Be careful with String manipulation when constructing alleles in SomaticIndelDetector	2012-08-29 15:13:28 -04:00
Eric Banks	ce55ba98f4	Don't try to left align indels in unmapped reads (which for some reason can still have CIGARs) because the ref context is null.	2012-08-29 15:01:11 -04:00
Ryan Poplin	4ea38bbfe8	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-29 11:39:30 -04:00
Ryan Poplin	b132a33dac	Delocalized BQSR now BAQs the reads on the fly instead of on input.	2012-08-29 11:39:07 -04:00
Mauricio Carneiro	69b56e11c8	ReadClipper won't modify the original read -- Reverting back to the original implementation, but now including write N's and write Q0's due to walkers that look at the same read multiple times in different reference windows	2012-08-29 11:33:19 -04:00
Ryan Poplin	e12ae65d33	Changing the commenting style in the BQSR	2012-08-29 11:27:45 -04:00
Mark DePristo	19cc0b373e	Some code review comments for Ryan	2012-08-28 17:06:08 -04:00
Khalid Shakir	f45226f01e	Updated HSPTest expected values to match FS changes in depristo's 3baf52 commit.	2012-08-28 16:34:55 -04:00
Ryan Poplin	6d6ca090c6	RecalDatums now hold doubles so the test for equality needs an epsilon.	2012-08-28 16:00:52 -04:00
Ryan Poplin	18eca3544e	Initial commit of the delocalized BQSR written as a read walker.	2012-08-28 15:24:20 -04:00
Eric Banks	e74c527d47	Register the deprecated walkers as deprecated starting in v2.2 so that users get a helpful error message	2012-08-28 10:19:18 -04:00
Eric Banks	67d348a31d	Retiring the alignment walkers and related integration test since we don't want to support them anymore.	2012-08-28 10:16:49 -04:00
Mark DePristo	0f4acaae1b	Update MD5s with new FS score	2012-08-28 08:06:47 -04:00
Mark DePristo	4b8d9c3915	Actually load the library necessary to compactPDF -- Old version was buggy in that if you didn't load "tools" package in your script it wouldn't compact the resulting PDF! Fixed	2012-08-28 08:06:47 -04:00
Mark DePristo	2996693c9f	FisherStrand now computed with and without filtering low-qual bases, and least significant pvalue is kept -- Old way (filtering for Q > 17 bases) resulted in biased FS when the site was good but there was a systematic shift in the QUAL of REF and ALT between strands of the reads (sometimes happens) -- New way (taking all bases) was consistent with BaseQualRankSum and other tests, but there can be a lot of low qual reference bases on one strand in some techs (ION/PROTON/PACBIO) because of the preference for introducing an indel vs. a mismatch. -- This implementation allows us to have our cake and eat it too by computing both p-values, and taking the maximum one (i.e., least significant). -- No integration tests updated yet -- still exploring the consequences of this change	2012-08-28 08:06:47 -04:00
Eric Banks	bedcdbdc5f	Fixing merge conflict	2012-08-27 12:16:51 -04:00
Eric Banks	3d476487c6	LIBS is totally busted for deletions. Putting a check in AD for bad pileup event bases so that we don't produce busted alleles. We must fix LIBS ASAP.	2012-08-27 12:13:12 -04:00
Mark DePristo	63a9ae817a	Ensure thread-safety of CachingIndexedFastaSequenceFile -- Cosmetic cleanup of ReadReferenceView -- TraverseReadsNano provides the reference context, since it's thread-safe -- Cleanup CachingIndexedFastaSequenceFile. Add docs, remove unnecessary setters -- Expand CachingIndexedFastaSequenceFileUnitTest to test explicitly multi-threaded safety.	2012-08-27 12:11:54 -04:00
Mark DePristo	e5b1f1c7f4	Add simple main function to unit test so we can run the nano scheduler test from the command line	2012-08-27 12:11:54 -04:00
Khalid Shakir	2d1ea7124b	One less Queue command line requirement: -tempDir now defaults to .queue/tmp. Also moved queueScatterGather to .queue/scatterGather.	2012-08-27 12:04:50 -04:00
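
Commit 7d95176539 above describes a compareTo that looked only at contig and start position and ignored the stop. The sketch below illustrates that class of bug and one way to fix it. It is not the GATK code: the Interval class, its fields, and both comparison methods are hypothetical names chosen for illustration.

```java
import java.util.Objects;

public class IntervalOrderingSketch {
    /** Hypothetical stand-in for a genomic interval; NOT GATK's GenomeLoc. */
    static final class Interval {
        final int contigIndex;
        final int start;
        final int stop;

        Interval(int contigIndex, int start, int stop) {
            this.contigIndex = contigIndex;
            this.start = start;
            this.stop = stop;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Interval)) return false;
            Interval other = (Interval) o;
            return contigIndex == other.contigIndex
                    && start == other.start
                    && stop == other.stop;
        }

        @Override
        public int hashCode() {
            return Objects.hash(contigIndex, start, stop);
        }
    }

    /** The kind of ordering the commit calls broken: the stop is never consulted. */
    static int compareIgnoringStop(Interval a, Interval b) {
        int byContig = Integer.compare(a.contigIndex, b.contigIndex);
        if (byContig != 0) return byContig;
        return Integer.compare(a.start, b.start);
    }

    /** Ordering that also accounts for the stop, so it agrees with equals(). */
    static int compareWithStop(Interval a, Interval b) {
        int byContigAndStart = compareIgnoringStop(a, b);
        if (byContigAndStart != 0) return byContigAndStart;
        return Integer.compare(a.stop, b.stop);
    }

    public static void main(String[] args) {
        Interval shorter = new Interval(0, 100, 150);
        Interval longer = new Interval(0, 100, 200);

        // Same contig and start, different stop: equals() says they differ.
        System.out.println("equals: " + shorter.equals(longer));
        // Ignoring the stop makes the two intervals compare as identical.
        System.out.println("ignoring stop: " + compareIgnoringStop(shorter, longer));
        // Including the stop orders the shorter interval first.
        System.out.println("with stop: " + compareWithStop(shorter, longer));
    }
}
```

An ordering that returns 0 for objects that equals() treats as different can silently drop or merge entries in sorted collections such as TreeMap or TreeSet. That is consistent with the commit's note that other code had to be updated once the comparison changed.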

10437 Commits (cf91d894e4c17d9a7af17abc1bdadecf3443e5bf)