gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	c1ac0e2760	BCF2 cleanup -- allowMissingVCFHeaders is now part of -U argument. If you want specifically unsafe VCF processing you need -U LENIENT_VCF_PROCESSING. Updated lots of files to use this -- LENIENT_VCF_PROCESSING disables on the fly VCF header cleanup. This is now implemented via a member variable, not a class variable, which I believe was changing the GATK behavior during integration tests, causing some files to fail that pass when run as a single test because the header reading behavior was changing depending on previous failures.	2012-06-26 15:28:33 -04:00
Eric Banks	62cee2fb5b	Feature request from Tim that could be useful to all: there's now an --interval_padding argument that specifies how many basepairs to add to each of the intervals provided with -L (on both ends). This is particularly useful when trying to run over the exome plus flanks and don't want to have to pre-compute the flanks (just use e.g. --interval_padding 50). Added integration test to cover this feature.	2012-06-18 21:36:27 -04:00
Mark DePristo	43ad890fcc	Finalizing BCF2 v2 -- FastGenotypes are the default in the engine. Use --useSlowGenotypes engine argument to return to old representation -- Cleanup of BCF2Codec. Good error handling. Added contracts and docs. -- Added a few more contracts and docs to BCF2Decoder -- Optimized encodePrimitive in BCF2Encoder -- Removed genotype filter field exceptions -- Docs and cleanup of BCF2GenotypeFieldDecoders -- Deleted unused BCF2TestWalker -- Docs and cleanup of BCF2Types -- Faster version of decodeInts in VCFCodec -- BCF2Writer -- Support for writing a sites only file -- Lots of TODOs for future optimizations -- Removed lack of filter field support -- No longer uses the alleleMap from VCFWriter, which was an Allele -> String, now uses Allele -> Integer which is faster and more natural -- Lots of docs and contracts -- Docs for GenotypeBuilder. More filter creation routines (unfiltered, for example) -- More extensive tests in VariantContextTestProfiler, including variable length strings in genotypes and genotype filters. Better genotype comparisons	2012-06-14 16:42:29 -04:00
Mark DePristo	d37a8a0bc8	Efficient Genotype object Intermediate commit -- Created a new Genotype interface with a more limited set of operations -- Old genotype object is now SlowGenotype. New genotype object is FastGenotype. They can be used interchangeably -- There's no way to create Genotypes directly any longer. You have to use GenotypeBuilder just like VariantContextBuilder -- Modified lots and lots of code to use GenotypeBuilder -- Added a temporary hidden argument to engine to use FastGenotype by default. Current default is SlowGenotype -- Lots of bug fixes to BCF2 codec and encoder. -- Feature additions -- Now properly handles BCF2 -> BCF2 without decoding or encoding from scratch the BCF2 genotype bytes -- Cleaned up semantics of subContextFromSamples. There's one function that either rederives or not the alleles from the subsetted genotypes -- MASSIVE BUGFIX in SelectVariants. The code has been decoding genotypes always, even if you were not subsetting down samples. Fixed!	2012-06-14 16:42:24 -04:00
Mark DePristo	6ca71fe3b4	GATK tests use public/testdata not /humgen/ as much as possible	2012-05-24 10:58:58 -04:00
Mauricio Carneiro	87e6bea6c1	Adding engine capability to quantize qualities. * Added parameter -qq to quantize qualities using a recalibration report * Added options to quantize using the recalibration report quantization levels, new nLevels and no quantization. * Updated BQSR scripts to make use of the new parameters	2012-04-08 21:07:51 -04:00
Eric Banks	99d27ddcc4	Had some free time, so I unplugged extended events from the walkers. Now they exist only in LocusIteratorByState, but ReadProperties.generateExtendedEvents() always returns false so that block is never actually executed anymore. I don't want to touch LIBS because I think David is in there right now.	2012-04-02 14:27:36 -04:00
Eric Banks	5e79046c98	Minor change but I realized from Mark's commit that the code I stole it from was flawed	2012-03-20 08:55:56 -04:00
Ryan Poplin	41ffd08d53	On the fly base quality score recalibration now happens up front in a SAMIterator on input instead of in a lazy-loading fashion if the BQSR table is provided as an engine argument. On the fly recalibration is now completely hooked up and live.	2012-02-13 12:35:09 -05:00
Ryan Poplin	b7ffd144e8	Cleaning up the covariate classes and removing unused code from the bqsr optimizations in 2009.	2012-02-06 08:54:42 -05:00
Ryan Poplin	5343f8ba67	Initial version of on-the-fly, lazy loading base quality score recalibration. It isn't completely hooked up yet but I'm committing so Mauricio and Mark can see how I envision it will fit together. Look it over and give any feedback. With the exception of the Solid specific code we are very very close to being able to remove TableRecalibrationWalker from the code base and just replace it with PrintReads -BQSR recal.csv	2012-02-05 13:09:03 -05:00
Ryan Poplin	ace9333068	Active region walkers can now see the reads in a buffer around their active regions. This buffer size is specified as a walker annotation. Intervals are internally extended by this buffer size so that the extra reads make their way through the traversal engine but the walker author only needs to see the original interval. Also, several corner case bug fixes in active region traversal.	2012-01-19 22:05:08 -05:00
Ryan Poplin	a6886a4cc0	Initial commit of the Active Region Traversal. Not ready to be used by anyone yet.	2012-01-04 17:03:21 -05:00
Eric Banks	8762313a0d	Better TODO message	2011-12-22 20:54:35 -05:00
Eric Banks	6d260ec6ae	Start printing traversal stats after 30 seconds. I can't stand waiting 2 minutes.	2011-12-22 15:40:59 -05:00
Matt Hanna	15533e08df	Fixed issue with RODWalker parallelization. Turns out that someone previously upped the declared size of a ROD shard to 100M bases, making each ROD shard larger than the size of chr20. Why didn't we see this in Stable? Because the ShardStrategy/ShardStrategyFactory mechanism was dutifully ignoring the shard size specification. When I rolled the ShardStrategy/ShardStrategyFactory mechanics back into the DataSources as part of the async I/O project, I inadvertently reenabled this specifier.	2011-12-07 11:55:42 -05:00
Matt Hanna	b65db6a854	First draft of a test script for I/O performance with the new asynchronous I/O processing. Also includes convenience parameters for specifying the IO/CPU threading balance outside of a tag. Will be killed when Queue gets better support for tagged arguments (hopefully soon).	2011-11-30 13:13:16 -05:00
Matt Hanna	8bb4d4dca3	First pass of the asynchronous block loader. Block loads are only triggered on queue empty at this point. Disabled by default (enable with nt:io=?).	2011-11-18 15:02:59 -05:00
Eric Banks	b66556f4a0	Update error message so that it's clear ReadPair Walkers are exceptions	2011-11-15 09:22:57 -05:00
Eric Banks	0ca7428e76	Allow processing of empty intervals, but warn user when this case is encountered.	2011-10-28 12:12:14 -04:00
Eric Banks	6ba08a103d	Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory.	2011-10-28 09:23:25 -04:00
Eric Banks	ccfd853b34	Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed.	2011-10-27 20:43:50 -04:00
Eric Banks	2f21b6ecfb	Removed debugging output	2011-10-26 15:50:20 -04:00
Eric Banks	b39fcb1bea	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-26 15:44:25 -04:00
Eric Banks	9424e8b2ca	Initial working version of new interval system in which the argument for -L (and -XL) is allowed to be a rod file (e.g. VCF). Old samtools-style intervals still behave as before. BTI is no longer supported. The merging (union or intersection) of intervals is now consistently applied to all -L (or -XL) intervals, which is nice. More testing needed.	2011-10-26 14:11:49 -04:00
Khalid Shakir	89a581a66f	Added ability to specify arguments in files via -args/--arg_file Pushing back downsample and read filter args so they show up in getApproximateCommandLineArgs()	2011-10-24 15:58:34 -04:00
Mark DePristo	c7864c7256	Filter application order is now deterministic, in the order defined by the walker -- For no apparent reason we were using a HashSet to store the ReadFilters, so the order of operations was really arbitrarily applied. The order now is (1) the order of the walker intrinsic filters (2) read group black list (if provided) (3) command line filters (if provided)	2011-10-06 18:51:40 -07:00
Mark DePristo	6a573437af	Details documentation arguments for -ped	2011-10-05 15:00:58 -07:00
Mark DePristo	b20689ff55	No longer supports extraProperties -- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem -- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record. If the two records are inconsistent, an error is thrown -- addSample() in Sample.class now invokes mergeSample() when appropriate -- Validation types are now only STRICT or SILENT -- Validation code implemented in SampleDBBuilder -- Extensive unit tests for SampleDBBuilder	2011-10-03 19:20:33 -07:00
Mark DePristo	2e3dc52088	Minor function renaming	2011-10-03 14:41:13 -07:00
Mark DePristo	89ac50e86e	SampleDataSource -> SampleDB	2011-10-03 09:33:30 -07:00
Mark DePristo	810e8ad011	Removed getXByReaders() function from the engine -- These could be simplified in their downstream uses -- Or they could be replaced with a generic getSAMFileHeaders() function and then apply the getSamples(header) as desired downstream	2011-09-30 10:43:51 -04:00
Mark DePristo	178ba24c27	Move getSamplesForSamFile to SampleUtils -- A nearly identical piece of code already lived in SampleUtils. Now there are two functions, one taking a regular header and another grabbing the merged header from the GATK engine itself. Much cleaner	2011-09-30 10:28:18 -04:00
Mark DePristo	5c9227cf5e	Further cleanup of Sample database -- Removing more and more unnecessary code -- Partial removal of type safe Sample usage. On the road to SampleDB only	2011-09-29 11:50:05 -04:00
Mark DePristo	2a0cd556d3	Further cleanup of Sample -- Cleaned up interface functions in GAE -- Added Walker.getSampleDB() function which is an easier option for tools to get the samples db	2011-09-29 10:34:51 -04:00
Mark DePristo	e76f381628	Moved sample package from DataSources to gatk, and renamed it samples -- All associated changes to the codebase are just header updates	2011-09-29 09:57:15 -04:00
Mark DePristo	b7511c5ff3	Fixed long-standing bug in tribble index creation -- Previously, on the fly indices didn't have dictionary set on the fly, so the GATK would read, add dictionary, and rewrite the index. This is now fixed, so that the on the fly index contains the reference dictionary when first written, avoiding the unnecessary read and write -- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary. This can be used conveniently everywhere, and is what's written into the Tribble index -- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils -- VCFWriter now requires the master sequence dictionary -- Updated walkers that create VCFWriters to provide the master sequence dictionary	2011-09-20 10:53:18 -04:00
Mark DePristo	1eab9be35d	Now with accurate javadoc	2011-08-22 17:25:15 -04:00
Mark DePristo	c797616c65	If you have one sample in your BAM, getToolkit().getSamples().size() == 2 Also deleted double initialization, where a line of code was duplicated in creating the GATK engine.	2011-08-18 11:51:53 -04:00
Mark DePristo	f6563c0f9f	Removed support for RMD in @Requires and @Allows Merge as well Conflicts: private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java	2011-08-03 15:36:55 -04:00
Mark DePristo	f3ad4ec94b	Removed annoying FastaSequenceIndexBuilderProgressListener infrastructure that was just a boolean switch on whether to print progress or not.	2011-07-27 22:06:23 -04:00
Mark DePristo	f3049fba63	refdata directory cleanup Removing unused files RODRecordIterator, ReferenceOrderedData, QueryableTrack, RMDTrackCreationException, GATKFeatureIterator, ReferenceOrderedDataUnitTest Refactored dbSNP and refseq utilities to be closer to the other files implementing these features	2011-07-25 13:21:52 -04:00
Mark DePristo	9992c373be	Optimize imports run on the whole project, public and private. I just got too tired of all of the unused imports floating around. Confirmed that the system builds after the changes.	2011-07-17 20:29:58 -04:00
David Roazen	3c9497788e	Reorganized the codebase beneath top-level public and private directories, removing the playground and oneoffprojects directories in the process. Updated build.xml accordingly.	2011-06-28 06:55:19 -04:00

44 Commits (a5df8f1277d7dc1bc75dfb837f0598a0e0220c34)