gatk-3.8

Commit Graph

Author	SHA1	Message	Date
chartl	b81228fec1	Minor bug fixes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5603 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-08 17:30:40 +00:00
hanna	437db28937	Incorporating Khalid's feedback. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5602 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-08 16:22:49 +00:00
chartl	cc58e19621	This is now running. Expect results in a few weeks when the ~7k jobs have percolated through the week queue. Pray gsa1 doesn't go down. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5593 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-07 21:12:59 +00:00
chartl	6a26957b65	Bug squashing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5592 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-07 20:11:28 +00:00
chartl	a1b7d28375	Initial VQSR full search script git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5591 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-07 20:03:48 +00:00
rpoplin	febb883511	updates to MDCP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5586 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-06 19:44:46 +00:00
hanna	798fb6a7a2	First draft of a script to measure performance of read walkers when merging dynamically. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5570 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-04 15:35:14 +00:00
carneiro	b722ebf244	quick help/comments updates to match the wikipage. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5569 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-04 12:55:55 +00:00
depristo	349661b958	Renamed StratifyAlignmentContext to AlignmentContextUtils, and StatiefyContextType to ReadOrientation. Also, went through the system and deleted all references to second bases. That ship passed long ago. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5563 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-03 15:35:09 +00:00
rpoplin	40a25af58e	Bug fixes in MDCP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5561 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-03 00:04:38 +00:00
depristo	f2c4356a40	Minor usability improvements to the standard eval script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5551 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-01 17:36:50 +00:00
carneiro	0a772688fe	implementation of the Gatherer class for CountCovariates, which makes it now scatter/gatherable. Kudos to the @Gather annotation Khalid just introduced! QuickCCTest is my test script for the gatherer. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5547 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-31 21:15:21 +00:00
carneiro	20344a27b4	Quick updates to the data processing pipeline after successfully cleaning the papuans. It now scatter gathers everything and runs in the hour queue for low pass data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5546 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-31 21:13:33 +00:00
carneiro	5d26c66769	Count Covariates is almost scatter-gatherable now! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5537 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-29 22:25:33 +00:00
rpoplin	5ddc0e464a	Under guidance from Matt added ability to use key-value tags with ROD binding command line arguments, so now one can say -B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf and get the tags in a walker. Look at ContrastiveRecalibrator for an example of how to use the new ReferenceOrderedDataSource.getTags(). Removed references to FDR in tranches since we are only using truth sensitivity. Finally fixed long standing bug where tranche filters weren't set appropriately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5536 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-29 21:04:09 +00:00
carneiro	c3f70cc5cb	DPP: Updated after some tests with BWA. Still needs more testing. MDP: Removed ApplyVariantCut as it's no longer necessary with VQSR2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5534 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-29 18:22:09 +00:00
carneiro	ccdc021207	Added BWA (option) to the data processing pipeline. Lots of testing still happening... little fix to the calling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5528 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-28 20:17:57 +00:00
depristo	cdb0bde952	Bringing script up to date git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5526 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-27 20:49:07 +00:00
depristo	bae0b6cba8	A script for playing with BEAGLE refinement parameters. Supports construction of reference panels from NGS data sets with varying niteration and calibration curve parameters, as well as imputing missing genotypes in a VCF with this reference panel, and comparison to a deeply sequenced individual. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5523 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-27 12:44:25 +00:00
chartl	fe7f45ee2e	First pass at recalibrating associations, with optional data whitening. Modification to the TableCodec so it can natively read bedgraph files (just needed to add an extra header marker: "track"). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5515 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-25 19:35:39 +00:00
kshakir	e47513f043	Minor updates to match the wiki documentation. Upper cased the PartitionType enum values. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5506 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 20:22:23 +00:00
carneiro	1281c842ad	quick updates to conform with the new picard bam function structure git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5505 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 16:58:37 +00:00
kshakir	f3e94ef2be	Walkers can now specify a class extending from Gatherer to merge custom output formats. Add @Gather(MyGatherer.class) to the walker @Output. JavaCommandLineFunctions can now specify the classpath+mainclass as an alternative to specifying a path to an executable jar. JCLF by default pass on the current classpath and only require the mainclass be specified by the developer extending the JCLF, relieving the QScript author from having to explicitly specify the jar. Like the Picard MergeSamFiles, GATK engine by default is now run from the current classpath. The GATK can still be overridden via .jarFile or .javaClasspath. Walkers from the GATK package are now also embedded into the Queue package. Updated AnalyzeCovariates to make it easier to guess the main class, AnalyzeCovariates instead of AnalyzeCovariatesCLP. Removed the GATK jar argument from the example QScripts. Removed one of the most FAQ when getting started with Scala/Queue, the use of Option[_] in QScripts: 1) Fixed mistaken assumption with java enums. In java enums can be null so they don't need nullable wrappers. 2) Added syntactic sugar for Nullable primitives to the QScript trait. Any variable defined as Option[Int] can just be assigned an Int value or None, ex: myFunc.memoryLimit = 3 Removed other unused code. Re-fixed dry run function ordering. Re-ordered the QCommandline companion object so that IntelliJ doesn't complain about missing main methods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5504 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 14:03:51 +00:00
chartl	cd90fdeca1	Right. The issue was not setting the scatter/gather classes appropriately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5501 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-23 20:08:53 +00:00
chartl	3c1bf40a45	QScript for scatter-gathering regional association (not quite as easy as using the built-in extension, due to the multiplexer). Currently does not work due to something I'm missing re: scatter gather class, this commit is an interim one. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5500 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-23 19:42:29 +00:00
carneiro	3414bccb46	documentation changes to agree with the wiki git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5494 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 21:48:49 +00:00
carneiro	28149e5c5e	GenotypeAndValidate version 2, ready to be used. - now it differentiates between confident REF calls and not confident calls. - you can now use a BAM file as the truth set. - output is much clearer now dataProcessingPipeline version 2, ready to be used. - All the processing is now done at the sample level - Reads the input bam file headers to combine all lanes of the same sample. - Cleaning is now scattered/gathered. Intelligently breaks down in as many intervals as possible, given the dataset. - Outputs one processed bam file per sample (and a .list file with all processed files listed) - Much faster, low pass (read Papuans) can run in the hour queue. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5493 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 20:18:02 +00:00
carneiro	748787c509	helper script to the papuan processing... minor updates git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5489 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 14:11:02 +00:00
kshakir	f6d4b0aaf5	Using an embedded version of Picard for merging un-indexed bam files after scatter/gather instead of requiring the QScripts to specify the picard JAR. May do this for the GATK jar too. Fixed initialization of pending counts when using -startFromScratch so the count doesn't start at zero and end at -<#njobs>. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5483 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-21 18:20:01 +00:00
carneiro	1198a90ac7	cosmetic change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5481 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-21 15:46:04 +00:00
carneiro	96628457cb	pacbio calling pipeline also using VQSR2 now, minor updates on the other pipelines to get the papuans through. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5479 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 22:06:52 +00:00
carneiro	4e449905d1	methods development pipeline now sports VQSR2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5478 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 22:00:46 +00:00
carneiro	c9442e4b21	now merging bam files per sample and processing according to cleaning options. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5477 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 21:31:29 +00:00
carneiro	18fac5112c	first step towards the new sample based processing pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5471 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 19:25:15 +00:00
depristo	abc7d1aef9	BeagleOutputToVCF now accepts an option to keep monomorphic sites. This is useful to genotype a single sample, where having AC=0 just means that the sample is hom-ref at the site. ProduceBeagleInputWalker can optionally emit a beagle markers file, necessary to use the beagled reference panel for imputation. Also supports the VQSR calibration curve idea that a site can be flagged as a certain FP, based on the VQSLOD field. This allows us to have both continuous quality in the refinement of sites as well as hard filtering at some threshold so we don't end up with lots of sites with all 1/3 1/3 1/3 likelihoods for all samples (i.e., a definite FP site where we don't know anything about the samples). Added a new VariantsToBeagleUnphased walker that writes out a marker drive hard-call unphased genotypes file suitable for imputing missing genotypes with a reference panel with beagle. Can optionally keep back a fraction of sites, marked as missing in the genotypes file, for assessment of imputation accuracy and power. The bootstrap sites can be written to a separate VCF for assessment as well. Finally, my general Queue script for creating and evaluating reference panels from VCF files. Supports explicitly genotyping a BAM file at each panel SNP site, for assessment of imputation accuracy of a reference panel. Lots of options for exploring the impact of the VQS likelihoods, multiple VCFs for constructing the reference panel, as well as fraction of sites left out in assessing the panel's power. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5467 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 03:08:38 +00:00
carneiro	55e5971b3b	this is a oneoff script to clean the papuans and test TargetCreator and IndelRealigner with scatter gathering. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5457 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-17 17:09:53 +00:00
rpoplin	9c413fbc9e	not useful git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5450 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-15 22:47:55 +00:00
carneiro	42f70d9e07	join all per-lane Bams before doing target realigning and indel cleaning. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5435 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-14 16:11:03 +00:00
depristo	d01d4fdeb5	Optimized version of produce beagle tool, along with experimental (hidden) support for combining likelihoods depending on estimate false positive rate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5430 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-12 02:06:28 +00:00
fromer	0b45de14ed	Some minor updates to fully utilize the functionality of reduceByInterval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5411 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-09 20:38:08 +00:00
depristo	bf2e02f472	Generic, easy-to-use variant evaluation Queue script that tests indel and SNP call sets against standard evaluation data sets for sensitivity and specificity git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5391 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 18:03:29 +00:00
depristo	5c979633f0	Due to a problem in the way that dynamic type selection works, I've added an explicit (temporary) ability to restrict VE to specific variant types (SNPs, INDELs, etc), so that calculations will work when a site has a SNP in dbSNP but is called as an indel, causing the SNP site to mysteriously disappear from the comp track, a huge problem for validation report. VEU updated to allow both dynamic type (old) and just returning everything in the track. Also, created a standard Queue script that calculates a suite of standard indel and SNP assessment results. Will be the basis for a general evaluation Queue script with standardized data files for SNPs and Indels. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5385 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-06 19:31:12 +00:00
chartl	a40a8006b5	Added in unit tests for the statistics calculated by the test runner; and bug-fixes to the calculations; so we have some assurance that the statistics coming out the back-end are correct. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5380 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-06 16:54:02 +00:00
chartl	9ca1dd5d62	Miscellaneous changes: - RefMetaDataTracker: grabbing variant contexts given a prefix (not sure where else this was implemented, if someone can show me I'll remove it) - VCFUtils: grabbing VCF headers given a prefix - MathUtils: Useful functions for calculating statistics on collections of Numbers - VariantAnnotator: Made isUniqueHeaderLine a public static method -- maybe this should go into a different class. Not sure. - Associations: PluginManager now used to propagate classes, implementations for Z,T,U tests, slight alteration to format to make the objects stored in the window optionally different from those returned by whatever statistic is run across the window Added: - MannWhitneyU. Started to fix up WilcoxonRankSum but there are comments in there questioning the validity of some of the code, and I'm sure that it's actually doing a U test. This implementation includes the direct calculation of p-values for small sample sizes, and a uniform approximation for when one of the sample sets is small, and the other large. Unit tests to follow. - BootstrapCallsMerger: takes n VCFs which have been called on the same samples; merges them together while averaging the annotations - BootstrapCalls.q: qscript for testing the effectiveness of bootstrap low-pass calling on the exome git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5372 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-03 22:43:36 +00:00
carneiro	0daa65b9ef	quick and dirty 'close your eyes' solution to run the papuans over the weekend. Will be properly fixed soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5370 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-03 21:42:22 +00:00
carneiro	8ab6eee1cf	gold standard creates its own tranches and vcf files now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5347 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-01 17:48:40 +00:00
chartl	0723b0f44c	Generalized association is now working. Output is in a horrific format. Implementation of T-testing. Improvements are to look for classes dynamically (a la VariantEval/VariantAnnotator), beautify output, and do optimizations where they exist. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5341 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-01 01:23:37 +00:00
rpoplin	ce34a8a918	New hidden option in VQSR to not parse the genotypes of the incoming training data. Updated VQSR training in methods development pipeline to be more in line with best practices. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5340 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-28 23:19:51 +00:00
carneiro	c7a51f0de7	fixed 1kg pilot dindel calls vcf file and combined all vcfs into one master dindel file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5335 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-28 19:04:58 +00:00
depristo	146756de79	Class name to reflect actual file name. manySampleUGPerformance now operates on 1000 samples! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5326 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-26 23:36:04 +00:00


262 Commits (80d547ae71627e2f292a1a9c3d2b70f8e7efd76a)