gatk-3.8

Commit Graph

Author	SHA1	Message	Date
carneiro	2524216d4b	Added the R script for VQSR git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5898 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-27 21:56:56 +00:00
carneiro	3a2e32eef3	wex is wex, wgs is wgs.... i think i got it right this time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5828 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-20 16:44:25 +00:00
carneiro	76c87c9f1d	trio WGS was creating trio WEX filenames. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5822 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 17:45:45 +00:00
carneiro	ebcd333ed8	Quick small updates: SelectVariants: typo MethodsDevelopmentPipeline: Added CEU Trio WGS dataset git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5818 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 20:08:39 +00:00
rpoplin	4bbce42861	Renaming ContrastiveRecalibrator --> VariantRecalibrator in preparation for move to core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5733 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 18:12:47 +00:00
rpoplin	3224bbe750	New visualization output for VQSR. It creates the R script file on the fly and then runs Rscript on it. Adding 1000G Project consensus code. First pass of having VQSR work with missing data by marginalizing over the missing dimension for that data point (thanks Chris and Bob for ideas). Updated math functions to use apache math commons instead of approximations from wikipedia. New parameters available for the priors based on further reading in Bishop and looking at the new visualizations. Updated integration test to use more modern files. Updated MDCP to use new best practices w.r.t. annotations. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5723 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 19:14:42 +00:00
rpoplin	05ad6ecf72	bug fix in MDCP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5613 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-11 18:27:47 +00:00
rpoplin	febb883511	updates to MDCP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5586 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-06 19:44:46 +00:00
rpoplin	40a25af58e	Bug fixes in MDCP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5561 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-03 00:04:38 +00:00
rpoplin	5ddc0e464a	Under guidance from Matt added ability to use key-value tags with ROD binding command line arguments, so now one can say -B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf and get the tags in a walker. Look at ContrastiveRecalibrator for an example of how to use the new ReferenceOrderedDataSource.getTags(). Removed references to FDR in tranches since we are only using truth sensitivity. Finally fixed long standing bug where tranche filters weren't set appropriately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5536 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-29 21:04:09 +00:00
carneiro	c3f70cc5cb	DPP: Updated after some tests with BWA. Still needs more testing. MDP: Removed ApplyVariantCut as it's no longer necessary with VQSR2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5534 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-29 18:22:09 +00:00
carneiro	ccdc021207	Added BWA (option) to the data processing pipeline. Lots of testing still happening... little fix to the calling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5528 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-28 20:17:57 +00:00
kshakir	f3e94ef2be	Walkers can now specify a class extending from Gatherer to merge custom output formats. Add @Gather(MyGatherer.class) to the walker @Output. JavaCommandLineFunctions can now specify the classpath+mainclass as an alternative to specifying a path to an executable jar. JCLF by default pass on the current classpath and only require the mainclass be specified by the developer extending the JCLF, relieving the QScript author from having to explicitly specify the jar. Like the Picard MergeSamFiles, GATK engine by default is now run from the current classpath. The GATK can still be overridden via .jarFile or .javaClasspath. Walkers from the GATK package are now also embedded into the Queue package. Updated AnalyzeCovariates to make it easier to guess the main class, AnalyzeCovariates instead of AnalyzeCovariatesCLP. Removed the GATK jar argument from the example QScripts. Removed one of the most FAQ when getting started with Scala/Queue, the use of Option[_] in QScripts: 1) Fixed mistaken assumption with java enums. In java enums can be null so they don't need nullable wrappers. 2) Added syntactic sugar for Nullable primitives to the QScript trait. Any variable defined as Option[Int] can just be assigned an Int value or None, ex: myFunc.memoryLimit = 3 Removed other unused code. Re-fixed dry run function ordering. Re-ordered the QCommandline companion object so that IntelliJ doesn't complain about missing main methods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5504 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 14:03:51 +00:00
carneiro	1198a90ac7	cosmetic change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5481 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-21 15:46:04 +00:00
carneiro	4e449905d1	methods development pipeline now sports VQSR2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5478 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 22:00:46 +00:00
carneiro	8ab6eee1cf	gold standard creates its own tranches and vcf files now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5347 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-01 17:48:40 +00:00
rpoplin	ce34a8a918	New hidden option in VQSR to not parse the genotypes of the incoming training data. Updated VQSR training in methods development pipeline to be more in line with best practices. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5340 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-28 23:19:51 +00:00
carneiro	fd5d1f9cfc	minor cosmetic changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5322 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-25 21:56:35 +00:00
carneiro	81414a21dd	dpp: back to using 4gb memory assuming all is right with IndelRealigner now. mdcp: Some class structural changes due to the inclusion of indel calls. ApplyCut now chooses the tranche differently for each dataset. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5319 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-25 19:21:02 +00:00
carneiro	6db3210387	the data processing pipeline needs more memory... directory updates in the methods pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5305 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-24 17:22:58 +00:00
carneiro	897a333aba	Methods Development Pipeline now has the option of calling indels with the -indels parameter. Also updated some databases and the new NA12878 HiSeq hg19 that Tim just funneled to us, is updated and called. Small fixes on the data processing pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5304 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-24 17:12:55 +00:00
rpoplin	255cc246a2	Change in Methods development pipeline: dbsnp130 can't be used for anything, changed it to dbsnp129. Optimization for HaplotypeScore and the to-be-committed ReadRosRankSumTest in AlignmentUtils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5301 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-24 16:09:03 +00:00
carneiro	ecfb51bcd8	Few organizational changes, queue output is now categorized and hidden. Also changed NA12878.Wex to dbsnp 129. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5290 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-22 22:49:38 +00:00
carneiro	8ea71fd294	minor dataset chages. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5289 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-22 20:18:10 +00:00
carneiro	c61dd2f09f	data processing pipeline now has on the fly bam indexing (powered by Matt) some new parameters, Indel Cleaning with constrain movement and fixMates is gone. setting up methods development pipeline for some cosmetic changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5277 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-18 23:13:54 +00:00
carneiro	50c870cfce	Data Processing Pipeline: local indel realignment, mark duplicates and BQSR. Done. Pacbio pipeline: now all pacbio bams have baq annotated in so running UG is uber fast. Methods pipeline: minor cosmetic changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5253 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-16 17:22:30 +00:00
carneiro	87e19a17ae	small updates to the variant eval part of the pipeline, some updates to the pacbio specific pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5244 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 16:19:07 +00:00
carneiro	c630701a76	Following Ryan's suggestion, I am moving the Methods Development Calling pipeline to the Core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5226 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-10 17:36:05 +00:00

28 Commits (0d07c979e936c8f5a276e812cdbcae9278d2d635)
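
Note on commit 5ddc0e464a: the message describes key-value tags on ROD binding arguments, with the example `-B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf`. The sketch below shows one way such an argument string could be split into its parts. It is not GATK's parser. The names `RodBinding` and `parse_rod_binding` are hypothetical, and reading the first two comma-separated fields as a binding name and a file type is an assumption drawn only from the example in the commit message. Tag values are kept as plain strings, since the message does not say how they are typed.

```python
from dataclasses import dataclass, field


@dataclass
class RodBinding:
    """Hypothetical container for one parsed -B argument (not a GATK class)."""
    name: str
    rod_type: str
    tags: dict = field(default_factory=dict)


def parse_rod_binding(arg: str) -> RodBinding:
    """Split a string such as '-B:hapmap,VCF,known=false' into its parts.

    Assumption: the first two comma-separated fields are the binding name
    and the file type, and every later field is a key=value tag.
    """
    prefix = "-B:"
    if not arg.startswith(prefix):
        raise ValueError(f"expected argument to start with {prefix!r}: {arg!r}")

    parts = arg[len(prefix):].split(",")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected at least name,type after {prefix!r}: {arg!r}")

    name, rod_type, rest = parts[0], parts[1], parts[2:]

    tags = {}
    for item in rest:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value tag, got {item!r}")
        tags[key] = value  # values stay as strings; typing them is left to the caller

    return RodBinding(name=name, rod_type=rod_type, tags=tags)


if __name__ == "__main__":
    binding = parse_rod_binding(
        "-B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0"
    )
    print(binding.name, binding.rod_type, binding.tags)
```

The file path (`hapmap.vcf` in the commit message) is a separate command-line token and is deliberately left out of the parsed string here.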