gatk-3.8

Commit Graph

Author	SHA1	Message	Date
rpoplin	40a25af58e	Bug fixes in MDCP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5561 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-03 00:04:38 +00:00
depristo	f2c4356a40	Minor usability improvements to the standard eval script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5551 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-01 17:36:50 +00:00
carneiro	0a772688fe	implementation of the Gatherer class for CountCovariates, which makes it now scatter/gatherable. Kudos to the @Gather annotation Khalid just introduced! QuickCCTest is my test script for the gatherer. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5547 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-31 21:15:21 +00:00
carneiro	20344a27b4	Quick updates to the data processing pipeline after successfully cleaning the papuans. It now scatter gathers everything and runs in the hour queue for low pass data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5546 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-31 21:13:33 +00:00
carneiro	5d26c66769	Count Covariates is almost scatter-gatherable now! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5537 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-29 22:25:33 +00:00
rpoplin	5ddc0e464a	Under guidance from Matt added ability to use key-value tags with ROD binding command line arguments, so now one can say -B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf and get the tags in a walker. Look at ContrastiveRecalibrator for an example of how to use the new ReferenceOrderedDataSource.getTags(). Removed references to FDR in tranches since we are only using truth sensitivity. Finally fixed long standing bug where tranche filters weren't set appropriately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5536 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-29 21:04:09 +00:00
carneiro	c3f70cc5cb	DPP: Updated after some tests with BWA. Still needs more testing. MDP: Removed ApplyVariantCut as it's no longer necessary with VQSR2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5534 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-29 18:22:09 +00:00
carneiro	ccdc021207	Added BWA (option) to the data processing pipeline. Lots of testing still happening... little fix to the calling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5528 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-28 20:17:57 +00:00
depristo	cdb0bde952	Bringing script up to date git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5526 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-27 20:49:07 +00:00
depristo	bae0b6cba8	A script for playing with BEAGLE refinement parameters. Supports construction of reference panels from NGS data sets with varying niteration and calibration curve parameters, as well as imputing missing genotypes in a VCF with this reference panel, and comparison to a deeply sequenced individual. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5523 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-27 12:44:25 +00:00
chartl	fe7f45ee2e	First pass at recalibrating associations, with optional data whitening. Modification to the TableCodec so it can natively read bedgraph files (just needed to add an extra header marker: "track"). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5515 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-25 19:35:39 +00:00
kshakir	e47513f043	Minor updates to match the wiki documentation. Upper cased the PartitionType enum values. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5506 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 20:22:23 +00:00
carneiro	1281c842ad	quick updates to conform with the new picard bam function structure git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5505 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 16:58:37 +00:00
kshakir	f3e94ef2be	Walkers can now specify a class extending from Gatherer to merge custom output formats. Add @Gather(MyGatherer.class) to the walker @Output. JavaCommandLineFunctions can now specify the classpath+mainclass as an alternative to specifying a path to an executable jar. JCLF by default pass on the current classpath and only require the mainclass be specified by the developer extending the JCLF, relieving the QScript author from having to explicitly specify the jar. Like the Picard MergeSamFiles, GATK engine by default is now run from the current classpath. The GATK can still be overridden via .jarFile or .javaClasspath. Walkers from the GATK package are now also embedded into the Queue package. Updated AnalyzeCovariates to make it easier to guess the main class, AnalyzeCovariates instead of AnalyzeCovariatesCLP. Removed the GATK jar argument from the example QScripts. Removed one of the most FAQ when getting started with Scala/Queue, the use of Option[_] in QScripts: 1) Fixed mistaken assumption with java enums. In java enums can be null so they don't need nullable wrappers. 2) Added syntactic sugar for Nullable primitives to the QScript trait. Any variable defined as Option[Int] can just be assigned an Int value or None, ex: myFunc.memoryLimit = 3 Removed other unused code. Re-fixed dry run function ordering. Re-ordered the QCommandline companion object so that IntelliJ doesn't complain about missing main methods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5504 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 14:03:51 +00:00
chartl	cd90fdeca1	Right. The issue was not setting the scatter/gather classes appropriately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5501 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-23 20:08:53 +00:00
chartl	3c1bf40a45	QScript for scatter-gathering regional association (not quite as easy as using the built-in extension, due to the multiplexer). Currently does not work due to something I'm missing re: scatter gather class, this commit is an interim one. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5500 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-23 19:42:29 +00:00
carneiro	3414bccb46	documentation changes to agree with the wiki git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5494 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 21:48:49 +00:00
carneiro	28149e5c5e	GenotypeAndValidate version 2, ready to be used. - now it differentiates between confident REF calls and not confident calls. - you can now use a BAM file as the truth set. - output is much clearer now dataProcessingPipeline version 2, ready to be used. - All the processing is now done at the sample level - Reads the input bam file headers to combine all lanes of the same sample. - Cleaning is now scattered/gathered. Intelligently breaks down in as many intervals as possible, given the dataset. - Outputs one processed bam file per sample (and a .list file with all processed files listed) - Much faster, low pass (read Papuans) can run in the hour queue. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5493 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 20:18:02 +00:00
carneiro	748787c509	helper script to the papuan processing... minor updates git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5489 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 14:11:02 +00:00
kshakir	f6d4b0aaf5	Using an embedded version of Picard for merging un-indexed bam files after scatter/gather instead of requiring the QScripts to specify the picard JAR. May do this for the GATK jar too. Fixed initialization of pending counts when using -startFromScratch so the count doesn't start at zero and end at -<#njobs>. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5483 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-21 18:20:01 +00:00
carneiro	1198a90ac7	cosmetic change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5481 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-21 15:46:04 +00:00
carneiro	96628457cb	pacbio calling pipeline also using VQSR2 now, minor updates on the other pipelines to get the papuans through. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5479 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 22:06:52 +00:00
carneiro	4e449905d1	methods development pipeline now sports VQSR2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5478 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 22:00:46 +00:00
carneiro	c9442e4b21	now merging bam files per sample and processing according to cleaning options. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5477 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 21:31:29 +00:00
carneiro	18fac5112c	first step towards the new sample based processing pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5471 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 19:25:15 +00:00
depristo	abc7d1aef9	BeagleOutputToVCF now accepts an option to keep monomorphic sites. This is useful to genotype a single sample, where having AC=0 just means that the sample is hom-ref at the site. ProduceBeagleInputWalker can optionally emit a beagle markers file, necessary to use the beagled reference panel for imputation. Also supports the VQSR calibration curve idea that a site can be flagged as a certain FP, based on the VQSLOD field. This allows us to have both continuous quality in the refinement of sites as well as hard filtering at some threshold so we don't end up with lots of sites with all 1/3 1/3 1/3 likelihoods for all samples (i.e., a definite FP site where we don't know anything about the samples). Added a new VariantsToBeagleUnphased walker that writes out a marker drive hard-call unphased genotypes file suitable for imputing missing genotypes with a reference panel with beagle. Can optionally keep back a fraction of sites, marked as missing in the genotypes file, for assessment of imputation accuracy and power. The bootstrap sites can be written to a separate VCF for assessment as well. Finally, my general Queue script for creating and evaluating reference panels from VCF files. Supports explicitly genotyping a BAM file at each panel SNP site, for assessment of imputation accuracy of a reference panel. Lots of options for exploring the impact of the VQS likelihoods, multiple VCFs for constructing the reference panel, as well as fraction of sites left out in assessing the panel's power. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5467 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 03:08:38 +00:00
carneiro	55e5971b3b	this is a oneoff script to clean the papuans and test TargetCreator and IndelRealigner with scatter gathering. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5457 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-17 17:09:53 +00:00
rpoplin	9c413fbc9e	not useful git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5450 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-15 22:47:55 +00:00
carneiro	42f70d9e07	join all per-lane Bams before doing target realigning and indel cleaning. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5435 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-14 16:11:03 +00:00
depristo	d01d4fdeb5	Optimized version of produce beagle tool, along with experimental (hidden) support for combining likelihoods depending on estimated false positive rate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5430 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-12 02:06:28 +00:00
fromer	0b45de14ed	Some minor updates to fully utilize the functionality of reduceByInterval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5411 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-09 20:38:08 +00:00
depristo	bf2e02f472	Generic, easy-to-use variant evaluation Queue script that tests indel and SNP call sets against standard evaluation data sets for sensitivity and specificity git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5391 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 18:03:29 +00:00
depristo	5c979633f0	Due to a problem in the way that dynamic type selection works, I've added an explicit (temporary) ability to restrict VE to specific variant types (SNPs, INDELs, etc), so that calculations will work when a site has a SNP in dbSNP but is called as an indel, causing the SNP site to mysteriously disappear from the comp track, a huge problem for validation report. VEU updated to allow both dynamic type (old) and just returning everything in the track. Also, created a standard Queue script that calculates a suite of standard indel and SNP assessment results. Will be the basis for a general evaluation Queue script with standardized data files for SNPs and Indels. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5385 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-06 19:31:12 +00:00
chartl	a40a8006b5	Added in unit tests for the statistics calculated by the test runner; and bug-fixes to the calculations; so we have some assurance that the statistics coming out the back-end are correct. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5380 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-06 16:54:02 +00:00
chartl	9ca1dd5d62	Miscellaneous changes: - RefMetaDataTracker: grabbing variant contexts given a prefix (not sure where else this was implemented, if someone can show me I'll remove it) - VCFUtils: grabbing VCF headers given a prefix - MathUtils: Useful functions for calculating statistics on collections of Numbers - VariantAnnotator: Made isUniqueHeaderLine a public static method -- maybe this should go into a different class. Not sure. - Associations: PluginManager now used to propagate classes, implementations for Z,T,U tests, slight alteration to format to make the objects stored in the window optionally different from those returned by whatever statistic is run across the window Added: - MannWhitneyU. Started to fix up WilcoxonRankSum but there are comments in there questioning the validity of some of the code, and I'm sure that it's actually doing a U test. This implementation includes the direct calculation of p-values for small sample sizes, and a uniform approximation for when one of the sample sets is small, and the other large. Unit tests to follow. - BootstrapCallsMerger: takes n VCFs which have been called on the same samples; merges them together while averaging the annotations - BootstrapCalls.q: qscript for testing the effectiveness of bootstrap low-pass calling on the exome git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5372 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-03 22:43:36 +00:00
carneiro	0daa65b9ef	quick and dirty 'close your eyes' solution to run the papuans over the weekend. Will be properly fixed soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5370 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-03 21:42:22 +00:00
carneiro	8ab6eee1cf	gold standard creates its own tranches and vcf files now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5347 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-01 17:48:40 +00:00
chartl	0723b0f44c	Generalized association is now working. Output is in a horrific format. Implementation of T-testing. Improvements are to look for classes dynamically (a la VariantEval/VariantAnnotator), beautify output, and do optimizations where they exist. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5341 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-01 01:23:37 +00:00
rpoplin	ce34a8a918	New hidden option in VQSR to not parse the genotypes of the incoming training data. Updated VQSR training in methods development pipeline to be more in line with best practices. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5340 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-28 23:19:51 +00:00
carneiro	c7a51f0de7	fixed 1kg pilot dindel calls vcf file and combined all vcfs into one master dindel file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5335 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-28 19:04:58 +00:00
depristo	146756de79	Class name to reflect actual file name. manySampleUGPerformance now operates on 1000 samples! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5326 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-26 23:36:04 +00:00
chartl	b089d35b21	Fix expand intervals to do the right thing: - No more duplicate intervals - Truncation at intervals that already exist, e.g. exists: |--------| |-------| new: |---------| fixed: |-----| note that weird instances like: exists: |-| |-| |-| new: |---------------------| fixed: |----| e.g. you're truncated to the nearest interval on whatever side. In general many behaviors could happen in this instance, this is the one currently implemented. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5323 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-26 04:19:01 +00:00
carneiro	fd5d1f9cfc	minor cosmetic changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5322 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-25 21:56:35 +00:00
carneiro	81414a21dd	dpp: back to using 4gb memory assuming all is right with IndelRealigner now. mdcp: Some class structural changes due to the inclusion of indel calls. ApplyCut now chooses the tranche differently for each dataset. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5319 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-25 19:21:02 +00:00
kshakir	3e0a722672	MFCP waits for other pipelines to finish by using the previous log file of one pipeline as virtual input to the next pipeline. Using the name of the yaml in the log file name instead of each writing to "queue.out" so that two yamls can run from the same directory without creating cycles in the graph. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5318 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-25 17:51:01 +00:00
carneiro	6db3210387	the data processing pipeline needs more memory... directory updates in the methods pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5305 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-24 17:22:58 +00:00
carneiro	897a333aba	Methods Development Pipeline now has the option of calling indels with the -indels parameter. Also updated some databases and the new NA12878 HiSeq hg19 that Tim just funneled to us, is updated and called. Small fixes on the data processing pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5304 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-24 17:12:55 +00:00
rpoplin	255cc246a2	Change in Methods development pipeline: dbsnp130 can't be used for anything, changed it to dbsnp129. Optimization for HaplotypeScore and the to-be-committed ReadPosRankSumTest in AlignmentUtils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5301 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-24 16:09:03 +00:00
chartl	97e1a5262e	-ct x no longer includes coverage in the previous bin BatchMerge - additional support for indels (can't just test the alternate allele when it's an extended event, must also specify that you want to use the dindel model when you actually test the allele) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5300 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-24 15:52:04 +00:00
kshakir	f1f9bd6dcc	Due to recent LSF hiccups put a very brief (.5-2min) retry around getting status. Can't wait too long because statuses are archived an hour after exit. TODO: Switch to bulk status checks and add status archive lookups. Sending SIGTERM(15) instead of SIGKILL(9) to allow for graceful termination of child process. Printing out the name of the QScripts in the compile error text. Added a pipelineretry -PR pass through for the MFCP and MFCPTest. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5295 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-23 18:59:08 +00:00
chartl	07d381ec51	BatchMerge now uses the correct UG settings, recently added by Eric ExpandIntervals now checks that identical intervals are not created by (un)fortunately-spaced targets VCFExtractIntervals no longer creates duplicate intervals in the case where a VCF has multiple entries at the same site git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5294 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-23 18:46:15 +00:00
carneiro	2a48ec1307	now only accepts intervals files if the user specifically requests to report bams at interval only. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5291 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-23 16:49:58 +00:00
carneiro	ecfb51bcd8	Few organizational changes, queue output is now categorized and hidden. Also changed NA12878.Wex to dbsnp 129. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5290 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-22 22:49:38 +00:00
carneiro	8ea71fd294	minor dataset changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5289 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-22 20:18:10 +00:00
carneiro	c61dd2f09f	data processing pipeline now has on the fly bam indexing (powered by Matt) some new parameters, Indel Cleaning with constrain movement and fixMates is gone. setting up methods development pipeline for some cosmetic changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5277 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-18 23:13:54 +00:00
depristo	d97ed3e080	Comments for Mauricio git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5275 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-18 16:58:34 +00:00
carneiro	acad3ada06	changed baq to calculate_as_necessary. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5270 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 23:50:46 +00:00
carneiro	7f9ca6b28a	full data processing pipeline, now deleting intermediate files and performing both phases (per lane and combined) of the processing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5269 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 23:34:00 +00:00
kiran	4f83151c4e	Evaluates within standard target and expanded target separately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5268 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 23:04:24 +00:00
kshakir	860b172ef1	Defaulting the MFCP to run without a tear script. Added a missing virtual output for the inner FCP, so that Queue can tell a run of the FCP is dot-done. Enabled the MFCPTest for the first time, running without the tear script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5264 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 21:13:14 +00:00
kshakir	a189454343	FCP only adds the expand intervals QFunction once per script instead of once per QFunction using the ExpandTargets scala trait. Eval dbSNP's type now based on eval dbSNP instead of genotype dbSNP. Using an external treemap instead of the JGraphT internal node set to speed up larger graph generation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5261 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 19:09:03 +00:00
carneiro	497e9ab83b	too hasty... cleaning up debug messages ;) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5257 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 02:11:03 +00:00
carneiro	b4da843c49	now processes either a single bam file or a list of bam files in parallel. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5256 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 02:07:22 +00:00
carneiro	50c870cfce	Data Processing Pipeline: local indel realignment, mark duplicates and BQSR. Done. Pacbio pipeline: now all pacbio bams have baq annotated in so running UG is uber fast. Methods pipeline: minor cosmetic changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5253 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-16 17:22:30 +00:00
kiran	c0a4af3809	Expands targets by 50-bp on both sides when the expandIntervals argument is greater than 0. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5251 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-16 14:47:52 +00:00
carneiro	6d3b878dde	data processing pipeline script already does: . Local Indel Realignment . Mark Duplicates will do: . Base Quality Score Recalibration (soon) it's working with a single BAM for testing, but will work with a list of bam files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5250 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 21:49:05 +00:00
corin	d2efea6003	This is a draft of the improved and prettified pipeline. It may not yet compile, but Kiran is taking over adding a few more things as I finish up other tasks. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5248 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 19:35:00 +00:00
kshakir	d185c2961f	Added pipeline for calling FCP in batches called MultiFullCallingPipeline. Bug smashes for the MFCP: Synchronized access to LSF library and modifications to the QGraph. If values are missing from the graph with -run make sure to exit with a non-zero. Refactored QGraph to pre-generate a unique Int for each QNode speeding up getHashCode/equals inside the graph. Added jobPriority and removed jobLimitSeconds from QFunction. All scatter gather is by default in a single sub directory queueScatterGather. Moved some FCPTest into BaseTest/PipelineTest for use by MFCPTest. Rev'ed the 1000G bams used for validation from v1 to v2 and added code to look for the bams before running other tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5247 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 18:26:14 +00:00
carneiro	87e19a17ae	small updates to the variant eval part of the pipeline, some updates to the pacbio specific pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5244 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 16:19:07 +00:00
chartl	851b3e71f9	Major revision of the batch merge script. All sites are now used, hooks for some UG settings, no longer reliant on the pipeline management library (pipeline libs are probably going to go away -- nobody uses them) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5241 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-14 23:52:05 +00:00
fromer	d6e3f2eba6	Added GC content calculator for CNV data git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5240 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-14 22:29:55 +00:00
carneiro	5f10fffa47	merge intervals now prints a sorted list in the end. added the ccs datasets to the pbCalling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5233 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-11 20:57:59 +00:00
carneiro	50c2fa3c3a	this -1 made ALL the difference in the world. Minor bug fix. Regular updates to the pbCalling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5232 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-11 19:25:09 +00:00
fromer	cdf53188d6	Updated DoC to work with scatter-gather; and, also manually implemented scatter-gather by sample above the scatter-gather by interval. Thanks to Khalid for his support! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5231 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-11 19:14:42 +00:00
carneiro	c630701a76	Following Ryan's suggestion, I am moving the Methods Development Calling pipeline to the Core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5226 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-10 17:36:05 +00:00
carneiro	9c2c5efe35	a modified version of the Methods Development calling pipeline made to work with pacbio data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5225 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-10 16:06:50 +00:00
fromer	947cc44854	Thanks to Matt for walking me through a proper version of VCF_BAM_utilities! Feel free to add to it, or use it to get the samples in a VCF file, a BAM file, or a collection of BAM files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5223 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-09 18:08:27 +00:00
kshakir	4d1cca95bb	Removed deprecated getDbsnpFile. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-08 21:12:15 +00:00
carneiro	e5cfc6ae74	NA12878 hg19 dataset was included to the methods pipeline. (and I am running it) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5217 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-08 16:17:46 +00:00
fromer	8d0f1b75d5	Added queue/util/BAMutilities Object [with BAM and VCF parsing utilities], which is now used by my qscripts that robustly split runs by sample git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5214 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-07 22:17:29 +00:00
kshakir	8040998c15	Renamed the pipeline yaml dbsnpFile to genotypeDbsnp, and added an evalDbsnp. Added a genotypeDbsnpType and evalDbsnpType to check the extensions for .vcf or .rod. Moved renaming of "recalibrated" bams to "cleaned" from sed to yaml generation template (see diff for more info). Renamed fCP.q to FCP.q. Though it's still disabled until VariantEval is updated, added changes above to the FCPTest. Removed refseq table from the queue.sh wrapper script. Only specified in the yaml. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5213 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-07 22:01:09 +00:00
fromer	3c1a026c94	Updated script to properly bin DoC values so that down-sampling corresponds to range of DoC values obtainable git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5208 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-07 16:47:55 +00:00
depristo	c4707631e2	MethodsDevelopmentPipeline is now the test bed for large scale AWS_S3 logging. Can be disabled from command line if this is necessary git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5203 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-06 17:03:45 +00:00
fromer	8b8b4fced1	Removed explicit memoryLimit, so that memLimit given on the command-line will NOT be ignored... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5202 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-06 01:55:17 +00:00
depristo	fe4aa58d35	Removing unused class git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5197 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 22:22:28 +00:00
fromer	4cdc974c5f	Preliminary Qscript to run DoC for the purpose of CNV detection git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5194 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 21:25:59 +00:00
corin	cd6ace1b47	Includes UG version of indel genotyping rather than IGV2 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5191 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 20:25:46 +00:00
carneiro	358a400474	made ApplyVariantCut a default part of the pipeline, added the -noCut option if you don't want to use it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5189 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 19:29:36 +00:00
carneiro	7af003666d	added optional argument -cut to apply the variant cut to the ts recalibrated vcf. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5183 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 17:34:40 +00:00
chartl	5398cf620a	Bug fixes in the in process function (spoiled by python: was not closing my writers). SortByRef now works somewhat like the perl script does, rather than doing a memory-expensive sort. Adding a QTools qscript which is kinda clunky, and will be used mostly for integration tests of these IPFs, pending some better way to construct argument collections and function accessors at compile-time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5182 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 17:32:46 +00:00
carneiro	cf15819db5	updated to work with the new VariantEval. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5176 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 17:46:07 +00:00
rpoplin	47357b726e	Fixing import GenotypeCalculationModel since it doesn't exist anymore. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5175 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 15:39:43 +00:00
fromer	7605f0e6c1	Corrected input/output definitions for Queue git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5173 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 07:39:00 +00:00
fromer	3839fd1a25	Updated phasing pipeline to properly read samples from VCF and BAM files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5172 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 07:16:05 +00:00
fromer	798955b006	After discussing with Mark, revert to "Master merging" of phase information from VCFs. This has the advantage of creating minimal phased VCFs from RBP, from which phase info is merged into the original "master VCF". Also, updated Genotype.sameGenotype() to be simpler and NOT REVERSE the ignorePhase flag in comparing Allele lists/sets git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5167 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-01 19:50:15 +00:00
fromer	a89400b20c	Simple implementation to retrieve relevant BAM files for each sample git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5152 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-01 00:03:03 +00:00
fromer	f258363cfc	Minor bug fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5150 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 22:29:28 +00:00
fromer	742bd44728	Changed output file to be user-defined git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5149 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 22:15:26 +00:00
fromer	6c99dc4dab	Take (partial) ownership of phasing 1000G chr20 calls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5147 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 21:49:41 +00:00
kshakir	23578b7402	Pipeline tests will only start from scratch after "ant clean", making it faster to debug downstream issues when re-running "ant pipelinetest -Dpipeline.run=run". Updated the FCP, the test, and the ADPR to handle an issue with the ADPR locating the yaml generated by the FCPTest. Does not solve the ADPR error: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5126 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-29 19:44:03 +00:00
kshakir	2ef66af903	Moved the maximum number of intervals check from FCP to the Queue core so that scatter gather will no longer blow up if you specify a scatter count that is too high. Moved the BamListWriter from FCP to ListWriterFunction in the Queue core. Added an ExampleCountLoci QScript along with an example pipeline integration test which checks MD5s. Added a few more utility methods to PipelineTest including a currentGATK variable that points to the GATK jar. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5121 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-28 23:33:58 +00:00
corin	b25d131481	updated to work with the new tearsheet git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5113 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-28 18:49:11 +00:00
carneiro	cae4b9b0de	quick update with the correct CEU trio bam file and its final location. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5098 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-27 19:17:19 +00:00
ebanks	68729045ca	Always best to use the left-aligned version of the dbsnp vcf git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5091 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-26 20:21:50 +00:00
delangel	fa0c476b82	Script for calling indels in all phase 1 samples - VQSR part still needs work but raw calling is done git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5052 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-22 14:07:10 +00:00
carneiro	a0731eaa81	updated NA12878 Trio gold standard data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5048 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:48:31 +00:00
depristo	94b64ec54a	Moving scala script into analysis directory git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5047 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:42:18 +00:00
depristo	b45566760e	intermediate checkin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5045 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:39:25 +00:00
rpoplin	b6497c404f	Moving Phase1Calling qscript over to using the cleaned, pre-BAQed bams git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5039 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 02:41:20 +00:00
carneiro	fc73569d62	Added NA12878 Trio dataset to the pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5037 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 23:15:33 +00:00
kshakir	8855f080c2	For the fullCallingPipeline.q: - Reading the refseq table from the YAML if not specified on the command line. - Removed obsolete -bigMemQueue now that CombineVariants runs in 4g. - Added a -mountDir /broad/software option to work around adpr automount issues. - Merged the LSF preexec used for automount into the shell script used to execute tasks. - Using the LSF C Library to determine when jobs are complete instead of postexec. - Updated queue.sh to match the changes above. - Updated the FCPTest to match the changes above. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5036 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 22:34:43 +00:00
depristo	41c8552d0a	Added implements HasGenomeLocation to all relevant classes. It's now possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 12:54:03 +00:00
kshakir	4d611e53e7	Passing the ADPR R script to FCPTest. Changed the FCP.q to use an InProcessFunction to work around the -runDir issue GSA-420. Tested the FCPTest using the following dotkits and "ant clean pipelinetest -Dpipeline.run=run": - R-2.11 - Oracle-full-client - .cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5029 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 06:08:45 +00:00
corin	50fcebb0c4	Incorporates tearsheet and plot production with database access into standard pipeline. Note that the following dotkit packages must be run before the adpr will be correctly generated: R-2.10, Oracle-full-client, cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1 This also removes the unused titv argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5024 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 20:48:42 +00:00
rpoplin	55eb0387ac	Another relevant qscript. I use this one to do thousands of variant recalibration jobs to search for optimal parameters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5019 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 18:17:32 +00:00
chartl	a463dbcda1	Refactoring the qscript directory; oneoffs, playground, and core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5017 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 15:23:40 +00:00
rpoplin	7db9601c9d	Checking in the 1000G phase1 cleaning and calling scripts for posterity's sake, but also to show everyone what the current best practices for VQSR training looks like. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5015 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 14:32:52 +00:00
rpoplin	457c59e737	Use the sites-only HapMap files in the Methods development pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5013 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-18 20:50:09 +00:00
carneiro	35a4f1e366	.Added VariantEval as an optional step in the pipeline. .Lifted to HapMap 3.3 .Lifted to dbSNP 132 where possible. .Added the CEU-Trio WEx(hg19) dataset .Added some options to the pipeline You can now use : -dataset WEX -dataset HiSeq ... to choose which datasets to run through the pipeline. You can now run without BAQ and indel mask: -noBAQ -noMASK Choose not to run the gold standard comparison analysis: -skipGoldStandard Activate the VariantEval walker analysis on the Recalibrated vcf: -eval The default behavior is to run exactly like it used to, so this version shouldn't change the way you used to use the pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5004 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-14 21:55:02 +00:00
carneiro	c4f9b262e5	removing the tech dev pipeline script from the repository to keep the methods development pipeline as the reference script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4992 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 18:15:55 +00:00
carneiro	9e93091e9a	-baqGOP now takes phred scaled scores instead of probabilities in the command line. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4982 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 00:06:38 +00:00
kshakir	8ba3a5a43f	Command lines for locally run Queue jobs no longer have to be escaped differently than bsub'ed jobs. GSA-410 Local job runs now can run command lines longer than 4096 on our linux machines. When determining if the help text and Queue extensions need to be rebuilt, use the .class files not the .java so that GATK oneoffs are picked up correctly. Added the most basic of all example QScripts for debugging, Hello World. Minor updates to copy/pasted LSF code to reduce ant javadoc warnings by a third. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4970 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-10 21:07:29 +00:00
kshakir	b34e2f733f	Removed stochasticity from IndelRealigner by random sampling using a seed based on the read list. Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set. Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found. Now building all scala code at the same time, just like all java code is compiled at the same time. Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile. Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles. Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified. Fixed a couple errors when the <javadoc> task is run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 22:03:36 +00:00
chartl	3e7802a3e0	Minor changes to a qscript and the GQ constants on PrivatePermutations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4956 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 18:26:21 +00:00
carneiro	5e9a8f9cb3	Implemented a new argument (-DQS --defaultQualityScore) that allows GATK to deal with BAM files missing quality scores. If a value is specified, all reads are filled with the default quality score. Appropriate exception is thrown if -DQS is not provided and BAM file doesn't have quality scores for every base. Adding the first version of the techdev pipeline (tdPipeline) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 22:25:08 +00:00
rpoplin	20f29e4690	In the Methods development pipeline the call confidence threshold must be lowered from the default value for lowpass calling. What a bone-headed mistake! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4941 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 20:30:55 +00:00
corin	6d809321d3	Updating combine variants memory limit and dcov default for the full calling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4907 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-24 03:06:50 +00:00
depristo	5265f943b0	phasing per sample. tmp checkin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4898 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 20:14:06 +00:00
corin	e7569cfe6f	Updated dbsnp version usage. Calling with 132, but still using 129 for eval to maintain consistent known/novel eval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4895 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 17:37:27 +00:00
chartl	2235245af0	PrivatePermutations generalized to compute transition counts and average probabilities (and thus was renamed). Changes in some pipelines to reflect the change. Bugfix in the batch merging pipeline (it would halt because the allele VCF for genotyping batches could become off-spec). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4894 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 15:16:15 +00:00
rpoplin	7185fcb47b	Committing my notes about the methods development pipeline so we stay synced up while I'm on vacation. Cheers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4891 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 21:14:20 +00:00
chartl	80770dc032	Expanded target pipeline complete. Stop trying to be clever about scatter-gather; wait until functional SG is built-in to Q. Til then, a lazy version of the fullCallingPipeline. Seems to take a long time to generate the graph though... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4888 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 00:56:16 +00:00
kshakir	758d14a261	Checking in scripts used for testing the linear index MAX_FEATURES_PER_BIN. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4887 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 21:25:36 +00:00
chartl	fc33901810	Graph structure must be known at compile time. Removing GroupIntervals until a future point where in-process-functions can predict their output based on inputs [though this is probably forever: the inputs may not exist at compile time!] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4886 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 21:22:58 +00:00
chartl	61d5daa65c	EXTREME interval processing. Still undergoing testing. + GroupIntervals allows user-defined scattering (e.g. take an interval list file, split it into k smaller interval list files by number of lines) + ExpandIntervals expands the intervals, either by widening them, or allowing the definition for nearby intervals (e.g. flanks starting 1bp before and after, ending 10bp after that) + IntersectIntervals takes n interval lists, writes 1 interval list that is the n-way intersection of all of them git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4885 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 19:42:50 +00:00
rpoplin	4ca1da1d07	Updating the NA12878.HiSeq bam file to be the correct bam file in the methods development qscript. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4879 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 14:53:10 +00:00
rpoplin	8fac346ac1	Misc cleanup in Methods Development Qscript git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4878 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 04:24:25 +00:00
rpoplin	34ab5b4889	Turning on BAQ in Methods Development pipeline. A new dataset is added: 363 EUR samples from the November 1000G release. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4877 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-19 21:13:25 +00:00
chartl	8118a439c0	Commit for Khalid git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4876 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 22:24:18 +00:00
rpoplin	15a33545f4	Updating Methods development pipeline qscript with the bam lists for all the data sets. It is ready for people to start running with it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4875 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 22:19:14 +00:00
corin	f0ab7b849a	Adding a window size variable to avoid indel genotyper error git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4873 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 04:19:54 +00:00
rpoplin	bdef4e775a	Initial checkin of methods development pipeline qscript. It allows the methods dev team to run an overnight job which calls and recalibrates a variety of data sets and allows for an end-to-end sanity check of final results for potential changes to the methods. It isn't meant to be used by anybody quite yet, but shows the general structure and flow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4871 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 22:14:02 +00:00
rpoplin	095fc1922a	By popular demand I'm adding the qscript I used to do the 660 bamfile 1000G calling for ASHG. It does cleaning, BAQing, and merging in 3mb chunks genome-wide then calls SNPs on those temporary bams. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4866 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 18:49:03 +00:00
depristo	32d5397c01	Experimental support for sided annotations. Currently not more/less valuable than two-tailed testing. Future experiments are needed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4864 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 15:08:31 +00:00
chartl	0d18bd1011	Now that addAll() is in the superclass, no longer need this definition (which, without override, prevents the script from compiling anyway) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4862 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 05:36:31 +00:00
chartl	3e75431bc8	Thanks to mark: VCFInfoToTable removed in favor of a more flexible walker. Slight change to the argument structure of the walker to make it play more nicely with Queue: the field list parsing is pushed into the command line system (e.g. the variable is exposed as a List<String> and not a String, so Queue doesn't have to join a list into a string only to have it broken out again. This also allows the user to specify -F field1 -F field2 -F field3 if he/she so desires. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4842 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 03:33:36 +00:00
chartl	2217837845	Commit for Khalid -- should be a scala version of vcf2table but for some reason the run method isn't getting called. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4841 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 00:44:15 +00:00
chartl	f36861eeee	One more little bugfix -- the issue was not the grep command, but instead the NFS in the awk; I changed it to ++count in the last commit which was really responsible for the fix. Then this ultra-escaping semi-broke the grep again. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4831 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 20:36:14 +00:00
chartl	d34c5640d2	Bugfix for clf version of extract samples. Due to dynamic shell creation and bsubs and whatnot, the OR pipe for grep ("a\|b") needs to be super-escaped ("a\\\\\\\\\|b"). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4829 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 19:06:30 +00:00
chartl	f795b25c47	In-process versions of sample extraction and interval-list conversion for VCF files. Required an in-process-function branch of the queue library. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4827 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 17:36:53 +00:00

353 Commits (c2ec2891d1e185b4cc0f30e2dfd18991e2837b69)