gatk-3.8

Author	SHA1	Message	Date
chartl	a40a8006b5	Added in unit tests for the statistics calculated by the test runner; and bug-fixes to the calculations; so we have some assurance that the statistics coming out the back-end are correct. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5380 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-06 16:54:02 +00:00
chartl	9ca1dd5d62	Miscellaneous changes: - RefMetaDataTracker: grabbing variant contexts given a prefix (not sure where else this was implemented, if someone can show me I'll remove it) - VCFUtils: grabbing VCF headers given a prefix - MathUtils: Useful functions for calculating statistics on collections of Numbers - VariantAnnotator: Made isUniqueHeaderLine a public static method -- maybe this should go into a different class. Not sure. - Associations: PluginManager now used to propagate classes, implementations for Z,T,U tests, slight alteration to format to make the objects stored in the window optionally different from those returned by whatever statistic is run across the window Added: - MannWhitneyU. Started to fix up WilcoxonRankSum but there are comments in there questioning the validity of some of the code, and I'm sure that it's actually doing a U test. This implementation includes the direct calculation of p-values for small sample sizes, and a uniform approximation for when one of the sample sets is small, and the other large. Unit tests to follow. - BootstrapCallsMerger: takes n VCFs which have been called on the same samples; merges them together while averaging the annotations - BootstrapCalls.q: qscript for testing the effectiveness of bootstrap low-pass calling on the exome git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5372 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-03 22:43:36 +00:00
carneiro	0daa65b9ef	quick and dirty 'close your eyes' solution to run the papuans over the weekend. Will be properly fixed soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5370 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-03 21:42:22 +00:00
carneiro	8ab6eee1cf	gold standard creates its own tranches and vcf files now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5347 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-01 17:48:40 +00:00
chartl	0723b0f44c	Generalized association is now working. Output is in a horrific format. Implementation of T-testing. Improvements are to look for classes dynamically (a la VariantEval/VariantAnnotator), beautify output, and do optimizations where they exist. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5341 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-01 01:23:37 +00:00
rpoplin	ce34a8a918	New hidden option in VQSR to not parse the genotypes of the incoming training data. Updated VQSR training in methods development pipeline to be more in line with best practices. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5340 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-28 23:19:51 +00:00
carneiro	c7a51f0de7	fixed 1kg pilot dindel calls vcf file and combined all vcfs into one master dindel file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5335 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-28 19:04:58 +00:00
depristo	146756de79	Class name to reflect actual file name. manySampleUGPerformance now operates on 1000 samples! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5326 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-26 23:36:04 +00:00
chartl	b089d35b21	Fix expand intervals to do the right thing: - No more duplicate intervals - Truncation at intervals that already exist, e.g. exists: \|--------\| \|-------\| new: \|---------\| fixed: \|-----\| note that weird instances like: exists: \|-\| \|-\| \|-\| new: \|---------------------\| fixed: \|----\| e.g. you're truncated to the nearest interval on whatever side. In general many behaviors could happen in this instance, this is the one currently implemented. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5323 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-26 04:19:01 +00:00
carneiro	fd5d1f9cfc	minor cosmetic changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5322 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-25 21:56:35 +00:00
carneiro	81414a21dd	dpp: back to using 4gb memory assuming all is right with IndelRealigner now. mdcp: Some class structural changes due to the inclusion of indel calls. ApplyCut now chooses the tranche differently for each dataset. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5319 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-25 19:21:02 +00:00
kshakir	3e0a722672	MFCP waits for other pipelines to finish by using the previous log file of one pipeline as virtual input to the next pipeline. Using the name of the yaml in the log file name instead of each writing to "queue.out" so that two yamls can run from the same directory without creating cycles in the graph. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5318 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-25 17:51:01 +00:00
carneiro	6db3210387	the data processing pipeline needs more memory... directory updates in the methods pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5305 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-24 17:22:58 +00:00
carneiro	897a333aba	Methods Development Pipeline now has the option of calling indels with the -indels parameter. Also updated some databases and the new NA12878 HiSeq hg19 that Tim just funneled to us, is updated and called. Small fixes on the data processing pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5304 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-24 17:12:55 +00:00
rpoplin	255cc246a2	Change in Methods development pipeline: dbsnp130 can't be used for anything, changed it to dbsnp129. Optimization for HaplotypeScore and the to-be-committed ReadRosRankSumTest in AlignmentUtils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5301 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-24 16:09:03 +00:00
chartl	97e1a5262e	-ct x no longer includes coverage in the previous bin BatchMerge - additional support for indels (can't just test the alternate allele when it's an extended event, must also specify that you want to use the dindel model when you actually test the allele) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5300 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-24 15:52:04 +00:00
kshakir	f1f9bd6dcc	Due to recent LSF hiccups put a very brief (.5-2min) retry around getting status. Can't wait too long because statuses are archived an hour after exit. TODO: Switch to bulk status checks and add status archive lookups. Sending SIGTERM(15) instead of SIGKILL(9) to allow for graceful termination of child process. Printing out the name of the QScripts in the compile error text. Added a pipelineretry -PR pass through for the MFCP and MFCPTest. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5295 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-23 18:59:08 +00:00
chartl	07d381ec51	BatchMerge now uses the correct UG settings, recently added by Eric ExpandIntervals now checks that identical intervals are not created by (un)fortunately-spaced targets VCFExtractIntervals no longer creates duplicate intervals in the case where a VCF has multiple entries at the same site git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5294 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-23 18:46:15 +00:00
carneiro	2a48ec1307	now only accepts intervals files if the user specifically requests to report bams at interval only. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5291 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-23 16:49:58 +00:00
carneiro	ecfb51bcd8	Few organizational changes, queue output is now categorized and hidden. Also changed NA12878.Wex to dbsnp 129. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5290 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-22 22:49:38 +00:00
carneiro	8ea71fd294	minor dataset changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5289 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-22 20:18:10 +00:00
carneiro	c61dd2f09f	data processing pipeline now has on the fly bam indexing (powered by Matt) some new parameters, Indel Cleaning with constrain movement and fixMates is gone. setting up methods development pipeline for some cosmetic changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5277 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-18 23:13:54 +00:00
depristo	d97ed3e080	Comments for Mauricio git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5275 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-18 16:58:34 +00:00
carneiro	acad3ada06	changed baq to calculate_as_necessary. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5270 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 23:50:46 +00:00
carneiro	7f9ca6b28a	full data processing pipeline, now deleting intermediate files and performing both phases (per lane and combined) of the processing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5269 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 23:34:00 +00:00
kiran	4f83151c4e	Evaluates within standard target and expanded target separately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5268 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 23:04:24 +00:00
kshakir	860b172ef1	Defaulting the MFCP to run without a tear script. Added a missing virtual output for the inner FCP, so that Queue can tell a run of the FCP is dot-done. Enabled the MCFPTest for the first time, running without the tear script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5264 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 21:13:14 +00:00
kshakir	a189454343	FCP only adds the expand intervals QFunction once per script instead of once per QFunction using the ExpandTargets scala trait. Eval dbSNP's type now based on eval dbSNP instead of genotype dbSNP. Using an external treemap instead of the JGraphT internal node set to speed up larger graph generation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5261 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 19:09:03 +00:00
carneiro	497e9ab83b	too hasty... cleaning up debug messages ;) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5257 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 02:11:03 +00:00
carneiro	b4da843c49	now processes either a single bam file or a list of bam files in parallel. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5256 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 02:07:22 +00:00
carneiro	50c870cfce	Data Processing Pipeline: local indel realignment, mark duplicates and BQSR. Done. Pacbio pipeline: now all pacbio bams have baq annotated in so running UG is uber fast. Methods pipeline: minor cosmetic changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5253 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-16 17:22:30 +00:00
kiran	c0a4af3809	Expands targets by 50-bp on both sides when the expandIntervals argument is greater than 0. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5251 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-16 14:47:52 +00:00
carneiro	6d3b878dde	data processing pipeline script already does: . Local Indel Realignment . Mark Duplicates will do: . Base Quality Score Recalibration (soon) it's working with a single BAM for testing, but will work with a list of bam files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5250 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 21:49:05 +00:00
corin	d2efea6003	This is a draft of the improved and prettified pipeline. It may not yet compile, but Kiran is taking over adding a few more things as I finish up other tasks. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5248 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 19:35:00 +00:00
kshakir	d185c2961f	Added pipeline for calling FCP in batches called MultiFullCallingPipeline. Bug smashes for the MCFP: Synchronized access to LSF library and modifications to the QGraph. If values are missing from the graph with -run make sure to exit with a non-zero. Refactored QGraph to pre-generate a unique Int for each QNode speeding up getHashCode/equals inside the graph. Added jobPriority and removed jobLimitSeconds from QFunction. All scatter gather is by default in a single sub directory queueScatterGather. Moved some FCPTest into BaseTest/PipelineTest for use by MFCPTest. Rev'ed the 1000G bams used for validation from v1 to v2 and added code to look for the bams before running other tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5247 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 18:26:14 +00:00
carneiro	87e19a17ae	small updates to the variant eval part of the pipeline, some updates to the pacbio specific pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5244 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 16:19:07 +00:00
chartl	851b3e71f9	Major revision of the batch merge script. All sites are now used, hooks for some UG settings, no longer reliant on the pipeline management library (pipeline libs are probably going to go away -- nobody uses them) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5241 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-14 23:52:05 +00:00
fromer	d6e3f2eba6	Added GC content calculator for CNV data git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5240 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-14 22:29:55 +00:00
carneiro	5f10fffa47	merge intervals now prints a sorted list in the end. added the ccs datasets to the pbCalling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5233 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-11 20:57:59 +00:00
carneiro	50c2fa3c3a	this -1 made ALL the difference in the world. Minor bug fix. Regular updates to the pbCalling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5232 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-11 19:25:09 +00:00
fromer	cdf53188d6	Updated DoC to work with scatter-gather; and, also manually implemented scatter-gather by sample above the scatter-gather by interval. Thanks to Khalid for his support! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5231 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-11 19:14:42 +00:00
carneiro	c630701a76	Following Ryan's suggestion, I am moving the Methods Development Calling pipeline to the Core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5226 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-10 17:36:05 +00:00
carneiro	9c2c5efe35	a modified version of the Methods Development calling pipeline made to work with pacbio data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5225 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-10 16:06:50 +00:00
fromer	947cc44854	Thanks to Matt for walking me through a proper version of VCF_BAM_utilities! Feel free to add to it, or use it to get the samples in a VCF file, a BAM file, or a collection of BAM files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5223 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-09 18:08:27 +00:00
kshakir	4d1cca95bb	Removed deprecated getDbsnpFile. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-08 21:12:15 +00:00
carneiro	e5cfc6ae74	NA12878 hg19 dataset was included to the methods pipeline. (and I am running it) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5217 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-08 16:17:46 +00:00
fromer	8d0f1b75d5	Added queue/util/BAMutilities Object [with BAM and VCF parsing utilities], which is now used by my qscripts that robustly split runs by sample git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5214 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-07 22:17:29 +00:00
kshakir	8040998c15	Renamed the pipeline yaml dbsnpFile to genotypeDbsnp, and added an evalDbsnp. Added a genotypeDbsnpType and evalDbsnpType to check the extensions for .vcf or .rod. Moved renaming of "recalibrated" bams to "cleaned" from sed to yaml generation template (see diff for more info). Renamed fCP.q to FCP.q. Though it's still disabled until VariantEval is updated, added changes above to the FCPTest. Removed refseq table from the queue.sh wrapper script. Only specified in the yaml. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5213 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-07 22:01:09 +00:00
fromer	3c1a026c94	Updated script to properly bin DoC values so that down-sampling corresponds to range of DoC values obtainable git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5208 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-07 16:47:55 +00:00
depristo	c4707631e2	MethodsDevelopmentPipeline is now the test bed for large scale AWS_S3 logging. Can be disabled from command line if this is necessary git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5203 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-06 17:03:45 +00:00
fromer	8b8b4fced1	Removed explicit memoryLimit, so that memLimit given on the command-line will NOT be ignored... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5202 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-06 01:55:17 +00:00
depristo	fe4aa58d35	Removing unused class git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5197 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 22:22:28 +00:00
fromer	4cdc974c5f	Preliminary Qscript to run DoC for the purpose of CNV detection git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5194 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 21:25:59 +00:00
corin	cd6ace1b47	Includes UG version of indel genotyping rather than IGV2 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5191 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 20:25:46 +00:00
carneiro	358a400474	made ApplyVariantCut a default part of the pipeline, added the -noCut option if you don't want to use it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5189 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 19:29:36 +00:00
carneiro	7af003666d	added optional argument -cut to apply the variant cut to the ts recalibrated vcf. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5183 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 17:34:40 +00:00
chartl	5398cf620a	Bug fixes in the in process function (spoiled by python: was not closing my writers). SortByRef now works somewhat like the perl script does, rather than doing a memory-expensive sort. Adding a QTools qscript which is kinda clunky, and will be used mostly for integration tests of these IPFs, pending some better way to construct argument collections and function accessors at compile-time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5182 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 17:32:46 +00:00
carneiro	cf15819db5	updated to work with the new VariantEval. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5176 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 17:46:07 +00:00
rpoplin	47357b726e	Fixing import GenotypeCalculationModel since it doesn't exist anymore. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5175 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 15:39:43 +00:00
fromer	7605f0e6c1	Corrected input/output definitions for Queue git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5173 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 07:39:00 +00:00
fromer	3839fd1a25	Updated phasing pipeline to properly read samples from VCF and BAM files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5172 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 07:16:05 +00:00
fromer	798955b006	After discussing with Mark, revert to "Master merging" of phase information from VCFs. This has the advantage of creating minimal phased VCFs from RBP, from which phase info is merged into the original "master VCF". Also, updated Genotype.sameGenotype() to be simpler and NOT REVERSE the ignorePhase flag in comparing Allele lists/sets git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5167 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-01 19:50:15 +00:00
fromer	a89400b20c	Simple implementation to retrieve relevant BAM files for each sample git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5152 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-01 00:03:03 +00:00
fromer	f258363cfc	Minor bug fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5150 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 22:29:28 +00:00
fromer	742bd44728	Changed output file to be user-defined git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5149 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 22:15:26 +00:00
fromer	6c99dc4dab	Take (partial) ownership of phasing 1000G chr20 calls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5147 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 21:49:41 +00:00
kshakir	23578b7402	Pipeline tests will only start from scratch after "ant clean", making it faster to debug downstream issues when re-running "ant pipelinetest -Dpipeline.run=run". Updated the FCP, the test, and the ADPR to handle an issue with the ADPR locating the yaml generated by the FCPTest. Does not solve the ADPR error: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5126 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-29 19:44:03 +00:00
kshakir	2ef66af903	Moved the maximum number of intervals check from FCP to the Queue core so that scatter gather will no longer blow up if you specify a scatter count that is too high. Moved the BamListWriter from FCP to ListWriterFunction in the Queue core. Added an ExampleCountLoci QScript along with an example pipeline integration test which checks MD5s. Added a few more utility methods to PipelineTest including a currentGATK variable that points to the GATK jar. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5121 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-28 23:33:58 +00:00
corin	b25d131481	updated to work with the new tearsheet git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5113 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-28 18:49:11 +00:00
carneiro	cae4b9b0de	quick update with the correct CEU trio bam file and its final location. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5098 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-27 19:17:19 +00:00
ebanks	68729045ca	Always best to use the left-aligned version of the dbsnp vcf git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5091 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-26 20:21:50 +00:00
delangel	fa0c476b82	Script for calling indels in all phase 1 samples - VQSR part still needs work but raw calling is done git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5052 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-22 14:07:10 +00:00
carneiro	a0731eaa81	updated NA12878 Trio gold standard data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5048 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:48:31 +00:00
depristo	94b64ec54a	Moving scala script into analysis directory git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5047 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:42:18 +00:00
depristo	b45566760e	intermediate checkin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5045 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:39:25 +00:00
rpoplin	b6497c404f	Moving Phase1Calling qscript over to using the cleaned, pre-BAQed bams git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5039 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 02:41:20 +00:00
carneiro	fc73569d62	Added NA12878 Trio dataset to the pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5037 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 23:15:33 +00:00
kshakir	8855f080c2	For the fullCallingPipeline.q: - Reading the refseq table from the YAML if not specified on the command line. - Removed obsolete -bigMemQueue now that CombineVariants runs in 4g. - Added a -mountDir /broad/software option to work around adpr automount issues. - Merged the LSF preexec used for automount into the shell script used to execute tasks. - Using the LSF C Library to determine when jobs are complete instead of postexec. - Updated queue.sh to match the changes above. - Updated the FCPTest to match the changes above. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5036 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 22:34:43 +00:00
depristo	41c8552d0a	Added implements HasGenomeLocation to all relevant classes. It's not possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 12:54:03 +00:00
kshakir	4d611e53e7	Passing the ADPR R script to FCPTest. Changed the FCP.q to use an InProcessFunction work around the -runDir issue GSA-420. Tested the FCPTest using the following dotkits and "ant clean pipelinetest -Dpipeline.run=run": - R-2.11 - Oracle-full-client - .cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5029 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 06:08:45 +00:00
corin	50fcebb0c4	Incorporates tearsheet and plot production with database access into standard pipeline. Note that the following dotkit packages must be run before the adpr will be correctly generated: R-2.10, Oracle-full-client, cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1 This also removes the unused titv argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5024 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 20:48:42 +00:00
rpoplin	55eb0387ac	Another relevant qscript. I use this one to do thousands of variant recalibration jobs to search for optimal parameters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5019 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 18:17:32 +00:00
chartl	a463dbcda1	Refactoring the qscript directory; oneoffs, playground, and core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5017 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 15:23:40 +00:00
rpoplin	7db9601c9d	Checking in the 1000G phase1 cleaning and calling scripts for posterity's sake, but also to show everyone what the current best practices for VQSR training looks like. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5015 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 14:32:52 +00:00
rpoplin	457c59e737	Use the sites-only HapMap files in the Methods development pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5013 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-18 20:50:09 +00:00
carneiro	35a4f1e366	.Added VariantEval as an optional step in the pipeline. .Lifted to HapMap 3.3 .Lifted to dbSNP 132 where possible. .Added the CEU-Trio WEx(hg19) dataset .Added some options to the pipeline You can now use : -dataset WEX -dataset HiSeq ... to choose which datasets to run through the pipeline. You can now run without BAQ and indel mask: -noBAQ -noMASK Choose not to run the gold standard comparison analysis: -skipGoldStandard Activate the VariantEval walker analysis on the Recalibrated vcf: -eval The default behavior is to run exactly like it used to, so this version shouldn't change the way you used to use the pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5004 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-14 21:55:02 +00:00
carneiro	c4f9b262e5	removing the tech dev pipeline script from the repository to keep the methods development pipeline as the reference script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4992 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 18:15:55 +00:00
carneiro	9e93091e9a	-baqGOP now takes phred scaled scores instead of probabilities in the command line. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4982 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 00:06:38 +00:00
kshakir	8ba3a5a43f	Command lines for locally run Queue jobs no longer have to be escaped differently than bsub'ed jobs. GSA-410 Local job runs now can run command lines longer than 4096 on our linux machines. When determining if the help text and Queue extensions need to be rebuilt, use the .class files not the .java so that GATK oneoffs are picked up correctly. Added the most basic of all example QScripts for debugging, Hello World. Minor updates to copy/pasted LSF code to reduce ant javadoc warnings by a third. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4970 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-10 21:07:29 +00:00
kshakir	b34e2f733f	Removed stochasticity from IndelRealigner by random sampling using a seed based on the read list. Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set. Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found. Now building all scala code at the same time, just like all java code is compiled at the same time. Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile. Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles. Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified. Fixed a couple errors when the <javadoc> task is run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 22:03:36 +00:00
chartl	3e7802a3e0	Minor changes to a qscript and the GQ constants on PrivatePermutations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4956 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 18:26:21 +00:00
carneiro	5e9a8f9cb3	Implemented a new argument (-DQS --defaultQualityScore) that allows GATK to deal with BAM files missing quality scores. If a value is specified, all reads are filled with the default quality score. Appropriate exception is thrown if -DQS is not provided and BAM file doesn't have quality scores for every base. Adding the first version of the techdev pipeline (tdPipeline) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 22:25:08 +00:00
rpoplin	20f29e4690	In the Methods development pipeline the call confidence threshold must be lowered from the default value for lowpass calling. What a bone-headed mistake! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4941 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 20:30:55 +00:00
corin	6d809321d3	Updating combine variants memory limit and dcov default for the full calling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4907 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-24 03:06:50 +00:00
depristo	5265f943b0	phasing per sample. tmp checkin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4898 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 20:14:06 +00:00
corin	e7569cfe6f	Updated dbsnp version usage. Calling with 132, but still using 129 for eval to maintain consistent known/novel eval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4895 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 17:37:27 +00:00
chartl	2235245af0	PrivatePermutations generalized to compute transition counts and average probabilities (and thus was renamed). Changes in some pipelines to reflect the change. Bugfix in the batch merging pipeline (it would halt because the allele VCF for genotyping batches could become off-spec). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4894 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 15:16:15 +00:00
rpoplin	7185fcb47b	Committing my notes about the methods development pipeline so we stay synced up while I'm on vacation. Cheers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4891 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 21:14:20 +00:00
chartl	80770dc032	Expanded target pipeline complete. Stop trying to be clever about scatter-gather; wait until functional SG is built-in to Q. Til then, a lazy version of the fullCallingPipeline. Seems to take a long time to generate the graph though... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4888 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 00:56:16 +00:00
kshakir	758d14a261	Checking in scripts used for testing the linear index MAX_FEATURES_PER_BIN. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4887 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 21:25:36 +00:00

270 Commits (e5ef8388fc494d6553e167ec8aec3fecdfffa18f)