gatk-3.8

Commit Graph

Author	SHA1	Message	Date
carneiro	c61dd2f09f	data processing pipeline now has on the fly bam indexing (powered by Matt) some new parameters, Indel Cleaning with constrain movement and fixMates is gone. setting up methods development pipeline for some cosmetic changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5277 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-18 23:13:54 +00:00
depristo	d97ed3e080	Comments for Mauricio git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5275 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-18 16:58:34 +00:00
carneiro	acad3ada06	changed baq to calculate_as_necessary. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5270 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 23:50:46 +00:00
carneiro	7f9ca6b28a	full data processing pipeline, now deleting intermediate files and performing both phases (per lane and combined) of the processing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5269 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 23:34:00 +00:00
kiran	4f83151c4e	Evaluates within standard target and expanded target separately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5268 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 23:04:24 +00:00
kshakir	860b172ef1	Defaulting the MFCP to run without a tear script. Added a missing virtual output for the inner FCP, so that Queue can tell a run of the FCP is dot-done. Enabled the MCFPTest for the first time, running without the tear script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5264 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 21:13:14 +00:00
kshakir	a189454343	FCP only adds the expand intervals QFunction once per script instead of once per QFunction using the ExpandTargets scala trait. Eval dbSNP's type now based on eval dbSNP instead of genotype dbSNP. Using an external treemap instead of the JGraphT internal node set to speed up larger graph generation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5261 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 19:09:03 +00:00
carneiro	497e9ab83b	too hasty... cleaning up debug messages ;) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5257 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 02:11:03 +00:00
carneiro	b4da843c49	now processes either a single bam file or a list of bam files in parallel. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5256 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-17 02:07:22 +00:00
carneiro	50c870cfce	Data Processing Pipeline: local indel realignment, mark duplicates and BQSR. Done. Pacbio pipeline: now all pacbio bams have baq annotated in so running UG is uber fast. Methods pipeline: minor cosmetic changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5253 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-16 17:22:30 +00:00
kiran	c0a4af3809	Expands targets by 50-bp on both sides when the expandIntervals argument is greater than 0. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5251 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-16 14:47:52 +00:00
carneiro	6d3b878dde	data processing pipeline script already does: . Local Indel Realignment . Mark Duplicates will do: . Base Quality Score Recalibration (soon) it's working with a single BAM for testing, but will work with a list of bam files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5250 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 21:49:05 +00:00
corin	d2efea6003	This is a draft of the improved and prettified pipeline. It may not yet compile, but Kiran is taking over adding a few more things as I finish up other tasks. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5248 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 19:35:00 +00:00
kshakir	d185c2961f	Added pipeline for calling FCP in batches called MultiFullCallingPipeline. Bug smashes for the MCFP: Synchronized access to LSF library and modifications to the QGraph. If values are missing from the graph with -run make sure to exit with a non-zero. Refactored QGraph to pre-generate a unique Int for each QNode speeding up getHashCode/equals inside the graph. Added jobPriority and removed jobLimitSeconds from QFunction. All scatter gather is by default in a single sub directory queueScatterGather. Moved some FCPTest into BaseTest/PipelineTest for use by MFCPTest. Rev'ed the 1000G bams used for validation from v1 to v2 and added code to look for the bams before running other tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5247 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 18:26:14 +00:00
carneiro	87e19a17ae	small updates to the variant eval part of the pipeline, some updates to the pacbio specific pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5244 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-15 16:19:07 +00:00
chartl	851b3e71f9	Major revision of the batch merge script. All sites are now used, hooks for some UG settings, no longer reliant on the pipeline management library (pipeline libs are probably going to go away -- nobody uses them) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5241 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-14 23:52:05 +00:00
fromer	d6e3f2eba6	Added GC content calculator for CNV data git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5240 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-14 22:29:55 +00:00
carneiro	5f10fffa47	merge intervals now prints a sorted list in the end. added the ccs datasets to the pbCalling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5233 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-11 20:57:59 +00:00
carneiro	50c2fa3c3a	this -1 made ALL the difference in the world. Minor bug fix. Regular updates to the pbCalling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5232 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-11 19:25:09 +00:00
fromer	cdf53188d6	Updated DoC to work with scatter-gather; and, also manually implemented scatter-gather by sample above the scatter-gather by interval. Thanks to Khalid for his support! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5231 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-11 19:14:42 +00:00
carneiro	c630701a76	Following Ryan's suggestion, I am moving the Methods Development Calling pipeline to the Core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5226 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-10 17:36:05 +00:00
carneiro	9c2c5efe35	a modified version of the Methods Development calling pipeline made to work with pacbio data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5225 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-10 16:06:50 +00:00
fromer	947cc44854	Thanks to Matt for walking me through a proper version of VCF_BAM_utilities! Feel free to add to it, or use it to get the samples in a VCF file, a BAM file, or a collection of BAM files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5223 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-09 18:08:27 +00:00
kshakir	4d1cca95bb	Removed deprecated getDbsnpFile. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-08 21:12:15 +00:00
carneiro	e5cfc6ae74	NA12878 hg19 dataset was included to the methods pipeline. (and I am running it) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5217 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-08 16:17:46 +00:00
fromer	8d0f1b75d5	Added queue/util/BAMutilities Object [with BAM and VCF parsing utilities], which is now used by my qscripts that robustly split runs by sample git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5214 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-07 22:17:29 +00:00
kshakir	8040998c15	Renamed the pipeline yaml dbsnpFile to genotypeDbsnp, and added an evalDbsnp. Added a genotypeDbsnpType and evalDbsnpType to check the extensions for .vcf or .rod. Moved renaming of "recalibrated" bams to "cleaned" from sed to yaml generation template (see diff for more info). Renamed fCP.q to FCP.q. Though it's still disabled until VariantEval is updated, added changes above to the FCPTest. Removed refseq table from the queue.sh wrapper script. Only specified in the yaml. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5213 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-07 22:01:09 +00:00
fromer	3c1a026c94	Updated script to properly bin DoC values so that down-sampling corresponds to range of DoC values obtainable git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5208 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-07 16:47:55 +00:00
depristo	c4707631e2	MethodsDevelopmentPipeline is now the test bed for large scale AWS_S3 logging. Can be disabled from command line if this is necessary git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5203 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-06 17:03:45 +00:00
fromer	8b8b4fced1	Removed explicit memoryLimit, so that memLimit given on the command-line will NOT be ignored... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5202 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-06 01:55:17 +00:00
depristo	fe4aa58d35	Removing unused class git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5197 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 22:22:28 +00:00
fromer	4cdc974c5f	Preliminary Qscript to run DoC for the purpose of CNV detection git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5194 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 21:25:59 +00:00
corin	cd6ace1b47	Includes UG version of indel genotyping rather than IGV2 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5191 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 20:25:46 +00:00
carneiro	358a400474	made ApplyVariantCut a default part of the pipeline, added the -noCut option if you don't want to use it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5189 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 19:29:36 +00:00
carneiro	7af003666d	added optional argument -cut to apply the variant cut to the ts recalibrated vcf. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5183 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 17:34:40 +00:00
chartl	5398cf620a	Bug fixes in the in process function (spoiled by python: was not closing my writers). SortByRef now works somewhat like the perl script does, rather than doing a memory-expensive sort. Adding a QTools qscript which is kinda clunky, and will be used mostly for integration tests of these IPFs, pending some better way to construct argument collections and function accessors at compile-time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5182 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 17:32:46 +00:00
carneiro	cf15819db5	updated to work with the new VariantEval. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5176 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 17:46:07 +00:00
rpoplin	47357b726e	Fixing import GenotypeCalculationModel since it doesn't exist anymore. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5175 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 15:39:43 +00:00
fromer	7605f0e6c1	Corrected input/output definitions for Queue git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5173 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 07:39:00 +00:00
fromer	3839fd1a25	Updated phasing pipeline to properly read samples from VCF and BAM files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5172 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 07:16:05 +00:00
fromer	798955b006	After discussing with Mark, revert to "Master merging" of phase information from VCFs. This has the advantage of creating minimal phased VCFs from RBP, from which phase info is merged into the original "master VCF". Also, updated Genotype.sameGenotype() to be simpler and NOT REVERSE the ignorePhase flag in comparing Allele lists/sets git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5167 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-01 19:50:15 +00:00
fromer	a89400b20c	Simple implementation to retrieve relevant BAM files for each sample git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5152 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-01 00:03:03 +00:00
fromer	f258363cfc	Minor bug fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5150 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 22:29:28 +00:00
fromer	742bd44728	Changed output file to be user-defined git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5149 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 22:15:26 +00:00
fromer	6c99dc4dab	Take (partial) ownership of phasing 1000G chr20 calls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5147 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 21:49:41 +00:00
kshakir	23578b7402	Pipeline tests will only start from scratch after "ant clean", making it faster to debug downstream issues when re-running "ant pipelinetest -Dpipeline.run=run". Updated the FCP, the test, and the ADPR to handle an issue with the ADPR locating the yaml generated by the FCPTest. Does not solve the ADPR error: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5126 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-29 19:44:03 +00:00
kshakir	2ef66af903	Moved the maximum number of intervals check from FCP to the Queue core so that scatter gather will no longer blow up if you specify a scatter count that is too high. Moved the BamListWriter from FCP to ListWriterFunction in the Queue core. Added an ExampleCountLoci QScript along with an example pipeline integration test which checks MD5s. Added a few more utility methods to PipelineTest including a currentGATK variable that points to the GATK jar. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5121 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-28 23:33:58 +00:00
corin	b25d131481	updated to work with the new tearsheet git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5113 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-28 18:49:11 +00:00
carneiro	cae4b9b0de	quick update with the correct CEU trio bam file and its final location. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5098 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-27 19:17:19 +00:00
ebanks	68729045ca	Always best to use the left-aligned version of the dbsnp vcf git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5091 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-26 20:21:50 +00:00
delangel	fa0c476b82	Script for calling indels in all phase 1 samples - VQSR part still needs work but raw calling is done git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5052 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-22 14:07:10 +00:00
carneiro	a0731eaa81	updated NA12878 Trio gold standard data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5048 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:48:31 +00:00
depristo	94b64ec54a	Moving scala script into analysis directory git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5047 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:42:18 +00:00
depristo	b45566760e	intermediate checkin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5045 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:39:25 +00:00
rpoplin	b6497c404f	Moving Phase1Calling qscript over to using the cleaned, pre-BAQed bams git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5039 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 02:41:20 +00:00
carneiro	fc73569d62	Added NA12878 Trio dataset to the pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5037 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 23:15:33 +00:00
kshakir	8855f080c2	For the fullCallingPipeline.q: - Reading the refseq table from the YAML if not specified on the command line. - Removed obsolete -bigMemQueue now that CombineVariants runs in 4g. - Added a -mountDir /broad/software option to work around adpr automount issues. - Merged the LSF preexec used for automount into the shell script used to execute tasks. - Using the LSF C Library to determine when jobs are complete instead of postexec. - Updated queue.sh to match the changes above. - Updated the FCPTest to match the changes above. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5036 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 22:34:43 +00:00
depristo	41c8552d0a	Added implements HasGenomeLocation to all relevant classes. It's now possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 12:54:03 +00:00
kshakir	4d611e53e7	Passing the ADPR R script to FCPTest. Changed the FCP.q to use an InProcessFunction work around the -runDir issue GSA-420. Tested the FCPTest using the following dotkits and "ant clean pipelinetest -Dpipeline.run=run": - R-2.11 - Oracle-full-client - .cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5029 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 06:08:45 +00:00
corin	50fcebb0c4	Incorporates tearsheet and plot production with database access into standard pipeline. Note that the following dotkit packages must be run before the adpr will be correctly generated: R-2.10, Oracle-full-client, cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1 This also removes the unused titv argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5024 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 20:48:42 +00:00
rpoplin	55eb0387ac	Another relevant qscript. I use this one to do thousands of variant recalibration jobs to search for optimal parameters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5019 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 18:17:32 +00:00
chartl	a463dbcda1	Refactoring the qscript directory; oneoffs, playground, and core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5017 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 15:23:40 +00:00
rpoplin	7db9601c9d	Checking in the 1000G phase1 cleaning and calling scripts for posterity's sake, but also to show everyone what the current best practices for VQSR training looks like. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5015 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 14:32:52 +00:00
rpoplin	457c59e737	Use the sites-only HapMap files in the Methods development pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5013 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-18 20:50:09 +00:00
carneiro	35a4f1e366	.Added VariantEval as an optional step in the pipeline. .Lifted to HapMap 3.3 .Lifted to dbSNP 132 where possible. .Added the CEU-Trio WEx(hg19) dataset .Added some options to the pipeline You can now use : -dataset WEX -dataset HiSeq ... to choose which datasets to run through the pipeline. You can now run without BAQ and indel mask: -noBAQ -noMASK Choose not to run the gold standard comparison analysis: -skipGoldStandard Activate the VariantEval walker analysis on the Recalibrated vcf: -eval The default behavior is to run exactly like it used to, so this version shouldn't change the way you used to use the pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5004 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-14 21:55:02 +00:00
carneiro	c4f9b262e5	removing the tech dev pipeline script from the repository to keep the methods development pipeline as the reference script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4992 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 18:15:55 +00:00
carneiro	9e93091e9a	-baqGOP now takes phred scaled scores instead of probabilities in the command line. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4982 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 00:06:38 +00:00
kshakir	8ba3a5a43f	Command lines for locally run Queue jobs no longer have to be escaped differently than bsub'ed jobs. GSA-410 Local job runs now can run command lines longer than 4096 on our linux machines. When determining if the help text and Queue extensions need to be rebuilt, use the .class files not the .java so that GATK oneoffs are picked up correctly. Added the most basic of all example QScripts for debugging, Hello World. Minor updates to copy/pasted LSF code to reduce ant javadoc warnings by a third. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4970 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-10 21:07:29 +00:00
kshakir	b34e2f733f	Removed stochasticity from IndelRealigner by random sampling using a seed based on the read list. Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set. Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found. Now building all scala code at the same time, just like all java code is compiled at the same time. Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile. Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles. Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified. Fixed a couple errors when the <javadoc> task is run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 22:03:36 +00:00
chartl	3e7802a3e0	Minor changes to a qscript and the GQ constants on PrivatePermutations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4956 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 18:26:21 +00:00
carneiro	5e9a8f9cb3	Implemented a new argument (-DQS --defaultQualityScore) that allows GATK to deal with BAM files missing quality scores. If a value is specified, all reads are filled with the default quality score. Appropriate exception is thrown if -DQS is not provided and BAM file doesn't have quality scores for every base. Adding the first version of the techdev pipeline (tdPipeline) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 22:25:08 +00:00
rpoplin	20f29e4690	In the Methods development pipeline the call confidence threshold must be lowered from the default value for lowpass calling. What a bone-headed mistake! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4941 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 20:30:55 +00:00
corin	6d809321d3	Updating combine variants memory limit and dcov default for the full calling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4907 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-24 03:06:50 +00:00
depristo	5265f943b0	phasing per sample. tmp checkin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4898 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 20:14:06 +00:00
corin	e7569cfe6f	Updated dbsnp version usage. Calling with 132, but still using 129 for eval to maintain consistent known/novel eval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4895 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 17:37:27 +00:00
chartl	2235245af0	PrivatePermutations generalized to compute transition counts and average probabilities (and thus was renamed). Changes in some pipelines to reflect the change. Bugfix in the batch merging pipeline (it would halt because the allele VCF for genotyping batches could become off-spec). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4894 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 15:16:15 +00:00
rpoplin	7185fcb47b	Committing my notes about the methods development pipeline so we stay synced up while I'm on vacation. Cheers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4891 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 21:14:20 +00:00
chartl	80770dc032	Expanded target pipeline complete. Stop trying to be clever about scatter-gather; wait until functional SG is built-in to Q. Til then, a lazy version of the fullCallingPipeline. Seems to take a long time to generate the graph though... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4888 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 00:56:16 +00:00
kshakir	758d14a261	Checking in scripts used for testing the linear index MAX_FEATURES_PER_BIN. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4887 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 21:25:36 +00:00
chartl	fc33901810	Graph structure must be known at compile time. Removing GroupIntervals until a future point where in-process-functions can predict their output based on inputs [though this is probably forever: the inputs may not exist at compile time!] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4886 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 21:22:58 +00:00
chartl	61d5daa65c	EXTREME interval processing. Still undergoing testing. + GroupIntervals allows user-defined scattering (e.g. take an interval list file, split it into k smaller interval list files by number of lines) + ExpandIntervals expands the intervals, either by widening them, or allowing the definition for nearby intervals (e.g. flanks starting 1bp before and after, ending 10bp after that) + IntersectIntervals takes n interval lists, writes 1 interval list that is the n-way intersection of all of them git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4885 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 19:42:50 +00:00
rpoplin	4ca1da1d07	Updating the NA12878.HiSeq bam file to be the correct bam file in the methods development qscript. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4879 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 14:53:10 +00:00
rpoplin	8fac346ac1	Misc cleanup in Methods Development Qscript git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4878 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 04:24:25 +00:00
rpoplin	34ab5b4889	Turning on BAQ in Methods Development pipeline. A new dataset is added: 363 EUR samples from the November 1000G release. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4877 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-19 21:13:25 +00:00
chartl	8118a439c0	Commit for Khalid git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4876 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 22:24:18 +00:00
rpoplin	15a33545f4	Updating Methods development pipeline qscript with the bam lists for all the data sets. It is ready for people to start running with it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4875 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 22:19:14 +00:00
corin	f0ab7b849a	Adding a window size variable to avoid indel genotyper error git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4873 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 04:19:54 +00:00
rpoplin	bdef4e775a	Initial checkin of methods development pipeline qscript. It allows the methods dev team to run an overnight job which calls and recalibrates a variety of data sets and allows for an end-to-end sanity check of final results for potential changes to the methods. It isn't meant to be used by anybody quite yet, but shows the general structure and flow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4871 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 22:14:02 +00:00
rpoplin	095fc1922a	By popular demand I'm adding the qscript I used to do the 660 bamfile 1000G calling for ASHG. It does cleaning, BAQing, and merging in 3mb chunks genome-wide then calls SNPs on those temporary bams. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4866 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 18:49:03 +00:00
depristo	32d5397c01	Experimental support for sided annotations. Currently not more/less valuable than two-tailed testing. Future experiments are needed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4864 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 15:08:31 +00:00
chartl	0d18bd1011	Now that addAll() is in the superclass, no longer need this definition (which, without override, prevents the script from compiling anyway) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4862 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 05:36:31 +00:00
chartl	3e75431bc8	Thanks to mark: VCFInfoToTable removed in favor of a more flexible walker. Slight change to the argument structure of the walker to make it play more nicely with Queue: the field list parsing is pushed into the command line system (e.g. the variable is exposed as a List<String> and not a String, so Queue doesn't have to join a list into a string only to have it broken out again. This also allows the user to specify -F field1 -F field2 -F field3 if he/she so desires. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4842 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 03:33:36 +00:00
chartl	2217837845	Commit for Khalid -- should be a scala version of vcf2table but for some reason the run method isn't getting called. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4841 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 00:44:15 +00:00
chartl	f36861eeee	One more little bugfix -- the issue was not the grep command, but instead the NFS in the awk; I changed it to ++count in the last commit which was really responsible for the fix. Then this ultra-escaping semi-broke the grep again. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4831 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 20:36:14 +00:00
chartl	d34c5640d2	Bugfix for clf version of extract samples. Due to dynamic shell creation and bsubs and whatnot, the OR pipe for grep ("a\|b") needs to be super-escaped ("a\\\\\\\\\|b"). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4829 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 19:06:30 +00:00
chartl	f795b25c47	In-process versions of sample extraction and interval-list conversion for VCF files. Required an in-process-function branch of the queue library. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4827 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 17:36:53 +00:00
depristo	e219f6a4b5	Q script to run VQSR on a whole variety of common data sets. To be used as a basis for general methods development pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4826 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 16:55:52 +00:00
chartl	7bc2049031	Updates and bug fixes to private mutations qscript and pipeline libraries. Hand filter strings are now not busted (boo to having to escape quotes); convenience method added to VariantCalling to propagate standard trait data to a given GATK command line -- should be made more scala-esque in the future. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4824 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 04:55:13 +00:00
chartl	cf75caf653	java changes: VariantEvalWalker's logger is made public, so that variant eval modules can access it through the parent object. DesignFileGenerator comment lists how best to bind things to it, and the feature accessor is better refined to grab the genome loc. (old change) scala changes: convenience addAll( List[CommandLineFunction] ) added to QScript class (and thus removed from the fCPV2) useful command line functions added to a new library package for command line functions (these are fast simple VCF command lines) bug fixed in ProjectManagement for the class where there's only one batch to be batch-merged (not really part of the use-case, but an edge-condition that came up during pipeline testing) first draft of a private mutations pipeline which will be elaborated in future git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4823 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-12 05:10:45 +00:00
chartl	81290d238d	Restructuring my qscripts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4821 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-11 20:58:45 +00:00


249 Commits (7b452ea2b9ad2d2f3e8bbfcbfb0e818f0a87f18d)