Commit Graph

182 Commits (d3660aa00e3becb26d1f321ee2ff94ff8a8cfe28)

Author SHA1 Message Date
carneiro 5f10fffa47 merge intervals now prints a sorted list in the end.
added the ccs datasets to the pbCalling pipeline.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5233 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 20:57:59 +00:00
carneiro 50c2fa3c3a this -1 made ALL the difference in the world. Minor bug fix.
Regular updates to the pbCalling pipeline.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5232 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 19:25:09 +00:00
fromer cdf53188d6 Updated DoC to work with scatter-gather; and, also manually implemented scatter-gather by sample above the scatter-gather by interval. Thansk to Khalid for his support!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5231 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 19:14:42 +00:00
carneiro c630701a76 Following Ryan's suggestion, I am moving the Methods Development Calling pipeline to the Core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5226 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-10 17:36:05 +00:00
carneiro 9c2c5efe35 a modified version of the Methods Development calling pipeline made to work with pacbio data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5225 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-10 16:06:50 +00:00
fromer 947cc44854 Thanks to Matt for walking me through a proper version of VCF_BAM_utilities! Feel free to add to it, or use it to get the samples in a VCF file, a BAM file, or a collection of BAM files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5223 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-09 18:08:27 +00:00
kshakir 4d1cca95bb Removed deprecated getDbsnpFile.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:12:15 +00:00
carneiro e5cfc6ae74 NA12878 hg19 dataset was included to the methods pipeline. (and I am running it)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5217 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 16:17:46 +00:00
fromer 8d0f1b75d5 Added queue/util/BAMutilities Object [with BAM and VCF parsing utilities], which is now used by my qscripts that robustly split runs by sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5214 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 22:17:29 +00:00
kshakir 8040998c15 Renamed the pipeline yaml dbsnpFile to genotypeDbsnp, and added an evalDbsnp.
Added a genotypeDbsnpType and evalDbsnpType to check the extensions for .vcf or .rod.
Moved renaming of "recalibrated" bams to "cleaned" from sed to yaml generation template (see diff for more info).
Renamed fCP.q to FCP.q.
Though it's still disabled until VariantEval is updated, added changes above to the FCPTest.
Removed refseq table from the queue.sh wrapper script. Only specified in the yaml.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5213 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 22:01:09 +00:00
fromer 3c1a026c94 Updated script to properly bin DoC values so that down-sampling corresponds to range of DoC values obtainable
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5208 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 16:47:55 +00:00
depristo c4707631e2 MethodsDevelopmentPipeline is now the test bed for large scale AWS_S3 logging. Can be disabled from command line if this is necessary
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5203 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 17:03:45 +00:00
fromer 8b8b4fced1 Removed explicit memoryLimit, so that memLimit given on the command-line will NOT be ignored...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5202 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 01:55:17 +00:00
depristo fe4aa58d35 Removing unused class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5197 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:22:28 +00:00
fromer 4cdc974c5f Preliminary Qscript to run DoC for the purpose of CNV detection
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5194 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 21:25:59 +00:00
corin cd6ace1b47 Includes UG version of indel genotyping rather than IGV2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5191 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 20:25:46 +00:00
carneiro 358a400474 made ApplyVariantCut a default part of the pipeline, added the -noCut option if you don't want to use it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5189 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 19:29:36 +00:00
carneiro 7af003666d added optional argument -cut to apply the variant cut to the ts recalibrated vcf.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5183 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:34:40 +00:00
chartl 5398cf620a Bug fixes in the in process function (spoiled by python: was not closing my writers). SortByRef now works somewhat like the perl script does, rather than doing a memory-expensive sort. Adding a QTools qscript which is kinda clunky, and will be used mostly for integration tests of these IPFs, pending some better way to construct argument collections and function accessors at compile-time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5182 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:32:46 +00:00
carneiro cf15819db5 updated to work with the new VariantEval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5176 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 17:46:07 +00:00
rpoplin 47357b726e Fixing import GenotypeCalculationModel since it doesn't exist anymore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5175 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 15:39:43 +00:00
fromer 7605f0e6c1 Corrected input/output definitions for Queue
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5173 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 07:39:00 +00:00
fromer 3839fd1a25 Updated phasing pipeline to properly read samples from VCF and BAM files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5172 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 07:16:05 +00:00
fromer 798955b006 After discussing with Mark, revert to "Master merging" of phase information from VCFs. This has the advantage of creating minimal phased VCFs from RBP, from which phase info is merged into the original "master VCF". Also, updated Genotype.sameGenotype() to be simpler and NOT REVERSE the ignorePhase flag in comparing Allele lists/sets
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5167 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 19:50:15 +00:00
fromer a89400b20c Simple implementation to retrieve relevant BAM files for each sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5152 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 00:03:03 +00:00
fromer f258363cfc Minor bug fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5150 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 22:29:28 +00:00
fromer 742bd44728 Changed output file to be user-defined
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5149 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 22:15:26 +00:00
fromer 6c99dc4dab Take (partial) ownership of phasing 1000G chr20 calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5147 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 21:49:41 +00:00
kshakir 23578b7402 Pipeline tests will only start from scratch after "ant clean", making it faster to debug downstream issues when re-running "ant pipelinetest -Dpipeline.run=run".
Updated the FCP, the test, and the ADPR to handle an issue with the ADPR locating the yaml generated by the FCPTest.
Does not solve the ADPR error: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5126 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-29 19:44:03 +00:00
kshakir 2ef66af903 Moved the maximum number of intervals check from FCP to the Queue core so that scatter gather will no longer blow up if you specify a scatter count that is too high.
Moved the BamListWriter from FCP to ListWriterFunction in the Queue core.
Added an ExampleCountLoci QScript along with an example pipeline integration test which checks MD5s.
Added a few more utility methods to PipelineTest including a currentGATK variable that points to the GATK jar.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5121 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 23:33:58 +00:00
corin b25d131481 updated to work with the new tearsheet
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5113 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 18:49:11 +00:00
carneiro cae4b9b0de quick update with the correct CEU trio bam file and it's final location.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5098 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 19:17:19 +00:00
ebanks 68729045ca Always best to use the left-aligned version of the dbsnp vcf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5091 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 20:21:50 +00:00
delangel fa0c476b82 Script for calling indels in all phase 1 samples - VQSR part still needs work but raw calling is done
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5052 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-22 14:07:10 +00:00
carneiro a0731eaa81 updated NA12878 Trio gold standard data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5048 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:48:31 +00:00
depristo 94b64ec54a Moving scala script into analysis directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5047 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:42:18 +00:00
depristo b45566760e intermediate checkin
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5045 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:39:25 +00:00
rpoplin b6497c404f Moving Phase1Calling qscript over to using the cleaned, pre-BAQed bams
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5039 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 02:41:20 +00:00
carneiro fc73569d62 Added NA12878 Trio dataset to the pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5037 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 23:15:33 +00:00
kshakir 8855f080c2 For the fullCallingPipeline.q:
- Reading the refseq table from the YAML if not specified on the command line.
 - Removed obsolete -bigMemQueue now that CombineVariants runs in 4g.
 - Added a -mountDir /broad/software option to work around adpr automount issues.
 - Merged the LSF preexec used for automount into the shell script used to execute tasks.
 - Using the LSF C Library to determine when jobs are complete instead of postexec.
 - Updated queue.sh to match the changes above.
 - Updated the FCPTest to match the changes above.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5036 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 22:34:43 +00:00
depristo 41c8552d0a Added implements HasGenomeLocation to all revelant classes. It's not possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:54:03 +00:00
kshakir 4d611e53e7 Passing the ADPR R script to FCPTest.
Changed the FCP.q to use an InProcessFunction work around the -runDir issue GSA-420.
Tested the FCPTest using the following dotkits and "ant clean pipelinetest -Dpipeline.run=run":
  - R-2.11
  - Oracle-full-client
  - .cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5029 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 06:08:45 +00:00
corin 50fcebb0c4 Incorporates tearsheet and plot production with database access into standard pipeline. Note that the following dotkit packages must be run before the adpr will be correctly generated:
R-2.10, 
Oracle-full-client, 
cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1

This also removes the unused titv argument


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5024 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 20:48:42 +00:00
rpoplin 55eb0387ac Another relevant qscript. I use this one to do thousands of variant recalibration jobs to search for optimal parameters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5019 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 18:17:32 +00:00
chartl a463dbcda1 Refactoring the qscript directory; oneoffs, playground, and core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5017 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 15:23:40 +00:00
rpoplin 7db9601c9d Checking in the 1000G phase1 cleaning and calling scripts for posterity's sake, but also to show everyone what the current best practices for VQSR training looks like.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5015 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 14:32:52 +00:00
rpoplin 457c59e737 Use the sites-only HapMap files in the Methods development pipeline
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5013 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 20:50:09 +00:00
carneiro 35a4f1e366 .Added VariantEval as an optional step in the pipeline.
.Lifted to HapMap 3.3
.Lifted to dbSNP 132 where possible.
.Added the CEU-Trio WEx(hg19) dataset 
.Added some options to the pipeline

You can now use : 

-dataset WEX
-dataset HiSeq
...

to choose which datasets to run through the pipeline.

You can now without BAQ and indel mask:

-noBAQ 
-noMASK

Choose not to run the gold standard comparison analysis:

-skipGoldStandard

Activate the VariantEval walker analysis on the Recalibrated vcf:

-eval

The default behavior is to run exactly like it used to, so this version shouldn't change the way you used to use the pipeline.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5004 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 21:55:02 +00:00
carneiro c4f9b262e5 removing the tech dev pipeline script from the repository to keep the methods development pipeline as the reference script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4992 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 18:15:55 +00:00
carneiro 9e93091e9a -baqGOP now takes phred scaled scores instead of probabilities in the command line.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4982 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 00:06:38 +00:00