Commit Graph

248 Commits (b1ff371c8f70bd8070fdb57825826540cbe96f37)

Author SHA1 Message Date
carneiro 7af003666d added optional argument -cut to apply the variant cut to the ts recalibrated vcf.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5183 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:34:40 +00:00
chartl 5398cf620a Bug fixes in the in process function (spoiled by python: was not closing my writers). SortByRef now works somewhat like the perl script does, rather than doing a memory-expensive sort. Adding a QTools qscript which is kinda clunky, and will be used mostly for integration tests of these IPFs, pending some better way to construct argument collections and function accessors at compile-time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5182 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:32:46 +00:00
chartl a9d0921529 That variable name could only lead to trouble.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5180 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 05:03:48 +00:00
chartl 9515f94242 Committing a simple merge IPF for use with qscripts (currently use a long grep, awk, pipe command, which can be unsafe and is hard to extend). Tests for all these functions coming soon. Also, IntelliJ + intermittent VPN connection = botched repository.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5179 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 05:01:21 +00:00
carneiro cf15819db5 updated to work with the new VariantEval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5176 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 17:46:07 +00:00
rpoplin 47357b726e Fixing import GenotypeCalculationModel since it doesn't exist anymore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5175 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 15:39:43 +00:00
fromer 7605f0e6c1 Corrected input/output definitions for Queue
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5173 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 07:39:00 +00:00
fromer 3839fd1a25 Updated phasing pipeline to properly read samples from VCF and BAM files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5172 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 07:16:05 +00:00
fromer 798955b006 After discussing with Mark, revert to "Master merging" of phase information from VCFs. This has the advantage of creating minimal phased VCFs from RBP, from which phase info is merged into the original "master VCF". Also, updated Genotype.sameGenotype() to be simpler and NOT REVERSE the ignorePhase flag in comparing Allele lists/sets
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5167 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 19:50:15 +00:00
fromer a89400b20c Simple implementation to retrieve relevant BAM files for each sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5152 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 00:03:03 +00:00
kshakir e74f28ad89 If there's an LSF queue maximum time limit set and the user hasn't specified one for this job, pass on the queue defined maximum limit with the job.
Updated LibBatIntegrationTest to use proper networked temp directory accessible by local machines and nodes.
Disabling the FCPTest until the VE3 is incorporated into the fullCallingPipeline.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5151 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 23:13:09 +00:00
fromer f258363cfc Minor bug fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5150 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 22:29:28 +00:00
fromer 742bd44728 Changed output file to be user-defined
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5149 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 22:15:26 +00:00
fromer 6c99dc4dab Take (partial) ownership of phasing 1000G chr20 calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5147 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 21:49:41 +00:00
chartl 4d9bc84bd5 Initial commit of in-process helper functions for making the BCM more robust
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5144 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 19:18:31 +00:00
kshakir d4f744a4d4 Checking if the interval files exist before using them to calculate the minimum scatter parts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5143 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 18:07:34 +00:00
kshakir 57353294cc Copying jobLimitSeconds to clones.
Some cleanup and refactoring around copying values to clones.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5128 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-30 06:35:53 +00:00
kshakir e19b5d17b4 Related to last checkin, need to create the directory when writing the yaml the first time after an ant clean.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5127 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-29 20:45:44 +00:00
kshakir 23578b7402 Pipeline tests will only start from scratch after "ant clean", making it faster to debug downstream issues when re-running "ant pipelinetest -Dpipeline.run=run".
Updated the FCP, the test, and the ADPR to handle an issue with the ADPR locating the yaml generated by the FCPTest.
Does not solve the ADPR error: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5126 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-29 19:44:03 +00:00
kshakir b0a3c70f90 Updated paths to new bams.
Metrics of the new bams have changed slightly but should still fall within test tolerances.
Will reset metrics in a later checkin after confirming changes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5125 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-29 10:55:26 +00:00
kshakir 4ee4fd47e9 Moved the test name and the job queue into the spec.
Defaulting to the hour queue for running pipeline tests.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5122 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-29 00:07:25 +00:00
kshakir 2ef66af903 Moved the maximum number of intervals check from FCP to the Queue core so that scatter gather will no longer blow up if you specify a scatter count that is too high.
Moved the BamListWriter from FCP to ListWriterFunction in the Queue core.
Added an ExampleCountLoci QScript along with an example pipeline integration test which checks MD5s.
Added a few more utility methods to PipelineTest including a currentGATK variable that points to the GATK jar.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5121 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 23:33:58 +00:00
corin b25d131481 updated to work with the new tearsheet
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5113 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 18:49:11 +00:00
carneiro cae4b9b0de quick update with the correct CEU trio bam file and its final location.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5098 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 19:17:19 +00:00
ebanks 68729045ca Always best to use the left-aligned version of the dbsnp vcf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5091 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 20:21:50 +00:00
kshakir df2e7bd355 Disabled FCPTest whilst we figure out where the C426 bams went.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5078 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 05:11:57 +00:00
kshakir ce5b11317b Moved some shutdown logic from the LSF job runner into the QGraph.
Because of Java's type erasure JobManagers must provide runtime access to the runner class to shutdown.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5076 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 20:28:54 +00:00
kshakir b3c9b9bfbe +1 file that should have been with the last checkin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5069 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 05:31:17 +00:00
kshakir 9923e05e0a Moved MD5 utils from WalkerTest to BaseTest for use by PipelineTests.
Moved VariantEval validation from FCPTest to PipelineTest.
Cleaned up some duplicate code for writing temp files during tests.
Moved FCPTest to playground namespace to match move for FCP.q.
Added a basic HelloWorldPipelineTest for the HelloWorld QScript. 
Moved duplicated error handling from JobRunners into the FunctionEdge.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5068 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 04:11:49 +00:00
kshakir 76ee57639d Updated FCPTest to match changes to UG in r5058.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5066 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 19:30:02 +00:00
delangel fa0c476b82 Script for calling indels in all phase 1 samples - VQSR part still needs work but raw calling is done
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5052 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-22 14:07:10 +00:00
carneiro a0731eaa81 updated NA12878 Trio gold standard data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5048 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:48:31 +00:00
depristo 94b64ec54a Moving scala script into analysis directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5047 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:42:18 +00:00
depristo b45566760e intermediate checkin
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5045 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:39:25 +00:00
kshakir 6fbd18c759 Cleaning up obsolete code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5044 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 16:27:35 +00:00
kshakir 8d46cf3604 Testing a configuration change for build system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5043 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 14:44:41 +00:00
rpoplin b6497c404f Moving Phase1Calling qscript over to using the cleaned, pre-BAQed bams
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5039 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 02:41:20 +00:00
carneiro fc73569d62 Added NA12878 Trio dataset to the pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5037 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 23:15:33 +00:00
kshakir 8855f080c2 For the fullCallingPipeline.q:
 - Reading the refseq table from the YAML if not specified on the command line.
 - Removed obsolete -bigMemQueue now that CombineVariants runs in 4g.
 - Added a -mountDir /broad/software option to work around adpr automount issues.
 - Merged the LSF preexec used for automount into the shell script used to execute tasks.
 - Using the LSF C Library to determine when jobs are complete instead of postexec.
 - Updated queue.sh to match the changes above.
 - Updated the FCPTest to match the changes above.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5036 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 22:34:43 +00:00
depristo 41c8552d0a Added implements HasGenomeLocation to all relevant classes. It's now possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:54:03 +00:00
kshakir 4d611e53e7 Passing the ADPR R script to FCPTest.
Changed the FCP.q to use an InProcessFunction to work around the -runDir issue GSA-420.
Tested the FCPTest using the following dotkits and "ant clean pipelinetest -Dpipeline.run=run":
  - R-2.11
  - Oracle-full-client
  - .cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5029 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 06:08:45 +00:00
kshakir acc2f1c9fe Updated FCPTest to use the new path to fullCallingPipeline.q changed in r5017.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5027 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 21:43:43 +00:00
corin 2824e8224c removes unused titv argument
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5025 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 20:49:12 +00:00
corin 50fcebb0c4 Incorporates tearsheet and plot production with database access into standard pipeline. Note that the following dotkit packages must be run before the adpr will be correctly generated:
R-2.10, 
Oracle-full-client, 
cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1

This also removes the unused titv argument


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5024 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 20:48:42 +00:00
rpoplin 55eb0387ac Another relevant qscript. I use this one to do thousands of variant recalibration jobs to search for optimal parameters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5019 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 18:17:32 +00:00
chartl a463dbcda1 Refactoring the qscript directory; oneoffs, playground, and core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5017 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 15:23:40 +00:00
rpoplin 7db9601c9d Checking in the 1000G phase1 cleaning and calling scripts for posterity's sake, but also to show everyone what the current best practices for VQSR training look like.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5015 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 14:32:52 +00:00
rpoplin 457c59e737 Use the sites-only HapMap files in the Methods development pipeline
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5013 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 20:50:09 +00:00
rpoplin 00453919d2 VQSR now only uses the valid polymorphic sites for training and truth sensitivity calculations. Any number of tracks whose ROD binding begins with the name truth can be used as truth sensitivity tracks.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5012 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 20:48:19 +00:00
carneiro 35a4f1e366 .Added VariantEval as an optional step in the pipeline.
.Lifted to HapMap 3.3
.Lifted to dbSNP 132 where possible.
.Added the CEU-Trio WEx(hg19) dataset 
.Added some options to the pipeline

You can now use : 

-dataset WEX
-dataset HiSeq
...

to choose which datasets to run through the pipeline.

You can now run without BAQ and indel mask:

-noBAQ 
-noMASK

Choose not to run the gold standard comparison analysis:

-skipGoldStandard

Activate the VariantEval walker analysis on the Recalibrated vcf:

-eval

The default behavior is to run exactly like it used to, so this version shouldn't change the way you used to use the pipeline.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5004 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 21:55:02 +00:00