depristo
8ece2b9230
Distributed GATK analysis scripts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5049 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 22:09:07 +00:00
carneiro
a0731eaa81
updated NA12878 Trio gold standard data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5048 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:48:31 +00:00
depristo
94b64ec54a
Moving scala script into analysis directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5047 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:42:18 +00:00
depristo
63e8103c4e
A new top-level directory to hold analysis scripts associated with specific analyses
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5046 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:40:02 +00:00
depristo
b45566760e
intermediate checkin
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5045 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:39:25 +00:00
kshakir
6fbd18c759
Cleaning up obsolete code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5044 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 16:27:35 +00:00
kshakir
8d46cf3604
Testing a configuration change for build system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5043 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 14:44:41 +00:00
ebanks
78a43faebe
Adding options to warn instead of erroring out (so that you can see all errors in one shot) and to skip filtered records
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5042 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 05:24:28 +00:00
ebanks
02b5d4357f
Deprecated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5041 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 05:05:07 +00:00
ebanks
c3dbbe7f91
Bug fix: don't assume users won't use arbitrary rods on the commandline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5040 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 04:59:28 +00:00
rpoplin
b6497c404f
Moving Phase1Calling qscript over to using the cleaned, pre-BAQed bams
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5039 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 02:41:20 +00:00
hanna
aea121a9d5
<key>=<value> tagging support for command-line arguments. Unfortunately, still very hard to validate and still very hard to use (requires core hacking to support additional tags).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5038 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 00:22:42 +00:00
carneiro
fc73569d62
Added NA12878 Trio dataset to the pipeline.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5037 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 23:15:33 +00:00
kshakir
8855f080c2
For the fullCallingPipeline.q:
...
- Reading the refseq table from the YAML if not specified on the command line.
- Removed obsolete -bigMemQueue now that CombineVariants runs in 4g.
- Added a -mountDir /broad/software option to work around adpr automount issues.
- Merged the LSF preexec used for automount into the shell script used to execute tasks.
- Using the LSF C Library to determine when jobs are complete instead of postexec.
- Updated queue.sh to match the changes above.
- Updated the FCPTest to match the changes above.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5036 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 22:34:43 +00:00
hanna
8831ec3dce
Some refactoring and cleanup around the area of my sleep-deprived integration test typo, which Khalid already fixed for me. Sorry, Khalid!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5035 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 15:03:14 +00:00
kshakir
3022f4dfa0
Fixed missing space character in testSimpleVCFStreaming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5034 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 14:49:38 +00:00
depristo
e4ac1e6171
Removing unused file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5033 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 13:03:55 +00:00
depristo
85553cf5cb
V2: cleaner, easily testable shared-memory and distributed GATK job management. Serious unit testing. Much cleaner processing. Some code cleanup remains in removing now-unused classes, but the system is ready for general testing. Confirmed that one can run the UG 100 ways parallel without error, but edge cases may remain.
...
See documentation at:
http://www.broadinstitute.org/gsa/wiki/index.php/Parallelism_and_the_GATK#Distributed_Parallelism_.28Experimental.29
for examples on how to run this, or the testing Scala script
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5032 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:58:13 +00:00
depristo
41c8552d0a
Added implements HasGenomeLocation to all relevant classes. It's now possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:54:03 +00:00
depristo
cacdac3914
Major refactoring of shards. No longer uses interfaces but is now an actual object hierarchy with most of the important and common functionality pushed up to base classes. Eliminated a lot of duplicated code, and the shards are much more understandable now. Shards also now require a GenomeLocParser to work with their own GenomeLocs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5030 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:36:56 +00:00
kshakir
4d611e53e7
Passing the ADPR R script to FCPTest.
...
Changed the FCP.q to use an InProcessFunction to work around the -runDir issue GSA-420.
Tested the FCPTest using the following dotkits and "ant clean pipelinetest -Dpipeline.run=run":
- R-2.11
- Oracle-full-client
- .cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5029 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 06:08:45 +00:00
hanna
7087c2f422
Very simple integration tests for basic VCF streaming functionality.
...
Rather than try to fork the integration test process to get a pipe source
and sink, creates a new named pipe by Runtime.exec()ing the 'mkfifo' shell
command. We'll see whether this proves to be a reliable method for testing
streaming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5028 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 04:38:54 +00:00
kshakir
acc2f1c9fe
Updated FCPTest to use the new path to fullCallingPipeline.q changed in r5017.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5027 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 21:43:43 +00:00
corin
22582960be
Adding the backdrop to the current version of the tearsheet so it is always available
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5026 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 20:56:11 +00:00
corin
2824e8224c
removes unused titv argument
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5025 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 20:49:12 +00:00
corin
50fcebb0c4
Incorporates tearsheet and plot production with database access into standard pipeline. Note that the following dotkit packages must be run before the adpr will be correctly generated:
...
R-2.10,
Oracle-full-client,
cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1
This also removes the unused titv argument
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5024 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 20:48:42 +00:00
kshakir
2b895ffb7f
Updated the HG19 reference from v0 to v1 after the v0 was zeroed out.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5023 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 20:30:25 +00:00
corin
af60666f5d
An example template for the shell script used to run the full calling pipeline on the Broad system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5022 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 20:28:59 +00:00
corin
dfcd45181a
Some minor tweaks to the tearsheet generator that incorporate the gsalib more universally and create a more accurate output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5021 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 20:27:02 +00:00
kshakir
c901fb6d70
Now populating the refseq and dbsnp in awk instead of retrieving from firehose.
...
Added refseq table to the pipeline object.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5020 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 18:19:10 +00:00
rpoplin
55eb0387ac
Another relevant qscript. I use this one to do thousands of variant recalibration jobs to search for optimal parameters.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5019 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 18:17:32 +00:00
chartl
82f81a83c2
Mauricio shall not escape my vengeance!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5018 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 15:24:37 +00:00
chartl
a463dbcda1
Refactoring the qscript directory; oneoffs, playground, and core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5017 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 15:23:40 +00:00
rpoplin
24bc843ae8
Dynamically change the log message update rate so that short jobs receive frequent updates while longer running jobs receive fewer updates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5016 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 15:09:11 +00:00
rpoplin
7db9601c9d
Checking in the 1000G phase1 cleaning and calling scripts for posterity's sake, but also to show everyone what the current best practices for VQSR training look like.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5015 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 14:32:52 +00:00
rpoplin
bd2af33a16
misc clean up in VQSR
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5014 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 21:04:31 +00:00
rpoplin
457c59e737
Use the sites-only HapMap files in the Methods development pipeline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5013 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 20:50:09 +00:00
rpoplin
00453919d2
VQSR now only uses the valid polymorphic sites for training and truth sensitivity calculations. Any number of tracks whose ROD binding begins with the name truth can be used as truth sensitivity tracks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5012 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 20:48:19 +00:00
fromer
4bec93e3e4
Permit retrieval of read names for debugging purposes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5011 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 16:09:34 +00:00
depristo
f8ba76d87c
Incremental commit for distributed computation. Appears to work but has a potential deadlock situation not yet debugged. Do not use yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5010 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-17 21:23:09 +00:00
kiran
2f4a436719
Throw an exception if no eval rods are specified. If one or more samples are specified, subset the 'all' VariantContext to just the specified samples. This is useful when you want to see what effect dropping certain samples will have on the metrics and you don't want to go through SelectVariants first.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5009 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-17 06:46:10 +00:00
ebanks
366c3a0b8f
Incompatible chain files are user exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5008 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-16 05:26:47 +00:00
depristo
cc4e193670
Removal of empty directories
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5007 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-15 22:43:51 +00:00
depristo
a88708ebfa
Moving GLF code to archive
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5006 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-15 22:42:42 +00:00
depristo
797e07b0c3
Archived home of GLF code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5005 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-15 22:37:20 +00:00
carneiro
35a4f1e366
.Added VariantEval as an optional step in the pipeline.
...
.Lifted to HapMap 3.3
.Lifted to dbSNP 132 where possible.
.Added the CEU-Trio WEx(hg19) dataset
.Added some options to the pipeline
You can now use:
-dataset WEX
-dataset HiSeq
...
to choose which datasets to run through the pipeline.
You can now run without BAQ and indel mask:
-noBAQ
-noMASK
Choose not to run the gold standard comparison analysis:
-skipGoldStandard
Activate the VariantEval walker analysis on the Recalibrated vcf:
-eval
The default behavior is to run exactly like it used to, so this version shouldn't change the way you used to use the pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5004 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 21:55:02 +00:00
hanna
579e0d59fa
Rewrote warning message to discourage use of unsafe mode.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5003 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 21:32:53 +00:00
kshakir
2355a55067
Added a QFunction.jobLimitSeconds for experimentation, currently only used as the equivalent of bsub -W.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5002 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 19:45:03 +00:00
hanna
af31d02a2d
Fix concurrency issue that periodically kills VariantEvalIntegrationTest -- a member field of RMDTrackBuilder was getting rebuilt every time it was called, creating concurrency issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5001 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 18:52:21 +00:00
kiran
73acfa654a
Fixed double-counting bug. Fixed issue where evaluation module with an update2() method wasn't getting called if the comp track was null. Added a column to the output report indicating the table name for easy greppability. Fixed an issue where, if sample-level stratification was not required, the sample-level VCs would be generated anyway.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5000 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 14:06:43 +00:00