carneiro
c61dd2f09f
data processing pipeline now has on-the-fly bam indexing (powered by Matt), some new parameters, and Indel Cleaning with constrain movement; fixMates is gone.
...
setting up methods development pipeline for some cosmetic changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5277 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 23:13:54 +00:00
depristo
d97ed3e080
Comments for Mauricio
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5275 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 16:58:34 +00:00
carneiro
acad3ada06
changed baq to calculate_as_necessary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5270 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:50:46 +00:00
carneiro
7f9ca6b28a
full data processing pipeline, now deleting intermediate files and performing both phases (per lane and combined) of the processing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5269 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:34:00 +00:00
carneiro
497e9ab83b
too hasty... cleaning up debug messages ;)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5257 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 02:11:03 +00:00
carneiro
b4da843c49
now processes either a single bam file or a list of bam files in parallel.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5256 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 02:07:22 +00:00
carneiro
50c870cfce
Data Processing Pipeline: local indel realignment, mark duplicates and BQSR. Done.
...
Pacbio pipeline: now all pacbio bams have baq annotated in so running UG is uber fast.
Methods pipeline: minor cosmetic changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5253 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 17:22:30 +00:00
carneiro
6d3b878dde
data processing pipeline script already does:
...
. Local Indel Realignment
. Mark Duplicates
will do:
. Base Quality Score Recalibration (soon)
it's working with a single BAM for testing, but will work with a list of bam files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5250 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 21:49:05 +00:00
carneiro
87e19a17ae
small updates to the variant eval part of the pipeline, some updates to the pacbio specific pipeline.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5244 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 16:19:07 +00:00
fromer
d6e3f2eba6
Added GC content calculator for CNV data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5240 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 22:29:55 +00:00
carneiro
5f10fffa47
merge intervals now prints a sorted list at the end.
...
added the ccs datasets to the pbCalling pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5233 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 20:57:59 +00:00
carneiro
50c2fa3c3a
this -1 made ALL the difference in the world. Minor bug fix.
...
Regular updates to the pbCalling pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5232 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 19:25:09 +00:00
fromer
cdf53188d6
Updated DoC to work with scatter-gather; also manually implemented scatter-gather by sample above the scatter-gather by interval. Thanks to Khalid for his support!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5231 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 19:14:42 +00:00
carneiro
c630701a76
Following Ryan's suggestion, I am moving the Methods Development Calling pipeline to the Core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5226 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-10 17:36:05 +00:00
carneiro
9c2c5efe35
a modified version of the Methods Development calling pipeline made to work with pacbio data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5225 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-10 16:06:50 +00:00
fromer
947cc44854
Thanks to Matt for walking me through a proper version of VCF_BAM_utilities! Feel free to add to it, or use it to get the samples in a VCF file, a BAM file, or a collection of BAM files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5223 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-09 18:08:27 +00:00
kshakir
4d1cca95bb
Removed deprecated getDbsnpFile.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:12:15 +00:00
carneiro
e5cfc6ae74
NA12878 hg19 dataset was added to the methods pipeline. (and I am running it)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5217 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 16:17:46 +00:00
fromer
8d0f1b75d5
Added queue/util/BAMutilities Object [with BAM and VCF parsing utilities], which is now used by my qscripts that robustly split runs by sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5214 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 22:17:29 +00:00
fromer
3c1a026c94
Updated script to properly bin DoC values so that down-sampling corresponds to range of DoC values obtainable
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5208 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 16:47:55 +00:00
depristo
c4707631e2
MethodsDevelopmentPipeline is now the test bed for large scale AWS_S3 logging. Can be disabled from command line if this is necessary
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5203 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 17:03:45 +00:00
fromer
8b8b4fced1
Removed explicit memoryLimit, so that memLimit given on the command-line will NOT be ignored...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5202 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 01:55:17 +00:00
depristo
fe4aa58d35
Removing unused class
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5197 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:22:28 +00:00
fromer
4cdc974c5f
Preliminary Qscript to run DoC for the purpose of CNV detection
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5194 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 21:25:59 +00:00
carneiro
358a400474
made ApplyVariantCut a default part of the pipeline, added the -noCut option if you don't want to use it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5189 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 19:29:36 +00:00
carneiro
7af003666d
added optional argument -cut to apply the variant cut to the ts recalibrated vcf.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5183 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:34:40 +00:00
chartl
5398cf620a
Bug fixes in the in process function (spoiled by python: was not closing my writers). SortByRef now works somewhat like the perl script does, rather than doing a memory-expensive sort. Adding a QTools qscript which is kinda clunky, and will be used mostly for integration tests of these IPFs, pending some better way to construct argument collections and function accessors at compile-time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5182 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:32:46 +00:00
carneiro
cf15819db5
updated to work with the new VariantEval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5176 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 17:46:07 +00:00
rpoplin
47357b726e
Fixing import GenotypeCalculationModel since it doesn't exist anymore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5175 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 15:39:43 +00:00
fromer
7605f0e6c1
Corrected input/output definitions for Queue
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5173 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 07:39:00 +00:00
fromer
3839fd1a25
Updated phasing pipeline to properly read samples from VCF and BAM files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5172 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 07:16:05 +00:00
fromer
798955b006
After discussing with Mark, revert to "Master merging" of phase information from VCFs. This has the advantage of creating minimal phased VCFs from RBP, from which phase info is merged into the original "master VCF". Also, updated Genotype.sameGenotype() to be simpler and NOT REVERSE the ignorePhase flag in comparing Allele lists/sets
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5167 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 19:50:15 +00:00
fromer
a89400b20c
Simple implementation to retrieve relevant BAM files for each sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5152 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 00:03:03 +00:00
fromer
f258363cfc
Minor bug fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5150 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 22:29:28 +00:00
fromer
742bd44728
Changed output file to be user-defined
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5149 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 22:15:26 +00:00
fromer
6c99dc4dab
Take (partial) ownership of phasing 1000G chr20 calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5147 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 21:49:41 +00:00
carneiro
cae4b9b0de
quick update with the correct CEU trio bam file and its final location.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5098 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 19:17:19 +00:00
ebanks
68729045ca
Always best to use the left-aligned version of the dbsnp vcf
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5091 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 20:21:50 +00:00
delangel
fa0c476b82
Script for calling indels in all phase 1 samples - VQSR part still needs work but raw calling is done
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5052 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-22 14:07:10 +00:00
carneiro
a0731eaa81
updated NA12878 Trio gold standard data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5048 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:48:31 +00:00
depristo
94b64ec54a
Moving scala script into analysis directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5047 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:42:18 +00:00
depristo
b45566760e
intermediate checkin
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5045 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 18:39:25 +00:00
rpoplin
b6497c404f
Moving Phase1Calling qscript over to using the cleaned, pre-BAQed bams
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5039 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 02:41:20 +00:00
carneiro
fc73569d62
Added NA12878 Trio dataset to the pipeline.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5037 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 23:15:33 +00:00
depristo
41c8552d0a
Added implements HasGenomeLocation to all relevant classes. It's now possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:54:03 +00:00
rpoplin
55eb0387ac
Another relevant qscript. I use this one to do thousands of variant recalibration jobs to search for optimal parameters.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5019 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 18:17:32 +00:00
chartl
a463dbcda1
Refactoring the qscript directory; oneoffs, playground, and core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5017 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 15:23:40 +00:00