carneiro
55e5971b3b
this is a oneoff script to clean the papuans and test TargetCreator and IndelRealigner with scatter gathering.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5457 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-17 17:09:53 +00:00
rpoplin
9c413fbc9e
not useful
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5450 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-15 22:47:55 +00:00
carneiro
42f70d9e07
join all per-lane Bams before doing target realigning and indel cleaning.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5435 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 16:11:03 +00:00
depristo
d01d4fdeb5
Optimized version of produce beagle tool, along with experimental (hidden) support for combining likelihoods depending on estimate false positive rate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5430 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 02:06:28 +00:00
fromer
0b45de14ed
Some minor updates to fully utilize the functionality of reduceByInterval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5411 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-09 20:38:08 +00:00
depristo
bf2e02f472
Generic, easy-to-use variant evaluation Queue script that tests indel and SNP call sets against standard evaluation data sets for sensitivity and specificity
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5391 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 18:03:29 +00:00
depristo
5c979633f0
Due to a problem in the way that dynamic type selection works, I've added an explicit (temporary) ability to restrict VE to specific variant types (SNPs, INDELs, etc), so that calculations will work when a site has a SNP in dbSNP but is called as an indel, causing the SNP site to mysteriously disappear from the comp track, a huge problem for validation report. VEU updated to allow both dynamic type (old) and just returning everything in the track.
...
Also, created a standard Queue script that calculates a suite of standard indel and SNP assessment results. Will be the basis for a general evaluation Queue script with standardized data files for SNPs and Indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5385 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 19:31:12 +00:00
chartl
a40a8006b5
Added in unit tests for the statistics calculated by the test runner; and bug-fixes to the calculations; so we have some assurance that the statistics coming out the back-end are correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5380 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 16:54:02 +00:00
chartl
9ca1dd5d62
Miscellaneous changes:
...
- RefMetaDataTracker: grabbing variant contexts given a prefix (not sure where else this was implemented, if someone can show me I'll remove it)
- VCFUtils: grabbing VCF headers given a prefix
- MathUtils: Useful functions for calculating statistics on collections of Numbers
- VariantAnnotator: Made isUniqueHeaderLine a public static method -- maybe this should go into a different class. Not sure.
- Associations: PluginManager now used to propagate classes, implementations for Z,T,U tests, slight alteration to format to make the objects stored
in the window optionally different from those returned by whatever statistic is run across the window
Added:
- MannWhitneyU. Started to fix up WilcoxonRankSum but there are comments in there questioning the validity of some of the code, and I'm sure that
it's actually doing a U test. This implementation includes the direct calculation of p-values for small sample sizes, and a uniform approximation
for when one of the sample sets is small, and the other large. Unit tests to follow.
- BootstrapCallsMerger: takes n VCFs which have been called on the same samples; merges them together while averaging the annotations
- BootstrapCalls.q: qscript for testing the effectiveness of boostrap low-pass calling on the exome
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5372 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 22:43:36 +00:00
carneiro
0daa65b9ef
quick and dirty 'close your eyes' solution to run the papuans over the weekend. Will be properly fixed soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5370 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 21:42:22 +00:00
chartl
0723b0f44c
Generalized association is now working. Output is in a horrific format. Implementation of T-testing. Improvements are to look for classes dynamically (a la VariantEval/VariantAnnotator), beautify output, and do optimizations where they exist.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5341 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 01:23:37 +00:00
carneiro
c7a51f0de7
fixed 1kg pilot dindel calls vcf file and combined all vcfs into one master dindel file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5335 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 19:04:58 +00:00
depristo
146756de79
Class name to reflect actual file name. manySampleUGPerformance now operates on 1000 samples!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5326 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-26 23:36:04 +00:00
chartl
b089d35b21
Fix expand intervals to do the right thing:
...
- No more duplicate intervals
- Truncation at intervals that already exist, e.g.
exists: |--------| |-------|
new: |---------|
fixed: |-----|
note that weird instances like:
exists: |-| |-| |-|
new: |---------------------|
fixed: |----|
e.g. you're truncated to the nearest interval on whatever side. In general many behaviors could happen in this instance, this is the one currently implemented.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5323 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-26 04:19:01 +00:00
carneiro
fd5d1f9cfc
minor cosmetic changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5322 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 21:56:35 +00:00
carneiro
81414a21dd
dpp: back to using 4gb memory assuming all is right with IndelRealigner now.
...
mdcp: Some class structural changes due to the inclusion of indel calls. ApplyCut now chooses the tranche differently for each dataset.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5319 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 19:21:02 +00:00
carneiro
6db3210387
the data processing pipeline needs more memory...
...
directory updates in the methods pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5305 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 17:22:58 +00:00
carneiro
897a333aba
Methods Development Pipeline now has the option of calling indels with the -indels parameter. Also updated some databases and the new NA12878 HiSeq hg19 that Tim just funneled to us, is updated and called.
...
Small fixes on the data processing pipeline
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5304 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 17:12:55 +00:00
carneiro
2a48ec1307
now only accepts intervals files if the user specifically requests to report bams at interval only.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5291 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 16:49:58 +00:00
carneiro
c61dd2f09f
data processing pipeline now has on the fly bam indexing (powered by Matt) some new parameters, Indel Cleaning with constrain movement and fixMates is gone.
...
setting up methods development pipeline for some cosmetic changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5277 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 23:13:54 +00:00
depristo
d97ed3e080
Comments for Mauricio
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5275 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 16:58:34 +00:00
carneiro
acad3ada06
changed baq to calculate_as_necessary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5270 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:50:46 +00:00
carneiro
7f9ca6b28a
full data processing pipeline, now deleting intermediate files and performing both phases (per lane and combined) of the processing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5269 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:34:00 +00:00
carneiro
497e9ab83b
too hasty... cleaning up debug messages ;)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5257 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 02:11:03 +00:00
carneiro
b4da843c49
now processes either a single bam file or a list of bam files in parallel.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5256 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 02:07:22 +00:00
carneiro
50c870cfce
Data Processing Pipeline: local indel realignment, mark duplicates and BQSR. Done.
...
Pacbio pipeline: now all pacbio bams have baq annotated in so running UG is uber fast.
Methods pipeline: minor cosmetic changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5253 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 17:22:30 +00:00
carneiro
6d3b878dde
data processing pipeline script already does:
...
. Local Indel Realignment
. Mark Duplicates
will do:
. Base Quality Score Recalibration (soon)
it's working with a single BAM for testing, but will work with a list of bam files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5250 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 21:49:05 +00:00
carneiro
87e19a17ae
small updates to the variant eval part of the pipeline, some updates to the pacbio specific pipeline.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5244 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 16:19:07 +00:00
fromer
d6e3f2eba6
Added GC content calculator for CNV data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5240 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 22:29:55 +00:00
carneiro
5f10fffa47
merge intervals now prints a sorted list in the end.
...
added the ccs datasets to the pbCalling pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5233 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 20:57:59 +00:00
carneiro
50c2fa3c3a
this -1 made ALL the difference in the world. Minor bug fix.
...
Regular updates to the pbCalling pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5232 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 19:25:09 +00:00
fromer
cdf53188d6
Updated DoC to work with scatter-gather; and, also manually implemented scatter-gather by sample above the scatter-gather by interval. Thansk to Khalid for his support!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5231 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 19:14:42 +00:00
carneiro
c630701a76
Following Ryan's suggestion, I am moving the Methods Development Calling pipeline to the Core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5226 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-10 17:36:05 +00:00
carneiro
9c2c5efe35
a modified version of the Methods Development calling pipeline made to work with pacbio data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5225 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-10 16:06:50 +00:00
fromer
947cc44854
Thanks to Matt for walking me through a proper version of VCF_BAM_utilities! Feel free to add to it, or use it to get the samples in a VCF file, a BAM file, or a collection of BAM files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5223 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-09 18:08:27 +00:00
kshakir
4d1cca95bb
Removed deprecated getDbsnpFile.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:12:15 +00:00
carneiro
e5cfc6ae74
NA12878 hg19 dataset was included to the methods pipeline. (and I am running it)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5217 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 16:17:46 +00:00
fromer
8d0f1b75d5
Added queue/util/BAMutilities Object [with BAM and VCF parsing utilities], which is now used by my qscripts that robustly split runs by sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5214 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 22:17:29 +00:00
fromer
3c1a026c94
Updated script to properly bin DoC values so that down-sampling corresponds to range of DoC values obtainable
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5208 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 16:47:55 +00:00
depristo
c4707631e2
MethodsDevelopmentPipeline is now the test bed for large scale AWS_S3 logging. Can be disabled from command line if this is necessary
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5203 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 17:03:45 +00:00
fromer
8b8b4fced1
Removed explicit memoryLimit, so that memLimit given on the command-line will NOT be ignored...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5202 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 01:55:17 +00:00
depristo
fe4aa58d35
Removing unused class
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5197 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:22:28 +00:00
fromer
4cdc974c5f
Preliminary Qscript to run DoC for the purpose of CNV detection
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5194 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 21:25:59 +00:00
carneiro
358a400474
made ApplyVariantCut a default part of the pipeline, added the -noCut option if you don't want to use it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5189 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 19:29:36 +00:00
carneiro
7af003666d
added optional argument -cut to apply the variant cut to the ts recalibrated vcf.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5183 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:34:40 +00:00
chartl
5398cf620a
Bug fixes in the in process function (spoiled by python: was not closing my writers). SortByRef now works somewhat like the perl script does, rather than doing a memory-expensive sort. Adding a QTools qscript which is kinda clunky, and will be used mostly for integration tests of these IPFs, pending some better way to construct argument collections and function accessors at compile-time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5182 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:32:46 +00:00
carneiro
cf15819db5
updated to work with the new VariantEval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5176 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 17:46:07 +00:00
rpoplin
47357b726e
Fixing import GenotypeCalculationModel since it doesn't exist anymore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5175 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 15:39:43 +00:00
fromer
7605f0e6c1
Corrected input/output definitions for Queue
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5173 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 07:39:00 +00:00
fromer
3839fd1a25
Updated phasing pipeline to properly read samples from VCF and BAM files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5172 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 07:16:05 +00:00