depristo
da7de9960b
General bug fixes for snpSelector. More robust error checking and handling of NaN values.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2106 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-21 14:48:29 +00:00
depristo
52494d8176
cleanup of SNP selector -- ready for some additional testing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2042 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 21:46:31 +00:00
depristo
1a4d071d37
Better snpSelector, plus VCFmerge tool
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2022 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 22:02:57 +00:00
depristo
3990c6d950
snpSelector v3 -- bootstrapping support and VCF output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2004 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 22:48:51 +00:00
depristo
f777c806d6
snpSelector v2 -- code refactoring and support for comparison with known truth. Looks great.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1986 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 19:32:12 +00:00
depristo
7cb51dbc31
snpSelector v1 -- and supporting changes to VCF reader
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1983 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 23:00:46 +00:00
chartl
eca0942644
Oops. Let's make sure only to write calls that the pool supports to the auxiliary vcf files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1974 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 17:14:55 +00:00
chartl
fc17e75759
Put this puppy through its paces. Eliminated the sorting and header-handling stuff; that isn't the purview of this script and should be handled downstream or by a script wrapper.
...
I also secretly handled another pesky overflow exception. Occasionally Syzygy could report LODs of around -1000, i.e. posterior probabilities of one in one (((googol) googol) googol) googol, which of course makes Python blow up. Now we safely output an accurate posterior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1971 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 06:05:45 +00:00
chartl
3d9195f8b6
Added - converter from expanded summary to VCF (beautiful thing, really)
...
Removed - the ugly hackjob that was expanded summary to Geli
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1970 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 22:20:47 +00:00
depristo
d60c632099
Minor output improvement
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1965 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 13:20:55 +00:00
depristo
44ea55d338
Useful library for parsing VCF files, plus a general VCF->table converter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1964 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 13:14:04 +00:00
chartl
99337df929
Now looks up and propagates Syzygy's LOD scores into the appropriate field (so variantfiltration can adjust lod scores accurately)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1950 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 21:13:03 +00:00
chartl
7654051aee
Faster grepping
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1948 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 16:59:17 +00:00
chartl
4319ff0610
A python script that will convert pooled expanded summary files (from Jason Flannick's pipeline) into .geli files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1947 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 16:39:57 +00:00
depristo
c1e1d910cb
simple monitor for watching pilot 1 call progress
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1769 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 13:04:53 +00:00
depristo
de9f2b11da
Detects unmapped (no bai) bam files and doesn't blow up
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1725 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 12:56:28 +00:00
ebanks
8349004414
Generalize the regexp for analysis files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1714 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:17:41 +00:00
depristo
3a341b2f06
Fixes for VariantEval for genotyping mode
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:01:43 +00:00
andrewk
d191e02c88
Automated parsing of stats from VariantEval and output of those stats to "*.oneline_stats" files; needed to do larger culling of predictions vs. actual SNP calls for Pilot 3 lanes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1620 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:40:11 +00:00
andrewk
e06e31d99f
Many generalization improvements - parameters, files as options - to script that runs pieces for predicting SNP calling performance for given SNP calling coverage
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1619 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:37:53 +00:00
andrewk
7eb21e55c1
Added die_on_fail which outputs an error message and stops execution if a farm or terminal command fails
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1618 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:30:13 +00:00
depristo
6e13a36059
Framework for ROD walkers -- totally experimental and not working right now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1600 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:13:15 +00:00
ebanks
70ec37661c
Fix merger command
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1584 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 13:13:23 +00:00
ebanks
45c794d066
pipeline is complete
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1583 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 13:09:37 +00:00
ebanks
8c33dd2393
enable job names
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1582 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 13:08:53 +00:00
depristo
fc0d9578f6
better feedback now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1579 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 12:43:45 +00:00
depristo
c988205884
Notes for Aaron in SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1576 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:18:51 +00:00
depristo
ec0f6f23c7
LocusIterationByState is now the system default. Fixed Aaron's build problem
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 01:28:05 +00:00
ebanks
2a6f3a03c9
update script to put pilot1 bams directly onto hphome
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1547 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 14:41:35 +00:00
ebanks
e716f9337d
A few more additions; almost done...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1541 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 01:50:22 +00:00
ebanks
5dbba6711c
Lots of changes: (I'll send email out in a sec)
...
1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run through it).
2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mix up values (like I had been doing).
3) Have indel rod print samples
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 01:12:09 +00:00
depristo
1c3d67f0f3
Improvements to the CountCovariates and TableRecalibrator, as well as regression tests for SLX and 454 data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 22:26:57 +00:00
ebanks
3ac5ac066f
Checking in Michael's DoC parameterization script;
...
this functionality will eventually be moved into VariantFiltration
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1515 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 15:07:49 +00:00
ebanks
d804a119dc
script to run the complete pilot2 pipeline: from cleaning to calling to filtering
...
[not quite finished though]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1512 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 14:35:55 +00:00
depristo
b01ac9de0c
High performance LocusIterator implementation. Now with greatly reduced memory impact and 2x (and potentially more) speedups of raw locus iteration. General performance improvements to SSG with empirical probs. You can enable high-performance locus iteration with the -LIBS arg. It's still being tested but passes validating pileup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1510 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 03:06:25 +00:00
andrewk
2402dcd4c9
Give usage message if no arguments provided.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1483 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 00:28:43 +00:00
andrewk
ee05ddde16
Added command line options to make the barcode analysis script executable by end users.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1455 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-24 21:15:09 +00:00
ebanks
0e7c158949
I've pulled out the functionality of the analyzer into a single python file which doesn't require all of the irrelevant config parameters (which would cause problems for external users). I'll release this and the simple config file to 1KG for use in analyzing recalibration efforts.
...
Please note that this is literally my first foray into the wonderful world of python. There could very well be a much more elegant way of releasing the script to external users without having to duplicate the file. If so, anyone out there should (please) feel free to do so in a second release; but, for now, this needs to be online by tomorrow morning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1404 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 02:56:43 +00:00
andrewk
afccbc44ec
Script that performs all the processing steps from raw Illumina reads through to analysis of barcoding and hybrid selection efficiency as documented in the GATK tutorial; can automatically run all steps in series on the farm.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1354 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 00:22:53 +00:00
andrewk
eb4b9a743a
Script that runs most of the steps involved in validating the CoverageEval system that predicts performance for given depth of sequencing coverage across a genome.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1353 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 00:18:45 +00:00
andrewk
efd0fd1f0a
Short python script that takes paired-end BAMs and aligns them with BWA. Referenced in GSA wiki tutorial
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1351 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 00:04:10 +00:00
andrewk
1c648a2d5f
Skip compiled python files (*.pyc) in svn status output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1346 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 21:45:23 +00:00
depristo
d665d9714f
By default now writes output to JOBID.lsf.output instead of going to email -- based on recommendations from the cancer group
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1325 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 13:18:58 +00:00
andrewk
00f9bcd6d1
CoverageEval.py tool right before some major changes to the core of the code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1293 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 16:58:23 +00:00
depristo
702cdd087f
Actually listens to justPrint now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1253 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 16:52:46 +00:00
hanna
c25f84a01c
Regression: we lost our hack to work around BAM files with index problems (affects BAM files created before 23 Apr 2009 and traversed by interval). Added the hack back in, along with a much more explicit comment about why it's there.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1248 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:41:37 +00:00
depristo
84d407ff3f
Fixing odd merge problem with VariantEval -- better cluster analysis (no cumsum), rodVariant is now an AllelicVariant
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1239 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 18:53:27 +00:00
andrewk
c8fcecbc6f
Added ParseDCCSequenceData.py to repository and made changes that allow an analysis of quantity of sequence data by platform and project, moved table / record system to a new module called FlatFileTable.py and built that into ParseDCCSequenceData and CoverageEval.py; changed lod threshold in CoverageEvalWalker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1201 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 22:04:26 +00:00
andrewk
d3daecfc4d
Added unit tests for the function in ListUtils that randomly samples lists with replacement; updated AlleleFrequencyEstimate to provide a callType of HomRef, HetSNP, or HomSNP; updated indices in CoverageEval.py; and made a lot of changes to CoverageWalker, the biggest being that it now directly calls SingleSampleGenotyper instead of implementing some parts of SSG itself.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1189 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 02:05:40 +00:00
depristo
f5b00c20d0
Updated python files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1182 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 14:15:39 +00:00