104 Commits (95d381efe23a97cba4d3c7b50906e37fd7ce86f8)

Author SHA1 Message Date
depristo f777c806d6 snpSelector v2 -- code refactoring and support for comparison with known truth. Looks great.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1986 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 19:32:12 +00:00
depristo 7cb51dbc31 snpSelector v1 -- and supporting changes to VCF reader
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1983 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 23:00:46 +00:00
chartl eca0942644 Oops. Let's make sure only to write calls that the pool supports to the auxiliary vcf files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1974 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 17:14:55 +00:00
chartl fc17e75759 Put this puppy through its paces. Eliminated the sorting and header-handling stuff; that isn't the purview of this script and should be handled downstream or by a script wrapper.
I also secretly handled another pesky overflow exception. Occasionally Syzygy could report LODs as low as -1000, i.e. posterior probabilities of one in one (((googol) googol) googol) googol, which of course makes Python blow up. Now we safely output an accurate posterior.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1971 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 06:05:45 +00:00
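The overflow described in the commit above can be illustrated with a minimal sketch. It is not the script's actual code: it assumes the LOD is a log10 odds ratio and that the posterior is odds / (1 + odds), and the function name `posterior_from_lod` is hypothetical. Under those assumptions, the naive form `1.0 / (1.0 + 10.0 ** 1000)` raises `OverflowError` in Python, while the rearranged form below only ever exponentiates a non-positive number.

```python
def posterior_from_lod(lod):
    """Convert a log10 odds score into a probability without overflowing.

    Assumption (not from the original script): lod = log10(p / (1 - p)).
    """
    if lod >= 0:
        # 10 ** -lod is at most 1 here; it may underflow to 0.0 but cannot overflow.
        return 1.0 / (1.0 + 10.0 ** (-lod))
    # For negative LODs, work with the small odds value directly.
    odds = 10.0 ** lod
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for lod in (-1000.0, 0.0, 3.0):
        print(lod, posterior_from_lod(lod))
```

For extremely negative LODs the result underflows to zero rather than raising, which is one reasonable reading of "safely output"; a script that needs the exact magnitude would have to keep the value in log space instead.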
chartl 3d9195f8b6 Added - converter from expanded summary to VCF (beautiful thing, really)
Removed - the ugly hackjob that was expanded summary to Geli



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1970 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 22:20:47 +00:00
depristo d60c632099 Minor output improvement
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1965 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 13:20:55 +00:00
depristo 44ea55d338 Useful library for parsing VCF files, plus a general VCF->table converter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1964 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 13:14:04 +00:00
chartl 99337df929 Now looks up and propagates Syzygy's LOD scores into the appropriate field (so variantfiltration can adjust lod scores accurately)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1950 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 21:13:03 +00:00
chartl 7654051aee Faster grepping
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1948 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 16:59:17 +00:00
chartl 4319ff0610 A python script that will convert pooled expanded summary files (from Jason Flannick's pipeline) into .geli files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1947 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 16:39:57 +00:00
depristo c1e1d910cb simple monitor for watching pilot 1 call progress
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1769 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 13:04:53 +00:00
depristo de9f2b11da Detects unmapped (no bai) bam files and doesn't blow up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1725 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 12:56:28 +00:00
ebanks 8349004414 Generalize the regexp for analysis files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1714 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:17:41 +00:00
depristo 3a341b2f06 Fixes for VariantEval for genotyping mode
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:01:43 +00:00
andrewk d191e02c88 Automated parsing stats from VariantEval and outputting stats to "*.oneline_stats" files; needed to do larger culling of predictions vs. actual SNP call for Pilot 3 lanes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1620 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:40:11 +00:00
andrewk e06e31d99f Many generalization improvements - parameters, files as options - to script that runs pieces for predicting SNP calling performance for given SNP calling coverage
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1619 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:37:53 +00:00
andrewk 7eb21e55c1 Added die_on_fail which outputs an error message and stops execution if a farm or terminal command fails
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1618 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:30:13 +00:00
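A helper like the `die_on_fail` mentioned in the commit above might look roughly like the following. This is only a sketch in present-day Python: the original signature, the error message format, and how farm jobs were dispatched are not shown in this log, so everything beyond the name `die_on_fail` is an assumption.

```python
import subprocess
import sys


def die_on_fail(cmd):
    """Run a shell command; print an error and stop execution if it fails.

    Hypothetical reconstruction: the real helper's arguments are unknown.
    """
    result = subprocess.run(cmd, shell=True)
    if result.returncode != 0:
        sys.stderr.write("FAILED (exit %d): %s\n" % (result.returncode, cmd))
        # sys.exit raises SystemExit, which ends the script unless it is caught.
        sys.exit(result.returncode)
    return result.returncode


if __name__ == "__main__":
    die_on_fail("echo step one succeeded")
    die_on_fail("exit 3")  # prints the error message and stops here
    print("never reached")
```

Exiting with the command's own return code lets a calling wrapper see which step failed.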
depristo 6e13a36059 Framework for ROD walkers -- totally experimental and not working right now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1600 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:13:15 +00:00
ebanks 70ec37661c Fix merger command
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1584 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 13:13:23 +00:00
ebanks 45c794d066 pipeline is complete
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1583 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 13:09:37 +00:00
ebanks 8c33dd2393 enable job names
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1582 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 13:08:53 +00:00
depristo fc0d9578f6 better feedback now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1579 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 12:43:45 +00:00
depristo c988205884 Notes for Aaron in SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1576 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:18:51 +00:00
depristo ec0f6f23c7 LocusIterationByState is now the system default. Fixed Aaron's build problem
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 01:28:05 +00:00
ebanks 2a6f3a03c9 update script to put pilot1 bams directly onto hphome
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1547 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 14:41:35 +00:00
ebanks e716f9337d A few more additions; almost done...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1541 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 01:50:22 +00:00
ebanks 5dbba6711c Lots of changes: (I'll send email out in a sec)
1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run through it).
2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mix up values (like I had been doing).
3) Have indel rod print samples


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 01:12:09 +00:00
depristo 1c3d67f0f3 Improvements to the CountCovariates and TableRecalibrator, as well as regression tests for SLX and 454 data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 22:26:57 +00:00
ebanks 3ac5ac066f Checking in Michael's DoC parameterization script;
this functionality will eventually be moved into VariantFiltration


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1515 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 15:07:49 +00:00
ebanks d804a119dc script to run the complete pilot2 pipeline: from cleaning to calling to filtering
[not quite finished though]


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1512 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 14:35:55 +00:00
depristo b01ac9de0c High performance LocusIterator implementation. Now with greatly reduced memory impact and 2x (and potentially more) speedups of raw locus iteration. General performance improvements to SSG with empirical probs. You can enable high-performance locus iteration with the -LIBS arg. It's still in testing but passes validating pileup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1510 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 03:06:25 +00:00
andrewk 2402dcd4c9 Give usage message if no arguments provided.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1483 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 00:28:43 +00:00
andrewk ee05ddde16 Added command line options to make the barcode analysis script executable by end users.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1455 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-24 21:15:09 +00:00
ebanks 0e7c158949 I've pulled out the functionality of the analyzer into a single python file which doesn't require all of the irrelevant config parameters (which would cause problems for external users). I'll release this and the simple config file to 1KG for use in analyzing recalibration efforts.
Please note that this is literally my first foray into the wonderful world of python.  There could very well be a much more elegant way of releasing the script to external users without having to duplicate the file.  If so, anyone out there should (please) feel free to do so in a second release; but, for now, this needs to be online by tomorrow morning.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1404 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 02:56:43 +00:00
andrewk afccbc44ec Script that performs all the processing steps from raw Illumina reads through to analysis of barcoding and hybrid selection efficiency as documented in the GATK tutorial; can automatically run all steps in series on the farm.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1354 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 00:22:53 +00:00
andrewk eb4b9a743a Script that runs most of the steps involved in validating the CoverageEval system that predicts performance for given depth of sequencing coverage across a genome.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1353 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 00:18:45 +00:00
andrewk efd0fd1f0a Short python script that takes paired-end BAMs and aligns them with BWA. Referenced in GSA wiki tutorial
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1351 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 00:04:10 +00:00
andrewk 1c648a2d5f Skip compiled python files (*.pyc) in svn status output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1346 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 21:45:23 +00:00
depristo d665d9714f By default now writes output to JOBID.lsf.output instead of going to email -- based on recommendations from the cancer group
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1325 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 13:18:58 +00:00
andrewk 00f9bcd6d1 CoverageEval.py tool right before some major changes to the core of the code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1293 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 16:58:23 +00:00
depristo 702cdd087f Actually listens to justPrint now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1253 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 16:52:46 +00:00
hanna c25f84a01c Regression: we lost our hack to work around BAM files with index problems (affects BAM files created before 23 Apr 2009 and traversed by interval). Added the hack back in, along with a much more explicit comment about why it's there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1248 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:41:37 +00:00
depristo 84d407ff3f Fixing odd merge problem with VariantEval -- better cluster analysis (no cumsum), rodVariant is now an AllelicVariant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1239 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 18:53:27 +00:00
andrewk c8fcecbc6f Added ParseDCCSequenceData.py to repository and made changes that allow an analysis of quantity of sequence data by platform and project, moved table / record system to a new module called FlatFileTable.py and built that into ParseDCCSequenceData and CoverageEval.py; changed lod threshold in CoverageEvalWalker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1201 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 22:04:26 +00:00
andrewk d3daecfc4d Added unit tests for function in ListUtils to randomly sample lists with replacement, updated AlleleFrequencyEstimate to provide a callType of HomRef, HetSNP, HomSNP, updated indices in CoverageEval.py, and made a lot of changes to CoverageWalker, the biggest one being that it directly calls SingleSampleGenotyper instead of implementing some parts of SSG itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1189 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 02:05:40 +00:00
depristo f5b00c20d0 Updated python files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1182 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 14:15:39 +00:00
andrewk dcb8892568 Lots of code for coverage evaluation tools including first version of python script to evaluate the downsampled SSG calls made and the java code to make all the calls at Hapmap chip sites at various downsampling levels; ListUtils contains functions for randomly subsetting lists (with replacement) which are useful for subsetting the same elements in both the reads and the offsets lists of a LocusWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1162 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 08:07:02 +00:00
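The idea in the commit above, subsetting the same elements from two parallel lists, can be sketched as follows. The original ListUtils is Java and its method names are not given in this log; the Python functions below (`sample_indices_with_replacement`, `subset_parallel`) are hypothetical stand-ins that show the technique only: draw the indices once, then apply them to both lists so each read stays paired with its offset.

```python
import random


def sample_indices_with_replacement(n, k, rng=random):
    """Draw k indices from range(n); the same index may appear more than once."""
    return [rng.randrange(n) for _ in range(k)]


def subset_parallel(reads, offsets, k, rng=random):
    """Pick the same k randomly chosen positions from two parallel lists."""
    if len(reads) != len(offsets):
        raise ValueError("reads and offsets must have the same length")
    indices = sample_indices_with_replacement(len(reads), k, rng)
    # Reusing one index list keeps reads[i] aligned with offsets[i].
    return [reads[i] for i in indices], [offsets[i] for i in indices]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    reads = ["readA", "readB", "readC", "readD"]
    offsets = [10, 20, 30, 40]
    print(subset_parallel(reads, offsets, 6, rng))
```

Sampling each list independently would break the pairing, which is why the indices are generated once and shared.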
depristo 5289230eb8 Version 0.2.1 (released) of the TableRecalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:50:55 +00:00
depristo caf5aef0f8 Much improved python analysis routines, as well as easier / more correct merging utility. Better R scripts, which now close recalibration data by the confidence of the quality score itself
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1081 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 01:12:35 +00:00
depristo c9e6cb72e1 Major improvements to python analysis code -- now computes a host of statistics about quality scores from the recal_data.csv file emitted by countcovariates. Includes average Q scores, medians, modes, stdev, coefficients of variation, RMSE, and % bases > q10, q20, q30. Can finally quantify and compare the improvement of quality score recalibration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1064 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-20 19:50:37 +00:00