122 Commits (4c147329a9e20fc62651a1e43d19ff07d365c9f6)

Author SHA1 Message Date
chartl c1263e841c stop printing the debug info -- hurr
Also it turns out that sometimes there can be a call with zero total non-I/non-D bases -- so add one to numerator and denominator to prevent divide by zero error



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2262 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 16:17:38 +00:00
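
The add-one guard described in the commit above can be sketched roughly as follows. This is only an illustration of the idea, not the script's actual code: the function name `smoothed_fraction` and its two arguments are hypothetical.

```python
def smoothed_fraction(matching_bases, total_bases):
    """Return (matching + 1) / (total + 1).

    Adding one to both numerator and denominator keeps the ratio defined
    when a call has zero total non-I/non-D bases. All names here are
    hypothetical and chosen for illustration only.
    """
    return (matching_bases + 1) / float(total_bases + 1)


print(smoothed_fraction(0, 0))  # defined, instead of raising ZeroDivisionError
print(smoothed_fraction(3, 9))
```

The trade-off is a small bias in the ratio, which matters most when the counts are small.
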
chartl 0c2d6d7e41 A brute-force script to convert Syzygy lod-score calls files into a proper VCF -- with some useful annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2261 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 16:07:06 +00:00
depristo 2c7cb912f0 Bug fixes for mixed none/valued attributes. Also now assigns fake float values for display, if requested, for covariates using the -plottable flag
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2253 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 23:52:35 +00:00
depristo dbb8b86ed1 Minor updates to correctly handle emitting FN calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2231 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 22:53:17 +00:00
rpoplin 67179e2412 Initial checkin of AnalyzeCovariates.java which replaces analyzeRecalQuals_1KG.py and is updated to use the new Covariates system. It creates similar plots of residual error for each covariate that was used in the calculation. There is also an option to filter out base qualities below a given threshold.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2215 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 16:47:35 +00:00
depristo 8a87d5add1 misc. bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2212 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 14:36:03 +00:00
depristo c93d37d9fb continuing improvements in output of snpSelector
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2198 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 15:42:06 +00:00
depristo 2ea93385be Better support for comparison to truth. Now emits FP rates for each covariate if a truth file is provided. Also now writes out a detailed recal.log file that can be parsed directly into R
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2179 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 22:20:40 +00:00
chartl 662bbbd53b Awful stupid bug. This will use up one of my bad code offsets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2178 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 20:09:33 +00:00
chartl fa2d564f2c And the compulsory one-second-later fix -- better handling of arguments (e.g. for calling from outside of /trunk/python/)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2177 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 20:02:43 +00:00
chartl 45673d7851 A quick and dirty script that, given a list of input VCF files, will output a new VCF file which looks identical to the first VCF file of the input list, except that the info field has been updated to reflect the union of all the INFO annotations across the VCF files
Note: this is primarily for use with two files with mostly disjoint annotations. It views "SB=2.5" as a different info field than "SB=2.2" and so will output as info "SB=2.5;SB=2.2". That is, it compares the full field string, rather than only the field name.

Usage:

./mergeVCFInfoFields I=[comma-delimited list of files] O=[output file]



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2176 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 20:01:29 +00:00
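
The union behaviour described in the commit above, including the full-string comparison that keeps both "SB=2.5" and "SB=2.2", might look roughly like this. It is a minimal sketch under stated assumptions, not the code of mergeVCFInfoFields: the helper name `merge_info_fields` is hypothetical, and it works on INFO strings that have already been pulled out of the VCF records.

```python
def merge_info_fields(info_strings):
    """Union of INFO entries across files, keeping first-seen order.

    Entries are compared as full "key=value" strings, so the same key
    with different values is kept twice. Hypothetical helper.
    """
    seen = set()
    merged = []
    for info in info_strings:
        for entry in info.split(";"):
            if entry and entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return ";".join(merged)


print(merge_info_fields(["DP=10;SB=2.5", "SB=2.2;AF=0.1"]))
# DP=10;SB=2.5;SB=2.2;AF=0.1
```

Comparing only the field name (the text before `=`) would avoid the duplicated key, at the cost of having to choose which value wins.
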
depristo 65da04ca85 Now uses the theoretically correct relationship between SNP FP and TP ratios for Illumina data. maxQ score for a snp is now 60
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2168 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 22:08:12 +00:00
depristo 03342c1fdd Restructuring and interface change to ReadBackedPileup. We no longer support the Pileup interface, the BasicPileup static methods, and the ReadBackedPileup class. Now everything is a ReadBackedPileup and all methods to manipulate pileups are off of it. Also provides the recommended iterable() interface of pileup elements so you can use the syntax for (PileupElement p : pileup) and access directly from p.getBase() and p.getQual() and p.getSecondBase(). Only a few straggler walkers use the old style interface -- but those walkers will be retired soon. Documentation coming in the AM. Please, everyone, use the new syntax: it's safer and will be more efficient as soon as the LocusIteratorByState directly emits the ReadBackedPileup for the Alignment context, as opposed to the current interface. In the process of the changeover, discovered several bugs in the second-best base code due to things getting out of sync, but these changes were resolved manually. All other integration tests passed without modification.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2154 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 03:51:41 +00:00
depristo bc35a34f60 More informative printing, no longer prints tons of NaN warnings
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2139 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 15:45:48 +00:00
depristo da7de9960b General bug fixes for snpSelector. More robust error checking and handling of NaN values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2106 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-21 14:48:29 +00:00
depristo 52494d8176 cleanup of SNP selector -- ready for some additional testing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2042 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 21:46:31 +00:00
depristo 1a4d071d37 Better snpSelector, plus VCFmerge tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2022 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 22:02:57 +00:00
depristo 3990c6d950 snpSelector v3 -- bootstrapping support and VCF output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2004 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 22:48:51 +00:00
depristo f777c806d6 snpSelector v2 -- code refactoring and support for comparison with known truth. Looks great.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1986 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 19:32:12 +00:00
depristo 7cb51dbc31 snpSelector v1 -- and supporting changes to VCF reader
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1983 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 23:00:46 +00:00
chartl eca0942644 Oops. Let's make sure only to write calls that the pool supports to the auxiliary vcf files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1974 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 17:14:55 +00:00
chartl fc17e75759 Put this puppy through its paces. Eliminated the sorting and header-handling stuff; that isn't the purview of this script and should be handled downstream or by a script wrapper.
I also secretly handled another pesky overflow exception. Occasionally Syzygy could report lods of like -1000; e.g. posterior probabilities of one in one (((googol) googol) googol) googol which of course makes python blow up. Now we safely output an accurate posterior.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1971 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 06:05:45 +00:00
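
One common way to avoid the kind of overflow mentioned in the commit above is to only ever raise 10 to a non-positive power. The sketch below assumes that a LOD score is the log10 odds of the call, lod = log10(p / (1 - p)); that definition, and the function name `posterior_from_lod`, are assumptions made for illustration and are not taken from the script.

```python
def posterior_from_lod(lod):
    """Convert a log10-odds score to a probability without overflow.

    Assumes lod = log10(p / (1 - p)); this definition is an assumption
    for illustration, not taken from the conversion script.
    """
    if lod >= 0:
        # 10 ** (-lod) is at most 1: it may underflow to 0.0 but cannot overflow.
        return 1.0 / (1.0 + 10.0 ** (-lod))
    # For negative lod, the naive 10 ** (-lod) could raise OverflowError,
    # so work with the odds 10 ** lod (below 1) instead.
    odds = 10.0 ** lod
    return odds / (1.0 + odds)


print(posterior_from_lod(-1000))  # no OverflowError
print(posterior_from_lod(0))
```

With ordinary floats a LOD as extreme as -1000 underflows to zero under this scheme; keeping the result in log space would be the alternative if the tiny value itself has to be preserved.
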
chartl 3d9195f8b6 Added - converter from expanded summary to VCF (beautiful thing, really)
Removed - the ugly hackjob that was expanded summary to Geli



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1970 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 22:20:47 +00:00
depristo d60c632099 Minor output improvement
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1965 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 13:20:55 +00:00
depristo 44ea55d338 Useful library for parsing VCF files, plus a general VCF->table converter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1964 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 13:14:04 +00:00
chartl 99337df929 Now looks up and propagates Syzygy's LOD scores into the appropriate field (so variantfiltration can adjust lod scores accurately)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1950 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 21:13:03 +00:00
chartl 7654051aee Faster grepping
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1948 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 16:59:17 +00:00
chartl 4319ff0610 A python script that will convert pooled expanded summary files (from Jason Flannick's pipeline) into .geli files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1947 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 16:39:57 +00:00
depristo c1e1d910cb simple monitor for watching pilot 1 call progress
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1769 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 13:04:53 +00:00
depristo de9f2b11da Detects unmapped (no bai) bam files and doesn't blow up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1725 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 12:56:28 +00:00
ebanks 8349004414 Generalize the regexp for analysis files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1714 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:17:41 +00:00
depristo 3a341b2f06 Fixes for VariantEval for genotyping mode
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:01:43 +00:00
andrewk d191e02c88 Automated parsing stats from VariantEval and outputting stats to "*.oneline_stats" files; needed to do larger culling of predictions vs. actual SNP call for Pilot 3 lanes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1620 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:40:11 +00:00
andrewk e06e31d99f Many generalization improvements - parameters, files as options - to script that runs pieces for predicting SNP calling performance for given SNP calling coverage
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1619 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:37:53 +00:00
andrewk 7eb21e55c1 Added die_on_fail which outputs an error message and stops execution if a farm or terminal command fails
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1618 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:30:13 +00:00
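
A helper with the behaviour described for die_on_fail could look something like the sketch below. Only the name comes from the commit message: the signature, the error message format, and the use of `subprocess.call` are assumptions, and farm (batch) submission is not modelled here.

```python
import subprocess
import sys


def die_on_fail(cmd, description="command"):
    """Run cmd (a list of arguments); stop execution if it fails.

    Hypothetical reconstruction: prints an error message to stderr and
    exits with status 1 when the command returns a non-zero exit status.
    """
    status = subprocess.call(cmd)
    if status != 0:
        sys.stderr.write(
            "ERROR: %s failed with exit status %d: %s\n" % (description, status, cmd)
        )
        sys.exit(1)
    return status
```

A call such as `die_on_fail([sys.executable, "-c", "pass"], "sanity check")` returns 0 and lets the pipeline continue, while any failing command ends the run at that step.
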
depristo 6e13a36059 Framework for ROD walkers -- totally experimental and not working right now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1600 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:13:15 +00:00
ebanks 70ec37661c Fix merger command
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1584 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 13:13:23 +00:00
ebanks 45c794d066 pipeline is complete
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1583 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 13:09:37 +00:00
ebanks 8c33dd2393 enable job names
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1582 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 13:08:53 +00:00
depristo fc0d9578f6 better feedback now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1579 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 12:43:45 +00:00
depristo c988205884 Notes for Aaron in SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1576 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:18:51 +00:00
depristo ec0f6f23c7 LocusIteratorByState is now the system default. Fixed Aaron's build problem
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 01:28:05 +00:00
ebanks 2a6f3a03c9 update script to put pilot1 bams directly onto hphome
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1547 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 14:41:35 +00:00
ebanks e716f9337d A few more additions; almost done...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1541 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 01:50:22 +00:00
ebanks 5dbba6711c Lots of changes: (I'll send email out in a sec)
1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run through it).
2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mix up values (like I had been doing).
3) Have indel rod print samples


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 01:12:09 +00:00
depristo 1c3d67f0f3 Improvements to the CountCovariates and TableRecalibrator, as well as regression tests for SLX and 454 data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 22:26:57 +00:00
ebanks 3ac5ac066f Checking in Michael's DoC parameterization script;
this functionality will eventually be moved into VariantFiltration


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1515 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 15:07:49 +00:00
ebanks d804a119dc script to run the complete pilot2 pipeline: from cleaning to calling to filtering
[not quite finished though]


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1512 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 14:35:55 +00:00
depristo b01ac9de0c High performance LocusIterator implementation. Now with greatly reduced memory impact and 2x (and more potentially) speed ups of raw locus iteration. General performance improvements to SSG with empirical probs. You can enable high-performance locus iteration with the -LIBS arg. It's still being tested but passes validating pileup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1510 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 03:06:25 +00:00
andrewk 2402dcd4c9 Give usage message if no arguments provided.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1483 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 00:28:43 +00:00