Commit Graph

140 Commits (c8c5c176cd8e4fca1db3ecb5cfda7807a1fbf649)

Author SHA1 Message Date
chartl dfe160ff77 Minor changes (additional info calculated)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2522 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 16:41:01 +00:00
depristo 588006ee92 Now supports strings in command line for farm submission
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2507 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 13:15:40 +00:00
depristo 9fb6533549 new -a option does fast merging of already sorted files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2500 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 13:55:39 +00:00
depristo 89f3ee614a minor printing fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2492 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 22:14:50 +00:00
depristo fcc80e8632 Completely rewritten duplicate traversal, more free of bugs, with integration tests for count duplicates walker validated on a TCGA hybrid capture lane.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2458 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:56:49 +00:00
andrewk 4e7e0432a2 Updated SNP calling power from coverage tools to work with new UnifiedGenotyper and DepthOfCoverage tools.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2378 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 20:44:30 +00:00
andrewk f5e547ed6e Add ability for flat file table parsing module to skip ahead to first occurrence of a regular expression (use case: consistently parsing DepthOfCoverage output for histogram section of file across file format changes)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2377 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 20:38:50 +00:00
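A minimal sketch of the "skip ahead to first occurrence of a regular expression" idea from commit f5e547ed6e, assuming the parser consumes an iterable of text lines. The function name `skip_to_first_match` and the sample lines are hypothetical and are not taken from the actual module.

```python
import re
from typing import Iterable, Iterator


def skip_to_first_match(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield lines starting at the first line that matches `pattern`.

    Hypothetical helper: everything before the first match is discarded,
    so a table parser can start at a known section header even when the
    preamble of the file changes between format versions.
    """
    regex = re.compile(pattern)
    found = False
    for line in lines:
        if not found and regex.search(line):
            found = True
        if found:
            yield line


# Illustrative input only; not real DepthOfCoverage output.
sample = ["summary line", "another line", "depth count", "0 12", "1 30"]
table_lines = list(skip_to_first_match(sample, r"^depth\s+count"))
```

If the pattern never matches, the generator yields nothing, which lets the caller detect a missing section.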
andrewk bf76019f22 Minor change to coverage evaluation script, to update for new file format and add output fields
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2375 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 18:06:08 +00:00
depristo 0d2a761460 Bugfix for minBaseQuality to ignore deletion reads. LocusMismatch walker now allows us to skip every nth eligible site
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2357 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 14:38:39 +00:00
depristo 56467df49a minor improvements to snpSelector to work with hapmap chip VCF files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2343 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 17:59:32 +00:00
depristo b2dfe85648 Better support for reading truth file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2307 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 12:16:05 +00:00
chartl 6a4118ad3c grr, ought to actually assign it to the TRUTH_CALLS variable
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2302 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 23:31:46 +00:00
chartl 987fced151 Should read truth data from the parser options rather than direct from args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2301 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 23:26:26 +00:00
chartl 8825211fdb Adding this to subversion so it's protected
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2299 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 21:26:17 +00:00
depristo 2632cb6b58 minor improvements to snp selector
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2275 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 03:37:14 +00:00
chartl b817db0962 Syzygy has a default LOD score of 0.91 on bases with no coverage, this is problematic. Set the minimum lod threshold to 1 because I just don't want to see that codswallop.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2268 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 23:29:14 +00:00
depristo 0753315156 updates to the python snp selector -- now sorts info fields and we stop printing unnecessary debugging info in vcf2table
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2265 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 22:16:02 +00:00
chartl 0f89a38473 forgot to commit this earlier
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2264 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 22:10:16 +00:00
chartl c1263e841c stop printing the debug info -- hurr
Also it turns out that sometimes there can be a call with zero total non-I/non-D bases -- so add one to numerator and denominator to prevent divide by zero error
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2262 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 16:17:38 +00:00
chartl 0c2d6d7e41 A brute-force script to convert Syzygy lod-score calls files into a proper VCF -- with some useful annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2261 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 16:07:06 +00:00
depristo 2c7cb912f0 Bug fixes for mixed none/valued attributes. also now assigns fake float values for display, if requested, for covariates using the -plottable flag
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2253 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 23:52:35 +00:00
depristo dbb8b86ed1 Minor updates to correctly handle emitting FN calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2231 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 22:53:17 +00:00
rpoplin 67179e2412 Initial checkin of AnalyzeCovariates.java which replaces analyzeRecalQuals_1KG.py and is updated to use the new Covariates system. It creates similar plots of residual error for each covariate that was used in the calculation. There is also an option to filter out base qualities below a given threshold.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2215 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 16:47:35 +00:00
depristo 8a87d5add1 misc. bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2212 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 14:36:03 +00:00
depristo c93d37d9fb continuing improvements in output of snpSelector
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2198 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 15:42:06 +00:00
depristo 2ea93385be Better support for comparison to truth. Now emits FP rates for each covariate if a truth file is provided. Also now writes out a detailed recal.log file that can be parsed directly into R
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2179 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 22:20:40 +00:00
chartl 662bbbd53b Awful stupid bug. This will use up one of my bad code offsets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2178 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 20:09:33 +00:00
chartl fa2d564f2c And the compulsory one-second-later fix -- better handling of arguments (e.g. for calling from outside of /trunk/python/)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2177 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 20:02:43 +00:00
chartl 45673d7851 A quick and dirty script that, given a list of input VCF files, will output a new VCF file which looks identical to the first VCF file of the input list, except that the info field has been updated to reflect the union of all the INFO annotations across the VCF files
Note: this is primarily for use with two files with mostly disjoint annotations. It views "SB=2.5" as a different info field than "SB=2.2" and so will output as info "SB=2.5;SB=2.2". That is, it compares the full field string, rather than only the field name.

Usage:

./mergeVCFInfoFields I=[comma-delimited list of files] O=[output file]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2176 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 20:01:29 +00:00
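The INFO-merging behavior described in commit 45673d7851 can be sketched roughly as follows. This is an illustration under stated assumptions, not the script itself: the function name `merge_info_fields` is hypothetical, it operates on INFO strings only (not whole VCF records), and the first-seen output order is an assumption. It does follow the commit message in comparing the full `KEY=VALUE` string rather than only the key.

```python
from typing import Iterable


def merge_info_fields(info_strings: Iterable[str]) -> str:
    """Union the semicolon-separated INFO entries of several records.

    Hypothetical sketch: entries are compared as whole strings, so
    "SB=2.5" and "SB=2.2" are treated as different fields and both kept.
    """
    merged = []
    seen = set()
    for info in info_strings:
        for field in info.split(";"):
            # Skip empty pieces and the VCF "missing" placeholder.
            if field and field != "." and field not in seen:
                seen.add(field)
                merged.append(field)
    return ";".join(merged) if merged else "."


merged_info = merge_info_fields(["DP=10;SB=2.5", "SB=2.2;DP=10"])
```

Because of the whole-string comparison, this approach suits files with mostly disjoint annotations, as the commit message notes.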
depristo 65da04ca85 Now uses the theoretically correct relationship between SNP FP and TP ratios for Illumina data. maxQ score for a snp is now 60
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2168 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 22:08:12 +00:00
depristo 03342c1fdd Restructuring and interface change to ReadBackedPileup. We no longer support the Pileup interface, the BasicPileup static methods, and the ReadBackedPileup class. Now everything is a ReadBackedPileup and all methods to manipulate pileups are off of it. Also provides the recommended iterable() interface of pileup elements so you can use the syntax for (PileupElement p : pileup) and access directly from p.getBase() and p.getQual() and p.getSecondBase(). Only a few straggler walkers use the old style interface -- but those walkers will be retired soon. Documentation coming in the AM. Please everyone use the new syntax, it's safer, and will be more efficient as soon as the LocusIteratorByState directly emits the ReadBackedPileup for the Alignment context, as opposed to the current interface. In the process of the changeover, discovered several bugs in the second-best base code due to things getting out of sync, but these changes were resolved manually. All other integration tests passed without modification.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2154 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 03:51:41 +00:00
depristo bc35a34f60 More informative printing, no longer prints tons of NaN warnings
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2139 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 15:45:48 +00:00
depristo da7de9960b General bug fixes for snpSelector. More robust error checking and handling of NaN values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2106 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-21 14:48:29 +00:00
depristo 52494d8176 cleanup of SNP selector -- ready for some additional testing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2042 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 21:46:31 +00:00
depristo 1a4d071d37 Better snpSelector, plus VCFmerge tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2022 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 22:02:57 +00:00
depristo 3990c6d950 snpSelector v3 -- bootstrapping support and VCF output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2004 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 22:48:51 +00:00
depristo f777c806d6 snpSelector v2 -- code refactoring and support for comparison with known truth. Looks great.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1986 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 19:32:12 +00:00
depristo 7cb51dbc31 snpSelector v1 -- and supporting changes to VCF reader
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1983 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 23:00:46 +00:00
chartl eca0942644 Oops. Let's make sure only to write calls that the pool supports to the auxiliary vcf files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1974 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 17:14:55 +00:00
chartl fc17e75759 Put this puppy through its paces. Eliminated the sorting and header-handling stuff; that isn't the purview of this script and should be handled downstream or by a script wrapper.
I also secretly handled another pesky overflow exception. Occasionally Syzygy could report lods of like -1000; e.g. posterior probabilities of one in one (((googol) googol) googol) googol which of course makes python blow up. Now we safely output an accurate posterior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1971 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 06:05:45 +00:00
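The overflow mentioned in commit fc17e75759 can be illustrated with a small sketch. It assumes, purely for illustration, that a LOD is a log10 odds ratio, so that the posterior is 1 / (1 + 10^(-lod)); how Syzygy actually defines its LOD and how the script was fixed are not stated in the log. The function name `log10_posterior_from_lod` is hypothetical. Evaluating `10.0 ** 1000` directly raises `OverflowError` in Python, so the sketch stays in log space and only ever raises 10 to a non-positive power.

```python
import math


def log10_posterior_from_lod(lod: float) -> float:
    """Return log10 of the posterior implied by a log10-odds score.

    Assumption (not from the source): posterior = 1 / (1 + 10**(-lod)).
    Both branches exponentiate a non-positive number, which can
    underflow to 0.0 but cannot overflow.
    """
    if lod >= 0:
        # log10(1 / (1 + 10**-lod))
        return -math.log10(1.0 + 10.0 ** (-lod))
    # Equivalent form for negative lod: 10**lod / (1 + 10**lod)
    return lod - math.log10(1.0 + 10.0 ** lod)


extreme = log10_posterior_from_lod(-1000.0)
even_odds = log10_posterior_from_lod(0.0)
```

Reporting the posterior as a log10 value keeps extreme scores representable; converting back with `10 ** value` is only safe when the result is within floating-point range.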
chartl 3d9195f8b6 Added - converter from expanded summary to VCF (beautiful thing, really)
Removed - the ugly hackjob that was expanded summary to Geli
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1970 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 22:20:47 +00:00
depristo d60c632099 Minor output improvement
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1965 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 13:20:55 +00:00
depristo 44ea55d338 Useful library for parsing VCF files, plus a general VCF->table converter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1964 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 13:14:04 +00:00
chartl 99337df929 Now looks up and propagates Syzygy's LOD scores into the appropriate field (so variantfiltration can adjust lod scores accurately)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1950 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 21:13:03 +00:00
chartl 7654051aee Faster grepping
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1948 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 16:59:17 +00:00
chartl 4319ff0610 A python script that will convert pooled expanded summary files (from Jason Flannick's pipeline) into .geli files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1947 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 16:39:57 +00:00
depristo c1e1d910cb simple monitor for watching pilot 1 call progress
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1769 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 13:04:53 +00:00
depristo de9f2b11da Detects unmapped (no bai) bam files and doesn't blow up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1725 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 12:56:28 +00:00
ebanks 8349004414 Generalize the regexp for analysis files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1714 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:17:41 +00:00
depristo 3a341b2f06 Fixes for VariantEval for genotyping mode
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:01:43 +00:00