Commit Graph

263 Commits (8a0e813b0402a1facd4e03def3921dc36731deb6)

Author SHA1 Message Date
hanna c25f84a01c Regression: we lost our hack to work around BAM files with index problems (affects BAM files created before 23 Apr 2009 and traversed by interval). Added the hack back in, along with a much more explicit comment about why it's there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1248 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:41:37 +00:00
depristo 84d407ff3f Fixing odd merge problem with VariantEval -- better cluster analysis (no cumsum), rodVariant is now an AllelicVariant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1239 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 18:53:27 +00:00
andrewk c8fcecbc6f Added ParseDCCSequenceData.py to repository and made changes that allow an analysis of quantity of sequence data by platform and project, moved table / record system to a new module called FlatFileTable.py and built that into ParseDCCSequenceData and CoverageEval.py; changed lod threshold in CoverageEvalWalker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1201 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 22:04:26 +00:00
andrewk d3daecfc4d Added unit tests for function in ListUtils to randomly sample lists with replacement, updated AlleleFrequencyEstimate to provide a callType of HomRef, HetSNP, HomSNP, updated indices in CoverageEval.py, and made a lot of changes to CoverageWalker, the biggest one being that it directly calls SingleSampleGenotyper instead of implementing some parts of SSG itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1189 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 02:05:40 +00:00
depristo f5b00c20d0 Updated python files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1182 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 14:15:39 +00:00
andrewk dcb8892568 Lots of code for coverage evaluation tools including first version of python script to evaluate the downsampled SSG calls made and the java code to make all the calls at Hapmap chip sites at various downsampling levels; ListUtils contains functions for randomly subsetting lists (with replacement) which are useful for subsetting the same elements in both the reads and the offsets lists of a LocusWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1162 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 08:07:02 +00:00
depristo 5289230eb8 Version 0.2.1 (released) of the TableRecalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:50:55 +00:00
depristo caf5aef0f8 Much improved python analysis routines, as well as easier / more correct merging utility. Better R scripts, which now close recalibration data by the confidence of the quality score itself
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1081 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 01:12:35 +00:00
depristo c9e6cb72e1 Major improvements to python analysis code -- now computes a host of statistics about quality scores from the recal_data.csv file emitted by countcovariates. Includes average Q scores, medians, modes, stdev, coefficients of variation, RMSE, and % bases > q10, q20, q30. Can finally quantify and compare the improvement of quality score recalibration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1064 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-20 19:50:37 +00:00
depristo 9e26550b0d Approach v2. Added python analysis script, so java no longer must be used to analyze quality score data. About to refactor out lots of unneeded code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1063 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-20 16:00:23 +00:00
depristo 8ac40e8e2d Updated version of the recalibration tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1060 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 17:45:47 +00:00
andrewk 17a5b50ea4 Script that aligns paired-end BAMs using BWA.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1042 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 18:14:58 +00:00
depristo 260fd0dc45 Trivial change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1000 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 19:11:28 +00:00
depristo fb7ba47fff Now does real neighbor distance calculation, as well as true SNP cluster counting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@998 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 16:29:26 +00:00
depristo 1fb241a8b8 Now supports resume and dry running RecalQual.py
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@996 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 23:31:59 +00:00
hanna 5440dd13df Preparation for point release of read calibrator: no artificial heap size limit, no duplicate dbsnp records.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@986 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:39:33 +00:00
hanna e77dfe9983 Allow script to be easily modified to support different platforms.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@955 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 16:06:57 +00:00
depristo 7fa84ea157 10x speedup of recalibration walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@954 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 15:39:40 +00:00
hanna 5fa3f7ed3a Added absolute path bug fix for Mark.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@949 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 02:25:17 +00:00
hanna 127c321d0a Cut over to 1kG version of fasta / reference. Updated doc with latest version of tool summary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@940 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 21:11:44 +00:00
hanna 596773e6c6 Cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@931 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 20:25:08 +00:00
hanna e6aa058ec4 Tighten up error handling a bit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@920 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 03:40:50 +00:00
depristo 819862e04e major restructuring of generalized variant analysis framework. Now trivially easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversional bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absence in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system automatically makes this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 23:34:37 +00:00
hanna 050d55cdb0 Basic graph support for testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@916 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 21:04:01 +00:00
hanna 2035d7dfd3 Revert some debug code in RecalQual.py. Make LogisticRegression easier to Ctrl-C out of.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@904 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 01:53:48 +00:00
hanna 61ae00c7bf Lots of cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@903 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 01:26:10 +00:00
hanna 9689bb3331 Very early draft of script integrating the covariant counting / logistic regression. Deleted some unused code and spurious debug info.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@902 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 22:52:11 +00:00
hanna 40bc4ae39a The building blocks for segmenting covariate counting data by read group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@899 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 19:55:24 +00:00
depristo 67112c79a1 More robust individual genotypes to population script
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@893 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 00:12:31 +00:00
andrewk 7755476d36 Updated converter to reflect change in contig ordering in Geli files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@888 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 10:05:28 +00:00
andrewk 080af519cb Added R script and uncommented a line in recal_qual.py
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@886 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 03:15:45 +00:00
andrewk b2eb724456 First commit of recalibration master control script for recalibrating quality scores.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@885 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 02:17:10 +00:00
depristo 3998085e4b more and better python scripts for dealing with calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@881 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 20:37:19 +00:00
andrewk 587d07da00 Merged functionality of two python scripts into LogRegression.py, some clarity updates to covariate and regression java files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@876 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 16:55:05 +00:00
depristo ae2eddec2d Improving, yet again, the merging of bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@874 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 13:31:12 +00:00
depristo 543c68cdd8 First version of individual geli files to population SNPS
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@865 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-31 15:29:10 +00:00
depristo 6adef28b97 Now supports automatic merging by population
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@864 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-31 15:28:44 +00:00
depristo e0803eabd9 enabled underlying filtering of zero mapping quality reads, vastly improves system performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@853 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 14:51:08 +00:00
depristo c72601322a now returns the farm id when submitting a job!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@825 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 22:23:24 +00:00
depristo 04e51c8d1d Better version of MergeBAMBatch -- more options for creating the file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@787 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:26:19 +00:00
depristo 3b1f84e15b Slightly improved interface to merging utility for multiple bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@757 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 12:54:41 +00:00
depristo e9f85ef920 Better merge support
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@748 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 21:18:51 +00:00
depristo 9dec783a82 Actually writes out a good header now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@744 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 13:34:52 +00:00
depristo 8e9e2f4502 Revised ROD system. Split the system into Basic type and interface. Enabled more control over rod accessing, including an initialize() function to fetch headers and other options from the file. Added general tabular rod, which has named columns and supports a map<String,String> interface. Comes with shiny new Junit system for RODs. Also, added simple python script for accessing picard data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@716 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:06:28 +00:00
hanna de1c282e62 Reference-ordered data relies on bugs in the old command-line argument system to work. Update the ROD system from -B track1 type1 file1 track2 type2 file2 to -B track1,type1,file1 -B track2,type2,file2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@640 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 15:28:19 +00:00
depristo 30218ee31a Better validation scripts and data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@562 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 17:40:07 +00:00
andrewk 58b2578c44 Several changes to CovariateCounter walker to print more tables (called vs. observed Q scores), bug fixes to LogisticRecalibrationWalker and LogisticRegressor, and print string functionality added to Pair.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@550 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 00:37:48 +00:00
depristo 3739682bef Actually has working version of the python script to merge multiple bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@530 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:15:55 +00:00
depristo 40a2b3eeb3 Basic logistic regression support for calibrating qualities; mostly for Andrew to experiment with
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@529 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:09:50 +00:00
andrewk 38c2f73457 LogRegression.py script that converts parameter files for each dinucleotide regression into one file to be read in by correction script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@528 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 18:31:26 +00:00
depristo b8a6f6e830 Support for indexBAM command
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@496 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 19:39:07 +00:00
depristo e842b543c9 Better validation scripts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@458 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 23:18:00 +00:00
depristo f47f640df6 Better debugging output and testing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@455 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 21:54:56 +00:00
depristo 2eabcfedb7 Fixed potential bug with next() operation returning empty contexts when a read contains a large deletion. We can now use the look ahead safely...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@439 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 21:41:30 +00:00
depristo 72a3d84ed2 General purpose pileup code -- you can use these features to obtain detailed pileup data from reads and offsets. Useful for all pileup based walkers. Expanded support for rodSAMPileup to enable the new ValidatingPileupWalker, which takes a samtools pileup output and checks that GATK gives identical output as samtools on a per base and per qual pileup. It's going to be a very useful validation tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@418 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:13:10 +00:00
depristo 49b2622e3d Helper utility for merging BAM files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@345 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:10:41 +00:00
depristo 9d35f0ca67 The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@320 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 22:21:57 +00:00
andrewk 9dee9ab51c Added Hapmap data track (using rodGFF class for GFF file format) to toolkit as a command line option, Hapmap metrics to AlleleFrequencyMetricsWalker, and a python Geli2GFF file converter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@163 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 03:58:03 +00:00
hanna 2ee2623926 Move non-java code out of playground.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@154 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-23 19:31:38 +00:00
hanna 5031875507 Move to new directory organization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@35 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-11 20:58:01 +00:00
depristo bd1fadd9fe Validating walker for lots of bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@10 348d0f76-0448-11de-a6fe-93d51630548a
2009-02-28 17:05:08 +00:00
depristo e892c3fd98 Shouldn't be in the tree
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@9 348d0f76-0448-11de-a6fe-93d51630548a
2009-02-28 15:31:17 +00:00
depristo 17aabb38f9 Basic reorganization of tree
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@8 348d0f76-0448-11de-a6fe-93d51630548a
2009-02-28 15:28:56 +00:00