depristo
|
84d407ff3f
|
Fixing odd merge problem with VariantEval -- better cluster analysis (no cumsum), rodVariant is now an AllelicVariant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1239 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-14 18:53:27 +00:00 |
andrewk
|
c8fcecbc6f
|
Added ParseDCCSequenceData.py to repository and made changes that allow an analysis of quantity of sequence data by platform and project, moved table / record system to a new module called FlatFileTable.py and built that into ParseDCCSequenceData and CoverageEval.py; changed lod threshold in CoverageEvalWalker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1201 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-08 22:04:26 +00:00 |
andrewk
|
d3daecfc4d
|
Added unit tests for function in ListUtils to randomly sample lists with replacement, updated AlleleFrequencyEstimate to provide a callType of HomRef, HetSNP, HomSNP, updated indices in CoverageEval.py, and made a lot of changes to CoverageWalker, the biggest one being that it directly calls SingleSampleGenotyper instead of implementing some parts of SSG itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1189 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-08 02:05:40 +00:00 |
depristo
|
f5b00c20d0
|
Updated python files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1182 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-07 14:15:39 +00:00 |
andrewk
|
dcb8892568
|
Lots of code for coverage evaluation tools, including first version of python script to evaluate the downsampled SSG calls made and the java code to make all the calls at Hapmap chip sites at various downsampling levels; ListUtils contains functions for randomly subsetting lists (with replacement), which are useful for subsetting the same elements in both the reads and the offsets lists of a LocusWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1162 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-03 08:07:02 +00:00 |
depristo
|
5289230eb8
|
Version 0.2.1 (released) of the TableRecalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-25 22:50:55 +00:00 |
depristo
|
caf5aef0f8
|
Much improved python analysis routines, as well as easier / more correct merging utility. Better R scripts, which now close recalibration data by the confidence of the quality score itself
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1081 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-24 01:12:35 +00:00 |
depristo
|
c9e6cb72e1
|
Major improvements to python analysis code -- now computes a host of statistics about quality scores from the recal_data.csv file emitted by countcovariates. Includes average Q scores, medians, modes, stdev, coefficients of variation, RMSE, and % bases > q10, q20, q30. Can finally quantify and compare the improvement of quality score recalibration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1064 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-20 19:50:37 +00:00 |
depristo
|
9e26550b0d
|
Approach v2. Added python analysis script, so java no longer must be used to analyze quality score data. About to refactor out lots of unneeded code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1063 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-20 16:00:23 +00:00 |
depristo
|
8ac40e8e2d
|
Updated version of the recalibration tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1060 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-19 17:45:47 +00:00 |
andrewk
|
17a5b50ea4
|
Script that aligns paired-end BAMs using BWA.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1042 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-18 18:14:58 +00:00 |
depristo
|
260fd0dc45
|
Trivial change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1000 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-12 19:11:28 +00:00 |
depristo
|
fb7ba47fff
|
Now does real neighbor distance calculation, as well as true SNP cluster counting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@998 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-12 16:29:26 +00:00 |
depristo
|
1fb241a8b8
|
Now supports resume and dry running RecalQual.py
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@996 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-11 23:31:59 +00:00 |
hanna
|
5440dd13df
|
Preparation for point release of read calibrator: no artificial heap size limit, no duplicate dbsnp records.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@986 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-11 18:39:33 +00:00 |
hanna
|
e77dfe9983
|
Allow script to be easily modified to support different platforms.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@955 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-09 16:06:57 +00:00 |
depristo
|
7fa84ea157
|
10x speedup of recalibration walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@954 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-09 15:39:40 +00:00 |
hanna
|
5fa3f7ed3a
|
Added absolute path bug fix for Mark.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@949 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-09 02:25:17 +00:00 |
hanna
|
127c321d0a
|
Cut over to 1kG version of fasta / reference. Updated doc with latest version of tool summary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@940 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-08 21:11:44 +00:00 |
hanna
|
596773e6c6
|
Cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@931 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-07 20:25:08 +00:00 |
hanna
|
e6aa058ec4
|
Tighten up error handling a bit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@920 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-06 03:40:50 +00:00 |
depristo
|
819862e04e
|
Major restructuring of generalized variant analysis framework. Now trivially easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversion bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absence in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system automatically makes this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-05 23:34:37 +00:00 |
hanna
|
050d55cdb0
|
Basic graph support for testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@916 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-05 21:04:01 +00:00 |
hanna
|
2035d7dfd3
|
Revert some debug code in RecalQual.py. Make LogisticRegression easier to Ctrl-C out of.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@904 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-05 01:53:48 +00:00 |
hanna
|
61ae00c7bf
|
Lots of cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@903 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-05 01:26:10 +00:00 |
hanna
|
9689bb3331
|
Very early draft of script integrating the covariant counting / logistic regression. Deleted some unused code and spurious debug info.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@902 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-04 22:52:11 +00:00 |
hanna
|
40bc4ae39a
|
The building blocks for segmenting covariate counting data by read group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@899 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-04 19:55:24 +00:00 |
depristo
|
67112c79a1
|
More robust individual genotypes to population script
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@893 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-04 00:12:31 +00:00 |
andrewk
|
7755476d36
|
Updated converter to reflect change in contig ordering in Geli files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@888 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-03 10:05:28 +00:00 |
andrewk
|
080af519cb
|
Added R script and uncommented a line in recal_qual.py
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@886 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-03 03:15:45 +00:00 |
andrewk
|
b2eb724456
|
First commit of recalibration master control script for recalibrating quality scores.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@885 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-03 02:17:10 +00:00 |
depristo
|
3998085e4b
|
more and better python scripts for dealing with calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@881 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-02 20:37:19 +00:00 |
andrewk
|
587d07da00
|
Merged functionality of two python scripts into LogRegression.py, some clarity updates to covariate and regression java files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@876 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-02 16:55:05 +00:00 |
depristo
|
ae2eddec2d
|
Improving, yet again, the merging of bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@874 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-02 13:31:12 +00:00 |
depristo
|
543c68cdd8
|
First version of individual geli files to population SNPS
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@865 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-31 15:29:10 +00:00 |
depristo
|
6adef28b97
|
Now supports automatic merging by population
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@864 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-31 15:28:44 +00:00 |
depristo
|
e0803eabd9
|
enabled underlying filtering of zero mapping quality reads, vastly improves system performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@853 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-29 14:51:08 +00:00 |
depristo
|
c72601322a
|
now returns the farm id when submitting a job!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@825 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-26 22:23:24 +00:00 |
depristo
|
04e51c8d1d
|
Better version of MergeBAMBatch -- more options for creating the file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@787 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-21 22:26:19 +00:00 |
depristo
|
3b1f84e15b
|
Slightly improved interface to merging utility for multiple bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@757 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-20 12:54:41 +00:00 |
depristo
|
e9f85ef920
|
Better merge support
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@748 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-18 21:18:51 +00:00 |
depristo
|
9dec783a82
|
Actually writes out a good header now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@744 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-18 13:34:52 +00:00 |
depristo
|
8e9e2f4502
|
Revised ROD system. Split the system into Basic type and interface. Enabled more control over rod accessing, including an initialize() function to fetch headers and other options from the file. Added general tabular rod, which has named columns and supports a map<String,String> interface. Comes with shiny new JUnit system for RODs. Also, added simple python script for accessing picard data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@716 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-14 21:06:28 +00:00 |
hanna
|
de1c282e62
|
Reference-ordered data relies on bugs in the old command-line argument system to work. Update the ROD system from -B track1 type1 file1 track2 type2 file2 to -B track1,type1,file1 -B track2,type2,file2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@640 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-08 15:28:19 +00:00 |
depristo
|
30218ee31a
|
Better validation scripts and data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@562 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-29 17:40:07 +00:00 |
andrewk
|
58b2578c44
|
Several changes to CovariateCounter walker to print more tables (called vs. observed Q scores), bug fixes to LogisticRecalibrationWalker and LogisticRegressor, and print string functionality added to Pair.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@550 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-28 00:37:48 +00:00 |
depristo
|
3739682bef
|
Actually has working version of the python script to merge multiple bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@530 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-24 19:15:55 +00:00 |
depristo
|
40a2b3eeb3
|
Basic logistic regression support for calibrating qualities; mostly for Andrew to experiment with
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@529 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-24 19:09:50 +00:00 |
andrewk
|
38c2f73457
|
LogRegression.py script that converts parameter files for each dinucleotide regression into one file to be read in by correction script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@528 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-24 18:31:26 +00:00 |
depristo
|
b8a6f6e830
|
Support for indexBAM command
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@496 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-22 19:39:07 +00:00 |