andrewk
|
eb4b9a743a
|
Script that runs most of the steps involved in validating the CoverageEval system that predicts performance for given depth of sequencing coverage across a genome.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1353 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-31 00:18:45 +00:00 |
andrewk
|
efd0fd1f0a
|
Short python script that takes paired-end BAMs and aligns them with BWA. Referenced in GSA wiki tutorial
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1351 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-31 00:04:10 +00:00 |
andrewk
|
1c648a2d5f
|
Skip compiled python files (*.pyc) in svn status output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1346 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-30 21:45:23 +00:00 |
depristo
|
d665d9714f
|
By default now writes output to JOBID.lsf.output instead of going to email -- based on recommendations from the cancer group
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1325 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-28 13:18:58 +00:00 |
andrewk
|
00f9bcd6d1
|
CoverageEval.py tool right before some major changes to the core of the code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1293 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-22 16:58:23 +00:00 |
depristo
|
702cdd087f
|
Actually listens to justPrint now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1253 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-15 16:52:46 +00:00 |
hanna
|
c25f84a01c
|
Regression: we lost our hack to work around BAM files with index problems (affects BAM files created before 23 Apr 2009 and traversed by interval). Added the hack back in, along with a much more explicit comment about why it's there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1248 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-15 14:41:37 +00:00 |
depristo
|
84d407ff3f
|
Fixing odd merge problem with VariantEval -- better cluster analysis (no cumsum), rodVariant is now an AllelicVariant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1239 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-14 18:53:27 +00:00 |
andrewk
|
c8fcecbc6f
|
Added ParseDCCSequenceData.py to repository and made changes that allow an analysis of quantity of sequence data by platform and project, moved table / record system to a new module called FlatFileTable.py and built that into ParseDCCSequenceData and CoverageEval.py; changed lod threshold in CoverageEvalWalker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1201 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-08 22:04:26 +00:00 |
andrewk
|
d3daecfc4d
|
Added unit tests for function in ListUtils to randomly sample lists with replacement, updated AlleleFrequencyEstimate to provide a callType of HomRef, HetSNP, HomSNP, updated indices in CoverageEval.py, and made a lot of changes to CoverageWalker, the biggest one being that it directly calls SingleSampleGenotyper instead of implementing some parts of SSG itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1189 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-08 02:05:40 +00:00 |
depristo
|
f5b00c20d0
|
Updated python files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1182 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-07 14:15:39 +00:00 |
andrewk
|
dcb8892568
|
Lots of code for coverage evaluation tools, including the first version of a python script to evaluate the downsampled SSG calls made and the java code to make all the calls at Hapmap chip sites at various downsampling levels; ListUtils contains functions for randomly subsetting lists (with replacement), which are useful for subsetting the same elements in both the reads and the offsets lists of a LocusWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1162 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-03 08:07:02 +00:00 |
depristo
|
5289230eb8
|
Version 0.2.1 (released) of the TableRecalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-25 22:50:55 +00:00 |
depristo
|
caf5aef0f8
|
Much improved python analysis routines, as well as easier / more correct merging utility. Better R scripts, which now close recalibration data by the confidence of the quality score itself
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1081 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-24 01:12:35 +00:00 |
depristo
|
c9e6cb72e1
|
Major improvements to python analysis code -- now computes a host of statistics about quality scores from the recal_data.csv file emitted by countcovariates. Includes average Q scores, medians, modes, stdev, coefficients of variation, RMSE, and % bases > q10, q20, q30. Can finally quantify and compare the improvement of quality score recalibration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1064 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-20 19:50:37 +00:00 |
depristo
|
9e26550b0d
|
Approach v2. Added python analysis script, so java no longer must be used to analyze quality score data. About to refactor out lots of unneeded code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1063 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-20 16:00:23 +00:00 |
depristo
|
8ac40e8e2d
|
Updated version of the recalibration tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1060 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-19 17:45:47 +00:00 |
andrewk
|
17a5b50ea4
|
Script that aligns paired-end BAMs using BWA.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1042 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-18 18:14:58 +00:00 |
depristo
|
260fd0dc45
|
Trivial change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1000 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-12 19:11:28 +00:00 |
depristo
|
fb7ba47fff
|
Now does real neighbor distance calculation, as well as true snp cluster counting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@998 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-12 16:29:26 +00:00 |
depristo
|
1fb241a8b8
|
Now supports resume and dry running RecalQual.py
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@996 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-11 23:31:59 +00:00 |
hanna
|
5440dd13df
|
Preparation for point release of read calibrator: no artificial heap size limit, no duplicate dbsnp records.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@986 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-11 18:39:33 +00:00 |
hanna
|
e77dfe9983
|
Allow script to be easily modified to support different platforms.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@955 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-09 16:06:57 +00:00 |
depristo
|
7fa84ea157
|
10x speedup of recalibration walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@954 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-09 15:39:40 +00:00 |
hanna
|
5fa3f7ed3a
|
Added absolute path bug fix for Mark.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@949 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-09 02:25:17 +00:00 |
hanna
|
127c321d0a
|
Cut over to 1kG version of fasta / reference. Updated doc with latest version of tool summary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@940 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-08 21:11:44 +00:00 |
hanna
|
596773e6c6
|
Cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@931 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-07 20:25:08 +00:00 |
hanna
|
e6aa058ec4
|
Tighten up error handling a bit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@920 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-06 03:40:50 +00:00 |
depristo
|
819862e04e
|
Major restructuring of generalized variant analysis framework. Now trivially easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversion bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absence in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system automatically makes this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-05 23:34:37 +00:00 |
hanna
|
050d55cdb0
|
Basic graph support for testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@916 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-05 21:04:01 +00:00 |
hanna
|
2035d7dfd3
|
Revert some debug code in RecalQual.py. Make LogisticRegression easier to Ctrl-C out of.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@904 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-05 01:53:48 +00:00 |
hanna
|
61ae00c7bf
|
Lots of cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@903 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-05 01:26:10 +00:00 |
hanna
|
9689bb3331
|
Very early draft of script integrating the covariant counting / logistic regression. Deleted some unused code and spurious debug info.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@902 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-04 22:52:11 +00:00 |
hanna
|
40bc4ae39a
|
The building blocks for segmenting covariate counting data by read group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@899 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-04 19:55:24 +00:00 |
depristo
|
67112c79a1
|
More robust individual genotypes to population script
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@893 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-04 00:12:31 +00:00 |
andrewk
|
7755476d36
|
Updated converter to reflect change in contig ordering in Geli files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@888 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-03 10:05:28 +00:00 |
andrewk
|
080af519cb
|
Added R script and uncommented a line in recal_qual.py
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@886 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-03 03:15:45 +00:00 |
andrewk
|
b2eb724456
|
First commit of recalibration master control script for recalibrating quality scores.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@885 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-03 02:17:10 +00:00 |
depristo
|
3998085e4b
|
more and better python scripts for dealing with calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@881 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-02 20:37:19 +00:00 |
andrewk
|
587d07da00
|
Merged functionality of two python scripts into LogRegression.py, some clarity updates to covariate and regression java files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@876 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-02 16:55:05 +00:00 |
depristo
|
ae2eddec2d
|
Improving, yet again, the merging of bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@874 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-02 13:31:12 +00:00 |
depristo
|
543c68cdd8
|
First version of individual geli files to population SNPS
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@865 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-31 15:29:10 +00:00 |
depristo
|
6adef28b97
|
Now supports automatic merging by population
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@864 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-31 15:28:44 +00:00 |
depristo
|
e0803eabd9
|
enabled underlying filtering of zero mapping quality reads, vastly improves system performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@853 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-29 14:51:08 +00:00 |
depristo
|
c72601322a
|
now returns the farm id when submitting a job!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@825 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-26 22:23:24 +00:00 |
depristo
|
04e51c8d1d
|
Better version of MergeBAMBatch -- more options for creating the file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@787 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-21 22:26:19 +00:00 |
depristo
|
3b1f84e15b
|
Slightly improved interface to merging utility for multiple bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@757 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-20 12:54:41 +00:00 |
depristo
|
e9f85ef920
|
Better merge support
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@748 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-18 21:18:51 +00:00 |
depristo
|
9dec783a82
|
Actually writes out a good header now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@744 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-18 13:34:52 +00:00 |
depristo
|
8e9e2f4502
|
Revised ROD system. Split the system into Basic type and interface. Enabled more control over rod accessing, including an initialize() function to fetch headers and other options from the file. Added general tabular rod, which has named columns and supports a map<String,String> interface. Comes with shiny new Junit system for RODs. Also, added simple python script for accessing picard data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@716 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-14 21:06:28 +00:00 |
hanna
|
de1c282e62
|
Reference-ordered data relies on bugs in the old command-line argument system to work. Update the ROD system from -B track1 type1 file1 track2 type2 file2 to -B track1,type1,file1 -B track2,type2,file2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@640 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-08 15:28:19 +00:00 |
depristo
|
30218ee31a
|
Better validation scripts and data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@562 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-29 17:40:07 +00:00 |
andrewk
|
58b2578c44
|
Several changes to CovariateCounter walker to print more tables (called vs. observed Q scores), bug fixes to LogisticRecalibrationWalker and LogisticRegressor, and print string functionality added to Pair.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@550 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-28 00:37:48 +00:00 |
depristo
|
3739682bef
|
Actually has working version of the python script to merge multiple bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@530 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-24 19:15:55 +00:00 |
depristo
|
40a2b3eeb3
|
Basic logistic regression support for calibrating qualities; mostly for Andrew to experiment with
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@529 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-24 19:09:50 +00:00 |
andrewk
|
38c2f73457
|
LogRegression.py script that converts parameter files for each dinucleotide regression into one file to be read in by correction script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@528 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-24 18:31:26 +00:00 |
depristo
|
b8a6f6e830
|
Support for indexBAM command
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@496 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-22 19:39:07 +00:00 |
depristo
|
e842b543c9
|
Better validation scripts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@458 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-16 23:18:00 +00:00 |
depristo
|
f47f640df6
|
Better debugging output and testing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@455 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-16 21:54:56 +00:00 |
depristo
|
2eabcfedb7
|
Fixed potential bug with next() operation returning empty contexts when a read contains a large deletion. We can now use the look ahead safely...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@439 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-15 21:41:30 +00:00 |
depristo
|
72a3d84ed2
|
General purpose pileup code -- you can use these features to obtain detailed pileup data from reads and offsets. Useful for all pileup based walkers. Expanded support for rodSAMPileup to enable the new ValidatingPileupWalker, which takes a samtools pileup output and checks that GATK gives the same output as samtools on a per base and per qual pileup. It's going to be a very useful validation tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@418 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-14 22:13:10 +00:00 |
depristo
|
49b2622e3d
|
Helper utility for merging BAM files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@345 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-09 20:10:41 +00:00 |
depristo
|
9d35f0ca67
|
The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@320 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-04-07 22:21:57 +00:00 |
andrewk
|
9dee9ab51c
|
Added Hapmap data track (using rodGFF class for GFF file format) to toolkit as a command line option, Hapmap metrics to AlleleFrequencyMetricsWalker, and a python Geli2GFF file converter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@163 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-03-24 03:58:03 +00:00 |
hanna
|
2ee2623926
|
Move non-java code out of playground.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@154 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-03-23 19:31:38 +00:00 |
hanna
|
5031875507
|
Move to new directory organization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@35 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-03-11 20:58:01 +00:00 |
depristo
|
bd1fadd9fe
|
Validating walker for lots of bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@10 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-02-28 17:05:08 +00:00 |
depristo
|
e892c3fd98
|
Shouldn't be in the tree
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@9 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-02-28 15:31:17 +00:00 |
depristo
|
17aabb38f9
|
Basic reorganization of tree
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@8 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-02-28 15:28:56 +00:00 |