Commit Graph

2436 Commits (74a5223b11c64873d18dedda5c8fab2c7a16bac9)

Author SHA1 Message Date
ebanks 74a5223b11 oops - didn't mean to check this in
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2914 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:28:22 +00:00
ebanks 5f3c80d9aa 1. To make indel calls, we need to get rid of the SNP-centricity of our code. First step is to have the reference be a String, not a char in the Genotype. Note that this is just a temporary patch until the genotype code is ported over to use VariantContext.
2. Significant refactoring of Plink code to work in the rods and use VariantContext.  More coming.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2913 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:26:40 +00:00
ebanks 6ceae22793 utility methods for genotype counts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2912 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:23:41 +00:00
kcibul 7578678f99 refactored to provide a sum of mismatch quality scores capability as well (used by Cancer)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2911 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 16:40:03 +00:00
aaron 232fcf829a removing the unsupported VCF validator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2909 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 15:45:33 +00:00
hanna 1b572b192a Stopgap fix for temporary problems sharding when indexless. A more compelling solution will come later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2908 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 02:59:14 +00:00
hanna 75a541b479 Fix nasty issue where shard boundaries aren't properly clipped during locus traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2907 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 23:31:58 +00:00
rpoplin af6e476df5 Copyright compliant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2905 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 15:29:34 +00:00
rpoplin 3a863d3e8c Initial check in of VariantOptimizer in playground. There is a Gaussian Mixture Model version and a k-Nearest Neighbors version. There is still lots of work to do. Nobody should be using it yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2904 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 15:26:18 +00:00
hanna 6133d73bf0 Locus (non-intervalled) traversal with new sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2903 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 01:58:44 +00:00
hanna 80f5d2829d Support for read interval sharding with proper filtering.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2902 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-27 20:26:34 +00:00
aaron d8fedd59be docs, cleanup, and some improvements to the iterators.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2901 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 22:36:04 +00:00
hanna b69c2d0f70 Cleanup. Remove some unnecessary methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2900 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 21:50:48 +00:00
hanna 30eb28886b Basic functionality for intervaled reads in new sharding system. Not currently filtering out cruft, so the mode of operation is currently queryOverlapping rather than queryContained.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2899 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 21:41:55 +00:00
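The distinction drawn in the commit above between queryOverlapping and queryContained can be sketched as two predicates. This is only an illustration, not the project's code: the class and method names are hypothetical, and the closed, 1-based interval convention is an assumption based on how these query modes are commonly defined for aligned reads.

```java
// Hypothetical sketch: coordinates are assumed to be closed, 1-based [start, end].
final class IntervalQuerySketch {
    // "Overlapping": the read shares at least one base with the interval.
    static boolean overlaps(int readStart, int readEnd, int ivStart, int ivEnd) {
        return readStart <= ivEnd && readEnd >= ivStart;
    }

    // "Contained": the read lies entirely inside the interval.
    static boolean containedIn(int readStart, int readEnd, int ivStart, int ivEnd) {
        return readStart >= ivStart && readEnd <= ivEnd;
    }

    public static void main(String[] args) {
        // A read at 95-105 straddles the left edge of the interval 100-200.
        System.out.println(overlaps(95, 105, 100, 200));
        System.out.println(containedIn(95, 105, 100, 200));
    }
}
```

Under these assumptions, a read that straddles an interval edge is returned by an overlapping query but not by a contained query, which would be the "cruft" that still needs filtering.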
chartl cfff486338 This commit is for Kiran
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2898 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 18:18:38 +00:00
chartl 87f8fb7282 Quick commit in advance of Aaron's. Just a bunch of refactoring (private classes separated out, put in proper package). Also support added for coverage by read group rather than sample.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2897 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:39:47 +00:00
aaron 622554d7bd disable a part of the ROD for Reads code until the rest of the system goes live
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2896 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:15:42 +00:00
chartl 496ecc8186 Change in how overall coverage and means are stored in the DOCS object; change from keeping track of sample mean coverage to keeping track of sample total coverage (calculate means at the end)
This is a mid-way commit for Aaron



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2895 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:51:12 +00:00
hanna 1017a38f38 Initial refactoring of read traversal to make it easier to drop in intervalled reads traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2894 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:09:09 +00:00
depristo 9a6b384adb Support for no qual fields in VCF; better support for Mendelian violation calculations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2893 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 00:29:17 +00:00
aaron 246fa28386 RODs for reads phase 2: modified RODRecordList to implement List<ReferenceOrderedDatum> so I could stub it out for testing, added a FlashBackIterator which is needed to prevent the ResourcePool from opening infinity+1 iterators, and some other interfaces to make unit testing much smoother.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2892 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 22:48:55 +00:00
chartl 591102a841 Don't close the output stream if we're printing to stdout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2891 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:50:58 +00:00
chartl 10cc71ceb0 Another midway commit for teh engineerz
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2890 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:24:02 +00:00
hanna 3289826892 Fix chartl's issue -- reduceInit() is sometimes called unnecessarily at the end of a traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2889 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:02:18 +00:00
chartl 3d92e5a737 Initial commit of integration test(s) for CoverageStatistics, currently in progress [midway commit is for Matt]
Modifications to CoverageStatistics - now includes and extends much of the behavior of DepthOfCoverage (per-base output, per-target output).

Additional functionality (coverage without deletions, base counts, by read group instead of by sample) is upcoming.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2888 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 20:25:07 +00:00
hanna 553d39bb00 Clean up the code a bit following the introduction of reduceByInterval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2887 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 01:20:22 +00:00
hanna 199b43fcf2 Reduce by interval alterations to interface with new sharding system. This checkin will be followed by a simplification of some of the locus traversal code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2886 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 00:16:50 +00:00
asivache 2572c24935 We were still dropping halves of some pairs, in which both reads were assigned to the same position. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2885 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 23:13:23 +00:00
aaron fef1154fc8 starting on RODs for Reads: made RODRecordList implement list<RODatum> (so we can sub in fake lists during testing), and removed unnecessary generic-ness. Removed BrokenRODSimulator, which isn't being used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2884 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 22:11:53 +00:00
chartl 5df37968de Simplification of code segments; slight alteration to per-locus tabulation; added to-do items for cosmetic changes (mostly binning options and settings)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2882 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:20:18 +00:00
asivache 27d3ef9458 Got rid of annoying commented printouts; no functional changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2881 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:12:30 +00:00
asivache d73bc490c2 Do not build alt consensuses from insertions that have an N in the inserted sequence. Seems to cause problems rather than solve any
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2880 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 03:00:26 +00:00
asivache 94d74d4f78 Multiple instances of the same consensus were all living happily together in the set of alt consensuses. As a result, we were taking a considerable performance hit from trying to align all reads to those instances over and over again. Fixed. Only one copy of any given alt consensus is now stored.
In class Consensus:
1) use Arrays.equals() to compare Java arrays!!
2) if an object overrides equals() it also MUST provide an appropriate hashCode() (thanks, Matt)

As a side effect, a number of commented-out debug prints are committed; still need them...

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2879 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 02:09:50 +00:00
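The two rules in the commit above can be illustrated with a minimal sketch. It is not the actual Consensus class: the field name, constructor, and helper method are hypothetical, and only the standard java.util calls are taken as given.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a consensus sequence; not the project's class.
final class ConsensusSketch {
    private final byte[] bases;

    ConsensusSketch(byte[] bases) {
        this.bases = bases.clone();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ConsensusSketch)) return false;
        // Arrays.equals compares contents; bases.equals(...) would compare identity.
        return Arrays.equals(bases, ((ConsensusSketch) other).bases);
    }

    @Override
    public int hashCode() {
        // Must agree with equals(), otherwise hash-based sets keep duplicates.
        return Arrays.hashCode(bases);
    }

    // Counts distinct sequences by content using a HashSet.
    static int distinctCount(byte[][] sequences) {
        Set<ConsensusSketch> seen = new HashSet<>();
        for (byte[] s : sequences) {
            seen.add(new ConsensusSketch(s));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        byte[][] input = { "ACGT".getBytes(), "ACGT".getBytes(), "ACGA".getBytes() };
        System.out.println(distinctCount(input));
    }
}
```

Without the content-based equals() and the matching hashCode(), a HashSet would treat two arrays holding the same bases as different elements, which is consistent with the duplicate alt consensuses the commit describes.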
chartl 1f673e9fab Float the bins with the given lower bound
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2878 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:48:53 +00:00
chartl 119d449b46 Formatting changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2877 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:43:15 +00:00
chartl 173956927b Summaries generated for firehose from DoC output have been migrated to their own walker to calculate aggregate coverage statistics in a parallelizable and fast way. This is an initial commit; bug-fixing and testing are upcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2876 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 18:41:02 +00:00
hanna 491b30e8de Eliminate a few stray loci that weren't being filtered out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2875 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 18:00:52 +00:00
hanna fff15944fe Bug fix. Stopping condition of recurrence stopped too soon in some cases where an interval *contained* zero reads but *overlapped* with some reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2874 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 15:58:54 +00:00
hanna a0e8de40cf Bug fix: at one locus in the dataset, two reads were dropped.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2872 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 23:54:52 +00:00
aaron 5546aa4416 adding code to deal with the off-spec situation where our minimum likelihood is above the GLF max of 255.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2871 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 22:27:39 +00:00
hanna 88d0677379 Misc correctness enhancements: develop the bin selector into a recursive algorithm and return a shard when reads are missing. Also improve the performance of the read filter that clips reads not actually present in the shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2870 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 22:19:06 +00:00
ebanks 8b555ff17c Killed the old cleaner code. Bye bye.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2868 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:49:58 +00:00
kshakir 3738b76320 Added a playground concordance analyzer for summarizing VariantEval across a group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2867 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:28:52 +00:00
ebanks a640bd2d79 ignore uninteresting extended events
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2866 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 19:55:46 +00:00
rpoplin 32e5dceef9 Moving comments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2865 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 19:27:31 +00:00
alecw b236714c8a Optimization - Added method to Covariates: void getValues( SAMRecord read, Comparable[] comparable ) which takes an array of size (at least) read.getReadLength() and fills it with covariate values for all positions in the given read. Made CovariateCounterWalker and TableRecalibrationWalker use this method instead of calling getValue(..) for each covariate and each offset.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2863 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 17:35:25 +00:00
ebanks 32d14d988e Overload parseIntervalRegion() to allow for the interval merging rule to be passed in (so one is not required to use the value from the GATK arg collection).
Now the IndelRealigner can use this functionality without being forced to merge abutting intervals (which was actually causing a problem with the cleaning).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2862 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 04:13:54 +00:00
hanna cc09f48cd8 Correctness fix: index can concat chunks around shard edges, and my code didn't account for that.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2861 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 21:44:33 +00:00
chartl 0e05a3acb0 Adding depth of coverage features to firehose summary tools
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2860 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 19:47:16 +00:00
hanna 71f18e941f Significant performance improvements made by subtracting out the contents of the prior highest-level bin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2859 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 16:46:16 +00:00