Commit Graph

2450 Commits (adea38fd5eda595d2359310be90c49a20e2396bc)

Author SHA1 Message Date
hanna adea38fd5e Sharding system fixes for corner cases generally related to lack of coverage
in the BAM file.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2928 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 18:59:21 +00:00
chartl a4d494c38b Add option to adhere to the PlinkRod naming convention [ProjectName]|c[Chrom]_p[Pos]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2927 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 18:31:27 +00:00
ebanks 0dd65461a1 Various improvements to plink, variant context, and VCF code.
We almost completely support indels. Not yet done with plink stuff.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2926 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 17:58:01 +00:00
aaron c8077b7a22 Waypoint check-in: a couple of changes to for Tribble, and adding some options to the integration test for passing in auxillary files that aren’t “%s” command line options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2925 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 16:02:21 +00:00
chartl 6759acbdef Coverage statistics now fully implements DepthOfCoverage functionality, including the ability to print base counts. Minor changes to BaseUtils to support 'N' and 'D' characters. PickSequenomProbes now has the option to not print the whole window as part of the probe name (e.g. you just see PROJECT_NAME|CHR_POS and not PROJECT_NAME|CHR_POS_CHR_PROBESTART-PROBEND). Full integration tests for CoverageStatistics are forthcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2924 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 15:00:02 +00:00
hanna 023654696e First pass at handling SAMFileReaders using a SAMReaderID. This allows us to firewall
GATK users from the readers, which they could abuse in ways that could destabilize the GATK.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2923 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 00:59:32 +00:00
rpoplin b241e0915b Incremental update to VariantOptimizer. Refactored parts of the clustering code to make it more clear. More comments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2922 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 20:33:35 +00:00
asivache 073fdd8ec7 Let's try not to die suffocating when a bad region with humongous coverage is encountered. New option: -maxNumberOfReads (--mnr), with default of 10,000. If count of reads cached in the current window reaches the specified limit, the whole window is immediately shifted by the whole window length and all currently cached reads are dropped. NOTE: this also means that we are not going to call ANY indels from the current window, even though we could try using just the reads cached so far.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2921 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 17:34:30 +00:00
chartl 6ca6c98980 Can just give PickSequenomProbes a dbsnp rod to mask
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2920 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 16:50:58 +00:00
aaron ca2cd9d4f5 a little clean-up: move setting the bases of generated reads into Artificial SAM Utils now that the clean read injector test is gone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2919 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 16:31:45 +00:00
aaron 790d2a7776 adding the initial ROD for Reads support; more convenience methods in ReadMetaDataTracker to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2918 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 15:56:44 +00:00
ebanks 0e9a6826b0 Update to VCF code to get it up to spec.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2917 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 06:12:42 +00:00
ebanks 317fac8dff Better error message for --assume_single_sample_reads screw up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2916 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 01:03:10 +00:00
hanna 104f4f7383 Mediocre implementation of reader pooling within the SAM data source. Will fix this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2915 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 22:35:02 +00:00
ebanks 74a5223b11 oops - didn't mean to check this in
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2914 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:28:22 +00:00
ebanks 5f3c80d9aa 1. To make indel calls, we need to get rid of the SNP-centricity of our code. First step is to have the reference be a String, not a char in the Genotype. Note that this is just a temporary patch until the genotype code is ported over to use VariantContext.
2. Significant refactoring of Plink code to work in the rods and use VariantContext.  More coming.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2913 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:26:40 +00:00
ebanks 6ceae22793 utility methods for genotype counts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2912 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:23:41 +00:00
kcibul 7578678f99 refactored to provide a sum of mismatch quality scores capability as well (used by Cancer)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2911 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 16:40:03 +00:00
aaron 232fcf829a removing the unsupported VCF validator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2909 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 15:45:33 +00:00
hanna 1b572b192a Stopgap fix for temporary problems sharding when indexless. A more compelling solution will come later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2908 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 02:59:14 +00:00
hanna 75a541b479 Fix nasty issue where shard boundaries aren't properly clipped during locus traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2907 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 23:31:58 +00:00
rpoplin af6e476df5 Copyright compliant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2905 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 15:29:34 +00:00
rpoplin 3a863d3e8c Initial check in of VariantOptimizer in playground. There is a Gaussian Mixture Model version and a k-Nearest Neighbors version. There is still lots of work to do. Nobody should be using it yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2904 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 15:26:18 +00:00
hanna 6133d73bf0 Locus (non-intervalled) traversal with new sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2903 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 01:58:44 +00:00
hanna 80f5d2829d Support for read interval sharding with proper filtering.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2902 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-27 20:26:34 +00:00
aaron d8fedd59be docs, cleanup, and some improvements to the iterators.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2901 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 22:36:04 +00:00
hanna b69c2d0f70 Cleanup. Remove some unnecessary methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2900 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 21:50:48 +00:00
hanna 30eb28886b Basic functionality for intervaled reads in new sharding system. Not currently filtering out cruft, so
the mode of operation is currently queryOverlapping rather than queryContained.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2899 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 21:41:55 +00:00
chartl cfff486338 This commit is for Kiran
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2898 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 18:18:38 +00:00
chartl 87f8fb7282 Quick commit in advance of Aaron's. Just a bunch of refactoring (private classes separated out, put in proper package). Also support added for coverage by read group rather than sample.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2897 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:39:47 +00:00
aaron 622554d7bd disable a part of the ROD for Reads code until the rest of the system goes live
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2896 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:15:42 +00:00
chartl 496ecc8186 Change in how overall coverage and means are stored in the DOCS object; change from keeping track of sample mean coverage to keeping track of sample total coverage (calculate means at the end)
This is a mid-way commit for Aaron



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2895 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:51:12 +00:00
hanna 1017a38f38 Initial refactoring of read traversal to make it easier to drop in intervalled reads traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2894 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:09:09 +00:00
depristo 9a6b384adb Support for no qual fields in VCF; better support for Mendelian violation calculations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2893 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 00:29:17 +00:00
aaron 246fa28386 RODs for reads phase 2: modified RODRecordList to implement List<ReferenceOrderedDatum> so I could stub it out for testing, added a FlashBackIterator which is needed to prevent the ResourcePool from opening infinity+1 iterators, and some other interfaces to make unit testing much smoother.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2892 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 22:48:55 +00:00
chartl 591102a841 Don't close the output stream if we're printing to stdout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2891 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:50:58 +00:00
chartl 10cc71ceb0 Another midway commit for teh engineerz
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2890 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:24:02 +00:00
hanna 3289826892 Fix chartl's issue -- reduceInit() is sometimes called unnecessarily at the
end of a traversal.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2889 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:02:18 +00:00
chartl 3d92e5a737 Initial commit of integration test(s) for CoverageStatistics, currently in progress [midway commit is for Matt]
Modifications to CoverageStatistics - now includes and extends much of the behavior of DepthOfCoverage (per-base output, per-target output).

Additional functionality (coverage without deletions, base counts, by read group instead of by sample) is upcoming.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2888 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 20:25:07 +00:00
hanna 553d39bb00 Clean up the code a bit following the introduction of reduceByInterval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2887 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 01:20:22 +00:00
hanna 199b43fcf2 Reduce by interval alterations to interface with new sharding system. This checkin with be followed by a
simplification of some of the locus traversal code.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2886 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 00:16:50 +00:00
asivache 2572c24935 We were still dropping halves of some pairs, in which both reads were assigned to the same position. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2885 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 23:13:23 +00:00
aaron fef1154fc8 starting on RODs for Reads: made RODRecordList implement list<RODatum> (so we can sub in fake lists during testing), and removed unnecessary generic-ness. Removed BrokenRODSimulator, which isn't being used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2884 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 22:11:53 +00:00
chartl 5df37968de Simplification of code segments; slight alteration to per-locus tabulation; added to-do items for cosmetic changes (mostly binning options and settigns)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2882 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:20:18 +00:00
asivache 27d3ef9458 Got rid of annoying commented printouts; no functional changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2881 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:12:30 +00:00
asivache d73bc490c2 Do not build alt consensuses from insertions that have an N in the inserted sequence. Seems to cause problems rather than solve any
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2880 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 03:00:26 +00:00
asivache 94d74d4f78 Multiple instances of the same consensus were all living happily together in the set of alt consensuses. As the result, we have been taking considerable performance hit from trying to align all reads to those instances over and over again. Fixed. Only one copy of any given alt consensus is now stored.
in class Consensus: 
1) use Arrays.equals() to compare java arrays!!
2) if object overrides equals() it also MUST provide appropriate hashCode() (thanks, Matt) 

As a side effect, a number of commented out debug prints are committed, still need them...

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2879 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 02:09:50 +00:00
chartl 1f673e9fab Float the bins with the given lower bound
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2878 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:48:53 +00:00
chartl 119d449b46 Formatting changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2877 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:43:15 +00:00
chartl 173956927b Summaries generated for firehose from DoC output have been migrated to its own walker to calculate aggregate coverage statistics in a parallelizable and fast way. This is an initial commit, bug-fixing and testing is upcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2876 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 18:41:02 +00:00