2522 Commits (f20f78d77fac3dfd5b43ae30642728fd8b658f5d)

Author SHA1 Message Date
hanna e4360bac6a More comprehensive support when sharding for ref walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2951 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 11:25:20 +00:00
hanna eb165ca844 Celebrate the fact that the new sharding system works with integration tests
by removing the scary debug line.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2950 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 23:40:56 +00:00
hanna 9e107513d0 In the new sharding system, if no read group is present, hallucinate one. Added
for test compatibility, but not sure whether we still need this feature.  TODO: Poll the group about this feature.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2949 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 23:01:34 +00:00
hanna a7fe07c404 A few stopgap fixes to get the GATK to the point where the old sharding
infrastructure can be torn down:
1) New sharding system emulates old MonolithicSharding mechanism.
2) Better awareness of differences between fasta and BAM files when creating
   shards.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2948 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 21:01:25 +00:00
hanna dd6122f682 Fixed another bug in the original sharding system. Updated integration tests
as appropriate.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2947 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 15:32:18 +00:00
hanna ee2ec7ced9 Fix off-by-one error in original implementation of read sharding. Tested by
awking output of BamToFastq vs. samtools until the outputs matched exactly.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2945 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-06 18:52:53 +00:00
hanna 1ef1091f7c Cleanup and simplification of read interval sharding.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2944 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 23:34:38 +00:00
depristo ee913eca07 Forgot to check in fix this morning
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2943 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 21:07:19 +00:00
ebanks 7fa0f77721 add output for number of variants that validated as true
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2942 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 18:57:44 +00:00
chartl 037ac9c9af Actually calculate base counts by read group when "both" is specified. Modified integration test to cement the now-correct "both" behavior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2941 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 18:31:48 +00:00
chartl 8738c544f1 Minor refactoring of CoverageStatistics to allow simultaneous output of per-sample and per-read group statistics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2940 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 17:06:52 +00:00
rpoplin 95d560aa2f More incremental updates to the variant optimizer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2939 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 16:42:42 +00:00
hanna 7a7e85188c Better eagerDecode default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2938 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 16:42:23 +00:00
depristo 33cefddf55 Better INFO field annotation for Mendel violations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2937 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 15:22:04 +00:00
ebanks 9f7ebe1e1c - add name to vcf od field
- don't do HW calculation if everything is a no-call


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2936 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 01:43:01 +00:00
hanna 7104a3a96c Fix for accumulator exception when running reduce by interval walkers without
intervals.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2935 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 01:04:08 +00:00
aaron 366771d5a6 another test-with-multiple outputs fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2934 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 22:46:15 +00:00
ebanks 9eb122924f misc cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2933 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:34:13 +00:00
chartl 706d49d84c Commit for Aaron
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2932 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:29:07 +00:00
ebanks c20d3e567e Now outputs fully spec-compliant VCF with proper annotations. Emits statistics as to number of good/bad records.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2931 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:28:17 +00:00
aaron 54f04dc541 forgot to uncomment the auto-deletion of temp files...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2930 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 20:29:42 +00:00
aaron 80cc6bbeb4 add a way to test files generated by a walker that aren't command-line arguments; added some example code in CoverageStatisticsIntegrationTest for Chris.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2929 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 20:20:58 +00:00
hanna adea38fd5e Sharding system fixes for corner cases generally related to lack of coverage
in the BAM file.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2928 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 18:59:21 +00:00
chartl a4d494c38b Add option to adhere to the PlinkRod naming convention [ProjectName]|c[Chrom]_p[Pos]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2927 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 18:31:27 +00:00
ebanks 0dd65461a1 Various improvements to plink, variant context, and VCF code.
We almost completely support indels. Not yet done with plink stuff.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2926 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 17:58:01 +00:00
aaron c8077b7a22 Waypoint check-in: a couple of changes for Tribble, and adding some options to the integration test for passing in auxiliary files that aren’t “%s” command line options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2925 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 16:02:21 +00:00
chartl 6759acbdef Coverage statistics now fully implements DepthOfCoverage functionality, including the ability to print base counts. Minor changes to BaseUtils to support 'N' and 'D' characters. PickSequenomProbes now has the option to not print the whole window as part of the probe name (e.g. you just see PROJECT_NAME|CHR_POS and not PROJECT_NAME|CHR_POS_CHR_PROBESTART-PROBEND). Full integration tests for CoverageStatistics are forthcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2924 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 15:00:02 +00:00
hanna 023654696e First pass at handling SAMFileReaders using a SAMReaderID. This allows us to firewall
GATK users from the readers, which they could abuse in ways that could destabilize the GATK.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2923 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 00:59:32 +00:00
rpoplin b241e0915b Incremental update to VariantOptimizer. Refactored parts of the clustering code to make it more clear. More comments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2922 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 20:33:35 +00:00
asivache 073fdd8ec7 Let's try not to die suffocating when a bad region with humongous coverage is encountered. New option: -maxNumberOfReads (--mnr), with default of 10,000. If count of reads cached in the current window reaches the specified limit, the whole window is immediately shifted by the whole window length and all currently cached reads are dropped. NOTE: this also means that we are not going to call ANY indels from the current window, even though we could try using just the reads cached so far.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2921 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 17:34:30 +00:00
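The -maxNumberOfReads behaviour described in 073fdd8ec7 above can be sketched roughly as follows. This is a minimal illustration, not the GATK code: the class `CappedReadWindow`, its fields, and the use of plain strings as stand-ins for reads are all hypothetical. It only shows the rule stated in the message: once the cached read count reaches the limit, every cached read is dropped and the window moves forward by one full window length.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a read cache that abandons a window once it holds too many reads. */
class CappedReadWindow {
    private final int windowLength;
    private final int maxNumberOfReads;            // plays the role of -maxNumberOfReads
    private final List<String> cachedReads = new ArrayList<>();  // strings stand in for reads
    private long windowStart;

    CappedReadWindow(long windowStart, int windowLength, int maxNumberOfReads) {
        this.windowStart = windowStart;
        this.windowLength = windowLength;
        this.maxNumberOfReads = maxNumberOfReads;
    }

    /**
     * Caches a read. Returns false if the cap was reached, in which case all
     * cached reads were dropped and the window was shifted by its full length.
     */
    boolean add(String read) {
        cachedReads.add(read);
        if (cachedReads.size() >= maxNumberOfReads) {
            cachedReads.clear();          // nothing from this window will be used for calling
            windowStart += windowLength;  // jump past the high-coverage region
            return false;
        }
        return true;
    }

    int cachedCount() { return cachedReads.size(); }

    long windowStart() { return windowStart; }
}
```

With a limit of 3, the third call to `add` empties the cache and advances the window, which matches the trade-off noted in the message: no indels are called from the abandoned window.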
chartl 6ca6c98980 Can just give PickSequenomProbes a dbsnp rod to mask
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2920 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 16:50:58 +00:00
aaron ca2cd9d4f5 a little clean-up: move setting the bases of generated reads into Artificial SAM Utils now that the clean read injector test is gone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2919 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 16:31:45 +00:00
aaron 790d2a7776 adding the initial ROD for Reads support; more convenience methods in ReadMetaDataTracker to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2918 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 15:56:44 +00:00
ebanks 0e9a6826b0 Update to VCF code to get it up to spec.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2917 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 06:12:42 +00:00
ebanks 317fac8dff Better error message for --assume_single_sample_reads screw up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2916 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 01:03:10 +00:00
hanna 104f4f7383 Mediocre implementation of reader pooling within the SAM data source. Will fix this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2915 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 22:35:02 +00:00
ebanks 74a5223b11 oops - didn't mean to check this in
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2914 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:28:22 +00:00
ebanks 5f3c80d9aa 1. To make indel calls, we need to get rid of the SNP-centricity of our code. First step is to have the reference be a String, not a char in the Genotype. Note that this is just a temporary patch until the genotype code is ported over to use VariantContext.
2. Significant refactoring of Plink code to work in the rods and use VariantContext.  More coming.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2913 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:26:40 +00:00
ebanks 6ceae22793 utility methods for genotype counts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2912 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:23:41 +00:00
kcibul 7578678f99 refactored to provide a sum of mismatch quality scores capability as well (used by Cancer)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2911 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 16:40:03 +00:00
aaron 232fcf829a removing the unsupported VCF validator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2909 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 15:45:33 +00:00
hanna 1b572b192a Stopgap fix for temporary problems sharding when indexless. A more compelling solution will come later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2908 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 02:59:14 +00:00
hanna 75a541b479 Fix nasty issue where shard boundaries aren't properly clipped during locus traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2907 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 23:31:58 +00:00
rpoplin af6e476df5 Copyright compliant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2905 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 15:29:34 +00:00
rpoplin 3a863d3e8c Initial check in of VariantOptimizer in playground. There is a Gaussian Mixture Model version and a k-Nearest Neighbors version. There is still lots of work to do. Nobody should be using it yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2904 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 15:26:18 +00:00
hanna 6133d73bf0 Locus (non-intervalled) traversal with new sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2903 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 01:58:44 +00:00
hanna 80f5d2829d Support for read interval sharding with proper filtering.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2902 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-27 20:26:34 +00:00
aaron d8fedd59be docs, cleanup, and some improvements to the iterators.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2901 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 22:36:04 +00:00
hanna b69c2d0f70 Cleanup. Remove some unnecessary methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2900 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 21:50:48 +00:00
hanna 30eb28886b Basic functionality for intervaled reads in new sharding system. Not currently filtering out cruft, so
the mode of operation is currently queryOverlapping rather than queryContained.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2899 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 21:41:55 +00:00
chartl cfff486338 This commit is for Kiran
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2898 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 18:18:38 +00:00
chartl 87f8fb7282 Quick commit in advance of Aaron's. Just a bunch of refactoring (private classes separated out, put in proper package). Also support added for coverage by read group rather than sample.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2897 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:39:47 +00:00
aaron 622554d7bd disable a part of the ROD for Reads code until the rest of the system goes live
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2896 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:15:42 +00:00
chartl 496ecc8186 Change in how overall coverage and means are stored in the DOCS object; change from keeping track of sample mean coverage to keeping track of sample total coverage (calculate means at the end)
This is a mid-way commit for Aaron



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2895 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:51:12 +00:00
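The change described in 496ecc8186 above (accumulate per-sample totals, compute means only at the end) might look something like the sketch below. The class `SampleCoverageTotals` and its methods are hypothetical names chosen for illustration; the actual DOCS object is not shown in this log.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: keep running coverage totals per sample and derive means on demand. */
class SampleCoverageTotals {
    private final Map<String, Long> totalBySample = new HashMap<>();
    private long lociSeen = 0;

    /** Adds the depth observed at one locus for each sample. */
    void addLocus(Map<String, Integer> depthBySample) {
        lociSeen++;
        for (Map.Entry<String, Integer> e : depthBySample.entrySet()) {
            totalBySample.merge(e.getKey(), e.getValue().longValue(), Long::sum);
        }
    }

    /** Mean coverage for a sample, computed from the stored total; 0.0 if no loci were seen. */
    double mean(String sample) {
        if (lociSeen == 0) {
            return 0.0;
        }
        return totalBySample.getOrDefault(sample, 0L) / (double) lociSeen;
    }
}
```

Storing integer totals avoids updating a floating-point running mean at every locus, and totals from separate partial runs can simply be added together before dividing.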
hanna 1017a38f38 Initial refactoring of read traversal to make it easier to drop in intervalled reads traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2894 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:09:09 +00:00
depristo 9a6b384adb Support for no qual fields in VCF; better support for Mendelian violation calculations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2893 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 00:29:17 +00:00
aaron 246fa28386 RODs for reads phase 2: modified RODRecordList to implement List<ReferenceOrderedDatum> so I could stub it out for testing, added a FlashBackIterator which is needed to prevent the ResourcePool from opening infinity+1 iterators, and some other interfaces to make unit testing much smoother.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2892 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 22:48:55 +00:00
chartl 591102a841 Don't close the output stream if we're printing to stdout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2891 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:50:58 +00:00
chartl 10cc71ceb0 Another midway commit for teh engineerz
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2890 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:24:02 +00:00
hanna 3289826892 Fix chartl's issue -- reduceInit() is sometimes called unnecessarily at the
end of a traversal.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2889 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:02:18 +00:00
chartl 3d92e5a737 Initial commit of integration test(s) for CoverageStatistics, currently in progress [midway commit is for Matt]
Modifications to CoverageStatistics - now includes and extends much of the behavior of DepthOfCoverage (per-base output, per-target output).

Additional functionality (coverage without deletions, base counts, by read group instead of by sample) is upcoming.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2888 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 20:25:07 +00:00
hanna 553d39bb00 Clean up the code a bit following the introduction of reduceByInterval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2887 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 01:20:22 +00:00
hanna 199b43fcf2 Reduce by interval alterations to interface with new sharding system. This checkin will be followed by a
simplification of some of the locus traversal code.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2886 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 00:16:50 +00:00
asivache 2572c24935 We were still dropping halves of some pairs, in which both reads were assigned to the same position. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2885 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 23:13:23 +00:00
aaron fef1154fc8 starting on RODs for Reads: made RODRecordList implement list<RODatum> (so we can sub in fake lists during testing), and removed unnecessary generic-ness. Removed BrokenRODSimulator, which isn't being used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2884 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 22:11:53 +00:00
chartl 5df37968de Simplification of code segments; slight alteration to per-locus tabulation; added to-do items for cosmetic changes (mostly binning options and settings)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2882 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:20:18 +00:00
asivache 27d3ef9458 Got rid of annoying commented printouts; no functional changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2881 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:12:30 +00:00
asivache d73bc490c2 Do not build alt consensuses from insertions that have an N in the inserted sequence. Seems to cause problems rather than solve any
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2880 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 03:00:26 +00:00
asivache 94d74d4f78 Multiple instances of the same consensus were all living happily together in the set of alt consensuses. As a result, we have been taking a considerable performance hit from trying to align all reads to those instances over and over again. Fixed. Only one copy of any given alt consensus is now stored.
in class Consensus: 
1) use Arrays.equals() to compare java arrays!!
2) if object overrides equals() it also MUST provide appropriate hashCode() (thanks, Matt) 

As a side effect, a number of commented out debug prints are committed, still need them...

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2879 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 02:09:50 +00:00
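The two points listed in 94d74d4f78 above (compare arrays with Arrays.equals(), and pair equals() with a matching hashCode()) can be illustrated with a small sketch. The fields of this `Consensus` class are assumptions made for the example; the real class is not shown in this log.

```java
import java.util.Arrays;

/** Hypothetical sketch of a value class that deduplicates correctly in a HashSet. */
final class Consensus {
    private final byte[] bases;              // assumed field: the consensus sequence
    private final int positionOnReference;   // assumed field: where it aligns

    Consensus(byte[] bases, int positionOnReference) {
        this.bases = bases.clone();
        this.positionOnReference = positionOnReference;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Consensus)) {
            return false;
        }
        Consensus that = (Consensus) other;
        // Arrays.equals compares contents; bases.equals(that.bases) would compare identity only.
        return positionOnReference == that.positionOnReference
                && Arrays.equals(bases, that.bases);
    }

    @Override
    public int hashCode() {
        // Must agree with equals(): equal objects produce equal hash codes.
        return 31 * Arrays.hashCode(bases) + positionOnReference;
    }
}
```

Without the hashCode() override, two equal instances would usually land in different hash buckets, so a HashSet would keep both copies, which is the duplication the message describes.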
chartl 1f673e9fab Float the bins with the given lower bound
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2878 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:48:53 +00:00
chartl 119d449b46 Formatting changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2877 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:43:15 +00:00
chartl 173956927b Summaries generated for firehose from DoC output have been migrated to their own walker to calculate aggregate coverage statistics in a parallelizable and fast way. This is an initial commit; bug-fixing and testing are upcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2876 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 18:41:02 +00:00
hanna 491b30e8de Eliminate a few stray loci that weren't being filtered out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2875 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 18:00:52 +00:00
hanna fff15944fe Bug fix. Stopping condition of recurrence stopped too soon in some cases where an interval *contained* zero reads but *overlapped* with some reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2874 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 15:58:54 +00:00
hanna a0e8de40cf Bug fix: at one locus in the dataset, two reads were dropped.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2872 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 23:54:52 +00:00
aaron 5546aa4416 adding code to deal with the off-spec situation where our minimum likelihood is above the GLF max of 255.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2871 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 22:27:39 +00:00
hanna 88d0677379 Misc correctness enhancements: develop the bin selector into a recursive algorithm and return a shard when reads are missing. Also improve the performance of the read filter that clips reads not actually present in the shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2870 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 22:19:06 +00:00
ebanks 8b555ff17c Killed the old cleaner code. Bye bye.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2868 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:49:58 +00:00
kshakir 3738b76320 Added a playground concordance analyzer for summarizing VariantEval across a group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2867 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:28:52 +00:00
ebanks a640bd2d79 ignore uninteresting extended events
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2866 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 19:55:46 +00:00
rpoplin 32e5dceef9 Moving comments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2865 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 19:27:31 +00:00
alecw b236714c8a Optimization - Added method to Covariates: void getValues( SAMRecord read, Comparable[] comparable ) which takes an array of size (at least) read.getReadLength() and fills it with covariate values for all positions in the given read. Made CovariateCounterWalker and TableRecalibrationWalker use this method instead of calling getValue(..) for each covariate and each offset.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2863 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 17:35:25 +00:00
ebanks 32d14d988e Overload parseIntervalRegion() to allow for the interval merging rule to be passed in (so one is not required to use the value from the GATK arg collection).
Now the IndelRealigner can use this functionality without being forced to merge  abutting intervals (which was actually causing a problem with the cleaning).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2862 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 04:13:54 +00:00
hanna cc09f48cd8 Correctness fix: index can concat chunks around shard edges, and my code didn't account for that.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2861 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 21:44:33 +00:00
chartl 0e05a3acb0 Adding depth of coverage features to firehose summary tools
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2860 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 19:47:16 +00:00
hanna 71f18e941f Significant performance improvements made by subtracting out the contents of the prior highest-level bin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2859 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 16:46:16 +00:00
rpoplin 3e0e7aad2d Removing debug statement. oops.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2858 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 15:26:22 +00:00
rpoplin 7f19ff1fa1 Added a new option in the recalibrator to be used by people who have SOLiD data in which only a few of the reads have no-calls in the color space. These reads will be skipped over and left in the bam file untouched.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2857 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 15:25:23 +00:00
aaron b1a4e6d840 removing non-ascii characters from my Copyright and from VariantEval2Walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2856 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:54:36 +00:00
aaron 33ae256186 a start to some of the infrastructure for Tribble, including dynamic detection of new RMD; not nearly wired in or complete yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2855 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:43:52 +00:00
ebanks bbbad79f8c Forgot to remove debugging code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2854 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:12:58 +00:00
ebanks 7669eaaeb3 Optimizations to the cleaner algorithm; reduce total runtime by almost 20%.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2852 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:10:56 +00:00
ebanks 79ab7affda - Change sortOnDisk option to sortInMemory
- Fix horrible cleaner bug
- Trivial optimizations to cleaner code - more significant ones coming soon.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2850 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-17 20:52:57 +00:00
ebanks 2520889cb3 Check for bad intervals and don't emit them
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2849 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 21:42:36 +00:00
aaron 653f70efa2 added methods to validate an interval before you try to make a GenomeLoc: boolean validGenomeLoc().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2846 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 20:35:35 +00:00
chartl 01af3d0663 Update an error message :)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2842 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 23:24:06 +00:00
jmaguire 81313d9452 added class VCFMerge
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2840 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 14:41:50 +00:00
jmaguire 0ef50bcae7 - update to match recent changes in the VCF parser
- compute Het Error Rate in VCFConcordance
- changes to the frequency-specific optimizer




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2839 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 14:27:01 +00:00
depristo 8072e9aed5 should never commit without running integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2838 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 23:42:37 +00:00
depristo a1a3d5fcb0 Support for reading in table of rsIDs -> dbSNP builds to back generate a dbSNP build X from a single file. Very useful indeed. dbSNP -> VC now captures the rsID in the context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2837 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 22:40:55 +00:00