hanna
75a541b479
Fix nasty issue where shard boundaries aren't properly clipped during locus traversals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2907 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 23:31:58 +00:00
rpoplin
af6e476df5
Copyright compliant
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2905 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 15:29:34 +00:00
rpoplin
3a863d3e8c
Initial check in of VariantOptimizer in playground. There is a Gaussian Mixture Model version and a k-Nearest Neighbors version. There is still lots of work to do. Nobody should be using it yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2904 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 15:26:18 +00:00
hanna
6133d73bf0
Locus (non-intervalled) traversal with new sharding system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2903 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 01:58:44 +00:00
hanna
80f5d2829d
Support for read interval sharding with proper filtering.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2902 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-27 20:26:34 +00:00
aaron
d8fedd59be
docs, cleanup, and some improvements to the iterators.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2901 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 22:36:04 +00:00
hanna
b69c2d0f70
Cleanup. Remove some unnecessary methods.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2900 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 21:50:48 +00:00
hanna
30eb28886b
Basic functionality for intervaled reads in new sharding system. Not currently filtering out cruft, so
...
the mode of operation is currently queryOverlapping rather than queryContained.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2899 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 21:41:55 +00:00
chartl
cfff486338
This commit is for Kiran
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2898 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 18:18:38 +00:00
chartl
87f8fb7282
Quick commit in advance of Aaron's. Just a bunch of refactoring (private classes separated out, put in proper package). Also support added for coverage by read group rather than sample.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2897 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:39:47 +00:00
aaron
622554d7bd
disable a part of the ROD for Reads code until the rest of the system goes live
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2896 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:15:42 +00:00
chartl
496ecc8186
Change in how overall coverage and means are stored in the DOCS object; change from keeping track of sample mean coverage to keeping track of sample total coverage (calculate means at the end)
...
This is a mid-way commit for Aaron
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2895 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:51:12 +00:00
hanna
1017a38f38
Initial refactoring of read traversal to make it easier to drop in intervalled reads traversal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2894 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:09:09 +00:00
depristo
9a6b384adb
Support for no qual fields in VCF; better support for Mendelian violation calculations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2893 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 00:29:17 +00:00
aaron
246fa28386
RODs for reads phase 2: modified RODRecordList to implement List<ReferenceOrderedDatum> so I could stub it out for testing, added a FlashBackIterator which is needed to prevent the ResourcePool from opening infinity+1 iterators, and some other interfaces to make unit testing much smoother.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2892 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 22:48:55 +00:00
chartl
591102a841
Don't close the output stream if we're printing to stdout
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2891 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:50:58 +00:00
chartl
10cc71ceb0
Another midway commit for teh engineerz
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2890 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:24:02 +00:00
hanna
3289826892
Fix chartl's issue -- reduceInit() is sometimes called unnecessarily at the
...
end of a traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2889 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:02:18 +00:00
chartl
3d92e5a737
Initial commit of integration test(s) for CoverageStatistics, currently in progress [midway commit is for Matt]
...
Modifications to CoverageStatistics - now includes and extends much of the behavior of DepthOfCoverage (per-base output, per-target output).
Additional functionality (coverage without deletions, base counts, by read group instead of by sample) is upcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2888 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 20:25:07 +00:00
hanna
553d39bb00
Clean up the code a bit following the introduction of reduceByInterval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2887 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 01:20:22 +00:00
hanna
199b43fcf2
Reduce by interval alterations to interface with new sharding system. This checkin with be followed by a
...
simplification of some of the locus traversal code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2886 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 00:16:50 +00:00
asivache
2572c24935
We were still dropping halves of some pairs, in which both reads were assigned to the same position. Fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2885 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 23:13:23 +00:00
aaron
fef1154fc8
starting on RODs for Reads: made RODRecordList implement list<RODatum> (so we can sub in fake lists during testing), and removed unnecessary generic-ness. Removed BrokenRODSimulator, which isn't being used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2884 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 22:11:53 +00:00
chartl
5df37968de
Simplification of code segments; slight alteration to per-locus tabulation; added to-do items for cosmetic changes (mostly binning options and settigns)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2882 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:20:18 +00:00
asivache
27d3ef9458
Got rid of annoying commented printouts; no functional changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2881 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:12:30 +00:00
asivache
d73bc490c2
Do not build alt consensuses from insertions that have an N in the inserted sequence. Seems to cause problems rather than solve any
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2880 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 03:00:26 +00:00
asivache
94d74d4f78
Multiple instances of the same consensus were all living happily together in the set of alt consensuses. As the result, we have been taking considerable performance hit from trying to align all reads to those instances over and over again. Fixed. Only one copy of any given alt consensus is now stored.
...
in class Consensus:
1) use Arrays.equals() to compare java arrays!!
2) if object overrides equals() it also MUST provide appropriate hashCode() (thanks, Matt)
As a side effect, a number of commented out debug prints are committed, still need them...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2879 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 02:09:50 +00:00
chartl
1f673e9fab
Float the bins with the given lower bound
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2878 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:48:53 +00:00
chartl
119d449b46
Formatting changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2877 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:43:15 +00:00
chartl
173956927b
Summaries generated for firehose from DoC output have been migrated to its own walker to calculate aggregate coverage statistics in a parallelizable and fast way. This is an initial commit, bug-fixing and testing is upcoming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2876 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 18:41:02 +00:00
hanna
491b30e8de
Eliminate a few stray loci that weren't being filtered out.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2875 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 18:00:52 +00:00
hanna
fff15944fe
Bug fix. Stopping condition of recurrence stopped too soon in some cases where an interval *contained* zero reads but *overlapped* with some reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2874 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 15:58:54 +00:00
hanna
a0e8de40cf
Bug fix: at one locus in the dataset, two reads were dropped.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2872 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 23:54:52 +00:00
aaron
5546aa4416
adding code to deal with the off-spec situation where our minimum likelihood is above the GLF max of 255.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2871 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 22:27:39 +00:00
hanna
88d0677379
Misc correctness enhancements: develop the bin selector into a recursive algorithm and return a shard when reads are missing. Also improve the performance of the read filter that clips reads not actually present in the shard.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2870 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 22:19:06 +00:00
ebanks
8b555ff17c
Killed the old cleaner code. Bye bye.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2868 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:49:58 +00:00
kshakir
3738b76320
Added a playground concordance analyzer for summarizing VariantEval across a group.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2867 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:28:52 +00:00
ebanks
a640bd2d79
ignore uninteresting extended events
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2866 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 19:55:46 +00:00
rpoplin
32e5dceef9
Moving comments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2865 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 19:27:31 +00:00
alecw
b236714c8a
Optimization - Added method to Covariates: void getValues( SAMRecord read, Comparable[] comparable ) which takes an array of size (at least) read.getReadLength() and fills it with covariate values for all positions in the given read. Made CovariateCounterWalker and TableRecalibrationWalker use this method instead of calling getValue(..) for each covariate and each offset.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2863 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 17:35:25 +00:00
ebanks
32d14d988e
Overload parseIntervalRegion() to allow for the interval merging rule to be passed in (so one is not required to use the value from the GATK arg collection).
...
Now the IndelRealigner can use this functionality without being forced to merge abutting intervals (which was actually causing a problem with the cleaning).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2862 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 04:13:54 +00:00
hanna
cc09f48cd8
Correctness fix: index can concat chunks around shard edges, and my code didn't account for that.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2861 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 21:44:33 +00:00
chartl
0e05a3acb0
Adding depth of coverage features to firehose summary tools
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2860 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 19:47:16 +00:00
hanna
71f18e941f
Significant performance improvements made by subtracting out the contents of the prior highest-level bin.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2859 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 16:46:16 +00:00
rpoplin
3e0e7aad2d
Removing debug statement. oops.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2858 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 15:26:22 +00:00
rpoplin
7f19ff1fa1
Added a new option in the recalibrator to be used by people who have SOLiD data in which only a few of the reads have no-calls in the color space. These reads will be skipped over and left in the bam file untouched.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2857 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 15:25:23 +00:00
aaron
b1a4e6d840
removing non-ascii characters from my Copyright and from VariantEval2Walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2856 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:54:36 +00:00
aaron
33ae256186
a start to some of the infrastructure for Tribble, including dynamic detection of new RMD; not nearly wired in or complete yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2855 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:43:52 +00:00
ebanks
bbbad79f8c
Forgot to remove debugging code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2854 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:12:58 +00:00
ebanks
7669eaaeb3
Optimizations to the cleaner algorithm; reduce total runtime by almost 20%.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2852 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:10:56 +00:00
ebanks
79ab7affda
- Change sortOnDisk option to sortInMemory
...
- Fix horrible cleaner bug
- Trivial optimizations to cleaner code - more significant ones coming soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2850 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-17 20:52:57 +00:00
ebanks
2520889cb3
Check for bad intervals and don't emit them
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2849 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 21:42:36 +00:00
aaron
653f70efa2
added methods to validate an interval before you try to make a GenomeLoc: boolean validGenomeLoc().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2846 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 20:35:35 +00:00
chartl
01af3d0663
Update an error message :)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2842 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 23:24:06 +00:00
jmaguire
81313d9452
added class VCFMerge
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2840 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 14:41:50 +00:00
jmaguire
0ef50bcae7
- update to match recent changes in the VCF parser
...
- compute Het Error Rate in VCFConcordance
- changes to the frequency-specific optimizer
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2839 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 14:27:01 +00:00
depristo
8072e9aed5
should never commit without running intergration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2838 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 23:42:37 +00:00
depristo
a1a3d5fcb0
Support for reading in table of rsIDs -> dbSNP builds to back generate a dbSNP build X from a single file. Very useful indeed. dbSNP -> VC now captures the rsID in the context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2837 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 22:40:55 +00:00
kcibul
28f24ca2ae
made some private member/methods protected to allow for subclassing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2836 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 21:16:00 +00:00
hanna
232d884578
Got back most of the performance lost when I fixed the dropped reads problem.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2835 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 19:59:56 +00:00
chartl
04a2784bf7
Initial commit of tools under development for data QC through firehose.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2834 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 19:13:24 +00:00
hanna
77af5822d4
Correcting my incomplete understanding of how the BAM file index actually works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2833 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 16:15:19 +00:00
depristo
5f74fffa02
Massive improvements to VE2 infrastructure. Now supports VCF writing of interesting sites; multiple comp and eval tracks. Eric will be taking it over and expanding functionality over the next few weeks until it's ready to replace VE1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2832 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 15:26:52 +00:00
depristo
197dd540b5
added root GATKData variable
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2831 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 15:25:34 +00:00
ebanks
c6f6948f9d
Haiku:
...
Eric is a fool.
Matt found his really dumb bug.
Eric is humbled.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2830 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 04:51:56 +00:00
rpoplin
ecebf0bc62
Bug fix for null pointer exception in AnalyzeAnnotations if -name argument isn't specified
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2828 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 18:39:26 +00:00
mmelgar
ad608d0e9d
Cleaned up documentation on SecondaryBaseTransitionTableWalker and added Read Group and Allele Balance to the info.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2827 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 17:20:35 +00:00
hanna
34e566c90d
Fixed bug where new sharding system wasn't grabbing the reads that start at the end of a bin. Caused by what I currently believe to be a bug in Picard -- will verify with Alec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2826 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 17:00:04 +00:00
ebanks
96fee7cf7a
Disabling input of known indels for use as alternate consenses. When we get rods in a read traversal, it will be trivial to hook it into the cleaner (the code is already there).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2825 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 15:52:21 +00:00
ebanks
a4a2c9b172
Deal with bad input; also N-way out isn't default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2823 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 03:44:56 +00:00
hanna
dc885ba386
Fix for some correctness bugs found during early performance testing, phase 1.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2822 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 22:32:25 +00:00
depristo
c66861746a
improvements to ve2, including more meaningful mendelian violation counting. Support for VCF emitted interesting sites, annotated according to the evaluations themselves. Basic intergration test for VE2 started
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2819 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 16:12:29 +00:00
rpoplin
3de72daa88
Removing an accidently added import statement.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2818 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 15:54:24 +00:00
rpoplin
0b1e243a7b
CountCovariates now sorts the list of standard covariate classes coming from PackageUtils.getClassesImplementingInterface(). As a result some of the integration tests now make use of -standard
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2817 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 15:52:20 +00:00
ebanks
6652b992f7
The new cleaner can now use known indels to create alternate consenses for cleaning.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2816 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 04:39:15 +00:00
hanna
0250338ce7
Basic use cases for merging BAM files with the new sharding system work.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2815 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 22:14:37 +00:00
depristo
934d4b93a2
VariantContext to VCF converter. BeagleROD, and phasing of VCF calls. Integration tests galore :-)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2814 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 19:02:25 +00:00
andrewk
369cc50802
Added playground walker that does a basic concordance check between two VCF files - an eval and a truth file - across all samples in the eval file. Produces per-sample, per-locus debug info and simple concordance stats. This is not meant to be extended, but rather used for validating the HapMap to VCF conversion in preparation for retiring GFF-based HapMap data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2813 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 02:41:18 +00:00
depristo
94f892ad42
VCF->beagle and VCF phasing using beagle input. Appears to work fairly well. VariantContexts now support phased genotypes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2812 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 01:22:05 +00:00
depristo
457568485a
simple Beagle input ROD
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2811 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 01:21:04 +00:00
hanna
57b8c9a53c
Supporting infrastructure for merging SAM files. Not yet integrated into the datasource.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2810 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 23:59:38 +00:00
kshakir
fc810a1800
Updated VCF Reader to parse VCFs according to the VCFv3.3 spec. Column headers are tab separated since sample names might have spaces.
...
Updated test files in /humgen/gsa-scr1/GATK_Data/Validation_Data/*.vcf to remove spaces except for when they are supposed to be in the sample name.
Added @Test before VCFReaderTest.testHeaderNoRecords()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2809 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 22:55:59 +00:00
chartl
935e76daa1
Minor changes to oneoff walkers. PlinkRod altered but still commented.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2808 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 18:49:56 +00:00
hanna
21369869b7
Extend regex that supports every 'word' character to use any printable character except ':'.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2807 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 03:29:55 +00:00
ebanks
4fe851a83d
Optimization: don't keep scoring an alternate consensus if it's already worse than the best alt seen so far.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2806 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-07 05:06:32 +00:00
ebanks
ca1917507f
Various improvements and fixes:
...
In indel cleaner:
1. allow the user to specify that he wants to use Picardâs SAMFileWriter sorting on disk instead of having us sort in memory; this is useful if the input consists of long reads.
2. for N-way-out mode: output bams now use the original headers from the corresponding input bams - as opposed to the merged header. This entailed some reworking of the datasources code.
3. intermediate check-in of code that allows user to input known indels to be used as alternate consenses. Not done yet.
In UG: fix bug in beagle output for Jared.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2805 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-07 04:21:04 +00:00
depristo
3b1ab86d11
Added generic interfaces to RefMetaDataTracker to obtain VariantContext objects. More docs. Integration tests for VariantContexts using dbSNP and VCF. At this stage if you use dbSNP or VCF files only in your walkers, please move them over to the VariantContext, it's just nicer. If you've got RODs that implemented the old variation/genotype interfaces, and you want them to work in new walkers, please add an adaptor to VariantContextAdaptors in refdata package. It should be easy and will reduce burden in the long term when those interfaces are retired.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2803 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:26:06 +00:00
depristo
995d55da81
now uses the new RMDT getVariantContext() functions instead of doing the work itself.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2802 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:23:06 +00:00
depristo
33760834d6
commented out inactive (due to string ==) but actually incorrect code. Sometimes two wrongs do make a right
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2801 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:22:26 +00:00
hanna
c7e006a996
Bug fixes for interval batching in sharding system. Sharding system now batches intervals and passes
...
basic tests for small and large intervals and intervals that cross bin boundaries. Currently works
only with a single BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2800 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 21:47:54 +00:00
asivache
a1d5a384f4
Reverting the last reversal. bestConsensus points to something also kept in a set, so just reassigning it will NOT automatically destroy the underlying data; explicit clearing of unneeded data reinstated. STUPIDO!!!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2796 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:08:53 +00:00
asivache
cf7e6d0c0b
Memory-saving change, same as in old IntervalCleaner (if alt consensus does not beat the best one, destroy its data immediately)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2795 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:05:04 +00:00
asivache
df0be25afb
ooops, no need to destroy old best's data explicitly, it will be done automatically of course
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2794 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:03:16 +00:00
asivache
9f44018b7d
Reducing memory footprint: if alt consensus does not beat the best alt observed so far, destroy its data immediately, instead of keeping them around. If new alt is better than the old best, then destroy the old best right away instead.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2793 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 17:58:54 +00:00
rpoplin
be33d1852c
Reverting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2792 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:57:09 +00:00
depristo
af8c47fc2f
Fixing up testVariantContext for integration tests for variant context. Printing of VCs and genotypes now stable using sorting. Cleaned up comments in quality score by strand. RefMetaDataTracker now directly allows walkers to obtain VariantContexts using the simple Collection<VariantContext> getAllVariantContexts(GenomeLoc curLocation, EnumSet<VariantContext.Type> allowedTypes, boolean requireStartHere, boolean takeFirstOnly) function. VCF and dbSNP VariantContexts now officially supported. Other importan types can be added to the adapator system in refdata package. Integration tests later today
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2791 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:42:54 +00:00
rpoplin
0d8d6e0a14
Ti/Tv module in VariantEval shows known and novel ratios if possible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2790 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:37:40 +00:00
depristo
1494dc875f
fixing up tests. Moves are complete
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2789 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 14:24:00 +00:00
depristo
c6d86da4b8
almost managed to move things around perfectly in move go
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2788 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 14:18:26 +00:00
depristo
e0af3bf761
updating back names
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2786 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:53:45 +00:00