rpoplin
967215066d
The old CountCovariates now warns the user if they didn't supply a dbSNP rod file. Thanks Kiran for the use case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2055 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 19:16:46 +00:00
ebanks
4558375575
Stage 1 of the VariantFiltration refactoring is now complete. There now exists a parallel tool called VariantAnnotator which simply takes variant calls and annotates them with the same type of data that we used to use for filtering (e.g. DoC, allele balance). The output is a VCF with the INFO field appropriately annotated.
...
VariantAnnotator can be called as a standalone walker or by another walker, as it is by the UnifiedGenotyper. UG now no longer computes any of this meta data - it relegates the task completely to the annotator (assuming the output format accepts it).
This is a fairly all-encompassing check in. It involves changes to all of the UG code, bug fixes to much of the VCF code as things popped up, and other changes throughout. All integration tests pass and I've tediously confirmed that the annotation values are correct, but this framework could use some more rigorous testing.
Stage 2 of the process will happen later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2053 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 02:41:20 +00:00
kiran
103763fc84
An accessor for the VCF header
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2051 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-15 09:28:25 +00:00
ebanks
bf451873ff
1. Bug fix: check that AF=0 doesn't contain more probability than 1-fraction
...
2. Fix for Kiran: allow UG to call SNPs at deletion sites; we'll add an annotation to the VariantAnnotator for deletions at the locus (next week).
3. Added integration tests for joint estimation model
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2038 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 18:02:18 +00:00
asivache
1be36ca959
Bug fix: when cleanedReadIterator is initialized, it is immediately set to the contig of the first cleaned read; when the first uncleaned read coming in is on a lower contig, this would trigger 'readNextContig' with that lower contig as an argument. As a result, the whole cleaned reads file would be read through to the end and no cleaned reads would ever be seen by the code afterwards. Now we do not call readNextContig if the (uncleaned) read's contig is lower than the current contig already loaded into cleanedReadIterator. The 'readNextContig' method now also throws an exception if the requested contig is lower than the currently loaded one.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2037 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 15:41:26 +00:00
depristo
cff31f2d06
comments for eric
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2035 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 14:19:31 +00:00
aaron
234bb71747
changed the toVariation() method to take a reference base, instead of using the reference base loaded from the underlying data source (if it was reference aware). Also changed some isVariant() methods which weren't using the passed in ref base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2034 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 06:54:38 +00:00
ebanks
902cf84448
Bug fix: if the most likely allele frequency is 0, don't make a variant call (even if the Qscore for AF=1/n > threshold)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2033 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 04:10:32 +00:00
ebanks
555fb975de
1. Print out allele frequency range (from joint estimation model only).
...
2. Don't print verbose output from SLOD calculation (it's just a repeat of previous output).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2032 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 03:59:13 +00:00
hanna
8145ed4672
Take 2, updating picard with bug fix for bam files containing no reads.
...
Just stomped on the existing md5s because that's what Eric told me to do.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2029 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 22:52:08 +00:00
ebanks
61b5fb82ce
2 major changes:
...
1. Add dbsnp RS ID to VCF output from genotyper; to do this I needed to fix the dbsnp rod which did not correctly return this value.
2. Remove AlleleBalanceBacked and instead generalize the arbitrary info fields backing VCFs (and potentially others) in preparation for refactoring VariantFiltration next week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2028 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 22:51:49 +00:00
aaron
c3c001e02e
cleanup of the traversal output code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2026 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 06:18:10 +00:00
ebanks
0922400ca9
Don't try to calculate ratios when DoC is zero (which happens when calls are made by an LD-aware genotyper)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2025 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 02:51:44 +00:00
hanna
2ea85fb62b
Fix some problematic command-line argument naming and descriptions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2023 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 02:12:26 +00:00
depristo
6c9f86bb4d
Removed unnecessary output and added debugging print() routine
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2020 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 18:37:36 +00:00
hanna
8406325247
New Picard is breaking one of the integration tests.
...
Revert until we find out whether the cause is legit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2017 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 03:59:32 +00:00
hanna
499e7d1d75
Push forward some more delicate merging routines.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2016 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 03:07:34 +00:00
hanna
bae4d3f7ea
Updated Picard with fix for Doug Voet. Thanks Alec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2015 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 02:01:08 +00:00
hanna
2e4782f202
Command-line arguments for SamReadFilters.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2014 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 23:36:17 +00:00
hanna
2cf9670d1e
Allow users to directly specify filters from the command-line, applicable to
...
any walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2012 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 18:40:16 +00:00
ebanks
6a37090529
Output changes for VCF and UG:
...
1. Don't cap q-scores at 99
2. Scale SLOD to allow more resolution in the output
3. UG outputs weighted allele balance (AB) and on-off genotype (OO) info fields for het genotype calls (works for joint estimation model and SSG)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2011 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 16:31:31 +00:00
depristo
d316cbad4c
VariantFiltration now accepts a VCF rod in addition to an input geli. It will then annotate this VCF file with filtering information in the INFO field too. --OnlyAnnotate will not write in filtering output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2008 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 13:24:58 +00:00
aaron
f9819d5f13
a little clean-up
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2007 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 06:18:34 +00:00
aaron
2ed423ed56
print the current location in read walkers (in addition to the number of reads processed), along with some refactoring to support the change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2006 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 05:57:01 +00:00
ebanks
2fa2ae43ec
Enough people have found this useful, so...
...
Moving Callset Concordance tool to core and adding integration test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2003 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:59:18 +00:00
ebanks
3793519bd4
-Added convenience method to VCF record to tell if it's a no call and have rodVCF use it before querying for info fields
...
-Don't restrict info fields to 2-letter keys
[about to move these to core]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2002 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:52:51 +00:00
ebanks
74751a8ed3
-Some minor fixes to get accurate vcf record merging done
...
-Improvement to snp genotype concordance test
And with that, it looks like I get revision #2000.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2000 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 06:40:55 +00:00
ebanks
7ce0df76f8
Added accessors to the rod data sources so that walkers can access the name/file/type triplets for input rods. This is necessary if e.g. you want to create a vcf writer based on all of the samples being input.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1994 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:25:39 +00:00
ebanks
d07f3bb6f6
Added methods to get strand bias and to test if record has allele freq or bias fields set.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1993 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:20:35 +00:00
kiran
3313b0ddb4
Fixed a minor bug where the lodThreshold wasn't being printed in the header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1992 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:51:36 +00:00
kiran
567f5758d2
Optionally lists read depths by read group.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1990 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:39:19 +00:00
hanna
21c5f543fa
Fix sharding bug -- loci to which >100,000 (= 1 shard) reads are assigned an
...
alignment start will confuse the sharding system and cause it to return duplicate reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1987 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 14:27:26 +00:00
ebanks
d549347f25
Refactored GenotypeLikelihoods to use an underlying 4-base model.
...
It needs to be modified a bit and then hooked up to a pooled model, but that is now possible.
At this point, there is no difference to the Unified Genotyper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1978 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 21:59:25 +00:00
aaron
aacd72854f
a fix for a bug Andrey discovered: in read-based interval traversals we're duplicating reads in rare cases. The problem was that to accommodate a bug in SAM JDK indexing, we were forced to add one to the stop of our QueryOverlapping() calls to ensure we always got all of the overlapping reads.
...
Added a PlusOneFixIterator that wraps other iterators, and eliminates reads that start outside of our intended interval (interval stop - 1). Updated and checked BamToFastqIntegrationTest MD5 sums.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1976 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 05:26:33 +00:00
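The filtering idea behind the PlusOneFixIterator in the commit above can be illustrated with a minimal sketch. It is written in Python rather than the project's Java, and the `Read` record, the `plus_one_fix` name, and the `queried_stop` parameter are hypothetical stand-ins, not the actual GATK classes. It assumes the overlap query was issued with its stop padded by one, so the intended interval ends at the queried stop minus one.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record; the real iterator wraps SAM records.
    name: str
    start: int  # 1-based alignment start


def plus_one_fix(reads: Iterable[Read], queried_stop: int) -> Iterator[Read]:
    """Yield only reads that start at or before the intended interval stop.

    queried_stop is the padded stop that was passed to the overlap query,
    so the intended interval ends at queried_stop - 1.
    """
    intended_stop = queried_stop - 1
    for read in reads:
        if read.start <= intended_stop:
            yield read
```

A read that starts exactly on the padded position only entered the result because of the padding, so dropping it keeps the next interval from returning the same read again.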
ebanks
a545859c62
Joint Estimation model now emits a reasonable slod
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1969 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 21:12:42 +00:00
ebanks
11d950abe0
No longer allow the lod_threshold argument - use confidence instead.
...
Have UG output qscores in all cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1968 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:18:51 +00:00
asivache
2fb45dbd73
Make window size a command line argument
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1967 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:13:35 +00:00
asivache
55f61b1f88
Bug fix in adjustment of the shift position.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1966 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:08:11 +00:00
ebanks
3a33401822
2nd stage of the genotyper output refactoring is complete.
...
Now, all output is generalized and all of the intelligence lies where it is supposed to.
Next stage is syncing up old and new models and making sure we're outputting exactly what we should.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1960 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 22:43:08 +00:00
aaron
ba67c7f02b
added a warning for those using bed files; we properly convert bed to the internal representation but the user needs to be aware that any output will be one-based closed intervals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1959 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 21:09:18 +00:00
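For readers unfamiliar with the coordinate difference behind the warning in the commit above: BED intervals are zero-based and half-open, while the output described here uses one-based closed intervals. A minimal sketch of the conversion follows, in Python rather than the project's Java; the function name is an assumption made for illustration and is not taken from the GATK code.

```python
def bed_to_one_based_closed(chrom_start: int, chrom_end: int):
    """Convert a BED interval [chrom_start, chrom_end) to one-based closed [start, stop]."""
    if chrom_start < 0 or chrom_end < chrom_start:
        raise ValueError("invalid BED interval")
    # Only the start shifts by one: the exclusive zero-based end already
    # equals the inclusive one-based stop.
    return chrom_start + 1, chrom_end
```

For example, the BED line `chr1 0 100` covers the first 100 bases and comes out as positions 1 through 100.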
aaron
b71b66bd88
the underlying parameter is a float so we need to use Float.valueOf() instead; Noticed by external user Hou Huabin
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1958 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 20:22:25 +00:00
ebanks
af6d0003f8
-Generalized the GenotypeConcordance module to deal with any number of individuals (although it will default to its old behavior if the -samples argument is left out).
...
-Make rods return the appropriate type of Genotype calls from getGenotype().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1954 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-01 05:35:47 +00:00
asivache
4b0796ba58
After fixing a few glitches and bugs, this version finally works as intended
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1952 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-31 04:59:58 +00:00
asivache
ea8d5c7077
Some internal refactoring. Now "safely" ignores duplicate records (NOT duplicate reads but rather malformed bam files!) resulting from the bug/feature in CleanedReadInjector.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1949 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 17:50:51 +00:00
ebanks
4ee1d6f733
-Have the calculation models determine whether a call passes the lod/confidence thresholds (as opposed to returning everything and letting the UG decide); this way, walkers which call map() will get only the good calls.
...
-Do the right thing in all models for all-base-mode (for Kiran).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1940 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 02:35:51 +00:00
asivache
e3b4d4cbed
Genotyper reimplemented. Does the same thing, at least for now, but internal data structures redesign enables collecting various statistics for indel-containing/reference-matching reads. The statistics are not yet used by the caller itself to make a better judgement w.r.t. the validity of the calls it makes, but they are now printed into the output stream (--verbose). The statistics (for both normal and tumor) include: indel observation count/total coverage, av. number of mismatches per indel-containing and per ref-matching read, av. mapping quality, av. mismatch rate and av. base quality within an NQS window around the indel, numbers of indel and ref observations per strand.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1936 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 19:09:16 +00:00
ebanks
5cdbdd9e5b
now that the design is stable, pull the setReference and setLocation methods back out of Genotype and stick them into constructors of implementing classes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1931 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 13:27:37 +00:00
ebanks
3091443dc7
Sweeping changes to the genotype output system, as per several discussions with Matt & Aaron.
...
Some things still need to be changed, but it will entail some more design decisions first (which means I get to bug M&A again tomorrow!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1930 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 03:46:41 +00:00
depristo
86573177d1
Reverting rod walkers to use underlying refwalker implementation while we work on ROD2 and reenable the system. Added some serious sparse file parsing to variant eval tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1929 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 01:04:37 +00:00
aaron
5a3bd50537
adding error log reporting to the GATK, and a stream based output method for the argument collection
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1926 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 19:56:05 +00:00
aaron
04e9a494e9
removed the GenotypesBacked interface, which is currently unused. Also cleaned up some documentation lines
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1924 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 18:08:14 +00:00
depristo
186a8dd698
Trivial protection for null value
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1918 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:52:52 +00:00
depristo
726378be8b
Almost ready to stop doing eager decoding; waiting on Eric
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1914 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 15:28:05 +00:00
aaron
3fb3773098
a fix for traverse duplicates bug: GSA-202. Also removed some debugging output from FastaAltRef walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1912 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 20:18:55 +00:00
hanna
a1e8a532ad
Support for initialize() and onTraversalDone() output from parallelized walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1911 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 20:18:31 +00:00
ebanks
75ad6bbef7
Check that map isn't being called passing in null arguments.
...
(This seems wrong; see JIRA entry GSA-211)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1907 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 02:30:36 +00:00
hanna
65b98470f3
Temporary fix: have RodLocusView manage and close its RODs. Really the
...
relationship between these two classes needs to be rethought; see JIRA
GSA-207.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1904 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 16:00:12 +00:00
aaron
ad1fc511b1
intermediate commit for some changes in the Variation system, so Eric can go ahead with his changes. Everything is pretty set, but the Variation interface could use a convenience method that joins all the alternate alleles.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1903 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 06:31:15 +00:00
ebanks
6c338eccb8
Joint Estimation model now emits calls in all formats.
...
The whole GenotypeCall framework needs to be changed, but this will work for the time being.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1902 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 03:07:28 +00:00
ebanks
54c61c663c
-Cleanup of the Joint Estimation code
...
-Don't print verbose/debugging output to logger, but instead specify a file in the argument collection (and then we only need to print conditionally)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1899 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 15:25:29 +00:00
asivache
2cab4c68d4
Added method: isCodingExon(). Returns true if position is simultaneously within an exon AND within coding interval of any single transcript from the list. The old method of detecting coding positions as isExon() && isCoding() is buggy, as the position could be in the UTR part of one transcript (isExon() is true), and within coding region bounds (but not in the exon) of another transcript (isCoding() is true). As a result UTR positions would be erroneously annotated as coding.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1898 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 14:55:07 +00:00
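The difference between the old check and isCodingExon() in the commit above can be shown with a small sketch. It is written in Python rather than the project's Java, and the `Transcript` record and the inclusive `(start, stop)` tuples are hypothetical simplifications of the real annotation classes.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[int, int]  # inclusive (start, stop); hypothetical representation


class Transcript(NamedTuple):
    exons: List[Interval]
    coding: Interval  # span from coding start to coding stop, introns included


def _contains(interval: Interval, pos: int) -> bool:
    return interval[0] <= pos <= interval[1]


def is_exon(transcripts, pos):
    # True if any transcript has an exon covering pos.
    return any(_contains(e, pos) for t in transcripts for e in t.exons)


def is_coding(transcripts, pos):
    # True if pos lies within the coding bounds of any transcript.
    return any(_contains(t.coding, pos) for t in transcripts)


def is_coding_exon(transcripts, pos):
    # Both conditions must hold for the same transcript.
    return any(
        _contains(t.coding, pos) and any(_contains(e, pos) for e in t.exons)
        for t in transcripts
    )
```

With one transcript whose intron spans the position inside its coding bounds, and another whose non-coding exon covers it, `is_exon` and `is_coding` are both true while `is_coding_exon` is false, which is the misannotation the commit describes.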
ebanks
55fa1cfa06
-Renamed new calculation model and worked out some significant changes with Mark
...
-Allow walkers calling the UG to pass in their own argument collections
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1896 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 20:49:36 +00:00
ebanks
9b9744109c
Mark's new unified calculation model is now officially implemented.
...
Because it doesn't actually use EM, it's no longer a subclass of the EM model.
Note that you can't use it just yet because it doesn't actually emit calls (just prints to logger). I need to deal with general UG output tomorrow. Hold off until then, Mark, and then you can go wild.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1891 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 02:39:23 +00:00
depristo
caa3187af8
Enabling correct high-performance ROD walker and moved VariantEval over to it. Performance improvements in variantEval in general. See wiki for full description
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1890 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 23:31:13 +00:00
depristo
449a6ba75a
Deleting lots of code as part of my cleanup. More classes tagged for removal. Many more walkers have their days numbered.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1885 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 12:23:36 +00:00
ebanks
b8ab77c91c
Don't filter out reads without proper read groups. Instead, allow the user (or another walker calling UG) to specify an assumed sample to use (but then we assume single-sample mode).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1883 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 01:30:53 +00:00
ebanks
c29924e7cf
Reverting previous change.
...
Aaron, it's all yours...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1881 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:55:24 +00:00
aaron
d21b582b18
memory leak, where the Resource Pool was releasing based on the value and not the key, resulting in the resourceAssignments map growing with each additional shard
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1880 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:39:42 +00:00
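The kind of leak described in the commit above can be illustrated with a toy pool. This is a sketch in Python rather than the project's Java; the class layout, the `holder` key, and the method names are assumptions for illustration and do not mirror the real Resource Pool API. Only the idea of an assignments map that must be cleaned up by key comes from the commit message.

```python
class ResourcePool:
    """Toy pool: hands out reusable resources and tracks who holds which one."""

    def __init__(self, factory):
        self._factory = factory    # creates a new resource when none is free
        self._available = []       # resources ready for reuse
        self._assignments = {}     # holder (key) -> resource (value)

    def acquire(self, holder):
        resource = self._available.pop() if self._available else self._factory()
        self._assignments[holder] = resource
        return resource

    def release(self, holder):
        # Remove the entry by its key. Trying to remove it using the resource
        # (the value) would match nothing, leaving one stale entry per shard.
        resource = self._assignments.pop(holder)
        self._available.append(resource)

    def assignment_count(self):
        return len(self._assignments)
```

After acquiring and releasing once per shard, `assignment_count()` should return to zero; a count that grows with the number of shards is the symptom the commit fixes.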
ebanks
761a730758
assertBiAllelic -> assertMultiAllelic.
...
Chris, if this breaks an integration test, you get it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1879 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:09:46 +00:00
aaron
cfa86d52c2
ensure that in the indel case we don't allow identification as both an insertion and deletion at the same location in the VCF ROD
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1875 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 18:21:00 +00:00
ebanks
51f9ec0a5c
subtract largest posterior value from all values; this hopefully solves any precision issues
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1870 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-18 05:20:15 +00:00
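Subtracting the largest value, as in the commit above, is the usual max-shift trick for values kept in log space. A minimal sketch follows, in Python rather than the project's Java. It assumes the posteriors are stored as log10 values and that the goal is a normalized linear-space vector; both the function name and those assumptions are illustrative and not confirmed by the commit.

```python
def normalize_log10_posteriors(log10_values):
    """Shift log10 posteriors so the largest is 0, then normalize to sum to 1."""
    if not log10_values:
        return []
    largest = max(log10_values)
    # After the shift every exponent is <= 0, so 10**x cannot overflow, and
    # the largest term is exactly 1.0, so the sum cannot underflow to zero.
    shifted = [v - largest for v in log10_values]
    linear = [10.0 ** s for s in shifted]
    total = sum(linear)
    return [x / total for x in linear]
```

Without the shift, inputs such as -1000, -1001 and -1002 would all underflow to 0.0 when exponentiated and the ratios between them would be lost.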
ebanks
b9e8867287
-push allele frequency and genotype likelihood variable definitions down into the subclasses so that they can use different data structures
...
-use slightly more stringent stability metric
-better integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1869 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-18 04:22:17 +00:00
chartl
ad777a9c14
@BasicPileup - made the counts public so they can be used
...
@PoolUtils - split reads by indel/simple base
@BaseTransitionTable - complete refactoring, nicer now
@UnifiedArgumentCollection - added PoolSize as an argument
@UnifiedGenotyper - checks to ensure pooled sequencing uses the appropriate model
@GenotypeCalculationModel - instantiates with the new PoolSize argument
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1867 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 21:56:56 +00:00
andrewk
d1a4cd2f73
Added ValidationData analysis type to VariantEvalWalker; this eval takes a GFF file with validated truth data positions (bound to "validation") and calculates the accuracy of the genotype calls bound to "eval".
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1862 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 15:39:08 +00:00
ebanks
418e007ca6
A cleaner interface: now everyone can use UG's initialize method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1860 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 14:09:16 +00:00
aaron
96972c3a5c
a fix for a bug Eric found: if your first call contains fewer samples than calls at other loci, your VCFHeader got setup incorrectly.
...
Also moved a bunch of Lists over to Sets for consistency.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1859 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 04:57:50 +00:00
aaron
a69ea9b57c
Cleaning up the VCF code, adding lots of tests for a variety of edge cases. Two issues are still outstanding: updating the no call string with the standard 1000g decided on today, and fixing Eric's issue where not all the VCF sample names are present initially.
...
also: their, I hope your happy Eric, from now on I'll try not to flout my awesomest grammer in the future accept when I need to illicit a strong response :-)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1858 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 04:11:34 +00:00
ebanks
993c567bd8
I had to remove some of my more aggressive optimizations, as they were causing us to get slightly different results from MSG. This adds only a small cost to running time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1856 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 00:59:32 +00:00
asivache
7d7ff09f54
throw an exception if read has no associated read group
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1855 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 18:11:32 +00:00
depristo
0c2016c19a
Improved error messages -- now easier to read, points to the GATK Error Messages wiki, and avoids double printing of stack traces
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1850 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 12:07:44 +00:00
ebanks
a32470cea1
Deal with the fact that walkers can call UG's init/map functions directly.
...
We need to filter contexts in that case since the calling walkers don't get UG's traversal-level filters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1848 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 02:31:45 +00:00
ebanks
e740e7a7ce
Because walkers call UG's map function, we need to move the actual writing out
...
to UG's reduce function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1845 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 20:49:26 +00:00
ebanks
52d2e0ca07
All walkers now use read.getReadGroup()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1839 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 19:27:40 +00:00
aaron
eb90e5c4d7
changes to VCF output, and updated MD5's in the integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1836 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:42:48 +00:00
ebanks
89771fef05
-Use read.getReadGroup()
...
-Add another filter for read groups for Chris
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1835 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:08:32 +00:00
ebanks
311ab8da5a
A helper class to create the masks for the sequenom design maker.
...
This project is now officially done.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1834 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:28:51 +00:00
ebanks
0c95d6906f
Merge both versions of the Sequenom assay design maker: use Jared's base code and add in indels. [Jared, this still emits the same output for SNPs as your original version]
...
Remove all sequenom stuff from the FastaAlternateReferenceMaker so it can just concentrate on making alternate references...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1831 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:11:45 +00:00
ebanks
f2886d88e0
We now emit genotype calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1828 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 02:49:56 +00:00
ebanks
96b8499a31
Remodeled version of the UnifiedGenotyper.
...
We currently get identical lods and slods as MultiSampleCaller (except slods for ref calls, as I discussed with Jared) and are a bit faster in my few test cases. Single-sample mode still emulates SSG.
The remaining to do items:
1. more testing still needed
2. we currently only output lods/slods, but I need to emit actual calls
3. stubs are in place for Mark's proposed version of the EM calculation and now I need to add the actual code.
More check-ins coming soon...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1821 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 20:27:01 +00:00
aaron
77499e35ac
fixes for GSA-199: Need easier way to write binary outputs to standard output. GLF and VCF now have stream constructors, and can get dumped to standard out.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1818 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 15:50:20 +00:00
ebanks
caf689821f
added method to get normalized posteriors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1809 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 02:33:22 +00:00
ebanks
cf7a26759d
-use the getReadGroup() function that was added to picard for us
...
-clean up some include lines
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1808 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 01:39:32 +00:00
hanna
d844d1c496
SAMFileWriters specified as command-line arguments were sometimes incorrectly altering the default short name. Make sure the short name is left unset when fullName is specified but shortName is not.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1807 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 19:16:46 +00:00
hanna
da084357db
Fixed minor typo in output message.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1806 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 18:56:54 +00:00
ebanks
a9f3d46fa8
Your time has come, SSG.
...
Fare thee well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1799 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 20:27:56 +00:00
aaron
98e3a0bf1a
VCF can now be emitted from SSG. The basics are there (the genotype, read depth, our error estimate), but more fields need to be added for each record as necessary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1797 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 19:50:04 +00:00
kiran
29ad6cd876
Made redundant by BCMMarkDupes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1795 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 18:47:20 +00:00
ebanks
15bf014e0b
logger.info -> logger.debug (don't want to risk filling up my log on genome-wide calls)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1792 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 17:53:11 +00:00
ebanks
04fe50cadd
*** We no longer have a separate model for the single-sample case. ***
...
For now, a single sample input will be special-cased in the EM model - but that will change when the EM model degenerates to the single sample output with a single sample as input. For now, the EM code for multi-samples isn't finished; I'm planning on checking that in soon.
The SingleSampleIntegrationTest now uses the UnifiedCaller instead of SSG, and so should all of you. More on that in a separate email.
Other minor cleanups added too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1785 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 14:08:57 +00:00
kiran
829e99413b
Rescores a variant after removing duplicates (defined very strictly as reads with the same start points).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1782 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 03:07:36 +00:00
ebanks
1905b5defa
Hash by chromosome for now to reduce memory. This is a temporary solution until we decide how to retire the Injector for good.
...
Also, with Picard's latest changes, we need to make sure we don't double-close the sam writer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1779 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 20:06:25 +00:00
ebanks
203c626fc2
A wrapper around the GenotypeLikelihoods class for the UnifiedGenotyper. This wrapper incorporates both strand-based likelihoods and combined likelihoods over both strands.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1777 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 19:57:37 +00:00
depristo
8dd0924b37
Minor performance improvements to VariantEval -- now all of the CPU time is spent dealing with the ROD system...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1772 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 23:40:30 +00:00
aaron
4554ca1b28
more cleanup, deprecated the old genotype, corrected SNPCallsFromGenotypes' imports and two other classes that depend on it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1771 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 19:09:27 +00:00
aaron
3aec76136f
Removing the AllelicVariant interface, which is replaced by the Variation interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1770 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 17:44:24 +00:00
aaron
66fc8ea444
GSA-182: Adding support for BED interval files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1767 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 02:45:31 +00:00
hanna
aec83b401d
SSG multithreading doesn't play well with some I/O changes made since I last svn up'd. Reverting until I can find the reason.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1766 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 19:48:57 +00:00
hanna
8a503c86b6
Code supporting SSG proof-of-concept shared memory parallelism.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1765 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 18:56:16 +00:00
ebanks
fb619bd593
-Refactoring: make GenotypeCalculationModel constructors empty so that they don't have to be updated every time we add a new parameter; instead put that logic in the super class's initialize method (making everything protected so that only the factory can access them)
...
-Adding initial version of Multi-sample calculation model. This still needs much work: it needs to be cleaned up and finished. Right now, it (purposely) throws a RuntimeException after completing the EM loop.
Also:
-Fix logic in GenotypeLikelihoods.setPriors
-Add logger to the models for output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1764 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 18:10:36 +00:00
aaron
7fc4472e6d
A big fix for MergingSamRecordIterator, where we weren't handling the comparisons of SAMRecords correctly (we weren't applying the new reference index first, so sometimes the MT contig would be ID 23, sometimes 24 in different records).
...
Also a fix to the GLF tests, and a correction to PrintReadsWalker to remove the close() on the output source, the source handles that itself (and you get a double close).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1758 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 19:35:35 +00:00
ebanks
53a4bd7f51
A better understanding of what's going on means no need for clearing the cache
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1755 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 18:07:46 +00:00
aaron
e885cc4b21
changes for corrected GLF likelihood output, along with better tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1754 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-01 20:45:05 +00:00
aaron
2e4949c4d6
Rev'ing Picard, which includes the update to get all the reads in the query region (GSA-173). With it come a bunch of fixes, including retiring the FourBaseRecaller code, and updated md5 for some walker tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1751 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:37:59 +00:00
ebanks
303972aa4b
Yup, I broke the build...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1750 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:20:43 +00:00
ebanks
841d25cc44
Added ability to set the priors after construction (and requiring a flushing of the likelihoods cache)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1749 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 19:55:49 +00:00
hanna
70e1aef550
Better integrate the @ArgumentCollection into the command-line argument parser. Walkers can now specify their own @ArgumentCollections. Also cleaned up a bit of the CommandLineProgram template method pattern to minimize duplicate code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1746 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 22:23:19 +00:00
aaron
b1c321f161
Adjusted Genotype concordance to more accurately use the new Genotyping code, fixed the VCF rod, and temporarily fixed the build by reintroducing Sherman's ReadCigarFormatter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1745 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 21:28:21 +00:00
ebanks
9ef80e3c3c
One minor addition: to incorporate Pooled calling (and to be as general as possible), we allow the genotype calculation model to use rods if it wants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1741 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 17:05:59 +00:00
ebanks
19bfe43173
First pass at a unified caller, being checked in now so Mark can give feedback if he chooses and so Matt can debug issues with the ArgumentCollection class.
...
Some notes:
1. This design should be flexible enough to include pooled calling (for now) after discussions with Chris.
2. Using the unified caller with the SingleSampleCalculationModel emits the exact same output as SSG over all of chr20 for NA12878. Additionally, when we include the "max deletions allowed at a locus" argument (so we don't try to call SNPs at deletion sites), it removed 233 SNP calls in chr20 that were clearly indel artifacts.
3. The MultiSampleEMCalculationModel is still a work in progress and will be checked in later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1740 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 16:48:15 +00:00
andrewk
5dab95aa5a
Fix getMergedReadGroupsByReaders so that it provides read groups in the same way Picard does so that it works correctly when input read files have no clashes in their read groups and retain their original read group names.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1737 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 06:35:50 +00:00
asivache
bce2f0d7cf
Now instantiates the list of alternative consensuses to evaluate as LinkedHashSet to guarantee iterator traversal order. Old implementation used HashSet and exhibited unstable behavior when two alt consensuses turned out to be equally good: depending on the run conditions (including size of the interval set being cleaned??), either one could be seen first and selected as the 'best' one
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1734 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 06:15:46 +00:00
asivache
663175e868
Bug fix: when jumping onto next contig (chromosome), the walker was erasing last mismatch interval from the previous chr it was still holding without printing it; now it gets printed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1733 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 22:24:34 +00:00
asivache
aec61c558b
moving IndelGenotyper out from playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1731 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:43:53 +00:00
aaron
2b7d39035a
switched over the FastaAlternateReferenceWalker to the Variation system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1726 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 16:09:43 +00:00
aaron
7ffc1d97ef
Cut DeNovoSNPWalker over to the new Variation system, some renaming of methods on the Variation interface, and some corrections on the interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1724 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 04:35:52 +00:00
aaron
d2af26e81f
Pooled EM SNP Rod converted over to the Variation interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1719 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:33:11 +00:00
ebanks
97105ac001
We need to return a null RODRecordList when the default value is null (as opposed to a list with a single null value), because that's what everyone is expecting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1718 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:23:12 +00:00
ebanks
d4b40bc06f
Filter for reads with missing read groups so we can safely assume all reads have valid read groups
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1717 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:10:26 +00:00
ebanks
90de2e0cde
Added ability to specify whether you want to use a point estimate or fair coin test calculation; for now you can use either but fair coin test is still experimental as it needs to be parametrized correctly. This job will hopefully be done by the future Bioinformatic Analyst...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1716 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 15:29:50 +00:00
aaron
d262cbd41c
changes to add VCF to the rod system, fix VCF output in VariantsToVCF, and some other minor changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1715 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 15:16:11 +00:00
ebanks
423a3ee894
Added a sequenom rod to empower Carrie to convert 1KG validation SNPs to sequenom format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1706 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:22:09 +00:00
hanna
856bbd0320
Let Picard specify the default compression level.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1701 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:01:48 +00:00
aaron
f783cb30e0
adding an interface so that the current @Requires with ROD annotations work in walkers like VariantEval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1700 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:24:05 +00:00
hanna
ebfbe56b43
Make sure compression level always gets pushed into SAMFileWriterFactory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1699 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:20:26 +00:00
asivache
bf7cd66d53
New, simpler rodRefSeq. Fully relies on the ROD system standard mechanisms. Multiple transcripts over a given location will be now returned by the ROD system itself as RodRecordList<rodRefSeq>; and yes, rodRefSeq does represent a single transcript record now and implements Transcript interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1697 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:18:25 +00:00
asivache
8fa4c93f5a
Transcript is now simply an interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1696 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:13:31 +00:00
asivache
1bd4c0077c
Now that ROD system supports overlapping RODs, we do not need rodRefSeq to be too smart and read in all the overlapping records (transcripts) on its own; leave it to the generic ROD mechanism.
...
PARTIAL commit; new, simpler rodRefSeq will reappear in a seq.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1694 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:11:16 +00:00
aaron
11c32b588f
fixing VariantEvalWalkerIntegrationTest md5 sums, a couple comment changes, and a little bit of cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1690 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:54:47 +00:00
ebanks
0748d80baa
Added a convenience method in rodDbSNP to deal with Andrey's changes to the rod. Now you can just ask for the first real SNP rod from the list and not have to think about how it works.
...
CountCovariates uses it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1688 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:15:40 +00:00
ebanks
682b765536
bug: need to upper case chars so that == works throughout
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1684 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 18:20:43 +00:00
asivache
57d31b8e9b
Filter that discards reads from specific lanes; and also its friend that helps blacklist a set of lanes from the GATK command line in a one-liner.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1681 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 16:46:06 +00:00
ebanks
5ce42cbab3
After thinking about this a bit more, it makes sense to pull this functionality out of my walker and into the GenomeLocParser where everyone else can benefit from it...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1677 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 01:32:35 +00:00
ebanks
b1dc6d65e4
interval merging is now blazingly fast
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1674 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 21:15:04 +00:00
asivache
15135788ca
OK, let's bite the bullet. Now rodDbSNP objects are 'isSNP()' only when they are annotated as 'exact', not a 'range'.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1673 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 19:25:16 +00:00
asivache
8ad181f46f
Note to myself: do 'ant clean' now and then or old versions of the code that suddenly became invalid will stick around. The world is not perfect, and neither is automatic dependency resolution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1672 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:40:52 +00:00
asivache
d2d1354199
Now uses BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1670 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:03:49 +00:00
asivache
29adc0ca1c
Little class that can be used to simulate the results returned by the old ROD system. This is needed to keep a couple of tests from breaking. All the code that uses this class must be changed urgently to accommodate the data as returned by the new ROD system, and the corresponding tests (MD5 sums) have to be modified as well since some data as seen through the new ROD system is indeed different.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1668 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:58:56 +00:00
asivache
a6bd509593
Changing the carpet under your feet!! New incremental update to the ROD system has arrived.
...
all the updated classes now make use of new SeekableRodIterator instead of RODIterator. RODIterator class deleted. This batch makes only trivial updates to tests dictated by the change in the ROD system interface. Few less trivial updates to follow. This is a partial commit; a few walkers also still need to be updated, hold on...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1667 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:55:22 +00:00
asivache
4c67a49ccb
Removed unused imports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1666 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:45:22 +00:00
hanna
e7f44ada98
Make unpackList public static so that Doug can use it in the scatter/gather framework.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1665 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 15:32:49 +00:00