1709 Commits (f5818c0cda2e218eaf1c1bd815dc719e00194014)

Author SHA1 Message Date
hanna 21c5f543fa Fix sharding bug -- loci to which >100,000 (= 1 shard) reads are assigned an
alignment start will confuse the sharding system and cause it to return duplicate reads.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1987 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 14:27:26 +00:00
rpoplin 84ba604611 Sequential quality score calculation is now in place in the refactored recalibrator and matches the quality scores calculated by the old recalibrator exactly, at least on the small sets of data used so far. Validation, documentation, and optimization work is ongoing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1985 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 15:55:16 +00:00
depristo bf1bc94060 Fixes for PooledConcordance bugs and lack of safety checking
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1984 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 01:54:10 +00:00
rpoplin 66d4a995e6 Initial check in of refactored Recalibrator. The new walkers are called CountCovariatesRefactored and TableRecalibrationRefactored. More work is needed to finish up the sequential calculation and to document the code sufficiently. These files are not ready to be used by other people quite yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1982 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 22:33:55 +00:00
ebanks 6fdfc97db6 Added optional field DP to VCF output for Mark.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1981 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 20:03:22 +00:00
ebanks 0a55fa5bb1 Completely refactored the Genotype Concordance module(s).
Now PooledConcordance and GenotypeConcordance inherit from the same super class (and can therefore share data structures and functionality).  Also, they now use ConcordanceTruthTable to keep track of necessary info.
GenotypeConcordance passes integration tests.
PooledConcordance needs to be finished by Chris.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1979 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 16:27:16 +00:00
ebanks d549347f25 Refactored GenotypeLikelihoods to use an underlying 4-base model.
It needs to be modified a bit and then hooked up to a pooled model, but that is now possible.
At this point, there is no difference to the Unified Genotyper.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1978 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 21:59:25 +00:00
jmaguire 4d3871c655 don't flush anymore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1977 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 19:11:51 +00:00
aaron aacd72854f a fix for a bug Andrey discovered: in read-based interval traversals we're duplicating reads in rare cases. The problem was that to accommodate a bug in SAM JDK indexing, we were forced to add one to the stop of our QueryOverlapping() calls to ensure we always got all of the overlapping reads.
Added a PlusOneFixIterator that wraps other iterators, and eliminates reads that start outside of our intended interval (interval stop - 1).  Updated and checked BamToFastqIntegrationTest MD5 sums.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1976 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 05:26:33 +00:00
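The wrapping-iterator idea in the commit above can be sketched roughly as follows. This is only an illustration of the technique, not the actual PlusOneFixIterator: the `SimpleRead` and `PlusOneFilterSketch` names are hypothetical, and it assumes the query stop was padded by exactly one, so the intended stop is the padded stop minus one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical minimal read: only the alignment start matters for this sketch. */
final class SimpleRead {
    final String name;
    final int alignmentStart;

    SimpleRead(String name, int alignmentStart) {
        this.name = name;
        this.alignmentStart = alignmentStart;
    }
}

/** Wraps another iterator and drops reads that start past the intended interval stop. */
final class PlusOneFilterSketch implements Iterator<SimpleRead> {
    private final Iterator<SimpleRead> source;
    private final int intendedStop; // assumption: padded stop minus one
    private SimpleRead next;

    PlusOneFilterSketch(Iterator<SimpleRead> source, int paddedStop) {
        this.source = source;
        this.intendedStop = paddedStop - 1;
        advance();
    }

    /** Looks ahead to the next read that starts inside the intended interval. */
    private void advance() {
        next = null;
        while (source.hasNext()) {
            SimpleRead candidate = source.next();
            if (candidate.alignmentStart <= intendedStop) {
                next = candidate;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public SimpleRead next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        SimpleRead result = next;
        advance();
        return result;
    }

    /** Convenience helper: names of the reads that survive the filter. */
    static List<String> namesWithin(List<SimpleRead> reads, int paddedStop) {
        List<String> names = new ArrayList<>();
        Iterator<SimpleRead> it = new PlusOneFilterSketch(reads.iterator(), paddedStop);
        while (it.hasNext()) {
            names.add(it.next().name);
        }
        return names;
    }
}
```

With a padded stop of 101, a read starting at 101 is returned by the padded query but filtered out by the wrapper, while reads starting at or before 100 pass through.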
hanna 43c3ee61d5 Fix minor mapping quality bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1973 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 14:33:23 +00:00
ebanks a545859c62 Joint Estimation model now emits a reasonable slod
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1969 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 21:12:42 +00:00
ebanks 11d950abe0 No longer allow the lod_threshold argument - use confidence instead.
Have UG output qscores in all cases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1968 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:18:51 +00:00
asivache 2fb45dbd73 Make window size a command line argument
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1967 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:13:35 +00:00
asivache 55f61b1f88 Bug fix in adjustment of the shift position.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1966 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:08:11 +00:00
depristo 5d5dc989e7 improvements to VCF and variant eval support of VCF -- now listens to the filter field
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1963 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 12:09:30 +00:00
hanna c63af32fc7 The BWA/C bindings were triggering the local aligner to repeatedly reload the
ref genome.  Make sure the reference genome is cached.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1961 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 00:01:55 +00:00
ebanks 3a33401822 2nd stage of the genotyper output refactoring is complete.
Now, all output is generalized and all of the intelligence lies where it is supposed to.
Next stage is syncing up old and new models and making sure we're outputting exactly what we should.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1960 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 22:43:08 +00:00
aaron ba67c7f02b added a warning for those using bed files; we properly convert bed to the internal representation but the user needs to be aware that any output will be one-based closed intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1959 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 21:09:18 +00:00
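For readers unfamiliar with the coordinate difference the warning above refers to: BED intervals are zero-based and half-open, so converting to a one-based closed interval shifts the start by one and leaves the end unchanged. A minimal sketch follows; the class and method names are hypothetical and are not taken from the GATK code, and zero-length BED features are rejected here for simplicity.

```java
/** Hypothetical helper illustrating BED to one-based closed conversion. */
final class BedIntervalSketch {
    /**
     * Converts a BED interval (0-based start, exclusive end) into
     * {start, stop} in 1-based closed coordinates.
     */
    static long[] bedToOneBasedClosed(long bedStart, long bedEnd) {
        if (bedStart < 0 || bedEnd <= bedStart) {
            throw new IllegalArgumentException("expected 0 <= bedStart < bedEnd");
        }
        return new long[] { bedStart + 1, bedEnd };
    }

    /** Inverse conversion, for writing one-based closed intervals back out as BED. */
    static long[] oneBasedClosedToBed(long start, long stop) {
        if (start < 1 || stop < start) {
            throw new IllegalArgumentException("expected 1 <= start <= stop");
        }
        return new long[] { start - 1, stop };
    }
}
```

The BED line `chr1 0 100` therefore describes the same 100 bases as the one-based closed interval 1 to 100.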
aaron b71b66bd88 the underlying parameter is a float so we need to use Float.valueOf() instead; Noticed by external user Hou Huabin
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1958 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 20:22:25 +00:00
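The commit above does not say which call `Float.valueOf()` replaced. As an illustration only, the sketch below uses `Integer.valueOf()` as an example of a mismatched parser: it rejects a decimal string that `Float.valueOf()` accepts. The class and method names are hypothetical.

```java
/** Hypothetical sketch: parsing a textual argument whose underlying parameter is a float. */
final class FloatArgumentSketch {
    /** Parses the raw argument text with the parser that matches the field type. */
    static float parseFloatArgument(String raw) {
        return Float.valueOf(raw.trim());
    }

    /** Shows that an integer parser (assumed here for illustration) rejects decimal input. */
    static boolean integerParseFails(String raw) {
        try {
            Integer.valueOf(raw.trim());
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }
}
```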
hanna 5a510e6d98 New PackageUtils interferes with the packaging utility. Revert until Aaron and
I can get together to make this work.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1957 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 19:14:14 +00:00
aaron de6ae51f7e Scala walkers can now be built and run like any other walker in the GATK. Added getUrlsForClasspath to PackageUtils; the Reflections package wasn't getting the manifest files from jars in the classpath, so we weren't seeing any walkers outside of the GenomeAnalysisTK.jar.
A couple of notes:
-Commented out BaseTransitionTableCalculator.scala because it won't build; Chris, could you fix this one (or kill it if it's not needed)?
-Removed the PrintReadsScala walker; moved the code over to a ScalaCountLoci walker (which is what the code was really doing).
-Added configuration items to the ivy xml file.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1956 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 06:02:41 +00:00
hanna 1896f334d9 Fixed collection of bugs in reads aligning to multiple locations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1955 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 04:02:09 +00:00
ebanks af6d0003f8 -Generalized the GenotypeConcordance module to deal with any number of individuals (although it will default to its old behavior if the -samples argument is left out).
-Make rods return the appropriate type of Genotype calls from getGenotype().



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1954 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-01 05:35:47 +00:00
hanna b95165e39c Make alignment (temporarily) part of main GenomeAnalysisTK.jar. Add some extra logging errors on failure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1953 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-01 00:33:18 +00:00
asivache 4b0796ba58 After fixing a few glitches and bugs, this version finally works as intended
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1952 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-31 04:59:58 +00:00
depristo 7d0ac7c6f2 Fix for long-term VariantEval bug plus new integration test to catch it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1951 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-31 00:00:33 +00:00
asivache ea8d5c7077 Some internal refactoring. Now "safely" ignores duplicate records (NOT duplicate reads but rather malformed bam files!) resulting from the bug/feature in CleanedReadInjector.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1949 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 17:50:51 +00:00
hanna a3da475c88 Documentation and cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1946 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 15:40:28 +00:00
hanna 2d15891719 Created walkers for alignment, validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1945 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 15:04:07 +00:00
ebanks 51fffc7f69 Comments for Ryan (which also apply to ReadQualityScoreWalker).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1944 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 14:44:04 +00:00
ebanks ccd7440730 We can actually make this a bit simpler (and faster)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1943 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 04:21:03 +00:00
ebanks 1b6333e4ab Enough people have asked for this that it just needed to get written.
One can now split up any number of sets into an N-way Venn (although it doesn't check for discordance in the calls, so you'll still want to use SimpleVenn for 2-way comparisons).
Wiki docs are updated.

To do: update to use Ryan's generic hash map when it's ready for public use.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1942 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 04:08:45 +00:00
ebanks 4bdb5b03bd tell UnifiedGenotyper to return calls at all bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1941 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 03:10:44 +00:00
ebanks 4ee1d6f733 -Have the calculation models determine whether a call passes the lod/confidence thresholds (as opposed to returning everything and letting the UG decide); this way, walkers which call map() will get only the good calls.
-Do the right thing in all models for all-base-mode (for Kiran).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1940 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 02:35:51 +00:00
ebanks 64ac956885 Okay, I caved in:
CallsetConcordance now gets possible concordance types by looking at classes that implement ConcordanceType instead of having them hard-coded in.
Thanks to Kiran this was pretty easy...


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1939 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 00:32:26 +00:00
hanna 1f0d852a48 Fix bug where alignments with indels would be busted because bwa reverses
the read bases to undo a previous read base reverse that doesn't occur in the
libbwa codepath.
Also fixed some memory management issues.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1938 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 21:33:13 +00:00
asivache e3b4d4cbed Genotyper reimplemented. Does the same thing, at least for now, but internal data structures redesign enables collecting various statistics for indel-containing/reference-matching reads. The statistics are not yet used by the caller itself to make a better judgement w.r.t. the validity of the calls it makes, but they are now printed into the output stream (--verbose). The statistics (for both normal and tumor) include: indel observation count/total coverage, av. number of mismatches per indel-containing and per ref-matching read, av. mapping quality, av. mismatch rate and av. base quality within an NQS window around the indel, numbers of indel and ref observations per strand.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1936 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 19:09:16 +00:00
hanna f04b80d7db Fixed epic memory leak.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1934 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 16:32:43 +00:00
ebanks 2b96b2e4e7 better multi-sample integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1933 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 13:51:51 +00:00
ebanks 1c4ca9d383 -Mark just reminded me: actually force the ref/loc to be immutable
-VCF writer should be blind to the score/confidence/lod value - just print the thing out as is


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1932 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 13:41:53 +00:00
ebanks 5cdbdd9e5b now that the design is stable, pull the setReference and setLocation methods back out of Genotype and stick them into constructors of implementing classes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1931 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 13:27:37 +00:00
ebanks 3091443dc7 Sweeping changes to the genotype output system, as per several discussions with Matt & Aaron.
Some things still need to be changed, but it will entail some more design decisions first (which means I get to bug M&A again tomorrow!).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1930 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 03:46:41 +00:00
depristo 86573177d1 Reverting rod walkers to use underlying refwalker implementation while we work on ROD2 and reenable the system. Added some serious sparse file parsing to variant eval tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1929 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 01:04:37 +00:00
hanna c9a3707cfd Initial version of BWA/C bindings. Still lots of squirrels roaming the code.
- Some cigar strings aren't right.
- Memory leaks.
- BWA codebase changes aren't committed to BWA tree.
- Aligner interface butchered to support BWA/C-style alignments.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1928 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 21:37:49 +00:00
chartl c4359bc340 Whoops. Forgot the implements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1927 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 19:59:57 +00:00
aaron 5a3bd50537 adding error log reporting to the GATK, and a stream based output method for the argument collection
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1926 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 19:56:05 +00:00
chartl 863d3023d5 IndelCounterWalker -- a new little walker that counts indels over a region (want to see what kind of havoc BWA may be causing). Don't know when BasicPileup.indelPileup() was written, but kudos to whoever wrote it.
BTTJ - remove 'N's from previous base analysis -- even if both read and ref are 'N' (which does happen, occasionally)




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1925 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 19:50:50 +00:00
aaron 04e9a494e9 removed the GenotypesBacked interface, which is currently unused. Also cleaned up some documentation lines
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1924 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 18:08:14 +00:00
rpoplin 06ff81efe5 Added NeighborhoodQualityWalker.java and ReadQualityScoreWalker.java which are used to calculate a read quality score based on attributes of the read and the reads in the neighborhood.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1922 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 13:24:11 +00:00
depristo 68fa6da788 Initial graph-based reference implementation and alignment assessor. Not suitable for public use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1921 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:54:47 +00:00
depristo 31d143a841 now only needs READS
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1920 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:54:14 +00:00
depristo ef2ea79994 code cleanup and containsStartPosition function
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1919 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:53:40 +00:00
depristo 186a8dd698 Trivial protection for null value
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1918 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:52:52 +00:00
depristo be333da9c0 charSeq2byteSeq -- convert a char[] to a byte[] for convenience
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1917 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:52:23 +00:00
chartl 4192b093b8 More robust error handling with parallelization + usePreviousBase. Added forceReadBasesToMatchRef to use in conjunction with nPreviousReadBases as a less stringent approximation of usePreviousBases (requiring that previous pileups only had mismatches and that read mapping quality be high was throwing everything away)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1916 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 17:20:44 +00:00
chartl 31d5df2859 Previous base now checks that the read matches the reference in the previous base window.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1915 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 15:58:20 +00:00
depristo 726378be8b Almost ready to stop doing eager decoding; waiting on Eric
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1914 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 15:28:05 +00:00
ebanks e96b1791ab Need to check for biallelic snp or exception gets thrown.
Also, update to new tracker calls.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1913 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 02:43:43 +00:00
aaron 3fb3773098 a fix for traverse duplicates bug: GSA-202. Also removed some debugging output from FastaAltRef walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1912 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 20:18:55 +00:00
hanna a1e8a532ad Support for initialize() and onTraversalDone() output from parallelized walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1911 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 20:18:31 +00:00
chartl 62c1001790 BTTJ is now correct. What a terrible waste of time, turns out I'd just reversed the header. Because of this the MD5 had to be updated in the tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1910 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 19:24:18 +00:00
sjia 24c7f694e6 Handles allele frequencies for any specified population, changed user input for mismatch filter options
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1909 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 22:51:56 +00:00
chartl db9419df49 @ Hack to allow output from onTraversalDone()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1908 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 15:19:04 +00:00
ebanks 75ad6bbef7 Check that map isn't being called passing in null arguments.
(This seems wrong; see JIRA entry GSA-211)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1907 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 02:30:36 +00:00
depristo b4f55df600 Bugfix for Jason F
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1906 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-24 22:09:27 +00:00
hanna 65b98470f3 Temporary fix: have RodLocusView manage and close its RODs. Really the
relationship between these two classes needs to be rethought; see JIRA
GSA-207.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1904 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 16:00:12 +00:00
aaron ad1fc511b1 intermediate commit for some changes in the Variation system, so Eric can go ahead with his changes. Everything is pretty set, but the Variation interface could use a convenience method that joins all the alternate alleles.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1903 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 06:31:15 +00:00
ebanks 6c338eccb8 Joint Estimation model now emits calls in all formats.
The whole GenotypeCall framework needs to be changed, but this will work for the time being.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1902 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 03:07:28 +00:00
chartl a6dc8cd44e BTTC is now Tree Reducible allowing for parallelization.
Integration test comment changed to reflect actual date of last md5 update.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1901 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 23:19:29 +00:00
hanna 2e552eb5a1 Validates intervals against sequence dictionary header bounds.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1900 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 19:31:15 +00:00
ebanks 54c61c663c -Cleanup of the Joint Estimation code
-Don't print verbose/debugging output to logger, but instead specify a file in the argument collection (and then we only need to print conditionally)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1899 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 15:25:29 +00:00
asivache 2cab4c68d4 Added method: isCodingExon(). Returns true if position is simultaneously within an exon AND within coding interval of any single transcript from the list. The old method of detecting coding positions as isExon() && isCoding() is buggy, as the position could be in the UTR part of one transcript (isExon() is true), and within coding region bounds (but not in the exon) of another transcript (isCoding() is true). As a result UTR positions would be erroneously annotated as coding.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1898 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 14:55:07 +00:00
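The difference between the old and new checks described in the commit above can be illustrated with a small sketch. The `TranscriptSketch` and `CodingExonSketch` types below are hypothetical stand-ins rather than the real annotation classes; coordinates are assumed to be inclusive.

```java
import java.util.List;

/** Hypothetical transcript: inclusive exon intervals plus inclusive coding bounds. */
final class TranscriptSketch {
    final int codingStart;
    final int codingStop;
    final int[][] exons;

    TranscriptSketch(int codingStart, int codingStop, int[]... exons) {
        this.codingStart = codingStart;
        this.codingStop = codingStop;
        this.exons = exons;
    }

    boolean inExon(int pos) {
        for (int[] exon : exons) {
            if (pos >= exon[0] && pos <= exon[1]) {
                return true;
            }
        }
        return false;
    }

    boolean inCodingBounds(int pos) {
        return pos >= codingStart && pos <= codingStop;
    }
}

final class CodingExonSketch {
    /** True if the position is in an exon of ANY transcript. */
    static boolean isExon(List<TranscriptSketch> transcripts, int pos) {
        for (TranscriptSketch t : transcripts) {
            if (t.inExon(pos)) return true;
        }
        return false;
    }

    /** True if the position is within the coding bounds of ANY transcript. */
    static boolean isCoding(List<TranscriptSketch> transcripts, int pos) {
        for (TranscriptSketch t : transcripts) {
            if (t.inCodingBounds(pos)) return true;
        }
        return false;
    }

    /** True only if a SINGLE transcript has the position in an exon and in its coding bounds. */
    static boolean isCodingExon(List<TranscriptSketch> transcripts, int pos) {
        for (TranscriptSketch t : transcripts) {
            if (t.inExon(pos) && t.inCodingBounds(pos)) return true;
        }
        return false;
    }
}
```

Suppose transcript A has one exon at 100-200 with coding bounds 150-180, and transcript B has exons at 50-80 and 300-400 with coding bounds 60-350. Position 120 is UTR in A and intronic in B, yet `isExon() && isCoding()` is true because each half is satisfied by a different transcript; `isCodingExon()` is false.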
chartl af761fb9bd Base transition table now forces epsilon/3 (three-state) model for the unified genotyper. Verified to be identical with changing the default model to being epsilon/3. This of course changes the observed counts, so the integration test has been updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1897 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 21:18:26 +00:00
ebanks 55fa1cfa06 -Renamed new calculation model and worked out some significant changes with Mark
-Allow walkers calling the UG to pass in their own argument collections


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1896 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 20:49:36 +00:00
chartl 8e3f72ced9 BTTJ - Code refactoring (major) - passes integration test
VariantEvalWalker - whoops, wrote PooledGenotypeAnalysis rather than PooledAnalysis, now passes tests again

- PooledFrequencyAnalysis - don't bother initializing matrices if this isn't a pool




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1895 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 19:04:51 +00:00
depristo 15a1849758 notes for chartl
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1894 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 18:31:31 +00:00
chartl 77863d4940 @PowerBelowFrequency
+ Changes to doc

@ BasicPoolVariantAnalysis
    + use char rather than ReferenceContext
    + calculate # alleles

@ PooledFrequencyAnalysis
    + breakdown of call metrics by estimated number of alleles in pool

@ VariantEvalWalker
    + add PooledFrequencyAnalysis to analysis set

@ PooledGenotypeConcordance
    + correctly calculate maximal allele frequency for output




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1893 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 15:17:11 +00:00
chartl 967128035e Make command line args default to false.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1892 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 13:59:35 +00:00
ebanks 9b9744109c Mark's new unified calculation model is now officially implemented.
Because it doesn't actually use EM, it's no longer a subclass of the EM model.

Note that you can't use it just yet because it doesn't actually emit calls (just prints to logger).  I need to deal with general UG output tomorrow.  Hold off until then, Mark, and then you can go wild.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1891 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 02:39:23 +00:00
depristo caa3187af8 Enabling correct high-performance ROD walker and moved VariantEval over to it. Performance improvements in variantEval in general. See wiki for full description
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1890 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 23:31:13 +00:00
chartl 4a8a6468be Use read group as a condition for confusion tables. With an integration test.
Changed BaseTransitionTable to comparable objects for consistent ordering of output
( e.g. so the integration test doesn't yell so much )




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1889 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 19:39:32 +00:00
chartl b83df5616a Change for lower-case references (always compare upper case bases)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1888 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 17:36:31 +00:00
chartl 3b1fabeff0 Major code refactoring:
@ Pooled utils & power
   - Removed two of the power walkers leaving only PowerBelowFrequency, added some additional
     flags on PowerBelowFrequency to give it some of the behavior that PowerAndCoverage had
   - Removed a number of PoolUtils variables and methods that were used in those walkers or simply
     not used
   - Removed AnalyzePowerWalker (un-necessary)
   - Changed the location of Quad/Squad/ReadOffsetQuad into poolseq

@NQS
   - Deleted all walkers but the minimum NQS walker, refactored not to use LocalMapType

@ BaseTransitionTable
   - Added a slew of new integration tests for different flaggable and integral parameters
   - (Scala) just a System.out that was added and commented out (no actual code change)
   - (Java) changed a < to <= and a boolean formula


Chris



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1887 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 14:58:04 +00:00
aaron 4be6bb8e92 added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums. For some reason my check-ins from home wouldn't work last night, so these are the actual changes for 1884.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1886 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 14:15:33 +00:00
depristo 449a6ba75a Deleting lots of code as part of my cleanup. More classes tagged for removal. Many more walkers have their days numbered.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1885 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 12:23:36 +00:00
aaron d749a5eb5f added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1884 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 04:56:51 +00:00
ebanks b8ab77c91c Don't filter out reads without proper read groups. Instead, allow the user (or another walker calling UG) to specify an assumed sample to use (but then we assume single-sample mode).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1883 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 01:30:53 +00:00
depristo a8a2c1a2a1 Replaced SSG with UG in packaging utils. Minor performance and formatting improvements for ClipReads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1882 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 01:19:58 +00:00
ebanks c29924e7cf Reverting previous change.
Aaron, it's all yours...


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1881 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:55:24 +00:00
aaron d21b582b18 memory leak, where the Resource Pool was releasing based on the value and not the key, resulting in the resourceAssignments map growing with each additional shard
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1880 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:39:42 +00:00
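The class of bug described in the commit above is easy to reproduce with a plain `java.util.Map`: `Map.remove(Object)` looks up keys, so passing a value silently removes nothing and the map keeps growing. The sketch below is an assumption-laden illustration, not the actual Resource Pool code; in particular, which object served as key and which as value is not stated in the commit, so shard IDs are used as keys and resource names as values purely for demonstration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool tracking which resource is assigned to which shard. */
final class ResourcePoolSketch {
    // Assumed layout: shard id -> resource name.
    private final Map<Integer, String> resourceAssignments = new HashMap<>();

    void assign(int shardId, String resource) {
        resourceAssignments.put(shardId, resource);
    }

    /** Buggy release: remove() is given the value, which never matches a key. */
    void releaseByValue(String resource) {
        resourceAssignments.remove(resource);
    }

    /** Correct release: remove the entry by its key. */
    void releaseByKey(int shardId) {
        resourceAssignments.remove(shardId);
    }

    int outstandingAssignments() {
        return resourceAssignments.size();
    }
}
```

After assigning and "releasing" by value for each shard, `outstandingAssignments()` keeps climbing; releasing by key brings it back to zero.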
ebanks 761a730758 assertBiAllelic -> assertMultiAllelic.
Chris, if this breaks an integration test, you get it.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1879 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:09:46 +00:00
depristo 2a26bb42dd Softclipping support in clip reads walker. Minor improvement to WalkerTest -- now can specify file extensions for tmp files. Matt -- I couldn't easily create a non-presorted SAM file. The softclipper has an impact on this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1878 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 21:54:53 +00:00
chartl 055a99fb05 Change in ordering for a disjunction. Walker will no longer try to calculate number of simple mismatches in the pileup if the pileup includes 'N's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1877 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 18:24:14 +00:00
chartl 10bde9e77b Integration test for BTT calculator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1876 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 18:21:55 +00:00
aaron cfa86d52c2 ensure that in the indel case we don't allow identification as both an insertion and deletion at the same location in the VCF ROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1875 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 18:21:00 +00:00
chartl 3d50c72d74 Forgot a dumb little System.out.println. You will be flooded with "This read will not be used." statements until, overwhelmed, you give in to my demands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1874 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 16:13:48 +00:00
chartl 225ef52973 Now produces same output as the Scala walker for unconditioned tables (no 2bb, no previous base, etc.)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1873 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 16:10:44 +00:00
ebanks bb180a23ef Updated MD5
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1871 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-18 05:30:38 +00:00
ebanks 51f9ec0a5c subtract largest posterior value from all values; this hopefully solves any precision issues
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1870 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-18 05:20:15 +00:00
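Subtracting the largest value before leaving log space is a standard way to avoid underflow when normalizing posteriors. The sketch below assumes the posteriors are stored as log10 values, which the commit above does not state; the class and method names are hypothetical.

```java
/** Hypothetical sketch of normalizing log10-scale posteriors without underflow. */
final class PosteriorNormalizationSketch {
    /**
     * Returns probabilities that sum to one. The largest log10 value is
     * subtracted from every entry first, so the largest term becomes 10^0 = 1
     * and the sum can never underflow to zero.
     */
    static double[] normalizeFromLog10(double[] log10Values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) {
            max = Math.max(max, v);
        }

        double[] normalized = new double[log10Values.length];
        double sum = 0.0;
        for (int i = 0; i < log10Values.length; i++) {
            normalized[i] = Math.pow(10.0, log10Values[i] - max);
            sum += normalized[i];
        }
        for (int i = 0; i < normalized.length; i++) {
            normalized[i] /= sum;
        }
        return normalized;
    }
}
```

For inputs such as -1000 and -1001, exponentiating directly gives zero for both values in double precision, and dividing by their sum yields NaN. After the shift the terms are 1 and 0.1, which normalize to roughly 0.909 and 0.091.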
ebanks b9e8867287 -push allele frequency and genotype likelihood variable definitions down into the subclasses so that they can use different data structures
-use slightly more stringent stability metric
-better integration test



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1869 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-18 04:22:17 +00:00