Commit Graph

1607 Commits (186a8dd698c6f656de2a290724f1173ba749ff4d)

Author SHA1 Message Date
depristo 186a8dd698 Trivial protection for null value
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1918 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:52:52 +00:00
depristo be333da9c0 charSeq2byteSeq -- convert a char[] to a byte[] for convenience
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1917 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:52:23 +00:00
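A char[] to byte[] convenience helper like charSeq2byteSeq could be written as below. This is a minimal sketch: the log names the method but not its class or exact signature, so the holder class CharSeqSketch is hypothetical.

```java
/** Hypothetical holder class; the log only names the method, not where it lives. */
final class CharSeqSketch {
    private CharSeqSketch() {}

    /**
     * Copies a char[] into a new byte[] by narrowing each char.
     * Intended for ASCII data such as base letters; higher code points
     * are truncated to their low 8 bits by the cast.
     */
    static byte[] charSeq2byteSeq(char[] chars) {
        byte[] bytes = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            bytes[i] = (byte) chars[i];
        }
        return bytes;
    }
}
```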
chartl 4192b093b8 More robust error handling with parallelization + usePreviousBase. Added forceReadBasesToMatchRef to use in conjunction with nPreviousReadBases as a less stringent approximation of usePreviousBases (requiring that previous pileups had only mismatches and that read mapping quality be high was throwing everything away)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1916 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 17:20:44 +00:00
chartl 31d5df2859 Previous base now checks that the read matches the reference in the previous base window.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1915 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 15:58:20 +00:00
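A check that a read matches the reference across a previous-base window might look roughly like this. It is a hypothetical sketch, not the walker's code: it assumes the read and reference bases are already aligned position-for-position and compares them case-insensitively.

```java
/** Hypothetical sketch of a previous-base window check. */
final class PreviousBaseWindowSketch {
    private PreviousBaseWindowSketch() {}

    /**
     * Returns true if the nPrevious read bases immediately before offset all
     * match the reference at the same positions (ignoring case). Returns false
     * if the window would run off the start or end of either array.
     */
    static boolean previousBasesMatchRef(byte[] readBases, byte[] refBases, int offset, int nPrevious) {
        if (offset - nPrevious < 0 || offset > Math.min(readBases.length, refBases.length)) {
            return false;
        }
        for (int i = offset - nPrevious; i < offset; i++) {
            if (Character.toUpperCase((char) readBases[i]) != Character.toUpperCase((char) refBases[i])) {
                return false;
            }
        }
        return true;
    }
}
```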
depristo 726378be8b Almost ready to stop doing eager decoding; waiting on Eric
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1914 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 15:28:05 +00:00
ebanks e96b1791ab Need to check for biallelic snp or exception gets thrown.
Also, update to new tracker calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1913 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 02:43:43 +00:00
aaron 3fb3773098 a fix for the traverse duplicates bug: GSA-202. Also removed some debugging output from the FastaAltRef walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1912 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 20:18:55 +00:00
hanna a1e8a532ad Support for initialize() and onTraversalDone() output from parallelized walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1911 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 20:18:31 +00:00
chartl 62c1001790 BTTJ is now correct. What a terrible waste of time, turns out I'd just reversed the header. Because of this the MD5 had to be updated in the tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1910 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 19:24:18 +00:00
sjia 24c7f694e6 Handles allele frequencies for any specified population, changed user input for mismatch filter options
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1909 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 22:51:56 +00:00
chartl db9419df49 @ Hack to allow output from onTraversalDone()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1908 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 15:19:04 +00:00
ebanks 75ad6bbef7 Check that map isn't being called passing in null arguments.
(This seems wrong; see JIRA entry GSA-211)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1907 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 02:30:36 +00:00
depristo b4f55df600 Bugfix for Jason F
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1906 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-24 22:09:27 +00:00
hanna 65b98470f3 Temporary fix: have RodLocusView manage and close its RODs. Really the
relationship between these two classes needs to be rethought; see JIRA
GSA-207.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1904 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 16:00:12 +00:00
aaron ad1fc511b1 intermediate commit for some changes in the Variation system, so Eric can go ahead with his changes. Everything is pretty set, but the Variation interface could use a convenience method that joins all the alternate alleles.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1903 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 06:31:15 +00:00
ebanks 6c338eccb8 Joint Estimation model now emits calls in all formats.
The whole GenotypeCall framework needs to be changed, but this will work for the time being.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1902 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 03:07:28 +00:00
chartl a6dc8cd44e BTTC is now Tree Reducible allowing for parallelization.
Integration test comment changed to reflect actual date of last md5 update.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1901 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 23:19:29 +00:00
hanna 2e552eb5a1 Validates intervals against sequence dictionary header bounds.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1900 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 19:31:15 +00:00
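Validating intervals against sequence dictionary bounds amounts to a contig lookup plus a range check. The sketch below is hypothetical: it models the dictionary as a map from contig name to length and assumes 1-based, closed intervals.

```java
import java.util.Map;

/** Hypothetical validator; the sequence dictionary is modeled as contig name to length. */
final class IntervalValidatorSketch {
    private IntervalValidatorSketch() {}

    /** Rejects a 1-based, closed interval [start, stop] that falls outside its contig. */
    static void validate(Map<String, Integer> contigLengths, String contig, int start, int stop) {
        Integer length = contigLengths.get(contig);
        if (length == null) {
            throw new IllegalArgumentException("Contig " + contig + " is not in the sequence dictionary");
        }
        if (start < 1 || stop < start || stop > length) {
            throw new IllegalArgumentException(String.format(
                    "Interval %s:%d-%d is outside contig bounds 1-%d", contig, start, stop, length));
        }
    }
}
```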
ebanks 54c61c663c -Cleanup of the Joint Estimation code
-Don't print verbose/debugging output to logger, but instead specify a file in the argument collection (and then we only need to print conditionally)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1899 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 15:25:29 +00:00
asivache 2cab4c68d4 Added method: isCodingExon(). Returns true if position is simultaneously within an exon AND within coding interval of any single transcript from the list. The old method of detecting coding positions as isExon() && isCoding() is buggy, as the position could be in the UTR part of one transcript (isExon() is true), and within coding region bounds (but not in the exon) of another transcript (isCoding() is true). As a result UTR positions would be erroneously annotated as coding.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1898 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 14:55:07 +00:00
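The difference between isExon() && isCoding() and isCodingExon() is whether both tests must be satisfied by the same transcript. The sketch below illustrates that with a hypothetical transcript model (1-based closed intervals); the class and fields are illustrative, not the annotator's real types.

```java
import java.util.List;

/** Hypothetical transcript model: 1-based closed intervals for exons and the coding region. */
final class TranscriptSketch {
    final int codingStart;
    final int codingEnd;
    final int[][] exons; // each entry is {start, end}

    TranscriptSketch(int codingStart, int codingEnd, int[][] exons) {
        this.codingStart = codingStart;
        this.codingEnd = codingEnd;
        this.exons = exons;
    }

    boolean isExon(int pos) {
        for (int[] e : exons) {
            if (pos >= e[0] && pos <= e[1]) return true;
        }
        return false;
    }

    boolean isCoding(int pos) {
        return pos >= codingStart && pos <= codingEnd;
    }

    /** Old approach: each test may be satisfied by a different transcript. */
    static boolean isExonAndIsCoding(List<TranscriptSketch> transcripts, int pos) {
        boolean exon = false;
        boolean coding = false;
        for (TranscriptSketch t : transcripts) {
            exon |= t.isExon(pos);
            coding |= t.isCoding(pos);
        }
        return exon && coding;
    }

    /** isCodingExon-style check: both conditions must hold for the same transcript. */
    static boolean isCodingExon(List<TranscriptSketch> transcripts, int pos) {
        for (TranscriptSketch t : transcripts) {
            if (t.isExon(pos) && t.isCoding(pos)) return true;
        }
        return false;
    }
}
```

For example, with transcript A (exon 100-200, coding 150-200) and transcript B (exons 50-80 and 250-300, coding 50-300), position 120 is in A's UTR exon and inside B's coding bounds but intronic. The old combination reports it as coding; the per-transcript check does not.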
chartl af761fb9bd Base transition table now forces the epsilon/3 (three-state) model for the unified genotyper. Verified to give identical results to changing the default model to epsilon/3. This of course changes the observed counts, so the integration test has been updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1897 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 21:18:26 +00:00
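In the usual reading of an epsilon/3 (three-state) error model, a base with error probability ε is read correctly with probability 1 - ε, and each of the three other bases is observed with probability ε/3. The sketch below assumes ε is derived from the Phred quality as 10^(-Q/10); the class and method names are hypothetical.

```java
/** Hypothetical sketch of the epsilon/3 per-base likelihood. */
final class ThreeStateErrorModelSketch {
    private ThreeStateErrorModelSketch() {}

    /** Converts a Phred quality Q to an error probability, 10^(-Q/10). */
    static double errorProbability(int phredQuality) {
        return Math.pow(10.0, -phredQuality / 10.0);
    }

    /**
     * P(observed base | true base): 1 - eps on a match; otherwise the error
     * mass eps is split evenly over the three other bases (eps / 3 each).
     * Bases are compared case-insensitively.
     */
    static double likelihood(char observed, char trueBase, int phredQuality) {
        double eps = errorProbability(phredQuality);
        boolean match = Character.toUpperCase(observed) == Character.toUpperCase(trueBase);
        return match ? 1.0 - eps : eps / 3.0;
    }
}
```

Because the three mismatch cases share ε equally, the four likelihoods for a given true base always sum to 1.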
ebanks 55fa1cfa06 -Renamed new calculation model and worked out some significant changes with Mark
-Allow walkers calling the UG to pass in their own argument collections
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1896 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 20:49:36 +00:00
chartl 8e3f72ced9 BTTJ - Code refactoring (major) - passes integration test
VariantEvalWalker - whoops, wrote PooledGenotypeAnalysis rather than PooledAnalysis, now passes tests again

- PooledFrequencyAnalysis - don't bother initializing matrices if this isn't a pool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1895 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 19:04:51 +00:00
depristo 15a1849758 notes for chartl
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1894 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 18:31:31 +00:00
chartl 77863d4940 @PowerBelowFrequency
+ Changes to doc

@ BasicPoolVariantAnalysis
    + use char rather than ReferenceContext
    + calculate # alleles

@ PooledFrequencyAnalysis
    + breakdown of call metrics by estimated number of alleles in pool

@ VariantEvalWalker
    + add PooledFrequencyAnalysis to analysis set

@ PooledGenotypeConcordance
    + correctly calculate maximal allele frequency for output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1893 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 15:17:11 +00:00
chartl 967128035e Make command line args default to false.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1892 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 13:59:35 +00:00
ebanks 9b9744109c Mark's new unified calculation model is now officially implemented.
Because it doesn't actually use EM, it's no longer a subclass of the EM model.

Note that you can't use it just yet because it doesn't actually emit calls (just prints to logger).  I need to deal with general UG output tomorrow.  Hold off until then, Mark, and then you can go wild.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1891 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 02:39:23 +00:00
depristo caa3187af8 Enabling correct high-performance ROD walker and moved VariantEval over to it. Performance improvements in VariantEval in general. See the wiki for a full description.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1890 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 23:31:13 +00:00
chartl 4a8a6468be Use read group as a condition for confusion tables. With an integration test.
Changed BaseTransitionTable to comparable objects for consistent ordering of output
(e.g., so the integration test doesn't yell so much)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1889 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 19:39:32 +00:00
chartl b83df5616a Change for lower-case references (always compare upper case bases)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1888 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 17:36:31 +00:00
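Comparing upper-case bases makes lower-case (for example, soft-masked) reference bases match their upper-case read counterparts. A minimal sketch with hypothetical names:

```java
/** Hypothetical helper for comparing read bases against a possibly lower-case reference. */
final class BaseCompareSketch {
    private BaseCompareSketch() {}

    /** Case-insensitive base comparison: 'a' and 'A' are treated as the same base. */
    static boolean basesEqual(byte readBase, byte refBase) {
        return Character.toUpperCase((char) readBase) == Character.toUpperCase((char) refBase);
    }

    /** Counts positions where the read disagrees with the reference, ignoring case. */
    static int countMismatches(byte[] read, byte[] ref) {
        int mismatches = 0;
        for (int i = 0; i < Math.min(read.length, ref.length); i++) {
            if (!basesEqual(read[i], ref[i])) mismatches++;
        }
        return mismatches;
    }
}
```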
chartl 3b1fabeff0 Major code refactoring:
@ Pooled utils & power
   - Removed two of the power walkers leaving only PowerBelowFrequency, added some additional
     flags on PowerBelowFrequency to give it some of the behavior that PowerAndCoverage had
   - Removed a number of PoolUtils variables and methods that were used in those walkers or simply
     not used
   - Removed AnalyzePowerWalker (unnecessary)
   - Changed the location of Quad/Squad/ReadOffsetQuad into poolseq

@NQS
   - Deleted all walkers but the minimum NQS walker, refactored not to use LocalMapType

@ BaseTransitionTable
   - Added a slew of new integration tests for different flaggable and integral parameters
   - (Scala) just a System.out that was added and commented out (no actual code change)
   - (Java) changed a < to <= and a boolean formula

Chris
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1887 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 14:58:04 +00:00
aaron 4be6bb8e92 added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums. For some reason my check-ins from home wouldn't work last night, so these are the actual changes for 1884.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1886 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 14:15:33 +00:00
depristo 449a6ba75a Deleting lots of code as part of my cleanup. More classes tagged for removal. Many more walkers have their days numbered.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1885 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 12:23:36 +00:00
aaron d749a5eb5f added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1884 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 04:56:51 +00:00
ebanks b8ab77c91c Don't filter out reads without proper read groups. Instead, allow the user (or another walker calling UG) to specify an assumed sample to use (but then we assume single-sample mode).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1883 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 01:30:53 +00:00
depristo a8a2c1a2a1 Replaced SSG with UG in packaging utils. Minor performance and formatting improvements for ClipReads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1882 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 01:19:58 +00:00
ebanks c29924e7cf Reverting previous change.
Aaron, it's all yours...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1881 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:55:24 +00:00
aaron d21b582b18 Fix for a memory leak: the Resource Pool was releasing based on the value rather than the key, so the resourceAssignments map grew with each additional shard
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1880 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:39:42 +00:00
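In Java this kind of leak is easy to write because Map.remove(Object) accepts any object: passing a value instead of a key compiles but removes nothing. The pool below is a hypothetical illustration, not the actual Resource Pool class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool that records which resource each shard was assigned. */
final class ResourcePoolSketch<S, R> {
    private final Map<S, R> resourceAssignments = new HashMap<>();

    void assign(S shard, R resource) {
        resourceAssignments.put(shard, resource);
    }

    /** Leaky: remove() looks up a key, so passing the value matches nothing. */
    void releaseByValueBuggy(R resource) {
        resourceAssignments.remove(resource);
    }

    /** Fixed: release by the shard key so the entry is actually dropped. */
    void release(S shard) {
        resourceAssignments.remove(shard);
    }

    int size() {
        return resourceAssignments.size();
    }
}
```

With the buggy release, the map keeps one entry per processed shard, which matches the growth described in the commit.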
ebanks 761a730758 assertBiAllelic -> assertMultiAllelic.
Chris, if this breaks an integration test, you get it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1879 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:09:46 +00:00
depristo 2a26bb42dd Softclipping support in clip reads walker. Minor improvement to WalkerTest -- now can specify file extensions for tmp files. Matt -- I couldn't easily create a non-presorted SAM file. The softclipper has an impact on this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1878 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 21:54:53 +00:00
chartl 055a99fb05 Changed the ordering of a disjunction. The walker will no longer try to calculate the number of simple mismatches in the pileup if the pileup includes 'N's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1877 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 18:24:14 +00:00
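Reordering a disjunction works because Java's || short-circuits: if the cheap N test comes first and is true, the mismatch count is never evaluated. The sketch below is hypothetical (names, the skip semantics, and the call counter are illustrative only).

```java
/** Hypothetical pileup filter showing short-circuit ordering. */
final class PileupFilterSketch {
    private PileupFilterSketch() {}

    static int mismatchCountCalls = 0; // instrumentation for the demo only

    static boolean containsN(byte[] bases) {
        for (byte b : bases) {
            if (b == 'N' || b == 'n') return true;
        }
        return false;
    }

    static int countSimpleMismatches(byte[] bases, byte ref) {
        mismatchCountCalls++;
        int n = 0;
        for (byte b : bases) {
            if (b != ref) n++;
        }
        return n;
    }

    /** The N test runs first; if it is true, countSimpleMismatches is skipped. */
    static boolean skipPileup(byte[] bases, byte ref, int maxMismatches) {
        return containsN(bases) || countSimpleMismatches(bases, ref) > maxMismatches;
    }
}
```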
chartl 10bde9e77b Integration test for BTT calculator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1876 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 18:21:55 +00:00
aaron cfa86d52c2 ensure that in the indel case we don't allow identification as both an insertion and deletion at the same location in the VCF ROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1875 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 18:21:00 +00:00
chartl 3d50c72d74 Forgot a dumb little System.out.println. You will be flooded with "This read will not be used." statements until, overwhelmed, you give in to my demands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1874 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 16:13:48 +00:00
chartl 225ef52973 Now produces same output as the Scala walker for unconditioned tables (no 2bb, no previous base, etc.)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1873 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 16:10:44 +00:00
ebanks bb180a23ef Updated MD5
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1871 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-18 05:30:38 +00:00
ebanks 51f9ec0a5c subtract largest posterior value from all values; this hopefully solves any precision issues
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1870 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-18 05:20:15 +00:00
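Assuming the posteriors are held as log10 values (the log message does not say which base), subtracting the largest one before exponentiating keeps the best value at 10^0 = 1, so very negative log values no longer underflow to zero. A minimal sketch with hypothetical names:

```java
/** Hypothetical sketch: normalize log10 posteriors after subtracting the maximum. */
final class PosteriorNormalizationSketch {
    private PosteriorNormalizationSketch() {}

    /**
     * Shifts log10 values so the largest becomes 0, converts them to linear
     * space, and rescales so the results sum to 1.
     */
    static double[] normalizeFromLog10(double[] log10Values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) {
            max = Math.max(max, v);
        }
        double[] probs = new double[log10Values.length];
        double sum = 0.0;
        for (int i = 0; i < log10Values.length; i++) {
            probs[i] = Math.pow(10.0, log10Values[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }
}
```

For log10 posteriors {-400, -401}, computing 10^-400 directly underflows to 0.0 in a double, while the shifted values 1 and 0.1 normalize to roughly 0.909 and 0.091.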
ebanks b9e8867287 -push allele frequency and genotype likelihood variable definitions down into the subclasses so that they can use different data structures
-use slightly more stringent stability metric
-better integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1869 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-18 04:22:17 +00:00
depristo d6385e0d88 simpleComplement() function in BaseUtils. Generic framework for clipping reads, along with tests. Support for Q-score-based clipping, sequence-specific clipping (not1), and clipping of ranges of bases (cycles 1-5, 10-15 for example). Can write out clipped bases as Ns and quality scores as 0s; in the future it will support softclipping the bases themselves.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1868 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 22:29:35 +00:00
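A single-base complement and range-based clipping that writes clipped bases as Ns and qualities as 0 could be sketched as follows. The names mirror the log, but the signatures are assumptions, and the sketch treats cycle k as array offset k - 1 (it ignores strand orientation).

```java
/** Hypothetical sketch of simpleComplement and cycle-range clipping. */
final class ClipSketch {
    private ClipSketch() {}

    /** Complements a single upper-case base (A<->T, C<->G); other bytes are returned unchanged. */
    static byte simpleComplement(byte base) {
        switch (base) {
            case 'A': return (byte) 'T';
            case 'T': return (byte) 'A';
            case 'C': return (byte) 'G';
            case 'G': return (byte) 'C';
            default:  return base;
        }
    }

    /**
     * Masks 1-based, inclusive cycles [firstCycle, lastCycle] by writing 'N'
     * bases and quality 0; the range is clamped to the read length.
     */
    static void clipCycles(byte[] bases, byte[] quals, int firstCycle, int lastCycle) {
        int end = Math.min(lastCycle, bases.length);
        for (int cycle = Math.max(1, firstCycle); cycle <= end; cycle++) {
            bases[cycle - 1] = (byte) 'N';
            quals[cycle - 1] = 0;
        }
    }
}
```

Clipping several ranges, such as cycles 1-5 and 10-15, is then just repeated calls to clipCycles.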
chartl ad777a9c14 @BasicPileup - made the counts public so they can be used
@PoolUtils - split reads by indel/simple base

@BaseTransitionTable - complete refactoring, nicer now

@UnifiedArgumentCollection - added PoolSize as an argument

@UnifiedGenotyper - checks to ensure pooled sequencing uses the appropriate model

@GenotypeCalculationModel - instantiates with the new PoolSize argument
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1867 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 21:56:56 +00:00