gatk-3.8

Commit Graph

Author	SHA1	Message	Date
rpoplin	d1b525b428	Default window size for NQS covariate is 3 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2040 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 19:24:27 +00:00
rpoplin	394c839974	Implemented NQS covariate. Extended Cycle covariate to handle 454 and SOLID reads. Added a Primer Round covariate for SOLID reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2039 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 19:22:21 +00:00
rpoplin	b1376e4216	structure refactored throughout for performance improvements git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2036 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 15:41:09 +00:00
mmelgar	72825c4848	A walker that generates a table of secondary base counts in a bam file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2031 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 02:11:23 +00:00
ebanks	61b5fb82ce	2 major changes: 1. Add dbsnp RS ID to VCF output from genotyper; to do this I needed to fix the dbsnp rod which did not correctly return this value. 2. Remove AlleleBalanceBacked and instead generalize the arbitrary info fields backing VCFs (and potentially others) in preparation for refactoring VariantFiltration next week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2028 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-12 22:51:49 +00:00
ebanks	578dcc54a4	Don't create a record if ref=N git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2018 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-11 04:32:17 +00:00
rpoplin	a13cbe1df0	The refactored recalibrator now passes the integration tests as well as my own validation tests. I'm ready to have other people start jamming on the files. I'll make an updated wiki page soon. The refactored recalibrator is currently a bit slower than the old one but there were a lot of great, easy ideas today for how to improve it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2013 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-10 22:20:06 +00:00
rpoplin	1e7ddd2d9f	Added a validateOldRecalibrator option to CovariateCounterWalker which reorders the output to match the old recalibrator exactly. This facilitates direct comparison of output. Changed the -cov argument slightly to require the user to specify both ReadGroupCovariate and QualityScoreCovariate to make it more clear to the user which covariates are being used. Some speed up improvements throughout. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2010 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-10 15:55:56 +00:00
ebanks	2fa2ae43ec	Enough people have found this useful, so... Moving Callset Concordance tool to core and adding integration test. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2003 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 20:59:18 +00:00
ebanks	3793519bd4	-Added convenience method to VCF record to tell if it's a no call and have rodVCF use it before querying for info fields -Don't restrict info fields to 2-letter keys [about to move these to core] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2002 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 20:52:51 +00:00
rpoplin	740a5484c4	Added some documentation to the code, most especially to CovariateCounterWalker but various comments added throughout. Also changed the HashMap data structure to accept an estimated initial capacity. This had a very modest improvement to the speed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2001 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 20:13:56 +00:00
ebanks	74751a8ed3	-Some minor fixes to get accurate vcf record merging done -Improvement to snp genotype concordance test And with that, it looks like I get revision #2000. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2000 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 06:40:55 +00:00
ebanks	ab705565cf	Completely refactored the Callset Concordance code. Now, it takes in VCF rods and emits a single VCF file which has merged calls from all inputs and is annotated (in the INFO fields) with the appropriate concordance test(s). Still needs a bit of polish... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1999 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 05:03:13 +00:00
kiran	7fde6c0bf4	One more output tweak. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1996 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 04:42:55 +00:00
kiran	00a7113d7a	Tweaks to formatting of output table. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1995 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 04:33:36 +00:00
kiran	95d381efe2	Optionally computes the error rate using the best base and a random base. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1991 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 16:47:34 +00:00
kiran	a679bdde18	FindContaminatingReadGroupsWalker lists read groups in a single-sample BAM file that appear to be contaminants by searching for evidence of systematic underperformance at likely homozygous-variant sites. Procedure: 1. Sites that are likely homozygous-variant but are called as heterozygous are identified. 2. For each site and read group, we compute the proportion of bases in the pileup supporting an alternate allele. 3. A one-sample, left-tailed t-test is performed with the null hypothesis being that the alternate allele distribution has a mean of 0.95 and the alternate hypothesis being that the true mean is statistically significantly less than expected (pValue < 1e-9). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1989 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 16:36:39 +00:00
kiran	2225d8176e	A convenience class for maintaining a dynamically growing table of values with access to the elements by named row and column identifiers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1988 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 16:34:35 +00:00
rpoplin	84ba604611	Sequential quality score calculation is now in place in the refactored recalibrator and matches the quality scores calculated by the old recalibrator exactly, at least on the small sets of data used so far. Validation, documentation, and optimization work is ongoing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1985 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-07 15:55:16 +00:00
depristo	bf1bc94060	Fixes for PooledConcordance bugs and lack of safety checking git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1984 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-07 01:54:10 +00:00
rpoplin	66d4a995e6	Initial check in of refactored Recalibrator. The new walkers are called CountCovariatesRefactored and TableRecalibrationRefactored. More work is needed to finish up the sequential calculation and to document the code sufficiently. These files are not ready to be used by other people quite yet. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1982 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-06 22:33:55 +00:00
ebanks	0a55fa5bb1	Completely refactored the Genotype Concordance module(s). Now PooledConcordance and GenotypeConcordance inherit from the same super class (and can therefore share data structures and functionality). Also, they now use ConcordanceTruthTable to keep track of necessary info. GenotypeConcordance passes integration tests. PooledConcordance needs to be finished by Chris. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1979 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-06 16:27:16 +00:00
ebanks	d549347f25	Refactored GenotypeLikelihoods to use an underlying 4-base model. It needs to be modified a bit and then hooked up to a pooled model, but that is now possible. At this point, there is no difference to the Unified Genotyper. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1978 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-05 21:59:25 +00:00
jmaguire	4d3871c655	don't flush anymore. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1977 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-05 19:11:51 +00:00
depristo	5d5dc989e7	improvements to VCF and variant eval support of VCF -- now listens to the filter field git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1963 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-03 12:09:30 +00:00
ebanks	3a33401822	2nd stage of the genotyper output refactoring is complete. Now, all output is generalized and all of the intelligence lies where it is supposed to. Next stage is syncing up old and new models and making sure we're outputting exactly what we should. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1960 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-02 22:43:08 +00:00
ebanks	af6d0003f8	-Generalized the GenotypeConcordance module to deal with any number of individuals (although it will default to its old behavior if the -samples argument is left out). -Make rods return the appropriate type of Genotype calls from getGenotype(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1954 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-01 05:35:47 +00:00
depristo	7d0ac7c6f2	Fix for long-term VariantEval bug plus new integration test to catch it git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1951 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-31 00:00:33 +00:00
ebanks	51fffc7f69	Comments for Ryan (which also apply to ReadQualityScoreWalker). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1944 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 14:44:04 +00:00
ebanks	ccd7440730	We can actually make this a bit simpler (and faster) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1943 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 04:21:03 +00:00
ebanks	1b6333e4ab	Enough people have asked for this that it just needed to get written. One can now split up any number of sets into an N-way Venn (although it doesn't check for discordance in the calls, so you'll still want to use SimpleVenn for 2-way comparisons). Wiki docs are updated. To do: update to use Ryan's generic hash map when it's ready for public use. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1942 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 04:08:45 +00:00
ebanks	4bdb5b03bd	tell UnifiedGenotyper to return calls at all bases git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1941 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 03:10:44 +00:00
ebanks	4ee1d6f733	-Have the calculation models determine whether a call passes the lod/confidence thresholds (as opposed to returning everything and letting the UG decide); this way, walkers which call map() will get only the good calls. -Do the right thing in all models for all-base-mode (for Kiran). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1940 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 02:35:51 +00:00
ebanks	64ac956885	Okay, I caved in: CallsetConcordance now gets possible concordance types by looking at classes that implement ConcordanceType instead of having them hard-coded in. Thanks to Kiran this was pretty easy... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1939 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 00:32:26 +00:00
ebanks	3091443dc7	Sweeping changes to the genotype output system, as per several discussions with Matt & Aaron. Some things still need to be changed, but it will entail some more design decisions first (which means I get to bug M&A again tomorrow!). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1930 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 03:46:41 +00:00
chartl	c4359bc340	Whoops. Forgot the implements. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1927 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 19:59:57 +00:00
chartl	863d3023d5	IndelCounterWalker -- a new little walker that counts indels over a region (want to see what kind of havoc BWA may be resulting in). Don't know when BasicPileup.indelPileup() was written, but kudos to whoever wrote it. BTTJ - remove 'N's from previous base analysis -- even if both read and ref are 'N' (which does happen, occasionally) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1925 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 19:50:50 +00:00
aaron	04e9a494e9	removed the GenotypesBacked interface, which is currently unused. Also cleaned up some documentation lines git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1924 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 18:08:14 +00:00
rpoplin	06ff81efe5	Added NeighborhoodQualityWalker.java and ReadQualityScoreWalker.java which are used to calculate a read quality score based on attributes of the read and the reads in the neighborhood. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1922 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 13:24:11 +00:00
depristo	68fa6da788	Initial graph-based reference implementation and alignment assessor. Not suitable for public use git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1921 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 21:54:47 +00:00
depristo	31d143a841	now only needs READS git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1920 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 21:54:14 +00:00
chartl	4192b093b8	More robust error handling with parallelization + usePreviousBase. Added forceReadBasesToMatchRef to use in conjunction with nPreviousReadBases as a less stringent approximation of usePreviousBases (requiring previous pileups only had mismatches, and that read mapping quality be high was throwing everything away) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1916 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 17:20:44 +00:00
chartl	31d5df2859	Previous base now checks that the read matches the reference in the previous base window. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1915 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 15:58:20 +00:00
ebanks	e96b1791ab	Need to check for biallelic snp or exception gets thrown. Also, update to new tracker calls. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1913 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 02:43:43 +00:00
chartl	62c1001790	BTTJ is now correct. What a terrible waste of time, turns out I'd just reversed the header. Because of this the MD5 had to be updated in the tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1910 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-26 19:24:18 +00:00
sjia	24c7f694e6	Handles allele frequencies for any specified population, changed user input for mismatch filter options git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1909 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-25 22:51:56 +00:00
chartl	db9419df49	@ Hack to allow output from onTraversalDone() git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1908 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-25 15:19:04 +00:00
depristo	b4f55df600	Bugfix for Jason F git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1906 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-24 22:09:27 +00:00
aaron	ad1fc511b1	intermediate commit for some changes in the Variation system, so Eric can go ahead with his changes. Everything is pretty set, but the Variation interface could use a convenience method that joins all the alternate alleles. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1903 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-23 06:31:15 +00:00
chartl	a6dc8cd44e	BTTC is now Tree Reducible allowing for parallelization. Integration test comment changed to reflect actual date of last md5 update. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1901 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-22 23:19:29 +00:00
chartl	af761fb9bd	Base transition table now forces epsilon/3 (three-state) model for the unified genotyper. Verified to be identical with changing the default model to being epsilon/3. This of course changes the observed counts, so the integration test has been updated. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1897 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 21:18:26 +00:00
chartl	8e3f72ced9	BTTJ - Code refactoring (major) - passes integration test VariantEvalWalker - whoops, wrote PooledGenotypeAnalysis rather than PooledAnalysis, now passes tests again - PooledFrequencyAnalysis - don't bother initializing matrices if this isn't a pool git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1895 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 19:04:51 +00:00
depristo	15a1849758	notes for chartl git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1894 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 18:31:31 +00:00
chartl	77863d4940	@PowerBelowFrequency + Changes to doc @ BasicPoolVariantAnalysis + use char rather than ReferenceContext + calculate # alleles @ PooledFrequencyAnalysis + breakdown of call metrics by estimated number of alleles in pool @ VariantEvalWalker + add PooledFrequencyAnalysis to analysis set @ PooledGenotypeConcordance + correctly calculate maximal allele frequency for output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1893 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 15:17:11 +00:00
chartl	967128035e	Make command line args default to false. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1892 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 13:59:35 +00:00
depristo	caa3187af8	Enabling correct high-performance ROD walker and moved VariantEval over to it. Performance improvements in variantEval in general. See wiki for full description git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1890 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 23:31:13 +00:00
chartl	4a8a6468be	Use read group as a condition for confusion tables. With an integration test. Changed BaseTransitionTable to comparable objects for consistent ordering of output ( e.g. so the integration test doesn't yell so much ) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1889 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 19:39:32 +00:00
chartl	b83df5616a	Change for lower-case references (always compare upper case bases) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1888 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 17:36:31 +00:00
chartl	3b1fabeff0	Major code refactoring: @ Pooled utils & power - Removed two of the power walkers leaving only PowerBelowFrequency, added some additional flags on PowerBelowFrequency to give it some of the behavior that PowerAndCoverage had - Removed a number of PoolUtils variables and methods that were used in those walkers or simply not used - Removed AnalyzePowerWalker (un-necessary) - Changed the location of Quad/Squad/ReadOffsetQuad into poolseq @NQS - Deleted all walkers but the minimum NQS walker, refactored not to use LocalMapType @ BaseTransitionTable - Added a slew of new integration tests for different flaggable and integral parameters - (Scala) just a System.out that was added and commented out (no actual code change) - (Java) changed a < to <= and a boolean formula Chris git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1887 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 14:58:04 +00:00
aaron	4be6bb8e92	added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums. For some reason my check-ins from home wouldn't work last night, so this is the actual changes for 1884. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1886 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 14:15:33 +00:00
depristo	449a6ba75a	Deleting lots of code as part of my cleanup. More classes tagged for removal. Many more walkers have their days numbered. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1885 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 12:23:36 +00:00
aaron	d749a5eb5f	added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1884 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 04:56:51 +00:00
depristo	a8a2c1a2a1	Replaced SSG with UG in packaging utils. Minor performance and formatting improvements for ClipReads git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1882 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 01:19:58 +00:00
depristo	2a26bb42dd	Softclipping support in clip reads walker. Minor improvement to WalkerTest -- now can specify file extensions for tmp files. Matt -- I couldn't easily create non-presorted SAM file. The softclipper has an impact on this. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1878 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-19 21:54:53 +00:00
chartl	055a99fb05	Change in ordering for a disjunction. Walker will no longer try to calculate number of simple mismatches in the pileup if the pileup includes 'N's. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1877 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-19 18:24:14 +00:00
chartl	3d50c72d74	Forgot a dumb little System.out.println. You will be flooded with "This read will not be used." statements until, overwhelmed, you give in to my demands. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1874 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-19 16:13:48 +00:00
chartl	225ef52973	Now produces same output as the Scala walker for unconditioned tables (no 2bb, no previous base, etc.) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1873 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-19 16:10:44 +00:00
depristo	d6385e0d88	simpleComplement function() in BaseUtils. Generic framework for clipping reads along with tests. Support for Q score based clipping, sequence-specific clipping (not1), and clipping of ranges of bases (cycles 1-5, 10-15 for example). Can write out clipped bases as Ns, quality scores as 0s, or in the future will support softclipping the bases themselves. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1868 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 22:29:35 +00:00
chartl	ad777a9c14	@BasicPileup - made the counts public so they can be used @PoolUtils - split reads by indel/simple base @BaseTransitionTable - complete refactoring, nicer now @UnifiedArgumentCollection - added PoolSize as an argument @UnifiedGenotyper - checks to ensure pooled sequencing uses the appropriate model @GenotypeCalculationModel - instantiates with the new PoolSize argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1867 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 21:56:56 +00:00
andrewk	bdb34fcf38	Updated integration tests for VariantEval. Hooray for IT! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1866 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 20:00:29 +00:00
andrewk	d1a4cd2f73	Added ValidationData analysis type to VariantEvalWalker; this eval takes a GFF file with validated truth data positions (bound to "validation") and calculates the accuracy of the genotype calls bound to "eval". git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1862 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 15:39:08 +00:00
ebanks	418e007ca6	A cleaner interface: now everyone can use UG's initialize method git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1860 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 14:09:16 +00:00
aaron	96972c3a5c	a fix for a bug Eric found: if your first call contains fewer samples than calls at other loci, your VCFHeader got set up incorrectly. Also moved a bunch of Lists over to Sets for consistency. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1859 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 04:57:50 +00:00
aaron	a69ea9b57c	Cleaning up the VCF code, adding lots of tests for a variety of edge cases. Two issues are still outstanding: updating the no call string with the standard 1000g decided on today, and fixing Eric's issue where not all the VCF sample names are present initially. also: their, I hope your happy Eric, from now on I'll try not to flout my awesomest grammer in the future accept when I need to illicit a strong response :-) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1858 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 04:11:34 +00:00
chartl	b9544d3f89	Output formatting change (very slight) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1854 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-15 16:47:29 +00:00
kcibul	79993be46c	changed blank gene name to UNKNOWN git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1851 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-15 13:47:00 +00:00
ebanks	a32470cea1	Deal with the fact that walkers can call UG's init/map functions directly. We need to filter contexts in that case since the calling walkers don't get UG's traversal-level filters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1848 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-15 02:31:45 +00:00
ebanks	e740e7a7ce	Because walkers call UG's map function, we need to move the actual writing out to UG's reduce function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1845 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 20:49:26 +00:00
kcibul	825e6c7a4d	added calculation for bases over 2x,10x,20x,30x plus gene name git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1844 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 20:32:26 +00:00
chartl	1f66738c8e	Fix a hashing function bug. Ignore reads with non-reference bases in the pileup. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1842 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 19:41:26 +00:00
ebanks	52d2e0ca07	All walkers now use read.getReadGroup() git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1839 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 19:27:40 +00:00
chartl	0a09fa4d5c	Rename to distinguish this transition table calculator from the scala version. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1838 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 18:52:21 +00:00
chartl	1d055011bd	Getting rid of this so I can rename it without the world blowing up. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1837 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 18:45:11 +00:00
ebanks	0c95d6906f	Merge both versions of the Sequenom assay design maker: use Jared's base code and add in indels. [Jared, this still emits the same output for SNPs as your original version] Remove all sequenom stuff from the FastaAlternateReferenceMaker so it can just concentrate on making alternate references... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1831 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 17:11:45 +00:00
ebanks	49af5269e5	Jared: feel free to change or revert, but until we move over to UG version... Only print out positions with at least one non-ref call git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1830 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 17:08:57 +00:00
chartl	f5a2e6dd50	Fix! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1829 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 16:15:20 +00:00
chartl	8d0e057d83	I got bored today and decided to write the confusion matrix calculator. At present it is untested. I'm submitting it to subversion to make sure I have a previous revision to revert back to. This is a calculator that will calculate: P[ True base is X | read base mismatches, secondary base is Y, previous K bases are Z1,Z2,...ZK ] where the number of previous reference bases to take into account is user-defined. The secondary base is optional as well. --usePreviousBases k tells the walker to use the k previous reference bases in the transition table --useSecondaryBase tells the walker to use the secondary base at a locus in the transition table these can be used together. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1816 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-13 02:55:29 +00:00
chartl	ec83bc6ec5	This somehow didn't make it into subversion the last time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1814 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-12 21:11:13 +00:00
chartl	ecbb11e017	Modified PowerBelowFrequency to ignore reads below a user-defined mapping quality. Request from Jason Flannick. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1813 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-12 20:59:24 +00:00
chartl	ec68ae3bc5	Added a filter that will split the read set by a threshold of mapping quality (Request from Jason Flannick) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1812 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-12 20:58:37 +00:00
chartl	0d73fe69e7	Recalibrator by NQS. Had this puppy running all afternoon. Thing had got through 100,000,000 reads before I decided to delete my sting tree. sigh, a little more delay. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1811 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-12 20:55:02 +00:00
chartl	ee0afba0af	Recalibration stuff... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1810 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-12 20:51:39 +00:00
aaron	62c484b57a	Fixes for GSA-201, where enumerated types in command line arguments had to be defined as all uppercase for the system to work. Also a little playground walker that changes the sort order flag of a BAM file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1805 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-09 18:11:32 +00:00
jmaguire	d9f5a314ac	avoid an out of memory error by not putting more than 5000 reads in the cache. on pilot1 at least those are crazy loci anyway. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1802 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-09 14:56:55 +00:00
chartl	6d7f4481e4	Changed traversal type slightly git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1800 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-09 04:11:48 +00:00
ebanks	a9f3d46fa8	Your time has come, SSG. Fare thee well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1799 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 20:27:56 +00:00
jmaguire	8fdb8922b8	now output in the exact format that works with sequenom software. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1798 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 20:06:27 +00:00
aaron	98e3a0bf1a	VCF can now be emitted from SSG. The basics are there (the genotype, read depth, our error estimate), but more fields need to be added for each record as necessary. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1797 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 19:50:04 +00:00
kiran	94d82d1915	Matthew Bainbridge's duplicate removal utility for 454 data. This code should eventually be moved into a read walker. For now, it's being introduced into the repository as-is (well, with one minor change to make the handling of command-line arguments a little more straightforward). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1794 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 18:32:37 +00:00
chartl	f89a89ffe3	Use of AlleleFrequency as an input to PowerAndCoverage is deprecated by the new walker. Reverting to the standard "power at 1 allele" calculation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1788 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 16:07:45 +00:00
chartl	ae05f5c7ad	Fixin the header. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1787 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 15:49:28 +00:00
chartl	11ff1e09b8	A new power walker for the user to feed in a number of alleles. Call that number k. Output is: Locus Power_for_k_alleles Power_for_k-2_alleles Power_for_k-2_alleles ... Power_for_1_allele This was a request from Jason Flannick & the T2DB group. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1786 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 15:35:35 +00:00
jmaguire	32128e093a	misc. changes to get the numbers back to the baseline while keeping the speedup. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1784 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 12:27:07 +00:00
jmaguire	d38a0d04b9	fix a snp mask offset error. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1783 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 12:25:40 +00:00
jmaguire	02d2492d68	Simple tool for picking sequenom probes for SNPs. Can be extended to indels if necessary. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1780 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 23:46:41 +00:00
sjia	5bdcc2b4dc	Included HLA class 2 genes in CreatePedFileWalker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1776 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 18:46:51 +00:00
sjia	8f896b734f	Included HLA class 2 genes in CreatePedFileWalker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1775 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 18:28:01 +00:00
chartl	225b9bccc1	Modifications to NQSClusteredZScoreWalker to output empirical mismatch rates on bins by both Z-score and reported Q-score, rather than averaging over all Q-score bins for each Z-score. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1773 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 13:45:12 +00:00
depristo	8dd0924b37	Minor performance improvements to VariantEval -- now all of the CPU time is spent dealing with the ROD system... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1772 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-06 23:40:30 +00:00
aaron	3aec76136f	Removing the AllelicVariant interface, which is replaced by the Variation interface. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1770 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-06 17:44:24 +00:00
depristo	1bd0c3c145	variant eval allows non Variation rod objects git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1768 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-06 13:04:26 +00:00
sjia	98076db6b4	Modified CreatePedFileWalker to output PED file given HLA allele names git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1763 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-05 03:06:42 +00:00
chartl	7605ee500c	Idiocy! All tests were being disabled because I forgot the instanceof git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1760 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-02 20:04:56 +00:00
chartl	88d0890cc3	Made PooledGenotypeConcordance a standard test in VariantEval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1759 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-02 20:03:31 +00:00
chartl	68cb2ee54b	Tweaks to parameters for NQS analysis walkers; change to PowerAndCoverage for Jason Flannick (can input the number of alleles to compute power for - i.e. doubletons, tripletons; rather than statically checking singletons). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1757 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-02 19:11:27 +00:00
aaron	e885cc4b21	changes for corrected GLF likelihood output, along with better tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1754 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-01 20:45:05 +00:00
hanna	2309d19f6f	Bug fix from Michael Ross: mark second read in sequence as second of pair. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1753 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-01 14:34:36 +00:00
aaron	b1c321f161	Adjusted Genotype concordance to more accurately use the new Genotyping code, fixed the VCF rod, and temp. fix the build by reintroducing Shermans ReadCigarFormatter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1745 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 21:28:21 +00:00
sjia	9b78a789e2	HLA Caller 2.0. Walkers: CalculateBaseLikelihoodsWalker.java walks through reads and calculates likelihoods using SSG at each base position; CalculateAlleleLikelihoodsWalker.java walks through HLA dictionary and calculates likelihoods for allele pairs given output of CalculateBaseLikelihoodsWalker.java; CalculatePhaseLikelihoodsWalker.java walks through reads and calculates likelihood scores for allele pairs given phase information. File Readers: BaseLikelihoodsFileReader.java reads text file of likelihoods outputted by SSG; FrequencyFileReader.java reads text file of HLA allele frequencies; PolymorphicSitesFileReader.java reads text file of polymorphic sites in the HLA dictionary; SAMFileReader.java reads a sam file (used to read HLA dictionary when in another walker); SimilarityFileReader.java reads a text file of how similar each read is to the closest HLA allele (used to filter misaligned reads). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1744 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 20:45:55 +00:00
chartl	281a77c981	Bugfix. isMismatch() was actually computing isMatch(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1743 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 20:04:59 +00:00
chartl	e28b45688c	More NQS Related Walkers to play with git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1742 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 20:01:04 +00:00
andrewk	6134f49e3c	Convert de novo SNP caller to run using parent1 and parent2 BAM files (by splitting contexts by reader using getMergedReadGroupsByReaders) instead of geli files providing a large speed-up and obviating the need for large whole-genome geli files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1738 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 06:42:21 +00:00
andrewk	5662a88ee1	Cosmetic change to list sampling functions: the typical usage of n and k were reversed. No change in functionality of the classes has been made and unit tests still pass. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1736 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-28 18:12:32 +00:00
aaron	39598f1f0a	switching the concordance walker over to the new Variation system git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1735 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-28 15:46:36 +00:00
asivache	92c6efabb7	moving IndelGenotyper out of playground git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1732 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-25 19:44:49 +00:00
chartl	fe6d810515	Some basic commits that I've been sitting on for a while now: @ PooledGenotypeConcordance - changes to output, now also reports false-negatives and false-positives as interesting sites. It's been like this in my directory for ages, just never committed. @NQSExtendedGroupsCovariantWalker - change for formatting. @NQSTabularDistributionWalker - breaks out the full (window_size)-dimensional empirical error rate distribution by the window. So if you've got a window of size 3; the quality score sequences 22 25 23 and 22 25 24 have their own bins (each of the 40^3 sequences get one) for match and mismatch counts. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1730 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-25 19:35:50 +00:00
sjia	f7684d9e1b	ImputeAllelesWalker fills missing portions of HLA dictionary based on best allele matches git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1729 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-25 18:51:46 +00:00
sjia	235de38c2e	Updates to FindClosestAlleleWalker and CreateHaplotypesWalker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1728 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-25 16:41:58 +00:00
aaron	7ffc1d97ef	Cut DeNovoSNPWalker over to the new Variation system, some renaming of methods on the Variation interface, and some corrections on the interface. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1724 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-25 04:35:52 +00:00
depristo	392152f149	1000x performance improvements to MSG for crisis control git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1723 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-24 23:44:33 +00:00
aaron	d262cbd41c	changes to add VCF to the rod system, fix VCF output in VariantsToVCF, and some other minor changes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1715 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-24 15:16:11 +00:00
sjia	1ee8ba590c	Reads cigar files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1713 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-24 03:14:10 +00:00
sjia	9422156e09	Finds closest allele for each read in bam file git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1712 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-24 03:12:20 +00:00
sjia	5c5151c4e7	Creates ped file from reads git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1711 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-24 02:48:29 +00:00
sjia	b446b3f1b6	CreateHaplotypeWalker now gives correct output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1709 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-23 21:13:52 +00:00
sjia	3916e165fb	New walker to output haplotypes for each read (for SNP analysis or imputation, etc) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1707 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-23 20:26:43 +00:00
chartl	63f3d45ca4	fixing the build git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1705 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-23 20:04:09 +00:00
chartl	540e1b971f	And we fix one boneheaded mistake, which was actually causing the problem; though the last change was still correct. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1704 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-23 19:26:45 +00:00
chartl	124ca68fa8	And an IMMEDIATE minor fix (want neighborhood quality > base quality to be represented correctly) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1703 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-23 19:21:09 +00:00
chartl	8cdb78ebee	More sophisticated version of the NQSCovariantWalker - modified to be more explicit about how much higher the quality score of a particular base is than the quality score of its neighbors. The granularity of the binning jumps from 32 groups to 860 groups. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1702 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-23 19:18:24 +00:00
aaron	f783cb30e0	adding an interface so that the current @Requires with ROD annotations work in walkers like VariantEval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1700 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-23 18:24:05 +00:00
asivache	fa87dd386d	Now uses rodRefSeq in its new reincarnation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1698 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-23 18:19:36 +00:00
asivache	fe36289e44	No one needs this, probably... Old experimental code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1695 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-23 18:11:50 +00:00
sjia	aa66074a0e	Compares each read to the HLA dictionary and outputs closest allele, as well as other stats git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1693 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-23 16:17:23 +00:00
aaron	11c32b588f	fixing VariantEvalWalkerIntegrationTest md5 sums, a couple comment changes, and a little bit of cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1690 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-22 20:54:47 +00:00
sjia	22932042ea	Combined Scores, bug fixed for printing HLA-C git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1685 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-22 18:28:16 +00:00
asivache	d7d0b270d1	now supports blacklisting lanes (with -BL option will ignore reads from any of the specified lanes) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1682 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-22 16:46:57 +00:00
asivache	fb09835ef8	Changed to accommodate new ROD system git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1671 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 17:10:56 +00:00
asivache	f4d270cba4	These classes now use BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1669 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 17:03:15 +00:00
aaron	3a487dd64e	little fixes; also fixed a tyPo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1662 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 22:38:51 +00:00
depristo	3a341b2f06	Fixes for VariantEval for genotyping mode git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 21:01:43 +00:00
aaron	7b39aa4966	Adding the VCF ROD. Also changed the VCF objects to much more user friendly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1658 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 20:19:34 +00:00
sjia	83e6e5a3e4	Calculates Probability for each allele combination (using likelihood score and allele frequencies only) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1656 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 18:46:38 +00:00
ebanks	7da9ff2a9e	Put back the check that both chip and variant are not null. Also, sanity check that ref is not 'N'. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1651 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 16:03:54 +00:00
ebanks	66a4de9a1d	Genotype check should be case-insensitive git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1649 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 03:23:30 +00:00
sjia	0e73b2ba8e	Use population allele frequencies to distinguish between top candidates git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1645 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-17 15:49:19 +00:00
chartl	534486a254	Output formatting changed: - summary output now reported as a percentage rather than proportion; 2 sigfigs - fixed minor bug where FNR was calculated over total calls rather than total variant sites - column headers are_now_contiguous_strings - spacing fixed - "No Call" separated from "Ref Call" as its own column git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1644 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-17 14:00:25 +00:00
depristo	73bec6f36d	Now uses expanding array list for coverage histograms. No hard limit on maximum depth now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1643 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 23:27:25 +00:00
chartl	4ad46590a3	Changes to PooledGenotypeConcordance: Additional output & better output formatting. It has now undergone a good five hours of testing; and for pools of size 1 outputs exactly the same statistics as GenotypeConcordance (when GenotypeConcordance is modified to do nothing on reference='N'); and for pools of many sizes outputs close to the expected (by genetics) statistics. Looks like this is working properly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1642 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 21:45:01 +00:00
chartl	386a6442ba	Actually deleted now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1641 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 20:28:06 +00:00
chartl	8fce376792	Changes: Deletion: PooledGenotypeConcordanceNew Rewrite: PooledGenotypeConcordance. It works, and is blazing fast compared to the earlier version (1 order of magnitude speedup)! And is now entirely non-hacky, as opposed to before when there were some hacky bits. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1640 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 20:22:16 +00:00
asivache	3e289fcaa4	A little piece that PairMaker needs in order to compile ;) Iterates synchronously over two (name-ordered) single-end alignment SAM files with, possibly, multiple alignments per read and for each read name encountered returns pairs<all alignments for end1, all alignments for end2> git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1639 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 19:17:40 +00:00
asivache	2f29cf59ba	Very early, half-baked version. All it can do right now is to take two SAM files with end1 and end2 individual single-end alignments from a pair-end run and spit out a "paired" BAM file that contains ONLY properly paired ends (both ends align uniquely && both ends align to the same chromosome && the ends align in proper orientation). Insert size is currently not used (and not set in the output). Unpaired/unmapped reads are NOT transferred into the output bam. For the pairs that do get written, the output is (should be) standard-conforming: all flags are properly set and mate pair information is correct. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1637 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 18:38:18 +00:00
chartl	f6bdb47bb6	Addition: @PooledGenotypeConcordanceNew - a new version of the pooled genotype concordance test for Variant Eval. Code altered to be more extensible, use a private class for handling the count tables so it doesn't gunk up the code in the test itself, and for easy debugging. The hackier methods from the original were rewritten properly. Currently computes more statistics than it outputs. Code compiles, is never called by anything, and breaks none of the tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1632 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 04:14:58 +00:00
aaron	542d817688	more cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1631 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 21:42:03 +00:00
aaron	b401929e41	incremental clean-up and changes for VariantEval, moved DiploidGenotype to a better home, and fixed a spelling error. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1624 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 04:48:42 +00:00
andrewk	fb254759cb	Trivial: Don't print reduce result git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1621 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 23:42:20 +00:00
chartl	7d6d114ab5	Additions: @NQSMismatchCovariantWalker - Walks along the gene calculating the table # NQS # Q score # mismatches at non-dbsnp sites # total number of bases at non-dbsnp sites And prints it out at the end. Changes: @PooledGenotypeConcordance now works. Takes a path to a file listing a bunch of hapmap IDs in whatever pool we want to check, reads those in, and checks for concordance by name. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1614 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 20:12:04 +00:00
sjia	9be1832d7b	Phasing version 1 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1613 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 16:10:37 +00:00
aaron	e03fccb223	Changes to switch Variant Eval over to the new Variation system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1611 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 05:34:33 +00:00
chartl	5cf1d6c104	Bugfix - this walker was never changed to work with the new PoolUtils methods after those methods were changed to return ReadOffsetQuad objects rather than nested pairs. This broke the build :(. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1608 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 19:39:23 +00:00
ebanks	15178977e1	Naive tool to convert from vcf to geli text git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1606 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 17:25:02 +00:00
chartl	794bd26b20	Changed some ShortNames so they made more sense. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1604 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 01:32:12 +00:00
chartl	b353bd6f81	Added a Quad toString() method. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1603 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 01:13:57 +00:00
chartl	2e237a12e9	This commit has a bunch to do with cleaning up the CoverageAndPowerWalker code: implementing some new printing options, but mostly altering the code so it's much more readable and understandable, and much less hacky-looking. ADDED: @Quad: This is just like Pair, except with four fields. In the original CoverageAndPowerWalker I often used a pair of pairs to hold things, which made the code nigh unreadable. @SQuad: An extension of Quad for when you want to store objects of the same type. Lets you simply declare new SQuad<X> rather than new Quad<X,X,X,X> @ReadOffsetQuad: An extension of Quad specifically for holding two lists of reads and two lists of offsets Supports construction from AlignmentContexts and conversion to AlignmentContexts (given a GenomeLoc). There are methods that make it very clear what the code is doing (getSecondRead() rather than the cryptic getThird() ) @PowerAndCoverageWalker: The new version of CoverageAndPowerWalker. If the tests all go well, then I'll remove the old version. New to this version is the ability to give an output file directly to the walker, so that locus information prints to the file, while the final reduce prints to standard out. Bootstrap iterations are now a command line argument rather than a final int; and users can instruct the walker to print out the coverage/power statistics for both the original reads, and those reads whose quality score exceeds a user-defined threshold. CHANGES: @PoolUtils: Altered methods to accept as arguments, and return, Quad objects. Added a random partition method for bootstrapping. @CoverageAndPowerWalker: Altered methods to work with the new PoolUtils methods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1602 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 01:00:04 +00:00
andrewk	5354c1876c	De Novo SNP caller as presented at 1KG meeting on 9/10/09 with min LOD 5 calls required from both parents and a LOD 5 call in the daughter gold standard concordant call set. All SNP calls must be present as bound RODs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1590 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 19:30:23 +00:00
chartl	c3f77acd5e	Alteration to CoverageAndPowerWalker. It can now be flagged with -uc which will cause it to print not only the coverage on each strand that exceeds the quality score threshold, but also the total coverage on each strand as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1588 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 17:55:44 +00:00
chartl	d6a0b65ac9	Changes: Rollback of Variant-related changes of r1585, additional PGC code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1586 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 16:23:01 +00:00
chartl	0c54aba92a	Changes: @VariantEvalWalker - added a command line option to input a file path to a pooled call file for pooled genotype concordance checking. This string is to be passed to the PooledGenotypeConcordance object. @AllelicVariant - added a method isPooled() to distinguish pooled AllelicVariants from unpooled ones. @ all the rest - implemented isPooled(); for everything other than PooledEMSNProd it simply returns false, for PooledEMSNProd it returns true. Added: @PooledGenotypeConcordance - takes in a filepath to a pool file with the names of hapmap individuals for concordance checking with pooled calls and does said concordance checking over all pools. Commented out as all the methods are as yet unwritten. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1585 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 15:01:50 +00:00
ebanks	e24c8d00d5	So, the VCF spec allows for an optional meta field in the header representing the date. However, using this field means that integration tests run on the vcf file will fail the MD5 test (which is what happened to the VariantFiltration test this morning after working just fine yesterday). After consulting our resident expert (Aaron), we're going to (temporarily) remove the date from the vcf output until we can come up with a better solution. However, this shouldn't cause any short-term problems because the data truly is optional. VF test's MD5s are updated. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1580 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-10 14:28:43 +00:00
asivache	d9f3e9493f	Does not return 0-length cigar elements anymore (used to do so when previous cigar element ended exactly at the segment boundary) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1570 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 20:05:55 +00:00
ebanks	cb31d5a0ab	VariantFiltration now outputs VCF. Important changes: 1. VariantsToVCF can now be called statically to output VCF for a single ROD instance; this is temporary until we have a VCF ROD. 2. VariantFiltration now outputs only 2 files, both mandatory: all variants that pass filters in geli text, and all variants in VCF. If there are any problems, go find Aaron. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1569 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 20:04:32 +00:00
chartl	9c7f456510	Changed the short name on the PoolSize cmd line argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1560 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:53:22 +00:00
chartl	9d69bd2c84	Modifications: @CoverageAndPowerWalker - removed a hanging colon that was being printed after the reference position @VariantEvalWalker - added a command line argument for pool size for eventual use in doing pooled caller evaluations. As of now, the variable is unused. @AlignmentContext - altered the scope of class variables from private to protected in order that child objects might have access to them New Additions: Filtered Contexts Sometimes we want to filter or partition reads by some aspect (quality score, read direction, current base, whatever) and use only those reads as part of the alignment context. Prior to this I've been doing the split externally and creating a new AlignmentContext object. This new approach makes it a bit easier, as each of these objects is a child of AlignmentContext, and can be instantiated from a "raw" AlignmentContext. @FilteredAlignmentContext is an abstract class that defines the behavior. The abstract method 'filter' is called on the input AlignmentContext, filtering those reads and offsets by whatever you can think of. The filtered reads/offsets are then maintained in the reads and offsets fields. These classes can be passed around as AlignmentContexts themselves. Writing a new kind of read-filtered alignment context boils down to implementing the filter method. @ReverseReadsContext - a FilteredAlignmentContext that takes only reads in the reverse direction @ForwardReadsContext - a FilteredAlignmentContext that takes only reads in the forward direction @QualityScoreThresholdContext - a FilteredAlignmentContext that takes only reads above a given quality score threshold (defaults to 22 if none provided). A unit test bamfile and associated unit tests for these are in the works. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1559 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:49:52 +00:00
asivache	0721c450c2	Bug fix: single unmapped read now keeps mapping qual 0 after remapping, not 37! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1557 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:29:34 +00:00
depristo	ec0f6f23c7	LocusIterationByState is now the system default. Fixed Aaron's build problem git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 01:28:05 +00:00
aaron	ea6ffd3796	initial VariantEvalWalker test. More to be added soon... Also fixed the case where MD5 sums had leading zeros clipped off git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1551 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 01:02:04 +00:00
sjia	600c234643	Starting code on phasing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1548 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 15:20:38 +00:00
aaron	3276e01e5f	fixing the build git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1546 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 13:13:55 +00:00
kiran	fd20f5c2e8	For a file or files backed by a ROD implementing AllelicVariant, outputs a VCF file summarizing the information. Metadata like Hapmap and dbSNP membership, genotype LOD, read depth, etc, are annotated appropriately. The results output by this program are equivalent to those given by Gelis2PopSNPs.py. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1544 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 06:12:18 +00:00
ebanks	4a95f2181d	print out the right variant git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1543 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 01:37:35 +00:00
sjia	5791da17ae	Updated to reference HLA database of unique 4 digit alleles git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1542 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-07 22:12:56 +00:00
ebanks	5dbba6711c	Lots of changes: (I'll send email out in a sec) 1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run through it). 2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mix up values (like I had been doing). 3) Have indel rod print samples git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-07 01:12:09 +00:00
sjia	471ca8201e	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1537 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 19:12:46 +00:00
aaron	0cc634ed5d	-Renamed rodVariants to RodGeliText -Remove KGenomesSNPROD -Remove rodFLT -Renamed rodGFF to RodGenotypeChipAsGFF -Fixed a problem in SSGenotypeCall -Added basic SSGenotype Test class -Make VCFHeader constructors public git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1536 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 18:40:43 +00:00
ebanks	6c476514f8	Moved to core. Wiki pages are going up; unit tests will be written soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1533 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 15:09:11 +00:00
ebanks	42c71b4382	Fix for Kris: now SNPs aren't masked by default (only when they come from a mask rod) and we can design Sequenom validation assays for them. I'll move this all to core in a bit... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1532 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 14:52:06 +00:00
depristo	a08c68362e	Renaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls AND then compares the geli MD5 sum to the expected one! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1530 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 12:39:06 +00:00
aaron	3c2ae55859	changes for the genotype overhaul. Lots of changes focusing on the output side, from single sample genotyper to the output file formats like GLF and geli. Of note the genotype formats are still emitting posteriors as likelihoods; this is the way we've been doing it but it may change soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1529 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 05:31:15 +00:00
ebanks	2241173fff	In order to help learn python, I decided to convert Michael's DoC python script to Java; the CoverageHistogram now spits out standard deviations for a good Gaussian fit. This code eventually needs to end up in the VariantFiltration system - when we are ready to parameterize on the fly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1528 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 02:23:57 +00:00
chartl	544900aa99	Migration of some core calculations (log-likelihood probabilities, etc.) from CoverageAndPowerWalker into static methods in PoolUtils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1527 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 21:43:29 +00:00
chartl	93cedf4285	Added items: @/varianteval/PoolAnalysis Interface to identify variant analyses that are pool-specific. @/varianteval/BasicPoolVariantAnalysis Nearly the same as BasicVariantAnalysis with the addition of a protected integer (numIndividualsInPool) which holds the pool size. One soulcrushing change is that "protected String filename" needed to become "protected String[] filename" since now multiple truth files may be looked at. It was tempting to make the change in BasicVariantAnalysis with some default methods that would maintain usability of the remainder of the VariantAnalysis objects, but I decided to hold off. We can always merge these together later. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1526 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 21:26:04 +00:00
sjia	ee06c7f29f	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1525 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:41:12 +00:00
sjia	043c97eede	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1524 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:34:42 +00:00
aaron	c849282e44	reverting the HLA walker changes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1523 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:11:57 +00:00
asivache	5202d959bf	NM attribute changed in sam jdk (?) from Integer to Short, or maybe it is presented differently by the reader depending on whether SAM or BAM is processed; in any case, both Integer and Short are safe now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1522 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:03:32 +00:00
sjia	ada4c5a13c	Small change to debug printing code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1521 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 18:31:21 +00:00
kiran	c3aaca1262	Improvements to make this work with uncompressed fastq files. Pulled the fastq parser out into its own SAMFileReader-like entity. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1520 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 17:20:16 +00:00
asivache	499b3536a4	Changed to use AlignmentUtils.isReadUnmapped() for better consistency with SAM spec; also, it is now explicitly enforced that unmapped reads have <NO_...> values set for ref contig and start upon "remapping" git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1519 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 16:45:07 +00:00
ebanks	5bd99fc1c4	VariantFiltration moved to core. Another win for the team. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1517 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 15:41:41 +00:00
chartl	5130ca9b94	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1516 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 15:17:02 +00:00
jmaguire	e2780c17af	Checkin of the Multi-Sample SNP caller. Doesn't work yet; same command I used to use now causes GATK to throw an exception. Will check with Matt & Aaron tomorrow, then do a regression test. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1509 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 00:23:28 +00:00
ebanks	55013eff78	Re-revert back to point estimation for now. We need to do this right, just not yet. Also, it's safer to let colt do the log factorial calculations for us. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1503 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 15:33:18 +00:00
ebanks	24d809133d	Oops - comment out the printouts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1500 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 01:45:56 +00:00
ebanks	91ccb0f8c5	Revert to having these filters use integration over binomial probs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1499 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 01:40:22 +00:00
aaron	4a1d79cd7b	added a flag, maximum_reads_at_locus, shortName "mrl", which limits the number of reads we add to the locusByHanger. In some bam files misalignment produces pile-ups of 750K or more reads. We now limit this to the default of 100K reads. The user is warned if a locus exceeds this threshold, and no more reads are added. Also CombineDup walker had an incorrect package name. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1496 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-01 04:21:58 +00:00
ebanks	0addae967a	IndelArtifact filter can now handle filtering false SNPs that occur within the span of an indel but after the first position git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1495 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-01 03:34:39 +00:00
asivache	591f8eedbb	Added setName() and getName() (however, not used anywhere yet). Now can set the name of the fasta record manually to whatever, however it will work only if done early enough. If the fasta record already started printing itself (i.e. the header line is already done), setName() will throw an exception. Could be too entangled, may reverse this back... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1493 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 18:09:55 +00:00
asivache	c9eb193c7f	Now recognizes a special name for a bound rod track: snpmask. If a rod with this name is bound, then ONLY snps from that track will be used (to set alt reference bases to N's), but indels will be ignored. This helps when an alt. ref has to be created for a set of indel calls, and another rod (e.g. dbSNP) is used to put N's in (for sequenom). If dbSNP rod is not marked as "snpmask", the indels reported there will make their way into the alt. reference output and mess it up. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1492 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 18:05:57 +00:00
ebanks	8e3c3324fa	Added filter for SNPs cleaned out by the realigner. It uses the realigner output for filtering; in addition, dbsnp indels partially work; IndelGenotyper calls don't yet work. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1489 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 04:32:32 +00:00
ebanks	463f80c03e	Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1487 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 03:37:24 +00:00
ebanks	1a299dd459	Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1486 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 03:31:37 +00:00
ebanks	e70101febc	Add a VEC filter for clustered SNP calls that takes advantage of the new windowed approach; delete the old standalone walker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1485 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 03:14:42 +00:00
ebanks	215e908a11	Reworking of the VariantFiltration system to allow for a windowed view of variants and inclusion of more data to the various filters. This now allows us to incorporate both the clustered SNP filter and a SNP-near-indels filter, which otherwise wasn't possible. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1484 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 02:16:39 +00:00
depristo	49a7babb2c	Better organization of Genotype likelihood calculations. NewHotness is now just GenotypeLikelihoods. There are 1, 3, and empirical base error models available as subclasses, along with a simple way to make this (see the factory). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1481 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-30 19:16:30 +00:00
depristo	5af4bb628b	Intermediate checking before code reorganization. Full blown support for empirical transition probs in SSG for all platforms. Support for defaultPlatform arg in SSG. Renaming classes for final cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1479 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-30 17:34:43 +00:00
depristo	6ab9ddf9f5	Significant output formatting improvements. SNPs as indels analysis. heterozygosity rate calculations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1478 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-29 21:49:09 +00:00
depristo	f0179109fa	Removing min confidence for on/off genotype git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1473 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-28 01:04:13 +00:00
depristo	dc9d40eb9a	Now requires a minimum genotype LOD before applying tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1471 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-28 00:19:23 +00:00
depristo	a639459112	Trivial consistency change from char in to char out, not char in to byte out git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1466 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 23:37:37 +00:00
chartl	6012f7602b	@ minor fixes to CoverageAndPowerWalker and AnalyzePowerWalker (switching to By Reference traversal, spitting out Syzygy position for sanity check) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1465 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 21:44:18 +00:00
chartl	bd1e679bc5	@ Fixed issues with AnalyzePowerWalker which depended on CoverageAndPowerWalker. The latter was changed but not the former. Now fixed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1464 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 20:23:41 +00:00
kiran	a17dad5fa9	Converts from fastq.gz to unaligned BAM format. Accepts a single fastq (for single-end run) or two fastqs (for paired-end run). Also allows you to set certain BAM metadata (read groups, etc.). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1463 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 20:20:09 +00:00
chartl	8740124cda	@ListUtils - Bugfix in getQScoreOrderStatistic: method would attempt to access an empty list fed into it. Now it checks for null pointers and returns 0. @MathUtils - added a new method: cumBinomialProbLog which calculates a cumulant from any start point to any end point using the BinomProbabilityLog calculation. @PoolUtils - added a new utility class specifically for items related to pooled sequencing. A major part of the power calculation is now to calculate powers independently by read direction. The only method in this class (currently) takes your reads and offsets, and splits them into two groups by read direction. @CoverageAndPowerWalker - completely rewritten to split coverage, median qualities, and power by read direction. Makes use of cumBinomialProbLog rather than doing that calculation within the object itself. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1462 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 19:31:53 +00:00
chartl	1da45cffb3	New: Minor changes to CoverageAndPowerWalker bootstrapping (faster selection of indices). Entirely new Artificial Pool Walker (ArtificialPoolWalkerMk2), will likely replace ArtificialPoolWalker on the next commit. Adapted the method of sampling, and added a helper context class: ArtificialPoolContext which carries much of the burden of calculation and data handling for the walker. The walker itself maps and reduces ArtificialPoolContexts. Cheers! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1461 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-26 21:42:35 +00:00
chartl	92ea947c33	Added binomProbabilityLog(int k, int n, double p) to MathUtils: binomialProbabilityLog uses a log-space calculation of the binomial pmf to avoid the coefficient blowing up and thus returning Infinity or NaN (or in some very strange cases -Infinity). The log calculation compares very well, it seems, with our current method. It's in MathUtils but could stand testing against rigorous truth data before becoming standard. Added median calculator functions to ListUtils: getQScoreMedian is a new utility I wrote that given reads and offsets will find the median Q score. While I was at it, I wrote a similar method, getMedian, which will return the median of any list of Comparables, independent of initial order. These are in ListUtils. Added a new poolseq directory and three walkers: CoverageAndPowerWalker is built on top of the PrintCoverage walker and prints out the power to detect a mutant allele in a pool of 2*(number of individuals in the pool) alleles. It can be flagged either to do this by bootstrapping, or by pure math with a probability of error based on the median Q-score. This walker compiles, runs, and gives quite reasonable outputs that compare visually well to the power calculation computed by Syzygy. ArtificialPoolWalker is designed to take multiple single-sample .bam files and create a (random) artificial pool. The coverage of that pool is a user-defined proportion of the total coverage over all of the input files. The output is not only a new .bam file, but also an auxiliary file that has for each locus, the genotype of the individuals, the confidence of that call, and that person's representation in the artificial pool .bam at that locus. This walker compiles and, uhh, looks pretty. Needs some testing. AnalyzePowerWalker extends CoverageAndPowerWalker so that it can read previous power calculations (e.g. from Syzygy) and print them to the output file as well for direct downstream comparisons. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1460 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:27:50 +00:00
kiran	478f426727	Fixed a missing method implementation in these two files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1459 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:21:58 +00:00
kiran	f12ea3a27e	Added ability for all filters to return a probability for a given variant - interpreted as the probability that the given variant should be included in the final set. The joint probability of all the filters is computed to determine whether a variant should stay or go. At the moment, this is only visible in verbose mode (specify -V). Also removed 'learning mode'; now, filters emit important stats no matter what. Various code cleanups. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1458 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:17:56 +00:00
asivache	0bdecd8651	A most stupid bug. In cases when more than one indel variant was present in cleaned bam file, the "consensus" (max. # of occurrences) call was computed incorrectly, and most of the time the call itself was not made at all. Fortunately, the locations where we see multiple indels are a minority, and many of them are suspicious anyway (manifestation of alignment problems?). Could change results of POOLED calls though. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1448 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 22:31:44 +00:00
kcibul	6c0adc9145	reuse fasta file reader git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1446 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 16:01:58 +00:00
ebanks	10c98c418b	Walker to determine the concordance of 2 genotype call sets. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1443 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 01:32:44 +00:00
ebanks	1d74143ef4	A convenience argument - for Mark - so that you don't have to specify all the output file names git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1442 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 00:49:12 +00:00
ebanks	82e2b7017e	Prevent array bounds errors git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1435 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 16:54:31 +00:00
ebanks	26a6f816c9	set default value for output format git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1434 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 16:17:09 +00:00
ebanks	9b1d7921e8	added filter based on concordance to another call set git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1432 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 15:16:30 +00:00
ebanks	b2a18a9d61	- first pass at a basic indel filter (for now, based on size and homopolymer runs) - fix simple indel rod printout git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1431 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 03:04:12 +00:00
ebanks	78439f7305	Modify Sequenom input format based on official documentation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1430 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 01:42:57 +00:00
ebanks	d4808433a1	Added option to output the locations of indels in the alternate reference git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1424 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-16 03:46:36 +00:00
ebanks	4b6ddc55bd	Merge our 2 fastq writers into 1: incorporate Kiran's secondary-base file writer into the fasta/fastq writers git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1423 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-14 20:55:23 +00:00
ebanks	0ec581080c	Refactoring the code; also, now it prints continuously instead of potentially storing one long string. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1421 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-13 01:32:46 +00:00
asivache	2a01e71277	A very simple standalone filter for fooling around with the data: can extract only mapped or only unmapped reads, only reads with mapping quals > X, reads with average base qual > Y, reads with min base qual > Z, reads with edit distance from the ref > MIN and/or < MAX git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1420 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:28:51 +00:00
asivache	ebec0ec171	A standalone companion to BamToFastqWalker: does the same thing but without calling in gatk's heavy artillery (does not "require" a reference either). Extracts seqs and quals and places them into fastq; along the way it also reverse complements reads that align to the negative strand (so that fastq contains reads as they come from the machine). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1419 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:24:37 +00:00
asivache	112a283f54	be nice, don't forget to close the reader when done git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1418 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:19:56 +00:00
asivache	ba2a3d8a58	Reverse qualities when read seq. is reverse complemented git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1417 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:17:35 +00:00
ebanks	143f8eea4e	option to output in sequenom input format git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1415 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 16:50:37 +00:00
ebanks	7f1159b6a9	Added option to mask out SNP sites with "N"s in the new reference. This is useful when producing Sequenom input files for validating indels... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1414 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 15:17:45 +00:00
ebanks	43f63b7530	Added a walker to convert a bam file to fastq format (including the option to re-reverse the negative strand reads). Picard has such a tool but it is geared towards their pipeline and requires intimate knowledge of the lanes/flowcells,etc. This is just easy. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1413 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 15:10:40 +00:00
asivache	e4acd14675	Now GenomicMap maps (and RemapAlignment outputs) regions between intervals on the master reference as 'N' cigar elements, not 'D'. 'D' is now used only for bona fide deletions. Also: do not die if alignment record does not have NM tags (but mapping quality will not be recomputed after remapping/reducing for the lack of required data) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1411 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-11 21:10:17 +00:00
ebanks	5fab934f4e	- moved the reference maker to its own directory - added first version of a more complicated reference maker which takes in RODs and creates an alternative reference based on the variants (indels and/or SNPs) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1409 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-11 18:01:06 +00:00
sjia	1851613de4	Now using larger database of HLA alleles git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1405 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-11 03:11:14 +00:00
asivache	3208eaabcc	A standalone picard-level tool for breaking individual reads into "pairs" of first/last N bases. Supports: * splitting off only start or end of the read, or both; the output will contain chopped sequences AND corresponding base qualities * splitting arbitrary number of bases off each end (different numbers for left and right segments can be specified; segments can overlap) * splitting only unmapped reads, ignoring mapped ones * writing split ends into separate sam/bam files, or into a single output file * decorating original read names with user-specified suffixes for each end (e.g. _1 and _2 for left and right parts of the read); default: no decoration, original read names are used * when mapped reads are split, the alignment cigars are chopped appropriately and the alignment start positions are adjusted (for the right end) to correctly specify the alignment of the selected part of the read git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1402 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-10 20:42:49 +00:00
asivache	36312ae4b2	tiny cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1401 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-10 20:26:52 +00:00
asivache	921d4f4e95	RemapAlignments is a standalone picard-level tool that does not use gatk engine; moved to 'tools' git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1396 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-10 15:41:07 +00:00
depristo	089dab00e2	Was discordance rate, now concordance rate git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1393 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 19:37:52 +00:00
depristo	6d3ef73868	Now includes statistics on the allele agreement with dbSNP -- counts concordant calls as dbSNP = A/C and we say A/C, vs. we say A/T git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1392 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 19:37:07 +00:00
depristo	a864c2f025	Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1390 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 19:00:06 +00:00
ebanks	db250f8d3e	Don't print if not in learning mode git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1389 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 06:08:02 +00:00
ebanks	4c1fa52ddf	-Added mapping quality zero filter -Set some reasonable defaults (based on pilot2) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1388 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 03:18:02 +00:00
sjia	d60d5aa516	Fixed bug: previously reset likelihoods after each region/exon. Better comments/documentation added git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1386 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 18:44:46 +00:00
kcibul	0d47798721	made booster distance a parameter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1385 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 18:29:21 +00:00
ebanks	3b74b3ba74	print out ref/alt ratio, not major/minor git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1384 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 16:36:25 +00:00
depristo	65e9dcf5b7	Fully operational version of the new genotype likelihoods class. (1) Much cleaner interface. Now explicitly stores likelihoods, priors, and posteriors in separate arrays indexed by an enum, (2) no longer can be used to make calls, it relies on SSGGenotypeCall to order the likelihoods, calculate best to ref, etc, this is just for calculating genotype likelihoods now; (3) Now performs extensive error checking with validate() to ensure the system is behaving properly. (4) fixed incorrect treatment of N bases, which were being counted against everyone (5) likely found a stats bug in which heterozygosity was being applied incorrectly to the genotype priors git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1382 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 01:00:55 +00:00
sjia	68309408e4	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1378 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 21:23:01 +00:00
sjia	45ab212f22	Post-presentation update git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1377 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 21:21:12 +00:00
hanna	21d1eba502	Cleaned division of responsibilities between arguments to map function. Reference has been changed from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect the fact that it contains contextual information only about the alignments, not the locus in general. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 21:01:37 +00:00
kcibul	a5a7d7dab8	added "booster" metrics git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1375 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 20:53:45 +00:00
ebanks	3a8d923785	minor output changes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1374 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 20:12:16 +00:00
mmelgar	939b19e715	Committing the first version of the homopolymer filter. Removes SNPs that occur at the edges of homopolymer runs and whose nonref allele matches the repeated base in the homopolymer. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1373 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 14:35:51 +00:00
depristo	20ff603339	New hotness and old and Busted genotype likelihood objects are now in the code base as I work towards a bug-free SSG along with a cleaner interface to the genotype likelihood object git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1372 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 23:07:53 +00:00
depristo	3485397483	Reorganization of the genotyping system git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1370 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 20:55:31 +00:00
ebanks	9f1d3aed26	-Output single filtration stats file with input from all filters -move out isHet test to GenotypeUtils so all can use it git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1369 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 20:44:21 +00:00
depristo	d840a47b11	Slight reorganization of genotype interface git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1366 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 19:17:15 +00:00
depristo	20986a03de	cleanup before moving files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1365 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 19:08:24 +00:00
ebanks	e3b08f245f	Pull out RMS calculation into MathUtils for all to use git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1364 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 17:00:20 +00:00
ebanks	e495b836d3	- added mapping quality filter - make the filters brainless in that they strictly have thresholds and filter based on them; require user to calculate and input these thresholds. - update filters in preparation for migration to new output format git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1363 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 16:46:51 +00:00
kiran	8bc925a216	Commit on the behalf of Mark: cleaning up some old and busted code in GenotypeLikelihood and associated objects. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1361 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-31 21:18:30 +00:00
aaron	9dfee7a75c	the "-genotype" option now acts correctly as a discovery mode caller in SSG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1359 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-31 18:31:45 +00:00
sjia	9dada95ec3	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1357 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-31 16:21:16 +00:00
andrewk	678c2533ca	Removed custom output stream for file and replaced with the standard out PrintStream git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1350 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 22:36:42 +00:00
andrewk	44673b2dce	Removed a debugging println that was accidentally checked in git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1348 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 22:23:27 +00:00
andrewk	845488ff94	VariantEval now decides whether a variant is not confidently called using BestVsNetxBest if genotypes are being evaluated and BestVsRef if not (variant discovery only). Also, the absolute value of the BestVsRef LOD (getVariantionConfidence) is used so that confident reference calls (if the GELI has output them) will show up in the final table as reference calls rather than no calls. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1347 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 21:54:06 +00:00
andrewk	fdc7cc555b	Removed extra column name from geliHeaderString that was mislabeling the 10 genotype likelihoods by shifting them over by one git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1345 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 21:42:02 +00:00
aaron	0087234ed7	small code cleanup, a couple of little changes to SSGGenotypeCall git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1343 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 19:47:37 +00:00
ebanks	fbc7d44bc7	don't allow users to input priors anymore; they should be using heterozygosity and having the SSG calculate priors. Note that nothing was changed for dnSNP/hapmap priors (not sure what we want to do with these yet - any thoughts?) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1342 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 19:10:33 +00:00
ebanks	b282635b05	Complete reworking of Fisher's exact test for strand bias: - fixed math bug (pValue needs to be initialized to pCutoff, not 0) - perform factorial calculations in log space so that huge numbers don't explode - cache factorial calculations so that each value needs to be computed just once for any given instance of the filter I've tested it against R and it has held up so far... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1341 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 18:52:13 +00:00
aaron	4033c718d2	moving some code around for better organizations, some fixes to the fields out of SSG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1340 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 15:09:43 +00:00
ebanks	4366ce16e0	Made sure all RODs have a (good) toString() method - and use it in the Venn walker. (thanks, Mark) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1339 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 14:53:27 +00:00
aaron	9cd53d3273	some initial changes from the first review of the genotype redesign, more to come. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1338 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 07:04:05 +00:00
hanna	5429b4d4a8	A bit of reorganization to help with more flexible output streams. Pushed construction of data sources and post-construction validation back into the GATKEngine, leaving the MicroScheduler to just microschedule. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1336 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-29 23:00:15 +00:00
aaron	bca894ebce	Adding the initial changes for the new Genotyping interface. The bullet points are: - SSG is much simpler now - GeliText has been added as a GenotypeWriter - AlleleFrequencyWalker will be deleted when I untangle the AlleleMetric's dependence on it - GenotypeLikelihoods now implements GenotypeGenerator, but could still use cleanup There is still a lot more work to do, but this is a good initial check-in. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1335 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-29 19:43:59 +00:00
kiran	c5c11d5d1c	First attempt at modifying the VFW interfaces to support direct emission of relevant training data per feature and exclusion criterion. This way, you could run the program once, get the training sets, and then feed that training set back to the filters and have them automatically choose the optimal thresholds for themselves. This current version is pretty ugly right now... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1334 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-29 19:29:03 +00:00
ebanks	3554897222	allow filters to specify whether they want to work with mapping quality zero reads; the VariantFiltrationWalker passes in the appropriate contextual reads git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1333 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-29 17:38:15 +00:00
hanna	7a13647c35	Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. Very rough initial implementation, but should provide enough support so that people can stop creating SAMFileWriters in reduceInit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-29 16:11:45 +00:00
depristo	56f769f2ce	Output improvements to GenotypeConcordance calculations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1331 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-29 12:54:46 +00:00
ebanks	72dda0b85c	Fixed calculations for Mark git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1330 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-29 03:21:43 +00:00
ebanks	f0378db9b7	added accuracy numbers git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1329 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-29 01:38:33 +00:00
ebanks	a5a56f1315	At this point, we are convinced that the new priors are the way to go... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1328 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-28 17:25:25 +00:00
depristo	df4fd498c5	Improvements and bug fixes galore. (1) Now properly handles Q0 bases, filtering them out, you can disable this if you need to (2) support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to be more emppowered to detect variants (see email too) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1327 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-28 13:21:38 +00:00
depristo	46643d3724	Improvements and bug fixes galore. (1) Now properly handles Q0 bases, filtering them out, you can disable this if you need to (2) support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to be more emppowered to detect variants (see email too) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1326 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-28 13:21:27 +00:00
ebanks	3c4410f104	-add basic indel metrics to variant eval -variants need a length method (can't assume it's a SNP)! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1324 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-28 03:25:03 +00:00
kcibul	1d6d99ed9c	walk by reference git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1323 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-27 20:21:04 +00:00
ebanks	089ae85be7	1. output grep-able strings for genotype eval 2. free DB coverage from isSNP restriction git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1322 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-27 17:36:59 +00:00
kcibul	1bca9409a4	calculate freestanding intervals git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1321 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-27 16:40:27 +00:00
asivache	2499c09256	added minIndelCount (short: minCnt) command line argument. The call is made only if the number of reads supporting the consensus indel is equal or greater than the specified value (default: 0, so only minFraction filter is on in default runs!) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1320 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-27 15:22:51 +00:00
ebanks	73ddf21bb7	SNPs no longer fail this filter if they are actually hom in reads git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1319 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-27 15:20:43 +00:00
asivache	f2b3fa83ac	fix for another bug found by Eric: some indels were printed into the output stream twice (when there's another indel within MISMATCH_WINDOW bases and that other indel requires delayed print in order to accumulate coverage) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1318 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-27 15:07:07 +00:00
asivache	5eca4c353c	IndelGenotyper now uses GATK::getMergedReadGroupsByReaders() to sort out which read in the merged stream is for normal, and which is for tumor (in --somatic mode, apparently) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1316 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-24 23:01:18 +00:00
asivache	64221907a2	fixed a bug found by Eric: genotyper would crash in the case of an indel too close to the window end, with the next read mapping sufficiently far away on the ref git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1313 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-24 21:00:31 +00:00
hanna	df44bdce7d	Retire the pooled caller...it's been eclipsed by other walkers in the tree. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1310 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-24 14:49:03 +00:00
kiran	884806fc16	Broken and unused. It goes away now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1309 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-24 14:26:52 +00:00
ebanks	d044681fbe	change paths to new ones git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1308 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-24 07:28:43 +00:00
ebanks	59f0c00d77	-set indel cleaning walkers to be in core package -move Andrey's alignment utility classes to core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1307 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-24 05:23:29 +00:00
kiran	bb20462a7c	A better way: down-scale second-base ratios until the infinities disappear. This way, high-coverage sites don't cause binomialProbability to explode. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1306 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-24 03:02:00 +00:00
kiran	038cbcf80e	If the result from the secondary-base test is 0.0, replace the result with a minimum likelihood such that the log-likelihood doesn't underflow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1303 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-23 20:59:52 +00:00
kiran	093550a3f2	Removed secondary-base test from SingleSampleGenotyper. It now lives in the variant filtration system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1302 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-23 20:58:41 +00:00
ebanks	477502338f	moved major indel cleaning pieces to core (yippee!) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1301 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-23 19:59:51 +00:00
ebanks	4efe26c59a	Major: allow genotyper to optionally output in 1KG format, including outputting the samples in which indels are found. Minor: refactor 454 filtering git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1300 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-23 19:53:51 +00:00
ebanks	f8b1dbe3b3	getBestGenotype() does not necessarily return hets in alphabetical order; the string (unfortunately) needs to be sorted for lookup in the table (otherwise we throw a NullPointerException) TO DO: have the table be smarter instead of sorting each genotype string git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1298 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-23 01:58:47 +00:00
ebanks	ee8ed534e0	print full genotype for alt allele git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1297 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-23 01:35:23 +00:00
depristo	9c12c02768	AlleleBalance and on/off primary base filters -- version 0.0.1 -- for experimental use only git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1294 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-22 17:54:44 +00:00
ebanks	c54fd1da09	Beautify the genotype concordance printouts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1291 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-22 02:53:02 +00:00
hanna	1843684cd2	Cleanup: GATKEngine no longer needs to be lazy loaded, b/c the plugin directory no longer exists. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1287 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-21 18:50:51 +00:00
hanna	b43925c01e	Switched to Reflections (http://code.google.com/p/reflections/) project for inspecting the source tree and loading walkers, rather than trying to roll our own by hand. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1286 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-21 18:32:22 +00:00
kiran	436a196e2b	Bug fixes to support hapmap genotyping concordance. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1285 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-21 16:20:10 +00:00
depristo	7e04313b4e	Bug fixes and improvements to CoverageHistogram. Now displays the frequency of the bin. Also correctly prints out the last element in the coverage histogram (<= vs. <) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1284 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-21 11:55:05 +00:00
aaron	b4adb5133a	GLF rod as a AllelicVariant object. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1282 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-21 00:55:52 +00:00
kiran	f314ef8d84	Features and exclusion criteria are now instantiated in VariantFiltrationWalker's initialize() method, rather than in every map() call. This means the features and exclusion criteria will only ever be initialized once. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1281 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-20 22:47:21 +00:00
mmelgar	8da754eb4e	First implementation of a primary base filter. Assumes distribution of on/off bases is distributed according to a binomial. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1278 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-17 18:43:35 +00:00
ebanks	24ebfee604	don't print traversal stats git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1277 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-17 16:13:28 +00:00
ebanks	f978b04633	A very simple walker to print out (using the ROD's toString method) all of the RODs it sees. This is the easiest solution to get around the (temporary) bug of reads being seen multiple times by reads walkers when close intervals are passed to them (i.e. process full contigs and then use a ref walker to filter the ones within your intervals of choice) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1273 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-17 14:03:34 +00:00
hanna	df1c61e049	Re-add the plugin path. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1271 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-16 22:48:44 +00:00
hanna	7c30c30d26	Cleaned up some duplicate code in preparation for making plugin dir configurable. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1270 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-16 22:02:21 +00:00
depristo	31f3f466ca	Improvements to support GLF generation -- now correctly handles GLF git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1269 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-16 21:10:39 +00:00
depristo	0548026a2e	Now understanding GLFs for calculating genotyping results like callable bases, as well as avoids emitting stupid amounts of data when doing a genotype evaluation (i.e., ignores non-SNP() calls) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1267 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-16 21:03:26 +00:00
depristo	c5f6ab3dd5	CoverageHistogram now sees 0 coverage sites git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1266 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-16 20:58:41 +00:00
ebanks	8bc0832215	Generate chip concordance table. This should work, although I need to test it with some real GLFs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1265 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-16 17:44:47 +00:00
kcibul	e1055bcc4c	moving to new external repository git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1261 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-15 20:46:08 +00:00
kcibul	4a730adfc1	committing latest changes before moving repositories git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1260 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-15 20:44:02 +00:00
ebanks	a245ee32fa	A walker to split 2 call sets into their intersection/union/disjoint (sub)sets. Yes, the name is retarded, but I'm under pressure here... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1258 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-15 20:20:47 +00:00
kcibul	00d49976fb	committing latest changes before moving repositories git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1255 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-15 18:41:52 +00:00


1096 Commits (ffeb3fd80dfccaf00a96d2009f326829c1ce1fdd)