gatk-3.8

Commit Graph

Author	SHA1	Message	Date
ebanks	74751a8ed3	-Some minor fixes to get accurate vcf record merging done -Improvement to snp genotype concordance test And with that, it looks like I get revision #2000. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2000 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 06:40:55 +00:00
ebanks	ab705565cf	Completely refactored the Callset Concordance code. Now, it takes in VCF rods and emits a single VCF file which has merged calls from all inputs and is annotated (in the INFO fields) with the appropriate concordance test(s). Still needs a bit of polish... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1999 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 05:03:13 +00:00
ebanks	bc6f24e88f	Added VCFUtils which contains some useful VCF-related functions (e.g. ability to merge VCF records). Also, various minor improvements. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1998 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 04:53:32 +00:00
ebanks	cff645e98b	convenience method to deal with genotypes that are unsorted (e.g. CA vs. AC) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1997 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 04:45:49 +00:00
kiran	7fde6c0bf4	One more output tweak. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1996 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 04:42:55 +00:00
kiran	00a7113d7a	Tweaks to formatting of output table. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1995 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 04:33:36 +00:00
ebanks	7ce0df76f8	Added accessors to the rod data sources so that walkers can access the name/file/type triplets for input rods. This is necessary if e.g. you want to create a vcf writer based on all of the samples being input. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1994 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 04:25:39 +00:00
ebanks	d07f3bb6f6	Added methods to get strand bias and to test if record has allele freq or bias fields set. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1993 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 04:20:35 +00:00
kiran	3313b0ddb4	Fixed a minor bug where the lodThreshold wasn't being printed in the header. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1992 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 16:51:36 +00:00
kiran	95d381efe2	Optionally computes the error rate using the best base and a random base. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1991 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 16:47:34 +00:00
kiran	567f5758d2	Optionally lists read depths by read group. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1990 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 16:39:19 +00:00
kiran	a679bdde18	FindContaminatingReadGroupsWalker lists read groups in a single-sample BAM file that appear to be contaminants by searching for evidence of systematic underperformance at likely homozygous-variant sites. Procedure: 1. Sites that are likely homozygous-variant but are called as heterozygous are identified. 2. For each site and read group, we compute the proportion of bases in the pileup supporting an alternate allele. 3. A one-sample, left-tailed t-test is performed with the null hypothesis being that the alternate allele distribution has a mean of 0.95 and the alternate hypothesis being that the true mean is statistically significantly less than expected (pValue < 1e-9). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1989 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 16:36:39 +00:00
kiran	2225d8176e	A convenience class for maintaining a dynamically growing table of values with access to the elements by named row and column identifiers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1988 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 16:34:35 +00:00
hanna	21c5f543fa	Fix sharding bug -- loci to which >100,000 (= 1 shard) reads are assigned an alignment start will confuse the sharding system and cause it to return duplicate reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1987 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 14:27:26 +00:00
rpoplin	84ba604611	Sequential quality score calculation is now in place in the refactored recalibrator and matches the quality scores calculated by the old recalibrator exactly; at least on the small sets of data used so far. Validation, documentation, and optimization work is ongoing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1985 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-07 15:55:16 +00:00
depristo	bf1bc94060	Fixes for PooledConcordance bugs and lack of safety checking git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1984 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-07 01:54:10 +00:00
rpoplin	66d4a995e6	Initial check in of refactored Recalibrator. The new walkers are called CountCovariatesRefactored and TableRecalibrationRefactored. More work is needed to finish up the sequential calculation and to document the code sufficiently. These files are not ready to be used by other people quite yet. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1982 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-06 22:33:55 +00:00
ebanks	6fdfc97db6	Added optional field DP to VCF output for Mark. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1981 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-06 20:03:22 +00:00
ebanks	0a55fa5bb1	Completely refactored the Genotype Concordance module(s). Now PooledConcordance and GenotypeConcordance inherit from the same super class (and can therefore share data structures and functionality). Also, they now use ConcordanceTruthTable to keep track of necessary info. GenotypeConcordance passes integration tests. PooledConcordance needs to be finished by Chris. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1979 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-06 16:27:16 +00:00
ebanks	d549347f25	Refactored GenotypeLikelihoods to use an underlying 4-base model. It needs to be modified a bit and then hooked up to a pooled model, but that is now possible. At this point, there is no difference to the Unified Genotyper. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1978 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-05 21:59:25 +00:00
jmaguire	4d3871c655	don't flush anymore. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1977 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-05 19:11:51 +00:00
aaron	aacd72854f	a fix for a bug Andrey discovered: in read-based interval traversals we're duplicating reads in rare cases. The problem was that to accommodate a bug in SAM JDK indexing, we were forced to add one to the stop of our QueryOverlapping() calls to ensure we always got all of the overlapping reads. Added a PlusOneFixIterator that wraps other iterators, and eliminates reads that start outside of our intended interval (interval stop - 1). Updated and checked BamToFastqIntegrationTest MD5 sums. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1976 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-05 05:26:33 +00:00
hanna	43c3ee61d5	Fix minor mapping quality bug. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1973 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-04 14:33:23 +00:00
ebanks	a545859c62	Joint Estimation model now emits a reasonable slod git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1969 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-03 21:12:42 +00:00
ebanks	11d950abe0	No longer allow the lod_threshold argument - use confidence instead. Have UG output qscores in all cases. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1968 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-03 16:18:51 +00:00
asivache	2fb45dbd73	Make window size a command line argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1967 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-03 16:13:35 +00:00
asivache	55f61b1f88	Bug fix in adjustment of the shift position. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1966 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-03 16:08:11 +00:00
depristo	5d5dc989e7	improvements to VCF and variant eval support of VCF -- now listens to the filter field git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1963 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-03 12:09:30 +00:00
hanna	c63af32fc7	The BWA/C bindings were triggering the local aligner to repeatedly reload the ref genome. Make sure the reference genome is cached. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1961 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-03 00:01:55 +00:00
ebanks	3a33401822	2nd stage of the genotyper output refactoring is complete. Now, all output is generalized and all of the intelligence lies where it is supposed to. Next stage is syncing up old and new models and making sure we're outputting exactly what we should. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1960 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-02 22:43:08 +00:00
aaron	ba67c7f02b	added a warning for those using bed files; we properly convert bed to the internal representation but the user needs to be aware that any output will be one-based closed intervals git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1959 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-02 21:09:18 +00:00
aaron	b71b66bd88	the underlying parameter is a float so we need to use Float.valueOf() instead; Noticed by external user Hou Huabin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1958 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-02 20:22:25 +00:00
hanna	5a510e6d98	New PackageUtils interferes with the packaging utility. Revert until Aaron and I can get together to make this work. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1957 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-02 19:14:14 +00:00
aaron	de6ae51f7e	Scala walkers can now be built and run like any other walker in the GATK. Added the getUrlsForClasspath to PackageUtils, the Reflections package isn't getting the manifest files from jars in the classpath, and so we weren't seeing any walkers outside of the GenomeAnalysisTK.jar. A couple of notes: -Commented out BaseTransitionTableCalculator.scala because it won't build; Chris could you fix this one (or kill it if it's not needed). -Removed the PrintReadsScala walker; moved the code over to a ScalaCountLoci walker (which is what the code was really doing). -Added configurations items to the ivy xml file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1956 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-02 06:02:41 +00:00
hanna	1896f334d9	Fixed collection of bugs in reads aligning to multiple locations. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1955 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-02 04:02:09 +00:00
ebanks	af6d0003f8	-Generalized the GenotypeConcordance module to deal with any number of individuals (although it will default to its old behavior if the -samples argument is left out). -Make rods return the appropriate type of Genotype calls from getGenotype(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1954 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-01 05:35:47 +00:00
hanna	b95165e39c	Make alignment (temporarily) part of main GenomeAnalysisTK.jar. Add some extra logging errors on failure. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1953 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-01 00:33:18 +00:00
asivache	4b0796ba58	After fixing a few glitches and bugs, this version finally works as intended git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1952 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-31 04:59:58 +00:00
depristo	7d0ac7c6f2	Fix for long-term VariantEval bug plus new integration test to catch it git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1951 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-31 00:00:33 +00:00
asivache	ea8d5c7077	Some internal refactoring. Now "safely" ignores duplicate records (NOT duplicate reads but rather malformed bam files!) resulting from the bug/feature in CleanedReadInjector. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1949 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 17:50:51 +00:00
hanna	a3da475c88	Documentation and cleanup. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1946 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 15:40:28 +00:00
hanna	2d15891719	Created walkers for alignment, validation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1945 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 15:04:07 +00:00
ebanks	51fffc7f69	Comments for Ryan (which also apply to ReadQualityScoreWalker). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1944 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 14:44:04 +00:00
ebanks	ccd7440730	We can actually make this a bit simpler (and faster) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1943 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 04:21:03 +00:00
ebanks	1b6333e4ab	Enough people have asked for this that it just needed to get written. One can now split up any number of sets into an N-way Venn (although it doesn't check for discordance in the calls, so you'll still want to use SimpleVenn for 2-way comparisons). Wiki docs are updated. To do: update to use Ryan's generic hash map when it's ready for public use. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1942 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 04:08:45 +00:00
ebanks	4bdb5b03bd	tell UnifiedGenotyper to return calls at all bases git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1941 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 03:10:44 +00:00
ebanks	4ee1d6f733	-Have the calculation models determine whether a call passes the lod/confidence thresholds (as opposed to returning everything and letting the UG decide); this way, walkers which call map() will get only the good calls. -Do the right thing in all models for all-base-mode (for Kiran). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1940 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 02:35:51 +00:00
ebanks	64ac956885	Okay, I caved in: CallsetConcordance now gets possible concordance types by looking at classes that implement ConcordanceType instead of having them hard-coded in. Thanks to Kiran this was pretty easy... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1939 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 00:32:26 +00:00
hanna	1f0d852a48	Fix bug where alignments with indels would be busted because bwa reverses the read bases to undo a previous read base reverse that doesn't occur in the libbwa codepath. Also fixed some memory management issues. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1938 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 21:33:13 +00:00
asivache	e3b4d4cbed	Genotyper reimplemented. Does the same thing, at least for now, but internal data structures redesign enables collecting various statistics for indel-containing/reference-matching reads. The statistics are not yet used by the caller itself to make a better judgement w.r.t. the validity of the calls it makes, but they are now printed into the output stream (--verbose). The statistics (for both normal and tumor) include: indel observation count/total coverage, av. number of mismatches per indel-containing and per ref-matching read, av. mapping quality, av. mismatch rate and av. base quality within an NQS window around the indel, numbers of indel and ref observations per strand. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1936 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 19:09:16 +00:00
hanna	f04b80d7db	Fixed epic memory leak. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1934 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 16:32:43 +00:00
ebanks	2b96b2e4e7	better multi-sample integration test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1933 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 13:51:51 +00:00
ebanks	1c4ca9d383	-Mark just reminded me: actually force the ref/loc to be immutable -VCF writer should be blind to the score/confidence/lod value - just print the thing out as is git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1932 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 13:41:53 +00:00
ebanks	5cdbdd9e5b	now that the design is stable, pull the setReference and setLocation methods back out of Genotype and stick them into constructors of implementing classes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1931 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 13:27:37 +00:00
ebanks	3091443dc7	Sweeping changes to the genotype output system, as per several discussions with Matt & Aaron. Some things still need to be changed, but it will entail some more design decisions first (which means I get to bug M&A again tomorrow!). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1930 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 03:46:41 +00:00
depristo	86573177d1	Reverting rod walkers to use underlying refwalker implementation while we work on ROD2 and reenable the system. Added some serious sparse file parsing to variant eval tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1929 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 01:04:37 +00:00
hanna	c9a3707cfd	Initial version of BWA/C bindings. Still lots of squirrels roaming the code. - Some cigar strings aren't right. - Memory leaks. - BWA codebase changes aren't committed to BWA tree. - Aligner interface butchered to support BWA/C-style alignments. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1928 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 21:37:49 +00:00
chartl	c4359bc340	Whoops. Forgot the implements. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1927 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 19:59:57 +00:00
aaron	5a3bd50537	adding error log reporting to the GATK, and a stream based output method for the argument collection git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1926 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 19:56:05 +00:00
chartl	863d3023d5	IndelCounterWalker -- a new little walker that counts indels over a region (want to see what kind of havoc BWA may be resulting in). Don't know when BasicPileup.indelPileup() was written, but kudos to whoever wrote it. BTTJ - remove 'N's from previous base analysis -- even if both read and ref are 'N' (which does happen, occasionally) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1925 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 19:50:50 +00:00
aaron	04e9a494e9	removed the GenotypesBacked interface, which is currently unused. Also cleaned up some documentation lines git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1924 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 18:08:14 +00:00
rpoplin	06ff81efe5	Added NeighborhoodQualityWalker.java and ReadQualityScoreWalker.java which are used to calculate a read quality score based on attributes of the read and the reads in the neighborhood. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1922 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 13:24:11 +00:00
depristo	68fa6da788	Initial graph-based reference implementation and alignment assessor. Not suitable for public use git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1921 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 21:54:47 +00:00
depristo	31d143a841	now only needs READS git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1920 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 21:54:14 +00:00
depristo	ef2ea79994	code cleanup and containsStartPosition function git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1919 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 21:53:40 +00:00
depristo	186a8dd698	Trivial protection for null value git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1918 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 21:52:52 +00:00
depristo	be333da9c0	charSeq2byteSeq -- convert a char[] to a byte[] for convenience git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1917 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 21:52:23 +00:00
chartl	4192b093b8	More robust error handling with parallelization + usePreviousBase. Added forceReadBasesToMatchRef to use in conjunction with nPreviousReadBases as a less stringent approximation of usePreviousBases (requiring previous pileups only had mismatches, and that read mapping quality be high was throwing everything away) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1916 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 17:20:44 +00:00
chartl	31d5df2859	Previous base now checks that the read matches the reference in the previous base window. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1915 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 15:58:20 +00:00
depristo	726378be8b	Almost ready to stop doing eager decoding; waiting on Eric git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1914 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 15:28:05 +00:00
ebanks	e96b1791ab	Need to check for biallelic snp or exception gets thrown. Also, update to new tracker calls. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1913 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 02:43:43 +00:00
aaron	3fb3773098	a fix for traverse duplicates bug: GSA-202. Also removed some debugging output from FastaAltRef walker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1912 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-26 20:18:55 +00:00
hanna	a1e8a532ad	Support for initialize() and onTraversalDone() output from parallelized walkers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1911 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-26 20:18:31 +00:00
chartl	62c1001790	BTTJ is now correct. What a terrible waste of time, turns out I'd just reversed the header. Because of this the MD5 had to be updated in the tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1910 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-26 19:24:18 +00:00
sjia	24c7f694e6	Handles allele frequencies for any specified population, changed user input for mismatch filter options git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1909 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-25 22:51:56 +00:00
chartl	db9419df49	@ Hack to allow output from onTraversalDone() git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1908 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-25 15:19:04 +00:00
ebanks	75ad6bbef7	Check that map isn't being called passing in null arguments. (This seems wrong; see JIRA entry GSA-211) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1907 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-25 02:30:36 +00:00
depristo	b4f55df600	Bugfix for Jason F git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1906 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-24 22:09:27 +00:00
hanna	65b98470f3	Temporary fix: have RodLocusView manage and close its RODs. Really the relationship between these two classes needs to be rethought; see JIRA GSA-207. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1904 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-23 16:00:12 +00:00
aaron	ad1fc511b1	intermediate commit for some changes in the Variation system, so Eric can go ahead with his changes. Everything is pretty set, but the Variation interface could use a convenience method that joins all the alternate alleles. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1903 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-23 06:31:15 +00:00
ebanks	6c338eccb8	Joint Estimation model now emits calls in all formats. The whole GenotypeCall framework needs to be changed, but this will work for the time being. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1902 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-23 03:07:28 +00:00
chartl	a6dc8cd44e	BTTC is now Tree Reducible allowing for parallelization. Integration test comment changed to reflect actual date of last md5 update. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1901 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-22 23:19:29 +00:00
hanna	2e552eb5a1	Validates intervals against sequence dictionary header bounds. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1900 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-22 19:31:15 +00:00
ebanks	54c61c663c	-Cleanup of the Joint Estimation code -Don't print verbose/debugging output to logger, but instead specify a file in the argument collection (and then we only need to print conditionally) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1899 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-22 15:25:29 +00:00
asivache	2cab4c68d4	Added method: isCodingExon(). Returns true if position is simultaneously within an exon AND within coding interval of any single transcript from the list. The old method of detecting coding positions as isExon() && isCoding() is buggy, as the position could be in the UTR part of one transcript (isExon() is true), and within coding region bounds (but not in the exon) of another transcript (isCoding() is true). As a result UTR positions would be erroneously annotated as coding. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1898 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-22 14:55:07 +00:00
chartl	af761fb9bd	Base transition table now forces epsilon/3 (three-state) model for the unified genotyper. Verified to be identical with changing the default model to being epsilon/3. This of course changes the observed counts, so the integration test has been updated. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1897 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 21:18:26 +00:00
ebanks	55fa1cfa06	-Renamed new calculation model and worked out some significant changes with Mark -Allow walkers calling the UG to pass in their own argument collections git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1896 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 20:49:36 +00:00
chartl	8e3f72ced9	BTTJ - Code refactoring (major) - passes integration test VariantEvalWalker - whoops, wrote PooledGenotypeAnalysis rather than PooledAnalysis, now passes tests again - PooledFrequencyAnalysis - don't bother initializing matrices if this isn't a pool git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1895 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 19:04:51 +00:00
depristo	15a1849758	notes for chartl git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1894 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 18:31:31 +00:00
chartl	77863d4940	@PowerBelowFrequency + Changes to doc @ BasicPoolVariantAnalysis + use char rather than ReferenceContext + calculate # alleles @ PooledFrequencyAnalysis + breakdown of call metrics by estimated number of alleles in pool @ VariantEvalWalker + add PooledFrequencyAnalysis to analysis set @ PooledGenotypeConcordance + correctly calculate maximal allele frequency for output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1893 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 15:17:11 +00:00
chartl	967128035e	Make command line args default to false. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1892 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 13:59:35 +00:00
ebanks	9b9744109c	Mark's new unified calculation model is now officially implemented. Because it doesn't actually use EM, it's no longer a subclass of the EM model. Note that you can't use it just yet because it doesn't actually emit calls (just prints to logger). I need to deal with general UG output tomorrow. Hold off until then, Mark, and then you can go wild. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1891 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 02:39:23 +00:00
depristo	caa3187af8	Enabling correct high-performance ROD walker and moved VariantEval over to it. Performance improvements in variantEval in general. See wiki for full description git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1890 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 23:31:13 +00:00
chartl	4a8a6468be	Use read group as a condition for confusion tables. With an integration test. Changed BaseTransitionTable to comparable objects for consistent ordering of output ( e.g. so the integration test doesn't yell so much ) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1889 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 19:39:32 +00:00
chartl	b83df5616a	Change for lower-case references (always compare upper case bases) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1888 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 17:36:31 +00:00
chartl	3b1fabeff0	Major code refactoring: @ Pooled utils & power - Removed two of the power walkers leaving only PowerBelowFrequency, added some additional flags on PowerBelowFrequency to give it some of the behavior that PowerAndCoverage had - Removed a number of PoolUtils variables and methods that were used in those walkers or simply not used - Removed AnalyzePowerWalker (un-necessary) - Changed the location of Quad/Squad/ReadOffsetQuad into poolseq @NQS - Deleted all walkers but the minimum NQS walker, refactored not to use LocalMapType @ BaseTransitionTable - Added a slew of new integration tests for different flaggable and integral parameters - (Scala) just a System.out that was added and commented out (no actual code change) - (Java) changed a < to <= and a boolean formula Chris git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1887 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 14:58:04 +00:00
aaron	4be6bb8e92	added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums. For some reason my check-ins from home wouldn't work last night, so this is the actual changes for 1884. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1886 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 14:15:33 +00:00
depristo	449a6ba75a	Deleting lots of code as part of my cleanup. More classes tagged for removal. Many more walkers have their days numbered. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1885 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 12:23:36 +00:00
aaron	d749a5eb5f	added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1884 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 04:56:51 +00:00
ebanks	b8ab77c91c	Don't filter out reads without proper read groups. Instead, allow the user (or another walker calling UG) to specify an assumed sample to use (but then we assume single-sample mode). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1883 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 01:30:53 +00:00
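
Note on commit 2cab4c68d4 (isCodingExon): the message describes why testing isExon() && isCoding() across a list of transcripts can mislabel UTR positions as coding. The following is a minimal sketch of that distinction, not the GATK implementation. The Interval and Transcript types, their fields, and the method isExonAndIsCoding are hypothetical names invented for illustration; only the name isCodingExon comes from the commit message. It assumes Java 16 or later for records.

```java
import java.util.List;

// Hypothetical illustration only; these are not GATK classes.
final class CodingExonSketch {

    /** Closed interval on a single contig (both ends inclusive). */
    record Interval(int start, int stop) {
        boolean contains(int pos) {
            return pos >= start && pos <= stop;
        }
    }

    /** A transcript with one coding interval and a list of exons. */
    record Transcript(Interval coding, List<Interval> exons) {
        boolean inExon(int pos) {
            return exons.stream().anyMatch(e -> e.contains(pos));
        }

        boolean inCoding(int pos) {
            return coding.contains(pos);
        }
    }

    // Old-style check: the exon test and the coding test may each be
    // satisfied by a different transcript in the list.
    static boolean isExonAndIsCoding(List<Transcript> transcripts, int pos) {
        boolean exon = transcripts.stream().anyMatch(t -> t.inExon(pos));
        boolean coding = transcripts.stream().anyMatch(t -> t.inCoding(pos));
        return exon && coding;
    }

    // Check described in the commit: both conditions must hold
    // within the same single transcript.
    static boolean isCodingExon(List<Transcript> transcripts, int pos) {
        return transcripts.stream().anyMatch(t -> t.inExon(pos) && t.inCoding(pos));
    }

    public static void main(String[] args) {
        // Transcript A: one exon 100-200, coding 150-200, so 100-149 is UTR.
        Transcript a = new Transcript(new Interval(150, 200),
                List.of(new Interval(100, 200)));
        // Transcript B: exons 50-80 and 300-400, coding bounds 60-350,
        // so position 120 is intronic but inside B's coding bounds.
        Transcript b = new Transcript(new Interval(60, 350),
                List.of(new Interval(50, 80), new Interval(300, 400)));
        List<Transcript> transcripts = List.of(a, b);

        System.out.println(isExonAndIsCoding(transcripts, 120)); // true (the mislabel)
        System.out.println(isCodingExon(transcripts, 120));      // false
        System.out.println(isCodingExon(transcripts, 160));      // true
    }
}
```

Position 120 is UTR in transcript A and intronic in transcript B, so the combined check reports it as coding while the per-transcript check does not.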

1722 Commits (43bd4c8e8fdac593ff4a866e47987e017d6b74e6)
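
Note on commit a679bdde18 (FindContaminatingReadGroupsWalker): step 3 of the procedure in that message is a one-sample, left-tailed t-test against a null mean of 0.95. The sketch below shows only the test statistic for one read group, under the assumption that the per-site alternate-allele fractions are already available as an array. The class and method names are hypothetical and this is not the walker's code. Turning the statistic into a p-value (to compare against the 1e-9 cutoff mentioned in the message) needs a Student's t CDF with n - 1 degrees of freedom, which the Java standard library does not provide, so the sketch stops at the statistic.

```java
// Hypothetical illustration only; not the GATK walker implementation.
final class ReadGroupTStatistic {

    /**
     * One-sample t statistic: t = (mean - mu0) / (s / sqrt(n)),
     * where s is the sample standard deviation (n - 1 denominator).
     * A strongly negative t supports the left-tailed alternative
     * that the true mean is below mu0.
     */
    static double tStatistic(double[] values, double mu0) {
        int n = values.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two observations");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;

        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(sumSq / (n - 1));

        // Note: if all values are identical, sd is 0 and the result is
        // infinite or NaN; a real implementation would handle that case.
        return (mean - mu0) / (sd / Math.sqrt(n));
    }

    public static void main(String[] args) {
        // Made-up alternate-allele fractions for one read group at five
        // likely homozygous-variant sites (illustrative values only).
        double[] altFractions = {0.50, 0.55, 0.45, 0.60, 0.40};
        double t = tStatistic(altFractions, 0.95);
        System.out.println("t = " + t + ", degrees of freedom = " + (altFractions.length - 1));
    }
}
```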