gatk-3.8


Author	SHA1	Message	Date
ebanks	6b5c88d4d6	The GATK no longer writes vcf3.3; welcome to the world of vcf4.0. Needed to fix a few output bugs to get this to work, but it's looking great. Much more still to come. Guillermo: hopefully this doesn't break your local build too badly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3786 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 04:56:58 +00:00
chartl	9d2a485532	Update to AminoAcidTransition eval module git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3783 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-13 17:12:03 +00:00
ebanks	6442dabf94	Deleting/archiving as instructed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3779 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-13 15:23:50 +00:00
ebanks	221e01fb27	deleting/archiving as instructed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3765 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-12 16:59:45 +00:00
aaron	3347d1ca7c	part one of combining format and info header lines code into a single abstract class for Mark; plus some 'm' removals from access methods for Eric. Adding fixes for CombineVariants next. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3719 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-05 05:57:58 +00:00
delangel	b6bdd61283	a) Fix bug when multi-base reference is homopolymeric when writing a VCF4.0 variant context: computation of number of trailing bases was incorrect and we ended up with incorrect position. b) Updated VCF4WriterTestWalker to take either VCF3 or VCF4 as inputs (this walker can also be used to convert from 3.3 to 4.0). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3711 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-02 15:19:42 +00:00
hanna	4995950d04	IndexedFastaSequenceFile is now in Picard; transitioning to that implementation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3701 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-01 04:40:31 +00:00
hanna	c9d5345150	Redo StratifiedAlignmentContext to use ReadBackedPileup's stratification options. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3699 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-01 02:46:05 +00:00
chartl	610cc7ae2b	Cool package trick Kiran showed me. VariantEvaluator no longer public, AAT specifies the core package even though it lives in oneoffs. Disabled so integration tests pass. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3677 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-29 22:42:04 +00:00
chartl	9ac13b8f5d	Name and body change for this module to reflect local code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3675 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-29 21:45:26 +00:00
aaron	844cb2ed33	fixing a bug that Eric found with RODs for reads, where some records could be omitted. Sorry Eric! Also putting more tolerance into the timing on the tibble index tests (that check to make sure we're deleting out of date indexes, and not deleting perfectly good indexes). It seems that some of the farm nodes aren't great with a stopwatch. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3674 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-29 21:38:55 +00:00
chartl	101c27294d	Comment this guy out so we build again. (Hate it when my repository goes all funky.) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3673 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-29 21:16:33 +00:00
chartl	3017f82550	Initial commit of items for analyzing amino acid transitions in variant eval. Blew up my subversion by coding locally while I did not have internet. I hope this doesn't bust any integration tests since I changed no existing code but...who knows. Crossing my fingers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3672 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-29 20:57:18 +00:00
delangel	3ca2b7374b	Fixes to better deal with the "Type" and "Number" field in the INFO and FORMAT header lines in VCF4.0. We now record these fields and provide appropriate conversions. This is the first version that fully passes the VCF validator. Also, moved the flag indicating VCF4.0 to the VCFWriter constructor. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3669 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-29 16:43:00 +00:00
delangel	ed71e53dd4	1) Initial complete version of VCF4 writer. There are still issues (see below) but at least this version is fully functional. It incorporates getting rid of intermediate VCFRecord so we now operate from VariantContext objects directly to VCF 4.0 output. See VCF4WriterTestWalker for usage example: it just amounts to adding vcfWriter.add(vc,ref.getBases()) in walker. add() method in VCFWriter is polymorphic and can also take a VCFRecord, although eventually this should be obsolete. addRecord is still supported so all backward compatibility is maintained. Resulting VCF4.0 files are still not perfect, so additional changes are in progress. Specifically: a) INFO codes of length 0 (e.g. HM, DB) are not emitted correctly (they should emit just "HM" but now they emit "HM=1"). b) Genotype values that are specified as Integer in header are ignored in type and are printed out as Doubles. Both issues should be corrected with better header parsing. 2) Check in ability of Beagle to mask an additional percentage of genotype likelihoods (0 by default), for testing purposes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3664 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-28 23:54:38 +00:00
chartl	20f5fdbcf7	Changes to MVC to make the header of its output VCF compliant with spec (give expected # of values for info field annotations) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3660 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-28 18:33:23 +00:00
aaron	682f9b46c6	Two fixes together: 1) Some improvements to the VCF4 parsing, including disabling validation. 2) Reimplemented RefSeq in the new Tribble-style rod system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3630 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-24 22:17:03 +00:00
chartl	75d4736600	Committing changes to comp overlap for indels. Passes all integration tests; minor changes to MVC walker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3618 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-23 15:49:13 +00:00
aaron	a6d3e4bd47	Add code to allow reference alleles with 'N' in VariantContext, but not in the alternate allele(s). Also more updates to the VCF 4 code (fixed parsing for files without genotypes). This check-in will temporarily break the build (I need to see if Bamboo is correctly returning the log file for the failed builds). Will be fixed once Bamboo starts building. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3609 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-22 18:26:37 +00:00
aaron	32f324a009	incremental changes to the VCF4 codec, including allele clipping down to the minimum reference allele; adding unit testing for certain aspects of the parsing. Not ready for prime-time yet. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3604 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-22 06:31:05 +00:00
hanna	f18ac069e2	A refactoring / unification of ReadBackedPileup and ReadBackedExtendedEventPileup. Provides a cleaner interface with extended events inheriting all of the basic RBP functionality. Implementation is still slightly messy, but should allow users to provide separate implementations of methods for sample split pileups and unsplit pileups for efficiency's sake. Methods not covered by unit/integration tests have not been sufficiently tested yet. Unit tests will follow this week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3597 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-20 04:42:26 +00:00
chartl	f44d8b150f	Mendelian Violation Classifier now filters violations on the fly via command line arguments; and closes unterminated homozygous regions at the end of a chromosome (so we see arms falling off in the file, rather than in the log) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3592 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-18 19:32:24 +00:00
ebanks	b75ded61b8	Removing obsolete rod; no longer needed given previous addition to SampleUtils. JIRA GSA-318 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3572 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-16 20:03:14 +00:00
weisburd	1e42984a16	Improved buffer-size arg handling git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3553 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-14 19:59:15 +00:00
kiran	804facb0cc	Removing these utilities as part of a hostage negotiation with Matt. Can I have my journal club paper now?! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3539 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-11 21:41:29 +00:00
weisburd	338bb9adf4	CommandLineProgram for measuring java I/O speeds for large plain-text or gzipped files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3532 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-10 21:34:37 +00:00
chartl	20167fd411	Final changes to MVC -- associates variants with regions of homozygosity in child and parents, corrects for genotype errors, and prints out a separate file with information for each region of homozygosity. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3521 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 18:05:37 +00:00
ebanks	9b2fcc4711	Refactoring of the annotation system: 1. VA is now a ROD walker so it no longer requires reads (needs a little more testing) 2. Annotations can now represent multiple INFO fields (i.e. sets of key/value pairs) 3. The chromosome count annotations have been pulled out of UG and the VCF writer code and into VA where they belong. Fixed the headers too. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3513 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 17:05:51 +00:00
chartl	8f9e3e8ad7	Commit for Kiran; but this is now working, barring little exceptions that I've yet to run across... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3511 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 14:21:19 +00:00
chartl	736098b58d	A quick commit before running home. This is a re-factored version of the OppositeHomozygoteClassifier which will work with deNovo violations as well. Some code still needs to be migrated from OHC, which is why that walker isn't yet deleted. This'll be up and running tonight. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3502 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-08 20:47:01 +00:00
chartl	933133ee28	Initial commit of the opposite homozygote classifier. Currently does the following, given a trio vcf: + Identifies opposite homozygote sites + Identifies the parent from whom it is expected that a null allele was inherited (or whether it was a putative genotype error; e.g. mom=homref, dad=homref, child=homvar) + Labels each opposite homozygote with its homozygous region in the child (e.g. region 1, region 2) + Labels each opposite homozygote with the size of the homozygous region in which it was found, the number of child homozygotes in the region, and the number of opposite homozygote violations within that region To come: + Classification of sites as likely tri-allelic Note that this is very experimental git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3498 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-08 03:56:07 +00:00
delangel	ef47a69c50	a) First fully functional (sort of) version of walker that parses Beagle imputation output files and produces a vcf with imputed genotypes. More doc/info to follow shortly. Issues still to be solved: a) Walker changes all genotypes based on Beagle data, but annotations on the original VCF are unchanged. They should in theory be recomputed based on new genotypes. b) Current implementation is ugly, dirty, unwieldy and will necessitate a refactoring soon so I can keep my pride. Most aesthetically affronting issue right now is that we read the full Beagle files at initialization and keep them in memory, but a more delicate implementation would just read from files on a marker by marker basis. Issue that currently prevents this is that BufferedReader() instances don't seem to play nice when called from the map() function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3488 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-04 20:37:25 +00:00
depristo	b811e61ae1	Optimized, nearly complete VCF4 reader 2-4x faster than the previous implementation, along with a VCF4 reader performance testing walker that can read 3/4 files, useful for benchmarking git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3487 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-04 18:11:38 +00:00
aaron	0b03e28b60	updating the tribble library to include the reference dictionary reading / writing. We now check the dictionaries of any tracks that have them against the reference (all new tribble tracks and out-of-date tracks will have this). Also renamed some classes to be more reflective of their function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3485 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-04 06:34:26 +00:00
ebanks	ffeb3fd80d	Thanks to Guillermo, I found a bug in the Unified Genotyper output: GL was posteriors instead of likelihoods. Not a huge deal because the priors were flat, but fixed nonetheless. Also, needed to update Tribble. Minor updates to the Beagle input maker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3461 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-28 19:28:26 +00:00
aaron	871cf0f4f6	Call out ROD types by their record type, instead of the codec type (which was clumsy). So instead of: @Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFCodec.class)) you'd say: @Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFRecord.class)), which is more in line with what was done before. All instances in the existing codebase should be switched over. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3457 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-28 14:52:44 +00:00
chartl	ff4a0764df	Read error rate is now parallelizable git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3447 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-27 19:00:09 +00:00
delangel	3873dccb35	First fully functional (though preliminary) version of walker that takes an input VCF and outputs a Beagle .bgl file that can be used for missing genotype calls/haplotype imputation. For now, only supported input format is likelihood format for unrelated individuals. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3444 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-26 21:03:23 +00:00
chartl	f9efc1248c	VariantEvalWalker now takes indels if you throw the -dels flag. IndelLengthHistogram appears to be working properly, it is turned off by default (as it is experimental) but you can turn it on in your own repository. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3443 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-26 20:03:14 +00:00
chartl	88a06ad81f	Changes to Depth of Coverage: - For speedup in large number of samples, base counts are done on a per read group level, then merged into counts on larger partitions (samples, libraries, etc) + passed all integration tests before next item - Added additional summary item, a coverage threshold. Set by (possibly multiple) -ct flags, the summary outputs will have columns for "%_bases_covered_to_X"; both per sample, and per sample per interval summary files are affected (thus md5s changed for these) NOTE: This is the last revision that will include the per-gene summary files. Once DesignFileGenerator is sufficiently general, and has integration tests, it will be moved to core and the per-gene summary from Depth of Coverage will be retired. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3437 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-26 03:39:22 +00:00
hanna	a40e64e47b	A downsampling validator. Compares the generated pileup passed in from the alignment context to the reads, passed in as a Tribble SAM text feature. If the generated pileup contains a valid set of reads according to the downsampling rules, the test passes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3421 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-23 21:49:54 +00:00
delangel	355396109b	Bug fix to avoid build failure (class changed under me??) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3416 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-21 18:48:56 +00:00
delangel	1753d07b02	Added AnnotationByAlleleFrequencyWalker - walker takes an input vcf, a reference vcf and a list of annotations (with the -A argument). For each site present in both VCFs, it outputs the given annotations to the screen as well as allele frequency. Since HapMap vcf reference doesn't include AF in annotations, it computes it from Chromosome, Het and HomVar counts. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3415 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-21 18:31:34 +00:00
depristo	a10fca0d5c	Genotyper now is using bytes not chars. Passes all tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3406 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 21:02:44 +00:00
depristo	1ab00e5895	Retiring multi-sample genotyper git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3401 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 14:10:56 +00:00
depristo	727822adb4	BaseUtils has a clearer distinction between byte and char routines. All char routines are @Deprecated now. Please use bytes. Better organization of reverse(), now in Utils not BaseUtils. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3400 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 14:05:13 +00:00
depristo	8a725b6c93	Restructuring of ReferenceContext and ReadWalkers to accept a ReferenceContext. Now ReferenceContext is byte[] backed not char[]. Please no more chars for the reference. All of the tests pass now. Coming check-ins are going to clean up the char / byte problems in the GATK git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3397 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 23:27:55 +00:00
aaron	02cc1afdc8	remove RodBed and all its dependencies. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3396 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 19:12:30 +00:00
chartl	ffb1b46166	Added a GCCalculatorWalker for a oneoff analysis for Mark Daly (GC content of agilent 1.1 targets) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3395 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 18:49:51 +00:00
chartl	b7d21627ab	Changes to DepthOfCoverage (JIRA items) and added back an integration test to cover it. Alterations to the design file generator to output all transcripts (rather than choosing one at random). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3366 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-17 17:23:00 +00:00

192 Commits (6b5c88d4d6fe214c4ffe79bc23d96195a9ef3ec9)