gatk-3.8

Commit Graph

Author	SHA1	Message	Date
depristo	595907e98e	Moving StingException git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4262 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:34:15 +00:00
depristo	40e6179911	Penultimate step in exception system overhaul. UserError is now UserException. This class should be used for all communication with the USER for problems with their inputs. Engine now validates sequence dictionaries for compatibility, detecting not only lack of overlap but now inconsistent headers (b36 ref with v37 BAM, for example) as well as ref / bam order inconsistency. New -U option to allow users to tolerate dangerous seq dict issues. WalkerTest system now supports testing for exceptions (see email and wiki for docs). Tests for vcf and bam vs. ref incompatibility. Waiting on Tribble seq dict improvements to detect b36 VCF with b37 ref (currently cannot tell this is wrong). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4258 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:02:43 +00:00
depristo	8f1a32acae	All exceptions thrown by the GATK have been reviewed and UserErrors replaced where appropriate. Shazam. Another check-in will remove the GATKException and restore the StingException. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4252 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-10 15:25:30 +00:00
depristo	ca9c7389ee	Not useful git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4238 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-09 02:33:03 +00:00
depristo	8708753a6a	checkin for removal git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4237 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-09 02:32:46 +00:00
fromer	1b1ec7e52d	Changed default phasing window size to 10 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4235 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-08 21:28:36 +00:00
depristo	7eeabe534a	QSample walker for 1KG -- measures aggregate quality of sequencing. Includes misc. improvements throughout the code, including using the new Tribble GenotypeLikelihoods class for working with VCF GLs from the UG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4211 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-03 18:21:43 +00:00
fromer	529eecd4dc	Added phasing sub-directory to keep walkers directory clean git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4208 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-03 13:38:46 +00:00
fromer	c0ce9ca8cc	Added phasing sub-directory to keep walkers directory clean git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4207 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-03 13:32:30 +00:00
fromer	c119f64514	Added phasing sub-directory to keep walkers directory clean git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4205 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-03 13:24:18 +00:00
fromer	a1cf3398a5	Added basic version of phasing evaluation: GenotypePhasingEvaluator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4196 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-02 22:09:50 +00:00
fromer	50f7f18cbd	Changed ReadBackedPhasing default PQ threshold to 10 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4166 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-30 21:26:15 +00:00
kiran	16b75e3b9a	A new version of the ErrorRateByReadPosition walker, using the GATKReport functionality to store and emit its output. This version of the walker is roughly half the number of lines as the previous version, owing simply to the removal of all of the output formatting that's now handled by GATKReport. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4160 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-29 05:41:13 +00:00
kiran	fd19c63aaf	A data structure that allows data to be collected over the course of a walker's computation, then have that data written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (given that you load it using the GATKReport loader module). This object is designed to be both the structure that holds data during the execution of the walker, as well as the object that properly formats and emits the data so that it can be easily loaded into R. In the end, you get a table that looks like this: ##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads cycle errorrate.61PA8.7 qualavg.61PA8.7 0 0.007451835696110506 25.474613284804366 1 0.002362777171937477 29.844949954504095 2 9.087604507451836E-4 32.87590975254731 3 5.452562704471102E-4 34.498999090081895 4 9.087604507451836E-4 35.14831665150137 5 5.452562704471102E-4 36.07223435225619 6 5.452562704471102E-4 36.1217248908297 7 5.452562704471102E-4 36.1910480349345 8 5.452562704471102E-4 36.00345705967977 ... A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession. Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone. This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect. The display property of individual columns can be turned off. This is useful when a column is used to store intermediate results, necessary for the computation of some later value, but the contents of the intermediate column itself are not required in the final output file. Finally, the GATKReportTable allows for some simple, mathematical, element-wise and column-wise operations. For instance, two whole columns can be divided, the results of the operation being stored in a third column. This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied or divided without requiring the developer to explicitly write a loop. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-29 05:39:24 +00:00
hanna	de5ccfb0b1	Moved hasPileupBeenDownsampled() based on Eric's request. Also eliminated @Deprecated constructors from AlignmentContext. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4142 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 16:12:05 +00:00
ebanks	bfcac33e80	Cleaning up playground utils and tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 01:25:47 +00:00
ebanks	4979dcc9a7	Finishing up the playground cleanup (for now) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4135 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 01:19:37 +00:00
ebanks	0452b1ab68	archiving, removing, or promoting to core from playground git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4134 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 01:07:42 +00:00
ebanks	dfae48cee0	Moving supported tools to core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4127 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-26 13:56:19 +00:00
ebanks	e06b2c90ef	Cap the default size of join tables; this can be modified with the --maxJoinTableSize argument. Also, misc cleanup of the comments. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4125 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-26 05:21:26 +00:00
ebanks	79cd716671	More cleanup of the Genomic Annotator. Also, we now require join tables to have unique entries for the column keyed on the join. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4124 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-26 04:43:52 +00:00
fromer	39da567d48	Changed ReadBackedPhasing to be a RodWalker (corrected to By(READS)) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4120 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 20:53:04 +00:00
ebanks	4678613893	Significant fixes for the Genomic Annotator. 1. Rip out all of Ben's code intended to circumvent the stable VCF Writer output system in multi-threaded mode (I threw up a little when I saw this code). This will improve memory consumption when running with -nt. 2. Don't annotate indels or > bi-allelic sites. 3. Fix bug where not all records were making it into the output VCF. 4. General code clean up. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4118 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 20:16:50 +00:00
fromer	41e53d37e1	Changed ReadBackedPhasing to be a RodWalker (more efficient, since it is ROD-focused) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4117 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 19:43:57 +00:00
fromer	aa8cf25d08	Implemented fully symmetric sliding window read-backed phaser git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4104 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-24 21:12:32 +00:00
ebanks	90aef66ec5	Minor fixes for my last commit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 23:25:29 +00:00
ebanks	ef795825fd	Yet more argument consistency updates git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4089 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 20:52:30 +00:00
ebanks	ccda4f6ec1	More output consistency changes (updating wiki docs as I go along). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 18:46:08 +00:00
ebanks	55a8306a0d	Update the @RMD tags to look for VariantContext.class instead of ReferenceOrderedDatum.class. Since the test for rod type is broken this won't affect anything right now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4084 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 17:49:37 +00:00
aaron	35b9883dd6	vcfwriter is in tribble now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4083 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 17:01:04 +00:00
kiran	295472bf69	Simple change to handle a no-call (must avoid asking for the second allele, which will be null in this case). Also, added a hack to deal with input VCFs where there are no genotype likelihoods (needed in order to process Hapmap and 1KG VCFs). In this mode, called genotypes are assigned a likelihood of 0.96, and alternative genotypes are given 0.02 each. I know Beagle actually takes genotype data without likelihoods, so this might not be the right way to do this. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4081 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 05:13:09 +00:00
hanna	b80cf7d1d9	Modifications to the output system for better interaction with @Output. Multiplexed arguments. More details in the Monday meeting. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4077 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-22 14:27:05 +00:00
depristo	b6989289fc	Potential bug fix for bad references where some codons may have Ns git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4075 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-21 12:09:33 +00:00
ebanks	165dc6d3b0	Ryan, what did you decide about supporting this tool? Is it still useful? git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4073 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-20 19:16:14 +00:00
fromer	1c4784999a	Updated to work exclusively in log10 space git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4069 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 21:31:07 +00:00
fromer	3af4e618cc	Fixed precision issues with PQ (phasing quality) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4068 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 20:34:47 +00:00
fromer	effeedf1a3	Updated Bayesian phasing method to output per-site phasing statistics (and to not cap PQ at 40) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4064 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 19:55:47 +00:00
fromer	1336ea17a3	Quality-score-based Bayesian phasing algorithm implemented git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4055 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-18 21:17:46 +00:00
kiran	3d63302b70	Deprecated. Use SelectVariants instead. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4043 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 15:07:50 +00:00
fromer	dfe2922b5e	First working version of statistical haplotype phaser git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4031 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 21:29:45 +00:00
ebanks	f36c0ed613	Stop building obsolete VCFTools and CGUtilities git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4030 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 19:28:36 +00:00
hanna	cb144734c0	Getting rid of GenotypeWriter interface. Of note: - GATKVCFWriter deleted, to be replaced if absolutely necessary when VCF writing goes into Tribble. - VCFWriter is now an interface, for easier redirection. - VCFWriterImpl fleshes out the VCFWriter interface. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4026 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 16:33:22 +00:00
kshakir	f39dce1082	Exposed CommandLineFunction defaults to the Queue.jar command line (see -help). Added ability to skip up-to-date jobs where the outputs are older than the inputs. Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names. Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile Moved Hidden from the GATK to StingUtils. Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7 Added Queue to javadoc and testing build targets. Added first Queue unit test. Another pass at avoiding cycles in the DAG thanks to all function I/O being files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 21:58:26 +00:00
ebanks	419a36f74c	Starting the clean up of the sting.utils.genotype code which is all either moving to Tribble, moving to sting.utils.vcf, or being removed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3994 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 02:16:05 +00:00
kiran	9aa70d9c7c	Replaced by SelectVariants git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3979 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 07:07:42 +00:00
ebanks	637a1e5055	Updating to use the new VA interface git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3975 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 05:31:01 +00:00
aaron	72ae81c6de	VariantContext has now moved over to Tribble, and the VCF4 parser is now the only VCF parser in town. Other changes include: - Tribble is included directly in the GATK repo; those who have access to commit to Tribble can now directly commit from the GATK directory from Intellij; command line users can commit from inside the tribble directory. - Hapmap ROD now in Tribble; all mentions have been switched over. - VariantContext does not know about GenomeLoc; use VariantContextUtils.getLocation(VariantContext vc) to get a genome loc. - VariantContext.getSNPSubstitutionType is now in VariantContextUtils. - This does not include the checked-in project files for Intellij; still running into issues with changes to the iml files being marked as changes by SVN I'll send out an email to GSAMembers with some more details. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3954 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 18:47:53 +00:00
fromer	b21f90aee0	Added preliminary framework for performing short-range phasing (ReadBackedPhasingWalker.java) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3953 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 14:56:34 +00:00
ebanks	1539791a04	Fix for Kiran: when using VCFs for the comp tracks in the Annotator(s), don't put the headers from them into the output VCF. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3950 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 04:45:47 +00:00
chartl	38e65f6e1b	Added: A VariantEval module that gives simple metrics by sample, and an abstract class that makes per-sample modules easy to write (but a little bit clunky since a class needs to be defined for each data point -- see SimpleMetricsBySample as an example). AnalysisModuleScanner needed a slight update to pull in data points from parent classes for this to work (thanks Khalid for showing me how to do this). After a code review with Aaron (thanks) and ensuring integration tests pass, I am committing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3939 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 19:37:39 +00:00
chartl	2bc69572cb	Make transcript2info capable of handling b37/hg19 contigs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3915 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-02 17:32:08 +00:00
delangel	4fc1db7aaf	Change interface to VCFWriter add() method to take only 1 byte from reference (since that's the only thing it needs), to prevent bugs like having people call it with ref.addBases() which is wrong (since it provides bases starting from the left of reference context window). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3868 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 20:24:03 +00:00
delangel	5eef15cfdf	a) Bad bug fix to CombineVariants: when indels were being merged, the reference base provided was wrong - ref.getBases()[0] was being used, but this returns the base at the start of the window. Instead, the reference at the current locus should be used. b) Cosmetic change to Beagle annotation description. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3861 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 15:13:47 +00:00
ebanks	c6ad26e04f	1) When quals/GQs are really integers (x.00), strip off the floating points. 2) Keep track of whether vcf records are unfiltered vs. pass filters in the variant context so we can regenerate the records on output. 3) No more "ID" hard-coded all over the code to set the VariantContext ID. Use a static variable instead. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3840 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 18:01:45 +00:00
ebanks	f742980864	1. Refactoring of GenotypeWriters so that parallelization now works again with VCF4.0. We now have just a single reference to the old VCF classes, and that one will be purged soon. 2. Moved Jared's VCFTool code into archive so that everything would compile. 3. Added the vcf reference base (needed for indels) as an attribute to the VariantContext from the reader. 4. TribbleRMDTrackBuilderUnitTest was complaining that a validation file didn't exist, so I commented it out. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3835 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 06:16:45 +00:00
aaron	f4cfb0f990	The first step in integrating Jim's tree based index scheme: - changed to a better method for getting headers from Codecs - some removal of old commented out code in the GATKArgumentCollection - changes for the rename of FeatureReader to FeatureSource - removed the old Beagle ROD - cleaned up some of the code in SampleUtils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3826 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 04:49:27 +00:00
depristo	7c42e6994f	FindBugs fixes throughout the code base git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3823 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-18 16:29:59 +00:00
delangel	55b756f1cc	First step in major cleanup/redo of VCF functionality. Specifically, now: a) VCF track name can work again with 3.3 or 4.0 VCF's when specifying -B name,VCF,file. Code will read header and parse automatically the version. b) Old VCF codec is deprecated. Reader goes now direct from parsing VCF lines into producing VariantContext objects, with no intermediate VCF records. If anyone can't resist the urge to still input files using the old method, a new VCF3Codec is in place with the old code, but it will be eventually deleted. c) VCF headers and VCF info fields no longer keep track of the version. They are parsed into an internal representation and will be output only in VCF4.0 format. d) As a consequence, the existing GATK bug where files are produced with VCF4 body but VCF3.3 headers is solved. e) Several VCF 4.0 writer bugs are now solved. f) Integration test MD5's are changed, mostly because of corrected VCF4.0 headers and because validation data mostly uses now VCF4.0. g) Several VCF files in the ValidationData/ directory have been converted to VCF 4.0 format. I kept the old versions, and the new versions have a .vcf4 extension. Pending issues: a) We are still not dealing with indels consistently or correctly when representing them. This will be a second part of the changes. b) The VCF writer doesn't use VCFRecord but it does still use a lot of leftovers like VCFGenotypeEncoding, VCFGenotypeRecord, etc. This needs to be simplified and cleaned. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3813 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 22:49:16 +00:00
hanna	dfddf8fd75	- Bring the PaperGenotyper up to code. - Remove some old debugging cruft regarding handling of threaded engine exceptions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3796 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 22:31:21 +00:00
ebanks	af23762778	Removing more references to VCFRecord git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3789 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 11:54:23 +00:00
ebanks	460283f6d2	No more manually converting VariantContexts to VCFRecords. You should be utilizing VCs and not VCFRecords. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3787 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 05:21:28 +00:00
ebanks	6b5c88d4d6	The GATK no longer writes vcf3.3; welcome to the world of vcf4.0. Needed to fix a few output bugs to get this to work, but it's looking great. Much more still to come. Guillermo: hopefully this doesn't break your local build too badly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3786 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 04:56:58 +00:00
ebanks	9a05e8143d	Move to 4.0 and away from VCFRecord. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3780 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-13 15:54:54 +00:00
ebanks	7e7da75d27	Moving over to 4.0 and away from VCFRecord git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3778 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-13 14:07:10 +00:00
delangel	297f15a60c	Protect ProduceBeagleInputWalker against evil users who feed to it VCF's with indels, no variation sites or other interesting markers: Write to Beagle input only in biallelic SNP sites since that's the only thing Beagle can do. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3772 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-12 20:54:42 +00:00
delangel	5992b79159	a) Simplify normalization code in ProduceBeagleInputWalker, as to always normalize, and use MathUtils.normalizeFromLog10 to do this. b) Several improvements to BeagleOutputToVCFWalker: 1. If a Hapmap input track is provided (e.g. -B comp,VCF,file), Hapmap sites will be annotated with Hapmap Allele count and allele frequency (key ACH, AFH). 2. If probability of correct genotype is lower than ncthr (optional argument provided by user, default = 0.0), walker will keep original calls instead of using Beagle calls. 3. Instead of annotating just whether Beagle had modified a site, annotate instead HOW MANY genotypes in a site were actually changed by Beagle. All three improvements are mostly for debugging and analysis only. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3769 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-12 19:54:58 +00:00
ebanks	e50627a49e	1. Updated tests and added integration test for liftover code. 2. Updated liftover code (and scripts) to emit vcf 4.0 and no longer depend on VCFRecord. 3. Beagle walker now also emits vcf 4.0. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3767 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-12 17:58:18 +00:00
ebanks	8086ab1f75	Pulled sample/header merging routines out of CombineVariants and into util classes. Added more generalized methods for retrieving samples. Updated the Beagle walkers to use these methods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3764 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-12 16:51:54 +00:00
ebanks	0c4a32843c	No longer uses VCFRecord git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3763 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-12 13:57:39 +00:00
ebanks	f130d29318	No longer uses VCFRecord. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3762 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-12 13:34:10 +00:00
ebanks	fb717fe128	First pass needed to remove old VCF code: moving all VCF-related constants into a single unified class git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3759 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-11 07:19:16 +00:00
delangel	be75b087ec	a) Add input argument (-ncrate) to BeagleOutputToVCFWalker. If the genotype posterior error probability is higher than this threshold, we declare No-call at this genotype. b) Add "OG" annotation to genotypes. If Beagle changes genotypes, this annotation gets the original genotype call, to ease performance comparisons. If not, this annotation gets an empty value. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3723 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-06 18:33:28 +00:00
aaron	3347d1ca7c	part one of combining format and info header lines code into a single abstract class for Mark; plus some 'm' removals from access methods for Eric. Adding fixes for CombineVariants next. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3719 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-05 05:57:58 +00:00
hanna	4995950d04	IndexedFastaSequenceFile is now in Picard; transitioning to that implementation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3701 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-01 04:40:31 +00:00
delangel	ed71e53dd4	1) Initial complete version of VCF4 writer. There are still issues (see below) but at least this version is fully functional. It incorporates getting rid of intermediate VCFRecord so we now operate from VariantContext objects directly to VCF 4.0 output. See VCF4WriterTestWalker for usage example: it just amounts to adding vcfWriter.add(vc,ref.getBases()) in walker. add() method in VCFWriter is polymorphic and can also take a VCFRecord, although eventually this should be obsolete. addRecord is still supported so all backward compatibility is maintained. Resulting VCF4.0 are still not perfect, so additional changes are in progress. Specifically: a) INFO codes of length 0 (e.g. HM, DB) are not emitted correctly (they should emit just "HM" but now they emit "HM=1"). b) Genotype values that are specified as Integer in header are ignored in type and are printed out as Doubles. Both issues should be corrected with better header parsing. 2) Check in ability of Beagle to mask an additional percentage of genotype likelihoods (0 by default), for testing purposes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3664 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-28 23:54:38 +00:00
weisburd	147ba68441	Fixed bug with mrnaCoord field - made it count exon positions only, rather than introns & exons git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3642 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-25 19:53:32 +00:00
aaron	682f9b46c6	Two fixes together: 1) Some improvements to the VCF4 parsing, including disabling validation. 2) Reimplemented RefSeq in the new Tribble-style rod system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3630 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-24 22:17:03 +00:00
ebanks	824c2bbac0	Finishing previous checkin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3608 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-22 17:21:38 +00:00
ebanks	aa1852575e	Add -noVerbose flag to stop output of INFO data. Cuts runtime by 30% and output from 65Mb to 1Kb. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3591 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-18 18:53:35 +00:00
rpoplin	724affc3cc	Major bug fixes for the Variant Recalibrator. Covariance matrix values are now allowed to be negative. When probabilities are multiplied together the calculation is done in log space, normalized, then converted back to real valued probabilities. Clustering weights have been changed to only use HapMap and by-1000genomes sites. The -nI argument was removed and now clustering simply runs until convergence. Test cases seem to work best when using just two annotations (QD and SB). More changes are in the works and are being evaluated. Misc fixes to walkers that use RScript due to CentOS changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3590 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-18 17:37:11 +00:00
delangel	b694ca9633	Cleanup: Don't require likelihood ROD in Beagle parameters when generating output VCF. Likelihoods file is only an input to Beagle but the Walker that generates a VCF doesn't need it, so it's silly to ask for it and it's error-prone. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3579 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-17 17:45:48 +00:00
aaron	3d049204ed	some refactoring for the variant eval output system git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3576 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-17 05:34:31 +00:00
delangel	8cb16a1d45	a) Cleanup, remove -input argument from BeagleOutputToVCFWalker since it's not needed. b) Added back old Beagle ROD to maintain backward compatibility (does anyone even use this???) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3563 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-16 02:13:08 +00:00
delangel	d319a28be7	Complete rewrite of the Beagle functionality to read from Beagle output files and produce VCF with modified genotypes. Now, a new ROD system using Tribble is in place. Beagle inputs are set using -B beagleType,Beagle,pathToBeagleFile, where beagleType can be either beagleR2, beagleLike, beaglePhased or beagleR2 (BeagleOutputToVCFWalker requires all of the above). Only pending items: -input argument is now unused and can be removed, will be cleaned later. Wiki will be updated with new usage shortly. We can now run with a reduced memory footprint, and output VCF is exactly identical to previous version. Drawback is increased runtime because Tribble has to create an index for all the Beagle files when starting if the idx files are missing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3562 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-16 02:01:35 +00:00
sjia	b99a5e06f3	Added option to only consider alleles of > specific allele frequency. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3557 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-15 02:09:35 +00:00
sjia	8defb30796	Documentation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3555 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-14 21:31:01 +00:00
weisburd	c1046653a2	Fixed handling of records where gene-names are identical (eg. as in refseq NR_030638 in chr20) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3554 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-14 20:00:49 +00:00
sjia	b3c3023c3c	Allows callers to handle HLA reference files as input (rather than hard-coded paths) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3552 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-14 18:56:08 +00:00
sjia	abdc8521ea	Added debug options for FindClosestHLAWalker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3549 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-14 17:52:03 +00:00
sjia	c38390eabb	Added option for min number of matches between reads and alleles required to consider reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3548 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-14 16:08:49 +00:00
sjia	d8c963c91c	Remove PhaselikelihoodsWalker.java git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3544 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-14 15:21:43 +00:00
sjia	5704294f9d	HLA caller updated - now searches all (common and rare) alleles, more efficient read filtering and allele comparison runs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3543 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-14 15:14:40 +00:00
weisburd	06fc5eecf8	Implemented TreeReducible - if num threads > 1, the output will be accumulated in memory and written to a vcf file at the end - in onTraversalDone(..). If num threads == 1, things will work as before - where vcf records are written to disk as soon as they are computed with map(..). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3530 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-10 20:57:23 +00:00
weisburd	fdded73861	Improved error reporting git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3520 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 17:52:48 +00:00
weisburd	c1b7bcc786	Fixed handling of mitochondrial genes - added special cases such as ATT being a start codon in mitochondria. Added warning if a gene doesn't start with Met or end in a stop codon git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3517 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 17:15:47 +00:00
weisburd	4f1181974b	Added toString() method git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3516 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 17:12:57 +00:00
ebanks	9b2fcc4711	Refactoring of the annotation system: 1. VA is now a ROD walker so it no longer requires reads (needs a little more testing) 2. Annotations can now represent multiple INFO fields (i.e. sets of key/value pairs) 3. The chromosome count annotations have been pulled out of UG and the VCF writer code and into VA where they belong. Fixed the headers too. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3513 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 17:05:51 +00:00
delangel	de134c226d	Removed ability of users to specify annotations to recompute, cleanups. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3501 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-08 19:17:59 +00:00
ebanks	4d1a6b3d99	quick changes for G git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3500 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-08 16:33:27 +00:00
delangel	907931c902	a) Update annotations when creating new vcf with Beagle's imputed data. Since genotypes may (will) change based on imputation, several annotations need to be updated. By default, AC, AF, AN and AB will be updated. User can force extra annotations to be updated with -A <annotation> argument. b) Several cleanups and beautifications. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3499 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-08 15:12:04 +00:00
depristo	6eeb1693ca	JEXL2 upgrade. Improvements to JEXL processing including dynamically resolving variable -> value bindings instead of up front adding them to a map. Performance improvements and code cleanup throughout. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3494 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-07 00:33:02 +00:00
delangel	c503f01dcf	More cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3492 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-06 17:41:38 +00:00
delangel	d4c66d6191	a) Small cleanup b) Fix major issue with Beagle likelihood converter: if likelihood triplets from UG end up being too low, then Beagle input file will be produced with 0.00,0.00,0.00 triplet. If all samples at a marker have this issue, Beagle will effectively produce junk. To fix, likelihoods are renormalized before converting to linear space. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3491 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-06 17:31:59 +00:00
depristo	cfa18f6743	Fixing missed update with new Allele in it git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3490 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-04 23:56:34 +00:00
delangel	ef47a69c50	a) First fully functional (sort of) version of walker that parses Beagle imputation output files and produces a vcf with imputed genotypes. More doc/info to follow shortly. Issues still to be solved: a) Walker changes all genotypes based on Beagle data, but annotations on the original VCF are unchanged. They should in theory be recomputed based on new genotypes. b) Current implementation is ugly, dirty, unwieldy and will necessitate a refactoring soon so I can keep my pride. Most aesthetically affronting issue right now is that we read the full Beagle files at initialization and keep them in memory, but a more delicate implementation would just read from files on a marker by marker basis. Issue that currently prevents this is that BufferedReader() instances don't seem to play nice when called from the map() function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3488 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-04 20:37:25 +00:00
weisburd	3ab936181c	Supports the join feature of GenomicAnnotator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3478 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-02 16:29:57 +00:00
weisburd	f5f7217413	Implemented joins git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3477 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-02 16:28:53 +00:00
weisburd	e14ae471a0	Refactored some of the small utility methods git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3475 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-02 16:26:00 +00:00
ebanks	ffeb3fd80d	Thanks to Guillermo, I found a bug in the Unified Genotyper output: GL was posteriors instead of likelihoods. Not a huge deal because the priors were flat, but fixed nonetheless. Also, needed to update Tribble. Minor updates to the Beagle input maker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3461 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-28 19:28:26 +00:00
rpoplin	2014837f8a	VariantOptimizer package is moved to core, renamed as VariantRecalibration, and added to the binary release package. VariantOptimizer walker is renamed to GenerateVariantClustersWalker and ApplyVariantClustersWalker renamed to VariantRecalibrator. Integration tests added, performance tests still to be done. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3458 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-28 18:20:18 +00:00
aaron	871cf0f4f6	Call out ROD types by their record type, instead of the codec type (which was clumsy). So instead of: @Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFCodec.class)) you'd say: @Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFRecord.class)) Which is more in-line with what was done before. All instances in the existing codebase should be switched over. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3457 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-28 14:52:44 +00:00
rpoplin	062b316881	Better Exception message when can't find annotation value in variant recalibrator. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3434 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-25 21:15:50 +00:00
rpoplin	bf530d23de	Variant Recalibrator now makes use of a prior on known/novel status as well as on allele frequency spectrum. The VariantOptimizer walker now clusters with all variants but gives more weight to knowns / hapmap / 1KG / MQ1 sites. The weights are all optional command line arguments. We no longer assign default values to annotations that are malformed. The walkers will crash with exception so as to not cover up potential issues. We only produce titv-less clusters now, and so the titv argument in VO was removed and the WithoutTiTv string that gets added to the cluster file is removed. The wiki is updated to show new example commands. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3433 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-25 21:08:31 +00:00
weisburd	8db7c97c4d	Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3427 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-24 14:38:54 +00:00
weisburd	4aa749c709	Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3426 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-24 14:38:07 +00:00
weisburd	aca3bcb193	Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3425 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-24 14:37:17 +00:00
weisburd	64ed770250	Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3424 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-24 14:36:28 +00:00
depristo	a10fca0d5c	Genotyper now is using bytes not chars. Passes all tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3406 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 21:02:44 +00:00
depristo	727822adb4	BaseUtils has more clear distinction between byte and char routines. All char routines are @Deprecated now. Please use bytes. Better organization of reverse(), now in Utils not BaseUtils. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3400 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 14:05:13 +00:00
depristo	5abac5c057	A few more char -> byte cleanups git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3398 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 00:02:06 +00:00
depristo	8a725b6c93	Restructuring of ReferenceContext and ReadWalkers to accept a ReferenceContext. Now ReferenceContext is byte[] backed not char[]. Please no more chars for the reference. All of the tests pass now. Coming check-ins are going to clean up the char / byte problems in the GATK git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3397 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 23:27:55 +00:00
weisburd	984c51efd3	Updated to use Tribble-based GATKFeature instead of TabularROD git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3390 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 03:42:12 +00:00
weisburd	42ee16f256	Updated to use Tribble-based GATKFeature instead of TabularROD git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3389 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 03:41:37 +00:00
weisburd	d8469e2fba	Updated to use Tribble-based GATKFeature instead of TabularROD git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3388 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 03:40:47 +00:00
rpoplin	9e15299475	Misc cleanup in variant recalibrator. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3380 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-18 17:37:01 +00:00
weisburd	3c022e4b0c	Improved command-line-arg validation at startup. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3374 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-18 02:46:17 +00:00
weisburd	35b4bba35e	Refactored so it could be used for knownGene and CCDS as well as refGene git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3372 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-18 02:44:10 +00:00
weisburd	bb86c0e03a	Improved error message git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3371 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-18 02:43:13 +00:00
kiran	4235164359	Removed the confusionMatrix column (of course this is a confusion matrix... what else would it be?!). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3365 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-14 21:55:37 +00:00
kiran	95b29f608b	Specify default values. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3364 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-14 21:42:53 +00:00
rpoplin	6efd05831b	Encapsulating annotation decoding function in order to use same fixed random seed in both VariantOptimizer and ApplyVariantClusters git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3363 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-14 20:03:38 +00:00
depristo	1538dc0144	optimizer now uses -an arguments instead of exclude and force for clarity. command-line length reduced by 50% git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3361 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-14 15:41:44 +00:00
aaron	cac98ba5ef	a couple of small documentation fixes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3353 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-12 17:40:27 +00:00
kiran	4a7902bb8e	Bases 'A' and 'a' (etc.) no longer considered different. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3339 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-10 14:53:38 +00:00
kiran	b223b04331	Don't list '.' as an alternate allele, dummy! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3337 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-10 14:51:18 +00:00
kiran	7527f950d1	Computes the quality score distribution per readgroup (one column per readgroup) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3335 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-10 14:49:38 +00:00
kiran	c111c15072	Computes the distribution of insert size per library (for now, one output file per library) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3334 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-10 14:48:35 +00:00
rpoplin	33a9549896	Variant Optimizer accepts a dbSNP rod argument to use in determining known/novel status as opposed to using the rsID in the vcf record. VO generates plots of annotation values used in clustering broken out by knowns and novels. Useful for showing which annotations are approximately Gaussian. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3332 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-09 16:48:07 +00:00
ebanks	18f1d31a22	Moving to and organizing in core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3320 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-07 04:05:36 +00:00
aaron	a68f3b2e9c	VCF moved over to tribble. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3302 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-05 17:28:48 +00:00
kiran	510b3efcc2	Fixed an issue where asking for the alternate alleles at hom-ref sites would result in an array out-of-bounds exception. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3292 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-03 18:46:33 +00:00
sjia	94b51de401	HLA caller updated to examine class II loci, updated pointers to dictionary, allele frequencies. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3290 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-03 14:54:52 +00:00
rpoplin	97fdd92e7b	Clean up the code to have a unified approach to calculating p(true) for both with and without ti/tv models git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3289 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-03 13:30:20 +00:00
rpoplin	9d01670f62	Major update to the Variant Optimizer. It now performs clustering for both the titv and titv-less models simultaneously, outputting the cluster files at every iteration. It makes use of the Jama matrix library to do full inverse and determinant calculation for the covariance matrix where before it was using only approximations. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3286 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-02 19:21:23 +00:00
weisburd	a318b1871d	Removed unused column git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3285 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-30 21:29:34 +00:00
ebanks	850f36aa61	Changes to the Unified Genotyper's arguments: 1. User can specify 4 confidence thresholds: for calling vs. emitting and at standard vs. 'trigger' sites. 2. User can cap the base quality by the read's mapping quality (not done yet). 3. Default confidence threshold is now Q30. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3281 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-30 16:44:24 +00:00
ebanks	1714c322c2	Reorg of UG args; checking in first before upcoming changes that will break integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3274 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-30 14:48:46 +00:00
weisburd	ba78d146ec	Finished implementing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3273 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-30 14:14:31 +00:00
weisburd	5d5c7f9d34	Changed short code of stop codon to 'stop' git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3272 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-30 13:55:52 +00:00
aaron	7fbfd34315	adding the GELI ROD validation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3270 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-29 21:43:00 +00:00


1304 Commits (e45b699ac059045f4e762c42b872c42eaf250fb3)