gatk-3.8

Commit Graph

Author	SHA1	Message	Date
depristo	23cb399a88	Reasonable first pass at a correct SB calculation. Simple utilities to support it. VariantsToTable no longer prints filtered sites by default. New non-standard variant eval module to print comp sites not present in eval (FN finder) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4601 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-31 12:41:52 +00:00
delangel	30fae5cf18	Major redo of exact AF computation for UnifiedGenotyperV2. Fact of life is, there's no way we can compute an exact QUAL field and keep performing the AF computation in linear probability space. In good sites with lots of samples, the ratio of Pr(AC=K*\|D) to Pr(AC=0\|D) can be 10^1500 or some ridiculously large number like that, which no double can represent. So, we abandon probability space and work now in log likelihood space, which has several major repercussions: a) Sites are numerically well behaved now, but another hard fact of life is that the AF iteration is defined in linear Pr space, not in log likelihood space, and the math doesn't work out in log space. So, we need to convert back and forth from lin to log space. b) As a consequence of a), the code got a major slowdown, and calling the 629 samples was about 15 times slower than before (sic). c) To solve b), log10 of integers are now cached at init, and numerical approximations are now made. Most importantly, I'm using the approximation that log(exp(a) + exp(b)) ~= max(a,b) which seems almost inconsequential in practical performance but reduces computation time to what it was before. More detailed analyses are forthcoming. This approximation can be refined further on to avoid expensive log-exp conversions if further profiling and analysis deems it necessary. Also, two other issues were solved: a) Strand bias computation was actually wrong in the case where the optimal AC was bigger than max(forward reads,reverse reads). Now the code is exactly as buggy as the grid search model (all bugs are equal, but some are more equal than others) b) Genotype likelihoods are now computed in a better way and if a likelihood < 0 we don't just cap to 0 but do something a bit smarter. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4600 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-31 01:26:04 +00:00
hanna	d492621122	The TraversalEngine's habit of hanging onto old ROD states seems to have a bad interaction with Tribble. In Tribble, keeping these references in memory until the shard is flushed means keeping one 512K character buffer per object in memory. Fixed by purging the reference to the object at the end of the shard traversal. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4599 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-29 17:09:58 +00:00
ebanks	1c056ea791	Users can now use VariantAnnotator to add annotations from one VCF to another. For example, if you want to annotate your target VCF with the AC field value from the rod bound to CEU1kg, you can specify -E CEU1kg.AC and records will be annotated with CEU1kg.AC=N when a record exists in that rod at the given position. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4598 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-29 16:38:31 +00:00
ebanks	1b3fc8ddd2	Doing things too quickly is also naughty. Thanks, Andrey. Now, we're even. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4597 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-29 14:50:04 +00:00
ebanks	58f7b4c595	Naughty use of assertions means that malformed records are not caught. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4596 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-29 14:41:38 +00:00
delangel	9a60e72364	Trivial change to LeftAlignVariants: make walker return number of aligned variants on map(), and print out the # of aligned variants at the end of the traversal. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4595 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-29 02:03:36 +00:00
hanna	2f8057bf24	Cleanup for multithreading memory leak during integration tests...unregister MXBean at end of traversal to avoid holding a reference to the microscheduler, which holds a reference to the engine, which in turn holds a reference to the walker, which itself holds a reference to all the data aggregated during the course of the traversal. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4594 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-28 18:37:42 +00:00
depristo	860de05a7c	Bug fix for PL vs. GL in header. PL now truly default output for UGv2 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4592 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-28 12:39:18 +00:00
depristo	9782dde3dd	Bug fix for PL vs. GL in header. PL now truly default output for UGv2 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4591 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-28 12:38:48 +00:00
ebanks	fe3cfb067c	very minor cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4590 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-28 02:11:33 +00:00
depristo	cbce3e3c83	General support for both GL (log10) and PL (phred-scaled) genotype likelihoods. All walkers now use the Tribble GenotypeLikelihoods object for parsing VCFs with genotype likelihood fields. Please use GenotypeLikelihoods object from now on for seamless support for GL and PL tags. UGv2 now uses PL by default. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4589 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-28 01:48:47 +00:00
fromer	15183ed778	Reduced header to single sample when useSingleSample arg is given (to prevent lots of pointless no-calls) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4588 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 23:02:10 +00:00
fromer	34538bf2b3	Added ability to focus only on a single sample and/or emit only merged records in MNP merger git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4587 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 20:41:05 +00:00
kshakir	5cdd7a7ba4	There's no such thing as a sam index, so the GATK extension generator doesn't need to add an @Input for them. Updated a call to swapExt to specify the directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4586 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 20:39:03 +00:00
hanna	4c23b1fe9c	Get rid of the static cache of ArgumentTypeDescriptors by making them an integral part of the parsing engine. Hugely lowers our memory footprint in integrationtests, but not yet enough to run Mark's new parallelized VariantEvalIntegrationTests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4585 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 19:44:55 +00:00
ebanks	e112df20df	Use a sorting VCF writer because records can flip positions during left-alignment git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4583 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 06:33:03 +00:00
ebanks	708e973911	Adding a walker to left-align indels in a VCF file (was able to reuse code from AlignmentUtils to do the hard part). The code correctly updates the alleles if they change. This makes it much easier to compare our indel calls to e.g. CG or dbSNP. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4582 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 06:08:26 +00:00
ebanks	ec442086ec	Minor refactoring of the cleaner allows me to add a trivial walker that left aligns the indels present in reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4581 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 03:39:10 +00:00
hanna	04e38929f0	Disabling parallelized version of VE integration tests. Still slow, but not deadlocking any more. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4580 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 02:47:03 +00:00
ebanks	ffc0ed2b32	Renamed getName() to getSource() in VariantContext to be more accurate git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4579 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 02:21:41 +00:00
ebanks	52fc023d80	Added convenience methods to check/get the ID of the VariantContext git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4578 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 01:56:58 +00:00
fromer	a7af1a164b	Updated MNP merging to merge VC records if any sample has a haplotype of ALT-ALT, since this could possibly change annotations. Note that, besides the "interesting" case of an ALT-ALT MNP in a pair of HET sites, this could even occur if two records are hom-var (irrespective of using phasing). Note also that this procedure may generate more than one ALT allele. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4577 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 01:50:36 +00:00
depristo	e02aac0743	No longer print out 0 reads were filtered out... message when there were no reads seen at all git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4575 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-26 20:22:16 +00:00
depristo	b085648141	Parallelized VariantEval. Refactored output to support parallel output style. Minor improvements to testing framework to enable easy executeTestParallel to run -nt 1 and -nt 4 by default. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4574 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-26 20:21:38 +00:00
kshakir	8211cee0b2	Queue UI Improvements: - Forcing user to set the temp directory via -Djava.io.tmpdir to avoid filling up /tmp. - By default deleting job outputs tagged as intermediate. - Defaulting pipeline to scatter count 1 (no reads deleted). - Cleaning up temp classes even when scripting fails. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4573 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-26 19:49:08 +00:00
ebanks	cedceb33cd	My only experience with getting external groups (GAP,dbSNP) to use VCF has been painful at best, so I'm not holding my breath to get indels for CG in VCF. To that extent, here's a oneoffs walker to convert from CG format to VCF for all 'del' & 'ins' types (but not 'sub' types, since they're too complex to code up in VCF and I don't care about them for now). rs ids are included. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4572 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-26 17:53:14 +00:00
ebanks	071799453c	More complete fix to previous commit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4571 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-25 20:47:37 +00:00
ebanks	67a776d53c	Yikes! VariantEval was always loading genotypes unnecessarily when no sample list was provided because the order of the checks in the if statement wasn't optimal. This results in a massive performance penalty when running with many-sample VCFs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4570 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-25 20:30:23 +00:00
ebanks	0d97394c4f	Add capability to liftover to do the right thing when sections of the genome are reverse complemented. This does not work for indels (we don't try to reverse complement) because we need to figure out what the hell to do about the fact that the 'base to the left' that we automatically add on will be wrong because the location of the indel actually changes when reverse complemented. Sheesh. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4569 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-25 20:03:03 +00:00
fromer	c357ec775a	Trivially phases any hom site (since it is always correct to continue the previous haplotypes by appending the same allele onto both haplotypes) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4568 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-25 16:58:41 +00:00
rpoplin	da64183854	Fix for the case of the truth VCF file having multiple SNPs at the same locus. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4567 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-25 15:04:50 +00:00
hanna	3039c0de3c	Retire old ROD syntax. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4564 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 23:52:11 +00:00
depristo	78e71c4167	Fisher exact makes a return. Seems to be working properly. Currently tagged as a work in progress. Needs to take the filtered context to be truly correct. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4561 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 20:35:44 +00:00
fromer	f06f955e06	Added count of number of mergeable records (within specified distance cutoff) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4560 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 20:11:15 +00:00
depristo	84b6d2926b	Useful walker that creates a new interval list with only the interval overlapping input sites list. Really a one-off walker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4559 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 19:55:04 +00:00
depristo	78b4a1c240	VariantsToTable now supports the virtual TRANSITION field git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4558 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 19:53:46 +00:00
hanna	e6d61197e6	Disable OTF indexing when writing indices for temporary VCFs when running with -nt option. When last I checked in, Ryan was seeing a ~25% speedup per shard by not indexing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4556 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 17:40:37 +00:00
depristo	e6b008f87c	Fixed >= vs. > test leading to failure to tolerate dynamic indexes that are created at exactly the instant the output VCF is closed too git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4555 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 16:11:14 +00:00
ebanks	72c5b75460	Tribble exceptions can be generated outside of the normal codec parsing code because we now lazy load the VCF genotype fields. I'm not sure how else to account for this (to make sure they show up as user errors and not GATK system errors) besides catching them here. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4554 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 15:22:17 +00:00
delangel	e24f7fec47	Fixed indel genotyper which broke yet again because we can't just call context.getBasePileup() without checking again for its existence in the first place. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4553 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 15:17:11 +00:00
ebanks	c0b4317311	Er, here's the right fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4552 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 15:08:25 +00:00
ebanks	181f901126	Fix for Ryan: don't pull reference sequence for the portions of reads that extend beyond the contig boundaries git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4551 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 14:38:26 +00:00
ebanks	9f76aed515	Fix for IDs 5zP7jJeffK2sdPH1BH4JBVSrQztVEDKP and nX0cuBjoqBW4NQFpM6dE13KpkCuYFpZu git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4550 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 14:05:27 +00:00
hanna	d4feb99d9a	For parallel ROD traversals, simplified reference sharding. Will replace with a more sensible strategy for sharding w/o BAMs at some point after ASHG. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4549 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 05:08:15 +00:00
fromer	9ba7269728	Fixed Integration Tests to output VCF files with -NO_HEADER git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4548 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 19:49:44 +00:00
fromer	60f88866dd	Uses VCFConstants instead of hard-coded constants git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4547 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 19:49:01 +00:00
fromer	883b8ff80e	Removed flush() method from VCFWriter interface; added takeOwnershipOfInner parameter in constructor of wrapper VCFWriters to designate if the Writer should close the inner Writer it receives on construction git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4546 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 19:48:00 +00:00
fromer	1ea43be976	Removed flush() method from VCFWriter interface git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4545 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 19:46:42 +00:00
chartl	3566ad2146	Wrong if statement. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4544 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 17:37:45 +00:00
chartl	bf17f92b64	Do not look for samples in dbsnp binding git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4543 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 17:36:38 +00:00
ebanks	225cf49128	Implementing reference confidence estimate in UGv2 as per UGv1 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4542 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 16:57:59 +00:00
delangel	cf9c9ae241	Three important updates for Dindel genotyper: a) Fix it up because it broke with a recent checkin to annotate vcf with unfiltered depth. b) Printout of ref/alt alleles in output vcf was incorrect because the start/stop positions of associated GenomeLoc were incorrectly computed in case of a deletion. c) Redid Beagle input/output walkers so as not to assume that ref was a single base, nor that variant was a vcf, and generalized them to be indel-capable, so now the Beagle walkers can be used for indels as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4541 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 16:00:16 +00:00
kshakir	b88cfd2939	Updated MD5s of VCFs, since the approximate command line arguments injected into the VCF headers now have a little more order to them thanks to changes in the ParsingEngine. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4538 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 03:07:40 +00:00
ebanks	8f38ebf98e	Throw a user exception when using the clustered SNP filter in the presence of ref calls. It's unfortunate, but until we get a windowed ROD context this is just too much of a headache to support. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4537 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 02:44:10 +00:00
kshakir	88a0d77433	Changed parsing engine to store the order the argument bindings based on their definition in the class, moving "-T" to the front of Queue command lines. Queue GATK generated .intervals is now a List(File) again removing special case handling in the generator. Instead of using @Scatter annotation, using ScatterFunction instance to determine if a job can be scattered. Implemented special VcfGatherFunction which only uses the header from the first file, even if the other files differ in their headers. Added a -deleteIntermediates to Queue to delete the outputs from intermediate commands after a successful run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4536 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 21:43:52 +00:00
ebanks	91049269c2	Optimizations across the board, with help from Guillermo, Matt, and JProfiler. Too tired to give details now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4535 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 20:47:41 +00:00
fromer	f76865abbc	ReadBackedPhasing now uses a SortedVCFWriter to simplify, and has the ability to merge phased SNPs into MNPs on the fly [turned off by default]; MergeSegregatingPolymorphismsWalker can also do this as a post-processing step; Integration tests for MergeSegregatingPolymorphismsWalker were also added git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4534 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 20:27:10 +00:00
fromer	e8079399ac	Added flush() method to VCFWriters git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4533 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 20:23:22 +00:00
fromer	00726b6c4b	Added mergeIntoMNPs to merge successive VCF records into a single MNP VCF [if possible] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4532 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 19:40:26 +00:00
fromer	55230ce5f3	Added startsBefore, startsAfter, and minDistance [calculates distance between any pair of bases in the two GenomeLocs] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4531 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 19:12:34 +00:00
ebanks	4f77581087	More optimizations for HaplotypeScore: pulling final constants out of loops git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4530 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 17:40:57 +00:00
hanna	20fac43521	Add extra logging to the GATK run report at the start of metrics aggregation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4529 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 17:32:51 +00:00
ebanks	a205900eff	Naughty use of Strings in HaplotypeScore literally doubled the runtime of Unified Genotyper. Moved over to bytes and no longer allow Strings in the Haplotype util class. New round of profiling on tap for tomorrow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4528 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 03:32:21 +00:00
depristo	f9541b78d3	Timing of traversal now starts at the start of the traversal, so the rate is reasonable right off the bat. For example, we now see: INFO 22:45:02,476 TraversalEngine - [TRAVERSAL STARTING]; INFO 22:45:32,484 TraversalEngine - [PROGRESS] Traversed to 2:50850686, processing 18,646 sites in 30.05 secs (1611.50 secs per 1M sites) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4527 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 02:47:34 +00:00
depristo	f7ce18553e	GenotypeConcordance now prints interesting sites more nicely. RMDTrackBuilder now uses the root class FeatureSource, not BasicFeatureSource. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4525 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 00:29:02 +00:00
ebanks	7a291a8ff3	First pass at a VCF validator. Will test more tonight. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4524 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-19 19:55:49 +00:00
chartl	341e93ee12	The reference fixer seems to have munged the OMNI rather than making it better. Looks like some sites need to only have the ref and alt bases swapped, and others need to have the genotypes swapped as well? E.g. some subset need A C 1/1 --> C A 0/0 while another subset need A C 1/1 --> C A 1/1 it's unclear how big these subsets are (or even if one is empty). What I do know is, doing the first one totally screws up concordance metrics for the 421-sample chip. So either something else needs to be done, or there's a bug in this walker. Until I know for sure, I've added an initialize exception to disable this thing... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4523 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-19 12:50:24 +00:00
ebanks	5251f49a90	Including Marian Thieme's BaseCounts class (with some modifications) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4522 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-19 03:07:30 +00:00
hanna	c5f105d050	Fix boneheaded mistake in the new interval filtering code I added on Sunday. Sorry everyone. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4521 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-19 01:20:12 +00:00
ebanks	524cb8257c	Renaming for accuracy git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4519 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-18 18:11:07 +00:00
ebanks	0fe504b748	Use filtered depth for Exact model (just like grid search) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4518 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-18 18:08:31 +00:00
ebanks	d54d9880d7	Now that G's new genotyping algorithm is live, I've cleaned up the code to completely separate the grid search from the exact model. AlleleFrequencyCalculationModel is now completely abstract. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4517 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-18 18:04:06 +00:00
ebanks	80e5ac65b4	CAP_BASE_QUALITY needs to be included in the clone() method for it to be usable in UG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4516 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-18 03:11:03 +00:00
hanna	6af9532090	Fix for GATK slowdowns at the ends of intervals. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4514 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-17 23:21:23 +00:00
chartl	5889138f4a	facepalm forgot to add the samples to the header. How could the VCFWriter let me get away with something so boneheaded?! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4513 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-17 05:36:29 +00:00
chartl	2bc5971ca1	Added - a tool to fix reference bases of a VCF. The OMNI had a couple of sites with incorrect reference bases (look to be legacy from other chips), and a few more that had ref and alt flipped. GAP should probably take care of it, but since I need results by monday, I'm doing it. Modified - SelectVariants: Hook up to VariantContextUtils to recalculate AC/AF/AN, which uses the accessor in VariantContext to do this. Somehow sites that were selected down to hom-ref genotypes only wound up getting positive AC. IMPORTANT I kind of need input here. The header of a file used for an integration test specifies AC as being an integer. Recalculating it casts it into an integer list (which it should be, as it allows for alternate alleles). However this appears to clash with what the jexl expression is looking for? For now, the integration test itself needed to be changed -- it's unclear what to do when the header specifies AC of being one class, but recalculating it casts to another class, and I'm not sure what to do. I'm committing my omni_qc pipeline because I'm almost certain 2 months down the road I'm going to wonder what the heck I did to generate my results. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4511 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-17 03:18:01 +00:00
ebanks	7aa030a9a4	Hmm. Apparently variants can get lifted over to different chromosomes. Who knew? Reverting changes from a couple of days ago. The only way to do this correctly (without requiring lots of memory) is to turn off on-the-fly indexing for this walker. Integration tests cover this now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4510 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-17 02:54:12 +00:00
chartl	8b2d387643	Added in an eval module that calculates the dispersion histograms between eval and comp (e.g. M_{i,j} = # of times eval observed to have AC i, comp AC j -- for af it's i/100 vs j/100 ) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4507 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 19:07:43 +00:00
ebanks	f78ff08e2b	This is less correct than my previous change but it's what UGv1 does and now is not the right time to start mucking with things. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4506 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 18:56:45 +00:00
ebanks	471c18054f	Fix for SB calculation: the best overall AF might not have any mass when just looking at reads from a single strand. We need to compute the best AF for each stratification. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4505 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 17:51:18 +00:00
asivache	42c3d74432	bug fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4503 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 16:27:40 +00:00
chartl	c9d473edee	More changes to Variant Eval and Genotype Concordance (passes all integration tests): 1: -sample can now include a file, which will be parsed for sample-name entries 2: If you request a sample to run analysis on, but it is not present in any of your RODs, VEW will exception out 3: Change added to parse Integer, String, and List<Integer> type Allele Count annotations (error otherwise) 4 [slightly problematic]: The count objects now maintain row-keys in order, as the keys were taking an inordinate amount of time in onTraversalDone (multiple calls to getRowKeys(), so many multiple sorts of the same underlying unsorted object, very bad) There is a legacy comparison object which is unused which I will strip out soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4502 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 12:40:36 +00:00
ebanks	954dd84f51	Adding an integration test (against hg18 this time) that requires on-the-fly sorting in order to work properly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4500 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 07:45:21 +00:00
ebanks	9f54170dff	Hooking up the liftover tool to the new on-the-fly sorting VCF writer so that records can now get emitted in order. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4499 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 07:27:01 +00:00
ebanks	d41c252b13	Looking over the calling results with Ryan, it's clear that while the grid search optimization (ignoring samples that are clearly ref) can work for assigning genotypes, it cannot be used for calculating P(AF>0). There's too much area under the likelihood curve that gets lost and the QUALs are negatively affected. However, testing showed that this only slightly affects runtime (~15 minutes per 1Mbase for the 1kg allpops). The optimization does remain for genotyping. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4498 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 19:06:32 +00:00
ebanks	2606e67cf1	Reverting Matt's change from yesterday which I accidentally blew away when trying to cope with the stupid svn update issues we've been plagued with recently. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4495 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 14:40:42 +00:00
ebanks	cfb33d8e12	Filtering optimizations are now live for UGv2. Instead of re-computing filtered bases at every locus, they are computed just once per read and stored in the read itself. Eyeballing the results on the ~600 sample set from 1kg, we cut out ~40% of the runtime! QUALs are now sometimes different from UGv1 because I noticed a bug in v1 where samples with spanning deletions only were assigned ref calls instead of no-calls which ever so slightly affects the QUAL. Not a big deal though. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4494 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 05:04:28 +00:00
chartl	4ac636e288	Minor change: when tabulating concordance by AC, ignore sites with multiple segregating alleles in the population, at least for now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4493 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 01:35:33 +00:00
chartl	7c9ef59d65	This is simultaneously a minor and major change to VariantEval, so take heed: The core walker has been modified so that when variant contexts (eval and comp) are subset to command-line-specified sample(s), the chromosome count annotations (AC/AN/AF) are altered to reflect the AC/AN/AF of only those samples involved in the comparison. No more getting AC500 when you're comparing a 10-sample overlap. Interestingly enough, this didn't break any integration tests. GenotypeConcordance now has two additional tables: Allele Count Statistics, and Allele Count Summary Statistics. These work exactly identically to the Sample Statistics and Sample Summary Statistics tables, except that the partition being used is no longer the sample, but instead the allele count of the variant sites. These tables stratify by both eval and comp ACs, e.g. evalAC0 evalAC1 evalAC2 compAC0 compAC1 compAC2 Differences with previous integration tests were verified to only be in the Allele Count tables (by grepping them out of the diff); a new test has been added for the simple case of an AC=1 site in the eval becoming an AC=2 site in the comp. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4491 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 22:26:15 +00:00
hanna	83b8676b69	Hack to fix mysterious disappearing read attributes. Ultimately caused by the fact that the GATKSAMRecord, by design, needs to both inherit from SAMRecord and wrap a 'member' SAMRecord, and method calls that aren't implemented as explicit passthroughs can compromise the content of the SAMRecord in subtle ways. Will be automatically fixed when Picard moves to a lightweight SAMRecord interface rather than the current heavyweight implementation. But in the short-term, there's no obvious fix. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4489 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 19:06:54 +00:00
depristo	da29fcdb68	No longer writes the index to disk twice. But fixes for closing VCFWriters throughout the codebase git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4488 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 14:26:06 +00:00
aaron	28a1020c89	comment out debugging line that was clogging the performance test output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4487 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 03:26:55 +00:00
aaron	272ac2ae4a	more fixes for tests broken by indexing-on-the-fly; I think this should do it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4486 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 01:54:32 +00:00
hanna	ed39af53cd	Fix for exception when trying to load reference segment for a read that aligns to 0 bases. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4485 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 23:50:51 +00:00
ebanks	fe9f128631	Better fix for earlier bug. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4484 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 19:21:33 +00:00
aaron	ff0df1a2da	A fix for an integration test that was broken by on-the-fly indexing. Also, better reporting of Tribble exceptions in GATK integration tests. Trying to get the tests back up and running... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4483 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 18:39:56 +00:00
ebanks	69652e08c6	Bug fix for reads that completely fall within an insertion: the I cigar string element was 1 base too long. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4482 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 14:46:21 +00:00
kiran	f348ca2976	Now processes VCF files with repeated loci without crashing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4481 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 04:36:07 +00:00
ebanks	fd8351cd49	Get rid of useless test/'optimization' that was carried over from UGv1. New code is (minimally) faster with same results. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4478 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-11 04:04:07 +00:00
ebanks	f28523e7de	Implemented SB for UGv2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4477 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-11 03:56:01 +00:00
hanna	7008a469dc	Update MalformedReadFilter to pass reads that have cigar strings like 40S36I that have 0 aligned bases in the genome. We'll have to fix walkers as faults appear. Also added JIRA GSA-406: finer-grained control of MalformedReadFilter: want to exception out by default in these cases but pass them with a warning with a corresponding -U flag. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4476 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-11 03:01:04 +00:00
ebanks	530875817f	Experimental code for better filtering of bases in sam records. Not hooked up yet. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4475 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-11 02:19:51 +00:00
ebanks	a0de269c4b	Better message git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4474 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-10 20:11:51 +00:00
rpoplin	0a4cf02a52	Fix for index out of bounds exception in VR. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4473 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-10 17:35:15 +00:00
depristo	116309b3c3	More test cases for UG integration test. We currently fail doing multi-threaded gzip output, FYI git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4472 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 20:22:12 +00:00
depristo	38a67fed63	High performance version of standard vcf writer. New general static Tribble class for common constants, including general .idx constant and functions to get standard index name for a given file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4471 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 19:53:21 +00:00
fromer	bdd3a9752e	Changed min MQ and BQ to 20 (for phasing) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4469 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 19:27:45 +00:00
asivache	05500d1a8d	An iterator wrapper/adapter: takes GenomeLoc iterators 1 and 2 and traverses intersections of intervals from 1 with intervals from 2. Both 1 and 2 must be SORTED and NON_OVERLAPPING, but this iterator does NOT perform any checks, so if these conditions are not met, the behavior is unspecified git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4468 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 16:34:00 +00:00
asivache	253d528e49	not ready for commit yet git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4467 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 15:30:55 +00:00
asivache	4f2f33b42a	fix method invocation to conform to new API; this version of the code will compile but new functionality is still not fully in git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4466 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 15:30:26 +00:00
asivache	cece19d4d2	not ready for commit yet git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4465 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 15:14:54 +00:00
asivache	39e373af6e	deleting accidentally committed junk git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4464 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 15:13:01 +00:00
asivache	b3d81984aa	renaming MergingIterator to RODMergingIterator as it is more appropriate for this specialized implementation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4462 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 14:10:11 +00:00
asivache	77dddd0afa	renaming MergingIterator to RODMergingIterator as it is more appropriate for this specialized implementation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4461 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 14:08:28 +00:00
chartl	21ec44339d	Somewhat major update. Changes: - ProduceBeagleInputWalker + Now takes a validation ROD and a prior to give it, will use those genotypes in place of the variant genotypes if both are present + Takes a bootstrap argument -- can use some given %age of the validation sites + Optionally takes a bootstrap output argument -- re-prints the validation VCF, filtering those sites used as part of the bootstrap -BeagleOutputToVCFWalker + Now filters sites where the genotypes have been reverted to hom ref + Now calls in to the new VCUtils to calculate AC/AN -Queue + New pipeline libraries for easy qscript creation, still a work in progress, but this is a considerable prototype + full calling pipeline v2 uses the above libraries + minor changes to some of my own scripts + no more need for contig interval lists, these will be parsed out of your normal interval list when it is provided git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4459 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 13:30:28 +00:00
ebanks	97b153f2fa	Quick fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4457 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 06:10:52 +00:00
ebanks	acd238f3f2	For Chris: pull out the chromosome counting code into VCUtils so that other tools can make use of it. Transitioned SelectVariants over to use it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4456 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 04:37:54 +00:00
delangel	3838823262	Two ugly hopefully temporary fixes for new genotyping model: a) In Indel genotyper: we can't deal yet with extended events correctly and we are still triggering at each extended event which results in repeated records on a vcf. So, to avoid this, keep track of start position of candidate variants we've visited and if we've visited a variant before we don't do it again. b) Avoid infinite terms in QUAL and in genotype likelihoods which can happen if posterior AF happens to be exactly zero. For now, hard-code a minimum value of each term of the posterior AF likelihood to be -300 (ie 1e-300 in lin space). This can be solved with better and smarter log-to-lin conversions and some precision fixes in AF calculation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4455 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 00:53:16 +00:00
depristo	0a2e76e9dc	2nd step towards on the fly indexing. Also fixed parsing bug for headers with < symbols git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4454 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 21:38:46 +00:00
rpoplin	7bb9704592	Update the BeagleOutputToVCF integration test because of removing the source header line. Source headers are provided by the engine for all VCF files now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4453 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 19:55:57 +00:00
rpoplin	0de658534d	Removed the qScale arguments in VariantRecalibrator. It is smarter about how it tries to find a cut so the arbitrary scale factor hopefully is no longer necessary. Now the recalibrated variant quality score more accurately reflects our believed lod of the call. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4451 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 18:04:57 +00:00
fromer	ee00dcb79d	1. Phasing now ignores bases without minimum base quality (BQ) and minimum mapping quality (MQ); 2. The probability of a non-called base is now divided by 3, to evenly split up the error probability over the non-called bases git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4450 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 17:40:59 +00:00
ebanks	6205910f9f	updating integration test for Sarah Calvo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4449 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 04:03:37 +00:00
fromer	652a3e8de5	Added integration tests for ReadBackedPhasing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4446 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 20:50:32 +00:00
fromer	f8f1cc45a3	Now ReadBackedPhasing caps Base Quality by Mapping Quality git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4445 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 20:48:57 +00:00
scalvo	bda427f078	Change specification of AnnotationInputTable, and fix 2 bugs. Previous output spec contained 3 columns: haplotypeReference,haplotypeAlternate,haplotypeStrand where haplotypeReference was always on the + strand, and haplotypeAlternate was on the strand specified by haplotypeStrand. The new specification contains 3 columns: haplotypeReference,haplotypeAlternate,transcriptStrand where haplotypeRef and haplotypeAlt are required to be on the + strand. transcriptStrand now specifies the strand of the transcript, which is needed for interpreting the haplotypes. Bugfix #1: fix incorrect assignment of variantCodon and variantAA (Previously variantCodon was incorrectly set to referenceCodon) Bugfix #2: fix incorrect codingCoordStr values for - strands (bug reported by Giulio Genovese), and incorrect usage of "m." for mitochondrial transcripts (bug reported by Steve Hershman) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4444 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 20:46:09 +00:00
scalvo	b5c127e643	Removed HAPLOTYPE_STRAND_COLUMN; Previously, GenomicAnnotation allowed a user to specify the strand of the haplotypeAlternate, and would reverseComplement the haplotypeAlternate if HAPLOTYPE_STRAND_COLUMN was "-". The new specification does not allow this functionality, and instead requires both the reference and the alternate haplotypes to be on the + strand (as in VCF format). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4443 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 20:37:41 +00:00
kshakir	ca5db821ce	Added the ability to Queue to run scala functions inside the JVM. NOTE: Extend from InProcessFunction instead of CommandLineFunction to use this functionality. Queue now submits new LSF jobs only after previous functions have completed successfully. When the Queue process is shutdown (ex: via Control-C) sends a bkill command for any running jobs. Ported commands like creating directories and scatter/gather interval list to scala functions. Updates to LSF status tracking by porting the python to internally generated bash scripts. Temporarily disabled job name submission to LSF. Plus side is that the full command is now available in "bjobs -w". TODO: Put back jobName passing to LSF based on an option? Changed BaseTest to allow scala to access paths to references. Changed the extension generator to default the analysis name to the walker "name". git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4442 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 18:29:56 +00:00
ebanks	3c5dc675ab	For Guillermo: only decide that something is a clear reference call if it is at least 10 times as likely as the next best genotype git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4441 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 15:16:41 +00:00
depristo	00491fcd2e	The "not writing GATK Run Report" message now appears only when running with debug enabled git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4437 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 14:09:21 +00:00
rpoplin	69485d6a7a	Added command line argument for the max value of the allele count prior in VariantRecalibrator (--max_ac_prior). Default value increased to 0.99 from 0.95. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4436 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 14:00:53 +00:00
ebanks	3d564f4a29	reverting an accidental change from the dindel merge git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4434 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 03:08:09 +00:00
ebanks	b5e148140b	Officially fixed the UG priors; updated the default min MQ/BQs to pipeline values of q20 and min calling threshold to Q50 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4431 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-05 18:35:36 +00:00
fromer	c6668bd49c	Fixed bug in phasing, where mapping probability was incorrectly raised to the power of number of non-null bases [instead, it is just multiplied into phasing probability once] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4430 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-05 17:07:31 +00:00
hanna	250c18e679	Error message fixes for the following issues: nvjpM4yOwQAu3fNGxi4oXLuVpKn6aAlf,1GL0OuXK2xKQfvbu34tWYgbojSVSLo0l, ehEGBJOfgc4V7qj8W0Homf5ICuVK5Sm3,cZsreLm1CbY3aYKZhV7DOSvQNwur41zp, GlrlyGEyP9kJDIRCQNFQp7BGJBXSzdDJ,hyz1uiHXr39ANmdZu9K1epOSX8EL3mDw, q0n4EucZESCI4LZhQik306zD4VAuH2cb. Messages: camrhG5tHzlY9WUSEVpVZGkU1tyJqKb5,s0OX2g7nYRctJxyFoQCa6clac9IsjHyi, THIAtjllvYNlnTmiMnJEIHd2Ju4gqQIO,jwVk3JYZJNHloW7HO4LeGxFexknqro0v, BFNRGOGmGGJNNPZqgeF1ikTNFfskbyLc,... Were fixed in 4392. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4428 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-05 03:37:13 +00:00
ebanks	aa00801108	remove reference to -mrl git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4423 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-04 17:27:01 +00:00
chartl	f978c25b9d	Perhaps both, Eric. Perhaps both. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4422 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-04 13:56:04 +00:00
chartl	0eb777612a	Swap "." over to VCFConstants.MISSING_DEPTH_v3 Why v3, you ask? Why not? Simply because v2 was a String so old and clunky, the sun would fizzle out and grow cold before any VCF could be successfully parsed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4421 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-04 13:41:41 +00:00
chartl	74087c44ae	Fixed a bug which caused a parsing exception when there was a variant with a dp field of ".", e.g. "GT:DP 0/1:." -- which can happen when using imputation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4420 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-04 12:37:36 +00:00
ebanks	6448753cf7	Removed the SequenomValidationConvertor and renamed it VariantValidationAssessor since it no longer handles ped/sequenom files (but instead works on vcfs/variantcontexts). Updated all of the wiki docs, including adding instructions on how to convert ped files to vcf, a la Shaun Purcell. We now officially no longer support ped files everyone. Other misc cleanup in the code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4419 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-04 02:11:38 +00:00
ebanks	d8db48204e	Fix typo and tell people not to post user errors git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4415 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-03 18:58:03 +00:00
ebanks	490e5e1b0f	Better error when bad ref bases are provided git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4414 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-03 05:40:37 +00:00
aaron	64b7b3f83b	fix for a recent change to the indexing code where we ignore the results of locking the file (this is bad), and as a result don't write the index; this should fix the build. Off to Yosemite in 4 hours, enjoy the week gsa folks! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4410 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-02 04:35:11 +00:00
depristo	7551ba8249	Trivial refactoring in preparation for on-the-fly indexing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4409 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-01 22:32:59 +00:00
rpoplin	2f7892601c	Useful debugging argument added to VariantRecalibrator to only use sites whose qual field is above --qual git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4406 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-01 21:08:55 +00:00
hanna	575c38fc04	Accidental fail to commit missing file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4405 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-01 20:26:51 +00:00
delangel	d4398f2686	silly bug fix: if I'm to do a short term hack to avoid -infinity likelihoods I might as well do it right. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4403 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-01 18:39:45 +00:00
hanna	8d25a5f9f2	A mechanism for supplying attribution text -- mainly useful for external walkers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4402 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-01 18:31:19 +00:00
delangel	e920badcc4	Temporary fix for case where genotype likelihoods are exactly (1,0,0) or (0,1,0) etc. at a site with new indel genotyper: this would make us blow up when converting to log space and try to assign genotypes at a site. A more robust solution is in the works. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4401 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-01 17:43:43 +00:00
rpoplin	b83fdf8a17	Bug fix in AnalyzeAnnotations. Be sure the site is a biallelic, unfiltered SNP. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4400 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-01 13:09:46 +00:00
delangel	fa9c21c020	More fixes for exact AF calculation model in new unified genotyper: a) Fixed bugs in new dynamic programming-based genotyper b) Fixed up temp hack that handles extended pileups for now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4398 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-01 02:32:50 +00:00
delangel	eb67aee732	bug fix: forgot to uncomment code to compute genotype likelihoods git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4397 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 21:38:22 +00:00
delangel	ece694d0af	Next iteration on new UG framework: - Brought over exact AF estimation from branch (which is now dead). Exact model is default in UnifiedGenotyperV2. - Implemented completely new genotyping algorithm given best AF estimate using dynamic programming, which in theory should be better than both greedy search and any HWE-based genotyper. - Integrated and added new Dindel likelihood estimation model. - Corrected annotators that would call readBasePileup: since we can be annotating extended events, best way is to interrogate context for kind of pileup and either readBasePileup or readExtendedEventPileup. All changes above except last one are still in playground since they require more testing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4396 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 21:33:59 +00:00
hanna	4ea73bcfb1	Basic unit tests for WalkerManager. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4394 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 19:27:41 +00:00
hanna	bf7fd08810	Fix newly-introduced bug in the PluginManager/DynamicClassResolutionException where, when the system can't find a plugin of the correct name, the system prefers to crap all over itself and throw an unintelligible NullPointerException rather than displaying an intelligent error. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4393 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 19:07:05 +00:00
hanna	14e19f4605	(Slightly) better exception text when SAM/BAM output file can't be created. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4392 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 18:43:22 +00:00
hanna	1fb8c86f6d	Looks like we've got two competing models for an empty interval list: null and the empty list. Score another victory for the integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4391 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 17:11:47 +00:00
hanna	78343be52c	At some time in the recent past, we lost our ability to process the '-L all' argument. Brought it back, and added an integrationtest to make sure it stays around. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4390 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 15:58:43 +00:00
delangel	e80742e72f	Use -o as argument for output file in ProduceBeagleInputWalker, to be consistent with other walkers (you're welcome, chartl :)). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4386 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 22:46:39 +00:00
hanna	732aa32758	Every Sting app from now on will be forced into the US English locale. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4385 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 21:55:21 +00:00
fromer	20ffe484bc	Added detection and INFO field marking of phasing inconsistencies (and optional filtration using --filterInconsistentSites) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4384 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 19:28:56 +00:00
rpoplin	a6c7de95c8	By using the AC info field instead of parsing the genotypes we cut 78% off the runtime of VariantRecalibrator. There is a new argument to force the parsing of genotypes if necessary. Various other optimizations throughout. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4383 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 18:56:50 +00:00
ebanks	2d1265771f	Fix for G: make sure to generate the genotype conformations in the grid for the target frequency when not using grid search for anything except the conformations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4382 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 16:44:53 +00:00
delangel	4556e3b273	First iteration in filling up exact AF calculation with new refactored UG. Code computes EM iterations of exact AF spectrum and returns to caller. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4381 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 16:21:54 +00:00
ebanks	0d71dff928	Small bug fix to the new UG (need to initialize the entire posteriors array) means that we also get identical results as old UG when calling with 60 samples in the pilot1 data. Now that I'm happier with UGv2, I've transitioned it to use the correct AF priors instead of the busted ones still in the old UG. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4379 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 14:24:50 +00:00
hanna	eee134baf2	Chris found a bug in the downsampler where, if the number of reads entering the pileup at the next alignment start is large, we don't add as many of those incoming reads as we should. No integration tests were affected. Thanks, Chris! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4378 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 11:18:12 +00:00
ebanks	0ec07ad99a	Initial version of refactored Unified Genotyper. Using SNP genotype likelihoods and GRID_SEARCH AF estimation models, achieves the exact same results as original UG on 1-2 samples with the exception of strand bias (not implemented yet); other than that I have no idea. Needs tons more testing. Do not use. For Guillermo only. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4377 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 08:42:25 +00:00
kshakir	6df7f9318f	For enums generate the full path to the Enum type to avoid collisions such as enum Model and enum Model used in the same class. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4376 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 05:28:59 +00:00
fromer	e322e71c2f	Restored SVN history for phasing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4373 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 00:02:02 +00:00
fromer	720aaca8a0	Trying to restore SVN history for phasing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4372 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 23:50:28 +00:00
fromer	bf88117ead	Trying to restore SVN history for phasing directory git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4371 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 23:48:24 +00:00
fromer	dfb5143a41	Restore folder git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4370 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 23:46:07 +00:00
fromer	7c909bef82	Moved phasing classes out of playground! The code is still under production, though... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4369 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 23:21:28 +00:00
fromer	8d8980e8eb	Fixed phasing algorithm to: 1. More correctly weed out irrelevant reads and sites; 2. Crudely flag sites with large phase discrepancies between reads git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4368 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 23:02:53 +00:00
chartl	5a5c72c80d	Accidentally committed some debug output to PackageUtils, reverting change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4367 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 21:58:42 +00:00
chartl	862c94c8ce	Small change for Matt -- output partition types in lexicographic order. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4365 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 20:08:03 +00:00
ebanks	7ad87d328d	Make sure to uppercase ref bases since they aren't coming from the engine git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4364 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 19:05:46 +00:00
bthomas	96cccafb0d	Adding a few helper methods for accessing sample metadata, and associated unit tests. These are motivated by discussion with Ryan about how he'll use sample metadata in VariantEvalwalker - hopefully will make it easier for him. Methods are: -- getToolkit().subContextFromSampleProperty(): filters a VariantContext to genotypes that come from samples that have a given property value -- getToolkit().getSamplesWithProperty(): gets all samples with a given property -- getToolkit().getSamplesFromVariantContext(): sample objects that are referenced by name in a VariantContext git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4361 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 02:16:25 +00:00
ebanks	1034853a84	Adding 'solexa' to list of known/supported platforms git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4357 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-27 02:38:38 +00:00
aaron	70f03a7113	first pass of well-formatted tribble exceptions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4352 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-25 03:29:33 +00:00
kshakir	edaa278edd	Removed cases where various toolkit functions were accessing GenomeAnalysisEngine.instance. This will allow other programs like Queue to reuse the functionality. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4351 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-25 02:49:30 +00:00
hanna	497bcbcbb7	Recent changes to the build system make the build system complain loudly about pieces of core that depend on playground. Most of these have been eliminated by (temporarily) promoting Aaron's report system to core in this checkin. I'll follow up with other changes separately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4350 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 22:09:12 +00:00
hanna	6ebca5d219	Enhancements to build external projects for walker sharing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4348 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 21:17:16 +00:00
corin	eb1fa4bff3	changes an argument to an output so I can use it to track dependencies in queue git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4347 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 21:07:09 +00:00
depristo	745b8cc6d3	GATK now detects and throws a UserException when lexicographically sorted human data is provided git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4343 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 15:19:48 +00:00
rpoplin	1931b2e1bd	Three fixes for VariantFiltrationWalker: Trying to filter an empty VCF file will produce a well-formed VCF file with zero records instead of a blank file, needed for pipelines. The first record's genotype info fields are now in the same order as all the others. The VCF header lines are pulled from just the input variant rod instead of from all rods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4341 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 13:52:56 +00:00
kshakir	4ed9f437e9	Sliced the GAE in half like a gordian knot to avoid the constant merge conflicts. The GAE half has all the walker specific code. The new "Abstract" GAE has the rest of the logic. More refactoring to come, with the end goal of having a tool that other java analysis programs (Queue, etc.) can use to read in genomic data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4339 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 23:28:55 +00:00
rpoplin	0c9fabb06f	Fix in AnalyzeAnnotations, somebody changed it to look for ID in the vc's info field. This dinosaur desperately needs integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4338 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 19:48:44 +00:00
hanna	0c781968fb	Tried to do a bit of pre-commit refactoring and screwed it up. Fixed. Thanks to Ryan for identifying the problem. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4336 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 18:17:29 +00:00
depristo	d081b9b352	Improvements to error messages about @Requires and @Allows git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4334 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 12:08:27 +00:00
hanna	7841b301c4	Added more diagnostics so that I have some idea of what a 'general' exception is. Required to fix bug ZjhCJAdwhtFq1x54ZlmlN8pFNcbrRpdJ and similar. We might want to change this particular case to a ReviewedStingException after we gain a bit more experience with it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4333 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 21:32:01 +00:00
fromer	44ccfc3531	Updated Phasing algorithm + evaluation module to properly implement haplotypes [including homozygous genotypes]; Implemented dynamic window phasing model for LARGE increase in efficiency git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4332 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 21:29:58 +00:00
hanna	8f75d88519	Fix for GATK run report ids: mOVsxGfDiiSMxVs2PPTVjzYTVbizlD6e f9kUHUADFsZ0LiTGxRL5zPmq9kZcA4cQ 8eGHWJFAlBVmgxwPi3sMd1RmiN2PwHOf iLhvHWveypKb2F8vKS5irHylc3pYvlOb HDttXKUMEVoPrvVeWrH7E0htxYyNydMx plus a bit of cleanup of custom exceptions in the sharding system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4330 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 19:49:25 +00:00
kshakir	20b38b38f3	Updated from SnakeYAML 1.6 to 1.7. Added a pipeline java bean and YAML utility to serialize java beans. Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format. Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference. More changes to come as this code gets tested out in the fullCallingPipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 19:47:49 +00:00
hanna	fb5d595ef0	Disable VCF header output in the Beagle integrationtest. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4327 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 16:50:03 +00:00
hanna	0c99c97685	The engine now automatically adds the command-line arguments to the header of every VCF, unless -NO_HEADER is specified. Changed integration tests, adding the -NO_HEADER argument, for walkers that previously did not include the command-line arg headers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4326 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 15:27:58 +00:00
aaron	1af9ca6d45	enabling tests that now pass with the contig length validation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4325 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 22:20:50 +00:00
depristo	522830fb01	Support for --assume-single-sample in UG, better malformatted bam exceptions, and ignoring out of order contigs in seqdictutils. All for the CG bam file git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4323 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 20:33:34 +00:00
aaron	3938d53738	one broken build short of the hat trick. Fixing the unix test which expects the sequence dictionary of the Tribble track to equal the reference; we actually return the sequence dictionary of the track itself, with each contig set to the length of the sequence dictionary contig entry. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4322 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 18:47:20 +00:00

3925 Commits (2bf4fc94f09c350b239fe4b67ddfe6ef34715d2b)