gatk-3.8

Commit Graph

Author	SHA1	Message	Date
ebanks	d89e17ec8c	Fare thee well, UGv1. Here come the days UGv2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4747 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-29 21:51:19 +00:00
ebanks	e3e6d176df	Looking over the daily error log email made me realize that there were 2 implementations of vc.modifyLocation() - the correct one in VC that didn't require lazy loading the genotype data and the bad one in VCUtils that did. Removing the implementation in VCUtils and updating the code accordingly. Also, removing createPotentiallyInvalidGenomeLoc() since no one uses it anymore. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4736 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-26 18:40:34 +00:00
ebanks	6934f83cc7	Two changes to CombineVariants. 1. Fix: VCs were padded before the merge, but they were never unpadded afterwards. This leaves us with a VC that doesn't meet our spec. 2. Update: instead of running the merged VC through every standard annotation (which seems really wrong, since this isn't the annotator tool), just update the chromosome count annotations (AC,AF,AN) through VCUtils. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4734 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-25 04:52:12 +00:00
rpoplin	ed08899abc	Overwhelming evidence that maxQ = 50 is now a better default than maxQ = 40 in the base quality score recalibrator, especially when combined with dbsnp build 132. Also, added option in ProduceBeagleInputWalker for Beagle-ing chromosome X calls with male samples which sets the genotype likelihood for the AB allele to zero for those samples. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4731 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-24 21:32:26 +00:00
hanna	082073ca3c	Stop RBP.getPileupBySample() from throwing a NullPointerException if the sample doesn't exist -- now returns null. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4719 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-23 05:17:06 +00:00
kshakir	787e5d85e9	Added the ability to test pipelines in dry or live mode via 'ant pipelinetest' and 'ant pipelinetest -Dpipeline.run=run'. Added an initial test for genotyping chr20 on ten 1000G bams. Since tribble needs logging support too, for now setting the logging level and appending the console logger to the root logger, not just to "org.broadinstitute.sting". Updated IntervalUtilsUnitTest to output to a temp directory and not the SVN controlled testdata directory. Added refseq tables and dbsnps to validation data in BaseTest. Now waiting up to two minutes for gather parts to propagate over NFS before attempting to merge the files. Setting scatter/gather directories relative to the -run directory instead of the current directory that queue is running. Fixed a bug where escaping test expressions didn't handle delimiters at the beginning or end of the String. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4717 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-22 22:59:42 +00:00
bthomas	374c0deba2	Updating the core LocusWalker tools to include the Sample infrastructure that I added last month. This commit touches a lot of files, but only significantly changes a few: LocusIteratorByState and ReadBackedPileup and associated classes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4711 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-19 19:59:05 +00:00
kshakir	79725f2d9c	Excluding the QFunction log files from the set of files to delete on completion. When a QGraph is empty displaying a warning instead of crashing with a JGraph internal assertion error. Cleaned up code using the Log4J root logger and explicitly talking to a logger for Sting. When integration tests are run detecting that the logger has already been set up so that messages aren't logged twice. Updated from Ivy 2.2.0-rc1 to 2.2.0. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4707 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 20:22:01 +00:00
depristo	721e8cb679	VariantsToTable now supports wildcard captures. -F PREFIX* now captures all fields that begin with PREFIX, output as a comma-separated list of unique values. Added integration test for VariantsToTable since I find it so useful. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4706 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 18:54:59 +00:00
hanna	90711d445c	Change the interface for RMDTrackBuilder, therefore always mandating the specification of a sequence dictionary and related info. This will hopefully eliminate the cases in which the refseq track depends on a sequence dictionary / contig parser that hasn't been specified. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4700 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-17 19:00:17 +00:00
depristo	d86ab2becb	JEXL expressions now generate exceptions, not warnings. Tools should catch the runtime exception to handle correctly. Removed unnecessary complexity from the JEXL contexts. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4695 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-17 16:08:16 +00:00
kshakir	01b721ab61	Passing ReviewedStingExceptions through the HMS. Added a @Hidden experimental argument -validate to VariantEval that accepts external JEXL assertions, which must evaluate to true or an exception is thrown. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4692 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-16 21:50:42 +00:00
hanna	24ec35deaf	- Reintroduce test dependency so that the tests passing / failing is not dependent on the contents of the integrationtest directory. Will figure out how to better manage the integrationtest directory at some point in the future. - Up the max heap size for tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4691 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-16 19:55:20 +00:00
hanna	8ff4e4cb25	Cleanup testng listener configuration. - Add StingTextReporter, which provides a text dump of the errors to the console. Had to create our own reporter (inheriting from the standard TestNG TextReporter) to work around a configuration issue with the TextReporter. In an ideal world, I'd report this on the TestNG mailing list and help them resolve the issue, but this solution is relatively robust at the moment and life is too short. - Added back the failed test listener, which generates the testng-failed.xml file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4686 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-15 23:43:14 +00:00
depristo	ef2f6d90d2	VQSR now operates on LOD scores in the INFO field directly, and doesn't adjust the QUAL field. New format for tranches file uses LOD score. Old file format no longer supported. log10sumlog10() function, a very useful utility in MathUtils. No more ExtendedPileupElement! Robust math calculations in GMM so that no infinities are generated! HaplotypeScore refactored to enable use of filtered context. Not yet enabled... InferredContext getDouble and getInteger arguments now parse values from Strings if necessary git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4684 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-15 22:19:22 +00:00
hanna	5b83942cee	- Fix DepthOfCoverage so that, when it abuses the ROD system by instantiating a track in onTraversalDone, it also supplies the correct sequence dictionary and parser. - Changed RMDTrackBuilder to use SequenceDictionaryUtils.validateDictionaries for ref <-> ROD sequence dictionary validation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4683 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-15 20:34:04 +00:00
kshakir	2fd816ac5f	Updated ordering of integration tests. GVC > VR > AVC git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4669 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-14 06:33:28 +00:00
depristo	44d0cb6cde	New version of cutting routines for VQSR. Old code removed. Working unit tests. Best practice with testng integration test (everyone look at it). Walker test now allows you to not specify no. input files, if it can infer input counts from MD5s git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4664 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-13 16:19:56 +00:00
kshakir	62a106ca5a	Disabled VariantGaussianMixtureModelUnitTest git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4663 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-13 03:53:33 +00:00
kshakir	673fa841a4	Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader. Removed obsolete usages of PackageUtils with updated PluginManager. Ported Queue interval utilities written in scala over to Sting's java IntervalUtils. Added a very basic integration test to ensure that the fullCallingPipeline.q compiles. Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test). While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1". Upgraded to scala 2.8.1 and updated calls to deprecated functions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-12 20:14:28 +00:00
depristo	42acc968b1	Unit tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4660 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-12 20:09:39 +00:00
ebanks	b51762c279	When you commit code late at night you tend to make careless mistakes... like forgetting to update integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4658 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-12 14:41:10 +00:00
depristo	988da428ae	Bug fix for old style tranches file. ApplyVariantCuts moved over, and passes integration tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4657 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-12 14:38:26 +00:00
depristo	c5f8c4dd0d	VariantEval test for tranches file, plus cutting over VE to use the generic Tranches framework git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4656 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-12 13:52:40 +00:00
ebanks	69de3e51bf	Better precision for the calculated AF value. Now looks at the total number of samples to determine how much precision is necessary. Also, changing default min BQ used for calling in UGv2 to Q17. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4655 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-12 08:31:40 +00:00
depristo	ec83a4b765	Initial commit, without any tool changes, of a new infrastructure for determining tranches. This new version walks up from the lowest quality snps and determines Ti/Tv. This is marginally more stable than moving in the other direction when there are few novel variants (exomes). Can make a substantial difference in the size of the call set (10-20%). I'll hook it into the main system now. Includes a new class Tranche, isolated read/writing utilities that are now tested in TestVariantRecalibrator, which should be moved to UnitTest as soon as I can figure out how to do this on my mac. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4654 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-11 23:52:49 +00:00
hanna	8e36a07bea	Convert GenomeLocParser into an instance variable. This change is required for anything that needs to be simultaneously aware of multiple references, eg Queue's interval sharding code, liftover support, distributed GATK etc. GenomeLocParser instances must now be used to create/parse GenomeLocs. GenomeLocParser instances are available in walkers by calling either -getToolkit().getGenomeLocParser() or -refContext.getGenomeLocParser() This is an intermediate change; GenomeLocParser will eventually be merged with the reference, but we're not clear exactly how to do that yet. This will become clearer when contig aliasing is implemented. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4642 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-10 17:59:50 +00:00
depristo	5ef4b234d8	Updates for broken integration tests. Counting annotations (AC, AF) now work correctly for AC = 0 sites git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4640 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-09 19:43:43 +00:00
chartl	42e9987e69	Bug fix to GenotypeConcordance. AC metrics get instantiated based on number of eval samples; if Comp has more samples, we can see AC indices outside the bounds of the array. Bug fix to LiftoverVariants - no barfing at reference sites. AlleleFrequencyComparison - local changes added to make sure parsing works properly. Added HammingDistance annotation. Mostly useless. But only mostly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4622 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-03 19:23:03 +00:00
hanna	8f9bf82aa7	Bamboo is correctly interpreting test fails. Reverting forced-fail test code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4617 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-02 19:32:34 +00:00
hanna	1df166b76e	Forcing a unit test fail to ensure that Bamboo is picking up on failed tests as well as successes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4616 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-02 19:03:12 +00:00
hanna	861ee3e37a	Changing testing framework from junit -> testng, for its enhanced configurability. Initial test to see how Bamboo will respond. More detailed email to follow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4609 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-01 21:31:44 +00:00
asivache	aadd230636	N-Way-Out is back. Now uses SAMReadID to identify each read's source bam, so should be reliable. Interface is sort of ugly for now: to generate output file names, .bam is stripped from input file names, then the value of -nWayOut argument is pasted on (and all the output files are written into the current dir). Unrelated change: in the sorted-target mode (when we read sorted target intervals one by one from a file), one can now specify multiple semicolon-separated interval files (all must be sorted). Not hugely useful probably, but makes --targetIntervals always process its values in exactly the same way, so we are consistent (it has been already taking ;-separated args in unsorted mode). NwayIntervalMergingIterator: reads in multiple sorted GenomeLoc input streams (iterators) and presents them as a single sorted and merged stream. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4602 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-01 16:06:51 +00:00
ebanks	1c056ea791	Users can now use VariantAnnotator to add annotations from one VCF to another. For example, if you want to annotate your target VCF with the AC field value from the rod bound to CEU1kg, you can specify -E CEU1kg.AC and records will be annotated with CEU1kg.AC=N when a record exists in that rod at the given position. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4598 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-29 16:38:31 +00:00
hanna	2f8057bf24	Cleanup for multithreading memory leak during integration tests...unregister MXBean at end of traversal to avoid holding a reference to the microscheduler, which holds a reference to the engine, which in turn holds a reference to the walker, which itself holds a reference to all the data aggregated during the course of the traversal. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4594 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-28 18:37:42 +00:00
hanna	4c23b1fe9c	Get rid of the static cache of ArgumentTypeDescriptors by making them an integral part of the parsing engine. Hugely lowers our memory footprint in integrationtests, but not yet enough to run Mark's new parallelized VariantEvalIntegrationTests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4585 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 19:44:55 +00:00
hanna	04e38929f0	Disabling parallelized version of VE integration tests. Still slow, but not deadlocking any more. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4580 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 02:47:03 +00:00
fromer	a7af1a164b	Updated MNP merging to merge VC records if any sample has a haplotype of ALT-ALT, since this could possibly change annotations. Note that, besides the "interesting" case of an ALT-ALT MNP in a pair of HET sites, this could even occur if two records are hom-var (irrespective of using phasing). Note also that this procedure may generate more than one ALT allele. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4577 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 01:50:36 +00:00
depristo	b085648141	Parallelized VariantEval. Refactored output to support parallel output style. Minor improvements to testing framework to enable easy executeTestParallel to run -nt 1 and -nt 4 by default. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4574 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-26 20:21:38 +00:00
fromer	c357ec775a	Trivially phases any hom site (since it is always correct to continue the previous haplotypes by appending the same allele onto both haplotypes) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4568 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-25 16:58:41 +00:00
fromer	9ba7269728	Fixed Integration Tests to output VCF files with -NO_HEADER git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4548 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 19:49:44 +00:00
kshakir	b88cfd2939	Updated MD5s of VCFs, since the approximate command line arguments injected into the VCF headers now have a little more order to them thanks to changes in the ParsingEngine. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4538 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 03:07:40 +00:00
fromer	f76865abbc	ReadBackedPhasing now uses a SortedVCFWriter to simplify, and has the ability to merge phased SNPs into MNPs on the fly [turned off by default]; MergeSegregatingPolymorphismsWalker can also do this as a post-processing step; Integration tests for MergeSegregatingPolymorphismsWalker were also added git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4534 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 20:27:10 +00:00
ebanks	7a291a8ff3	First pass at a VCF validator. Will test more tonight. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4524 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-19 19:55:49 +00:00
chartl	2bc5971ca1	Added - a tool to fix reference bases of a VCF. The OMNI had a couple of sites with incorrect reference bases (look to be legacy from other chips), and a few more that had ref and alt flipped. GAP should probably take care of it, but since I need results by monday, I'm doing it. Modified - SelectVariants: Hook up to VariantContextUtils to recalculate AC/AF/AN, which uses the accessor in VariantContext to do this. Somehow sites that were selected down to hom-ref genotypes only wound up getting positive AC. IMPORTANT I kind of need input here. The header of a file used for an integration test specifies AC as being an integer. Recalculating it casts it into an integer list (which it should be, as it allows for alternate alleles). However this appears to clash with what the jexl expression is looking for? For now, the integration test itself needed to be changed -- it's unclear what to do when the header specifies AC of being one class, but recalculating it casts to another class, and I'm not sure what to do. I'm committing my omni_qc pipeline because I'm almost certain 2 months down the road I'm going to wonder what the heck I did to generate my results. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4511 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-17 03:18:01 +00:00
ebanks	7aa030a9a4	Hmm. Apparently variants can get lifted over to different chromosomes. Who knew? Reverting changes from a couple of days ago. The only way to do this correctly (without requiring lots of memory) is to turn off on-the-fly indexing for this walker. Integration tests cover this now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4510 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-17 02:54:12 +00:00
ebanks	954dd84f51	Adding an integration test (against hg18 this time) that requires on-the-fly sorting in order to work properly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4500 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 07:45:21 +00:00
ebanks	9f54170dff	Hooking up the liftover tool to the new on-the-fly sorting VCF writer so that records can now get emitted in order. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4499 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 07:27:01 +00:00
chartl	7c9ef59d65	This is simultaneously a minor and major change to VariantEval, so take heed: The core walker has been modified so that when variant contexts (eval and comp) are subset to command-line-specified sample(s), the chromosome count annotations (AC/AN/AF) are altered to reflect the AC/AN/AF of only those samples involved in the comparison. No more getting AC500 when you're comparing a 10-sample overlap. Interestingly enough, this didn't break any integration tests. GenotypeConcordance now has two additional tables: Allele Count Statistics, and Allele Count Summary Statistics. These work exactly identically to the Sample Statistics and Sample Summary Statistics tables, except that the partition being used is no longer the sample, but instead the allele count of the variant sites. These tables stratify by both eval and comp ACs, e.g. evalAC0 evalAC1 evalAC2 compAC0 compAC1 compAC2 Differences with previous integration tests were verified to only be in the Allele Count tables (by grepping them out of the diff); a new test has been added for the simple case of an AC=1 site in the eval becoming an AC=2 site in the comp. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4491 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 22:26:15 +00:00
hanna	83b8676b69	Hack to fix mysterious disappearing read attributes. Ultimately caused by the fact that the GATKSAMRecord, by design, needs to both inherit from SAMRecord and wrap a 'member' SAMRecord, and method calls that aren't implemented as explicit passthroughs can compromise the content of the SAMRecord in subtle ways. Will be automatically fixed when Picard moves to a lightweight SAMRecord interface rather than the current heavyweight implementation. But in the short-term, there's no obvious fix. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4489 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 19:06:54 +00:00
aaron	272ac2ae4a	more fixes for tests broken by indexing-on-the-fly; I think this should do it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4486 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 01:54:32 +00:00
aaron	ff0df1a2da	A fix for an integration test that was broken by on-the-fly indexing. Also, better reporting of Tribble exceptions in GATK integration tests. Trying to get the tests back up and running... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4483 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 18:39:56 +00:00
kiran	f348ca2976	Now processes VCF files with repeated loci without crashing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4481 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 04:36:07 +00:00
depristo	116309b3c3	More test cases for UG integration test. We currently fail doing multi-threaded gzip output, FYI git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4472 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 20:22:12 +00:00
depristo	38a67fed63	High performance version of standard vcf writer. New general static Tribble class for common constants, including general .idx constant and functions to get standard index name for a given file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4471 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 19:53:21 +00:00
fromer	bdd3a9752e	Changed min MQ and BQ to 20 (for phasing) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4469 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 19:27:45 +00:00
chartl	21ec44339d	Somewhat major update. Changes: - ProduceBeagleInputWalker + Now takes a validation ROD and a prior to give it, will use those genotypes in place of the variant genotypes if both are present + Takes a bootstrap argument -- can use some given %age of the validation sites + Optionally takes a bootstrap output argument -- re-prints the validation VCF, filtering those sites used as part of the bootstrap -BeagleOutputToVCFWalker + Now filters sites where the genotypes have been reverted to hom ref + Now calls in to the new VCUtils to calculate AC/AN -Queue + New pipeline libraries for easy qscript creation, still a work in progress, but this is a considerable prototype + full calling pipeline v2 uses the above libraries + minor changes to some of my own scripts + no more need for contig interval lists, these will be parsed out of your normal interval list when it is provided git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4459 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 13:30:28 +00:00
depristo	0a2e76e9dc	2nd step towards on the fly indexing. Also fixed parsing bug for headers with < symbols git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4454 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 21:38:46 +00:00
rpoplin	7bb9704592	Update the BeagleOutputToVCF integration test because of removing the source header line. Source headers are provided by the engine for all VCF files now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4453 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 19:55:57 +00:00
rpoplin	0de658534d	Removed the qScale arguments in VariantRecalibrator. It is smarter about how it tries to find a cut so the arbitrary scale factor hopefully is no longer necessary. Now the recalibrated variant quality score more accurately reflects our believed lod of the call. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4451 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 18:04:57 +00:00
fromer	ee00dcb79d	1. Phasing now ignores bases without minimum base quality (BQ) and minimum mapping quality (MQ); 2. The probability of a non-called base is now divided by 3, to evenly split up the error probability over the non-called bases git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4450 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 17:40:59 +00:00
ebanks	6205910f9f	updating integration test for Sarah Calvo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4449 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 04:03:37 +00:00
fromer	652a3e8de5	Added integration tests for ReadBackedPhasing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4446 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 20:50:32 +00:00
kshakir	ca5db821ce	Added the ability to Queue to run scala functions inside the JVM. NOTE: Extend from InProcessFunction instead of CommandLineFunction to use this functionality. Queue now submits new LSF jobs only after previous functions have completed successfully. When the Queue process is shutdown (ex: via Control-C) sends a bkill command for any running jobs. Ported commands like creating directories and scatter/gather interval list to scala functions. Updates to LSF status tracking by porting the python to internally generated bash scripts. Temporarily disabled job name submission to LSF. Plus side is that the full command is now available in "bjobs -w". TODO: Put back jobName passing to LSF based on an option? Changed BaseTest to allow scala to access paths to references. Changed the extension generator to default the analysis name to the walker "name". git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4442 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 18:29:56 +00:00
rpoplin	69485d6a7a	Added command line argument for the max value of the allele count prior in VariantRecalibrator (--max_ac_prior). Default value increased to 0.99 from 0.95. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4436 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 14:00:53 +00:00
ebanks	b5e148140b	Officially fixed the UG priors; updated the default min MQ/BQs to pipeline values of q20 and min calling threshold to Q50 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4431 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-05 18:35:36 +00:00
ebanks	6448753cf7	Removed the SequenomValidationConvertor and renamed it VariantValidationAssessor since it no longer handles ped/sequenom files (but instead works on vcfs/variantcontexts). Updated all of the wiki docs, including adding instructions on how to convert ped files to vcf, a la Shaun Purcell. We now officially no longer support ped files everyone. Other misc cleanup in the code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4419 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-04 02:11:38 +00:00
hanna	4ea73bcfb1	Basic unit tests for WalkerManager. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4394 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 19:27:41 +00:00
hanna	78343be52c	At some time in the recent past, we lost our ability to process the '-L all' argument. Brought it back, and added an integrationtest to make sure it stays around. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4390 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 15:58:43 +00:00
delangel	e80742e72f	Use -o as argument for output file in ProduceBeagleInputWalker, to be consistent with other walkers (you're welcome, chartl :)). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4386 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 22:46:39 +00:00
rpoplin	a6c7de95c8	By using the AC info field instead of parsing the genotypes we cut 78% off the runtime of VariantRecalibrator. There is a new argument to force the parsing of genotypes if necessary. Various other optimizations throughout. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4383 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 18:56:50 +00:00
chartl	862c94c8ce	Small change for Matt -- output partition types in lexicographic order. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4365 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 20:08:03 +00:00
bthomas	96cccafb0d	Adding a few helper methods for accessing sample metadata, and associated unit tests. These are motivated by discussion with Ryan about how he'll use sample metadata in VariantEvalwalker - hopefully will make it easier for him. Methods are: -- getToolkit().subContextFromSampleProperty(): filters a VariantContext to genotypes that come from samples that have a given property value -- getToolkit().getSamplesWithProperty(): gets all samples with a given property -- getToolkit().getSamplesFromVariantContext(): sample objects that are referenced by name in a VariantContext git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4361 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 02:16:25 +00:00
kshakir	edaa278edd	Removed cases where various toolkit functions were accessing GenomeAnalysisEngine.instance. This will allow other programs like Queue to reuse the functionality. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4351 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-25 02:49:30 +00:00
hanna	497bcbcbb7	Recent changes to the build system make the build system complain loudly about pieces of core that depend on playground. Most of these have been eliminated by (temporarily) promoting Aaron's report system to core in this checkin. I'll follow up with other changes separately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4350 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 22:09:12 +00:00
depristo	745b8cc6d3	GATK now detects human lexicographically sorted data and throws a UserException when it is provided. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4343 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 15:19:48 +00:00
rpoplin	1931b2e1bd	Three fixes for VariantFiltrationWalker: Trying to filter an empty VCF file will produce a well-formed VCF file with zero records instead of a blank file, needed for pipelines. The first record's genotype info fields are now in the same order as all the others. The VCF header lines are pulled from just the input variant rod instead of from all rods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4341 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 13:52:56 +00:00
kshakir	4ed9f437e9	Sliced the GAE in half like a gordian knot to avoid the constant merge conflicts. The GAE half has all the walker specific code. The new "Abstract" GAE has the rest of the logic. More refactoring to come, with the end goal of having a tool that other java analysis programs (Queue, etc.) can use to read in genomic data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4339 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 23:28:55 +00:00
hanna	8f75d88519	Fix for GATK run report ids: mOVsxGfDiiSMxVs2PPTVjzYTVbizlD6e f9kUHUADFsZ0LiTGxRL5zPmq9kZcA4cQ 8eGHWJFAlBVmgxwPi3sMd1RmiN2PwHOf iLhvHWveypKb2F8vKS5irHylc3pYvlOb HDttXKUMEVoPrvVeWrH7E0htxYyNydMx plus a bit of cleanup of custom exceptions in the sharding system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4330 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 19:49:25 +00:00
kshakir	20b38b38f3	Updated from SnakeYAML 1.6 to 1.7. Added a pipeline java bean and YAML utility to serialize java beans. Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format. Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference. More changes to come as this code gets tested out in the fullCallingPipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 19:47:49 +00:00
hanna	fb5d595ef0	Disable VCF header output in the Beagle integrationtest. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4327 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 16:50:03 +00:00
hanna	0c99c97685	The engine now automatically adds the command-line arguments to the header of every VCF, unless -NO_HEADER is specified. Changed integration tests, adding the -NO_HEADER argument, for walkers that previously did not include the command-line arg headers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4326 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 15:27:58 +00:00
aaron	1af9ca6d45	enabling tests that now pass with the contig length validation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4325 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 22:20:50 +00:00
aaron	3938d53738	one broken build short of the hat trick. Fixing the unix test which expects the sequence dictionary of the Tribble track to equal the reference; we actually return the sequence dictionary of the track itself, with each contig set to the length of the sequence dictionary contig entry. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4322 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 18:47:20 +00:00
aaron	b968af5db5	The tribble indexes are now updated with correct sequence lengths for each contig they have in their sequence dictionary. Also clean-up in the RMD track builder. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4321 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 18:21:22 +00:00
aaron	2586f0a1ca	fix for the build I broke - the original file got corrupted, which I replaced with a version that didn't have the header stripped off. Other integration tests passed, but this test relied on the header being stripped off. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4320 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 15:35:25 +00:00
rpoplin	547763b230	Better error message for Petr's null pointer exception. Also added an exception integration test because I'm certain this used to work. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4319 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 13:44:40 +00:00
ebanks	f5a30d0248	I just spoke to Andrey & Kiran (the original authors of these tools), and they voted to kill these in favor of Picard git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4313 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-20 13:27:35 +00:00
rpoplin	7e58d8ed61	CombineVariants now outputs the command line in the VCF header. Added a new hidden argument to VR walkers called --NoByHapMapValidationStatus to turn off the by-hapmap dbsnp rod behavior. Very useful for experimenting with which sets to use as training data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4307 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-18 16:06:50 +00:00
bthomas	c6c6d32b46	Quickly adding a new convenience method for retrieving a group of samples. The method is getSamples(Collection<String>) and returns a set of sample objects. There's also a test there. Ryan is using this to modify VCF code today... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4303 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-17 15:55:17 +00:00
ebanks	a10b2a00a5	Moving the util VariantContext 'modifying' routines into VC itself (as opposed to VCUtils) so that we can pass the genotype data directly into it and are no longer forced to decode the genotypes for no reason. This means that any walker that takes in a VCF and modifies the records without touching the genotypes never has to decode them. I've hooked this into the other two Variant Recalibrator walkers for Ryan. One side effect, though, is that we no longer can sort the sample names in the VCF (i.e. if the input VCF doesn't have samples in alphabetical order, then we used to sort them when writing a new VCF but no longer do that), because if we don't decode then we can't re-order the genotypes. I don't think this is a big concern given that the Unified Genotyper does emit sorted samples and that's the main source for most of the VCFs we use. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4300 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-17 07:09:58 +00:00
bthomas	f66ef4626e	Fixing two minor issues: 1) adding a new error message if the user adds a fasta file in a directory that doesn't exist; 2) renaming my sample unit tests so they actually run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4299 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 20:45:51 +00:00
rpoplin	3a400e3dc0	Added CountCovariates integration test to ensure that it throws an exception if a variant mask isn't provided. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4298 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 19:18:38 +00:00
aaron	de56568ce4	Adding the appropriate DbSNP file to the performance tests so they don't exception out. The exception: "org.broadinstitute.sting.utils.exceptions.UserException$CommandLineException: Invalid command line: This calculation is critically dependent on being able to skip over known variant sites. Please provide a dbSNP ROD or a VCF file containing known sites of genetic variation." git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4293 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 16:30:54 +00:00
aaron	782e0018e4	removal of most of the old GATK ROD system; also a fix for -Dsingle so we can again run just a single unit or integration test (single tests in tribble can be run with the -DsingleTest option now). More to come. * Three integration tests had to change: * RecalibarationWalkersIntegrationTest: One of the tests was using the interval as the snp track, and wasn't supplying a DbSNP track (for CountCovariates) SequenomValidationConverterIntegrationTest: relies on Plink ROD which we've removed. PileupWalkerIntegrationTest: we no longer have implicit interval tracks, so there isn't a rod name over the specified region. Otherwise the same result. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4292 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-15 22:54:49 +00:00
rpoplin	0a06fbdb94	Adding header lines to output of VR walkers to settle validator warnings. Command lines are added to the VCF header. GATK version numbers will be added to the header lines by Matt. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4288 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-15 16:45:03 +00:00
depristo	41fa323e63	Added iterator for tribble, fixing GS bug report. Removed unnecessary tabix double wrapping. Integration tests to ensure the BTI works with both vcfs and vcf.gz git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4287 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-15 16:38:04 +00:00
bthomas	e5f81d25d4	Adding the --sample-metadata (-SM) command line argument and associated functionality. This is something Matt and I have been working on for a while. Basically, it allows you to integrate sample metadata into an analysis, by including a sample file. More detailed documentation is on the wiki: http://www.broadinstitute.org/gsa/wiki/index.php/Adding_Sample_data_to_an_analysis This commit adds two important classes: Sample, which contains data about one sample; and SampleDataSource, which manages sample data a la ReferenceDataSource and ReadsDataSource. This code should be stable, but it has not been integrated with existing walkers yet. That's the next commit. In the meantime, feel free to experiment with the code - there are two basic example walkers in the playground.sample package. And PLEASE let me know if you see any errors/inconsistencies. Note that this also adds a new dependency on SnakeYaml, a YAML parser. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4285 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-15 11:50:22 +00:00
ebanks	1901e3208e	Oops, ran integration tests before Guillermo committed his change to the Beagle code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4281 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-15 01:41:02 +00:00
ebanks	4e83ba411f	We now do lazy loading for the genotype data in VCF. Practically, almost all walkers end up loading the genotype data because we need to be smarter about transferring the unparsed genotype string when modifying VariantContexts; however, this does solve the problem for VR's piece to generate clusters (shaved off 75% of runtime for Ryan's large case). That further optimization will happen later. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4279 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-15 00:18:17 +00:00
delangel	2be5e862f1	forgot to commit change to MD5 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4277 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-14 19:28:03 +00:00
hanna	7fa6b2135b	Added a back door so that integration tests can reset the sequence dictionary in the reference. Reset routine is not accessible to any class outside GenomeLocParser's package. We'll have to do something more intelligent with this when the GATK goes distributed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4275 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-14 18:58:08 +00:00
depristo	fa3be2209f	Improvements to the error display code to print out the SVN number in all messages. Fixes to CallableLoci and tests to check for that case git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4270 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-13 18:36:45 +00:00
depristo	4d0ff336c2	Missed update input git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4269 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 15:46:13 +00:00
depristo	7880863eb7	Final step in error refactoring. GATK exception is now ReviewedStingException, indicating that this exception is really what one wants. Only use this exception when you have thought about StingException vs. UserException and made a real decision. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4267 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 15:07:38 +00:00
depristo	7ad8fbdd5a	Moved GATKException to exceptions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4266 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:47:19 +00:00
depristo	1876c9856a	Moved stingexception git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4265 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:39:22 +00:00
depristo	595907e98e	Moving StingException git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4262 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:34:15 +00:00
depristo	40e6179911	Penultimate step in exception system overhaul. UserError is now UserException. This class should be used for all communication with the USER for problems with their inputs. Engine now validates sequence dictionaries for compatibility, detecting not only lack of overlap but now inconsistent headers (b36 ref with v37 BAM, for example) as well as ref / bam order inconsistency. New -U option to allow users to tolerate dangerous seq dict issues. WalkerTest system now supports testing for exceptions (see email and wiki for docs). Tests for vcf and bam vs. ref incompatibility. Waiting on Tribble seq dict improvements to detect b36 VCF with b37 ref (currently cannot tell this is wrong). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4258 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:02:43 +00:00
ebanks	a0231f073f	Damnit. Enabling the Picard code to recalculate all of the relevant SAMRecord attribute tags means that I need to have reference bases over all read bases even after realignment (and there are some big indels in dbsnp). Fortunately, I have my trusty IndexedFastaSequenceFile reader handy! Re-enabling the previously broken performance test. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4255 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 05:06:37 +00:00
rpoplin	7b113a4886	Truncate the floating point numbers coming out of the variant recalibration walkers. Integration tests now work with both 1.6.0_16-b01 and 1.6.0_21-b06 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4253 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-10 18:37:49 +00:00
aaron	cf33614ddc	remove the test that's failing the performance tests, please don't release until this is figured out git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4251 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-10 06:30:40 +00:00
aaron	4adb07683d	all fixed..thanks Matt! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4250 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-10 06:18:59 +00:00
aaron	bd4bc84abd	comment out the broken aligner test again - I'll take a crack at fixing it tomorrow. Each software engineer is going to take a pass at fixing it, and we'll see who can do it with the most style. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4249 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-10 05:22:24 +00:00
rpoplin	61e848c4f0	It's clear from Sendu's calling and my own calling that -qScale 100.0 is a much better default value for low pass data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4248 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-10 01:47:21 +00:00
hanna	e183b6598c	- Fix our private repository of bwa reference support files. - Update the test to point to our repository. - Update the md5 to reflect new Picard tag ordering. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4247 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-10 00:29:26 +00:00
kshakir	4183e8805a	Fixed reference (via busted symlink) /broad/1KG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4245 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-09 20:34:51 +00:00
rpoplin	aeb897db7f	VR walkers look at by-hapmap validation status by default. Eric will be updating the syntax to allow for more flexibility here. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4242 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-09 15:40:56 +00:00
kshakir	d7f55574e2	Re-enabling aligner integration test now that we're back to having more than 1 or 2GB memory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4241 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-09 15:09:48 +00:00
rpoplin	d625186796	I think the VR integration tests are fine. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4240 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-09 15:00:41 +00:00
depristo	6a30617a60	Initial implementation of UserError exceptions and error message overhaul. UserErrors and their subclasses (UserError.MalFormedBam, for example) should be used when the GATK detects errors on the part of the user. The output for errors is now much clearer and hopefully will reduce GS posts. Please start using UserError and its subclasses in your code. I've replaced some, but not all, of the StingExceptions in the GATK with UserError where appropriate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4239 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-09 11:32:20 +00:00
ebanks	65edbced36	Addition for Tim: recalculate the NM and UQ tags after realignment. Also, don't fix the insert size calculation, since that's done by fix mate information. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4227 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-08 04:02:14 +00:00
rpoplin	e3962c0d13	VR integration tests are longer but much more useful. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4210 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-03 15:50:19 +00:00
ebanks	b59d62927e	Fix busted performance test (-outputBam has been deprecated in the BQ recalibrator in favor of -o) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4201 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-03 12:51:53 +00:00
hanna	70bb480939	The battle is over. Picard is revved. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4200 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-03 05:28:01 +00:00
rpoplin	0bb05fb472	Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4194 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-02 21:12:09 +00:00
hanna	dc5f858d29	Replaced placeholder support for splitting by read group with real support (sorry everyone), and added relatively comprehensive unit tests to ensure that splitting by read group works. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4190 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-01 22:24:50 +00:00
rpoplin	b28f63a948	Base recalibrator now uses -o and deprecates -outputBam git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4189 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-01 22:13:50 +00:00
kshakir	33400074fa	Updated tribble BED parsing code to use the official UCSC spec, and updated tests to match expected results. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4188 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-01 21:49:06 +00:00
rpoplin	469bbaa240	Added more integration tests for the variant quality score recalibrator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4181 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-01 15:31:24 +00:00
depristo	8c4009ee18	Oops, don't enable reporting in integration tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4179 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-31 22:56:18 +00:00
ebanks	3d6c4fc55f	Removing the obsolete --hapmap and --hapmap_chip options git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4172 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-31 16:57:05 +00:00
depristo	b33873206a	GATKRunReport now has an ID (random 32 char string) that uniquely identifies the JOB run and can be used to find a run in the run repository git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4171 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-31 16:18:57 +00:00
depristo	3fd2392090	Improved interface to getting command line options. Now fully traverses all objects to get all internal argument collections. Preliminary (but disabled version) of phoning home (see -et argument for more information). Captures correct and erroring out runs and writes out gzipped, xml report with lots of useful information. Needs a bit more information but is approximately working. Reports going to /humgen/gsa-hpprojects/GATK/reports/ in submitted directory that will be collated by some external tool. Only operating if -et STANDARD or -et STDOUT are provided currently and REPORT_DIR contains a file called ENABLE. WalkerTest now adds -et NO_ET to tests to avoid populating the reports with tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4155 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-28 22:53:32 +00:00
rpoplin	9c3f403307	Add the calculated lod value to the info field of each recalibrated VCF record. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4153 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 21:33:58 +00:00
ebanks	bfcac33e80	Cleaning up playground utils and tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 01:25:47 +00:00
hanna	d773b3264b	Eliminated -mrl option. Eliminated -fmq0 option. Eliminated read group hallucination. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4133 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-26 21:38:03 +00:00
ebanks	dfae48cee0	Moving supported tools to core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4127 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-26 13:56:19 +00:00
ebanks	45d895dcf4	Remove the check in the Unified Genotyper for hitting the max reads at locus value. Instead, simply add a flag to the INFO field if any of the samples has been downsampled. 95% hooked up. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4126 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-26 05:50:47 +00:00
ebanks	79cd716671	More cleanup of the Genomic Annotator. Also, we now require join tables to have unique entries for the column keyed on the join. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4124 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-26 04:43:52 +00:00
ebanks	dd7f136298	Office-mate courtesy: fixing Andrey's busted integration test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4123 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-26 02:00:06 +00:00
ebanks	4678613893	Significant fixes for the Genomic Annotator. 1. Rip out all of Ben's code intended to circumvent the stable VCF Writer output system in multi-threaded mode (I threw up a little when I saw this code). This will improve memory consumption when running with -nt. 2. Don't annotate indels or > bi-allelic sites. 3. Fix bug where not all records were making it into the output VCF. 4. General code clean up. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4118 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 20:16:50 +00:00
rpoplin	5623e01602	GenerateVariantClusters and VariantRecalibrator now use hapmap and 1kg ROD bindings (in addition to dbsnp) to distinguish between knowns and novels. They no longer look at by-hapmap validation status so providing hapmap is highly recommended. Example on the wiki. Input variants tracks now must start with input. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4113 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 18:33:40 +00:00
hanna	bf0b6bd486	Update integration tests to use the new ROD syntax. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4112 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 18:13:30 +00:00
hanna	3dc78855fd	Command-line argument tagging is in, and the ROD system is hacked slightly to support the new syntax (-B:name,type file) as well as the old syntax. Also, a bonus feature: BAMs can now be tagged at the command-line, which should allow us to get rid of some of the hackier calls in GenomeAnalysisEngine. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4105 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 03:47:57 +00:00
rpoplin	85007ffa87	Some clean up for the variant recalibrator. Now uses @Input and @Output so that it can join the Queue party. Users now specify a -o, -clusterFile, -tranchesFile, and -reportDatFile. Example on the wiki. ApplyVariantCuts now has an integration test. Base quality recalibrator now requires a dbsnp rod or vcf file. Now that the base quality recalibrator is using @Output the PrintStream shouldn't be closed in OnTraversalDone. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4101 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-24 20:14:58 +00:00
ebanks	90aef66ec5	Minor fixes for my last commit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 23:25:29 +00:00
ebanks	ccda4f6ec1	More output consistency changes (updating wiki docs as I go along). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 18:46:08 +00:00
ebanks	c9c6ff49c2	Deprecated 'O' in favor of 'o' in the cleaner git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4085 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 18:09:24 +00:00
aaron	2d3b6d89dc	adding the ability in Tribble to create indexes from a stream of features, so that we can create multiple indexes from one pass of the file. In the GATK we now create multiple indexes, and choose the most appropriate based on feature density, and the longest feature in the file. Also: - Converted Tribble to TestNG; it has better features and is about 6x faster. - As much code clean-up as I could get done. More to do, especially in the example code. - Moved asserts in the code to throw exceptions. - Added getBinSize to the index interface; both indexes already implemented this. - Removed the abstract parts of the indexCreator interface; this is now more simple. - Added an IndexType enumeration; might be overkill but it is at least a single point of entry for index information. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4082 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 06:54:59 +00:00
hanna	8252494fa9	Forgot to update UG performance test to reflect the new -o argument. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4079 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 00:57:16 +00:00
hanna	c177801d81	Add deprecated command-line arguments, and switched over UG to output to -o/--out instead of -varout. Let's watch as our intrepid support engineer gracefully responds to all the incoming questions of the form: "the GATK told me to use -o instead of -varout. What do I do?" git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4078 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-22 21:01:44 +00:00
hanna	b80cf7d1d9	Modifications to the output system for better interaction with @Output. Multiplexed arguments. More details in the Monday meeting. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4077 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-22 14:27:05 +00:00
kiran	121b4f23b6	Simple change to allow a list of samples or regular expressions to be provided in a text file (one line per sample). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4074 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-21 00:01:48 +00:00
aaron	fa36731faf	fixes for VariantEval integration tests affected by the spaces to underscores change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4070 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 22:43:20 +00:00
ebanks	1ec305cd15	Fix for running the cleaner at the lane-level for known indels only: instead of relying on the reads to get the reference sequence, we now use an IndexedFastaSequenceFile in all cases and pad the reference with bases on either end. This allows us to deal with cases in which we are trying to clean just a single deletion-containing read with tiny LOD (so the read needs to be pushed off the seen reference; @Reference doesn't yet work for Read Walkers) and has the added benefit of allowing us now to get much larger known indels that aren't completely covered with reads. Thanks to Matt for the advice. Also, for Guillermo: while I was at it, I changed the .stats debug output to emit the original interval instead of the cleaned region. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4058 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 11:31:13 +00:00
rpoplin	8f15b2ba72	Memory optimization for the VariantRecalibrator. Only add variants to the list if they pass the novelty and qual filters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4051 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-17 21:57:28 +00:00
aaron	e632d9b83d	remove some dependencies on out of date methods from the tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4047 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-17 00:07:26 +00:00
aaron	c1df293feb	remove testing code from tribble track builder, set the command line program in walker test to null to reclaim memory in integration tests, and removed some orphaned integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4046 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 23:52:01 +00:00
rpoplin	578e7fa36d	Don't output -0 as qual value in VariantRecalibrator. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4044 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 16:47:58 +00:00
aaron	cc58a27b00	fix for broken unit test; make sure when we can't get an index off of disk, the internal method returns null git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4040 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 13:12:32 +00:00
kshakir	4710015c17	Disabled AlignerIntegrationTest while addressing build machine memory issues. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4033 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-14 01:23:21 +00:00
hanna	cb144734c0	Getting rid of GenotypeWriter interface. Of note: - GATKVCFWriter deleted, to be replaced if absolutely necessary when VCF writing goes into Tribble. - VCFWriter is now an interface, for easier redirection. - VCFWriterImpl fleshes out the VCFWriter interface. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4026 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 16:33:22 +00:00
ebanks	71c4d3f33d	Moving pointer to b36 reference from /broad/1KG to /humgen/1kg git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4021 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 00:54:34 +00:00
kshakir	f39dce1082	Exposed CommandLineFunction defaults to the Queue.jar command line (see -help). Added ability to skip up-to-date jobs where the outputs are older than the inputs. Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names. Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile Moved Hidden from the GATK to StingUtils. Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7 Added Queue to javadoc and testing build targets. Added first Queue unit test. Another pass at avoiding cycles in the DAG thanks to all function I/O being files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 21:58:26 +00:00
hanna	41d57b7139	Massive cleanup of read filtering. - Eliminate redundancy of filter application. - Track filter metrics per-shard to facilitate per merging. - Flatten counting iterator hierarchy for easier debugging. - Rename Reads class to ReadProperties and track it outside of the Sting iterators. Note: because shards are currently tied so closely to reads and not the merged triplet of <reads,ref,RODs>, the metrics classes are managed by the SAMDataSource when they should be managed by something more general. For now, we're hacking the reads data source to manage the metrics; in the future, something more general should manage the metrics classes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4015 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 20:17:11 +00:00
aaron	0a8ebcb4f9	moving tests over from the GATK to Tribble, and added a speed-up to the readNextRecord() that Mark suggested. Also removed the contained flag from the queries to Tribble in the GATK. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4003 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 17:54:59 +00:00
ebanks	3ff6e3404e	Alleles are now returned in a consistent order, so we can deal with tri-allelic sites git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4002 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 15:21:10 +00:00
aaron	d514c424fd	adding tests for BTI in the ROD validation tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3997 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 06:05:40 +00:00
ebanks	ca5b274f16	Unit, integration, and performance tests are all busted, so this is a good time to make a big commit... Major cleanup of the genotype writer code from the calling end. UG no longer supports making calls in anything but VCF, and that allows us to use the VCFWriter more generically now. Putting the ball in Matt's court to finish collapsing everything. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3996 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 04:18:29 +00:00
aaron	0f78f70ed4	fix for feature source in Tribble; we need to check that the record coming back isn't null. Also in the GATK added code to set the default logging level in integration tests to WARN, with the default level change they were spewing a bunch of text. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3995 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 02:57:23 +00:00
ebanks	419a36f74c	Starting the clean up of the sting.utils.genotype code which is all either moving to Tribble, moving to sting.utils.vcf, or being removed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3994 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 02:16:05 +00:00
aaron	0f29f2ae3f	fixes for the Tree index, and some small clean-up in the GATK. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3991 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 20:41:50 +00:00
rpoplin	3eee3183fd	Checking in the tiger team changes. LOD calculation modified. -qScale is back in case people need it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3990 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 20:41:03 +00:00
aaron	30178c05c5	providing a way to specify how you'd like -BTI combined with your -L options; set BTIMR to either UNION (default) or INTERSECTION. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3983 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 14:00:52 +00:00
kiran	e242a8f143	Put single quotes around the regex. This isn't strictly necessary through the integration test machinery, but is necessary at the console, and it's convenient to be able to cut and paste this. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3977 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 05:56:57 +00:00
kiran	13f29660bb	Integration test for SelectVariants. Tests a complex case with an explicit sample selection, sample selection by regex, exclusion of non-variant and filtered loci, and JEXL selection on low allele-frequency variants git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3976 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 05:49:47 +00:00
ebanks	bd6d5a8d51	Adding command-line header to VA and VF git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3974 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 05:21:15 +00:00
ebanks	594b7912f1	Added a generic method for returning the complete command-line used when calling a walker, to be used in the bam/vcf headers. As requested, every possible engine/walker argument is included. I've added it to the Unified Genotyper output, so people can try it out and let me know what they think. Something that needs to be discussed in group meeting: what happens when we merge VCFs? Do we keep all of the command-lines? git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3969 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 03:53:07 +00:00
ebanks	ac4699a650	Re-enabling this test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3962 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-06 20:20:37 +00:00
depristo	f275041b1c	-minimalVCF for CombineVariants. Work around for broken locking code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3960 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-06 16:10:59 +00:00
aaron	9076c0b28b	removing unused code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3958 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-06 14:24:39 +00:00
ebanks	341e752c6c	1) AlleleBalance is no longer a standard annotation, but the Allelic Depth (AD) is for each sample. 2) Small fixes in the VCFWriter: a) Trailing missing values weren't being removed if their count was > 1 (e.g. ".,.") b) We were handling key values that were Lists, but not Arrays. We now handle both. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3956 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-06 12:05:14 +00:00
aaron	72ae81c6de	VariantContext has now moved over to Tribble, and the VCF4 parser is now the only VCF parser in town. Other changes include: - Tribble is included directly in the GATK repo; those who have access to commit to Tribble can now directly commit from the GATK directory from Intellij; command line users can commit from inside the tribble directory. - Hapmap ROD now in Tribble; all mentions have been switched over. - VariantContext does not know about GenomeLoc; use VariantContextUtils.getLocation(VariantContext vc) to get a genome loc. - VariantContext.getSNPSubstitutionType is now in VariantContextUtils. - This does not include the checked-in project files for Intellij; still running into issues with changes to the iml files being marked as changes by SVN. I'll send out an email to GSAMembers with some more details. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3954 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 18:47:53 +00:00
rpoplin	a8d37da10b	Checking in everyone's changes to the variant recalibrator. We now calculate the variant quality score as a LOD score between the true and false hypothesis. Allele Count prior is changed to be (1 - 0.5^ac). Known prior breaks out HapMap sites git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3952 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 14:12:19 +00:00
ebanks	07addf1187	Fix for Kiran: since the Variant Annotator will re-annotate on top of existing annotations it makes sense to remove old headers if they conflict with the definitions being added by VA. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3951 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 06:44:39 +00:00
ebanks	227c4b10f0	Bug fix for Chris: convert comp tracks to VC so that we can respect the filter field. Added an integration test to cover this. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3949 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 04:13:16 +00:00
asivache	d53d5ffbf6	A utility class that computes running average and standard deviation for a stream of numbers it is being fed with. Updates mean/stddev on the fly and does not cache the observations, so it uses no memory and also should be stable against overflow/loss of precision. Simple unit test is also provided (does not stress-test the engine with millions of numbers though). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3944 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 21:39:02 +00:00
ebanks	8d8acc9fae	Moving G's MyHapScore to replace the old HapScore git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3943 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 21:00:54 +00:00
ebanks	340bd0e2c1	Removed hard-coded pointers to references git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3934 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 17:59:37 +00:00
ebanks	2307bed742	VariantEval now uses the "standard" modules only by default. You can add other modules with the -E argument and not use all of the standard ones with -noStandard (they can be added back individually with -E). Generalized some of the packaging code from VariantAnnotator. Matt might want to take a look to make this nicer...? git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3925 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-03 16:51:10 +00:00
delangel	5af986e0c1	Add an integration test for Beagle (one for ProduceBeagleInput and one for BeagleOutputToVCFWalker) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3897 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-29 18:49:22 +00:00
ebanks	7dd55fbf13	Archiving git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3882 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 02:47:18 +00:00
depristo	19ad44d332	Minor improvements to CombineVariants to handle the complex case from Chris. IntegrationTest of complex case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3876 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-25 13:46:11 +00:00
depristo	e21376219d	Updates to CombineVariants for Tim. -setKey can be null. Integrationtests for -setKey foo and -setKey null. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3870 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 22:35:52 +00:00
delangel	26bb1cd9ce	Fix broken test correctly git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3869 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 20:47:41 +00:00
delangel	4fc1db7aaf	Change interface to VCFWriter add() method to take only 1 byte from reference (since that's the only thing it needs), to prevent bugs like having people call it with ref.addBases() which is wrong (since it provides bases starting from the left of reference context window). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3868 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 20:24:03 +00:00
delangel	5eef15cfdf	a) Bad bug fix to CombineVariants: when indels were being merged, the reference base provided was wrong - ref.getBases()[0] was being used, but this returns base at start of window. Instead, the reference at current locus should be used. b) Cosmetic change to Beagle annotation description. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3861 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 15:13:47 +00:00
depristo	536399eaa0	Improvements to variant combine. Now calculates AC/AN/AF correctly by calling into the VariantAnnotator engine. Automatically removes annotations that are inconsistent across incoming VCs (in simpleMerge). TODO bug fix for Guillermo/Eric. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3858 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 13:33:11 +00:00
aaron	9579aace1f	updates to code dependent on Tribble, as well as the following Tribble changes: - makes writing to disk optional for indexes using the indexCreator classes (allow the user to specify the index file, if null don't write it) - removed some system.out debugging code - fixed version checking in interval tree - made indexes store and return a LinkedHashSet for sequence names (to ensure they've preserved the ordering in the file) - index creators now read the file before creating the index - changed the Index.write() method to take a LEDataStream instead of a file - removed the sequence dictionary code on the header - added utils for getting LEDataStreams - added a base Tribble exception git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3857 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 01:56:10 +00:00

1091 Commits (8fdad20f33dff0cacceeb2cedc1c442e7a7e1bb5)