gatk-3.8

Commit Graph

Author	SHA1	Message	Date
chartl	c355afc320	Queue now does job tracking (replace -run with -status in the command line). Produces output that looks like: INFO 20:58:17,827 QCommandLine - Checking pipeline status INFO 20:58:23,234 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_MergeIndels [DONE] INFO 20:58:23,236 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_158.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,237 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_929.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,238 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_SNP_calls [NOT DONE] 5t/0d/0r/5p/0f INFO 20:58:23,239 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_HandFilter [NOT DONE] INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1122.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantRecalibrator [NOT DONE] INFO 20:58:23,241 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_913.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,242 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_2037.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,243 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantEval [NOT DONE] INFO 20:58:23,244 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster [NOT DONE] INFO 20:58:23,245 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_106.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,246 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster_and_Indel_filter [NOT DONE] INFO 20:58:23,247 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_ApplyVariantCuts [NOT DONE] INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_GenomicAnnotator [NOT DONE] INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1713.bam [DONE] 5t/5d/0r/0p/0f git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4340 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 00:59:09 +00:00
kshakir	4ed9f437e9	Sliced the GAE in half like a gordian knot to avoid the constant merge conflicts. The GAE half has all the walker specific code. The new "Abstract" GAE has the rest of the logic. More refactoring to come, with the end goal of having a tool that other java analysis programs (Queue, etc.) can use to read in genomic data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4339 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 23:28:55 +00:00
rpoplin	0c9fabb06f	Fix in AnalyzeAnnotations, somebody changed it look for ID in the vc's info field. This dinosaur desperately needs integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4338 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 19:48:44 +00:00
hanna	0c781968fb	Tried to do a bit of pre-commit refactoring and screwed it up. Fixed. Thanks to Ryan for identifying the problem. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4336 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 18:17:29 +00:00
corin	d6bd1debeb	This is an updated version of the automated data processing report. Each page in the report is a stand alone function, which are linked together with a function which pulls all appropriate data (assuming a standard naming convention) and generates the pdf. This script still need to respond appropriately when it doesn't find the data it needs, database access, and a way of getting some information from sequencing for the tearsheet. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4335 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 18:08:16 +00:00
depristo	d081b9b352	Improvements to error messages about @Requires and @Allows git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4334 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 12:08:27 +00:00
hanna	7841b301c4	Added more diagnostics so that I have some idea of what a 'general' exception is. Required to fix bug ZjhCJAdwhtFq1x54ZlmlN8pFNcbrRpdJ and similar. We might want to change this particular case to a ReviewedStingException after we gain a bit more experience with it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4333 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 21:32:01 +00:00
fromer	44ccfc3531	Updated Phasing algorithm + evaluation module to properly implement haplotypes [including homozygous genotypes]; Implemented dynamic window phasing model for LARGE increase in efficiency git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4332 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 21:29:58 +00:00
kshakir	192757d1e0	Added the new pipeline classes to the StingUtils.jar so that ant test picks them up. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4331 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 20:05:22 +00:00
hanna	8f75d88519	Fix for GATK run report ids: mOVsxGfDiiSMxVs2PPTVjzYTVbizlD6e f9kUHUADFsZ0LiTGxRL5zPmq9kZcA4cQ 8eGHWJFAlBVmgxwPi3sMd1RmiN2PwHOf iLhvHWveypKb2F8vKS5irHylc3pYvlOb HDttXKUMEVoPrvVeWrH7E0htxYyNydMx plus a bit of cleanup of custom exceptions in the sharding system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4330 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 19:49:25 +00:00
kshakir	20b38b38f3	Updated from SnakeYAML 1.6 to 1.7. Added a pipeline java bean and YAML utility to serialize java beans. Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format. Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference. More changes to come as this code gets tested out in the fullCallingPipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 19:47:49 +00:00
hanna	d9b8fa2acc	Up the memory required for integrationtests until we can figure out why memory isn't being freed correctly when multiple integration tests run as part of a single class. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4328 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 18:54:24 +00:00
hanna	fb5d595ef0	Disable VCF header output in the Beagle integrationtest. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4327 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 16:50:03 +00:00
hanna	0c99c97685	The engine now automatically adds the command-line arguments to the header of every VCF, unless -NO_HEADER is specified. Changed integration tests, adding the -NO_HEADER argument, for walkers that previously did not include the command-line arg headers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4326 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 15:27:58 +00:00
aaron	1af9ca6d45	enabling tests that now pass with the contig length validation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4325 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 22:20:50 +00:00
chartl	6dec042288	Re-enabling indel cleaning, explicitly calling fix mates in the case where indel cleaning is not scatter/gathered git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4324 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 20:37:49 +00:00
depristo	522830fb01	Support for --assume-single-sample in UG, better malformated bam exceptions, and ignoring out of order contigs in seqdictutils. All for the CG bam file git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4323 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 20:33:34 +00:00
aaron	3938d53738	one broken build short of the hat trick. Fixing the unix test which expects the sequence dictionary of the Tribble track to equal the reference; we actually return the sequence dictionary of the track itself, with each contig set to the length of the sequence dictionary contig entry. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4322 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 18:47:20 +00:00
aaron	b968af5db5	The tribble indexes are now updated with correct sequence lengths for each contig they have in their sequence dictionary. Also clean-up in the RMD track builder. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4321 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 18:21:22 +00:00
aaron	2586f0a1ca	fix for the build I broke - the original file got corrupted, which I replaced with a version that didn't have the header stripped off. Other integration tests passed, but this test relied on the header being stripped off. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4320 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 15:35:25 +00:00
rpoplin	547763b230	Better error message for Petr's null pointer exception. Also added an exception integration test because I'm certain this used to work. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4319 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 13:44:40 +00:00
depristo	8719dde59d	Now prints out PASS when a variant is unfiltered git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4318 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 13:16:41 +00:00
asivache	21a34daf2e	add playground classes to the jar (if they are compiled). Thanks, Khalid! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4317 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-20 22:28:08 +00:00
kshakir	f9707bb7bf	Fix for Matt: For Mac OS 10.6 temporary directories replace paths like '/var/folders/Ax/AxRUoz51Fh05fVe-j6C1Wk+++TI/-Tmp-/' with '/tmp/' so that google reflections 0.95RC2 still works on classes in the directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4316 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-20 22:02:07 +00:00
delangel	205fc0b636	Cleanup: Use Tribble's version of createVariantContextWithPaddedAlleles (no real functional difference) to avoid duplicated code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4315 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-20 19:53:30 +00:00
delangel	a10cfe213b	Small bug fix in simple indel genotyper: Likelihood of case where best haplotype pair was (REF,REF) was not computed correctly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4314 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-20 17:04:39 +00:00
ebanks	f5a30d0248	I just spoke to Andrey & Kiran (the original authors of these tools), and they voted to kill these in favor of Picard git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4313 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-20 13:27:35 +00:00
chartl	b24172c80f	Queue now utilizes .[file].done to allow skipping of previous jobs, if they have been completed. This is, unfortunately, reliant on a python script to do the post-execution touching of .done files. That is to say, proper resumability is live (but not extensively tested) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4312 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-20 00:16:53 +00:00
delangel	f64b6fddc1	Major changes/improvements to indel genotyper: a) Redid way to compute path metrics in indel error model. Paper formulation where we have an anchor point in the alignment between read and haplotype won't work in practice except in nice data sets that are perfectly indel-realigned and that are well mapped by aligner. New formulation doesn't assume this, and it's actually simpler and uses less code. It now resembles more a classic SW dynamic programming formulation but it still preserves the HMM probabilistic formulation. b) Added a programmable call threshold, set by command line. c) Use now sample name from BAM file, remove -sampleName argument. d) Simplify loop to compute read-haplotype likelihoods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4311 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-19 23:47:31 +00:00
chartl	40274ba7dc	Do this the right way git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4310 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-19 04:30:48 +00:00
chartl	fa8cfd3031	Adding this to get around lsf/csh issues (see recent help message). Also seems like a good time to reiterate http://www.faqs.org/faqs/unix-faq/shell/csh-whynot/ git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4309 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-19 02:45:16 +00:00
rpoplin	c6351a11d6	Clearer logger output when not using by-hapmap git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4308 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-18 16:10:42 +00:00
rpoplin	7e58d8ed61	CombineVariants now outputs the command line in the VCF header. Added a new hidden argument to VR walkers called --NoByHapMapValidationStatus to turn off the by-hapmap dbsnp rod behavior. Very useful for experimenting with which sets to use as training data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4307 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-18 16:06:50 +00:00
chartl	6f6d2eb31f	Told people this worked...forgot to commit! -c git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4306 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-18 03:46:00 +00:00
kshakir	a3f31e5df0	When QScript writers use the RodBind, then the File version of the same argument should be optional, i.e. should not always try to output the file, which when unpopulated will be null. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4305 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-17 18:22:07 +00:00
kshakir	1c94a73434	Fixed header generation when lines contain spaces. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4304 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-17 17:14:59 +00:00
bthomas	c6c6d32b46	Quickly adding a new convenience method for retrieving a group of samples. The method is getSamples(Collection<String>) and returns a set of sample objects. There's also a test there. Ryan is using this to modify VCF code today... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4303 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-17 15:55:17 +00:00
kshakir	a898908918	The output BAM file optional arguments of compression and whether to write an index are not outputs themselves. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4302 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-17 15:35:54 +00:00
bthomas	bc12055fcf	Quick patch to fix the sample code. It wasn't actually initializing the sample data source, so I added a call to initializeSampleDataSource() in GenomeAnalysisEngine. I think there was just an error resolving the versions of GenomeAnalysisEngine Also added a new error message that I thought would be helpful... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4301 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-17 14:05:26 +00:00
ebanks	a10b2a00a5	Moving the util VariantContext 'modifying' routines into VC itself (as opposed to VCUtils) so that we can pass the genotype data directly into it and are no longer forced to decode the genotypes for no reason. This means that any walker that takes in a VCF and modifies the records without touching the genotypes never have to decode them. I've hooked this into the other two Variant Recalibrator walkers for Ryan. One side effect, though, is that we no longer can sort the sample names in the VCF (i.e. if the input VCF doesn't have samples in alphabetical order, then we used to sort them when writing a new VCF but no longer do that), because if we don't decode then we can't re-order the genotypes. I don't think this is a big concern given that the Unified Genotyper does emit sorted samples and that's the main source for most of the VCFs we use. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4300 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-17 07:09:58 +00:00
bthomas	f66ef4626e	Fixing two minor issues: 1) adding a new error message if the user adds a fasta file in a directory that doesn't exist; 2) renaming my sample unit tests so they actually run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4299 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 20:45:51 +00:00
rpoplin	3a400e3dc0	Added CountCovariates integration test to ensure that it throws an exception if a variant mask isn't provided. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4298 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 19:18:38 +00:00
kshakir	bf69b5fa21	"!=" != "==" git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4297 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 19:15:37 +00:00
rpoplin	2eb5d9b2d2	CountCovariates makes sure that it sees a rod type that it expects for use as a variant mask (accepted types are dbsnp, vcf, and bed) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4296 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 18:53:42 +00:00
chartl	c1720cc8f5	Now compiles. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4295 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 18:49:53 +00:00
chartl	c581bd2d84	Minor modifications to fCP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4294 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 18:29:24 +00:00
aaron	de56568ce4	Adding the appropriate DbSNP file to the performance tests so they don't exception out. The exception: "org.broadinstitute.sting.utils.exceptions.UserException$CommandLineException: Invalid command line: This calculation is critically dependent on being able to skip over known variant sites. Please provide a dbSNP ROD or a VCF file containing known sites of genetic variation." git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4293 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 16:30:54 +00:00
aaron	782e0018e4	removal of most of the old GATK ROD system; also a fix for -Dsingle so we can again run just a single unit or integration test (single tests in tribble can be run with the -DsingleTest option now). More to come. * Three integration tests had to change: * RecalibarationWalkersIntegrationTest: One of the tests was using the interval as the snp track, and wasn't supplying a DbSNP track (for CountCovariates) SequenomValidationConverterIntegrationTest: relies on Plink ROD which we've removed. PileupWalkerIntegrationTest: we no longer have implicit interval tracks, so there isn't a rod name over the specified region. Otherwise the same result. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4292 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-15 22:54:49 +00:00
delangel	c604ed9440	Several improvements to new indel genotyper (more to come soon): a) Turns out previous change of centering haplotype around indel was a bad idea. Context to the left of indel is important but not as important as right one, because by definition all alleles start at the same location, so haplotype is the same to the left of indel regardless of allele. So, go back to having a constant size window to the left of event. b) Expand reference context so we can test larger haplotypes. c) Optimize computation of read likelihoods by doing them in linear array instead of in a matrix - no difference in biallelic sites but could be significantly faster in multiallelic sites. d) Bug fix: read alignment wasn't being computed correctly if, a) we were at an insertion, b) read started right at the insertion, c) read CIGAR didn't include insertion - more of these corner conditions are lurking, so a revamped computation of how reads align to candidate haplotypes is in the works. e) Add debug option not to use prior haplotype likelihoods. f) Don't hard-code NA12878 for genotyping, now sample name is a required input argument. g) Bug fix: if there are no reads covering a candidate indel event, just output NO_CALL (didn't notice this in HiSeq, but in P1 data it happens all the time). I need to add a confidence threshold for calling later on. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4291 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-15 21:53:08 +00:00
depristo	fb6d7d19f9	Better window size error message git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4290 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-15 20:40:56 +00:00

4303 Commits (c355afc3207a40f4ae69052d4e8aa79350a700af)