gatk-3.8

Author	SHA1	Message	Date
carneiro	2efd807952	No more default callsets, they're now mandatory arguments. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5858 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 21:56:43 +00:00
fromer	bc4305c956	Added memory limit parameter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5855 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 21:11:44 +00:00
fromer	833dff658a	Small script to do full variant annotation in parallel git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5853 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 20:33:20 +00:00
chartl	912c6cdbfa	Moving this script out of playground while I figure out what's going on. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5848 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 17:48:44 +00:00
depristo	72ad8ded19	Removed unused imports, but some of these scripts are now out of date (they have been for a long time) so they don't compile anyway git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5837 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-22 18:43:48 +00:00
depristo	e234589240	Contracts for GenomeLocParser and GenomeLoc are now fully implemented. GenomeLocs can officially have any start/stop values from -Inf - +Inf. Bounds w.r.t. the reference are enforced, optionally, by GenomeLocParser. General code cleanup throughout the subsystem. All validation code for GLs is now centralized, and all I/O systems now validate their inputs. Because of this, the Picard interval processing code has been changed to examine whether an interval is valid, and only keep the valid intervals. Note that the scatter/gather test was changed, because the original hg18 chr20 interval files were actually malformed (all records for some reason were on chr20). Many interval processing routines were moved to IntervalUtils, as this is their natural home. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5830 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-21 02:01:59 +00:00
carneiro	3a2e32eef3	wex is wex, wgs is wgs.... i think i got it right this time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5828 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-20 16:44:25 +00:00
kshakir	6c6e52def9	Renamed FCP to HybridSelectionPipeline. Reviewed pipelines with dev team. HSP updates: - Calling SNPs and Indels at the same time then using SelectVariants to separate them for filtering - Moved logs next to the files like in WGP - Flattened outputs into one directory - The file names for the final outputs are now <projectName>.vcf and <projectName>.eval - Updated test to pass the chr20 intervals instead of a boolean - Removed MultiFCP WGP updates: - Only cleaning and calling chromosomes 1-22, X, Y, MT - Splitting SNPs from indels, filtering indels, then merging the selected SNPs and selected Indels back together to make sure there are no collisions in CombineVariants - Still running VQSR on the recombined SNPs plus hard filtered indels - Using hard indel filters from delangel - Reduced number of tranches with rpoplin - Changed prior for dbsnp from 10 to 8 with rpoplin - Assuming identical samples on both CombineVariants - Explicitly using variant merge option UNION even though it's the default - Not setting the default genotype merge option PRIORITIZE - Generating a vcf and eval for each tranche git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5825 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 22:47:02 +00:00
carneiro	76c87c9f1d	trio WGS was creating trio WEX filenames. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5822 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 17:45:45 +00:00
carneiro	ebcd333ed8	Quick small updates: SelectVariants: typo MethodsDevelopmentPipeline: Added CEU Trio WGS dataset git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5818 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 20:08:39 +00:00
carneiro	b5b8cb959a	Added VQSR to the downsampling script and changed memory limits for the clean script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5817 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 20:07:42 +00:00
kshakir	83e207d9dd	Added option to exclude intervals during chunk calling. Removed job priority as temp space isn't as tight at the moment and planning on changing the priority interface. Updated chunk calling with ebanks: - Using "the bundle" of resources. - Using dbsnp 132 and 1000G indel RODs for both RTC & IR. - Using the default maxIntervalSize in RTC. - Removed use of UG.exactCalculation argument. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5814 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 03:48:02 +00:00
depristo	9423652ad8	Computes how well a genotype chip covers a reference panel git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5806 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-14 15:07:28 +00:00
kshakir	95fc6c0a83	Changed VR tranches from old 0.1-10 to new 100 to 90. Using hapmap training and truth based on wiki. Explicitly setting the ts_filter_level even though 99.0 is the default. Recal file path now ends with .recal. Added ar's vcf input. Omni rod name now omni instead of 1kg. The VR RodBind tags had spaces in them. Was passing both the full intervals and the chunk intervals to chunk jobs. Switched back to chr20 for default since the VR crashes on small interval sets with "MESSAGE: Matrix is singular." Log file names based on the file paths + .out. Added eval stratifications by sample based on the Hybrid Selection / Whole Exome pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5800 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-13 14:38:56 +00:00
kshakir	08c13f3944	Using embedded GATK. Hardcoded the reference and dbsnp since the training rods are also hardcoded, for now. Changed freeze/chr20 to wg/chr20/cent1 to also test the heaviest known shard. Other cleanup. TODO: Memory command line options or have the script figure it out using FLS or similar. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5799 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 23:19:49 +00:00
dheiman	9e08a699c6	Corrected memory handling and jobName formatting issues git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5797 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 17:47:56 +00:00
chartl	66c8fa5c48	James P says this change worked for him, so I'm committing it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5795 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 16:55:18 +00:00
dheiman	16db86e6cb	Grid Engine backend to GATK-Queue, initial commit of implementation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5788 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-11 13:21:45 +00:00
kshakir	3ffc2ccd81	Implemented broad specific LSF requirement in the LSF job runner ahead of GridEngine check in by dheiman. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5781 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-09 22:14:04 +00:00
rpoplin	1d11e88899	Adding another example call set to GATK resource bundle for use in VQSR wiki tutorial git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5774 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-05 21:16:33 +00:00
fromer	04f156d86b	Removed extraneous import git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5772 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-05 18:51:03 +00:00
kshakir	4d08d39849	Moved some of the java to scala conversions from production to test code as it's not needed in production and slows down the code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5769 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-05 04:11:15 +00:00
kshakir	28b897d5de	Fixed O(N^2) operation when scattering interval files. Cleaned up intervals contig count function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5768 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-05 03:32:35 +00:00
kshakir	8ad547e6c2	Fixed another interval bug where dividing up N intervals into N parts wasn't working. Minor updates to the FCPTest to match the changes due to using the old indel caller. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5766 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 20:49:35 +00:00
rpoplin	825682f58c	oops, putting the script back into a sensible state git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5765 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 20:17:05 +00:00
rpoplin	b5ab2274f6	Committing the base qscript I used to make the Phase1 Project Consensus. Does per-population cleaning and simplifyBAM, and then per-analysis-panel calling with genotype given alleles. Combines info fields using the panel with max AC. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5764 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 20:13:26 +00:00
kshakir	4d251fb91f	Why won't you die? git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5758 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 19:13:39 +00:00
kshakir	f7d9f0a1f3	Removing QPipeline directory as there's no one to support it at the moment. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5757 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 18:36:02 +00:00
kshakir	08f0509a5c	Disabling the queue/pipeline package by default so that scala code can build. If it's not going to be fixed the package should be removed. If it is going to be fixed this patch to build.xml should be reverted. Also added the old model of indel calling to the FCP. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5749 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 12:17:33 +00:00
carneiro	f35d955490	recalibrates a dataset splitting between good and bad regions for comparison (used to be named justRecalibrate) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5747 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:43:09 +00:00
carneiro	9f2a8033ff	just recalibrates now recalibrates one sample, fully, not splitting intervals (naming makes more sense) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5746 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:42:23 +00:00
carneiro	c2f8536e02	removing old GATK options git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5745 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:40:39 +00:00
carneiro	8bb92160b5	Script to identify mendelian violations in the CEU Trio and follow up with supposedly incorrect SNP calls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5744 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:19:42 +00:00
carneiro	e2b9227d8d	script to test BQSR on good/bad regions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5743 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:16:37 +00:00
rpoplin	4bbce42861	Renaming ContrastiveRecalibrator --> VariantRecalibrator in preparation for move to core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5733 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 18:12:47 +00:00
rpoplin	3224bbe750	New visualization output for VQSR. It creates the R script file on the fly and then runs Rscript on it. Adding 1000G Project consensus code. First pass of having VQSR work with missing data by marginalizing over the missing dimension for that data point (thanks Chris and Bob for ideas). Updated math functions to use apache math commons instead of approximations from wikipedia. New parameters available for the priors based on further reading in Bishop and looking at the new visualizations. Updated integration test to use more modern files. Updated MDCP to use new best practices w.r.t. annotations. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5723 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 19:14:42 +00:00
carneiro	a93a9ac663	adding gold standard (full coverage) to the variant eval analysis output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5721 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 16:29:11 +00:00
kshakir	2d81262f87	Fixed a bug where empty intervals were being scattered zero ways parallel. Would be awesome to use the GAE at some point. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5718 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 22:42:48 +00:00
carneiro	2384e23274	Added the capability of running count covariates only on a given interval. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5717 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 21:30:14 +00:00
carneiro	3868a7e778	Oneoff project to downsample, bootstrap and call snps to test sensitivity/specificity of downsampled coverage in WEX projects. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5713 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 19:17:30 +00:00
carneiro	f04cc4321f	fixed a bug when the pipeline was used on a single bam. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5708 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-28 17:19:22 +00:00
depristo	122d5845d3	GATK Resource bundle, latest version (now with b37 -> b36 support). Oneoff scala script that assesses chip coverage of calls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5703 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 22:01:36 +00:00
kshakir	df35a143b2	Removed -debug/--debug_mode. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5697 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 10:56:39 +00:00
kshakir	ca817356b6	Quick disabling test to restore build. TODO fix test or complete removal of the MFCP. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5696 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 04:26:11 +00:00
kshakir	6b1b4931e7	Added FCP VE stratifications for Filter, FunctionalClass, and Stratification as requested by Corin. Feeding FCP UG the bam list instead of individual bams to cut scatter gather time from O(m^100) as measured by Chris to O(m^1). Fixed NPE when eval values aren't found in PipelineTests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5694 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 02:29:56 +00:00
kshakir	58c7b27ccc	Missing file from last checkin. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5688 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-26 00:12:41 +00:00
kshakir	f619dd3ca7	Refactored IntervalUtils used to parse and scatter intervals for Queue. Scattering non-contig interval lists by number of loci in the intervals instead of just number of intervals. Queue caches the list of locs and how to split them up instead of reloading them from disk repeatedly. TODO: general purpose function to divide data evenly. Skip over comments when parsing picard analysis files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5687 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-26 00:06:00 +00:00
kshakir	6ca4e3cebf	Updating FCPT nCalledLoci due to fixed QD<2.0 filter. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5686 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-25 21:37:04 +00:00
kshakir	1158c99726	Only running chr20 test on the hour queue. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5684 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-22 22:09:42 +00:00
kshakir	00b57c751b	Added missing ".0". git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5682 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-22 21:50:07 +00:00

441 Commits (cf3dbfee979d1f0bbb3fa0140615fd2bc4b73abe)