gatk-3.8


Author	SHA1	Message	Date
carneiro	3a2e32eef3	wex is wex, wgs is wgs.... i think i got it right this time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5828 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-20 16:44:25 +00:00
kshakir	6c6e52def9	Renamed FCP to HybridSelectionPipeline. Reviewed pipelines with dev team. HSP updates: - Calling SNPs and Indels at the same time then using SelectVariants to separate them for filtering - Moved logs next to the files like in WGP - Flattened outputs into one directory - The file names for the final outputs are now <projectName>.vcf and <projectName>.eval - Updated test to pass the chr20 intervals instead of a boolean - Removed MultiFCP WGP updates: - Only cleaning and calling chromosomes 1-22, X, Y, MT - Splitting SNPs from indels, filtering indels, then merging the selected SNPs and selected Indels back together to make sure there are no collisions in CombineVariants - Still running VQSR on the recombined SNPs plus hard filtered indels - Using hard indel filters from delangel - Reduced number of tranches with rpoplin - Changed prior for dbsnp from 10 to 8 with rpoplin - Assuming identical samples on both CombineVariants - Explicitly using variant merge option UNION even though it's the default - Not setting the default genotype merge option PRIORITIZE - Generating a vcf and eval for each tranche git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5825 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 22:47:02 +00:00
carneiro	76c87c9f1d	trio WGS was creating trio WEX filenames. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5822 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 17:45:45 +00:00
carneiro	ebcd333ed8	Quick small updates: SelectVariants: typo MethodsDevelopmentPipeline: Added CEU Trio WGS dataset git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5818 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 20:08:39 +00:00
carneiro	b5b8cb959a	Added VQSR to the downsampling script and changed memory limits for the clean script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5817 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 20:07:42 +00:00
kshakir	83e207d9dd	Added option to exclude intervals during chunk calling. Removed job priority as temp space isn't as tight at the moment and planning on changing the priority interface. Updated chunk calling with ebanks: - Using "the bundle" of resources. - Using dbsnp 132 and 1000G indel RODs for both RTC & IR. - Using the default maxIntervalSize in RTC. - Removed use of UG.exactCalculation argument. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5814 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 03:48:02 +00:00
depristo	9423652ad8	Computes how well a genotype chip covers a reference panel git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5806 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-14 15:07:28 +00:00
kshakir	95fc6c0a83	Changed VR tranches from old 0.1-10 to new 100 to 90. Using hapmap training and truth based on wiki. Explicitly setting the ts_filter_level even though 99.0 is the default. Recal file path now ends with .recal. Added ar's vcf input. Omni rod name now omni instead of 1kg. The VR RodBind tags had spaces in them. Was passing both the full intervals and the chunk intervals to chunk jobs. Switched back to chr20 for default since the VR crashes on small intervals sets with "MESSAGE: Matrix is singular." Log file names based on the file paths + .out. Added eval stratifications by sample based on the Hybrid Selection / Whole Exome pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5800 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-13 14:38:56 +00:00
kshakir	08c13f3944	Using embedded GATK. Hardcoded the reference and dbsnp since the training rods are also hardcoded, for now. Changed freeze/chr20 to wg/chr20/cent1 to also test the heaviest known shard. Other cleanup. TODO: Memory command line options or have the script figure it out using FLS or similar. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5799 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 23:19:49 +00:00
chartl	66c8fa5c48	James P says this change worked for him, so I'm committing it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5795 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 16:55:18 +00:00
rpoplin	1d11e88899	Adding another example call set to GATK resource bundle for use in VQSR wiki tutorial git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5774 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-05 21:16:33 +00:00
fromer	04f156d86b	Removed extraneous import git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5772 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-05 18:51:03 +00:00
rpoplin	825682f58c	oops, putting the script back into a sensible state git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5765 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 20:17:05 +00:00
rpoplin	b5ab2274f6	Committing the base qscript I used to make the Phase1 Project Consensus. Does per-population cleaning and simplifyBAM, and then per-analysis-panel calling with genotype given alleles. Combines info fields using the panel with max AC. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5764 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 20:13:26 +00:00
kshakir	08f0509a5c	Disabling the queue/pipeline package by default so that scala code can build. If it's not going to be fixed the package should be removed. If it is going to be fixed this patch to build.xml should be reverted. Also added the old model of indel calling to the FCP. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5749 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 12:17:33 +00:00
carneiro	f35d955490	recalibrates a dataset splitting between good and bad regions for comparison (used to be named justRecalibrate) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5747 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:43:09 +00:00
carneiro	9f2a8033ff	just recalibrates now recalibrates one sample, fully, not splitting intervals (naming makes more sense) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5746 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:42:23 +00:00
carneiro	c2f8536e02	removing old GATK options git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5745 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:40:39 +00:00
carneiro	8bb92160b5	Script to identify mendelian violations in the CEU Trio and follow up with supposedly incorrect SNP calls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5744 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:19:42 +00:00
carneiro	e2b9227d8d	script to test BQSR on good/bad regions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5743 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:16:37 +00:00
rpoplin	4bbce42861	Renaming ContrastiveRecalibrator --> VariantRecalibrator in preparation for move to core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5733 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 18:12:47 +00:00
rpoplin	3224bbe750	New visualization output for VQSR. It creates the R script file on the fly and then runs Rscript on it. Adding 1000G Project consensus code. First pass of having VQSR work with missing data by marginalizing over the missing dimension for that data point (thanks Chris and Bob for ideas). Updated math functions to use apache math commons instead of approximations from wikipedia. New parameters available for the priors based on further reading in Bishop and looking at the new visualizations. Updated integration test to use more modern files. Updated MDCP to use new best practices w.r.t. annotations. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5723 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 19:14:42 +00:00
carneiro	a93a9ac663	adding gold standard (full coverage) to the variant eval analysis output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5721 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 16:29:11 +00:00
carneiro	2384e23274	Added the capability of running count covariates only on a given interval. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5717 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 21:30:14 +00:00
carneiro	3868a7e778	Oneoff project to downsample, bootstrap and call snps to test sensitivity/specificity of downsampled coverage in WEX projects. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5713 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 19:17:30 +00:00
carneiro	f04cc4321f	fixed a bug when the pipeline was used on a single bam. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5708 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-28 17:19:22 +00:00
depristo	122d5845d3	GATK Resource bundle, latest version (now with b37 -> b36 support). Oneoff scala script that assesses chip coverage of calls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5703 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 22:01:36 +00:00
kshakir	6b1b4931e7	Added FCP VE stratifications for Filter, FunctionalClass, and Stratification as requested by Corin. Feeding FCP UG the bam list instead of individual bams to cut scatter gather time from O(m^100) as measured by Chris to O(m^1). Fixed NPE when eval values aren't found in PipelineTests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5694 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 02:29:56 +00:00
kshakir	00b57c751b	Added missing ".0". git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5682 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-22 21:50:07 +00:00
chartl	5b9a8555cd	Queue graph time is currently of O(n^m) where n = num jobs, m = num unique base files. This script therefore was running in order 1200^16, which I don't think would finish before the heat death of the universe. For now, push down the number of files to 1 and gather them outside of Queue, once I've fixed up scatter-gather in core, outputs can be uncommented. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5674 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-21 12:56:25 +00:00
corin	9f006be425	Updates Omni path and removes a typo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5673 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-21 04:17:13 +00:00
kshakir	8619f49d20	Added a utility method to retrieve the contig lengths for WG chunking. Added a rudimentary GATKReportParser for parsing VE3 results. Re-enabled the FCPTest using VE3, the GATKRP, and the PicardAggregationUtils. The tag type for .rod files is DBSNP, not ROD. More explicit return types on implicit methods. Added null checks for implicit string to/from file conversions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5668 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-20 19:22:21 +00:00
depristo	d8b8f857f3	V2 -- now working -- of a core walker that creates the standard GATK resource bundle See https://www.broadinstitute.org/gsa/wiki/index.php/GATK_resource_bundle Which live locally in /humgen/gsa-hpprojects/GATK/bundle/current You use this following command to create the bundle: java -Djava.io.tmpdir=/broad/shptmp/depristo/tmp -jar dist/Queue.jar -S scala/qscript/core/GATKResourcesBundle.scala --gatkjarfile dist/GenomeAnalysisTK.jar -bsub -jobQueue gsa -svn 5660 $* Annoyingly, it must be run in the trunk directory, and requires an explicit svn version number to create the directory. It also must be run in two stages manually. First, the local bundle is created, and then with the -phase2 argument all of the files in the local bundle are compressed and pushed to the FTP server. I'm likely going to shift most of my processes over to using this location for data file access, especially for b37 data sets. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5665 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-20 12:48:47 +00:00
carneiro	d35c7d1029	- minor changes to the 'justclean' script to handle the Trio Cleaning. - fixing a bug on single ended BWA option of the data processing pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5662 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-19 16:35:24 +00:00
depristo	541c9109b3	V1 of GATK Resource Bundling system git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5659 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-18 19:23:45 +00:00
chartl	23fac043d9	Fix the outputs so the proper files are gathered (not automatic due to multiplexer) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5654 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 23:55:12 +00:00
chartl	e5ef8388fc	BatchMerge - AlleleVCF --> AllelesVCF, this (combined with Eric's fix) will solve James P.'s forum issue. After viewing results on real case/control data from RAW -- it's really working quite well. ReadIndels, however, needs to use a T-test rather than a U-test, especially in deep coverage (at indel sites, the reads with indels will have mostly the same number of CIGAR indel elements -- one -- which doesn't really play nicely with the UTest when sample sets are large). Modified ReadsLargeInsertSize to be a two-way test (e.g. ReadsLarge and ReadsSmall). BaseQualityScore also suffers from the same issue as read indels, so switching over to a T-test in that case as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5653 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 22:03:16 +00:00
chartl	104d5515fe	Huh, somehow this change didn't make it through last time git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5639 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 17:09:37 +00:00
chartl	47fa7e2227	+ Added override to extractFileEntries + UG now doesn't care whether it's given SNPs or indels to genotype, it will do the right thing -- so remove the option to specify which GM user wants + Max mismatches argument removed integration test will follow git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5638 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 15:13:35 +00:00
kshakir	475ad1259d	Put a band-aid on the FCP by switching use of DINDEL to INDEL and explicitly running UG the old way with just indels and just snps. Switched YAML parser to new Broad parser which will additionally update picard cleaned bams to the latest version if the project and sample are specified. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5634 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 02:22:31 +00:00
corin	9ee30ce594	Whole genome pipeline script. currently chunks, cleans, calls, merges, selects and filters indels, recalibrates, and evals. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5627 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 16:59:48 +00:00
chartl	8125b8b901	Old changes to the exome VQSR search. SGA updated to include new proportion-based insert size test. Major fix for dichotomization test: MathUtils now optionally ignores NaN values for sums, averages, variances. In the future this feature can be pushed back into the AssociationContext object itself (e.g. no data? no entry), but it's kept like this for transparency for now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5618 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-12 23:00:50 +00:00
kshakir	4b7c3af763	When /etc/mailname is unreadable fall back to the hostname. Implicit conversions for String to/from File. Small updates to the example QScripts. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5614 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-11 20:22:44 +00:00
rpoplin	05ad6ecf72	bug fix in MDCP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5613 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-11 18:27:47 +00:00
chartl	b81228fec1	Minor bug fixes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5603 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-08 17:30:40 +00:00
hanna	437db28937	Incorporating Khalid's feedback. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5602 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-08 16:22:49 +00:00
chartl	cc58e19621	This is now running. Expect results in a few weeks when the ~7k jobs have percolated through the week queue. Pray gsa1 doesn't go down. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5593 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-07 21:12:59 +00:00
chartl	6a26957b65	Bug squashing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5592 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-07 20:11:28 +00:00
chartl	a1b7d28375	Initial VQSR full search script git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5591 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-07 20:03:48 +00:00
rpoplin	febb883511	updates to MDCP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5586 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-06 19:44:46 +00:00


306 Commits (d77f4ebe31a8f9e48165bd7ccfd3cd39f2ee25e1)