gatk-3.8

Commit Graph

Author	SHA1	Message	Date
carneiro	9c1b8ea796	Updated BQSR script to be more general and work with the new PacBio BAM files - for Kristian Cibulskis git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6075 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-23 21:05:28 +00:00
carneiro	087a25d9e3	quick memory upgrade to BWA classes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6074 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-23 20:53:32 +00:00
carneiro	fbe157137f	removing the old processing pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6073 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-23 20:19:13 +00:00
droazen	48055d45cb	Added support for PICARD functions to QUEUE after following Khalid's pointers on where to do it. I have added the 6 functions used by the Data Processing Pipeline, but from now on it should be a matter of seconds to copy/paste and create bindings to more functions. Updated the Data Processing Pipeline to use the new Picard classes and reorganized the pre-processing of the pipeline accordingly. Will only update the wiki once this change goes live. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6071 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:56:14 +00:00
droazen	c8124496d0	now with the new 'consensus model' parameter to the cleaner. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6064 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:55:42 +00:00
droazen	29a0e08aa2	Testing bug fix process #3 (changes are irrelevant) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6048 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:54:40 +00:00
droazen	e148a75c32	Testing the 'bug fix' process #2 (changes are irrelevant) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6047 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:54:37 +00:00
droazen	9b90e9385d	Putting new association files, some qscripts, and the new pick sequenom probes file under local version control. I notice some dumb emacs backup files, I'll kill those off momentarily. Also minor changes to GenomeLoc (getStartLoc() and getEndLoc() convenience methods) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6034 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:53:37 +00:00
depristo	4c6d0e6143	Added stratification by discrete allele count, just like AF, but requiring genotypes so it can be exact. Added docs on wiki, and integrationtest using Kiran's very nice fundamental VCF. VariantEvalWalker now passes a pointer to itself to the Stratefication setVariantEvalWalker (and assoc. get method) so that stratefications can look at VEWalker variables to obtain information necessary for their calculations, like the list of eval samples. This is a better interface, in my opinion, than the current approach of extending the base abstract Stratefication to include an initialize function that has all arguments necessary for any Strat. JEXL expressions now provide access to the VariantContext vc object itself, so you can write JEXL's that directly use VariantContext and associated functions from the command line. ExomePostQC Queue script now creates a byAC eval using the new strat, and no longer produces a byAF file (as this was not exact, and led to strange punctile behavior when actual AF quantization was out of sync with fixed quantization of AF strat). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6015 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-19 03:11:00 +00:00
fromer	03a0185566	Control unscattered output file location git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6011 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-17 15:53:25 +00:00
ebanks	4e85416af1	[Foiled yet again when trying to do this in git] Slight modifications in the argument structure for the IndelRealigner. Instead of boolean flags -knownsOnly and -doNotUseSW, we now have an enum --consensusDeterminationModel which lets you specify knowns only, also use indels in reads, or also use SW. Please note that the default behavior of IR has not changed at all (and won't for a few more days) - that'll be done in GIT (fingers crossed). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6008 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-16 17:35:37 +00:00
depristo	27d4b317fc	Simple program that calls indels in CEU trio exomes and WGS and compares the results. Overall the indel calls really look good to me, given reasonably good input BAM files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6006 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-16 12:56:04 +00:00
depristo	43fdd31e20	Significant performance optimization for reduced reads due to better algorithm for including reads in the variable regions. Fixed a critical bug that actually produced multiple copies of the same read in the variable regions with this optimization as well. Scala exploration script updated as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6005 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-16 12:54:59 +00:00
depristo	38d7733989	Now accepts any number of VCFs to evaluate. Runs the standard (now three) variant eval commands and invokes the exomeQC R script. Has some annoying assumptions about paths encoded right now. Example usage below: setenv DATA ~/Desktop/broadLocal/localData/ java -Djava.io.tmpdir=tmp -jar ../dist/Queue.jar -S ../scala/qscript/oneoffs/depristo/ExomePostQCEval.scala --gatkjarfile ../dist/GenomeAnalysisTK.jar -R $DATA/human_g1k_v37.fasta $* -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch1.vcf -intervals ~/Desktop/broadLocal/localData/whole_exome_agilent_1.1_refseq_plus_3_boosters.Homo_sapiens_assembly19.targets.interval_list -dbSNP ~/Desktop/broadLocal/localData/dbsnp_132_b37.vcf -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch2.vcf git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6004 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-16 12:49:54 +00:00
fromer	b4c30bf124	Added option of minMappingQuality git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6002 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-16 00:02:26 +00:00
depristo	165befd38a	V1 of the post processing QC plotting scala script and R function. The scala script runs VariantEval on a VCF file, and computes QC metrics. The R script generates the report. Will discuss usage with data processing group. Ryan -- please add your additional plotting routines to this script, as you see fit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5980 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-13 17:06:42 +00:00
carneiro	95f3da1126	limiting the number of reads in memory for the SamValidateFile.jar git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5976 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-10 20:14:30 +00:00
ebanks	077862958d	Oops, forgot to define the hg19 variable git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5975 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-10 18:26:48 +00:00
depristo	44287ea8dc	ReducedBAM changes to downsample to a fixed coverage over the variable regions. Evaluation script now has filters and eval. commands. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5965 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 19:36:08 +00:00
delangel	78f5309656	Intermediate commit of indel consensus VQSR script, a couple of new features added, not for general use git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5951 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-06 13:27:02 +00:00
kshakir	ac3f1be7f0	Added a samtools merge CLF. Using samtools to merge the low pass bams before cleaning to avoid "Too many open files." with 1500+ bams. Other minor cleanup as pointed out by the IntelliJ scala plugin. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5942 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 22:20:38 +00:00
kshakir	4c6751ec3c	Added argument to WGP and HSP to allow more memory. Upped the WGP VQSR memory to 32g to power through the filtering whole genome. TODO: Figure out what the right amount is. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5940 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 20:48:37 +00:00
depristo	cd293f145b	More stable reduced reads representation. Bug fixes throughout. No diffs by <1% of sites in an exome, and the majority of these differences are filtered out, or are obvious artifacts. UnitTests for BaseCounts. BaseCounts extended to handle indels, but not yet enabled in the consensus reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5939 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 20:11:31 +00:00
carneiro	32ac7be86a	new name to the pipeline, it's now in core, happy to support it. ps: Can't wait for GIT ! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5933 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 23:34:54 +00:00
carneiro	a4ffae880d	Subversion crashed my intellij BADLY, so now moving the data processing pipeline to core in 2 steps. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5932 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 23:31:24 +00:00
carneiro	36db9bdcd5	Implemented and tested BWA alignment in the data processing pipeline. caveat: Right now bwa only supports one read group, so if the original file had multiple @RG lines, only the first one will be kept. (working on a solution to this) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5931 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 23:03:07 +00:00
carneiro	c85a1d9210	Implemented and tested BWA alignment in the data processing pipeline. Renamed it and moved to core. Happy to support it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5930 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 22:58:55 +00:00
fromer	ef56b48eef	Add CNV sub-dir git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5928 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 21:47:13 +00:00
carneiro	355be57539	fixing the pipeline so that it still works while I'm adding support for BWA. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5921 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 19:32:28 +00:00
kshakir	8d294dd6e6	For the snps to create combine snps and filtered indels, now using a VCF with just snps instead of vcf with snps plus unfiltered indels. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5904 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-29 04:17:18 +00:00
carneiro	5974675b43	Two intermediate commits, to work over the weekend. ReplicationValidationWalker: Just the skeleton of what will be the implementation of the replication/validation model. dataProcessingV2: Committing an UNTESTED implementation of BWA alignment. I am running tests on it over the weekend. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5900 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-27 22:03:08 +00:00
carneiro	2524216d4b	Added the R script for VQSR git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5898 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-27 21:56:56 +00:00
depristo	549172af10	removing dependence on jobQueue == gsa git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5889 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-27 10:12:09 +00:00
fromer	b4af28c7df	Handle case where -L argument (intervals) not given git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5886 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-26 20:24:56 +00:00
ebanks	d393f59ad2	Moving the hg19 reference to a new location as per instruction from our intrepid leader git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5875 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-25 17:48:59 +00:00
kshakir	9d8c963fcc	Switched arguments from short name to long name. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5873 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-25 17:24:56 +00:00
kshakir	6ec3dd0f8c	Updated GridEngineJobRunner to return status RUNNING instead of PENDING when a job has been sent to GridEngine, even if it hasn't started. Added GridEngine to pipeline tests. Removed passing -jobProject since GridEngine projects must be predefined. Writing the HybridSelectionPipelineTest yaml into the temp directory. Disabled job priority as it needs to be refactored for use by GridEngine and LSF. Fixed WholeGenomePipeline variantmergeoption rename to filteredRecordsMergeType. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5872 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-25 17:16:44 +00:00
delangel	3565eca2dd	Script to run UG to create annotated all-pop VCF files to use for Phase1 VQSR indel project consensus. Parallels and generalizes SNP version, so in theory this script can be used for both SNP and Indel consensus. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5871 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-25 16:50:59 +00:00
ebanks	3d134a8497	Updated to produce (actual) hg19 resources too git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5870 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-25 02:14:55 +00:00
delangel	e6396062c0	Script to use VQSR on indels - does VR, AR on each continental group, combines variants and then does VariantEval comparing with different chr20 all-pop 1000G callsets. Not for general use yet! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5866 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-24 17:19:30 +00:00
depristo	0448ef28d3	Actually use the right parameter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5864 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-24 12:09:01 +00:00
depristo	d551ce720d	Updated with new CombineVariants syntax git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5862 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-24 11:38:09 +00:00
carneiro	2efd807952	No more default callsets, they're now mandatory arguments. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5858 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 21:56:43 +00:00
fromer	bc4305c956	Added memory limit parameter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5855 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 21:11:44 +00:00
fromer	833dff658a	Small script to do full variant annotation in parallel git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5853 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 20:33:20 +00:00
chartl	912c6cdbfa	Moving this script out of playground while I figure out what's going on. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5848 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 17:48:44 +00:00
depristo	72ad8ded19	Removed unused imports, but some of these scripts are now out of date (they have been for a long time) so they don't compile anyway git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5837 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-22 18:43:48 +00:00
carneiro	3a2e32eef3	wex is wex, wgs is wgs.... i think i got it right this time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5828 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-20 16:44:25 +00:00
kshakir	6c6e52def9	Renamed FCP to HybridSelectionPipeline. Reviewed pipelines with dev team. HSP updates: - Calling SNPs and Indels at the same time then using SelectVariants to separate them for filtering - Moved logs next to the files like in WGP - Flattened outputs into one directory - The file names for the final outputs are now <projectName>.vcf and <projectName>.eval - Updated test to pass the chr20 intervals instead of a boolean - Removed MultiFCP WGP updates: - Only cleaning and calling chromosomes 1-22, X, Y, MT - Splitting SNPs from indels, filtering indels, then merging the selected SNPs and selected Indels back together to make sure there are no collisions in CombineVariants - Still running VQSR on the recombined SNPs plus hard filtered indels - Using hard indel filters from delangel - Reduced number of tranches with rpoplin - Changed prior for dbsnp from 10 to 8 with rpoplin - Assuming identical samples on both CombineVariants - Explicitly using variant merge option UNION even though it's the default - Not setting the default genotype merge option PRIORITIZE - Generating a vcf and eval for each tranche git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5825 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 22:47:02 +00:00
carneiro	76c87c9f1d	trio WGS was creating trio WEX filenames. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5822 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 17:45:45 +00:00

353 Commits (c2ec2891d1e185b4cc0f30e2dfd18991e2837b69)