gatk-3.8

Commit Graph

Author	SHA1	Message	Date
droazen	29a0e08aa2	Testing bug fix process #3 (changes are irrelevant) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6048 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:54:40 +00:00
droazen	e148a75c32	Testing the 'bug fix' process #2 (changes are irrelevant) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6047 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:54:37 +00:00
droazen	9b90e9385d	Putting new association files, some qscripts, and the new pick sequenom probes file under local version control. I notice some dumb emacs backup files, I'll kill those off momentarily. Also minor changes to GenomeLoc (getStartLoc() and getEndLoc() convenience methods) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6034 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:53:37 +00:00
depristo	4c6d0e6143	Added stratification by discrete allele count, just like AF, but requiring genotypes so it can be exact. Added docs on wiki, and integrationtest using Kiran's very nice fundamental VCF VariantEvalWalker now passes a pointer to itself to the Stratefication setVariantEvalWalker (and assoc. get method) so that stratefications can look at VEWalker variables to obtain information necessary for their calculations, like the list of eval samples. This is a better interface, in my opinion, than the current approach of extending the base abstract Stratefication to include an initialize function that has all arguments necessarily for any Strat. JEXL expressions now provide access to the VariantContext vc object itself, so you can write JEXL's that directly use VariantContext and associated functions from the command line. ExomePostQC Queue script now creates a byAC eval using the new strat, and no longer produces a byAF file (as this was not exact, and lead to strange punctile behavior when actual AF quantization was out of sync with fix quantization of AF strat. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6015 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-19 03:11:00 +00:00
fromer	03a0185566	Control unscattered output file location git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6011 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-17 15:53:25 +00:00
depristo	27d4b317fc	Simple program that calls indels in CEU trio exomes and WGS can compared the results. Overall the indel calls really look good to me, given reasonably good input BAM files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6006 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-16 12:56:04 +00:00
depristo	43fdd31e20	Significant performance optimization for reduced reads due to better algorithm for including reads in the variable regions. Fixed a critical bug that actually produced multiple copies of the same read in the variable regions with this optimization as well. Scala exploration script updated as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6005 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-16 12:54:59 +00:00
depristo	38d7733989	Now accepts any number of VCFs to evaluate. Runs the standard (now three) variant eval commands and invokes the exomeQC R script. Has some annoying assumptions about paths encoded right now. Example usage below: setenv DATA ~/Desktop/broadLocal/localData/ java -Djava.io.tmpdir=tmp -jar ../dist/Queue.jar -S ../scala/qscript/oneoffs/depristo/ExomePostQCEval.scala --gatkjarfile ../dist/GenomeAnalysisTK.jar -R $DATA/human_g1k_v37.fasta $* -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch1.vcf -intervals ~/Desktop/broadLocal/localData/whole_exome_agilent_1.1_refseq_plus_3_boosters.Homo_sapiens_assembly19.targets.interval_list -dbSNP ~/Desktop/broadLocal/localData/dbsnp_132_b37.vcf -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch2.vcf git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6004 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-16 12:49:54 +00:00
fromer	b4c30bf124	Added option of minMappingQuality git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6002 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-16 00:02:26 +00:00
depristo	165befd38a	V1 of the post processing QC plotting scala script and R function. The scala script runs VariantEval on a VCF file, and computes QC metrics. The R script generates the report. Will discuss usage with data processing group. Ryan -- please add your additional plotting routines to this script, as you see fit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5980 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-13 17:06:42 +00:00
depristo	44287ea8dc	ReducedBAM changes to downsample to a fixed coverage over the variable regions. Evaluation script now has filters and eval. commands. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5965 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 19:36:08 +00:00
delangel	78f5309656	Intermediate commit of indel consensus VQSR script, a couple of new features added, not for general use git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5951 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-06 13:27:02 +00:00
depristo	cd293f145b	More stable reduced reads representation. Bug fixes throughout. No diffs by <1% of sites in an exome, and the majority of these differences are filtered out, or are obvious artifacts. UnitTests for BaseCounts. BaseCounts extended to handle indels, but not yet enabled in the consensus reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5939 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 20:11:31 +00:00
carneiro	a4ffae880d	Subversion crashed my intellij BADLY, so now moving the data processing pipeline to core in 2 steps. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5932 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 23:31:24 +00:00
carneiro	36db9bdcd5	Implemented and tested BWA alignment in the data processing pipeline. caveat: Right now bwa only supports one read group, so if the original file had multiple @RG lines, only the first one will be kept. (working on a solution to this) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5931 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 23:03:07 +00:00
carneiro	c85a1d9210	Implemented and tested BWA alignment in the data processing pipeline. Renamed it and moved to core. Happy to support it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5930 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 22:58:55 +00:00
fromer	ef56b48eef	Add CNV sub-dir git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5928 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 21:47:13 +00:00
carneiro	355be57539	fixing the pipeline so that it still works while I'm adding support for BWA. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5921 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 19:32:28 +00:00
carneiro	5974675b43	Two intermediate commits, to work over the weekend. ReplicationValidationWalker: Just the skeleton of what will be the implementation of the replication/validation model. dataProcessingV2: Committing an UNTESTED implementation of BWA alignment. I am running tests on it over the weekend. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5900 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-27 22:03:08 +00:00
depristo	549172af10	removing dependance on jobQueue == gsa git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5889 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-27 10:12:09 +00:00
fromer	b4af28c7df	Handle case where -L argument (intervals) not given git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5886 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-26 20:24:56 +00:00
delangel	3565eca2dd	Script to run UG to create annotated all-pop VCF files to use for Phase1 VQSR indel project consensus. Paralleles and generalizes SNP version, so in theory this script can be used for both SNP and Indel consensus. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5871 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-25 16:50:59 +00:00
delangel	e6396062c0	Script to use VQSR on indels - does VR, AR on each continental group, combines variants and then does VariantEval comparing with different chr20 all-pop 1000G callsets. Not for general use yet! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5866 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-24 17:19:30 +00:00
carneiro	2efd807952	No more default callsets, they're now mandatory arguments. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5858 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 21:56:43 +00:00
fromer	bc4305c956	Added memory limit parameter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5855 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 21:11:44 +00:00
fromer	833dff658a	Small script to do full variant annotation in parallel git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5853 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 20:33:20 +00:00
chartl	912c6cdbfa	Moving this script out of playground while I figure out what's going on. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5848 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 17:48:44 +00:00
depristo	72ad8ded19	Removed unused importants, but some of these scripts are now out of date (they have been for a long time) so they don't compile anyway git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5837 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-22 18:43:48 +00:00
carneiro	b5b8cb959a	Added VQSR to the downsampling script and changed memory limits for the clean script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5817 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 20:07:42 +00:00
depristo	9423652ad8	Computes how well a genotype chip covers a reference panel git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5806 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-14 15:07:28 +00:00
rpoplin	825682f58c	oops, putting the script back into a sensible state git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5765 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 20:17:05 +00:00
rpoplin	b5ab2274f6	Committing the base qscript I used to make the Phase1 Project Consensus. Does per-population cleaning and simplifyBAM, and then per-analysis-panel calling with genotype given alleles. Combines info fields using the panel with max AC. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5764 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 20:13:26 +00:00
carneiro	f35d955490	recalibrates a dataset splitting between good and bad regions for comparison (used to be named justRecalibrate) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5747 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:43:09 +00:00
carneiro	9f2a8033ff	just recalibrates now recalibrates one sample, fully, not splitting intervals (naming makes more sense) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5746 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:42:23 +00:00
carneiro	c2f8536e02	removing old GATK options git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5745 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:40:39 +00:00
carneiro	8bb92160b5	Script to identify mendelian violations in the CEU Trio and follow up with supposedly incorrect SNP calls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5744 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:19:42 +00:00
carneiro	e2b9227d8d	script to test BQSR on good/bad regions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5743 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 21:16:37 +00:00
rpoplin	4bbce42861	Renaming ContrastiveRecalibrator --> VariantRecalibrator in preparation for move to core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5733 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 18:12:47 +00:00
carneiro	a93a9ac663	adding gold standard (full coverage) to the variant eval analysis output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5721 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 16:29:11 +00:00
carneiro	2384e23274	Added the capability of running count covariates only on a given interval. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5717 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 21:30:14 +00:00
carneiro	3868a7e778	Oneoff project to downsample, bootstrap and call snps to test sensitivity/specificity of downsampled coverage in WEX projects. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5713 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 19:17:30 +00:00
carneiro	f04cc4321f	fixed a bug when the pipeline was used on a single bam. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5708 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-28 17:19:22 +00:00
depristo	122d5845d3	GATK Resource bundle, latest version (now with b37 -> b36 support). Oneoff scala script that assesses chip coverage of calls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5703 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 22:01:36 +00:00
chartl	5b9a8555cd	Queue graph time is currently of O(n^m) where n = num jobs, m = num unique base files. This script therefore was running in order 1200^16, which I don't think would finish before the heat death of the universe. For now, push down the number of files to 1 and gather them outside of Queue, once I've fixed up scatter-gather in core, outputs can be uncommented. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5674 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-21 12:56:25 +00:00
carneiro	d35c7d1029	- minor changes to the 'justclean' script to handle the Trio Cleaning. - fixing a bug on single ended BWA option of the data processing pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5662 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-19 16:35:24 +00:00
chartl	23fac043d9	Fix the outputs so the proper files are gathered (not automatic due to multiplexer) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5654 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 23:55:12 +00:00
chartl	8125b8b901	Old changes to the exome VQSR search. SGA updated to include new proportion-based insert size test. Major fix for dichotomization test: MathUtils now optionally ignores NaN values for sums, averages, variances. In the future this feature can be pushed back into the AssociationContext object iself (e.g. no data? no entry), but it's kept like this for transparency for now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5618 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-12 23:00:50 +00:00
chartl	b81228fec1	Minor bug fixes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5603 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-08 17:30:40 +00:00
hanna	437db28937	Incorporating Khalid's feedback. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5602 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-08 16:22:49 +00:00
chartl	cc58e19621	This is now running. Expect results in a few weeks when the ~7k jobs have percolated through the week queue. Pray gsa1 doesn't go down. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5593 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-07 21:12:59 +00:00

141 Commits (ea47ccf032d3417eca0f1c6a48bc250d0d507d6a)