5965 Commits (27d4b317fc0bb16d4a13b83a3be5a0e736c9d081)

Author SHA1 Message Date
depristo 27d4b317fc Simple program that calls indels in CEU trio exomes and WGS and compares the results. Overall the indel calls really look good to me, given reasonably good input BAM files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6006 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:56:04 +00:00
depristo 43fdd31e20 Significant performance optimization for reduced reads due to better algorithm for including reads in the variable regions. Fixed a critical bug that actually produced multiple copies of the same read in the variable regions with this optimization as well. Scala exploration script updated as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6005 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:54:59 +00:00
depristo 38d7733989 Now accepts any number of VCFs to evaluate. Runs the standard (now three) variant eval commands and invokes the exomeQC R script. Has some annoying assumptions about paths encoded right now. Example usage below:
setenv DATA ~/Desktop/broadLocal/localData/
java -Djava.io.tmpdir=tmp -jar ../dist/Queue.jar -S ../scala/qscript/oneoffs/depristo/ExomePostQCEval.scala --gatkjarfile ../dist/GenomeAnalysisTK.jar -R $DATA/human_g1k_v37.fasta $* -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch1.vcf -intervals ~/Desktop/broadLocal/localData/whole_exome_agilent_1.1_refseq_plus_3_boosters.Homo_sapiens_assembly19.targets.interval_list -dbSNP ~/Desktop/broadLocal/localData/dbsnp_132_b37.vcf -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch2.vcf

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6004 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:49:54 +00:00
depristo 9254faa27e Added density plots by sample for each metric. New command line argument ordering. No longer requires the per-sample.tsv suppl. data -- will conditionally load if available
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6003 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:46:29 +00:00
fromer b4c30bf124 Added option of minMappingQuality
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6002 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 00:02:26 +00:00
depristo ce4e8d2093 A few comments / todos.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6001 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-15 13:09:09 +00:00
rpoplin d7430c23f8 Bringing VQSR up to date with the 1000G v2b changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6000 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 20:23:43 +00:00
asivache 04ecbf10ab Fixes the constraint-generated error about stop being less than start in GenomeLocParser.createGenomeLoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5999 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 17:44:11 +00:00
hanna 14d7ee073b Rev Picard to get new PF_INDEL_RATE metric. Rev preQC generator script
to incorporate PF_INDEL_RATE.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5998 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 17:43:43 +00:00
ebanks 5be4f31515 Surprisingly, the TileCovariate was indeed covered in integration tests. Updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5997 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 17:40:23 +00:00
rpoplin 6f7c4d1142 Removing exomePostQC.R
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5996 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 16:10:34 +00:00
hanna 7aec71f0e1 Add some very simple documentation on running and modifying the per-sample
metrics generator.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5995 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 15:22:17 +00:00
hanna cde2b409a7 Oops. Failed to add DbSnpMatchMetrics to Picard private jar.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5994 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 15:14:19 +00:00
ebanks d00d4fd4d6 Obsolete covariate class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5993 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 14:11:47 +00:00
hanna 11eb74e44f Request from Kiran: include PCT_TARGET_BASES_2X,PCT_TARGET_BASES_10X,
PCT_TARGET_BASES_20X,PCT_TARGET_BASES_30X into pre-QC metrics.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5992 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 01:12:17 +00:00
hanna 1fec811a47 Updated input to accept BAM list, and output to emit proper sample name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5991 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 00:15:01 +00:00
hanna 1b1aefc385 Move fingerprinting metrics reader into our Picard private extract.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5990 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 00:13:10 +00:00
depristo 85e20be7b7 Renamed. More general
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5989 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 20:50:56 +00:00
depristo a837a49328 Minor fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5988 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 20:50:34 +00:00
hanna e0ed30681e If data is not available, use R-compatible 'NA' string.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5987 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:53:38 +00:00
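
The 'NA' convention in e0ed30681e can be illustrated with a minimal sketch. This is not the committed script: the function name `format_row` and the use of `None` for unavailable data are assumptions made for illustration only.

```python
def format_row(values):
    """Join one sample's metrics into a tab-separated line.

    Assumption: unavailable metrics are represented as None. They are
    written as the literal string 'NA', which R's read.table treats as
    a missing value by default.
    """
    return "\t".join("NA" if v is None else str(v) for v in values)


# Hypothetical row: sample name, one missing metric, one present metric.
print(format_row(["sample1", None, 0.5]))
```

Writing 'NA' rather than an empty field keeps every row at the same column count, so the tsv stays parseable on the R side.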
rpoplin db43e3f1ab Fixing an apparent parenthesis matching problem
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5986 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:52:14 +00:00
hanna 52f930d708 Bug fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5985 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:48:55 +00:00
hanna 1d1c9da783 First pass at a script that generates per-sample metrics from a pipeline yaml
input file.  Output is an R-parseable tsv.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5984 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:38:21 +00:00
droazen 44a29680bf Explicitly marked the updated tribble jar added in r5982 as binary
(Oh yes, there was an r5982, in case you were wondering. It was the first
tentative git -> svn commit, and just added an updated tribble jar. It went great except for the fact that svn didn't mark the jar as binary, causing a textual diff for 500k of binary data to be generated in the notification email, which very probably caused Gsa_svn_list to choke on the notification email rather than deliver it. Now let us never speak of r5982 again...)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5983 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:37:48 +00:00
droazen 480598842c Updated the tribble jar
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5982 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:00:09 +00:00
depristo 14a358e5e8 Oops, forgot one tiny thing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5981 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 17:12:42 +00:00
depristo 165befd38a V1 of the post processing QC plotting scala script and R function. The scala script runs VariantEval on a VCF file, and computes QC metrics. The R script generates the report. Will discuss usage with data processing group. Ryan -- please add your additional plotting routines to this script, as you see fit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5980 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 17:06:42 +00:00
rpoplin 3534f412c9 Better error message for the case of input variants found in ApplyRecalibration that were never seen during VariantRecalibrator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5979 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 14:45:28 +00:00
rpoplin 6231bba288 Bug fix for mergeInfoWithMaxAC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5978 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-12 20:10:16 +00:00
ebanks 1f4469976e Made into UserException with better error message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5977 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-12 03:38:52 +00:00
carneiro 95f3da1126 limiting the number of reads in memory for the SamValidateFile.jar
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5976 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 20:14:30 +00:00
ebanks 077862958d Oops, forgot to define the hg19 variable
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5975 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 18:26:48 +00:00
rpoplin 0d6ce91614 When running CombineVariants with -mergeInfoWithMaxAC the set field will be added appropriately
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5974 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 14:35:48 +00:00
delangel f8ffda6835 a) Hidden, experimental argument to UnifiedGenotyper that makes code, when in GenotypeGivenAlleles mode, ignore SNP alleles mixed in with indels in complex records - theory is that SNP sites behave statistically differently when doing VQSR so those alleles/sites should be treated separately.
b) Bug fix: multiallelic indel records were not being treated properly by VQSR because vc.isIndel() returns false for them. Correct general treatment for now is to do (vc.isIndel()||vc.isMixed()).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5973 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 19:19:23 +00:00
rpoplin 17e17d3c3c Misc cleanup in VQSR.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5972 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 18:37:37 +00:00
depristo e87c40d89c Fix for CoFoJa exception by upgrading to latest version
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5971 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:49:15 +00:00
depristo ac3620839c Very basic integration tests for ReducedReads, to allow safe optimization of the code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5970 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:06:32 +00:00
rpoplin 895e86c544 Annotations used to build the 1000G consensus callsets are now standard annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5969 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:03:39 +00:00
hanna 6c4f2f1b36 Temporarily disable contracts during integrationtest, take 2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5968 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 16:36:31 +00:00
hanna 44b98bed8c Killed sonatype repository; it's failed me too many times at this point.
Temporarily disabled contracts in integrationtests until we can find the cause
of the new error that's cropping up for Ryan.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5967 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 16:07:02 +00:00
depristo 93d6e17762 Final, documented version of CalibrateGenotypeLikelihoods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5966 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 20:22:28 +00:00
depristo 44287ea8dc ReducedBAM changes to downsample to a fixed coverage over the variable regions. Evaluation script now has filters and eval. commands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5965 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 19:36:08 +00:00
hanna fbb68ae94c (Hopefully) short-lived script to rework the directory structure from core /
playground / oneoffs to public / private.  Currently implemented as an svn ->
svn merge, but will have to be tweaked to do a proper svn -> git merge.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5964 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 19:18:22 +00:00
kiran 49b021d435 Changed the definition of degeneracy (it's at the site level - degeneracy of a position in a codon, not degeneracy of the amino acid itself like I initially thought). Added the ability to supply an ancestral allele track (available in /humgen/gsa-hpprojects/GATK/data/Ancestor/).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5963 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 15:07:31 +00:00
kshakir d784dac495 samtools merge requires indexed files, so added them as implicit inputs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5962 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 15:02:51 +00:00
depristo a331e13721 Slightly more extensive test includes a 0/0 site to genotype
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5961 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 14:48:55 +00:00
chartl d035d8eb7b Updating the bam list is a bit trickier than most of us originally thought. Need to ensure that *3* files exist: the .bam, the .bai, and the finished.txt (or else bad things can happen)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5960 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 14:42:31 +00:00
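
The three-file requirement described in d035d8eb7b can be sketched as follows. This is an illustration under stated assumptions, not the committed code: the helper names `is_complete` and `filter_bam_list` are hypothetical, and the sketch assumes that the index and finished.txt sit in the same directory as the BAM, which the commit message does not specify.

```python
from pathlib import Path


def is_complete(bam_path, marker_name="finished.txt"):
    """Return True only if the BAM, its index, and the marker file all exist.

    Assumptions (not confirmed by the commit message): the index is named
    either sample.bam.bai or sample.bai, and the marker file lives in the
    same directory as the BAM.
    """
    bam = Path(bam_path)
    index_candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
    marker = bam.parent / marker_name
    return (
        bam.is_file()
        and any(p.is_file() for p in index_candidates)
        and marker.is_file()
    )


def filter_bam_list(bam_paths):
    """Keep only the BAM paths whose three required files are all present."""
    return [p for p in bam_paths if is_complete(p)]
```

Checking all three files before adding an entry avoids listing a BAM that is still being written or has not been indexed yet.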
depristo 0f43b10c39 Optimization in CombineVariants when merging into a sites_only VCF
VariantContextUtils now has a utility function that creates a sitesOnlyVariantContext from an input VC
Add complex merge test of SNPs and indels from the new batch merge wiki at:

http://www.broadinstitute.org/gsa/wiki/index.php/Merging_batched_call_sets

with multiple alleles for an indel.  Created a BatchMergeIntegrationTest that uses GGA with the complex merged input alleles to genotype SNPs and Indels with multiple alleles simultaneously in NA12878.  Looks great.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5959 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 14:31:46 +00:00
delangel 1d6486a28f First part of fix for correctly processing mixed multi-allelic records: correctly compute start/stop of vc when there are no null alleles (i.e. record is not a simple indel).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5958 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 13:36:18 +00:00
delangel d27800e07c a) Forgot to commit this ages ago: uncomment code to ignore hard clipped bases when computing indel likelihoods. b) First part of fix for correctly processing mixed multi-allelic records: correctly compute start/stop of vc when there are no null alleles (i.e. record is not a simple indel).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5957 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 11:28:17 +00:00