
5960 Commits (ce4e8d2093b36a4dcff069d9a980a51c0d974c24)

Author SHA1 Message Date
depristo ce4e8d2093 A few comments / todos.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6001 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-15 13:09:09 +00:00
rpoplin d7430c23f8 Bringing VQSR up to date with the 1000G v2b changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6000 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 20:23:43 +00:00
asivache 04ecbf10ab Fixes the constraint-generated error about stop being less than start in GenomeLocParser.createGenomeLoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5999 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 17:44:11 +00:00
hanna 14d7ee073b Rev Picard to get new PF_INDEL_RATE metric. Rev preQC generator script
to incorporate PF_INDEL_RATE.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5998 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 17:43:43 +00:00
ebanks 5be4f31515 Surprisingly, the TileCovariate was indeed covered in integration tests. Updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5997 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 17:40:23 +00:00
rpoplin 6f7c4d1142 Removing exomePostQC.R
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5996 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 16:10:34 +00:00
hanna 7aec71f0e1 Add some very simple documentation on running and modifying the per-sample
metrics generator.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5995 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 15:22:17 +00:00
hanna cde2b409a7 Oops. Failed to add DbSnpMatchMetrics to Picard private jar.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5994 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 15:14:19 +00:00
ebanks d00d4fd4d6 Obsolete covariate class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5993 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 14:11:47 +00:00
hanna 11eb74e44f Request from Kiran: include PCT_TARGET_BASES_2X,PCT_TARGET_BASES_10X,
PCT_TARGET_BASES_20X,PCT_TARGET_BASES_30X into pre-QC metrics.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5992 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 01:12:17 +00:00
hanna 1fec811a47 Updated input to accept BAM list, and output to emit proper sample name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5991 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 00:15:01 +00:00
hanna 1b1aefc385 Move fingerprinting metrics reader into our Picard private extract.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5990 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 00:13:10 +00:00
depristo 85e20be7b7 Renamed. More general
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5989 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 20:50:56 +00:00
depristo a837a49328 Minor fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5988 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 20:50:34 +00:00
hanna e0ed30681e If data is not available, use R-compatible 'NA' string.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5987 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:53:38 +00:00
rpoplin db43e3f1ab Fixing an apparent parenthesis matching problem
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5986 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:52:14 +00:00
hanna 52f930d708 Bug fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5985 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:48:55 +00:00
hanna 1d1c9da783 First pass at a script that generates per-sample metrics from a pipeline yaml
input file.  Output is an R-parseable tsv.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5984 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:38:21 +00:00
droazen 44a29680bf Explicitly marked the updated tribble jar added in r5982 as binary
(Oh yes, there was a r5982, in case you were wondering. It was the first
tentative git -> svn commit, and just added an updated tribble jar. It went great except for the fact that svn didn't mark the jar as binary, causing a textual diff of 500k of binary data to be generated in the notification email, which very probably caused Gsa_svn_list to choke on the notification email rather than deliver it. Now let us never speak of r5982 again...)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5983 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:37:48 +00:00
droazen 480598842c Updated the tribble jar
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5982 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:00:09 +00:00
depristo 14a358e5e8 Oops, forgot one tiny thing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5981 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 17:12:42 +00:00
depristo 165befd38a V1 of the post processing QC plotting scala script and R function. The scala script runs VariantEval on a VCF file, and computes QC metrics. The R script generates the report. Will discuss usage with data processing group. Ryan -- please add your additional plotting routines to this script, as you see fit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5980 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 17:06:42 +00:00
rpoplin 3534f412c9 Better error message for the case of input variants found in ApplyRecalibration that were never seen during VariantRecalibrator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5979 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 14:45:28 +00:00
rpoplin 6231bba288 Bug fix for mergeInfoWithMaxAC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5978 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-12 20:10:16 +00:00
ebanks 1f4469976e Made into UserException with better error message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5977 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-12 03:38:52 +00:00
carneiro 95f3da1126 limiting the number of reads in memory for the SamValidateFile.jar
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5976 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 20:14:30 +00:00
ebanks 077862958d Oops, forgot to define the hg19 variable
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5975 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 18:26:48 +00:00
rpoplin 0d6ce91614 When running CombineVariants with -mergeInfoWithMaxAC the set field will be added appropriately
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5974 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 14:35:48 +00:00
delangel f8ffda6835 a) Hidden, experimental argument to UnifiedGenotyper that makes code, when in GenotypeGivenAlleles mode, ignore SNP alleles mixed in with indels in complex records - theory is that SNP sites behave statistically differently when doing VQSR so those alleles/sites should be treated separately.
b) Bug fix: multiallelic indel records were not being treated properly by VQSR because vc.isIndel() returns false for them. Correct general treatment for now is to do (vc.isIndel()||vc.isMixed()).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5973 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 19:19:23 +00:00
rpoplin 17e17d3c3c Misc cleanup in VQSR.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5972 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 18:37:37 +00:00
depristo e87c40d89c Fix for CoFoJa exception by upgrading to latest version
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5971 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:49:15 +00:00
depristo ac3620839c Very basic integration tests for ReducedReads, to allow safe optimization of the code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5970 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:06:32 +00:00
rpoplin 895e86c544 Annotations used to build the 1000G consensus callsets are now standard annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5969 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:03:39 +00:00
hanna 6c4f2f1b36 Temporarily disable contracts during integrationtest, take 2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5968 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 16:36:31 +00:00
hanna 44b98bed8c Killed sonatype repository; it's failed me too many times at this point.
Temporarily disabled contracts in integrationtests until we can find the cause
of the new error that's cropping up for Ryan.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5967 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 16:07:02 +00:00
depristo 93d6e17762 Final, documented version of CalibrateGenotypeLikelihoods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5966 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 20:22:28 +00:00
depristo 44287ea8dc ReducedBAM changes to downsample to a fixed coverage over the variable regions. Evaluation script now has filters and eval. commands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5965 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 19:36:08 +00:00
hanna fbb68ae94c (Hopefully) short-lived script to rework the directory structure from core /
playground / oneoffs to public / private.  Currently implemented as an svn ->
svn merge, but will have to be tweaked to do a proper svn -> git merge.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5964 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 19:18:22 +00:00
kiran 49b021d435 Changed the definition of degeneracy (it's at the site level: degeneracy of a position in a codon, not degeneracy of the amino acid itself as I initially thought). Added the ability to supply an ancestral allele track (available in /humgen/gsa-hpprojects/GATK/data/Ancestor/).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5963 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 15:07:31 +00:00
kshakir d784dac495 samtools merge requires indexed files, so added them as implicit inputs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5962 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 15:02:51 +00:00
depristo a331e13721 Slightly more extensive test includes a 0/0 site to genotype
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5961 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 14:48:55 +00:00
chartl d035d8eb7b Updating the bam list is a bit trickier than most of us originally thought. Need to ensure that *3* files exist: the .bam, the .bai, and the finished.txt (or else bad things can happen)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5960 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 14:42:31 +00:00
depristo 0f43b10c39 Optimization in CombineVariants when merging into a sites_only VCF
VariantContextUtils now has a utility function that creates a sitesOnlyVariantContext from an input VC
Add complex merge test of SNPs and indels from the new batch merge wiki in:

http://www.broadinstitute.org/gsa/wiki/index.php/Merging_batched_call_sets

with multiple alleles for an indel.  Created a BatchMergeIntegrationTest that uses GGA with the complex merged input alleles to genotype SNPs and Indels with multiple alleles simultaneously in NA12878.  Looks great.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5959 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 14:31:46 +00:00
delangel 1d6486a28f First part of fix for correctly processing mixed multi-allelic records: correctly compute start/stop of vc when there are no null alleles (i.e. record is not a simple indel).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5958 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 13:36:18 +00:00
delangel d27800e07c a) Forgot to commit this ages ago: uncomment code to ignore hard clipped bases when computing indel likelihoods. b) First part of fix for correctly processing mixed multi-allelic records: correctly compute start/stop of vc when there are no null alleles (i.e. record is not a simple indel).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5957 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 11:28:17 +00:00
hanna ad97099df6 Getting rid of a few extra, very explicit qualifications so that the public/
private bifurcation script doesn't have to discover them.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5956 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 03:08:47 +00:00
ebanks bb6c0db783 We found the cause of the inconsistency. Woo hoo!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5955 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-07 15:13:58 +00:00
hanna ca48ea78df At Picard team's request, generate md5s for generated BAM files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5954 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-07 04:25:40 +00:00
depristo 311dfa0998 Now builds examples, as I expected. GATKPaperGenotyper lives again.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5953 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-07 00:13:44 +00:00
alecw 2901abf070 Switch from PriorityQueue to TreeSet for better and more consistent performance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5952 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-06 20:41:30 +00:00