hanna
11eb74e44f
Request from Kiran: include PCT_TARGET_BASES_2X,PCT_TARGET_BASES_10X,
PCT_TARGET_BASES_20X,PCT_TARGET_BASES_30X into pre-QC metrics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5992 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 01:12:17 +00:00
hanna
1fec811a47
Updated input to accept BAM list, and output to emit proper sample name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5991 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 00:15:01 +00:00
hanna
1b1aefc385
Move fingerprinting metrics reader into our Picard private extract.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5990 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 00:13:10 +00:00
depristo
85e20be7b7
Renamed. More general
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5989 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 20:50:56 +00:00
depristo
a837a49328
Minor fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5988 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 20:50:34 +00:00
hanna
e0ed30681e
If data is not available, use R-compatible 'NA' string.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5987 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:53:38 +00:00
rpoplin
db43e3f1ab
Fixing an apparent parenthesis matching problem
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5986 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:52:14 +00:00
hanna
52f930d708
Bug fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5985 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:48:55 +00:00
hanna
1d1c9da783
First pass at a script that generates per-sample metrics from a pipeline yaml
input file. Output is an R-parseable tsv.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5984 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:38:21 +00:00
droazen
44a29680bf
Explicitly marked the updated tribble jar added in r5982 as binary
(Oh yes, there was an r5982, in case you were wondering. It was the first
tentative git -> svn commit, and just added an updated tribble jar. It went great except for the fact that svn didn't mark the jar as binary, causing a textual diff of 500k of binary data to be generated in the notification email, which very probably caused Gsa_svn_list to choke on the notification email rather than deliver it. Now let us never speak of r5982 again...)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5983 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:37:48 +00:00
droazen
480598842c
Updated the tribble jar
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5982 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:00:09 +00:00
depristo
14a358e5e8
Oops, forgot one tiny thing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5981 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 17:12:42 +00:00
depristo
165befd38a
V1 of the post processing QC plotting scala script and R function. The scala script runs VariantEval on a VCF file, and computes QC metrics. The R script generates the report. Will discuss usage with data processing group. Ryan -- please add your additional plotting routines to this script, as you see fit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5980 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 17:06:42 +00:00
rpoplin
3534f412c9
Better error message for the case of input variants found in ApplyRecalibration that were never seen during VariantRecalibrator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5979 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 14:45:28 +00:00
rpoplin
6231bba288
Bug fix for mergeInfoWithMaxAC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5978 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-12 20:10:16 +00:00
ebanks
1f4469976e
Made into UserException with better error message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5977 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-12 03:38:52 +00:00
carneiro
95f3da1126
limiting the number of reads in memory for the SamValidateFile.jar
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5976 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 20:14:30 +00:00
ebanks
077862958d
Oops, forgot to define the hg19 variable
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5975 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 18:26:48 +00:00
rpoplin
0d6ce91614
When running CombineVariants with -mergeInfoWithMaxAC the set field will be added appropriately
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5974 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 14:35:48 +00:00
delangel
f8ffda6835
a) Hidden, experimental argument to UnifiedGenotyper that makes code, when in GenotypeGivenAlleles mode, ignore SNP alleles mixed in with indels in complex records - theory is that SNP sites behave statistically differently when doing VQSR so those alleles/sites should be treated separately.
b) Bug fix: multiallelic indel records were not being treated properly by VQSR because vc.isIndel() returns false for them. For now, the correct general treatment is to test (vc.isIndel() || vc.isMixed()).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5973 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 19:19:23 +00:00
rpoplin
17e17d3c3c
Misc cleanup in VQSR.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5972 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 18:37:37 +00:00
depristo
e87c40d89c
Fix for CoFoJa exception by upgrading to latest version
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5971 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:49:15 +00:00
depristo
ac3620839c
Very basic integration tests for ReducedReads, to allow safe optimization of the code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5970 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:06:32 +00:00
rpoplin
895e86c544
Annotations used to build the 1000G consensus callsets are now standard annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5969 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:03:39 +00:00
hanna
6c4f2f1b36
Temporarily disable contracts during integrationtest, take 2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5968 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 16:36:31 +00:00
hanna
44b98bed8c
Killed sonatype repository; it's failed me too many times at this point.
Temporarily disabled contracts in integrationtests until we can find the cause
of the new error that's cropping up for Ryan.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5967 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 16:07:02 +00:00
depristo
93d6e17762
Final, documented version of CalibrateGenotypeLikelihoods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5966 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 20:22:28 +00:00
depristo
44287ea8dc
ReducedBAM changes to downsample to a fixed coverage over the variable regions. Evaluation script now has filters and eval. commands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5965 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 19:36:08 +00:00
hanna
fbb68ae94c
(Hopefully) short-lived script to rework the directory structure from core /
playground / oneoffs to public / private. Currently implemented as an svn ->
svn merge, but will have to be tweaked to do a proper svn -> git merge.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5964 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 19:18:22 +00:00
kiran
49b021d435
Changed the definition of degeneracy (it's at the site level, i.e. the degeneracy of a position in a codon, not the degeneracy of the amino acid itself as I initially thought). Added the ability to supply an ancestral allele track (available in /humgen/gsa-hpprojects/GATK/data/Ancestor/).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5963 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 15:07:31 +00:00
kshakir
d784dac495
samtools merge requires indexed files, so added them as implicit inputs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5962 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 15:02:51 +00:00
depristo
a331e13721
Slightly more extensive test includes a 0/0 site to genotype
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5961 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 14:48:55 +00:00
chartl
d035d8eb7b
Updating the bam list is a bit trickier than most of us originally thought. Need to ensure that *3* files exist: the .bam, the .bai, and the finished.txt (or else bad things can happen)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5960 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 14:42:31 +00:00
depristo
0f43b10c39
Optimization in CombineVariants when merging into a sites_only VCF
VariantContextUtils now has a utility function that creates a sitesOnlyVariantContext from an input VC
Added a complex merge test of SNPs and indels from the new batch merge wiki at:
http://www.broadinstitute.org/gsa/wiki/index.php/Merging_batched_call_sets
with multiple alleles for an indel. Created a BatchMergeIntegrationTest that uses GGA with the complex merged input alleles to genotype SNPs and Indels with multiple alleles simultaneously in NA12878. Looks great.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5959 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 14:31:46 +00:00
delangel
1d6486a28f
First part of fix for correctly processing mixed multi-allelic records: correctly compute start/stop of vc when there are no null alleles (i.e. record is not a simple indel).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5958 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 13:36:18 +00:00
delangel
d27800e07c
a) Forgot to commit this ages ago: uncomment code to ignore hard clipped bases when computing indel likelihoods. b) First part of fix for correctly processing mixed multi-allelic records: correctly compute start/stop of vc when there are no null alleles (i.e. record is not a simple indel).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5957 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 11:28:17 +00:00
hanna
ad97099df6
Getting rid of a few extra, very explicit qualifications so that the public/
private bifurcation script doesn't have to discover them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5956 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 03:08:47 +00:00
ebanks
bb6c0db783
We found the cause of the inconsistency. Woo hoo!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5955 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-07 15:13:58 +00:00
hanna
ca48ea78df
At Picard team's request, generate md5s for generated BAM files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5954 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-07 04:25:40 +00:00
depristo
311dfa0998
Now builds examples, as I expected. GATKPaperGenotyper lives again.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5953 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-07 00:13:44 +00:00
alecw
2901abf070
Switch from PriorityQueue to TreeSet for better and more consistent performance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5952 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-06 20:41:30 +00:00
delangel
78f5309656
Intermediate commit of indel consensus VQSR script, a couple of new features added, not for general use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5951 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-06 13:27:02 +00:00
ebanks
2c57721ed2
Updated printouts to help with debugging. Issue does appear to be deterministic though.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5950 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-06 01:04:07 +00:00
ebanks
27dfb53d26
We really don't want to be advising the user to use an unsafe option - really, they should fix their busted bam file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5949 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-05 05:18:16 +00:00
delangel
7e49e1668f
Finished changing md5's due to recent change in definition of mixed and indel vc's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5948 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-05 00:40:51 +00:00
delangel
d534241f35
Major revamp of annotations for indels:
a) All rank sum tests now work for indels including multiallelic sites. For the latter cases, rank sum test is REF vs most common allele
b) Redid computation of HaplotypeScore for indels. It's now trivially easy to do because we are already computing likelihoods of each read vs haplotypes in GL computation so we reuse that if available. For multiallelic case, we score against N haplotypes where N is total called alleles.
The drawback is that all cases need the information contained in the likelihood table, which stores the likelihood of each pileup element for each allele. If this table is not available we don't annotate, so right now we can only fully annotate indels when running UG, not when running VariantAnnotator alone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5947 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-04 15:34:24 +00:00
delangel
1448a1f155
Change md5 because conversion of a tri-allelic dbsnp indel record is now legit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5946 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-04 11:24:16 +00:00
delangel
53667ce8fa
Disabled the test that checks whether output is the same with and without Genotype Given Alleles mode: it won't be until extended events are finally removed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5945 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-04 00:52:54 +00:00
delangel
35df80de14
Updated md5 due to changes in the QUAL field in Genotype Given Alleles mode with indels, for insertions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5944 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-03 23:52:38 +00:00
ebanks
b93829e505
The underlying bam file for this test was busted for many reasons preventing Picard folks from making unrelated changes, so I needed to fix it. Updating md5s accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5943 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-03 22:26:06 +00:00