depristo
907018768c
Now uses AC directly from eval, not via AF, internally for AC vs. X plotting. Requires at least 1 SNP to include a site in TiTv plotting or snp/indel ratio. Uses .byAC not .byAF eval file now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6014 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-19 03:05:17 +00:00
asivache
64196b6c7a
Writer implementation that can dispatch reads to multiple underlying bam files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6013 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-17 20:44:17 +00:00
depristo
1afd24c831
SliceBams now properly handles the case where multiple read groups clash in the input BAM files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6012 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-17 20:19:19 +00:00
fromer
03a0185566
Control unscattered output file location
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6011 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-17 15:53:25 +00:00
depristo
285da580f3
Now with dbSNP rate
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6010 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-17 13:21:40 +00:00
ebanks
dd1d9cd76f
Forgot to deprecate the old args
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6009 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 17:54:44 +00:00
ebanks
4e85416af1
[Foiled yet again when trying to do this in git] Slight modifications in the argument structure for the IndelRealigner. Instead of boolean flags -knownsOnly and -doNotUseSW, we now have an enum --consensusDeterminationModel which lets you specify knowns only, also use indels in reads, or also use SW. Please note that the default behavior of IR has not changed at all (and won't for a few more days) - that'll be done in GIT (fingers crossed).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6008 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 17:35:37 +00:00
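The flag-to-enum change described in the IndelRealigner commit above can be sketched roughly as follows. This is a minimal illustration and not the GATK source: the constant names and the `fromLegacyFlags` helper are hypothetical, and the mapping from the old `-knownsOnly` / `-doNotUseSW` flags is an assumption based only on the commit message.

```java
// Hypothetical sketch of replacing two boolean flags with one enum argument.
// Only -knownsOnly, -doNotUseSW and --consensusDeterminationModel come from
// the commit message; every other name here is an assumption.
public class ConsensusModelSketch {

    // The three modes the commit message describes.
    enum ConsensusDeterminationModel {
        KNOWNS_ONLY, // use known indels only
        USE_READS,   // also use indels found in reads
        USE_SW       // also use Smith-Waterman
    }

    // Assumed mapping from the legacy flags to the new enum.
    static ConsensusDeterminationModel fromLegacyFlags(boolean knownsOnly, boolean doNotUseSW) {
        if (knownsOnly) {
            return ConsensusDeterminationModel.KNOWNS_ONLY;
        }
        return doNotUseSW
                ? ConsensusDeterminationModel.USE_READS
                : ConsensusDeterminationModel.USE_SW;
    }

    public static void main(String[] args) {
        // Neither legacy flag set: the sketch falls through to the SW mode.
        System.out.println(fromLegacyFlags(false, false));
    }
}
```

A single enum rules out contradictory flag combinations, which two independent booleans cannot do.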
depristo
4304fc4862
Fixed up md5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6007 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 16:20:41 +00:00
depristo
27d4b317fc
Simple program that calls indels in CEU trio exomes and WGS and compares the results. Overall the indel calls really look good to me, given reasonably good input BAM files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6006 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:56:04 +00:00
depristo
43fdd31e20
Significant performance optimization for reduced reads due to better algorithm for including reads in the variable regions. Fixed a critical bug that actually produced multiple copies of the same read in the variable regions with this optimization as well. Scala exploration script updated as well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6005 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:54:59 +00:00
depristo
38d7733989
Now accepts any number of VCFs to evaluate. Runs the standard (now three) variant eval commands and invokes the exomeQC R script. Has some annoying assumptions about paths encoded right now. Example usage below:
...
setenv DATA ~/Desktop/broadLocal/localData/
java -Djava.io.tmpdir=tmp -jar ../dist/Queue.jar -S ../scala/qscript/oneoffs/depristo/ExomePostQCEval.scala --gatkjarfile ../dist/GenomeAnalysisTK.jar -R $DATA/human_g1k_v37.fasta $* -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch1.vcf -intervals ~/Desktop/broadLocal/localData/whole_exome_agilent_1.1_refseq_plus_3_boosters.Homo_sapiens_assembly19.targets.interval_list -dbSNP ~/Desktop/broadLocal/localData/dbsnp_132_b37.vcf -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch2.vcf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6004 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:49:54 +00:00
depristo
9254faa27e
Added density plots by sample for each metric. New command line argument ordering. No longer requires the per-sample.tsv suppl. data -- will conditionally load if available
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6003 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:46:29 +00:00
fromer
b4c30bf124
Added option of minMappingQuality
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6002 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 00:02:26 +00:00
depristo
ce4e8d2093
A few comments / todos.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6001 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-15 13:09:09 +00:00
rpoplin
d7430c23f8
Bringing VQSR up to date with the 1000G v2b changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6000 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 20:23:43 +00:00
asivache
04ecbf10ab
Fixes the constraint-generated error about stop being less than start in GenomeLocParser.createGenomeLoc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5999 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 17:44:11 +00:00
hanna
14d7ee073b
Rev Picard to get new PF_INDEL_RATE metric. Rev preQC generator script to incorporate PF_INDEL_RATE.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5998 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 17:43:43 +00:00
ebanks
5be4f31515
Surprisingly, the TileCovariate was indeed covered in integration tests. Updated.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5997 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 17:40:23 +00:00
rpoplin
6f7c4d1142
Removing exomePostQC.R
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5996 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 16:10:34 +00:00
hanna
7aec71f0e1
Add some very simple documentation on running and modifying the per-sample metrics generator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5995 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 15:22:17 +00:00
hanna
cde2b409a7
Oops. Failed to add DbSnpMatchMetrics to Picard private jar.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5994 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 15:14:19 +00:00
ebanks
d00d4fd4d6
Obsolete covariate class
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5993 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 14:11:47 +00:00
hanna
11eb74e44f
Request from Kiran: include PCT_TARGET_BASES_2X, PCT_TARGET_BASES_10X, PCT_TARGET_BASES_20X, PCT_TARGET_BASES_30X in pre-QC metrics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5992 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 01:12:17 +00:00
hanna
1fec811a47
Updated input to accept BAM list, and output to emit proper sample name.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5991 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 00:15:01 +00:00
hanna
1b1aefc385
Move fingerprinting metrics reader into our Picard private extract.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5990 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 00:13:10 +00:00
depristo
85e20be7b7
Renamed. More general
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5989 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 20:50:56 +00:00
depristo
a837a49328
Minor fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5988 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 20:50:34 +00:00
hanna
e0ed30681e
If data is not available, use R-compatible 'NA' string.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5987 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:53:38 +00:00
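As an illustration of the 'NA' convention in the commit above, the following sketch writes a missing metric as the string `NA`, which R's table readers treat as a missing value by default. The class, method names, and sample values are hypothetical; the language of the original script is not stated in the log, so Java is used here only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emit one tab-separated row, writing "NA" for any
// metric that is not available. Not taken from the original script.
public class NaFieldSketch {

    // Render a single metric; null stands for "data not available".
    static String field(Double value) {
        return value == null ? "NA" : String.valueOf(value);
    }

    // Build one TSV row: sample name followed by its metrics.
    static String row(String sample, Double... metrics) {
        List<String> cells = new ArrayList<>();
        cells.add(sample);
        for (Double metric : metrics) {
            cells.add(field(metric));
        }
        return String.join("\t", cells);
    }

    public static void main(String[] args) {
        // The second metric is missing for this made-up sample.
        System.out.println(row("sample1", 0.5, null));
    }
}
```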
rpoplin
db43e3f1ab
Fixing an apparent parenthesis matching problem
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5986 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:52:14 +00:00
hanna
52f930d708
Bug fix.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5985 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:48:55 +00:00
hanna
1d1c9da783
First pass at a script that generates per-sample metrics from a pipeline yaml input file. Output is an R-parseable tsv.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5984 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:38:21 +00:00
droazen
44a29680bf
Explicitly marked the updated tribble jar added in r5982 as binary
...
(Oh yes, there was a r5982, in case you were wondering. It was the first
tentative git -> svn commit, and just added an updated tribble jar. It went great except for the fact that svn didn't mark the jar as binary, causing a textual diff for 500k of binary data to be generated in the notification email, and causing Gsa_svn_list to very probably choke on the notification email rather than deliver it. Now let us never speak of r5982 again...)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5983 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:37:48 +00:00
droazen
480598842c
Updated the tribble jar
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5982 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:00:09 +00:00
depristo
14a358e5e8
Oops, forgot one tiny thing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5981 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 17:12:42 +00:00
depristo
165befd38a
V1 of the post processing QC plotting scala script and R function. The scala script runs VariantEval on a VCF file, and computes QC metrics. The R script generates the report. Will discuss usage with data processing group. Ryan -- please add your additional plotting routines to this script, as you see fit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5980 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 17:06:42 +00:00
rpoplin
3534f412c9
Better error message for the case of input variants found in ApplyRecalibration that were never seen during VariantRecalibrator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5979 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 14:45:28 +00:00
rpoplin
6231bba288
Bug fix for mergeInfoWithMaxAC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5978 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-12 20:10:16 +00:00
ebanks
1f4469976e
Made into UserException with better error message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5977 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-12 03:38:52 +00:00
carneiro
95f3da1126
limiting the number of reads in memory for the SamValidateFile.jar
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5976 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 20:14:30 +00:00
ebanks
077862958d
Oops, forgot to define the hg19 variable
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5975 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 18:26:48 +00:00
rpoplin
0d6ce91614
When running CombineVariants with -mergeInfoWithMaxAC the set field will be added appropriately
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5974 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 14:35:48 +00:00
delangel
f8ffda6835
a) Hidden, experimental argument to UnifiedGenotyper that makes code, when in GenotypeGivenAlleles mode, ignore SNP alleles mixed in with indels in complex records - theory is that SNP sites behave statistically differently when doing VQSR so those alleles/sites should be treated separately.
...
b) Bug fix: multiallelic indel records were not being treated properly by VQSR because vc.isIndel() returns false with them. Correct general treatment for now is to do (vc.isIndel()||vc.isMixed()).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5973 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 19:19:23 +00:00
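The check described in part (b) of the commit above can be sketched as below. The `Variant` interface is a minimal stand-in written for this example and is not the real VariantContext class; only the method names `isIndel()` and `isMixed()` and the combined condition come from the commit message.

```java
// Hypothetical sketch of the (vc.isIndel() || vc.isMixed()) check.
public class IndelCheckSketch {

    // Minimal stand-in for a variant record (an assumption, not the GATK type).
    interface Variant {
        boolean isIndel();
        boolean isMixed();
    }

    // Treat a record as an indel if it is a plain indel or a mixed record.
    static boolean treatAsIndel(Variant vc) {
        return vc.isIndel() || vc.isMixed();
    }

    // Helper to build stand-in records for the demonstration.
    static Variant variant(final boolean indel, final boolean mixed) {
        return new Variant() {
            public boolean isIndel() { return indel; }
            public boolean isMixed() { return mixed; }
        };
    }

    public static void main(String[] args) {
        // A record for which isIndel() is false but isMixed() is true,
        // the situation the commit describes for multiallelic indel records.
        Variant multiallelic = variant(false, true);
        System.out.println(treatAsIndel(multiallelic));
    }
}
```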
rpoplin
17e17d3c3c
Misc cleanup in VQSR.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5972 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 18:37:37 +00:00
depristo
e87c40d89c
Fix for CoFoJa exception by upgrading to latest version
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5971 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:49:15 +00:00
depristo
ac3620839c
Very basic integration tests for ReducedReads, to allow safe optimization of the code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5970 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:06:32 +00:00
rpoplin
895e86c544
Annotations used to build the 1000G consensus callsets are now standard annotations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5969 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 17:03:39 +00:00
hanna
6c4f2f1b36
Temporarily disable contracts during integrationtest, take 2.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5968 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 16:36:31 +00:00
hanna
44b98bed8c
Killed sonatype repository; it's failed me too many times at this point.
...
Temporarily disabled contracts in integrationtests until we can find the cause
of the new error that's cropping up for Ryan.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5967 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-09 16:07:02 +00:00
depristo
93d6e17762
Final, documented version of CalibrateGenotypeLikelihoods.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5966 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 20:22:28 +00:00
depristo
44287ea8dc
ReducedBAM changes to downsample to a fixed coverage over the variable regions. Evaluation script now has filters and eval. commands.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5965 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 19:36:08 +00:00