Commit Graph

5711 Commits (2e1c09c03b32fd4e80237f009a078623452c6565)

Author SHA1 Message Date
corin 2e1c09c03b Updated tearsheet drop
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5752 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:47:47 +00:00
corin f386cad58c Updated Tearsheet with by sample QC metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5751 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:47:26 +00:00
rpoplin 6c7a0adc76 Updating VariantGaussianMixtureModelUnitTest to use truth sensitivity cutting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5750 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 13:56:01 +00:00
kshakir 08f0509a5c Disabling the queue/pipeline package by default so that scala code can build. If it's not going to be fixed the package should be removed. If it is going to be fixed this patch to build.xml should be reverted.
Also added the old model of indel calling to the FCP.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5749 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 12:17:33 +00:00
delangel a19389528d Bring back from the dead the old likelihoods model for indels, which has worse performance but is about 4x faster. Enabled with argument -GSA_PRODUCTION_ONLY in UG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5748 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 22:38:33 +00:00
carneiro f35d955490 recalibrates a dataset splitting between good and bad regions for comparison (used to be named justRecalibrate)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5747 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:43:09 +00:00
carneiro 9f2a8033ff just recalibrates now recalibrates one sample, fully, not splitting intervals (naming makes more sense)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5746 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:42:23 +00:00
carneiro c2f8536e02 removing old GATK options
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5745 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:40:39 +00:00
carneiro 8bb92160b5 Script to identify Mendelian violations in the CEU Trio and follow up with supposedly incorrect SNP calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5744 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:19:42 +00:00
carneiro e2b9227d8d script to test BQSR on good/bad regions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5743 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:16:37 +00:00
carneiro e5cc0f4eec Added 'specificity' to variant eval's Validation Report evaluator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5742 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:48:30 +00:00
rpoplin b88dec387c clean up from VQSR movement
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5741 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:35:30 +00:00
rpoplin 23cd3a7a5d Moving VQSR v2 to core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5740 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:20:06 +00:00
rpoplin 44a717f63a Good bye VQSR v1. This commit will break the build.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5739 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:09:52 +00:00
hanna 2dacf1b2b2 Better header support when running R's read.table(...,header=T).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5738 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:56:20 +00:00
hanna ad8c786b2d Now more easily R-parseable.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5737 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:30:50 +00:00
rpoplin 5bade81c6d Adding tranche plot generation back to VQSR
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5736 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:26:26 +00:00
rpoplin e73720c2db Updating VQSLOD annotation description
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5735 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:01:08 +00:00
rpoplin 11052918d9 Better exception text for common error in VQSR.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5734 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 18:37:25 +00:00
rpoplin 4bbce42861 Renaming ContrastiveRecalibrator --> VariantRecalibrator in preparation for move to core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5733 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 18:12:47 +00:00
rpoplin 6323fb8673 misc cleanup in VQSR
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5732 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 18:00:22 +00:00
hanna f3bd11a02e Dress up some formatting issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5731 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 17:35:18 +00:00
hanna 9c809ed68e A walker to analyze the memory consumption of reference, reads, and RODs at
each base both in bytes and as a percentage of the used heap size.

May be a bit buggy at this point; there are a lot of metrics around the Java
heap and I'm not completely sure that the metrics I'm outputting are exactly
the ones that I'm looking for.

Also fixed a documentation bug in my Sizeof class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5730 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 17:08:15 +00:00
ebanks d4cbd8691c Make the default that we only output SNPs (so that when I make another release we don't get flooded with questions about why the UG is all of a sudden so slow)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5729 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 16:38:55 +00:00
rpoplin 70f8ab6f89 Adding AF bin stratification for VariantEval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5728 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 15:22:50 +00:00
hanna 870e65a685 Fixing a build failure because I want to be completely sure that the code I
checked in immediately following the build breaking code passes integration
tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5727 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 02:09:53 +00:00
hanna 411980a50a Performance enhancements in GATKBAMIndex. Not sure these will assist in a
normal use case, but they cut startup times and memory allocation noise in
the profiler, making my profiling time more productive.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5726 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 20:48:16 +00:00
delangel 422d4ceeea removed useless file - no need for tableRecalibration, right now everything is done in PairHMMIndelErrorModel
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5725 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 20:35:44 +00:00
delangel 2a80ffa2ee Totally experimental, barely usable, not-to-be-used-yet implementation of an "Indel Quality Recalibrator". The idea is that any indel that's not in the input dbSNP is treated as an artifact, and then a CSV is built with the number of indels and the number of observations as a function of each input covariate (initially, only cycle, read group and homopolymer run are useful). Then, when computing likelihoods of indels based on input haplotypes, we compute gap penalties based on the values of the covariates at the read. The feature is disabled by default with hidden arguments. TBD whether the usefulness of the feature is worth the extra time and pain.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5724 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 20:31:43 +00:00
rpoplin 3224bbe750 New visualization output for VQSR. It creates the R script file on the fly and then runs Rscript on it. Adding 1000G Project consensus code. First pass at having VQSR work with missing data by marginalizing over the missing dimension for that data point (thanks Chris and Bob for ideas). Updated math functions to use Apache Commons Math instead of approximations from Wikipedia. New parameters available for the priors, based on further reading in Bishop and looking at the new visualizations. Updated integration test to use more modern files. Updated MDCP to use new best practices w.r.t. annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5723 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 19:14:42 +00:00
ebanks fcf8cff64a We didn't actually support all of these extensions. Updated to be accurate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5722 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 19:03:46 +00:00
carneiro a93a9ac663 adding gold standard (full coverage) to the variant eval analysis output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5721 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 16:29:11 +00:00
corin f76e2791a9 Update to work with latest eval format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5720 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 13:44:50 +00:00
corin 139ae79d9e Update to work with latest eval format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5719 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 13:21:23 +00:00
kshakir 2d81262f87 Fixed a bug where empty intervals were being scattered zero ways parallel. Would be awesome to use the GAE at some point.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5718 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 22:42:48 +00:00
carneiro 2384e23274 Added the capability of running count covariates only on a given interval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5717 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 21:30:14 +00:00
carneiro 34092fd32f minor update...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5716 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 21:29:01 +00:00
carneiro 36ac8beee1 Making the GATK unpredictably random...
through an option!

set -ndrs if you want the GATK to be really random (non-deterministic). Engine option, available to every walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5715 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 19:29:08 +00:00
carneiro f97e7d2fb4 Walker that calculates the percentage of bases that are covered to at least 20x. Very useful! In oneoffs until someone else thinks it's as useful as I think it is ;)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5714 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 19:19:39 +00:00
carneiro 3868a7e778 One-off project to downsample, bootstrap and call SNPs to test sensitivity/specificity of downsampled coverage in WEX projects.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5713 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 19:17:30 +00:00
ebanks deed7c47a1 Continuing the epic fail, some of our existing integration tests were wrong because of the lazy loading failure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5712 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 17:54:41 +00:00
ebanks ab9ffb1a74 Epic failure on the lazy loading of genotypes: if the input VCF had its samples unsorted and we used a walker that didn't require genotypes, then we would sort the samples but not load genotypes (and therefore the genotypes wouldn't match the samples anymore). Added simple integration test to cover this case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5711 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 16:03:45 +00:00
hanna 96571b55be Disable caching of ReadShards by the GenomeLocProcessingTracker (at least
temporarily). Unfortunately this does not completely fix the IndelRealigner
exception that Ryan is seeing, but it helps things quite a bit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5710 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 13:59:34 +00:00
carneiro a5b96e0e04 I have to remember that this is Java, not C.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5709 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 17:40:14 +00:00
carneiro f04cc4321f fixed a bug when the pipeline was used on a single bam.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5708 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 17:19:22 +00:00
rpoplin b7334dcc1e Rank sum test annotations are the Z-scores from the test instead of the p-value.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5707 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 14:35:00 +00:00
ebanks 45081c32d7 continuing from last night, the integration tests weren't covering the right behavior either
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5706 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 13:30:57 +00:00
ebanks f34e6d5b8c Somewhere along the way someone broke this tool and failed to update the documentation to boot. Fixing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5705 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 03:16:20 +00:00
ebanks ae8f3f2cde Check for bad reference bases before creating simple/'empty' VCs. Updated the code in the indel GL model to be consistent and to use the existing utility in the Allele class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5704 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 23:55:20 +00:00
depristo 122d5845d3 GATK Resource bundle, latest version (now with b37 -> b36 support). Oneoff scala script that assesses chip coverage of calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5703 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 22:01:36 +00:00