rpoplin
44a717f63a
Good bye VQSR v1. This commit will break the build.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5739 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:09:52 +00:00
hanna
2dacf1b2b2
Better header support when running R's read.table(...,header=T).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5738 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:56:20 +00:00
hanna
ad8c786b2d
Now more easily R-parseable.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5737 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:30:50 +00:00
rpoplin
5bade81c6d
Adding tranche plot generation back to VQSR
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5736 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:26:26 +00:00
rpoplin
e73720c2db
Updating VQSLOD annotation description
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5735 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:01:08 +00:00
rpoplin
11052918d9
Better exception text for common error in VQSR.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5734 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 18:37:25 +00:00
rpoplin
4bbce42861
Renaming ContrastiveRecalibrator --> VariantRecalibrator in preparation for move to core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5733 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 18:12:47 +00:00
rpoplin
6323fb8673
misc cleanup in VQSR
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5732 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 18:00:22 +00:00
hanna
f3bd11a02e
Dress up some formatting issues.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5731 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 17:35:18 +00:00
hanna
9c809ed68e
A walker to analyze the memory consumption of reference, reads, and RODs at
...
each base both in bytes and as a percentage of the used heap size.
May be a bit buggy at this point; there are a lot of metrics around the Java
heap and I'm not completely sure that the metrics I'm outputting are exactly
the ones that I'm looking for.
Also fixed a documentation bug in my Sizeof class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5730 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 17:08:15 +00:00
ebanks
d4cbd8691c
Make the default that we only output SNPs (so that when I make another release we don't get flooded with questions about why the UG is all of a sudden so slow)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5729 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 16:38:55 +00:00
rpoplin
70f8ab6f89
Adding AF bin stratification for VariantEval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5728 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 15:22:50 +00:00
hanna
870e65a685
Fixing a build failure because I want to be completely sure that the code I
...
checked in immediately following the build breaking code passes integration
tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5727 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 02:09:53 +00:00
hanna
411980a50a
Performance enhancements in GATKBAMIndex. Not sure these will assist in a
...
normal use case, but they cut startup times and memory allocation noise in
the profiler, making my profiling time more productive.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5726 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 20:48:16 +00:00
delangel
422d4ceeea
removed useless file - no need for tableRecalibration, right now everything is done in PairHMMIndelErrorModel
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5725 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 20:35:44 +00:00
delangel
2a80ffa2ee
Totally experimental, barely usable, not-to-be-used-yet implementation of an "Indel Quality Recalibrator". The idea is that any indel that's not in input dbsnp is treated as an artifact, and then a csv is built with # of indels and # of observations as a function of each input covariate (initially, only cycle, read group and homopolymer run are useful). Then, when computing likelihoods of indels based on input haplotypes, we compute gap penalties based on the values of the covariates at the read. The feature is disabled by default with hidden arguments. TBD whether the usefulness of the feature is worth the extra time and pain.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5724 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 20:31:43 +00:00
rpoplin
3224bbe750
New visualization output for VQSR. It creates the R script file on the fly and then runs Rscript on it. Adding 1000G Project consensus code. First pass of having VQSR work with missing data by marginalizing over the missing dimension for that data point (thanks Chris and Bob for ideas). Updated math functions to use Apache Commons Math instead of approximations from Wikipedia. New parameters available for the priors based on further reading in Bishop and looking at the new visualizations. Updated integration test to use more modern files. Updated MDCP to use new best practices w.r.t. annotations.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5723 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 19:14:42 +00:00
ebanks
fcf8cff64a
We didn't actually support all of these extensions. Updated to be accurate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5722 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 19:03:46 +00:00
carneiro
34092fd32f
minor update...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5716 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 21:29:01 +00:00
carneiro
36ac8beee1
Making the GATK unpredictably random...
...
through an option!
set -ndrs if you want the GATK to be really random (non-deterministic). Engine option, available to every walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5715 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 19:29:08 +00:00
carneiro
f97e7d2fb4
Walker that calculates the percentage of bases that are covered to at least 20x. Very useful! In oneoffs until someone else thinks it's as useful as I think it is ;)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5714 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 19:19:39 +00:00
ebanks
deed7c47a1
Continuing the epic fail, some of our existing integration tests were wrong because of the lazy loading failure.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5712 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 17:54:41 +00:00
ebanks
ab9ffb1a74
Epic failure on the lazy loading of genotypes: if the input VCF had its samples unsorted and we used a walker that didn't require genotypes, then we would sort the samples but not load genotypes (and therefore the genotypes wouldn't match the samples anymore). Added simple integration test to cover this case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5711 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 16:03:45 +00:00
hanna
96571b55be
Disable caching of ReadShards by the GenomeLocProcessingTracker (at least
...
temporarily). Unfortunately this does not completely fix the IndelRealigner
exception that Ryan is seeing, but it helps things quite a bit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5710 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 13:59:34 +00:00
carneiro
a5b96e0e04
I have to remember that this is Java, not C.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5709 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 17:40:14 +00:00
rpoplin
b7334dcc1e
Rank sum test annotations are the Z-scores from the test instead of the p-value.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5707 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 14:35:00 +00:00
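A note on the rank sum change above: the following is a minimal sketch of how a Z-score can be derived from a rank sum (Mann-Whitney U) statistic with the usual normal approximation. The class and method names are hypothetical, there is no tie correction in the variance, and nothing here is taken from the actual annotation code.

```java
import java.util.Arrays;

class RankSumZ {
    /** Z-score of the rank sum statistic for sample x against sample y (normal approximation). */
    static double zScore(double[] x, double[] y) {
        int n1 = x.length, n2 = y.length, n = n1 + n2;
        double[] sorted = new double[n];
        System.arraycopy(x, 0, sorted, 0, n1);
        System.arraycopy(y, 0, sorted, n1, n2);
        Arrays.sort(sorted);

        double rankSumX = 0.0;
        for (double v : x) {
            int lo = 0;
            while (sorted[lo] < v) lo++;                        // first position holding v
            int hi = lo;
            while (hi + 1 < n && sorted[hi + 1] == v) hi++;     // last position holding v
            rankSumX += (lo + hi) / 2.0 + 1.0;                  // average 1-based rank over ties
        }

        double u = rankSumX - n1 * (n1 + 1) / 2.0;              // U statistic for x
        double mean = n1 * n2 / 2.0;                            // expected U under the null
        double sd = Math.sqrt(n1 * n2 * (n + 1) / 12.0);        // std. dev. of U, no tie correction
        return (u - mean) / sd;
    }

    public static void main(String[] args) {
        System.out.println(zScore(new double[]{1, 2, 3}, new double[]{4, 5, 6}));
    }
}
```

Unlike a p-value, the Z-score keeps the direction of the shift: swapping the two samples flips its sign.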
ebanks
45081c32d7
continuing from last night, the integration tests weren't covering the right behavior either
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5706 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 13:30:57 +00:00
ebanks
f34e6d5b8c
Somewhere along the way someone broke this tool and failed to update the documentation to boot. Fixing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5705 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 03:16:20 +00:00
ebanks
ae8f3f2cde
Check for bad reference bases before creating simple/'empty' VCs. Updated the code in the indel GL model to be consistent and to use the existing utility in the Allele class.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5704 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 23:55:20 +00:00
depristo
6cce3e00f3
A test walker that does consensus compression of deep read data sets.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5702 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 22:00:48 +00:00
rpoplin
3907377f37
When genotyping given alleles, for multiallelic sites we go back to the reads and use the alternate base with the highest sum of quality scores instead of taking the first alternate allele from the vcf file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5701 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 21:31:09 +00:00
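The selection rule described in the commit above (take the alternate base with the highest sum of quality scores) can be sketched roughly as follows. This assumes the pileup is available as parallel arrays of bases and qualities; `AltAlleleChooser` and `chooseAlt` are illustrative names, not the real genotyper API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AltAlleleChooser {
    /** Returns the candidate base with the largest summed base quality, or 0 if none was observed. */
    static byte chooseAlt(byte[] bases, byte[] quals, List<Byte> candidates) {
        Map<Byte, Integer> qualSums = new LinkedHashMap<>();
        for (byte c : candidates) qualSums.put(c, 0);

        // Accumulate quality only for bases that are candidate alternate alleles.
        for (int i = 0; i < bases.length; i++) {
            Integer sum = qualSums.get(bases[i]);
            if (sum != null) qualSums.put(bases[i], sum + quals[i]);
        }

        byte best = 0;
        int bestSum = 0;
        for (Map.Entry<Byte, Integer> e : qualSums.entrySet()) {
            if (e.getValue() > bestSum) {
                best = e.getKey();
                bestSum = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte alt = chooseAlt(new byte[]{'A', 'C', 'C', 'T'}, new byte[]{30, 10, 10, 25},
                             Arrays.asList((byte) 'C', (byte) 'T'));
        System.out.println((char) alt);
    }
}
```

Note that the winner is decided by summed quality, not by read count: in the example, one high-quality T outweighs two low-quality C observations.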
droazen
6e9e766a71
The tighter interval validation wasn't interacting well with unmapped
...
intervals -- altered the validation methods to not throw an error for
unmapped intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5700 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 20:56:46 +00:00
hanna
6d5e45b5c6
Revbump Picard dependencies at Tim/Kathleen's request. Exclude anonymous
...
classes from PluginManager.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5699 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 20:38:05 +00:00
droazen
d650efd40a
Fix for bug GSA-449: Intervals that are not in GATK format are not validated
...
to the same standard as GATK format intervals. Full validation against contig
bounds is now performed for all intervals, regardless of their source. Also
fixed a few tests for validation exclusions that were backwards.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5698 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 18:12:10 +00:00
kshakir
df35a143b2
Removed -debug/--debug_mode.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5697 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 10:56:39 +00:00
hanna
27495a0c64
Killed quiet mode. Should probably kill debugMode as well, but Queue's using
...
it. Will check with Khalid tomorrow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5695 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 04:17:36 +00:00
hanna
f3dacd3c40
Use ByteBuffer.allocate() instead of ByteBuffer.allocateDirect().
...
ByteBuffer.allocateDirect() behaves like Java NIO MappedByteBuffers in that
it consumes address space, which counts against our virtual memory allocation;
but cannot be destroyed or otherwise freed. This was definitely contributing
to the LSF failures that I was seeing, but I'm not yet convinced that it's the
sole source of these virtual memory 'leaks'. More tomorrow as the results of
my whole exome tests start to roll in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5693 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 02:01:11 +00:00
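For readers unfamiliar with the two allocation calls discussed in the commit above, here is a small sketch of the difference using only standard `java.nio` calls. The class and helper names are made up for illustration; the sketch says nothing about how GATKBAMIndex actually sizes or reuses its buffers.

```java
import java.nio.ByteBuffer;

class BufferKinds {
    /** Heap buffer: backed by a byte[] on the Java heap and reclaimed by the garbage collector. */
    static ByteBuffer heapBuffer(int bytes) {
        return ByteBuffer.allocate(bytes);
    }

    /** Direct buffer: memory outside the Java heap, so it consumes process address space. */
    static ByteBuffer directBuffer(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        System.out.println("heap isDirect:   " + heapBuffer(16).isDirect());
        System.out.println("direct isDirect: " + directBuffer(16).isDirect());
    }
}
```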
chartl
7afeb1ab17
Removing broken imports (boo)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5692 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 18:55:25 +00:00
rpoplin
379f837e82
RankSum z-scores are looking quite good, so RIP Wilcoxon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5691 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 18:34:39 +00:00
chartl
bc3fd70b0a
Removing the old association walker, switching test to just validate that MannWhitneyU is doing the right thing. Unit tests still pass.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5690 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 18:05:19 +00:00
kshakir
f619dd3ca7
Refactored IntervalUtils used to parse and scatter intervals for Queue.
...
Scattering non-contig interval lists by number of loci in the intervals instead of just number of intervals.
Queue caches the list of locs and how to split them up instead of reloading them from disk repeatedly.
TODO: general purpose function to divide data evenly.
Skip over comments when parsing picard analysis files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5687 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 00:06:00 +00:00
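Scattering by number of loci rather than by number of intervals, as the commit above describes, might look something like the sketch below. It is a simplification under stated assumptions: intervals are never split, the `Interval` type and `scatterByLoci` are hypothetical, and the real IntervalUtils code may balance differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LociScatter {
    /** Hypothetical closed interval [start, stop] on a contig. */
    static final class Interval {
        final String contig;
        final int start;
        final int stop;

        Interval(String contig, int start, int stop) {
            this.contig = contig;
            this.start = start;
            this.stop = stop;
        }

        long size() {
            return (long) stop - start + 1;
        }
    }

    /** Splits intervals, in order, into at most `parts` groups of roughly equal total loci. */
    static List<List<Interval>> scatterByLoci(List<Interval> intervals, int parts) {
        long total = 0;
        for (Interval iv : intervals) total += iv.size();

        List<List<Interval>> result = new ArrayList<>();
        List<Interval> current = new ArrayList<>();
        long assigned = 0; // loci placed so far, across all groups
        for (Interval iv : intervals) {
            current.add(iv);
            assigned += iv.size();
            // Close the group once the running total crosses its proportional boundary.
            if (result.size() < parts - 1 && assigned * parts >= total * (result.size() + 1)) {
                result.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) result.add(current);
        return result;
    }

    public static void main(String[] args) {
        List<Interval> in = Arrays.asList(new Interval("chr1", 1, 100), new Interval("chr1", 201, 210),
                new Interval("chr1", 301, 310), new Interval("chr1", 401, 410));
        for (List<Interval> group : scatterByLoci(in, 2)) System.out.println(group.size() + " interval(s)");
    }
}
```

With one 100-base interval and three 10-base intervals, splitting by interval count would give 2 and 2; splitting by loci puts the large interval alone in the first group.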
hanna
57a4700299
Ported small BAM performance test suite to the Google Caliper microbenchmarking suite. Looks promising,
...
but I'm still not sure that GC is a good long-term solution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5683 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 22:09:17 +00:00
chartl
a56a2dfdb7
Nothing to see here. Move along.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5681 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 15:01:02 +00:00
delangel
600617a63c
Enabled code to deal with hard-clipping adaptor sequence when processing reads in the pileup in the indel caller. Proven now that changes are minimal (4 fewer calls in NA12878 chr20, quals slightly different), with minor changes in vcf fields in integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5679 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 14:10:33 +00:00
chartl
88735a8c9b
Adding in a delta to try and better measure effect size -- equivalent to looking at the lower end of the N^th percentile confidence interval. Kind of a hacky way to add it in, the infrastructure is about due for a streamlining rewrite.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5676 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 03:53:33 +00:00
hanna
7428ae338a
A fix for Marian Thieme's NPE in the new sharding system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5675 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 19:47:14 +00:00
chartl
5b9a8555cd
Queue graph time is currently O(n^m), where n = num jobs and m = num unique base files. This script was therefore running in order 1200^16, which I don't think would finish before the heat death of the universe. For now, push down the number of files to 1 and gather them outside of Queue; once I've fixed up scatter-gather in core, outputs can be uncommented.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5674 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 12:56:25 +00:00
ebanks
cbcdfc584d
Moving out of core and into playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5671 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 02:30:22 +00:00
depristo
cc78027bd3
Two optimizations. First, an even more aggressive printProgress meter optimization to only consider doing work once every 1000 cycles through the engine. Second, GenomeLocParser now uses a single indirection around the contigInfo variable. This class uses a last-used cache to efficiently retrieve contig information instead of always returning to the underlying SAMSequenceDictionary hashmap to make genome locs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5670 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 01:31:26 +00:00
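The last-used cache idea from the commit above can be illustrated with a small sketch. It assumes contig lookups tend to repeat the same name many times in a row; `CachingContigIndex` is a hypothetical stand-in, not the class GenomeLocParser actually uses.

```java
import java.util.HashMap;
import java.util.Map;

class CachingContigIndex {
    private final Map<String, Integer> contigToIndex = new HashMap<>();
    private String lastContig = null;
    private int lastIndex = -1;
    int mapLookups = 0; // counts how often the underlying map is consulted

    CachingContigIndex(String... contigs) {
        for (int i = 0; i < contigs.length; i++) contigToIndex.put(contigs[i], i);
    }

    /** Returns the index of the contig, or -1 if unknown. */
    int indexOf(String contig) {
        if (contig.equals(lastContig)) return lastIndex; // fast path: same contig as last time
        mapLookups++;
        Integer idx = contigToIndex.get(contig);
        if (idx == null) return -1;
        lastContig = contig;
        lastIndex = idx;
        return idx;
    }

    public static void main(String[] args) {
        CachingContigIndex index = new CachingContigIndex("chr1", "chr2");
        for (int i = 0; i < 1000; i++) index.indexOf("chr1");
        System.out.println("map lookups: " + index.mapLookups);
    }
}
```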
depristo
29857f5ba6
Fix for instability in the output of the fasta alternative reference maker when snpmask and snp files are provided and have overlapping records. The order of the records changed due to optimization of the refmetadatatracker, which uncovered this nondeterminism. Now preferentially includes sites from snps before considering masking out sites in snpmask.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5669 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 21:54:09 +00:00
kshakir
8619f49d20
Added a utility method to retrieve the contig lengths for WG chunking.
...
Added a rudimentary GATKReportParser for parsing VE3 results.
Re-enabled the FCPTest using VE3, the GATKRP, and the PicardAggregationUtils.
The tag type for .rod files is DBSNP, not ROD.
More explicit return types on implicit methods.
Added null checks for implicit string to/from file conversions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5668 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 19:22:21 +00:00
delangel
59dd79faab
One more optimization: don't use Math.round(), but do my own rounding/casting. UG now about 40% faster calling indels, 30-35% faster calling SNPs+indels simultaneously.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5667 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 19:15:58 +00:00
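The commit above does not show how the hand-rolled rounding was done. One common way to replace `Math.round()` with a cast, given here purely as an assumption, is to add 0.5 and truncate. This only agrees with `Math.round()` for non-negative values that fit in an `int`, and can differ for inputs within one ulp of a .5 boundary.

```java
class FastRound {
    /** Rounds a non-negative value that fits in an int by adding 0.5 and truncating. */
    static int roundNonNegative(double x) {
        return (int) (x + 0.5);
    }

    public static void main(String[] args) {
        double[] samples = {0.0, 2.4, 2.5, 7.9, 1234.49};
        for (double s : samples) {
            System.out.println(s + " -> " + roundNonNegative(s) + " (Math.round: " + Math.round(s) + ")");
        }
    }
}
```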
delangel
246d8190b5
Round one of "easy" zero-effort optimizations to UG's indel caller. Mostly inline functions, avoid repeated computation and try to optimize SoftMaxPair(), which is by far the biggest runtime hog. More to come...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5666 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 18:57:34 +00:00
depristo
a8f8077d7a
Simple optimizations for cases where there is no data or RODs at sites, such as with the FastaStats walker. private static immutable Lists and Maps in underlying data structures that have no associated data. Also, avoiding a double map.get() in the low-level genome loc parser. RefMetaDataTracker is now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5664 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 10:52:16 +00:00
hanna
54660a8c25
Fix requested by Lee Lichtenstein: first check to see whether it's time for
...
a progress message, then aggregate metrics. Makes the overhead of
printProgress in RealignerTargetCreator go from >20% to ~3%.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5663 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 03:22:48 +00:00
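The reordering described in the commit above (cheap time check first, expensive metric aggregation only when a message is actually due) can be sketched as follows. `ProgressMeter` and `maybePrint` are illustrative names, and the caller passes the clock value in so the sketch stays deterministic.

```java
import java.util.function.LongSupplier;

class ProgressMeter {
    private final long intervalMillis;
    private long lastPrintMillis;

    ProgressMeter(long intervalMillis, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.lastPrintMillis = startMillis;
    }

    /** Prints a progress line if enough time has passed; returns true if it printed. */
    boolean maybePrint(long nowMillis, LongSupplier expensiveMetrics) {
        // Cheap check first: bail out before touching the metrics at all.
        if (nowMillis - lastPrintMillis < intervalMillis) return false;

        // Only now pay for the aggregation.
        long processed = expensiveMetrics.getAsLong();
        lastPrintMillis = nowMillis;
        System.out.println("processed " + processed + " records");
        return true;
    }

    public static void main(String[] args) {
        ProgressMeter meter = new ProgressMeter(1000, 0);
        meter.maybePrint(10, () -> 5L);     // too early, supplier is never called
        meter.maybePrint(1500, () -> 500L); // due, supplier is called once
    }
}
```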
hanna
49550e257f
Fix for JamesP's issue. This issue appeared because of a design flaw in the
...
interface between SAMDataSource and IntervalSharder that needs to stay around
until the original BAM sharder is retired. Will add a JIRA to fix design
flaw.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5661 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-19 00:52:13 +00:00
depristo
541c9109b3
V1 of GATK Resource Bundling system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5659 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-18 19:23:45 +00:00
ebanks
673772a522
Catch samtools exceptions and make them 'BAM Exceptions' asking the user to run Picard's validator and re-index the file before posting anything to the forum. Let's see whether this helps or not.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5658 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-18 03:52:43 +00:00
ebanks
e97a5ca161
Rename 'verbose' argument to 'debug_file'.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5657 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-18 03:17:13 +00:00
chartl
e28fc21642
Spurious associations can develop from including ambiguous reads in these tests. Perhaps MQ0 reads shouldn't be used for anything except MQ0, but the best way to do that is to restructure the code, so for now I'll put it off.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5656 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-17 23:17:03 +00:00
ebanks
49ea07acce
My fixes to Tribble yesterday revealed that some of the test VCFs for integration tests were actually malformed. Also, Guillermo updated the b37 dbSNP VCF and that broke some tests. Should be good for now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5655 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-17 03:39:11 +00:00
chartl
e5ef8388fc
BatchMerge - AlleleVCF --> AllelesVCF, this (combined with Eric's fix) will solve James P.'s forum issue.
...
After viewing results on real case/control data from RAW -- it's really working quite well. ReadIndels, however, needs to use a T-test rather than a U-test, especially in deep coverage (at indel sites, the reads with indels will have mostly the same number of CIGAR indel elements -- one -- which doesn't really play nicely with the UTest when sample sets are large). Modified ReadsLargeInsertSize to be a two-way test (e.g. ReadsLarge and ReadsSmall). BaseQualityScore also suffers from the same issue as read indels, so switching over to a T-test in that case as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5653 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 22:03:16 +00:00
ebanks
1c32deb108
For some reason I wasn't allowing expressions to be used with the -all argument.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5652 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 20:59:10 +00:00
corin
2cf6a06503
Throwing an error if INFO fields arguments contain whitespace.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5651 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 20:52:55 +00:00
corin
fce6d25075
Moved the reference ID to a meta data field for validity declaration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5650 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 20:28:56 +00:00
corin
59215dab48
Now writes results to a minimal vcf with annotations included in the INFO field. Must be run with -NO_HEADER to totally remove header for the most bare bones vcf; otherwise also includes command line meta data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5649 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 20:14:02 +00:00
ebanks
fe26954ac6
Minimal support for reading in VCF4.1 files. Added TODOs that need to be fixed or cleaned up to truly support this version. VCF constants updated. Lower-case bases permitted. Please let's make sure to refactor once we're ready to support it for good.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5648 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 18:59:37 +00:00
ebanks
7e9051ea25
The solution to James's bug was just to clean up the code and simplify it. What happened was that functionality that got put into UGCalcLikelihoods was then generalized into the UG engine but then never removed from UGCalcLikelihoods. This knowingly breaks the batch merger, but Chris said he'll take care of it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5647 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 18:05:10 +00:00
hanna
0d7cca169e
Sigh.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5645 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 14:37:24 +00:00
hanna
0965020804
Screwed up the doc string.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5644 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 14:30:20 +00:00
hanna
be3bad1f61
Low-memory sharding is now enabled by default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5643 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 14:22:07 +00:00
ebanks
2830dc70b7
UG can still return null in certain nasty cases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5642 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 20:11:17 +00:00
fromer
8e0f5bc5a5
Prevent NullPointerException in cases where SNP is filtered
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5641 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 19:59:59 +00:00
depristo
ee94af3539
Oops, left out of earlier commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5640 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 18:21:16 +00:00
depristo
8ed9c0f518
VariantsToTable now blows up by default if you ask for a field that isn't present in a record.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5636 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 14:42:43 +00:00
fromer
b3cd14d10a
Since GCcontentIntervalWalker no longer uses any ROD, turn it into a LocusWalker that traverses by REFERENCE
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5635 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 03:15:09 +00:00
aaron
2089c3bdef
removing; should have gone to the CGA repo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5633 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 22:17:45 +00:00
aaron
da6f2d3c9d
adding the capseg tools to the new walker repo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5632 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 22:11:08 +00:00
kshakir
4bb573b1f5
Centralizing a bunch of Broad specific utility functions from code scattered in GSA-Firehose, PipelineTest, custom QScripts, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5631 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 21:29:02 +00:00
ebanks
91d308fc6d
temporary patch until Picard (hopefully) fixes the NM calculation to deal with reads that align off the end of the contig
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5630 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 19:18:18 +00:00
ebanks
fa6468d167
Remove the adaptor sequence clipping read filter because it is dangerous (it breaks LocusIteratorByState). We'll bring it back to life when ReadTransformers are created. Instead, have the utility code return a new clipped SAMRecord (necessary so that we don't break SNP calling in UG when the indel caller tries to hard-clip the reads).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5629 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 18:47:47 +00:00
hanna
5849e112e1
Fix exception in block weighting minus function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5628 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 17:07:04 +00:00
hanna
a36adf0c6b
Request from the cancer team -- guarantee via javadoc that the returned
...
read metrics are actually a clone, which they can do with as they wish.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5626 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 15:10:46 +00:00
delangel
06b1497902
Corrected bad merge.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5625 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 15:02:09 +00:00
delangel
9134bf3129
Long-forgotten change I neglected to commit a while back: add ability for SelectVariants to extract either SNPs or indels from a combined vcf file. Not the ideal place to do it, but it's important to at least have something to split vcfs now that we call SNPs and indels combined.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5624 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 14:58:44 +00:00
chartl
8e0d191a70
Added a walker to help sort out which samples in a region are giving signal. Lots of reused code that shouldn't be. Will refactor later.
...
Also fixed an "issue" with InsertSizeDistribution -- apparently for mate pairs, the first mate (karyotypically) will have a POSITIVE insert size, and the second a NEGATIVE insert size -- thus the insert size distribution was being conflated with enrichment/depletion of first-in-pair or second-in-pair reads. Gah.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5623 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 13:53:31 +00:00
chartl
efe6c539ac
Re-enabling disabled test. Apparently T-tests are very picky about your using an unbiased variance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5622 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 03:05:50 +00:00
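As background for the remark above about T-tests and unbiased variance: the unbiased sample variance divides by n - 1 rather than n. A minimal sketch, with hypothetical class and method names that are not taken from MathUtils:

```java
class Variance {
    static double mean(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x;
        return sum / v.length;
    }

    private static double sumSquaredDeviations(double[] v) {
        double m = mean(v);
        double ss = 0.0;
        for (double x : v) ss += (x - m) * (x - m);
        return ss;
    }

    /** Population-style (biased) variance: divides by n. */
    static double biasedVariance(double[] v) {
        return sumSquaredDeviations(v) / v.length;
    }

    /** Sample (unbiased) variance: divides by n - 1; requires at least two values. */
    static double unbiasedVariance(double[] v) {
        return sumSquaredDeviations(v) / (v.length - 1);
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("biased:   " + biasedVariance(data));
        System.out.println("unbiased: " + unbiasedVariance(data));
    }
}
```

The gap between the two estimates shrinks as n grows, so a test that compares against reference values is most sensitive to the choice on small samples.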
chartl
42bc003f46
Oops. I'll need to look at this, I think it was accidentally enabled. Disabling for now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5621 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 00:54:52 +00:00
hanna
22a11e41e1
Rewrite of GATKBAMIndex to avoid mmaps causing false reports of heavy memory
...
usage.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5620 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:49:58 +00:00
chartl
36d8f55286
Use the 'standard' arcsine transform
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5619 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:11:45 +00:00
chartl
8125b8b901
Old changes to the exome VQSR search.
...
SGA updated to include new proportion-based insert size test.
Major fix for dichotomization test: MathUtils now optionally ignores NaN values for sums, averages, variances. In the future this feature can be pushed back into the AssociationContext object itself (e.g. no data? no entry), but it's kept like this for transparency for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5618 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:00:50 +00:00
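The optional NaN handling mentioned in the commit above could look roughly like this. The `ignoreNaN` flag and the class name are assumptions for illustration; the real MathUtils signatures are not shown in the log.

```java
class NanAwareStats {
    /** Sums the values; when ignoreNaN is true, NaN entries are skipped instead of poisoning the total. */
    static double sum(double[] values, boolean ignoreNaN) {
        double total = 0.0;
        for (double v : values) {
            if (ignoreNaN && Double.isNaN(v)) continue;
            total += v;
        }
        return total;
    }

    /** Averages the values; skipped NaN entries do not count toward the denominator. */
    static double average(double[] values, boolean ignoreNaN) {
        double total = 0.0;
        int count = 0;
        for (double v : values) {
            if (ignoreNaN && Double.isNaN(v)) continue;
            total += v;
            count++;
        }
        return count == 0 ? Double.NaN : total / count;
    }

    public static void main(String[] args) {
        double[] data = {1.0, Double.NaN, 3.0};
        System.out.println(sum(data, true) + " " + average(data, true) + " " + sum(data, false));
    }
}
```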
rpoplin
30a19a00fe
Fix for when running with EMIT_ALL_SITES but not GENOTYPE_GIVEN_ALLELES. Still want to emit a site even when over the deletion fraction for example.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5617 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 20:07:06 +00:00
delangel
488622041d
Further trivial cleanup: Renamed DindelGenotypeLikelihoodsCalculationModel to IndelGenotypeLikelihoodsCalculationModel
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5616 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 18:00:48 +00:00
delangel
3b424fd74d
Enable new indel likelihood model by default, cleanup code, remove dead arguments, still more cleanups to follow. This isn't final version but at least it performs better in all cases than previous Dindel-based version, so no reason to keep old one around.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5615 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 17:54:46 +00:00
depristo
9c36b0a39b
Refactored read clipping framework into a generic utilities class, independent of ClipReadsWalker, which now uses this framework. Some more cleanup is really needed, as some of the arguments to the classes are really only useful for ClipReads
...
ReduceReadsWalker -- does consensus-based read compression, v2. Does all of the consensus calculations within the ConsensusReadCompressor per sample, and multi-sample case is handled by MultiSampleConsensusReadCompressor. For deeply covered data sets, this projects a significant reduction in the number of mapped reads. Impact on analysis call quality tbd. Expected to be relatively minor, as the system automatically detects regions without a strong consensus, and expands a window around these so that +/- 10bp of all reads are shown around the unclear sites. Not usable yet -- as it does not yet support streaming output, and actually holds all reads in memory at once.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5610 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-10 13:55:05 +00:00
depristo
13c5f3322d
Added argument to avoid writing 0 over all uncovered contigs, so you can just plot chrX, for example
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5609 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-10 13:50:21 +00:00
chartl
de4eaa455e
Squashing some bugs. Current implementation of AlignmentContextUtils.splitContextBySample() eliminates all sample meta data. Per Mark's request I'm working around this rather than fixing it -- the extender now maintains a mapping from sample id to sample object. Addition of a proportion test for large-insert-size reads, and slight refactoring of code to deal with bad window initialization of subclasses (e.g. chris forgot that constructors aren't inherited)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5608 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-09 21:07:52 +00:00
hanna
b4b52cc0fe
Reduce unnecessary repetitive accesses to the BAM index file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5607 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 19:28:14 +00:00
kshakir
0a58d7aa1a
Marked boolean SAMFileWriterATD arguments as flags so scala generator maps them to Boolean instead of Option[Boolean].
...
Using the VCFWriterATD isCompressed to check if the VCF index will be auto generated.
Tracking BAM and Tribble indexes as @Inputs and @Outputs in generated QFunctions.
Updates to the BamGatherFunction to disable the index during merge when disable_bam_indexing = true.
Made a shortcut for live-running pipelinetest, pipelinetestrun.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5606 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 18:44:32 +00:00
depristo
866f4fd569
Test version of consensus compressing strategy. Cannot be used, and is being rewritten right now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5605 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 18:37:03 +00:00
droazen
80d547ae71
Fix for bug GSA-445: Sequence dictionary validation can be very slow with
...
large numbers of contigs. SequenceDictionaryUtils.getCommonContigsByName() was
running in O(n^2) time due to poor choice of data structure -- modified it to
run in O(n) time. Also removed an unnecessary O(n log n) step at another stage
in the sequence dictionary validation process. In tests with a 181,813-entry
sequence dictionary, runtime improved from an average of 21.4 minutes to 45.1
seconds.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5604 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 18:33:10 +00:00
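Note on the GSA-445 fix above: the commit does not say which data structure replaced the old one. A minimal sketch of the general idea, assuming the slow path did a list scan per contig name and the fast path probes a hash set, could look like this. The class and method names are hypothetical and are not the actual SequenceDictionaryUtils code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the GATK implementation.
class CommonContigsSketch {
    /** O(n*m): List.contains rescans all of b for every name in a. */
    static List<String> commonQuadratic(List<String> a, List<String> b) {
        List<String> common = new ArrayList<>();
        for (String name : a) {
            if (b.contains(name)) common.add(name);
        }
        return common;
    }

    /** O(n+m) expected: build a hash set from b once, then probe it per name in a. */
    static List<String> commonLinear(List<String> a, List<String> b) {
        Set<String> inB = new HashSet<>(b);
        List<String> common = new ArrayList<>();
        for (String name : a) {
            if (inB.contains(name)) common.add(name);
        }
        return common;
    }
}
```

Both methods return the shared names in the order they appear in the first list; only the lookup cost differs.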
ebanks
b6e7b5dace
Updating to reflect my recent Tribble fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5601 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 11:48:00 +00:00
ebanks
4f17004590
Allow walkers to enforce the ordering in which ReadFilters are applied (so that they're now done in the order specified in the walker). Useful if you have a computationally expensive filter (like adaptor clipping) that should only be applied to reads passing all other filters.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5600 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 03:34:50 +00:00
hanna
53db7b8faa
Did some refactoring which broke some unit tests, and then failed to run
...
the unit tests. Definitely not my best effort...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5599 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 03:31:52 +00:00
ebanks
74755cfd1c
Adding a ReadFilter to hard-clip out bases from adaptor sequences. This is actually slightly more correct than having it be part of LocusIteratorByState because it allows us to remove reads that are complete garbage (and there are definitely some) based on the insert sizes. However, although conceptually this is great, it doesn't actually work. 'Why?' you may ask. Because when we hard-clip reads it often changes their start positions... which means that reads are no longer passed to LocusIteratorByState in coordinate order... which makes it (understandably) barf all over the place (and makes for some really fascinating SNP calls). This took me forever to find. I'm going to bed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5598 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 03:15:58 +00:00
ebanks
cd61ef7169
Re-enabling multi-threaded integration tests. To make this work, downsampling and annotations are disabled for this test so that we don't have randomization issues for it based on which shards get executed first.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5597 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 03:07:39 +00:00
hanna
fece2167b3
Prototype implementation of protoshard merging when protoshard n and protoshard
...
n+1 completely overlap. Gives a small but consistent performance increase in
non-intervaled whole exome traversals (2.79min original, 2.69min revised).
Needs a more in depth analysis of optimal shard sizing to determine a true
optimum.
Also renamed a variable because Khalid disapproved of my naming choices.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5595 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 02:09:14 +00:00
hanna
32d502c122
Enable BAM OTF index writing by default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5594 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 23:44:25 +00:00
droazen
cb3e8aec5e
Modified the buildfile and help extractor doclet so that help text is only
...
extracted from source files that have been modified since the help resource
file was last generated. This significantly speeds up builds where only a few
source files have been modified, at the expense of making clean builds take
slightly longer. Here's some performance data gathered by testing the old and
new versions of extracthelp in isolation and averaging across 10 runs:
old extracthelp, 1 modified source file: 20.1 seconds
new extracthelp, 1 modified source file: 7.2 seconds <-- woohoo! :)
old extracthelp, clean build: 17.8 seconds
new extracthelp, clean build: 20.5 seconds
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5590 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 18:40:53 +00:00
ebanks
af09170167
As I threatened yesterday, I've moved the various and disparate randomization code out of the walkers. Now they all (except VQSRv1, whose days are numbered anyways) use a static generator available in the engine itself. Please use this from now on. The seed is reset before every individual integration test is run. I think there may still be an issue with the IndelRealigner but I need to confirm with the commit to see what testNG does. Integration tests are already broken anyways, so no big deal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5589 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 17:03:48 +00:00
kshakir
45ebbf725c
Instead of always merging Picard interval files they are optionally merged by Sting Utils.
...
Disabled the MFCP while the FCP gets an update.
Minor updates to email messages for upcoming scala 2.9.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5588 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 21:12:05 +00:00
carneiro
89bb21d024
typo in the argument description
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5587 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 19:45:32 +00:00
rpoplin
3f3f35dea0
UnifiedGenotyper now BAQs via ADD_TAG to facilitate using BAQed quals for GL calculations but unBAQed quals for annotation calculations. UnifiedGenotyper now produces SNP and indel calls simultaneously. 40 base mismatch intrinsic filter removed from UG to greatly simplify the code. RankSumTests are now standard annotations but the integration tests are commented out pending changes that will allow random annotations to work.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5585 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 19:06:24 +00:00
ebanks
1aa4083352
Fortunately this code isn't used by anyone right now, but it needs to be fixed before someone unwittingly does: flags were wrong according to the SAM spec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5584 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 17:16:41 +00:00
hanna
b231a40da5
Augment PrintLocusContextWalker with extended event info.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5583 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 13:42:48 +00:00
aaron
ab5c4064ed
quick bug fix for variant context utils: only calculate the max AC if we're using the mergeInfoWithMaxAC flag, and if so deal with sites that have multiple alternate alleles correctly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5582 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 05:36:52 +00:00
rpoplin
cc713f2769
fixing exception text
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5581 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 00:29:13 +00:00
ebanks
4b451314b2
Only store a read in the mate hash if it could possibly be moved. This reduces memory consumption especially when dealing with a case of tons of unmapped reads at the end of the bam; however, it's only mildly helpful for chr1 of the Papuans (there's a truly massive pileup 120Mb into it; more thought needed at a later point). Integration tests changed only because some of the reads in the original bam were busted to begin with (it's an old pilot 1000G bam).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5580 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 22:20:09 +00:00
chartl
79b5fa6cc5
Structural refactoring in advance of dichotomization statistics; generalization of statistical test infrastructure.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5579 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 18:52:32 +00:00
asivache
77ca4eef31
IntelliJ complains that @Override is not allowed when implementing interface methods. Whatever.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5578 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 16:57:59 +00:00
ebanks
f4c06bb4ce
Traversal now says 'done with mapped reads' instead of 'done' so we don't confuse users when there are a lot of unmapped reads left to process.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5577 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 15:11:28 +00:00
fromer
5eccc7e528
Added annotation of INCORRECT SNP-based aa annotations in case of MNPdependentAA:true
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5576 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 02:46:45 +00:00
chartl
bb6a30611c
Forgot to modify the test too. What a bad commit. Sorry guys.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5575 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 02:11:08 +00:00
chartl
a0d096c993
Forgot an import statement
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5574 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 22:55:00 +00:00
chartl
b52c3e7e30
Make the window and slide-by values command-line accessible, and standardize for every context. Move the test classes (which are abstract association context modules) into the proper directory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5573 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 22:37:12 +00:00
droazen
db9908ec02
Small correction to the unit test code from my last commit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5572 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:55:38 +00:00
droazen
a5acb0b7a6
Fix for bug GSA-314: Detect -XL and -L incompatibility. An ArgumentException is
...
now thrown if the combination of -L and -XL intervals specified on the command
line results in an empty interval set after set subtraction.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5571 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:41:55 +00:00
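Note on the GSA-314 fix above: the check amounts to subtracting the -XL intervals from the -L intervals and failing when nothing is left. A rough sketch on a single contig follows, assuming closed integer intervals. The Interval type and method names are hypothetical, and IllegalArgumentException stands in for the ArgumentException named in the commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the GATK interval code.
class IntervalSubtractionSketch {
    /** Closed interval [start, stop] on one contig. */
    static final class Interval {
        final int start, stop;
        Interval(int start, int stop) { this.start = start; this.stop = stop; }
    }

    /** Removes every excluded interval from every included interval. */
    static List<Interval> subtract(List<Interval> include, List<Interval> exclude) {
        List<Interval> current = new ArrayList<>(include);
        for (Interval x : exclude) {
            List<Interval> next = new ArrayList<>();
            for (Interval i : current) {
                if (x.stop < i.start || x.start > i.stop) { // no overlap: keep as is
                    next.add(i);
                    continue;
                }
                if (x.start > i.start) next.add(new Interval(i.start, x.start - 1)); // left remainder
                if (x.stop < i.stop) next.add(new Interval(x.stop + 1, i.stop));     // right remainder
            }
            current = next;
        }
        return current;
    }

    /** Fails fast when the exclusions consume everything that was included. */
    static List<Interval> resolve(List<Interval> include, List<Interval> exclude) {
        List<Interval> result = subtract(include, exclude);
        if (result.isEmpty()) {
            throw new IllegalArgumentException("-L and -XL leave an empty interval set");
        }
        return result;
    }
}
```

For example, including 1-100 and excluding 40-60 leaves 1-39 and 61-100, while excluding 1-100 raises the error.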
carneiro
b722ebf244
quick help/comments updates to match the wikipage.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5569 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 12:55:55 +00:00
rpoplin
96f0f0d706
Fixing use of String != String
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5568 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 01:12:00 +00:00
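Note on the String != String fix above: in Java, != on two String references compares object identity, not contents, so two equal strings held in different objects compare as different. A small sketch of the pitfall and the usual replacement (the class and method names are hypothetical):

```java
// Hypothetical illustration only.
class StringEqualitySketch {
    /** Identity comparison: true whenever a and b are different objects. */
    static boolean differsByReference(String a, String b) {
        return a != b;
    }

    /** Content comparison: true only when the character sequences differ. */
    static boolean differsByValue(String a, String b) {
        return !a.equals(b);
    }
}
```

With a = new String("PASS") and b = new String("PASS"), the first method reports a difference and the second does not.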
depristo
095125152b
Updated to no longer include 2nd-best base output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5567 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 20:13:10 +00:00
rpoplin
b2a0331e2d
Pushing hard coded arguments into VariantRecalibratorArgumentCollection
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5566 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 19:55:09 +00:00
rpoplin
79c43845ad
Changing Uniform approximation to Normal approximation in rank sum test. n factorial was overflowing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5565 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 18:18:39 +00:00
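Note on the rank sum change above: the standard normal approximation for a Mann-Whitney U statistic with group sizes m and n uses mean mn/2 and variance mn(m+n+1)/12, which avoids any factorial. A minimal sketch is below, without a tie correction. The names are hypothetical, and the factorial helper only illustrates how quickly an exact n! overflows a 64-bit long; the commit does not say which numeric type overflowed.

```java
// Hypothetical illustration only; not the GATK MannWhitneyU code.
class RankSumNormalSketch {
    /** z-score of U under the normal approximation (no tie correction). */
    static double zScore(double u, int m, int n) {
        double mean = m * (double) n / 2.0;
        double variance = m * (double) n * (m + n + 1) / 12.0;
        return (u - mean) / Math.sqrt(variance);
    }

    /** Exact n! in a long; throws ArithmeticException once the product overflows (n >= 21). */
    static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result = Math.multiplyExact(result, (long) i);
        }
        return result;
    }
}
```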
depristo
b316c9a590
Renamed StratifyAlignmentContext to AlignmentContextUtils, and StatiefyContextType to ReadOrientation. Also, went through the system and deleted all references to second bases. That ship passed long ago. This was the actual commit, the last was an intellij error
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5564 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 15:36:17 +00:00
depristo
5cca100aea
Eliminated the redundant StratifiedAlignmentContext, which previously just held a ReadBackedPileup, and made all of the class methods here just static functions. Far more logical organization, and avoided O(N) endless copying of data for the COMPLETE context. Many tools have been trivially reorganized to take an alignment context now. Everything passes integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5562 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 14:20:43 +00:00
rpoplin
98798eb276
Adding ReadPos rank sum test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5560 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 22:28:41 +00:00
rpoplin
09e89c8c97
Adding ReadPos rank sum test. Transitioned rank sum tests over to using Chris's implementation in order to harmonize the codebase. There isn't any reason to have competing implementations of rank sum. Thanks to Chris for adding the necessary hypothesis testing options. WilcoxonRankSum.java will be deleted soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5559 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 22:26:35 +00:00
depristo
11822da578
Standalone, GATK-dependent tool that reads a list of BAM files and slices all of them into a single merged BAM file containing reads overlapping a chr:start-stop interval. Highly efficient when working with thousands of BAM files. Can merge 1MB of sequence of 1600 4x BAMs in 4g in only 2 hours.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5558 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 13:41:29 +00:00
fromer
27bfec785e
Some walkers for printing FASTA of reference for bed ROD, and "inverting" a bed file (finding regions not covered in bed)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5554 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 21:13:51 +00:00
droazen
0927b7c297
Fix for bug GSA-441: BAM file list with blank lines gives a confusing error
...
message. Lines containing only whitespace in .list files are now ignored.
Also added support for comments in .list files: lines whose first
non-whitespace character is '#' are now also ignored.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5550 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 15:04:35 +00:00
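Note on the GSA-441 fix above: the rule for .list files is to skip whitespace-only lines and lines whose first non-whitespace character is '#'. A minimal sketch of that filter follows. The class and method names are hypothetical, and trimming the retained entries is an assumption, not something the commit states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the GATK list-file parser.
class BamListSketch {
    /** Returns the entries of a .list file, dropping blank lines and '#' comments. */
    static List<String> parseListLines(List<String> lines) {
        List<String> paths = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;        // whitespace-only line
            if (trimmed.startsWith("#")) continue;  // comment line
            paths.add(trimmed);
        }
        return paths;
    }
}
```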
kshakir
4f8411f4b5
Revved Picard to access new flag to disable mmap for bam indices. Only added a 3% speed boost but the mmap was added to the heap count, making it harder to specify/restrict the total resident memory size in LSF. Specifying -Xmx4g will now stay much closer to 4g resident memory usage versus bumping up to 9g when accessing 900 x ~8Mb bai's.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5549 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 01:40:41 +00:00
asivache
df53351b0f
Get rid of score cutoff at 0 in the alignment matrix (i.e. score[cell] = max(0, score[from_parent_cells])). Use the computed score as is. Technically, it's pretty much NW now, not SW.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5548 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 00:11:04 +00:00
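Note on the alignment change above: the difference comes down to whether each matrix cell is floored at zero. A sketch of a single cell update with and without the floor follows. The method name, parameters, and the linear gap penalty are assumptions for illustration and are not the actual aligner code.

```java
// Hypothetical illustration only; not the GATK SW implementation.
class AlignmentCellSketch {
    /**
     * Score of one cell from its three parent cells.
     * substitution is the match/mismatch score for the diagonal move;
     * gap is the (negative) penalty for a move from above or from the left.
     * With floorAtZero the cell can never go negative (local, SW-style);
     * without it the computed score is used as is (global, NW-style).
     */
    static int cellScore(int diag, int up, int left, int substitution, int gap, boolean floorAtZero) {
        int best = Math.max(diag + substitution, Math.max(up + gap, left + gap));
        return floorAtZero ? Math.max(0, best) : best;
    }
}
```

With all parents at 0, a mismatch of -3 and a gap of -2, the floored cell is 0 while the unfloored cell is -2.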
carneiro
0a772688fe
implementation of the Gatherer class for CountCovariates, which makes it now scatter/gatherable. Kudos to the @Gather annotation Khalid just introduced!
...
QuickCCTest is my test script for the gatherer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5547 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 21:15:21 +00:00
carneiro
dac1309dbd
Added two modes for selecting variants at random (random sampling).
...
-number N -- generates a VCF with exactly N randomly chosen variants with equal probability.
-fraction F -- generates a VCF with approximately F (between 0-1) randomly chosen variants with equal probability. (Similar behavior to RandomlySplitVariants walker).
The reason for two modes is that the first one may need a lot of memory if your sample size is too large. The wiki is being updated with this information now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5545 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 21:12:40 +00:00
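Note on the two random-selection modes above: the commit does not name the algorithms used. One common way to get exactly N items from a stream is reservoir sampling, which must hold N items in memory, while an approximate fraction F needs only an independent coin flip per item. The sketch below makes that assumption; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not the GATK walker code.
class RandomSelectionSketch {
    /** Exactly min(n, total) items, each equally likely (reservoir sampling, Algorithm R). */
    static <T> List<T> selectNumber(Iterable<T> items, int n, Random rng) {
        List<T> reservoir = new ArrayList<>(n);
        long seen = 0;
        for (T item : items) {
            seen++;
            if (reservoir.size() < n) {
                reservoir.add(item);                        // fill the reservoir first
            } else {
                long j = (long) (rng.nextDouble() * seen);  // uniform in [0, seen)
                if (j < n) reservoir.set((int) j, item);    // replace with probability n/seen
            }
        }
        return reservoir;
    }

    /** Approximately a fraction f of the items: one independent draw per item. */
    static <T> List<T> selectFraction(Iterable<T> items, double f, Random rng) {
        List<T> kept = new ArrayList<>(); // a streaming tool could write each kept item out instead
        for (T item : items) {
            if (rng.nextDouble() < f) kept.add(item);
        }
        return kept;
    }
}
```

The reservoir is what makes the first mode memory-hungry for large N; the second mode keeps no state between items.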
carneiro
8a3b7d88aa
It was returning 1 when it should return 0
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5544 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 20:50:38 +00:00
depristo
c7445a6fbd
Now that logging is so standard, only prints messages about logging to DEBUG. Also, found a way to silence the mime.types warning, that doesn't matter at all to us.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5543 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 16:49:39 +00:00
droazen
7b452ea2b9
Fix for bug GSA-430: Can't specify same BAM file twice on the command line. An ArgumentException with an appropriate error message and a list of the duplicate BAMs is now thrown in this case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5542 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 22:23:24 +00:00
hanna
deab9f0aa5
Initial work on proto-shard merger:
...
- create size() method that returns an approximation of the uncompressed size in bytes of BAM span.
I'll use this method as a protoshard weighting function until we determine how to normalize the
weights across the different data access mechanisms (reads, reference, RODs).
- Implementations of basic union/intersection/subtraction mechanisms for BAM spans; should be enough
to get an accurate weight for two proto-shards put together.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5541 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 22:03:43 +00:00
chartl
328f89f66a
Minor changes to MannWhitneyU:
...
- Comment fixes to better explain why two-sided test wants to use the LOWER (not higher) value for U
- Much more direct testing of MWU functions
- Uniform approximation was always using the < cumulant (sometimes the > cumulant should be used instead)
- Uniform approximation currently not used (regime in which it was being used was not the right one -- not necessarily bad, but not an improvement over normal)
+ this particular approximation is for major imbalances of the form m >> n. Code may be altered in the future to use this method for this particular regime, if the method's not too slow.
- Hook into one-sided test.
RegionalAssociationRecalibrator: NaNs were being caused by presence of Infinity and -Infinity values out of the walker. Currently I'm just re-setting them to arbitrary post-whitened values, but the walker will be changed to prevent output of these values, and the "fix" will be undone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5539 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 17:03:02 +00:00
chartl
fff11a3279
No more pesky NaNs for norms ( HINT::: ((double) x) == Double.NaN is NOT (somehow) the same as Double.compare(x,Double.NaN) == 0). Effectively reverse sorting by changing (rank/size) to ((size-rank)/size).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5538 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 22:43:24 +00:00
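Note on the NaN hint above: under IEEE 754, NaN is not equal to anything, including itself, so x == Double.NaN is always false in Java. Double.compare treats NaN as equal to NaN, and Double.isNaN is the direct test. A small sketch (the class and method names are hypothetical):

```java
// Hypothetical illustration only.
class NaNCheckSketch {
    /** Always false, even when x is NaN. */
    static boolean viaEqualsOperator(double x) {
        return x == Double.NaN;
    }

    /** True exactly when x is NaN, because Double.compare orders NaN as equal to NaN. */
    static boolean viaCompare(double x) {
        return Double.compare(x, Double.NaN) == 0;
    }

    /** The idiomatic check. */
    static boolean viaIsNaN(double x) {
        return Double.isNaN(x);
    }
}
```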
carneiro
5d26c66769
Count Covariates is almost scatter-gatherable now!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5537 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 22:25:33 +00:00
rpoplin
5ddc0e464a
Under guidance from Matt added ability to use key-value tags with ROD binding command line arguments, so now one can say -B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf and get the tags in a walker. Look at ContrastiveRecalibrator for an example of how to use the new ReferenceOrderedDataSource.getTags(). Removed references to FDR in tranches since we are only using truth sensitivity. Finally fixed long standing bug where tranche filters weren't set appropriately.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5536 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 21:04:09 +00:00
carneiro
0f4ace0902
fixed a bug when the concordance track doesn't have the sample in the variant track.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5535 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 18:24:19 +00:00
chartl
f6dfdc7f3b
Single-tailed hypothesis testing in MWU
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5533 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 15:53:40 +00:00
hanna
8ae14793f2
Small standalone utility to aggregate BGZF block statistics in a BAM file.
...
Works in the same coordinate space as BAM chunks, so this will be used to
calibrate chunk weighting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5531 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-28 22:25:45 +00:00
chartl
f3e4c24f63
Framework works properly now, but whitening still has a kink which is that the covariance matrix gets re-sorted automatically by the eigendecomposition, so somehow the association between eigenvalue and dimension (e.g. association track) needs to be maintained throughout.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5530 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-28 22:22:37 +00:00
chartl
4c04c5a47a
Addition of a BedTableCodec to allow for parsing of Bed-formatted tables (e.g. bedGraphs). Fixes for the recalibrator. Implementation of the data whitening input. Some TODOs in the RAW.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5529 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-28 21:35:09 +00:00
corin
f2d84bf746
Changes the validity declaration from true/false to a five-point scale
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5527 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-28 18:31:53 +00:00
depristo
cd8321cdc9
Removed the completely unused generic but extremely expensive infrastructure for dynamic LocusIteratorFilters. Now the one, and probably only useful one, is called directly in the LocusIteratorByState itself to filter adaptor bases from reads. This shaves 10% off the runtime of all walkers, apparently. Has the additional benefit of eliminating a lot of complex infrastructure that resulted ultimately in only a single function call.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5525 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 20:48:24 +00:00
depristo
231d095316
A clean, fast way to compute fragment pileups. Now consumes no CPU time at all. Ready for general use.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5524 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 14:26:29 +00:00
depristo
6a1d12cf7b
Intermediate commit refactoring FragmentPileup to (1) make it more accessible (now in utils.pileup) as well as (2) improve performance. Passes all integration tests now. Upcoming refactoring will change further how the system can be accessed, and further improve performance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5522 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 12:42:22 +00:00
depristo
3bcd4c5d75
--simplifyBAM is now in the SAMFileWriterArgumentTypeDescriptor, as suggested by map. PrintReads has an integrationtest now that writes out a 1 MB bit of HiSeq normally, with compress 0, and with simplifyBAM on.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5521 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 14:57:18 +00:00
hanna
28ae53d796
Merging the best parts of Mark's fix for the O(n^2) algorithm and my
...
concurrently-written fix for the same.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5520 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 13:32:23 +00:00
depristo
d8fbda17ab
O(N^2) bug found and removed -- very subtle and hard to find. ArrayLists underlying read backed pileups were being initialized with size() from the entire pileup across all samples, not the sample-specific sizes. So in 1000 samples at 4x, we were creating 1000 x 4000 element array lists, instead of 1000 x 4 element array lists. This fix results in a 2-3x speedup for 900 sample calling, and moves UG.map() back into the main CPU cost of UG with many samples.
...
900 samples in a single BAM:
Release: 64.29s
With sample-specific size: 24s - 35s
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5519 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 12:38:19 +00:00
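Note on the O(N^2) fix above: the cost comes from the initial capacity passed to each per-sample ArrayList. A sketch that reproduces the arithmetic from the commit message (1000 samples at 4 reads each) is shown below; the class and method names are hypothetical and this is not the pileup code itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the GATK ReadBackedPileup code.
class PileupCapacitySketch {
    /**
     * Total number of list slots reserved when one ArrayList is created per sample.
     * sizeFromWholePileup = true mimics the bug: every per-sample list is sized
     * for the whole pileup rather than for that sample's own reads.
     */
    static long reservedSlots(int[] readsPerSample, boolean sizeFromWholePileup) {
        int total = 0;
        for (int n : readsPerSample) total += n;

        long reserved = 0;
        for (int n : readsPerSample) {
            int capacity = sizeFromWholePileup ? total : n;
            List<Object> perSample = new ArrayList<>(capacity); // backing array allocated up front
            reserved += capacity;
        }
        return reserved;
    }
}
```

For 1000 samples with 4 reads each, the whole-pileup sizing reserves 4,000,000 slots and the per-sample sizing reserves 4,000.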
depristo
7272fcf539
Now uses the NO_HEADER option to avoid breaking MD5s due to changes in GATKArgumentCollection
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5518 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 12:00:37 +00:00
depristo
27c8fb1e4d
Added support for a general GATK option --simplifyBAM to automatically remove and simplify kept reads in an output BAM file. Specifically, duplicate, non-PF, and unmapped reads are removed, and all extended tags in the retained SAM records are removed except the RG:Z tag. This option is very useful when creating temporary BAM files (merged per-population or multi-sample cleaned) for future calling (as in the 1000G processing pipeline). Results in a significant reduction in space of the resulting BAM, faster reading of the BAM, and surprisingly even faster UG performance:
...
1-10mb of chromosome one, from NA12878 HiSeq 64x data set on hg18:
Full BAM
Write time: 8.6 m
Size: 866M
CountReads time: 2.9 m
UG time: 11.3 m
Simplified BAM:
Write time: 6.2 m
Size: 458M
CountReads time: 85.7 s
UG time: 10.1 m
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5517 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 01:21:35 +00:00
kshakir
fc8acd503e
Enabled the parameterize option for debugging PipelineTest MD5s.
...
Fixed escaping expressions that have more than one space between arguments.
Updated example to match the wiki.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5516 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 00:41:47 +00:00
chartl
fe7f45ee2e
First pass at recalibrating associations, with optional data whitening. Modification to the TableCodec so it can natively read bedgraph files (just needed to add an extra header marker: "track").
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5515 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-25 19:35:39 +00:00
hanna
ac39f5532e
Turn off index caching.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5514 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-25 18:48:23 +00:00
hanna
8d8aed6a67
Fix correctness issue when dynamically merging many files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5512 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-25 16:35:43 +00:00
delangel
c9283e6bc5
Refinement to previous commit: no need to duplicate code to annotate rsID since variantAnnotatorEngine is called from UG anyways.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5511 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-25 15:00:32 +00:00
delangel
3383733379
Same commit as previous one for VariantAnnotator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5510 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-25 12:07:18 +00:00
delangel
8701dfe8d3
Hideous, horrible, hairy mutant bug: when we annotate ID field in indels, we were looking for SNP records matching the position, instead of indel records.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5509 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-25 12:04:08 +00:00
kshakir
3e3ff4a9e7
Bam gathering passes on the compression_level and the create_index flag to MergeSamFiles.
...
VCF gathering passes on the no_header and sites_only flags to CombineVariants.
Fixed deletion of gathered log files. Although they are intermediate and do not need to be re-run if not present, they should not be deleted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5508 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-25 03:58:38 +00:00
carneiro
47279ee56e
Added --concordance option that outputs the intersection between two VCF files. Useful to see what calls were made in both technologies/algorithms.
...
Wiki has been updated accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5507 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-24 21:27:16 +00:00
kshakir
e47513f043
Minor updates to match the wiki documentation.
...
Upper cased the PartitionType enum values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5506 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-24 20:22:23 +00:00
kshakir
f3e94ef2be
Walkers can now specify a class extending from Gatherer to merge custom output formats. Add @Gather(MyGatherer.class) to the walker @Output.
...
JavaCommandLineFunctions can now specify the classpath+mainclass as an alternative to specifying a path to an executable jar.
JCLF by default pass on the current classpath and only require the mainclass be specified by the developer extending the JCLF, relieving the QScript author from having to explicitly specify the jar.
Like the Picard MergeSamFiles, GATK engine by default is now run from the current classpath. The GATK can still be overridden via .jarFile or .javaClasspath.
Walkers from the GATK package are now also embedded into the Queue package.
Updated AnalyzeCovariates to make it easier to guess the main class, AnalyzeCovariates instead of AnalyzeCovariatesCLP.
Removed the GATK jar argument from the example QScripts.
Removed one of the most frequently asked questions when getting started with Scala/Queue, the use of Option[_] in QScripts:
1) Fixed mistaken assumption with Java enums. In Java, enums can be null so they don't need nullable wrappers.
2) Added syntactic sugar for Nullable primitives to the QScript trait. Any variable defined as Option[Int] can just be assigned an Int value or None, ex: myFunc.memoryLimit = 3
Removed other unused code.
Re-fixed dry run function ordering.
Re-ordered the QCommandline companion object so that IntelliJ doesn't complain about missing main methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5504 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-24 14:03:51 +00:00
ebanks
18271aa1f4
It never fails to amaze me that aligners can find so many different ways to place indels off the ends of contigs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5503 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-24 04:17:23 +00:00
ebanks
48b15d42e0
More fixes and improvements. We no longer use any bases under Q20 because random ~Q5s were cluttering the graphs; instead we grab any contiguous segments of size at least MIN_SEQUENCE_LENGTH where all bases are above Q20. Also, I implemented a quick algorithm to traverse the graph (using DFS) to choose the two best scoring paths (haplotypes). Used it successfully at NA12878 HM3 SNP sites to determine whether they are homozygous (no distinction yet between ref and alt) or heterozygous! Indels are the next target. Still have some issues to work out.
...
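To illustrate the segment selection described above, here is a minimal sketch in Java. It is not the committed code: the class and method names and the cutoff constant are assumptions for illustration, and "above Q20" is read here as strictly greater than 20.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative, not taken from the GATK source.
class HighQualitySegments {
    static final int MIN_BASE_QUALITY = 20; // assumed cutoff: keep bases with quality > 20

    // Returns [start, end) index pairs of maximal runs in which every quality
    // exceeds MIN_BASE_QUALITY and whose length is at least minLength.
    static List<int[]> find(byte[] quals, int minLength) {
        List<int[]> segments = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a run"
        for (int i = 0; i <= quals.length; i++) {
            boolean good = i < quals.length && quals[i] > MIN_BASE_QUALITY;
            if (good) {
                if (start < 0) start = i;
            } else if (start >= 0) {
                // The run ended at i (exclusive); keep it only if long enough.
                if (i - start >= minLength) segments.add(new int[] {start, i});
                start = -1;
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        byte[] quals = {30, 30, 5, 30, 30, 30, 30, 20, 30};
        for (int[] s : find(quals, 3)) {
            System.out.println(s[0] + "-" + s[1]); // prints "3-7"
        }
    }
}
```

Only the run at indices 3 to 6 survives in this example: the first run is too short, and the Q20 base at index 7 breaks the last one.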
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5502 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-24 03:51:19 +00:00
hanna
26e3bea76e
Fix for == used to test object equality.
...
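For readers unfamiliar with this class of bug, a minimal Java sketch follows. The commit does not say which objects were being compared, so the strings and helper names below are purely illustrative.

```java
import java.util.Objects;

// Illustrative only; the commit does not say which objects were compared.
class EqualityDemo {
    // Reference comparison: true only when both arguments are the same object.
    static boolean sameReference(Object a, Object b) {
        return a == b;
    }

    // Value comparison via equals(), null-safe.
    static boolean sameValue(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String x = new String("chr1");
        String y = new String("chr1");
        System.out.println(sameReference(x, y)); // false: two distinct objects
        System.out.println(sameValue(x, y));     // true: same contents
    }
}
```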
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5499 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-23 18:15:19 +00:00
ebanks
401d1cb97f
Bug fixes plus some debugging code added. Broke out DeBruijnVertex into its own class so that the interface is now cleaner. Still very much a work in progress.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5498 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-23 17:35:34 +00:00
hanna
37fbf17da8
Finally restored code after accidentally removing three days' worth of work:
...
schedule file infrastructure has been restored, and is now a single file.
Only the exact bins required for the traversal are stored in the schedule.
Very close to being able to merge schedule entries.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5497 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-23 05:52:40 +00:00
ebanks
69646ff840
... and the corresponding integration test update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5496 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-23 01:58:07 +00:00
ebanks
ded80e0c57
Trivial change to remove space at the end of the description
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5495 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-23 01:47:46 +00:00
carneiro
3414bccb46
documentation changes to agree with the wiki
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5494 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 21:48:49 +00:00
carneiro
28149e5c5e
GenotypeAndValidate version 2, ready to be used.
...
- now it differentiates between confident REF calls and not confident calls.
- you can now use a BAM file as the truth set.
- output is much clearer now
dataProcessingPipeline version 2, ready to be used.
- All the processing is now done at the sample level
- Reads the input bam file headers to combine all lanes of the same sample.
- Cleaning is now scattered/gathered. Intelligently breaks down into as many intervals as possible, given the dataset.
- Outputs one processed bam file per sample (and a .list file with all processed files listed)
- Much faster, low pass (read Papuans) can run in the hour queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5493 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 20:18:02 +00:00
chartl
687b2e51b4
Switch from togglable wiggle output to togglable bedgraph format. Can be pulled directly into IGV to show the statistics values. I'll need to bug jim to allow value-toggling in a bedgraph, currently 2nd and 3rd columns are just ignored.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5492 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 17:58:53 +00:00
chartl
5a79f16ea4
Fixed an edge case where an exception was thrown if either of the sets was empty for the MWU test. Also altered the output format so U itself is not printed (which though interesting, isn't so useful for recalibration), but rather a value I call V (really the deviation of U from its expectation).
...
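One way to read "the deviation of U from its expectation" is U minus n1 * n2 / 2, the mean of U under the null hypothesis. The Java sketch below works under that assumption; the commit does not give the exact formula for V, and returning 0 for an empty set is likewise an assumption rather than the committed behaviour.

```java
// Hypothetical sketch; the exact definition of V in the commit is not stated.
class MannWhitneyDeviation {
    // U counts pairs (x, y) with x > y, scoring ties as 0.5.
    // Quadratic in the input sizes, which is fine for a small illustration.
    static double u(double[] xs, double[] ys) {
        double u = 0.0;
        for (double x : xs) {
            for (double y : ys) {
                if (x > y) {
                    u += 1.0;
                } else if (x == y) {
                    u += 0.5;
                }
            }
        }
        return u;
    }

    // Deviation of U from its null expectation n1 * n2 / 2.
    // Assumption: an empty set yields 0 instead of throwing.
    static double deviation(double[] xs, double[] ys) {
        if (xs.length == 0 || ys.length == 0) return 0.0;
        return u(xs, ys) - xs.length * ys.length / 2.0;
    }

    public static void main(String[] args) {
        double[] a = {3, 4, 5};
        double[] b = {1, 2};
        System.out.println(deviation(a, b)); // prints 3.0 (U = 6, expectation = 3)
    }
}
```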
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5490 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 16:28:44 +00:00
ebanks
af7f78e8ba
Minor debugging output change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5488 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 12:59:26 +00:00
ebanks
b463faad92
Fixing typo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5487 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 03:57:11 +00:00
ebanks
1a9e65bcd4
Updating other walkers now that VCC extends from VC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5486 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 03:10:40 +00:00
ebanks
0ee687e49d
For Mauricio: now, even in GENOTYPE_GIVEN_ALLELES mode, the VariantCallContext (which now inherits directly from VC) will report reference calls as confidently called if they pass the threshold even if the QUAL of the record itself is low because we were forced to have an ALT allele.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5485 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 02:42:28 +00:00
ebanks
ab6a815184
As per the comments in the commit itself: when reads get mapped to the junction of two chromosomes (e.g. MT since it is actually circular DNA), their unmapped bit is set, but they are given legitimate coordinates. The Picard code will come in and move the read all the way back to its mate - which can be arbitrarily far away and cause records to be written out of order. Very evil.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5484 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-21 20:30:24 +00:00
ebanks
d9202f2764
Don't try to create a GenomeLoc from an unmapped read
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5480 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-21 13:46:55 +00:00
ebanks
1c95208e26
Finally found the bug that everyone is reporting on GS. Iterators on PriorityQueues aren't guaranteed to return elements in sorted order (a pretty stupid contract) - so we were passing items to the constrained writer out of order. Just do a Collections.sort instead (1 line of code). Happy father's day!
...
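The iterator behaviour mentioned above is documented for java.util.PriorityQueue: only the head of the queue is ordered, and iteration order is unspecified. A minimal sketch of the "copy and sort" fix, with hypothetical names and plain integers standing in for reads:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only; the real code passes reads to a writer, not integers.
class PriorityQueueOrder {
    // Copies the queue contents and sorts them explicitly, because
    // PriorityQueue.iterator() makes no ordering guarantee.
    static List<Integer> sortedContents(PriorityQueue<Integer> queue) {
        List<Integer> items = new ArrayList<>(queue);
        Collections.sort(items);
        return items;
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> queue = new PriorityQueue<>(Arrays.asList(5, 1, 4, 2, 3));
        // Iteration order reflects the internal heap layout and may not be sorted.
        for (int item : queue) {
            System.out.print(item + " ");
        }
        System.out.println();
        System.out.println(sortedContents(queue)); // prints [1, 2, 3, 4, 5]
    }
}
```

Repeatedly calling poll() would also yield sorted order, at the cost of emptying the queue.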
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5476 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 21:28:19 +00:00
ebanks
9568c84af9
Don't output these messages in INFO mode because they are scaring people unnecessarily
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5475 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 19:55:22 +00:00
depristo
22ff2573d5
Removed MAG entirely
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5474 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 19:43:23 +00:00
kiran
55897631ad
Initial attempt at identifying potentially interesting variants in a Mendelian disease context when the called genotypes are uncertain.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5473 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 19:41:35 +00:00
kshakir
b2b8a4f19f
Re-un-final'ed BAQ.MAG as it was pre r5469.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5472 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 19:40:31 +00:00
asivache
1d5326ff0c
Minor fixes to the cmd-line help messages
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5470 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 18:18:04 +00:00
depristo
7857cb5a22
Waiting to go to the hospital -- fixed a bug in the BAQ calculation where the BAQ would NPE if a read had no usable bases (all clipped, for example) but didn't fail the PF filter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5469 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 17:45:21 +00:00
fromer
e84a27ceea
OverlapWithBedInIntervalWalker calculates the average per-input-interval coverage by the BED intervals track
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5468 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 17:44:46 +00:00
depristo
abc7d1aef9
BeagleOutputToVCF now accepts an option to keep monomorphic sites. This is useful to genotype a single sample, where having AC=0 just means that the sample is hom-ref at the site.
...
ProduceBeagleInputWalker can optionally emit a beagle markers file, necessary to use the beagled reference panel for imputation. Also supports the VQSR calibration curve idea that a site can be flagged as a certain FP, based on the VQSLOD field. This allows us to have both continuous quality in the refinement of sites as well as hard filtering at some threshold so we don't end up with lots of sites with all 1/3 1/3 1/3 likelihoods for all samples (i.e., a definite FP site where we don't know anything about the samples).
Added a new VariantsToBeagleUnphased walker that writes out a marker-driven hard-call unphased genotypes file suitable for imputing missing genotypes with a reference panel with beagle. Can optionally keep back a fraction of sites, marked as missing in the genotypes file, for assessment of imputation accuracy and power. The bootstrap sites can be written to a separate VCF for assessment as well.
Finally, my general Queue script for creating and evaluating reference panels from VCF files. Supports explicitly genotyping a BAM file at each panel SNP site, for assessment of imputation accuracy of a reference panel. Lots of options for exploring the impact of the VQS likelihoods, multiple VCFs for constructing the reference panel, as well as fraction of sites left out in assessing the panel's power.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5467 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 03:08:38 +00:00
depristo
9b8d41160b
GENOTYPE_GIVEN_ALLELES now respects the filter status of the incoming alleles file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5466 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 02:59:28 +00:00
depristo
6281c1db6f
A nicer error (UserException now) for malformed genome locs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5465 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 02:58:29 +00:00
delangel
b45afe5ba8
Several major fixes and changes to new indel likelihood model:
...
a) Scrapped the way in which we constructed candidate haplotypes because it wasn't fully correct and yielded corner conditions with incorrect genotyping and likelihood computation. Ideally, a haplotype should "cover" the read and the most likely alignments should be such that the ends of the read are inside the ends of the haplotype. This wasn't happening, and if you have a "dangling read off a haplotype" the probabilistic alignment model may prefer to shift a read instead of scoring it correctly - this is especially bad with tandem repeat insertions.
So now, we build haplotypes based on the reference context and adaptively change them based on read alignment positions, plus some padding and uncertainty in the alignment.
b) Changed the way soft-clipped bases are dealt with. Instead of either ignoring them or using them, we only use them if the read start or end position (after soft clipping) is within eventDistance of the current location. This is done because it's very common that BWA's strictly local SW implementation will soft clip every single read at an insertion position because it couldn't place that end of the read without too many mismatches, but the read is legit and the bases are good quality. If we don't take these bases into consideration, reads which are informative of an insertion event are essentially discarded because the informative part is clipped away.
c) Several cleanups and fixes to the context-dependent gap penalty model based on length of HRun.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5464 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-17 18:39:31 +00:00
depristo
cd38dfb4ef
Now with a clearer, grammatically correct message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5462 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-17 18:06:05 +00:00
depristo
10466dc7d1
I finally broke down and added a default documentation string to @Input for use in Queue scripts. It's not ideal, but I couldn't take any more queue scripts with doc="x" all over the place.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5461 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-17 18:05:25 +00:00
depristo
c1798a7dbc
Whitespace cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5460 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-17 18:04:08 +00:00
corin
30237e6824
Updated the walker to specify the build based on the user's input file name if the user does not specify the build.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5459 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-17 17:49:17 +00:00
carneiro
3de300e504
A walker that moves annotations from the filter field to the info field of truth annotated vcfs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5458 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-17 17:11:28 +00:00
ebanks
481750cbf9
Probable patch to Jerry Glenn's GetSatisfaction report. I'm having him test it out.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5456 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-17 16:00:50 +00:00
ebanks
3eea6e92b7
An extremely basic implementation of a deBruijn-based local assembler, using the jgrapht graph library. This is not at all optimized and has only been tested on my very simple 3-read test bams. I'm sure there are bugs in there - more testing coming soon. Insertions and deletions confirmed to generate identical graphs (except for the multiplicity of edges of course). Not worth using yet.
...
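As a rough picture of what such a graph looks like, here is a minimal Java sketch. It uses plain maps instead of the jgrapht library named in the commit, treats each k-mer as a vertex, and records edge multiplicity as a count. The class name, the string edge key, and the choice of k are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the committed assembler is built on jgrapht instead.
class DeBruijnSketch {
    // Maps "fromKmer->toKmer" to the number of times that adjacency was seen.
    // Consecutive k-mers in a read overlap by k - 1 bases.
    static Map<String, Integer> build(String[] reads, int k) {
        Map<String, Integer> edges = new HashMap<>();
        for (String read : reads) {
            // i + k < length guarantees that the k-mer starting at i + 1 exists.
            for (int i = 0; i + k < read.length(); i++) {
                String from = read.substring(i, i + k);
                String to = read.substring(i + 1, i + k + 1);
                edges.merge(from + "->" + to, 1, Integer::sum);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        String[] reads = {"ACGTA", "ACGTT"};
        Map<String, Integer> edges = build(reads, 3);
        // The shared prefix yields one edge with multiplicity 2; the reads then diverge.
        System.out.println(edges.get("ACG->CGT")); // prints 2
        System.out.println(edges.size());          // prints 3
    }
}
```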
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5455 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-17 14:03:07 +00:00
hanna
28a5a177ce
Very crude implementation of writing BAM 'schedules' to disk rather than 'meta-indexes'.
...
Not yet elegant, but proves that it circumvents the performance
issues associated with the meta-index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5454 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-16 21:48:47 +00:00
rpoplin
8d0880d33e
Misc cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5453 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-16 17:33:19 +00:00
rpoplin
c6ef6ee8b7
Recal file is an input to ApplyRecalibration, not an output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5452 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-16 12:08:58 +00:00
rpoplin
8e89ff170e
Can't check substitution type of tri-allelic SNPs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5451 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-16 03:06:03 +00:00
carneiro
e2e435d52c
GenotypeAndValidate: now looks at annotations in the INFO field instead of filter field. Better output and filters repetitive calls to indel extended events.
...
IndelUtils: added a isInsideExtendedIndel() method to filter the above mentioned.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5449 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-15 21:54:40 +00:00
rpoplin
d98503ca50
Removing some debug code from VQSRv2. VariantEval can now be stratified by contig with -ST Contig. New hidden option in CombineVariants for overlapping records to take the info fields from the record with the highest AC (while still updating AC/AN/AF correctly) instead of dropping info fields which aren't exactly the same.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5448 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-15 21:28:10 +00:00
carneiro
4b9b767eb1
SelectVariants: now keeps the YAML stuff internal... it's there if you wanna use it, but won't be published anymore. Official parameter is the string for now.
...
VariantEval: now sports the new MendelianViolation utility class.
MendelianViolationClassifier: I noticed I had broken chartl's walker by changing VariantEval, so I took the liberty to modify it to use the new library too, though I kept modifications to a minimum, could have gone into full integration if this is a useful tool, but since it's in oneoffs, I decided not to go all out.
MendelianViolation: Some getter methods were added for chartl and VariantEval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5447 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-15 18:36:55 +00:00
delangel
653fb09bb7
a) Next iteration of context-dependent gap penalty model for new probabilistic alignment indel model. Actual model is now implemented, computes homopolymer run profile for candidate haplotypes and looks up in table gap penalties based on hrun length at each position. Initial penalty model is a very naive affine penalty model with each extra hrun increment decreasing the gap open penalty by Q2, until a minimum is reached. Still needs to be tuned and ideally get data from recalibration.
...
b) small bug fix when setting debug arguments
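The homopolymer-dependent gap open penalty from item a) can be sketched as a simple clamped linear function. This is a hedged illustration only: the base penalty of Q45 and floor of Q10 are made-up values, the names are hypothetical, and treating an hrun of length 1 as "no reduction" is an assumption. Only the Q2 step per extra hrun base comes from the message.

```java
// Hypothetical sketch of a clamped, hrun-dependent gap open penalty.
class GapOpenPenalty {
    // Each homopolymer base beyond the first lowers the penalty by stepPerBase,
    // but the result never drops below minPenalty.
    static int penalty(int hrunLength, int basePenalty, int stepPerBase, int minPenalty) {
        int extraBases = Math.max(0, hrunLength - 1);
        return Math.max(minPenalty, basePenalty - stepPerBase * extraBases);
    }

    // Precomputes penalties for hrun lengths 1..maxHrun (index 0 is unused).
    static int[] table(int maxHrun, int basePenalty, int stepPerBase, int minPenalty) {
        int[] table = new int[maxHrun + 1];
        for (int h = 1; h <= maxHrun; h++) {
            table[h] = penalty(h, basePenalty, stepPerBase, minPenalty);
        }
        return table;
    }

    public static void main(String[] args) {
        // Assumed values: base Q45, step Q2, floor Q10.
        int[] t = table(20, 45, 2, 10);
        System.out.println(t[1]);  // prints 45
        System.out.println(t[4]);  // prints 39
        System.out.println(t[20]); // prints 10 (clamped at the floor)
    }
}
```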
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5446 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-15 16:46:28 +00:00
rpoplin
bbcc4ed700
The second pass of the contrastive VQSRv2.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5444 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 21:05:02 +00:00
rpoplin
2a2538136d
A version of VQSRv2 that does contrastive clustering in two passes. The walkers will be renamed when they are moved to core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5443 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 21:03:56 +00:00
carneiro
fcc347bb05
making sure the output is as pretty as I said it would be on the wiki.
...
wikipage for this walker is up, at : http://www.broadinstitute.org/gsa/wiki/index.php/Genotype_and_Validate#Examples
use it ;)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5442 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 20:32:09 +00:00
ebanks
239dae0985
Absolutely nothing to get excited about. This is just the skeleton for the local assembler. It doesn't do anything at all now except for collect reads over each -L interval and pass them to an assembly engine (which isn't implemented yet). The interface for the AssemblyEngine will change later, but for now this one is the most conducive to debugging.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5441 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 20:31:54 +00:00
corin
6d09cdd4bc
This is a walker that lets the user generate the bed file for declaring variants true positives or false positives. For use with the IGV crowd sourcing project.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5440 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 19:56:16 +00:00
depristo
f75ad0dee3
Now in Picard, and released to the public
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5439 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 19:36:56 +00:00
carneiro
9dfe4c9cb7
moving GenotypeAndValidate to the playground. It's ready to be used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5438 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 19:19:18 +00:00
carneiro
33c7593218
YAML integrated mendelian violation utility class, integrated and tested through select variants. Wiki is updated.
...
ps: I moved it out of tribble. If you think it should reside in a different place, just yell at me.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5436 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 16:43:37 +00:00
hanna
5406e779d2
Ryan noticed that I accidentally killed a public interface method for getting tag information.
...
Reinstated. Proper unit test to follow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5434 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 15:51:19 +00:00
depristo
3e3ec85807
Checked for consistency with the previous integration tests, and updated the walker and test to use the new I/O system (always prints 4 digits on floats).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5433 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-13 15:24:22 +00:00
depristo
b99e27bf9b
In the process of optimizing ProduceBeagleInputWalker, discovered that the GenotypeLikelihoods, the UG, and Genotype objects were using old-style GL tags internally, and then converting from Likelihoods -> GL String -> Likelihoods -> PL String throughout the GATK. It was both painful and led to convoluted code throughout the system. Removed everything but GL conversion -> PL in the GenotypeLikelihoods objects, and now all of the code in UG immediately provides GenotypeLikelihoods to the Genotype objects, which is converted straight to PL now. Resulted in a 30% speed up in ProduceBeagleLikelihoods, passes integration tests without any modifications, and likely speeds up writing any VCFs with likelihoods.
...
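For context, GL values in a VCF are log10-scaled genotype likelihoods and PL values are their phred-scaled, integer-rounded counterparts, conventionally normalized so the most likely genotype has PL 0. A minimal Java sketch of that direct conversion follows; the class and method names are hypothetical and this is not the GenotypeLikelihoods code itself.

```java
import java.util.Arrays;

// Hypothetical sketch of a direct GL -> PL conversion.
class GlToPl {
    // Converts log10 likelihoods to phred-scaled integers, shifted so that
    // the most likely genotype gets PL = 0.
    static int[] toPL(double[] log10Likelihoods) {
        double best = Double.NEGATIVE_INFINITY;
        for (double gl : log10Likelihoods) {
            best = Math.max(best, gl);
        }
        int[] pl = new int[log10Likelihoods.length];
        for (int i = 0; i < pl.length; i++) {
            pl[i] = (int) Math.round(-10.0 * (log10Likelihoods[i] - best));
        }
        return pl;
    }

    public static void main(String[] args) {
        double[] gl = {-0.1, -2.0, -5.0}; // hom-ref, het, hom-var (made-up values)
        System.out.println(Arrays.toString(toPL(gl))); // prints [0, 19, 49]
    }
}
```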
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5432 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-13 00:07:51 +00:00
rpoplin
ceb08f9ee6
Moving some math around in VQSRv2.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5431 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 15:15:05 +00:00
depristo
d01d4fdeb5
Optimized version of produce beagle tool, along with experimental (hidden) support for combining likelihoods depending on estimated false positive rate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5430 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 02:06:28 +00:00
depristo
ee8f2871f7
A better output for Genotype Concordance summary. Now does only % comp hom-ref called hom-ref, het called het, and hom-var called hom-var, which are the quantities we typically show in slides. Updated integration tests to reflect this change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5429 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 02:03:48 +00:00
kshakir
93de326066
Added a new @PartitionBy for walkers to specify how to cut up their inputs.
...
Now building all javadoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5428 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 01:33:08 +00:00
delangel
8ca3390ee0
Low level plumbing work required to have a context dependent error model with the new indel probabilistic alignment model. This just adds an extra input argument and does some refactoring so that when an actual model is ready it will be easy to plug in.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5427 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 00:00:55 +00:00
carneiro
e35a67b3cc
changed the name of the parameter to make the wiki more uniform.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5426 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 17:54:53 +00:00
carneiro
4a84a81d17
SelectVariants: added parameters for mendelian violation. Given a trio vcf, it will generate a VCF with the sites that are mendelian violations.
...
GenotypeAndValidate: now annotates the validations with callStatus.
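The Mendelian-violation selection in SelectVariants can be pictured with the following Java sketch for a biallelic site. It works on hard genotype calls encoded as alternate-allele counts (0, 1, or 2) and ignores genotype quality, which the real walker may well take into account; all names here are hypothetical.

```java
// Hypothetical sketch for biallelic sites with hard genotype calls.
class MendelianCheck {
    // True if a parent carrying altCount alternate alleles can transmit
    // the given allele (0 = reference, 1 = alternate).
    static boolean canTransmit(int altCount, int allele) {
        return allele == 0 ? altCount < 2 : altCount > 0;
    }

    // A trio is a violation when no combination of one allele from each
    // parent adds up to the child's alternate-allele count.
    static boolean isViolation(int mother, int father, int child) {
        for (int fromMother = 0; fromMother <= 1; fromMother++) {
            for (int fromFather = 0; fromFather <= 1; fromFather++) {
                if (canTransmit(mother, fromMother)
                        && canTransmit(father, fromFather)
                        && fromMother + fromFather == child) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isViolation(0, 0, 1)); // true: het child of two hom-ref parents
        System.out.println(isViolation(1, 1, 2)); // false: each het parent can pass the alt allele
    }
}
```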
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5425 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 17:47:53 +00:00
delangel
b03055099a
a) Changed the way we classify and log indel events (e.g. in IndelClasses table inside IndelStatistics VE module). Made names clearer, and split logging of event length with number of repetitions of event.
...
b) Add an experimental annotation to log indel type string inside the INFO field, just for debugging/temp analysis purposes (will consider making it standard if it proves useful).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5424 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 17:37:41 +00:00
rpoplin
b3464a6031
Initial commit of VQSRv2 that passes the old integration tests. Not ready to be used yet unless your name rhymes with ... oh wait, that's me.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5419 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 15:18:34 +00:00
depristo
ccc773d175
Refactoring, cleanup, and performance improvements to ProduceBeagleInput. It's really a shame that there's no integration tests...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5418 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 13:55:30 +00:00
kshakir
097a9a59e8
Updated LSF libraries to use Pointer instead of Structure.ByReference for struct arrays since the latter is autoRead() and LSF doesn't always return null for empty arrays.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5417 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 22:58:54 +00:00
ebanks
4baeb5979f
It turns out that Math.log10() can return 0, which leads to QUALs being set to -0, which is off-spec.
...
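The negative zero arises because a phred-scaled value is -10 times the log10, and -10.0 * 0.0 is -0.0 under IEEE 754. The Java sketch below shows the effect and one possible guard. The commit does not say how the fix was implemented, so the "add positive zero" approach and the names used are assumptions.

```java
// Hypothetical sketch; the committed fix may use a different guard.
class QualNormalization {
    // Phred-scaled quality from an error probability.
    static double phred(double errorProbability) {
        double q = -10.0 * Math.log10(errorProbability);
        // -0.0 + 0.0 evaluates to +0.0, so this removes the negative zero
        // and leaves every other value unchanged.
        return q + 0.0;
    }

    public static void main(String[] args) {
        System.out.println(-10.0 * Math.log10(1.0)); // prints -0.0
        System.out.println(phred(1.0));              // prints 0.0
        System.out.println(phred(0.1));              // prints 10.0
    }
}
```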
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5415 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 03:08:56 +00:00
ebanks
3596c56602
New attempt at the constrained movement version of the indel realigner (I've kept around the old writer for now). The new contract is that the realigner must ask permission before trying to clean an area; permission will be denied by the CM-Manager if it was required to flush its cache of reads because of too much depth within a distance of maxInsertSizeForMovingReadPairs. Added integration tests to cover different max cache sizes, including an expected exception when too small a value is chosen. The actual logic changes were fairly minor - much of this commit is really just some cleanup. I'd like to throw 1000G Phase I at it, but will respectfully wait for Ryan to hit his deadline before doing so.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5414 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 02:48:29 +00:00
rpoplin
ff7edc4493
Minor bug fix in empiricalMu prior calculation in VQSR.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5412 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 00:42:38 +00:00
fromer
0b45de14ed
Some minor updates to fully utilize the functionality of reduceByInterval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5411 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-09 20:38:08 +00:00
rpoplin
509daac9f7
Minor bug fix in k-means implementation. Updating VQSR integration tests in preparation for VQSRv2 by removing some unused features such as VariantDatum.weight and ti/tv cutting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5410 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-09 00:26:28 +00:00
carneiro
fa7284b7a1
Genotype And Validate walker is now ready to be used by anyone.
...
Given an annotated VCF and a BAM file, it genotypes (using the reads in the BAM) each variant in the VCF (for snp or indel) and validates (or not) the 'known' annotation. Outputs a truth table with the PPV and NPV values, and optionally a vcf file with the variants that had enough coverage to be validated. You can optionally provide a minimum depth of coverage and only do the analysis conditional on that. (Will write a wiki for this walker, as it might be useful for future validation assays.)
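For reference, PPV and NPV follow directly from the four cells of a truth table: PPV = TP / (TP + FP) and NPV = TN / (TN + FN). A minimal Java sketch with hypothetical names and made-up counts, not taken from the walker:

```java
// Hypothetical sketch; counts in main are made up for illustration.
class ValidationMetrics {
    // Positive predictive value: fraction of positive calls that are correct.
    static double ppv(int truePositives, int falsePositives) {
        int calledPositive = truePositives + falsePositives;
        return calledPositive == 0 ? Double.NaN : (double) truePositives / calledPositive;
    }

    // Negative predictive value: fraction of negative calls that are correct.
    static double npv(int trueNegatives, int falseNegatives) {
        int calledNegative = trueNegatives + falseNegatives;
        return calledNegative == 0 ? Double.NaN : (double) trueNegatives / calledNegative;
    }

    public static void main(String[] args) {
        System.out.println(ppv(90, 10)); // prints 0.9
        System.out.println(npv(45, 5));  // prints 0.9
    }
}
```

Returning NaN when there are no calls of a given kind is a design choice of this sketch; it avoids reporting a misleading 0 or 1.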
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5409 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 22:10:38 +00:00
chartl
da88c29b6e
Added a module to test for reference mismatch associations, and a self-normalized/self-normalizing version.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5408 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 20:01:28 +00:00
chartl
31a2575c7b
Fixes:
...
- Don't know how I got the wiggle header so utterly wrong. Fixed.
- Q-values now have a static maximum of 2000 so IGV averaging won't make everything look spikey and ugly.
- Changing windows to size 100 for (hopefully) better resolution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5406 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 17:16:21 +00:00