carneiro
f97e7d2fb4
Walker that calculates the percentage of bases that are covered to at least 20x. Very useful! In oneoffs until someone else thinks it's as useful as I think it is ;)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5714 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 19:19:39 +00:00
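As an illustration of the statistic named in f97e7d2fb4, here is a minimal sketch of "percentage of bases covered to at least 20x". It is not the walker from that commit: the class name, the method name, and the idea of passing per-base depths as a plain `int[]` are assumptions made for the example.

```java
/** Sketch only: a stand-alone version of the coverage statistic, not the GATK walker. */
final class CoverageAtDepth {
    /**
     * Returns the percentage (0-100) of positions whose depth is >= minDepth.
     * An empty input is reported as 0% rather than dividing by zero.
     */
    static double percentCoveredAtLeast(int[] depths, int minDepth) {
        if (depths.length == 0) {
            return 0.0;
        }
        int covered = 0;
        for (int depth : depths) {
            if (depth >= minDepth) {
                covered++;
            }
        }
        return 100.0 * covered / depths.length;
    }
}
```

For example, `CoverageAtDepth.percentCoveredAtLeast(new int[] {0, 19, 20, 45}, 20)` counts two of the four positions.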
carneiro
3868a7e778
Oneoff project to downsample, bootstrap and call snps to test sensitivity/specificity of downsampled coverage in WEX projects.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5713 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 19:17:30 +00:00
ebanks
deed7c47a1
Continuing the epic fail, some of our existing integration tests were wrong because of the lazy loading failure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5712 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 17:54:41 +00:00
ebanks
ab9ffb1a74
Epic failure on the lazy loading of genotypes: if the input VCF had its samples unsorted and we used a walker that didn't require genotypes, then we would sort the samples but not load genotypes (and therefore the genotypes wouldn't match the samples anymore). Added simple integration test to cover this case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5711 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 16:03:45 +00:00
hanna
96571b55be
Disable caching of ReadShards by the GenomeLocProcessingTracker (at least
temporarily). Unfortunately this does not completely fix the IndelRealigner
exception that Ryan is seeing, but it helps things quite a bit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5710 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 13:59:34 +00:00
carneiro
a5b96e0e04
I have to remember that this is Java, not C.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5709 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 17:40:14 +00:00
carneiro
f04cc4321f
fixed a bug when the pipeline was used on a single bam.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5708 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 17:19:22 +00:00
rpoplin
b7334dcc1e
Rank sum test annotations are the Z-scores from the test instead of the p-value.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5707 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 14:35:00 +00:00
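To make the Z-score versus p-value distinction in b7334dcc1e concrete, the sketch below computes a rank sum Z-score with the standard normal approximation to the Mann-Whitney U statistic, z = (U - n1*n2/2) / sqrt(n1*n2*(n1+n2+1)/12). The commit does not say which approximation the annotations use, so this formula, the absence of a tie correction, and all names here are assumptions.

```java
/** Sketch only: normal-approximation Z-score for a two-sample rank sum test. */
final class RankSumZ {
    /**
     * Returns the Z-score of U, where U counts pairs with x[i] > y[j]
     * (ties count one half). No tie correction is applied to the variance.
     * The result is NaN if either sample is empty.
     */
    static double zScore(double[] x, double[] y) {
        double u = 0.0;
        for (double a : x) {
            for (double b : y) {
                if (a > b) {
                    u += 1.0;
                } else if (a == b) {
                    u += 0.5;
                }
            }
        }
        double n1 = x.length;
        double n2 = y.length;
        double mean = n1 * n2 / 2.0;
        double sd = Math.sqrt(n1 * n2 * (n1 + n2 + 1.0) / 12.0);
        return (u - mean) / sd;
    }
}
```

The sign of the score carries the direction of the shift between the two samples, which a p-value alone does not.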
ebanks
45081c32d7
continuing from last night, the integration tests weren't covering the right behavior either
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5706 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 13:30:57 +00:00
ebanks
f34e6d5b8c
Somewhere along the way someone broke this tool and failed to update the documentation to boot. Fixing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5705 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 03:16:20 +00:00
ebanks
ae8f3f2cde
Check for bad reference bases before creating simple/'empty' VCs. Updated the code in the indel GL model to be consistent and to use the existing utility in the Allele class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5704 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 23:55:20 +00:00
depristo
122d5845d3
GATK Resource bundle, latest version (now with b37 -> b36 support). Oneoff scala script that assesses chip coverage of calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5703 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 22:01:36 +00:00
depristo
6cce3e00f3
A test walker that does consensus compression of deep read data sets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5702 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 22:00:48 +00:00
rpoplin
3907377f37
When genotyping given alleles, for multiallelic sites we go back to the reads and use the alternate base with the highest sum of quality scores instead of taking the first alternate allele from the vcf file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5701 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 21:31:09 +00:00
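The selection rule described in 3907377f37 (use the alternate base with the highest sum of quality scores) can be sketched as follows. This is a simplified stand-in, not the genotyper's code: the pileup is passed as parallel arrays, only A/C/G/T are considered, and returning 'N' when no alternate base is seen is an assumption of the example.

```java
/** Sketch only: pick the non-reference base with the largest summed base quality. */
final class AltBaseChooser {
    private static final String ALPHABET = "ACGT";

    /**
     * bases[i] is the base observed in read i and quals[i] its quality score.
     * Returns the non-reference base with the highest quality sum, or 'N' if
     * no non-reference A/C/G/T base was observed.
     */
    static char bestAlternateBase(char ref, char[] bases, int[] quals) {
        int[] qualSums = new int[ALPHABET.length()];
        for (int i = 0; i < bases.length; i++) {
            int idx = ALPHABET.indexOf(bases[i]);
            if (idx >= 0 && bases[i] != ref) {
                qualSums[idx] += quals[i];
            }
        }
        int best = -1;
        for (int idx = 0; idx < qualSums.length; idx++) {
            if (qualSums[idx] > 0 && (best < 0 || qualSums[idx] > qualSums[best])) {
                best = idx;
            }
        }
        return best < 0 ? 'N' : ALPHABET.charAt(best);
    }
}
```

Note that the winner is decided by summed quality, not by read count: three low-quality C reads can lose to two high-quality T reads.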
droazen
6e9e766a71
The tighter interval validation wasn't interacting well with unmapped
intervals -- altered the validation methods to not throw an error for
unmapped intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5700 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 20:56:46 +00:00
hanna
6d5e45b5c6
Revbump Picard dependencies at Tim/Kathleen's request. Exclude anonymous
classes from PluginManager.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5699 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 20:38:05 +00:00
droazen
d650efd40a
Fix for bug GSA-449: Intervals that are not in GATK format are not validated
to the same standard as GATK format intervals. Full validation against contig
bounds is now performed for all intervals, regardless of their source. Also
fixed a few tests for validation exclusions that were backwards.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5698 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 18:12:10 +00:00
kshakir
df35a143b2
Removed -debug/--debug_mode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5697 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 10:56:39 +00:00
kshakir
ca817356b6
Quick disabling test to restore build. TODO fix test or complete removal of the MFCP.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5696 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 04:26:11 +00:00
hanna
27495a0c64
Killed quiet mode. Should probably kill debugMode as well, but Queue's using
it. Will check with Khalid tomorrow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5695 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 04:17:36 +00:00
kshakir
6b1b4931e7
Added FCP VE stratifications for Filter, FunctionalClass, and Stratification as requested by Corin.
Feeding FCP UG the bam list instead of individual bams to cut scatter gather time from O(m^100) as measured by Chris to O(m^1).
Fixed NPE when eval values aren't found in PipelineTests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5694 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 02:29:56 +00:00
hanna
f3dacd3c40
Use ByteBuffer.allocateDirect() instead of ByteBuffer.allocate().
ByteBuffer.allocateDirect() behaves like Java NIO MappedByteBuffers in that
it consumes address space, which counts against our virtual memory allocation;
but cannot be destroyed or otherwise freed. This was definitely contributing
to the LSF failures that I was seeing, but I'm not yet convinced that it's the
sole source of these virtual memory 'leaks'. More tomorrow as the results of
my whole exome tests start to roll in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5693 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 02:01:11 +00:00
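For readers unfamiliar with the two allocation calls named in f3dacd3c40, the sketch below shows the difference using only standard `java.nio` calls. `ByteBuffer.allocate()` returns a heap buffer backed by a `byte[]`, while `ByteBuffer.allocateDirect()` returns a direct buffer whose memory lives outside the Java heap and is released only when the buffer object itself is garbage collected. The class and method names are hypothetical, and nothing here reproduces the GATK code that was changed.

```java
import java.nio.ByteBuffer;

/** Sketch only: the two standard ways to allocate a ByteBuffer. */
final class BufferKinds {
    /** Heap buffer: backed by an ordinary byte[] inside the Java heap. */
    static ByteBuffer heapBuffer(int capacity) {
        return ByteBuffer.allocate(capacity);
    }

    /** Direct buffer: native memory outside the heap, with no explicit free call. */
    static ByteBuffer directBuffer(int capacity) {
        return ByteBuffer.allocateDirect(capacity);
    }
}
```

`isDirect()` and `hasArray()` on the returned buffers report which kind was allocated.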
chartl
7afeb1ab17
Removing broken imports (boo)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5692 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 18:55:25 +00:00
rpoplin
379f837e82
RankSum z-scores are looking quite good, so RIP Wilcoxon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5691 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 18:34:39 +00:00
chartl
bc3fd70b0a
Removing the old association walker, switching test to just validate that MannWhitneyU is doing the right thing. Unit tests still pass.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5690 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 18:05:19 +00:00
hanna
b915520653
Updating to apache commons math v2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5689 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 17:31:49 +00:00
kshakir
58c7b27ccc
Missing file from last checkin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5688 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 00:12:41 +00:00
kshakir
f619dd3ca7
Refactored IntervalUtils used to parse and scatter intervals for Queue.
Scattering non-contig interval lists by number of loci in the intervals instead of just number of intervals.
Queue caches the list of locs and how to split them up instead of reloading them from disk repeatedly.
TODO: general purpose function to divide data evenly.
Skip over comments when parsing picard analysis files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5687 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 00:06:00 +00:00
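The change in f619dd3ca7 from scattering by number of intervals to scattering by number of loci can be sketched as below. This is one possible greedy scheme under stated assumptions, not the IntervalUtils implementation: intervals are inclusive `{start, stop}` pairs, whole intervals are never split, and fewer than `parts` groups may be returned.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: group whole intervals so each group holds a similar number of loci. */
final class LociScatter {
    /** Number of loci in an inclusive {start, stop} interval. */
    static long size(long[] interval) {
        return interval[1] - interval[0] + 1;
    }

    /**
     * Walks the intervals in order and closes a group once the cumulative loci
     * count reaches that group's share of the total. The last group takes
     * whatever remains.
     */
    static List<List<long[]>> splitByLoci(List<long[]> intervals, int parts) {
        long total = 0;
        for (long[] interval : intervals) {
            total += size(interval);
        }
        List<List<long[]>> groups = new ArrayList<>();
        List<long[]> current = new ArrayList<>();
        long seen = 0;
        for (long[] interval : intervals) {
            current.add(interval);
            seen += size(interval);
            boolean reachedShare = seen * parts >= total * (groups.size() + 1);
            if (reachedShare && groups.size() < parts - 1) {
                groups.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

With intervals of 100, 10, 10 and 80 loci and two parts, splitting by count would give groups of 110 and 90 loci, while this scheme gives 100 and 100.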
kshakir
6ca4e3cebf
Updating FCPT nCalledLoci due to fixed QD<2.0 filter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5686 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-25 21:37:04 +00:00
kshakir
ed6da6f72d
Added JavaMail dependencies to Queue package since bcel wasn't picking them up.
Added the ability to add a file path to a package.
Checking for missing files when packaging.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5685 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-23 20:48:40 +00:00
kshakir
1158c99726
Only running chr20 test on the hour queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5684 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 22:09:42 +00:00
hanna
57a4700299
Ported small BAM performance test suite to the Google Caliper microbenchmarking suite. Looks promising,
but I'm still not sure that GC is a good long-term solution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5683 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 22:09:17 +00:00
kshakir
00b57c751b
Added missing ".0".
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5682 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 21:50:07 +00:00
chartl
a56a2dfdb7
Nothing to see here. Move along.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5681 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 15:01:02 +00:00
ebanks
8bc93046f4
Adding chain files for Mark. Tested by lifting over back and forth between builds. Note that they comprise only the standard contigs so no _randoms or GL000xxx.1s.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5680 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 14:53:15 +00:00
delangel
600617a63c
Enabled code to deal with hard-clipping adaptor sequence when processing reads in pileup in indel caller. Proven now that changes are minimal (4 fewer calls in NA12878 chr20, quals slightly different), minor changes in vcf fields in integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5679 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 14:10:33 +00:00
ebanks
e050d94df4
Renaming because they actually map to b37, not hg19
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5678 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 13:34:48 +00:00
ebanks
831ad0cd1a
Quit immediately with an error message if any of the individual steps fails.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5677 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 13:23:33 +00:00
chartl
88735a8c9b
Adding in a delta to try and better measure effect size -- equivalent to looking at the lower end of the N^th percentile confidence interval. Kind of a hacky way to add it in; the infrastructure is about due for a streamlining rewrite.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5676 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 03:53:33 +00:00
hanna
7428ae338a
A fix for Marian Thieme's NPE in the new sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5675 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 19:47:14 +00:00
chartl
5b9a8555cd
Queue graph time is currently O(n^m), where n = num jobs and m = num unique base files. This script was therefore running on the order of 1200^16, which I don't think would finish before the heat death of the universe. For now, push down the number of files to 1 and gather them outside of Queue; once I've fixed up scatter-gather in core, outputs can be uncommented.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5674 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 12:56:25 +00:00
corin
9f006be425
Updates Omni path and removes a typo
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5673 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 04:17:13 +00:00
ebanks
0007481890
Might as well store these here too even if they aren't used for the resource bundle
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5672 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 04:14:08 +00:00
ebanks
cbcdfc584d
Moving out of core and into playground
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5671 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 02:30:22 +00:00
depristo
cc78027bd3
Two optimizations. First, an even more aggressive printProgress meter optimization that only considers doing work once every 1000 cycles through the engine. Second, GenomeLocParser now uses a single indirection around the contigInfo variable. This class uses a last-used cache to retrieve contig information efficiently instead of always returning to the underlying SAMSequenceDictionary hashmap to make genome locs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5670 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 01:31:26 +00:00
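The last-used cache idea from cc78027bd3 can be sketched as a wrapper that remembers the most recently requested contig and only falls back to the map on a miss. This is an illustration, not the GenomeLocParser code: a plain `Map<String, Integer>` of contig lengths stands in for the sequence dictionary, the `mapLookups` counter exists only to make the effect visible, and the class is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: single-entry "last used" cache in front of a contig map. */
final class LastUsedContigCache {
    private final Map<String, Integer> contigLengths; // stand-in for the sequence dictionary
    private String lastName;    // contig returned by the previous successful lookup
    private Integer lastLength; // its cached length
    int mapLookups = 0;         // counts how often the underlying map was consulted

    LastUsedContigCache(Map<String, Integer> contigLengths) {
        this.contigLengths = new HashMap<>(contigLengths);
    }

    /** Returns the contig length, or null if the contig is unknown. */
    Integer lengthOf(String contig) {
        if (contig.equals(lastName)) {
            return lastLength; // hit: no hashing, no map access
        }
        mapLookups++;
        Integer length = contigLengths.get(contig);
        if (length != null) {
            lastName = contig;
            lastLength = length;
        }
        return length;
    }
}
```

The scheme pays off when consecutive requests hit the same contig, as they do when traversing coordinate-sorted data.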
depristo
29857f5ba6
Fix for instability in output of fasta alternative reference maker when snpmask and snp files are provided and have overlapping records. The order of the records changed due to optimization of the refmetadatatracker, and uncovered this non-determinism. Now preferentially includes sites from snps before considering masking out sites in snpmask.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5669 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 21:54:09 +00:00
kshakir
8619f49d20
Added a utility method to retrieve the contig lengths for WG chunking.
Added a rudimentary GATKReportParser for parsing VE3 results.
Re-enabled the FCPTest using VE3, the GATKRP, and the PicardAggregationUtils.
The tag type for .rod files is DBSNP, not ROD.
More explicit return types on implicit methods.
Added null checks for implicit string to/from file conversions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5668 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 19:22:21 +00:00
delangel
59dd79faab
One more optimization: don't use Math.round(), but do my own rounding/casting. UG now about 40% faster calling indels, 30-35% faster calling SNPs+indels simultaneously.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5667 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 19:15:58 +00:00
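A common way to replace `Math.round()` with "rounding/casting", as 59dd79faab puts it, is to add 0.5 and truncate with an `int` cast. The commit does not show its exact code, so the version below is an assumption. It agrees with `Math.round()` for ordinary non-negative values but not for negative ones, where the cast truncates toward zero.

```java
/** Sketch only: cast-based rounding for non-negative values. */
final class FastRound {
    /**
     * Rounds a non-negative value to the nearest int by adding 0.5 and
     * truncating. Do not use for negative inputs: (int) truncates toward zero,
     * so -2.7 would become -2 instead of -3.
     */
    static int roundNonNegative(double x) {
        return (int) (x + 0.5);
    }
}
```

The benefit is that the cast avoids a method call in a hot loop; whether that still matters depends on the JVM in use.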
delangel
246d8190b5
Round one of "easy" zero-effort optimizations to UG's indel caller. Mostly inline functions, avoid repeated computation and try to optimize SoftMaxPair() which is by far the biggest runtime hog. More to come...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5666 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 18:57:34 +00:00
depristo
d8b8f857f3
V2 -- now working -- of a core walker that creates the standard GATK resource bundle
See https://www.broadinstitute.org/gsa/wiki/index.php/GATK_resource_bundle
The bundle lives locally in /humgen/gsa-hpprojects/GATK/bundle/current
Use the following command to create the bundle:
java -Djava.io.tmpdir=/broad/shptmp/depristo/tmp -jar dist/Queue.jar -S scala/qscript/core/GATKResourcesBundle.scala --gatkjarfile dist/GenomeAnalysisTK.jar -bsub -jobQueue gsa -svn 5660 $*
Annoyingly, it must be run in the trunk directory, and requires an explicit svn version number to create the directory. It also must be run in two stages manually. First, the local bundle is created, and then with the -phase2 argument all of the files in the local bundle are compressed and pushed to the FTP server. I'm likely going to shift most of my processes over to using this location for data file access, especially for b37 data sets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5665 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 12:48:47 +00:00