chartl
88735a8c9b
Adding in a delta to try and better measure effect size -- equivalent to looking at the lower end of the N^th percentile confidence interval. Kind of a hacky way to add it in, the infrastructure is about due for a streamlining rewrite.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5676 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 03:53:33 +00:00
hanna
7428ae338a
A fix for Marian Thieme's NPE in the new sharding system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5675 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 19:47:14 +00:00
chartl
5b9a8555cd
Queue graph time is currently O(n^m), where n = num jobs and m = num unique base files. This script was therefore running in order 1200^16, which I don't think would finish before the heat death of the universe. For now, push down the number of files to 1 and gather them outside of Queue. Once I've fixed up scatter-gather in core, the outputs can be uncommented.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5674 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 12:56:25 +00:00
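For a sense of scale, the figure quoted in the message above can be worked out directly. This is a minimal sketch in Java; the class and method names are made up for illustration, and the model simply takes the message's O(n^m) at face value with n = 1200 and m = 16.

```java
import java.math.BigInteger;

/** Hypothetical helper: evaluates n^m exactly to show how fast O(n^m) grows. */
final class GraphCostEstimate {
    /** Returns jobs^files as an exact integer. */
    static BigInteger steps(int jobs, int files) {
        return BigInteger.valueOf(jobs).pow(files);
    }

    /** Number of decimal digits in jobs^files. */
    static int digits(int jobs, int files) {
        return steps(jobs, files).toString().length();
    }
}
```

`GraphCostEstimate.digits(1200, 16)` is 50 (roughly 1.8 x 10^49 steps), while with a single base file the count drops to 1200.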
ebanks
cbcdfc584d
Moving out of core and into playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5671 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 02:30:22 +00:00
depristo
cc78027bd3
Two optimizations. First, an even more aggressive printProgress meter optimization: only consider doing work once every 1000 cycles through the engine. Second, GenomeLocParser now uses a single indirection around the contigInfo variable. This class uses a last-used cache to retrieve contig information efficiently instead of always returning to the underlying SAMSequenceDictionary hashmap to make genome locs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5670 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 01:31:26 +00:00
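The last-used cache mentioned in the second optimization might look roughly like the sketch below. This is not the GenomeLocParser code: `CachingContigLookup`, its fields, and the plain `Map<String, Integer>` standing in for the sequence dictionary are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a one-entry "last used" cache in front of a map lookup. */
final class CachingContigLookup {
    // Stands in for the underlying sequence dictionary (assumption).
    private final Map<String, Integer> contigLengths;
    private String lastContig = null;
    private int lastLength = -1;
    // Counts trips to the underlying map; present only to make the effect visible.
    int mapLookups = 0;

    CachingContigLookup(Map<String, Integer> contigLengths) {
        this.contigLengths = new HashMap<>(contigLengths);
    }

    /** Returns the contig length, consulting the map only when the contig changes. */
    int getLength(String contig) {
        if (contig.equals(lastContig)) {
            return lastLength; // cache hit: no hash lookup
        }
        mapLookups++;
        Integer length = contigLengths.get(contig);
        if (length == null) {
            throw new IllegalArgumentException("Unknown contig: " + contig);
        }
        lastContig = contig;
        lastLength = length;
        return length;
    }
}
```

The design pays off when consecutive queries hit the same contig, which is the usual pattern when walking a coordinate-sorted genome.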
depristo
29857f5ba6
Fix for instability in output of fasta alternative reference maker when snpmask and snp files are provided and have overlapping records. The order of the records changed due to optimization of the refmetadatatracker, and uncovered this non-determinism. Now preferentially includes sites from snps before considering masking out sites in snpmask.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5669 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 21:54:09 +00:00
kshakir
8619f49d20
Added a utility method to retrieve the contig lengths for WG chunking.
...
Added a rudimentary GATKReportParser for parsing VE3 results.
Re-enabled the FCPTest using VE3, the GATKRP, and the PicardAggregationUtils.
The tag type for .rod files is DBSNP, not ROD.
More explicit return types on implicit methods.
Added null checks for implicit string to/from file conversions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5668 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 19:22:21 +00:00
delangel
59dd79faab
One more optimization: don't use Math.round(), but do my own rounding/casting. UG now about 40% faster calling indels, 30-35% faster calling snp's+indels simultaneously.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5667 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 19:15:58 +00:00
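One common way to replace `Math.round()` with a cast is sketched below. The commit does not show the actual code, so `FastRound` and its method are hypothetical, and the sketch assumes inputs that are non-negative and fit in an `int`.

```java
/** Hypothetical sketch of rounding by add-and-cast instead of Math.round(). */
final class FastRound {
    /**
     * Rounds a non-negative value to the nearest int, halves rounding up.
     * Assumes 0 <= x < Integer.MAX_VALUE; negative inputs are not handled.
     */
    static int roundNonNegative(double x) {
        // The cast truncates toward zero, so adding 0.5 first rounds to nearest.
        return (int) (x + 0.5);
    }
}
```

The cast skips the range handling that `Math.round()` performs, which is where a saving would come from in a hot loop; results can differ from `Math.round()` for negative inputs and for values a hair below one half.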
delangel
246d8190b5
Round one of "easy" zero-effort optimizations to UG's indel caller. Mostly inline functions, avoid repeated computation and try to optimize SoftMaxPair() which is by far the biggest runtime hog. More to come...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5666 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 18:57:34 +00:00
depristo
a8f8077d7a
Simple optimizations for cases where there is no data or RODs at sites, such as with the FastaStats walker. private static immutable Lists and Maps in underlying data structures that have no associated data. Also, avoiding a double map.get() in the low-level genome loc parser. RefMetaDataTracker is now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5664 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 10:52:16 +00:00
hanna
54660a8c25
Fix requested by Lee Lichtenstein: first check to see whether it's time for a progress message, then aggregate metrics. Makes the overhead of printProgress in RealignerTargetCreator go from >20% to ~3%.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5663 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 03:22:48 +00:00
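The reordering described above (cheap time check first, expensive aggregation only when a message is due) can be sketched as follows. The names here are hypothetical and the current time is passed in explicitly to keep the example deterministic; the real printProgress code is not shown in this log.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: test the clock before doing any expensive metric aggregation. */
final class ProgressMeter {
    private final long intervalMillis;
    private long lastPrintMillis;

    ProgressMeter(long intervalMillis, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.lastPrintMillis = startMillis;
    }

    /**
     * Prints a progress line only if the interval has elapsed.
     * The supplier (the costly aggregation) is not invoked otherwise.
     */
    boolean maybePrint(long nowMillis, Supplier<String> aggregateMetrics) {
        if (nowMillis - lastPrintMillis < intervalMillis) {
            return false; // not time yet: skip the aggregation entirely
        }
        lastPrintMillis = nowMillis;
        System.out.println(aggregateMetrics.get());
        return true;
    }
}
```

Passing the aggregation as a `Supplier` makes the ordering explicit: the work inside it cannot run unless the time check has already passed.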
hanna
49550e257f
Fix for JamesP's issue. This issue appeared because of a design flaw in the interface between SAMDataSource and IntervalSharder that needs to stay around until the original BAM sharder is retired. Will add a JIRA to fix design flaw.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5661 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-19 00:52:13 +00:00
depristo
541c9109b3
V1 of GATK Resource Bundling system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5659 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-18 19:23:45 +00:00
ebanks
673772a522
Catch samtools exceptions and make them 'BAM Exceptions' asking the user to run Picard's validator and re-index the file before posting anything to the forum. Let's see whether this helps or not.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5658 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-18 03:52:43 +00:00
ebanks
e97a5ca161
Rename 'verbose' argument to 'debug_file'.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5657 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-18 03:17:13 +00:00
chartl
e28fc21642
Spurious associations can develop from including ambiguous reads in these tests. Perhaps MQ0 reads shouldn't be used for anything except MQ0, but the best way to do that is to restructure the code, so for now I'll put it off.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5656 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-17 23:17:03 +00:00
ebanks
49ea07acce
My fixes to Tribble yesterday revealed that some of the test VCFs for integration tests were actually malformed. Also, Guillermo updated the b37 dbSNP VCF and that broke some tests. Should be good for now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5655 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-17 03:39:11 +00:00
chartl
e5ef8388fc
BatchMerge - AlleleVCF --> AllelesVCF, this (combined with Eric's fix) will solve James P.'s forum issue.
...
After viewing results on real case/control data from RAW -- it's really working quite well. ReadIndels, however, needs to use a T-test rather than a U-test, especially in deep coverage (at indel sites, the reads with indels will have mostly the same number of CIGAR indel elements -- one -- which doesn't really play nicely with the UTest when sample sets are large). Modified ReadsLargeInsertSize to be a two-way test (e.g. ReadsLarge and ReadsSmall). BaseQualityScore also suffers from the same issue as read indels, so switching over to a T-test in that case as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5653 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 22:03:16 +00:00
ebanks
1c32deb108
For some reason I wasn't allowing expressions to be used with the -all argument.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5652 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 20:59:10 +00:00
corin
2cf6a06503
Throwing an error if INFO fields arguments contain whitespace.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5651 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 20:52:55 +00:00
corin
fce6d25075
Moved the reference ID to a meta data field for validity declaration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5650 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 20:28:56 +00:00
corin
59215dab48
Now writes results to a minimal vcf with annotations included in the INFO field. Must be run with -NO_HEADER to totally remove header for the most bare bones vcf; otherwise also includes command line meta data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5649 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 20:14:02 +00:00
ebanks
fe26954ac6
Minimal support for reading in VCF4.1 files. Added TODOs that need to be fixed or cleaned up to truly support this version. VCF constants updated. Lower-case bases permitted. Please let's make sure to refactor once we're ready to support it for good.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5648 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 18:59:37 +00:00
ebanks
7e9051ea25
The solution to James's bug was just to clean up the code and simplify it. What happened was that functionality that got put into UGCalcLikelihoods was then generalized into the UG engine but then never removed from UGCalcLikelihoods. This knowingly breaks the batch merger, but Chris said he'll take care of it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5647 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 18:05:10 +00:00
hanna
0d7cca169e
Sigh.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5645 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 14:37:24 +00:00
hanna
0965020804
Screwed up the doc string.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5644 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 14:30:20 +00:00
hanna
be3bad1f61
Low-memory sharding is now enabled by default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5643 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 14:22:07 +00:00
ebanks
2830dc70b7
UG can still return null in certain nasty cases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5642 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 20:11:17 +00:00
fromer
8e0f5bc5a5
Prevent NullPointerException in cases where SNP is filtered
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5641 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 19:59:59 +00:00
depristo
ee94af3539
Oops, left out of earlier commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5640 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 18:21:16 +00:00
depristo
8ed9c0f518
VariantsToTable now blows up by default if you ask for a field that isn't present in a record.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5636 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 14:42:43 +00:00
fromer
b3cd14d10a
Since GCcontentIntervalWalker no longer uses any ROD, turn it into a LocusWalker that traverses by REFERENCE
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5635 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 03:15:09 +00:00
aaron
2089c3bdef
removing; should have gone to the CGA repo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5633 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 22:17:45 +00:00
aaron
da6f2d3c9d
adding the capseg tools to the new walker repo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5632 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 22:11:08 +00:00
kshakir
4bb573b1f5
Centralizing a bunch of Broad specific utility functions from code scattered in GSA-Firehose, PipelineTest, custom QScripts, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5631 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 21:29:02 +00:00
ebanks
91d308fc6d
temporary patch until Picard (hopefully) fixes the NM calculation to deal with reads that align off the end of the contig
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5630 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 19:18:18 +00:00
ebanks
fa6468d167
Remove the adaptor sequence clipping read filter because it is dangerous (it breaks LocusIteratorByState). We'll bring it back to life when ReadTransformers are created. Instead, have the utility code return a new clipped SAMRecord (necessary so that we don't break SNP calling in UG when the indel caller tries to hard-clip the reads).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5629 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 18:47:47 +00:00
hanna
5849e112e1
Fix exception in block weighting minus function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5628 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 17:07:04 +00:00
hanna
a36adf0c6b
Request from the cancer team -- guarantee via javadoc that the returned read metrics are actually a clone, which they can do with as they wish.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5626 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 15:10:46 +00:00
delangel
06b1497902
Corrected bad merge.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5625 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 15:02:09 +00:00
delangel
9134bf3129
Long-forgotten change I neglected to commit a while back: add ability for SelectVariants to extract either SNPs or Indels from combined vcf file. Not the ideal place to do it but it's important to at least have something to split vcfs now that we call snp's and indels combined.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5624 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 14:58:44 +00:00
chartl
8e0d191a70
Added a walker to help sort out which samples in a region are giving signal. Lots of reused code that shouldn't be. Will refactor later.
...
Also fixed an "issue" with InsertSizeDistribution -- apparently for mate pairs, the first mate (karyotypically) will have a POSITIVE insert size, and the second a NEGATIVE insert size -- thus the insert size distribution was being conflated with enrichment/depletion of first-in-pair or second-in-pair reads. Gah.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5623 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 13:53:31 +00:00
chartl
efe6c539ac
Re-enabling disabled test. Apparently T-tests are very picky about your using an unbiased variance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5622 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 03:05:50 +00:00
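The "unbiased variance" the message refers to is the sample variance with an n - 1 divisor. A minimal sketch for contrast with the n-divisor version follows; the class and method names are assumptions and not taken from the repository.

```java
/** Hypothetical sketch contrasting the unbiased (n - 1) and biased (n) variance estimators. */
final class SampleVariance {
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    /** Sum of squared deviations from the mean. */
    private static double sumSquaredDeviations(double[] values) {
        double m = mean(values);
        double ss = 0.0;
        for (double v : values) ss += (v - m) * (v - m);
        return ss;
    }

    /** Unbiased estimator: divide by n - 1. This is what a t-test expects. */
    static double unbiasedVariance(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("Need at least two values");
        }
        return sumSquaredDeviations(values) / (values.length - 1);
    }

    /** Biased estimator: divide by n. Underestimates the variance for small samples. */
    static double biasedVariance(double[] values) {
        if (values.length < 1) {
            throw new IllegalArgumentException("Need at least one value");
        }
        return sumSquaredDeviations(values) / values.length;
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the squared deviations sum to 32, so the biased estimate is 4 and the unbiased one is 32/7; the gap widens as the sample gets smaller.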
chartl
42bc003f46
Oops. I'll need to look at this, I think it was accidentally enabled. Disabling for now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5621 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 00:54:52 +00:00
hanna
22a11e41e1
Rewrite of GATKBAMIndex to avoid mmaps causing false reports of heavy memory usage.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5620 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:49:58 +00:00
chartl
36d8f55286
Use the 'standard' arcsine transform
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5619 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:11:45 +00:00
chartl
8125b8b901
Old changes to the exome VQSR search.
...
SGA updated to include new proportion-based insert size test.
Major fix for dichotomization test: MathUtils now optionally ignores NaN values for sums, averages, variances. In the future this feature can be pushed back into the AssociationContext object itself (e.g. no data? no entry), but it's kept like this for transparency for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5618 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:00:50 +00:00
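Optionally skipping NaN values in a sum or average could be done along the lines below. This is only an illustration: `NanAwareStats` and the `ignoreNaN` flag are hypothetical and do not reflect the actual MathUtils signatures.

```java
/** Hypothetical sketch of sum and average with an opt-in "ignore NaN" flag. */
final class NanAwareStats {
    /** Sums the values; NaN entries are skipped only when ignoreNaN is true. */
    static double sum(double[] values, boolean ignoreNaN) {
        double total = 0.0;
        for (double v : values) {
            if (ignoreNaN && Double.isNaN(v)) continue;
            total += v; // a NaN that is not skipped makes the total NaN
        }
        return total;
    }

    /** Averages over the values actually counted; NaN if nothing was counted. */
    static double average(double[] values, boolean ignoreNaN) {
        int counted = 0;
        for (double v : values) {
            if (ignoreNaN && Double.isNaN(v)) continue;
            counted++;
        }
        if (counted == 0) return Double.NaN;
        return sum(values, ignoreNaN) / counted;
    }
}
```

Keeping the behaviour behind a flag means callers that want a NaN to propagate (as a signal of missing data) still get it by default.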
rpoplin
30a19a00fe
Fix for when running with EMIT_ALL_SITES but not GENOTYPE_GIVEN_ALLELES. Still want to emit a site even when over the deletion fraction for example.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5617 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 20:07:06 +00:00
delangel
488622041d
Further trivial cleanup: Renamed DindelGenotypeLikelihoodsCalculationModel to IndelGenotypeLikelihoodsCalculationModel
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5616 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 18:00:48 +00:00
delangel
3b424fd74d
Enable new indel likelihood model by default, clean up code, remove dead arguments; still more cleanups to follow. This isn't the final version, but at least it performs better in all cases than the previous Dindel-based version, so no reason to keep the old one around.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5615 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 17:54:46 +00:00