ebanks
af09170167
As I threatened yesterday, I've moved the various and disparate randomization code out of the walkers. Now they all (except VQSRv1, whose days are numbered anyways) use a static generator available in the engine itself. Please use this from now on. The seed is reset before every individual integration test is run. I think there may still be an issue with the IndelRealigner but I need to confirm with the commit to see what testNG does. Integration tests are already broken anyways, so no big deal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5589 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 17:03:48 +00:00
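A minimal sketch of the idea behind af09170167, assuming a single static generator with a fixed seed that the test harness resets before each integration test. The class name, method names, and seed value below are hypothetical and are not taken from the engine.

```java
import java.util.Random;

// Hypothetical stand-in for the engine class; all names here are illustrative.
final class EngineRandom {
    // Arbitrary fixed seed chosen for this sketch.
    private static final long SEED = 47382911L;
    private static Random generator = new Random(SEED);

    private EngineRandom() {}

    /** Shared generator that every walker draws from instead of creating its own. */
    static Random getRandomGenerator() {
        return generator;
    }

    /** Restores the initial state; a test harness would call this before each test. */
    static void resetRandomGenerator() {
        generator = new Random(SEED);
    }
}
```

With one shared generator, the sequence a walker sees depends on how many draws happened earlier in the same JVM, which is why the reset before every test matters for reproducible output.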
kshakir
45ebbf725c
Instead of always merging Picard interval files they are optionally merged by Sting Utils.
...
Disabled the MFCP while the FCP gets an update.
Minor updates to email messages for upcoming scala 2.9.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5588 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 21:12:05 +00:00
carneiro
89bb21d024
typo in the argument description
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5587 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 19:45:32 +00:00
rpoplin
3f3f35dea0
UnifiedGenotyper now BAQs via ADD_TAG to facilitate using BAQed quals for GL calculations but unBAQed quals for annotation calculations. UnifiedGenotyper now produces SNP and indel calls simultaneously. 40 base mismatch intrinsic filter removed from UG to greatly simplify the code. RankSumTests are now standard annotations but the integration tests are commented out pending changes that will allow random annotations to work.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5585 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 19:06:24 +00:00
ebanks
1aa4083352
Fortunately this code isn't used by anyone right now, but it needs to be fixed before someone unwittingly does: flags were wrong according to the SAM spec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5584 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 17:16:41 +00:00
hanna
b231a40da5
Augment PrintLocusContextWalker with extended event info.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5583 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 13:42:48 +00:00
aaron
ab5c4064ed
quick bug fix for variant context utils: only calculate the max AC if we're using the mergeInfoWithMaxAC flag, and if so deal with sites that have multiple alternate alleles correctly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5582 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 05:36:52 +00:00
rpoplin
cc713f2769
fixing exception text
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5581 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 00:29:13 +00:00
ebanks
4b451314b2
Only store a read in the mate hash if it could possibly be moved. This reduces memory consumption especially when dealing with a case of tons of unmapped reads at the end of the bam; however, it's only mildly helpful for chr1 of the Papuans (there's a truly massive pileup 120Mb into it; more thought needed at a later point). Integration tests changed only because some of the reads in the original bam were busted to begin with (it's an old pilot 1000G bam).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5580 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 22:20:09 +00:00
chartl
79b5fa6cc5
Structural refactoring in advance of dichotomization statistics; generalization of statistical test infrastructure.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5579 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 18:52:32 +00:00
asivache
77ca4eef31
IntelliJ complains that @Override is not allowed when implementing interface methods. Whatever.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5578 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 16:57:59 +00:00
ebanks
f4c06bb4ce
Traversal now says 'done with mapped reads' instead of 'done' so we don't confuse users when there are a lot of unmapped reads left to process.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5577 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 15:11:28 +00:00
fromer
5eccc7e528
Added annotation of INCORRECT SNP-based aa annotations in case of MNPdependentAA:true
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5576 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 02:46:45 +00:00
chartl
bb6a30611c
Forgot to modify the test too. What a bad commit. Sorry guys.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5575 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 02:11:08 +00:00
chartl
a0d096c993
Forgot an import statement
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5574 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 22:55:00 +00:00
chartl
b52c3e7e30
Make the window and slide-by values command-line accessible, and standardize for every context. Move the test classes (which are abstract association context modules) into the proper directory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5573 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 22:37:12 +00:00
droazen
db9908ec02
Small correction to the unit test code from my last commit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5572 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:55:38 +00:00
droazen
a5acb0b7a6
Fix for bug GSA-314: Detect -XL and -L incompatibility. An ArgumentException is
...
now thrown if the combination of -L and -XL intervals specified on the command
line results in an empty interval set after set subtraction.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5571 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:41:55 +00:00
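The GSA-314 check (a5acb0b7a6) can be sketched roughly as follows. This is an illustration only: intervals are inclusive [start, stop] pairs on a single contig, the class and method names are hypothetical, and IllegalArgumentException stands in for the ArgumentException named in the commit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: each interval is an int[] {start, stop}, inclusive, on one contig.
final class IntervalSubtraction {
    private IntervalSubtraction() {}

    /** Removes every excluded interval from every included interval. */
    static List<int[]> subtract(List<int[]> include, List<int[]> exclude) {
        List<int[]> current = new ArrayList<>(include);
        for (int[] ex : exclude) {
            List<int[]> next = new ArrayList<>();
            for (int[] in : current) {
                if (ex[1] < in[0] || ex[0] > in[1]) {
                    next.add(in); // no overlap: keep unchanged
                    continue;
                }
                // Keep whatever sticks out on either side of the excluded region.
                if (in[0] < ex[0]) next.add(new int[] {in[0], ex[0] - 1});
                if (in[1] > ex[1]) next.add(new int[] {ex[1] + 1, in[1]});
            }
            current = next;
        }
        return current;
    }

    /** Fails fast when the included set minus the excluded set leaves nothing to process. */
    static List<int[]> resolve(List<int[]> include, List<int[]> exclude) {
        List<int[]> result = subtract(include, exclude);
        if (result.isEmpty()) {
            throw new IllegalArgumentException(
                "The -L and -XL intervals leave an empty interval set after subtraction");
        }
        return result;
    }
}
```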
carneiro
b722ebf244
quick help/comments updates to match the wikipage.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5569 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 12:55:55 +00:00
rpoplin
96f0f0d706
Fixing use of String != String
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5568 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 01:12:00 +00:00
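For readers wondering what 96f0f0d706 guards against: in Java, != on two String values compares object references, not contents. A small sketch (the class and method names are made up for illustration):

```java
// Illustrative helper; not code from the repository.
final class StringCompare {
    private StringCompare() {}

    /** Content comparison: true only when the characters differ. */
    static boolean differs(String a, String b) {
        return !a.equals(b);
    }

    /** Reference comparison: true whenever a and b are distinct objects, even with equal text. */
    static boolean differsByReference(String a, String b) {
        return a != b;
    }
}
```

Two strings built at runtime (parsed from a file, concatenated, and so on) are usually distinct objects, so the reference form reports a difference even when the text is identical.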
depristo
095125152b
Updated to no longer include 2nd-best base output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5567 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 20:13:10 +00:00
rpoplin
b2a0331e2d
Pushing hard coded arguments into VariantRecalibratorArgumentCollection
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5566 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 19:55:09 +00:00
rpoplin
79c43845ad
Changing Uniform approximation to Normal approximation in rank sum test. n factorial was overflowing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5565 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 18:18:39 +00:00
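As background for 79c43845ad, here is a sketch of the standard normal approximation to the rank sum (Mann-Whitney U) statistic, next to a factorial that shows how quickly exact counting overflows. It is not the repository's implementation: the names are hypothetical, ties are not corrected for, and the use of a long for the factorial is an assumption made for the demonstration.

```java
// Illustrative only; not the code touched by the commit.
final class RankSumApprox {
    private RankSumApprox() {}

    /** z-score of U for sample sizes n and m under the normal approximation (no tie correction). */
    static double zScore(double u, int n, int m) {
        double mean = n * (double) m / 2.0;
        double variance = n * (double) m * (n + m + 1) / 12.0;
        return (u - mean) / Math.sqrt(variance);
    }

    /** Exact factorial in a long; throws ArithmeticException as soon as the product overflows. */
    static long factorial(int n) {
        long result = 1L;
        for (int i = 2; i <= n; i++) {
            result = Math.multiplyExact(result, (long) i);
        }
        return result;
    }
}
```

A 64-bit long holds 20! but not 21!, so any formula that needs n! for realistic read depths has to switch to logarithms or to an approximation such as the one above.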
depristo
b316c9a590
Renamed StratifyAlignmentContext to AlignmentContextUtils, and StatiefyContextType to ReadOrientation. Also, went through the system and deleted all references to second bases. That ship passed long ago. This was the actual commit; the last was an IntelliJ error.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5564 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 15:36:17 +00:00
depristo
5cca100aea
Eliminated the redundant StratifiedAlignmentContext, which previously just held a ReadBackedPileup, and made all of the class methods here just static functions. Far more logical organization, and avoided O(N) endless copying of data for the COMPLETE context. Many tools have been trivially reorganized to take an alignment context now. Everything passes integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5562 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 14:20:43 +00:00
rpoplin
98798eb276
Adding ReadPos rank sum test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5560 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 22:28:41 +00:00
rpoplin
09e89c8c97
Adding ReadPos rank sum test. Transitioned rank sum tests over to using Chris's implementation in order to harmonize the codebase. There isn't any reason to have competing implementations of rank sum. Thanks to Chris for adding the necessary hypothesis testing options. WilcoxonRankSum.java will be deleted soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5559 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 22:26:35 +00:00
depristo
11822da578
Standalone, GATK-dependent tool that reads a list of BAM files and slices all of them into a single merged BAM file containing the reads that overlap a chr:start-stop interval. Highly efficient when working with thousands of BAM files. Can merge 1MB of sequence of 1600 4x BAMs in 4g in only 2 hours.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5558 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 13:41:29 +00:00
fromer
27bfec785e
Some walkers for printing FASTA of reference for bed ROD, and "inverting" a bed file (finding regions not covered in bed)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5554 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 21:13:51 +00:00
droazen
0927b7c297
Fix for bug GSA-441: BAM file list with blank lines gives a confusing error
...
message. Lines containing only whitespace in .list files are now ignored.
Also added support for comments in .list files: lines whose first
non-whitespace character is '#' are now also ignored.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5550 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 15:04:35 +00:00
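The .list handling described in 0927b7c297 amounts to a small filter. A sketch under the rules stated in the commit message (whitespace-only lines and lines whose first non-whitespace character is '#' are skipped); the class and method names are hypothetical, and trimming the surviving entries is an assumption of this sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; not the repository's parser.
final class ListFileParser {
    private ListFileParser() {}

    /** Returns the usable entries of a .list file, skipping blank lines and '#' comments. */
    static List<String> parseEntries(List<String> lines) {
        List<String> entries = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // whitespace-only line or comment
            }
            entries.add(trimmed);
        }
        return entries;
    }
}
```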
kshakir
4f8411f4b5
Revved Picard to access new flag to disable mmap for bam indices. Only added a 3% speed boost but the mmap was added to the heap count, making it harder to specify/restrict the total resident memory size in LSF. Specifying -Xmx4g will now stay much closer to 4g resident memory usage versus bumping up to 9g when accessing 900 x ~8Mb bai's.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5549 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 01:40:41 +00:00
asivache
df53351b0f
Get rid of score cutoff at 0 in the alignment matrix (i.e. score[cell] = max(0, score[from_parent_cells])). Use the computed score as is. Technically, it's pretty much NW now, not SW.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5548 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 00:11:04 +00:00
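To make the change in df53351b0f concrete, here is the textbook cell update with and without the zero floor. This is a generic sketch with a linear gap penalty and hypothetical names; the actual aligner's scoring scheme and boundary handling are not shown in the commit message and are not assumed here.

```java
// Illustrative cell update; not the repository's aligner.
final class AlignmentCell {
    private AlignmentCell() {}

    /**
     * Score of one matrix cell given its three parent cells.
     * substitution is the match/mismatch score for the two bases; gap is the (negative) gap penalty.
     * With clampAtZero the score never drops below 0 (local, Smith-Waterman style);
     * without it the computed value is used as is (global, Needleman-Wunsch style).
     */
    static int score(int diag, int up, int left, int substitution, int gap, boolean clampAtZero) {
        int best = Math.max(diag + substitution, Math.max(up + gap, left + gap));
        return clampAtZero ? Math.max(0, best) : best;
    }
}
```

The floor is what lets a local alignment restart anywhere; dropping it means a poorly matching prefix keeps counting against the rest of the alignment.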
carneiro
0a772688fe
implementation of the Gatherer class for CountCovariates, which makes it now scatter/gatherable. Kudos to the @Gather annotation Khalid just introduced!
...
QuickCCTest is my test script for the gatherer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5547 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 21:15:21 +00:00
carneiro
dac1309dbd
Added two modes for selecting variants at random (random sampling).
...
-number N -- generates a VCF with exactly N randomly chosen variants with equal probability.
-fraction F -- generates a VCF with approximately F (between 0-1) randomly chosen variants with equal probability. (Similar behavior to RandomlySplitVariants walker).
The reason for two modes is that the first one may need a lot of memory if your sample size is too large. The wiki is being updated with this information now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5545 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 21:12:40 +00:00
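The two sampling modes in dac1309dbd can be illustrated as follows. This is a sketch, not the walker's code: the names are hypothetical, and reservoir sampling is one common way to draw exactly N items with equal probability in a single pass, which is assumed here because it matches the memory note in the commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only; not the walker's implementation.
final class RandomSelect {
    private RandomSelect() {}

    /** -fraction style: keeps each item independently with the given probability; nothing is buffered. */
    static <T> List<T> byFraction(Iterable<T> items, double fraction, Random rng) {
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            if (rng.nextDouble() < fraction) {
                kept.add(item);
            }
        }
        return kept;
    }

    /** -number style: reservoir sampling, returns min(n, total) items and holds up to n in memory. */
    static <T> List<T> byNumber(Iterable<T> items, int n, Random rng) {
        List<T> reservoir = new ArrayList<>();
        int seen = 0;
        for (T item : items) {
            seen++;
            if (reservoir.size() < n) {
                reservoir.add(item); // fill phase
            } else {
                int slot = rng.nextInt(seen); // uniform in [0, seen)
                if (slot < n) {
                    reservoir.set(slot, item); // replace with probability n / seen
                }
            }
        }
        return reservoir;
    }
}
```

The fraction mode decides per record and can stream its output, while the number mode must keep N candidate records until the input is exhausted.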
carneiro
8a3b7d88aa
It was returning 1 when it should return 0
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5544 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 20:50:38 +00:00
depristo
c7445a6fbd
Now that logging is so standard, only prints messages about logging to DEBUG. Also, found a way to silence the mime.types warning, which doesn't matter at all to us.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5543 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 16:49:39 +00:00
droazen
7b452ea2b9
Fix for bug GSA-430: Can't specify same BAM file twice on the command line. An ArgumentException with an appropriate error message and a list of the duplicate BAMs is now thrown in this case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5542 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 22:23:24 +00:00
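The GSA-430 check (7b452ea2b9) could look roughly like this. The sketch compares paths as plain strings and uses IllegalArgumentException in place of the ArgumentException named in the commit; both choices, and all names, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only; not the engine's validation code.
final class DuplicateBamCheck {
    private DuplicateBamCheck() {}

    /** Returns each path that appears more than once, in order of first repetition. */
    static List<String> findDuplicates(List<String> bamPaths) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String path : bamPaths) {
            if (!seen.add(path)) {
                duplicates.add(path); // add() returned false: already seen
            }
        }
        return new ArrayList<>(duplicates);
    }

    /** Throws with the list of offending files if any BAM was specified twice. */
    static void validate(List<String> bamPaths) {
        List<String> duplicates = findDuplicates(bamPaths);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException("BAM files specified more than once: " + duplicates);
        }
    }
}
```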
hanna
deab9f0aa5
Initial work on proto-shard merger:
...
- create size() method that returns an approximation of the uncompressed size in bytes of BAM span.
I'll use this method as a protoshard weighting function until we determine how to normalize the
weights across the different data access mechanisms (reads, reference, RODs).
- Implementations of basic union/intersection/subtraction mechanisms for BAM spans; should be enough
to get an accurate weight for two proto-shards put together.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5541 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 22:03:43 +00:00
chartl
328f89f66a
Minor changes to MannWhitneyU:
...
- Comment fixes to better explain why two-sided test wants to use the LOWER (not higher) value for U
- Much more direct testing of MWU functions
- Uniform approximation was always using the < cumulant (sometimes the > cumulant should be used instead)
- Uniform approximation currently not used (regime in which it was being used was not the right one -- not necessarily bad, but not an improvement over normal)
+ this particular approximation is for major imbalances of the form m >> n. Code may be altered in the future to use this method for this particular regime, if the method's not too slow.
- Hook into one-sided test.
RegionalAssociationRecalibrator: NaNs were being caused by presence of Infinity and -Infinity values out of the walker. Currently I'm just re-setting them to arbitrary post-whitened values, but the walker will be changed to prevent output of these values, and the "fix" will be undone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5539 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 17:03:02 +00:00
chartl
fff11a3279
No more pesky NaNs for norms ( HINT::: ((double) x) == Double.NaN is NOT (somehow) the same as Double.compare(x,Double.NaN) == 0). Effectively reverse sorting by changing (rank/size) to ((size-rank)/size).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5538 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 22:43:24 +00:00
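The hint in fff11a3279 follows from IEEE 754 semantics: NaN is not equal to anything under ==, including itself, whereas Double.compare defines a total order in which NaN equals NaN. A short sketch (the class name is made up):

```java
// Illustrative only.
final class NaNCheck {
    private NaNCheck() {}

    /** Always false, even when x is NaN: == follows IEEE 754 comparison rules. */
    static boolean viaEqualsOperator(double x) {
        return x == Double.NaN;
    }

    /** True exactly when x is NaN: Double.compare treats NaN as equal to itself. */
    static boolean viaCompare(double x) {
        return Double.compare(x, Double.NaN) == 0;
    }

    /** The idiomatic test. */
    static boolean viaIsNaN(double x) {
        return Double.isNaN(x);
    }
}
```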
carneiro
5d26c66769
Count Covariates is almost scatter-gatherable now!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5537 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 22:25:33 +00:00
rpoplin
5ddc0e464a
Under guidance from Matt, added the ability to use key-value tags with ROD binding command line arguments, so now one can say -B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf and get the tags in a walker. Look at ContrastiveRecalibrator for an example of how to use the new ReferenceOrderedDataSource.getTags(). Removed references to FDR in tranches since we are only using truth sensitivity. Finally fixed a long-standing bug where tranche filters weren't set appropriately.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5536 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 21:04:09 +00:00
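To show the shape of the tag syntax from 5ddc0e464a, here is a standalone parser for the text that follows -B: in the example. It is a sketch only and does not reproduce the engine's argument parsing or the ReferenceOrderedDataSource.getTags() API; the class and field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: parses "name,type,key=value,..." as written after -B: in the commit's example.
final class RodBindingTags {
    final String name;
    final String type;
    final Map<String, String> tags = new LinkedHashMap<>();

    RodBindingTags(String spec) {
        String[] fields = spec.split(",");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected name,type[,key=value...]: " + spec);
        }
        name = fields[0];
        type = fields[1];
        for (int i = 2; i < fields.length; i++) {
            int eq = fields[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed tag: " + fields[i]);
            }
            // Values stay as strings; callers convert them (boolean, double, ...) as needed.
            tags.put(fields[i].substring(0, eq), fields[i].substring(eq + 1));
        }
    }
}
```

A caller would then read, for instance, Double.parseDouble(tags.get("prior")) or Boolean.parseBoolean(tags.get("truth")).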
carneiro
0f4ace0902
fixed a bug when the concordance track doesn't have the sample in the variant track.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5535 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 18:24:19 +00:00
chartl
f6dfdc7f3b
Single-tailed hypothesis testing in MWU
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5533 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 15:53:40 +00:00
hanna
8ae14793f2
Small standalone utility to aggregate BGZF block statistics in a BAM file.
...
Works in the same coordinate space as BAM chunks, so this will be used to
calibrate chunk weighting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5531 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-28 22:25:45 +00:00
chartl
f3e4c24f63
Framework works properly now, but whitening still has a kink which is that the covariance matrix gets re-sorted automatically by the eigendecomposition, so somehow the association between eigenvalue and dimension (e.g. association track) needs to be maintained throughout.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5530 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-28 22:22:37 +00:00
chartl
4c04c5a47a
Addition of a BedTableCodec to allow for parsing of Bed-formatted tables (e.g. bedGraphs). Fixes for the recalibrator. Implementation of the data whitening input. Some TODOs in the RAW.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5529 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-28 21:35:09 +00:00
corin
f2d84bf746
Changes the validity declaration from a true/false value to a five-point scale
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5527 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-28 18:31:53 +00:00
depristo
cd8321cdc9
Removed the completely unused generic but extremely expensive infrastructure for dynamic LocusIteratorFilters. Now the one, and probably only useful one, is called directly in the LocusIteratorByState itself to filter adaptor bases from reads. This shaves 10% off the runtime of all walkers, apparently. Has the additional benefit of eliminating a lot of complex infrastructure that resulted ultimately in only a single function call.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5525 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 20:48:24 +00:00
depristo
231d095316
A clean, fast way to compute fragment pileups. Now consumes no CPU time at all. Ready for general use.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5524 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 14:26:29 +00:00