ebanks
cd61ef7169
Re-enabling multi-threaded integration tests. To make this work, downsampling and annotations are disabled for this test so that we don't have randomization issues for it based on which shards get executed first.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5597 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 03:07:39 +00:00
hanna
32d502c122
Enable BAM OTF index writing by default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5594 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 23:44:25 +00:00
ebanks
af09170167
As I threatened yesterday, I've moved the various and disparate randomization code out of the walkers. Now they all (except VQSRv1, whose days are numbered anyways) use a static generator available in the engine itself. Please use this from now on. The seed is reset before every individual integration test is run. I think there may still be an issue with the IndelRealigner but I need to confirm with the commit to see what testNG does. Integration tests are already broken anyways, so no big deal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5589 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 17:03:48 +00:00
kshakir
45ebbf725c
Instead of always merging Picard interval files they are optionally merged by Sting Utils.
...
Disabled the MFCP while the FCP gets an update.
Minor updates to email messages for upcoming scala 2.9.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5588 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 21:12:05 +00:00
rpoplin
3f3f35dea0
UnifiedGenotyper now BAQs via ADD_TAG to facilitate using BAQed quals for GL calculations but unBAQed quals for annotation calculations. UnifiedGenotyper now produces SNP and indel calls simultaneously. 40 base mismatch intrinsic filter removed from UG to greatly simplify the code. RankSumTests are now standard annotations but the integration tests are commented out pending changes that will allow random annotations to work.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5585 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 19:06:24 +00:00
ebanks
4b451314b2
Only store a read in the mate hash if it could possibly be moved. This reduces memory consumption especially when dealing with a case of tons of unmapped reads at the end of the bam; however, it's only mildly helpful for chr1 of the Papuans (there's a truly massive pileup 120Mb into it; more thought needed at a later point). Integration tests changed only because some of the reads in the original bam were busted to begin with (it's an old pilot 1000G bam).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5580 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 22:20:09 +00:00
chartl
79b5fa6cc5
Structural refactoring in advance of dichotomization statistics; generalization of statistical test infrastructure.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5579 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 18:52:32 +00:00
chartl
bb6a30611c
Forgot to modify the test too. What a bad commit. Sorry guys.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5575 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 02:11:08 +00:00
droazen
db9908ec02
Small correction to the unit test code from my last commit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5572 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:55:38 +00:00
droazen
a5acb0b7a6
Fix for bug GSA-314: Detect -XL and -L incompatibility. An ArgumentException is
...
now thrown if the combination of -L and -XL intervals specified on the command
line results in an empty interval set after set subtraction.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5571 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:41:55 +00:00
depristo
095125152b
Updated to now longer include 2nd-best base output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5567 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 20:13:10 +00:00
droazen
0927b7c297
Fix for bug GSA-441: BAM file list with blank lines gives a confusing error
...
message. Lines containing only whitespace in .list files are now ignored.
Also added support for comments in .list files: lines whose first
non-whitespace character is '#' are now also ignored.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5550 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 15:04:35 +00:00
droazen
7b452ea2b9
Fix for bug GSA-430: Can't specify same BAM file twice on the command line. An ArgumentException with an appropriate error message and a list of the duplicate BAMs is now thrown in this case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5542 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 22:23:24 +00:00
chartl
328f89f66a
Minor changes to MannWhitneyU:
...
- Comment fixes to better explain why two-sided test wants to use the LOWER (not higher) value for U
- Much more direct testing of MWU functions
- Uniform approximation was always using the < cumulant (sometimes the > cumulant should be used instead)
- Uniform approximation currently not used (regime in which it was being used was not the right one -- not necessarily bad, but not an improvement over normal)
+ this particular approximation is for major imbalances of the form m >> n. Code may be altered in the future to use this method for this particular regime, if the method's not too slow.
- Hook into one-sided test.
RegionalAssociationRecalibrator: NaNs were being caused by presence of Infinity and -Infinity values out of the walker. Currently I'm just re-setting them to arbitrary post-whitened values, but the walker will be changed to prevent output of these values, and the "fix" will undone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5539 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 17:03:02 +00:00
rpoplin
5ddc0e464a
Under guidance from Matt added ability to use key-value tags with ROD binding command line arguments, so now one can say -B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf and get the tags in a walker. Look at ContrastiveRecalibrator for an example of how to use the new ReferenceOrderedDataSource.getTags(). Removed references to FDR in tranches since we are only using truth sensitivity. Finally fixed long standing bug where tranche filters weren't set appropriately.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5536 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 21:04:09 +00:00
chartl
f6dfdc7f3b
Single-tailed hypothesis testing in MWU
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5533 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 15:53:40 +00:00
depristo
cd8321cdc9
Removed the completely unused generic but extremely expensive infrastructure for dynamic LocusIteratorFilters. Now the one, and probably only useful one, is called directly in the LocusIteratorByState itself to filter adaptor bases from reads. This shaves 10% off the runtime of all walkers, apparently. Has the additional benefit of eliminating a lot of complex infrastructure that resulted ultimately in only a single function call.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5525 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 20:48:24 +00:00
depristo
3bcd4c5d75
--simplifyBAM is now in the SAMFileWriterArgumentTypeDescriptor, as suggested by map. PrintReads has an integrationtest now that writes out a 1 MB bit of HiSeq normally, with compress 0, and with simplifyBAM on.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5521 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 14:57:18 +00:00
depristo
7272fcf539
Now uses the NO_HEADER option to avoid breaking MD5s due to changes in GATKArgumentCollection
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5518 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 12:00:37 +00:00
kshakir
fc8acd503e
Enabled the parameterize option for debugging PipelineTest MD5s.
...
Fixed escaping expressions that have more than one space between arguments.
Updated example to match the wiki.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5516 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 00:41:47 +00:00
ebanks
69646ff840
... and the corresponding integration test update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5496 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-23 01:58:07 +00:00
chartl
5a79f16ea4
Fixed an edge case where an exception was thrown if either of the sets was empty for the MWU test. Also altered the output format so U itself is not printed (which though interesting, isn't so useful for recalibration), but rather a value I call V (really the deviation of U from its expectation).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5490 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 16:28:44 +00:00
ebanks
1c95208e26
Finally found the bug that everyone is reporting on GS. Iterators on PriorityQueues aren't guaranteed to return elements in sorted order (a pretty stupid contract) - so we were passing items to the constrained writer out of order. Just do a Collections.sort instead (1 line of code). Happy father's day!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5476 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 21:28:19 +00:00
rpoplin
d98503ca50
Removing some debug code from VQSRv2. VariantEval can now be stratified by contig with -ST Contig. New hidden option in CombineVariants for overlapping records to take the info fields from the record with the highest AC (while still updating AC/AN/AF correctly) instead of dropping info fields which aren't exactly the same.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5448 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-15 21:28:10 +00:00
rpoplin
2a2538136d
A version of VQSRv2 that does contrastive clustering in two passes. The walkers will be renamed when they are moved to core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5443 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 21:03:56 +00:00
depristo
3e3ec85807
Checked for consistency with the previous integration tests, and updated the walker and test to use the new I/O system (always prints 4 digits on floats.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5433 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-13 15:24:22 +00:00
depristo
ee8f2871f7
A better output for Genotype Concordance summary. Now does only % comp hom-ref called hom-ref, het called het, and hom-var called hom-var, which are the quantities we typically show in slides. Updated intergration tests to reflect this change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5429 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 02:03:48 +00:00
rpoplin
b3464a6031
Initial commit of VQSRv2 that passes the old integration tests. Not ready to be used yet unless your name rhymes with ... oh wait, that's me.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5419 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 15:18:34 +00:00
ebanks
3596c56602
New attempt at the constrained movement version of the indel realigner (I've kept around the old writer for now). The new contract is that the realigner must ask permission before trying to clean an area; permission will be denied by the CM-Manager if it was required to flush its cache of reads because of too much depth within a distance of maxInsertSizeForMovingReadPairs. Added integration tests to cover different max cache sizes, including an expected exception when too small a value is chosen. The actual logic changes were fairly minor - much of this commit is really just some cleanup. I'd like to throw 1000G Phase I at it, but will respectfully wait for Ryan to hit his deadline before doing so.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5414 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 02:48:29 +00:00
rpoplin
ff7edc4493
Minor bug fix in empiricalMu prior calculation in VQSR.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5412 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 00:42:38 +00:00
rpoplin
509daac9f7
Minor bug fix in k-means implementation. Updating VQSR integration tests in preparation for VQSRv2 by removing some unused features such as VariantDatum.weight and ti/tv cutting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5410 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-09 00:26:28 +00:00
delangel
00ac51acc8
Added several integration tests for UG indel caller:
...
- Basic
- Multiple technology
- Test minIndelCnt parameter
Added also 2 disabled tests:
- Parallelization: issue w/code right now is that if -nt > 1, filter field shows "PASS" instead to ".", cause TBD
- Genotype given alleles mode: code not working yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5404 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 16:21:21 +00:00
kiran
d0598c7a04
Somehow missed this test when I was updating the md5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5400 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 23:53:42 +00:00
kiran
b6339967f8
Updated GenomicAnnotator integration tests to include the -NO_HEADER argument so that they tests op yelling about trtrivial differences
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5398 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 23:07:01 +00:00
kiran
43056d0188
Fixed integration test to reflect changes regarding when comp tracks got subset to fewer samples and whether no-call sites would get pulled in for comp tracks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5393 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 20:25:57 +00:00
kshakir
dc33fbed7c
Switched the CVUnitTest broken info from an Integer to a String since as of r5383 Integers are no longer broken when converted to Floats.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5390 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 16:33:14 +00:00
chartl
60ddc08cdf
Added a boatload of new case-control association modules. Switched the U-test to use longs rather than ints (it just so happened that I overflowed and started getting negative U statistics. Not good.) Added the ALL association type for ease of specifying that we want to throw the book at something. Added an svn-commit.tmp~ because i can't get rid of it even with --force. Hopefully I can remove it after.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5386 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 21:58:12 +00:00
depristo
af71576a07
CalculateChromosomeCounts() now only calculates AC, AF, and AN when there are genotypes. Can now combine variants with headers that differ in only whether a field is a integer or a float. Updated CombineVariants integrationtest, as incorrect AC values where being calculated in the previous GS outputs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5383 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 19:25:52 +00:00
chartl
a40a8006b5
Added in unit tests for the statistics calculated by the test runner; and bug-fixes to the calculations; so we have some assurance that the statistics coming out the back-end are correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5380 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 16:54:02 +00:00
kiran
1861ca90fc
A change to the definition of CpG sites (is now, from 5' to 3' a CG dinucleotide in the reference, and the CpG site is at the C, rather than either at the C or a G).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5373 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-04 15:36:07 +00:00
hanna
7a22f19366
More descriptive error when VerifyingSamIterator hits an inconsistent alignment. Also updated
...
case UserException.MalformedBAM to match case of UserExceptio.MissortedBAM for consistency and
ease-of-use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5364 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 03:55:24 +00:00
ebanks
bb969cd3a2
EMIT_ALL_SITES now does exactly that - even when there's no coverage or too many deletions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5343 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 05:05:00 +00:00
ebanks
5ac9af472c
Adding performance test for case with very high coverage (> 600,000x) over an interval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5336 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 19:48:56 +00:00
ebanks
05fac8583d
Following up Mark's recent commit: hooking up the --maxPositionalMoveAllowed argument into the indel realigner and through to the SAM writer. We now ensure that no read is realigned more than N bases (200 by default, which is nowhere close to realistically possible). If anyone ever sees a warning message about this with the default value then please let me know because I need to see it for myself.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5331 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 04:40:54 +00:00
hanna
600f73cbd6
A checkpoint commit of two BAM reading projects going on simultaneously. These two projects
...
are works in progress, and this checkin will provide a baseline against which to gauge
improvements to both projects.
Low-memory BAM protoshards (disabled by default):
- Currently passing ValidatingPileupIntegrationTest.
- Gets progressively slower throughout the traversal, but should run at least as fast as original implementation.
- Uses 10+ file handles per BAM, but should use 3.
BAM performance microbenchmark test system:
- Currently tests performance of BAM reading using SAM-JDK vs. GATK
- Tests do not hit all GATK performance hotspots.
- New tests that require input data in a slightly different form are hard to implement.
- Output of test results is not easily parseable (investigating Google Caliper for possible improvements).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5317 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 17:50:32 +00:00
ebanks
cba88a8861
Elegant solution to the determinism problem: force testNG to run tests in the order that I want it to.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5312 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 21:32:35 +00:00
ebanks
15dfac6bf7
Updating integration test to be in sync with previous commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5309 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:21:58 +00:00
ebanks
06e3c34e7f
Updating performance test to be in sync with previous commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5308 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:13:35 +00:00
chartl
97e1a5262e
-ct x no longer includes coverage in the previous bin
...
BatchMerge - additional support for indels (can't just test the alternate allele when it's an extended event, must also specify that you want to use the dindel model when you actually test the allele)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5300 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 15:52:04 +00:00
ebanks
ee6f112556
Phase 3: constrained movement is now the only option available in the realigner (so I guess technically it's not really an option). Several command-line options are deprecated. Code cleaned up. Wiki updated. Release coming. One phase left...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5299 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 14:59:48 +00:00
ebanks
93888e570b
Phase 2: after hours of testing, confirming that constrained mode looks good so moving the integration tests over to use it. Some cleanup. More cleanup coming in Phase 3.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5298 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 06:23:41 +00:00
ebanks
c59c8b9872
Phase I of my promise to Mark: fleshed out integration tests for Indel Realigner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5297 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 21:05:20 +00:00
ebanks
318035c147
Fixing up the output system of the Unified Genotyper. Deprecating the -all_bases and -genotype arguments. Adding instead the --output_mode (EMIT_VARIANTS_ONLY, EMIT_ALL_CONFIDENT_SITES, EMIT_ALL_SITES) and --genotyping_mode (DISCOVERY, GENOTYPE_GIVEN_ALLELES) arguments. UG now does the correct thing when passed alleles (bound to the 'alleles' rod) to use for genotyping; added several integration tests to cover this case. This commit will break the batched calls merging script, but Chris knows this and is ready for it...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5288 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 06:07:18 +00:00
kiran
52f860c9b2
Modified MD5s to account for Andrey's new MNP column in CountVariants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5274 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 13:13:58 +00:00
kiran
cb95e68fc0
CpG is no longer a standard stratification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5273 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 07:17:35 +00:00
kiran
9ddee96f93
When subsetting by sample, need to take extra care that hom-ref sites don't accidentally get treated as variant sites in CompOverlap. Renamed convenience method for creating command-lines in integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5272 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 06:26:38 +00:00
kiran
92c82200c9
Fixed an issue where an eval module with TableType objects would get an extra, empty table in the output, screwing up the parse in R.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5267 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:03:46 +00:00
kshakir
d185c2961f
Added pipeline for calling FCP in batches called MultiFullCallingPipeline.
...
Bug smashes for the MCFP:
Synchronized access to LSF library and modifications to the QGraph.
If values are missing from the graph with -run make sure to exit with a non-zero.
Refactored QGraph to pre-generate a unique Int for each QNode speeding up getHashCode/equals inside the graph.
Added jobPriority and removed jobLimitSeconds from QFunction.
All scatter gather is by default in a single sub directory queueScatterGather.
Moved some FCPTest into BaseTest/PipelineTest for use by MFCPTest.
Rev'ed the 1000G bams used for validation from v1 to v2 and added code to look for the bams before running other tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5247 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 18:26:14 +00:00
kiran
d3660aa00e
Very basic functionality for annotating indels (specifies whether the indel is frameshift, inframe, or non-coding). Does not attempt to recalculate the variant codon, variant amino acid, or whether the site falls within a splice region. Added a convenience method to WalkerTest for building command-line arguments with the proper spacing (so that I stop getting annoyed when I've gotten it wrong and the test system yells at me.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5235 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-13 17:58:20 +00:00
ebanks
9554df1a7c
Adding integration test for indels in VF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5227 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 16:58:57 +00:00
hanna
b992abb6eb
A few more unit tests plus some extra
...
functionality for BAM index visualization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5222 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-09 01:51:34 +00:00
kshakir
4d1cca95bb
Removed deprecated getDbsnpFile.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:12:15 +00:00
kiran
ecbc38aff0
If no comp rod is specified, specify the dummy name none so that we still get counts.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5211 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 19:24:52 +00:00
ebanks
698096dc5a
Moving VariantsToVCF to the proper directory; removing the oneoffs CG indel converter in preparation for a ligitimate CG variant Feature class in the works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5207 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 05:21:01 +00:00
kiran
35c688ac67
Updated md5 for testVCFStreamingChain to reflect latest changes to VariantEval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5206 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 21:22:05 +00:00
kiran
1085bbf303
Fixed issue where all comp tracks were being treated as known tracks. Fixed issue where multiple JEXL expressions were causing an exception because the underlying object did not implement the Comparable interface. Fixed issue where variants being compared to the known track were not being checked for equality of variation type. Fixed issue where functional annotations were not being iterated over properly. Refactored a lot of helper methods into a separate VariantEvalUtils utility class. Significantly expanded the test suite using a small VCF with SNPs, indels, and non-variant loci which makes it much easier to see what the proper answer should be, and included the appropriate grep and awk commands in the comments to confirm the values.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5204 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 19:19:20 +00:00
depristo
ce51ffb56e
Oops, old local paths committed on accident.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5200 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 23:35:56 +00:00
depristo
29f3ad72f3
SAMFileWriter that allows the user to move reads, but only a bit, in an incoming coordinated sorted BAM files. Does some local reordering and local mate fixing, under specified constrained. These constrains allow us to make a special -- under testing for Eric, who promised to try this out a bit, expand test cases and integration tests -- but soon to be the default and only model of the realigner that only moves reads with ISIZE < 3000 that directly emits a coordinate sorted, mate fixed validating BAM file without needing FixMates externally. Preliminary testing shows this runs in a totally fine amount of memory and produces equivalent results to the previous version.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5199 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:27:05 +00:00
hanna
5c3198520c
A few minor modifications masquerading as significant changes according to
...
svn's logs:
- Copied BAM indexing engine from Picard back into the GATK anticipating
shard merging algorithm. Tried to leave most of the building blocks in
Picard. If this turns into a logistical nightmare, I'll merge the building
blocks into the GATK as well.
- Reorganized the org.broadinstitute.sting.gatk.datasources package, giving
better separation of query and management functionality for reads, ref, rmd,
and samples.
- Merged Shard building blocks into org.broadinstitute.sting.gatk.datasources.
reads package, indicating it's current strong relationship with the reads,
rather than the general unifying element I wish this would be.
- Collapsed BAMFormatAwareShard into Shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5184 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:59:19 +00:00
kiran
9ddc95c833
NewEvaluationContext needs to be generated in the inner loop. Otherwise, multiple comp tracks end up getting routed to the same row of the output table. Added test to cover multiple comp tracks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5181 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 07:04:53 +00:00
kiran
cb6454bf98
Multiple eval tracks should be bound with different names, rather than just 'eval'. Added tests to cover usage with multiple tracks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5177 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 22:33:50 +00:00
kiran
2732c839d4
Restored parallelism and associated tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5170 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 02:04:03 +00:00
kiran
fd8dd8fb9b
Fixed an issue where a no-call in the eval track would prevent a site from a comparison track from being loaded. Added a new test to cover the use case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5169 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 01:47:53 +00:00
hanna
06b63d8336
Pulled out CpG stratification in test results at Kiran's suggestion.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5165 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 18:36:09 +00:00
hanna
91297c138b
Update VCFStreamingIntegrationTest to use new variant eval command-line
...
arguments, output format.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5162 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 15:40:43 +00:00
hanna
7d89ce820b
Got tired of waiting for Kiran to fix the build: updated NewVariantEval ->
...
VariantEval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5161 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 15:32:39 +00:00
hanna
96241c6637
More testng fallout: fixing another seemingly 'random' issue arising from an
...
alternate test ordering.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5160 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 15:25:50 +00:00
kshakir
e74f28ad89
If there's an LSF queue maximum time limit set and the user hasn't specified one for this job, pass on the queue defined maximum limit with the job.
...
Updated LibBatIntegrationTest to use proper networked temp directory accessible by local machines and nodes.
Disabling the FCPTest until the VE3 is incorporated into the fullCallingPipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5151 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 23:13:09 +00:00
kshakir
d4f744a4d4
Checking if the interval files exist before using them to calculate the minimum scatter parts.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5143 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 18:07:34 +00:00
kiran
401feca90d
Updates to VariantEval 3.0 integration test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5140 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 17:45:06 +00:00
ebanks
d406d9b3fc
There's no reason to special case no-calls if they already have PLs associated with them. Just use the PLs!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5136 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 15:05:45 +00:00
hanna
5e7a5cf924
Quick fix for Danny Lieber: flesh out the additional functionality required
...
to align to a reference other than what's specified in the header.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5133 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 05:28:37 +00:00
kshakir
2ef66af903
Moved the maximum number of intervals check from FCP to the Queue core so that scatter gather will no longer blow up if you specify a scatter count that is too high.
...
Moved the BamListWriter from FCP to ListWriterFunction in the Queue core.
Added an ExampleCountLoci QScript along with an example pipeline integration test which checks MD5s.
Added a few more utility methods to PipelineTest including a currentGATK variable that points to the GATK jar.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5121 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 23:33:58 +00:00
asivache
04d66a7d0d
Updated integration test's MD5s reflect the fact that assay sequences were previously designed incorrectly for indels, the bug is now fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5120 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 23:00:22 +00:00
depristo
2182b8c7e2
Better query start / stop function that directly parses the cigar string, unlike the previous version. Now properly handles H (hard-clipped) reads. Added -baq OFF and -baq RECALCULATE integration tests on all three 1KG technologies. Please let me know if this new code somehow fails.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5108 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 15:08:21 +00:00
kiran
9cb1ae384c
Constant precision for floating point numbers. Added integration test - carries over tests from VariantEval with the necessary modifications to command-line arguments and md5s. Disabled use of 'synchronized' keyword because I clearly don't get how that keyword is supposed to work yet...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5107 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 05:19:18 +00:00
depristo
f29bb0639b
Documentation and cleanup of the distributed GATK implementation. Detailed documentation -- given that Matt will be extending the system in the near future -- about how the locking and processing trackers work. Added error trapping to note that distributed, shared-memory parallelism isn't yet implemented, instead of just not working silently. General utility function for the analysis of distributedGATK operation in the analysis directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5106 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 03:40:09 +00:00
depristo
f522eb2848
Previous tests were just too big...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5095 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 13:48:38 +00:00
hanna
4a33cdacde
Some basic integration tests detecting breakage in OTF BAM index generation.
...
Doing it manually for the moment so that there's at least something testing
this capability; will followup eventually with Mark to see whether we can
shape the VCF index generation code in such a way that it supports BAM index
testing as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5093 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 23:48:04 +00:00
ebanks
dfc5a3d1f3
added integration test for --sites_only option
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5082 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 14:58:15 +00:00
depristo
be697d96f9
An apparently robust implementation of the file locking for distributed computation, using Lucene's file creation locking approach. It is worth trying out for those with large-scale, high-cost data sets. Details and discussion at group meeting on Wednesday. Some cleanup still needed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5079 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 13:45:40 +00:00
kshakir
9923e05e0a
Moved MD5 utils from WalkerTest to BaseTest for use by PipelineTests.
...
Moved VariantEval validation from FCPTest to PipelineTest.
Cleaned up some duplicate code for writing temp files during tests.
Moved FCPTest to playground namespace to match move for FCP.q.
Added a basic HelloWorldPipelineTest for the HelloWorld QScript.
Moved duplicated error handling from JobRunners into the FunctionEdge.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5068 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 04:11:49 +00:00
hanna
9db02059ac
Fix for Ryan's issue: reads ending with indel distort the location of the
...
pileup, resulting a two map() calls for the same locus (and no map call for
the locus immediately following).
Fixed bug and added comprehensive unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5067 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 19:49:39 +00:00
depristo
c50f39a147
V3 of the distributed GATK. High-efficiency implementation. Support for status tracking for debugging and display. Still not safe for production use due to NFS filelock problem. V4 will use alternative file locking mechanism
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5063 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 16:45:07 +00:00
depristo
a51061fd96
Improved distributed processing analytics. Still not 100% ready for prime-time. More improvements incoming. Iterator claim now supports requests to obtain in a single atomic claim (one lock) multiple sequential shards, which radically reduces overhead. However, deadlocking is still possible...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5061 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 16:17:25 +00:00
ebanks
2d4bcb60a1
Don't print out alt alleles for ref calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5060 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 06:33:31 +00:00
ebanks
2bbcc9275a
Committing the fragment-based calling code. Results look great in all datasets (will show this at 1000G this week with Ryan). Note that this is an intermediate commit. The code needs to be cleaned up and the fragmentation code needs to be moved up into LocusIteratorByState. This should all happen later this week, but I don't want Ryan to have to keep running from my own personal Sting directory. The current crappy implementation adds ~10% to the runtime, but that should all go away in the next iteration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5058 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 05:04:17 +00:00
depristo
9b1b8d46aa
Performance tracking of GenomeLocProcessingTrackers, as well as a marker for where to put tracker in HierarchicalMicroScheduler
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5051 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 22:24:42 +00:00
hanna
aea121a9d5
<key>=<value> tagging support for command-line arguments. Unfortunately, still
...
very hard to validate and still very hard to use (requires core hacking to
support additional tags).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5038 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 00:22:42 +00:00
kshakir
8855f080c2
For the fullCallingPipeline.q:
...
- Reading the refseq table from the YAML if not specified on the command line.
- Removed obsolete -bigMemQueue now that CombineVariants runs in 4g.
- Added a -mountDir /broad/software option to work around adpr automount issues.
- Merged the LSF preexec used for automount into the shell script used to execute tasks.
- Using the LSF C Library to determine when jobs are complete instead of postexec.
- Updated queue.sh to match the changes above.
- Updated the FCPTest to match the changes above.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5036 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 22:34:43 +00:00