hanna
22a11e41e1
Rewrite of GATKBAMIndex to avoid mmaps causing false reports of heavy memory
...
usage.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5620 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:49:58 +00:00
rpoplin
30a19a00fe
Fix for when running with EMIT_ALL_SITES but not GENOTYPE_GIVEN_ALLELES. Still want to emit a site even when over the deletion fraction for example.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5617 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 20:07:06 +00:00
delangel
3b424fd74d
Enable new indel likelihood model by default, cleanup code, remove dead arguments, still more cleanups to follow. This isn't final version but at least it performs better in all cases than previous Dindel-based version, so no reason to keep old one around.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5615 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 17:54:46 +00:00
ebanks
b6e7b5dace
Updating to reflect my recent Tribble fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5601 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 11:48:00 +00:00
ebanks
cd61ef7169
Re-enabling multi-threaded integration tests. To make this work, downsampling and annotations are disabled for this test so that we don't have randomization issues for it based on which shards get executed first.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5597 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 03:07:39 +00:00
ebanks
af09170167
As I threatened yesterday, I've moved the various and disparate randomization code out of the walkers. Now they all (except VQSRv1, whose days are numbered anyways) use a static generator available in the engine itself. Please use this from now on. The seed is reset before every individual integration test is run. I think there may still be an issue with the IndelRealigner but I need to confirm with the commit to see what testNG does. Integration tests are already broken anyways, so no big deal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5589 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 17:03:48 +00:00
rpoplin
3f3f35dea0
UnifiedGenotyper now BAQs via ADD_TAG to facilitate using BAQed quals for GL calculations but unBAQed quals for annotation calculations. UnifiedGenotyper now produces SNP and indel calls simultaneously. 40 base mismatch intrinsic filter removed from UG to greatly simplify the code. RankSumTests are now standard annotations but the integration tests are commented out pending changes that will allow random annotations to work.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5585 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 19:06:24 +00:00
ebanks
4b451314b2
Only store a read in the mate hash if it could possibly be moved. This reduces memory consumption especially when dealing with a case of tons of unmapped reads at the end of the bam; however, it's only mildly helpful for chr1 of the Papuans (there's a truly massive pileup 120Mb into it; more thought needed at a later point). Integration tests changed only because some of the reads in the original bam were busted to begin with (it's an old pilot 1000G bam).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5580 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 22:20:09 +00:00
droazen
db9908ec02
Small correction to the unit test code from my last commit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5572 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:55:38 +00:00
droazen
a5acb0b7a6
Fix for bug GSA-314: Detect -XL and -L incompatibility. An ArgumentException is
...
now thrown if the combination of -L and -XL intervals specified on the command
line results in an empty interval set after set subtraction.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5571 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:41:55 +00:00
depristo
095125152b
Updated to now longer include 2nd-best base output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5567 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 20:13:10 +00:00
droazen
0927b7c297
Fix for bug GSA-441: BAM file list with blank lines gives a confusing error
...
message. Lines containing only whitespace in .list files are now ignored.
Also added support for comments in .list files: lines whose first
non-whitespace character is '#' are now also ignored.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5550 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 15:04:35 +00:00
droazen
7b452ea2b9
Fix for bug GSA-430: Can't specify same BAM file twice on the command line. An ArgumentException with an appropriate error message and a list of the duplicate BAMs is now thrown in this case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5542 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 22:23:24 +00:00
rpoplin
5ddc0e464a
Under guidance from Matt added ability to use key-value tags with ROD binding command line arguments, so now one can say -B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf and get the tags in a walker. Look at ContrastiveRecalibrator for an example of how to use the new ReferenceOrderedDataSource.getTags(). Removed references to FDR in tranches since we are only using truth sensitivity. Finally fixed long standing bug where tranche filters weren't set appropriately.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5536 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 21:04:09 +00:00
depristo
cd8321cdc9
Removed the completely unused generic but extremely expensive infrastructure for dynamic LocusIteratorFilters. Now the one, and probably only useful one, is called directly in the LocusIteratorByState itself to filter adaptor bases from reads. This shaves 10% off the runtime of all walkers, apparently. Has the additional benefit of eliminating a lot of complex infrastructure that resulted ultimately in only a single function call.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5525 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 20:48:24 +00:00
depristo
3bcd4c5d75
--simplifyBAM is now in the SAMFileWriterArgumentTypeDescriptor, as suggested by map. PrintReads has an integrationtest now that writes out a 1 MB bit of HiSeq normally, with compress 0, and with simplifyBAM on.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5521 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 14:57:18 +00:00
ebanks
69646ff840
... and the corresponding integration test update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5496 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-23 01:58:07 +00:00
ebanks
1c95208e26
Finally found the bug that everyone is reporting on GS. Iterators on PriorityQueues aren't guaranteed to return elements in sorted order (a pretty stupid contract) - so we were passing items to the constrained writer out of order. Just do a Collections.sort instead (1 line of code). Happy father's day!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5476 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 21:28:19 +00:00
depristo
3e3ec85807
Checked for consistency with the previous integration tests, and updated the walker and test to use the new I/O system (always prints 4 digits on floats.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5433 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-13 15:24:22 +00:00
depristo
ee8f2871f7
A better output for Genotype Concordance summary. Now does only % comp hom-ref called hom-ref, het called het, and hom-var called hom-var, which are the quantities we typically show in slides. Updated intergration tests to reflect this change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5429 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 02:03:48 +00:00
ebanks
3596c56602
New attempt at the constrained movement version of the indel realigner (I've kept around the old writer for now). The new contract is that the realigner must ask permission before trying to clean an area; permission will be denied by the CM-Manager if it was required to flush its cache of reads because of too much depth within a distance of maxInsertSizeForMovingReadPairs. Added integration tests to cover different max cache sizes, including an expected exception when too small a value is chosen. The actual logic changes were fairly minor - much of this commit is really just some cleanup. I'd like to throw 1000G Phase I at it, but will respectfully wait for Ryan to hit his deadline before doing so.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5414 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 02:48:29 +00:00
rpoplin
ff7edc4493
Minor bug fix in empiricalMu prior calculation in VQSR.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5412 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 00:42:38 +00:00
rpoplin
509daac9f7
Minor bug fix in k-means implementation. Updating VQSR integration tests in preparation for VQSRv2 by removing some unused features such as VariantDatum.weight and ti/tv cutting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5410 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-09 00:26:28 +00:00
delangel
00ac51acc8
Added several integration tests for UG indel caller:
...
- Basic
- Multiple technology
- Test minIndelCnt parameter
Added also 2 disabled tests:
- Parallelization: issue w/code right now is that if -nt > 1, filter field shows "PASS" instead to ".", cause TBD
- Genotype given alleles mode: code not working yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5404 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 16:21:21 +00:00
kiran
d0598c7a04
Somehow missed this test when I was updating the md5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5400 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 23:53:42 +00:00
kiran
b6339967f8
Updated GenomicAnnotator integration tests to include the -NO_HEADER argument so that they tests op yelling about trtrivial differences
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5398 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 23:07:01 +00:00
kiran
43056d0188
Fixed integration test to reflect changes regarding when comp tracks got subset to fewer samples and whether no-call sites would get pulled in for comp tracks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5393 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 20:25:57 +00:00
kshakir
dc33fbed7c
Switched the CVUnitTest broken info from an Integer to a String since as of r5383 Integers are no longer broken when converted to Floats.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5390 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 16:33:14 +00:00
depristo
af71576a07
CalculateChromosomeCounts() now only calculates AC, AF, and AN when there are genotypes. Can now combine variants with headers that differ in only whether a field is a integer or a float. Updated CombineVariants integrationtest, as incorrect AC values where being calculated in the previous GS outputs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5383 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 19:25:52 +00:00
kiran
1861ca90fc
A change to the definition of CpG sites (is now, from 5' to 3' a CG dinucleotide in the reference, and the CpG site is at the C, rather than either at the C or a G).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5373 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-04 15:36:07 +00:00
hanna
7a22f19366
More descriptive error when VerifyingSamIterator hits an inconsistent alignment. Also updated
...
case UserException.MalformedBAM to match case of UserExceptio.MissortedBAM for consistency and
ease-of-use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5364 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 03:55:24 +00:00
ebanks
bb969cd3a2
EMIT_ALL_SITES now does exactly that - even when there's no coverage or too many deletions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5343 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 05:05:00 +00:00
ebanks
5ac9af472c
Adding performance test for case with very high coverage (> 600,000x) over an interval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5336 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 19:48:56 +00:00
hanna
600f73cbd6
A checkpoint commit of two BAM reading projects going on simultaneously. These two projects
...
are works in progress, and this checkin will provide a baseline against which to gauge
improvements to both projects.
Low-memory BAM protoshards (disabled by default):
- Currently passing ValidatingPileupIntegrationTest.
- Gets progressively slower throughout the traversal, but should run at least as fast as original implementation.
- Uses 10+ file handles per BAM, but should use 3.
BAM performance microbenchmark test system:
- Currently tests performance of BAM reading using SAM-JDK vs. GATK
- Tests do not hit all GATK performance hotspots.
- New tests that require input data in a slightly different form are hard to implement.
- Output of test results is not easily parseable (investigating Google Caliper for possible improvements).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5317 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 17:50:32 +00:00
ebanks
cba88a8861
Elegant solution to the determinism problem: force testNG to run tests in the order that I want it to.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5312 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 21:32:35 +00:00
ebanks
15dfac6bf7
Updating integration test to be in sync with previous commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5309 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:21:58 +00:00
ebanks
06e3c34e7f
Updating performance test to be in sync with previous commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5308 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:13:35 +00:00
chartl
97e1a5262e
-ct x no longer includes coverage in the previous bin
...
BatchMerge - additional support for indels (can't just test the alternate allele when it's an extended event, must also specify that you want to use the dindel model when you actually test the allele)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5300 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 15:52:04 +00:00
ebanks
ee6f112556
Phase 3: constrained movement is now the only option available in the realigner (so I guess technically it's not really an option). Several command-line options are deprecated. Code cleaned up. Wiki updated. Release coming. One phase left...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5299 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 14:59:48 +00:00
ebanks
93888e570b
Phase 2: after hours of testing, confirming that constrained mode looks good so moving the integration tests over to use it. Some cleanup. More cleanup coming in Phase 3.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5298 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 06:23:41 +00:00
ebanks
c59c8b9872
Phase I of my promise to Mark: fleshed out integration tests for Indel Realigner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5297 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 21:05:20 +00:00
ebanks
318035c147
Fixing up the output system of the Unified Genotyper. Deprecating the -all_bases and -genotype arguments. Adding instead the --output_mode (EMIT_VARIANTS_ONLY, EMIT_ALL_CONFIDENT_SITES, EMIT_ALL_SITES) and --genotyping_mode (DISCOVERY, GENOTYPE_GIVEN_ALLELES) arguments. UG now does the correct thing when passed alleles (bound to the 'alleles' rod) to use for genotyping; added several integration tests to cover this case. This commit will break the batched calls merging script, but Chris knows this and is ready for it...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5288 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 06:07:18 +00:00
kiran
52f860c9b2
Modified MD5s to account for Andrey's new MNP column in CountVariants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5274 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 13:13:58 +00:00
kiran
cb95e68fc0
CpG is no longer a standard stratification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5273 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 07:17:35 +00:00
kiran
9ddee96f93
When subsetting by sample, need to take extra care that hom-ref sites don't accidentally get treated as variant sites in CompOverlap. Renamed convenience method for creating command-lines in integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5272 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 06:26:38 +00:00
kiran
92c82200c9
Fixed an issue where an eval module with TableType objects would get an extra, empty table in the output, screwing up the parse in R.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5267 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:03:46 +00:00
kiran
d3660aa00e
Very basic functionality for annotating indels (specifies whether the indel is frameshift, inframe, or non-coding). Does not attempt to recalculate the variant codon, variant amino acid, or whether the site falls within a splice region. Added a convenience method to WalkerTest for building command-line arguments with the proper spacing (so that I stop getting annoyed when I've gotten it wrong and the test system yells at me.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5235 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-13 17:58:20 +00:00
ebanks
9554df1a7c
Adding integration test for indels in VF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5227 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 16:58:57 +00:00
hanna
b992abb6eb
A few more unit tests plus some extra
...
functionality for BAM index visualization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5222 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-09 01:51:34 +00:00
kiran
ecbc38aff0
If no comp rod is specified, specify the dummy name none so that we still get counts.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5211 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 19:24:52 +00:00
ebanks
698096dc5a
Moving VariantsToVCF to the proper directory; removing the oneoffs CG indel converter in preparation for a ligitimate CG variant Feature class in the works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5207 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 05:21:01 +00:00
kiran
35c688ac67
Updated md5 for testVCFStreamingChain to reflect latest changes to VariantEval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5206 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 21:22:05 +00:00
kiran
1085bbf303
Fixed issue where all comp tracks were being treated as known tracks. Fixed issue where multiple JEXL expressions were causing an exception because the underlying object did not implement the Comparable interface. Fixed issue where variants being compared to the known track were not being checked for equality of variation type. Fixed issue where functional annotations were not being iterated over properly. Refactored a lot of helper methods into a separate VariantEvalUtils utility class. Significantly expanded the test suite using a small VCF with SNPs, indels, and non-variant loci which makes it much easier to see what the proper answer should be, and included the appropriate grep and awk commands in the comments to confirm the values.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5204 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 19:19:20 +00:00
hanna
5c3198520c
A few minor modifications masquerading as significant changes according to
...
svn's logs:
- Copied BAM indexing engine from Picard back into the GATK anticipating
shard merging algorithm. Tried to leave most of the building blocks in
Picard. If this turns into a logistical nightmare, I'll merge the building
blocks into the GATK as well.
- Reorganized the org.broadinstitute.sting.gatk.datasources package, giving
better separation of query and management functionality for reads, ref, rmd,
and samples.
- Merged Shard building blocks into org.broadinstitute.sting.gatk.datasources.
reads package, indicating it's current strong relationship with the reads,
rather than the general unifying element I wish this would be.
- Collapsed BAMFormatAwareShard into Shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5184 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:59:19 +00:00
kiran
9ddc95c833
NewEvaluationContext needs to be generated in the inner loop. Otherwise, multiple comp tracks end up getting routed to the same row of the output table. Added test to cover multiple comp tracks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5181 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 07:04:53 +00:00
kiran
cb6454bf98
Multiple eval tracks should be bound with different names, rather than just 'eval'. Added tests to cover usage with multiple tracks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5177 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 22:33:50 +00:00
kiran
2732c839d4
Restored parallelism and associated tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5170 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 02:04:03 +00:00
kiran
fd8dd8fb9b
Fixed an issue where a no-call in the eval track would prevent a site from a comparison track from being loaded. Added a new test to cover the use case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5169 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 01:47:53 +00:00
hanna
06b63d8336
Pulled out CpG stratification in test results at Kiran's suggestion.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5165 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 18:36:09 +00:00
hanna
91297c138b
Update VCFStreamingIntegrationTest to use new variant eval command-line
...
arguments, output format.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5162 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 15:40:43 +00:00
hanna
7d89ce820b
Got tired of waiting for Kiran to fix the build: updated NewVariantEval ->
...
VariantEval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5161 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 15:32:39 +00:00
hanna
96241c6637
More testng fallout: fixing another seemingly 'random' issue arising from an
...
alternate test ordering.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5160 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 15:25:50 +00:00
kiran
401feca90d
Updates to VariantEval 3.0 integration test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5140 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 17:45:06 +00:00
ebanks
d406d9b3fc
There's no reason to special case no-calls if they already have PLs associated with them. Just use the PLs!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5136 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 15:05:45 +00:00
asivache
04d66a7d0d
Updated integration test's MD5s reflect the fact that assay sequences were previously designed incorrectly for indels, the bug is now fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5120 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 23:00:22 +00:00
depristo
2182b8c7e2
Better query start / stop function that directly parses the cigar string, unlike the previous version. Now properly handles H (hard-clipped) reads. Added -baq OFF and -baq RECALCULATE integration tests on all three 1KG technologies. Please let me know if this new code somehow fails.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5108 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 15:08:21 +00:00
kiran
9cb1ae384c
Constant precision for floating point numbers. Added integration test - carries over tests from VariantEval with the necessary modifications to command-line arguments and md5s. Disabled use of 'synchronized' keyword because I clearly don't get how that keyword is supposed to work yet...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5107 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 05:19:18 +00:00
hanna
4a33cdacde
Some basic integration tests detecting breakage in OTF BAM index generation.
...
Doing it manually for the moment so that there's at least something testing
this capability; will followup eventually with Mark to see whether we can
shape the VCF index generation code in such a way that it supports BAM index
testing as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5093 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 23:48:04 +00:00
ebanks
dfc5a3d1f3
added integration test for --sites_only option
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5082 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 14:58:15 +00:00
hanna
9db02059ac
Fix for Ryan's issue: reads ending with indel distort the location of the
...
pileup, resulting a two map() calls for the same locus (and no map call for
the locus immediately following).
Fixed bug and added comprehensive unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5067 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 19:49:39 +00:00
ebanks
2d4bcb60a1
Don't print out alt alleles for ref calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5060 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 06:33:31 +00:00
ebanks
2bbcc9275a
Committing the fragment-based calling code. Results look great in all datasets (will show this at 1000G this week with Ryan). Note that this is an intermediate commit. The code needs to be cleaned up and the fragmentation code needs to be moved up into LocusIteratorByState. This should all happen later this week, but I don't want Ryan to have to keep running from my own personal Sting directory. The current crappy implementation adds ~10% to the runtime, but that should all go away in the next iteration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5058 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 05:04:17 +00:00
hanna
aea121a9d5
<key>=<value> tagging support for command-line arguments. Unfortunately, still
...
very hard to validate and still very hard to use (requires core hacking to
support additional tags).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5038 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 00:22:42 +00:00
hanna
8831ec3dce
Some refactoring and cleanup around the area of my sleep-deprived integration
...
test typo, which Khalid already fixed for me. Sorry, Khalid!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5035 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 15:03:14 +00:00
kshakir
3022f4dfa0
Fixed missing space character in testSimpleVCFStreaming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5034 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 14:49:38 +00:00
depristo
41c8552d0a
Added implements HasGenomeLocation to all revelant classes. It's not possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:54:03 +00:00
hanna
7087c2f422
Very simple integration tests for basic VCF streaming functionality.
...
Rather than try to fork the integration test process to get a pipe source
and sink, creates a new named pipe by Runtime.exec()ing the 'mkfifo' shell
command. We'll see whether this proves to be a reliable method for testing
streaming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5028 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 04:38:54 +00:00
hanna
af31d02a2d
Fix concurrency issue that periodically kills VariantEvalIntegrationTest --
...
a member field of RMDTrackBuilder was getting rebuilt every time it was
called, creating concurrency issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5001 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 18:52:21 +00:00
rpoplin
ce3d226183
Reverting back to the old definition of QD because it works better with large numbers of samples. The new QD is relegated to a new annotation: sumGLbyD. Tweaks to the new HaplotypeScore based on evaluation with better QD calculation. The default qual threshold in GenerateVariantClusters is updated to be in line with the variant quality scores coming from the exact model.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4984 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 14:12:30 +00:00
hanna
6d855041ec
Oops...forgot to commit the changes that allow primitive VCF streaming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4979 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 21:54:51 +00:00
depristo
8fe5641b2e
can explicitly set the now required ReferenceDataSource in unit tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4977 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 18:25:12 +00:00
aaron
7916ab0ed5
remove the index each run
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4976 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 17:38:22 +00:00
carneiro
5e9a8f9cb3
Implemented a new argument (-DQS --defaultQualityScore) that allows GATK to deal with BAM files missing quality scores. If a value is specified, all reads are filled with the default quality score. Appropriate exception is thrown if -DQS is not provided and BAM file doesn't have quality scores for every base.
...
Adding the first version of the techdev pipeline (tdPipeline)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 22:25:08 +00:00
aaron
cba436fa2f
small fix for the table codec; if you see a header line, you know you've finished parsing the header. Also also some changes to return the ref ordered data pool test to using MappedStreamSegment instead of EntireStream
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4942 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 21:20:26 +00:00
hanna
0982d35f5b
Bug fixes in streaming in Tribble data via /dev/stdin.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4935 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 02:43:04 +00:00
rpoplin
23dbc5ccf3
HaplotypeScore is revamped. It now uses reads' Cigar strings when building the haplotype blocks to skip over soft-clipped bases and factor in insertions and deletions. The statistic now uses only the reads from the filtered context to build the haplotypes but it scores all reads against the two best haplotypes. The score is now computed individually for each sample's reads and then averaged together. Bug fixes throughout. The math for the base quality and mapping quality rank sum tests is fixed. The annotations remain as ExperimentalAnnotations pending more investigation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4934 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 00:28:05 +00:00
hanna
8d2c14b29c
Update Picard / sam-jdk at Tim's request.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4925 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 02:17:25 +00:00
hanna
3fc9862964
Unit test fixed - Tribble codecs aren't designed to be stateless, but I was
...
using one as though it was. Fixed, and debug code reverted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4917 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 17:47:52 +00:00
hanna
b9cb57f4b9
A unit test is failing on bamboo in a way I can't reproduce (or even explain).
...
Checking in some debugging info.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4916 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 16:35:04 +00:00
hanna
cba18116e4
A significant refactoring of the ROD system, done largely to simplify the process of
...
streaming/piping VCFs into the GATK. Notable changes:
- Public interface to RMDTrackBuilder is greatly simplified; users can use it only to build
RMDTracks and lookup codecs.
- RODDataSource and RMDTrack are no longer functionally at the same level; RODDataSources now
manage RMDTracks on behalf of the GATK, and the only direct consumers of the RMDTrack class
are the walkers that feel the need to access the ROD system directly. (We need to stamp out
this access pattern.
A few minor warts were introduced as part of this process, labeled with TODOs. These'll be
fixed as part of the VCF streaming project.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4915 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 04:52:22 +00:00
ebanks
848977678d
No reason to convert the GLs to a String for formatting when they're just going to be converted to PLs later. That was 5% of the UG runtime...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4913 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-29 22:06:19 +00:00
ebanks
8a0c07b865
Support for indels in hapmap. This was non-trivial because not only does hapmap not tell you whether the allele is an insertion or deletion, but it also has a completely different positioning strategy (rightmost base). I'll send out an email tomorrow when the new HapMap3.3 VCF is ready.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4908 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-27 07:37:46 +00:00
hanna
e313eeede8
Push command-line expansions, such as BAM list unpacking and -B tag parsing, out
...
into the CommandLine* classes. This makes it easier for external functionality
(such as the VCF streamer) to use GenomeAnalysisEngine directly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4897 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-22 19:00:17 +00:00
depristo
5dd0e8388b
Fixed a bug in UnitTest
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4867 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 19:44:35 +00:00
ebanks
5c0b66cb7c
3 big changes that all kill the integration tests: 1. Don't cap the PLs by 255 anymore. 2. Move over to the 3state model as the only available base model for UG (no more base transition tables). 3. New QD implementation when GLs/PLs are available.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4846 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 16:24:28 +00:00
chartl
5a27d231fa
Rename it so that nobody else falls into the trap laid out (the test is VariantToTable, the walker is Variant[s]ToTable)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4844 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 11:43:00 +00:00
chartl
5e27e9162f
Huh? I thought we parsed out comma-separated command line arguments into list automatically...just change the syntax of the integration test, no need to update the md5
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4843 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 11:40:27 +00:00
kshakir
01323447c6
Removed LibBat.SUB2_BSUB_BLOCK since the use of it exits the JVM.
...
Fixed integration tests to wait on their own for the job to run instead of using SUB2_BSUB_BLOCK.
Updated VariantRecalibrationIntegrationTests MD5s which were knocked out of sync whele SUB2_BSUB_BLOCK was exiting in the middle of integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4840 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 19:57:20 +00:00
depristo
abd6ce1c77
A TiTv-free approach for cutting variants! Apparently much better than previous approach, and will work for indels and SV will truly minor modifications to the code. Will discuss with methods group on Monday.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4822 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-11 23:08:13 +00:00
depristo
5b46a900b3
Final version of BAQ calculation. default gap open is 1e-4, a good sensitive value. Useful timer class SimpleTimer added. BAQ is now live.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4818 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 19:35:12 +00:00