ebanks
1e06d2bf68
Initial HLA Caller integration tests. Kind of painful, but will improve with code refactoring.
...
This baby is now officially ours.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3593 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 20:35:27 +00:00
rpoplin
724affc3cc
Major bug fixes for the Variant Recalibrator. Covariance matrix values are now allowed to be negative. When probabilities are multiplied together the calculation is done in log space, normalized, then converted back to real valued probabilities. Clustering weights have been changed to only use HapMap and by-1000genomes sites. The -nI argument was removed and now clustering simply runs until convergence. Test cases seem to work best when using just two annotations (QD and SB). More changes are in the works and are being evaluated. Misc fixes to walkers that use RScript due to CentOS changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3590 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 17:37:11 +00:00
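The log-space step described in the Variant Recalibrator entry above can be sketched as follows. This is a minimal illustration, not the recalibrator's actual code: the class name `LogSpaceNormalizer`, both method names, and the use of log10 are assumptions made for the example.

```java
/** Hypothetical helper illustrating log-space multiplication and normalization. */
final class LogSpaceNormalizer {

    /** Multiplies probabilities by summing their log10 values (avoids underflow). */
    static double log10Product(double[] probabilities) {
        double sum = 0.0;
        for (double p : probabilities) {
            sum += Math.log10(p);
        }
        return sum;
    }

    /**
     * Converts unnormalized log10 values into real-valued probabilities that sum to 1.
     * Subtracting the maximum first keeps the largest term at 10^0 = 1, so the
     * exponentiation cannot underflow to all zeros.
     */
    static double[] normalizeFromLog10(double[] log10Values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) {
            max = Math.max(max, v);
        }
        double[] result = new double[log10Values.length];
        double total = 0.0;
        for (int i = 0; i < log10Values.length; i++) {
            result[i] = Math.pow(10.0, log10Values[i] - max);
            total += result[i];
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= total;
        }
        return result;
    }
}
```

For example, two hypotheses that each have a log10 value of -400 would both underflow to 0 in real space, but `normalizeFromLog10` assigns each a weight of 0.5.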
aaron
c3434493b0
fixed integration test for VCF Header changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3589 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 16:31:48 +00:00
aaron
42e7ff4f28
forgot to update a test, the md5sum of the underlying file changed (which is recorded in the ROD tests).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3586 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 13:27:56 +00:00
aaron
b978d5946b
adding changes for VCF 4, mostly in the way we handle VCF headers. The header fields are now aware of the differences between different VCF formats. There was also a bunch of clean-up of out-of-spec VCF used in the tests (mismatched VCF file format fields, etc), and updates to the associated integration tests. Also some logging statements for BTI.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3584 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 08:23:23 +00:00
weisburd
e26a273ef5
Turned the test back on
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3582 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 22:57:42 +00:00
hanna
48cbc5ce37
Merging the sharding-specific inherited classes down into the base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3581 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 22:36:13 +00:00
hanna
612c3fdd9d
First pass at eliminating the old sharding system. Classes required for the original sharding system are gone where I could identify them, but hierarchies that split to support two sharding systems have not yet been taken apart.
@Eric: ~4k lines.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3580 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 20:17:31 +00:00
aaron
3d049204ed
some refactoring for the variant eval output system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3576 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 05:34:31 +00:00
hanna
db1383d0b2
Rev the latest version of Picard.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3575 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 23:55:07 +00:00
weisburd
5b370ffc62
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3574 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 20:42:58 +00:00
ebanks
01ffa307c2
When going NWay out in the cleaner, use the new *merged* header (instead of the original one) for each bam file so that it matches the new uniquified read group ids in the reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3569 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:36:36 +00:00
ebanks
7a91dbd490
Renamed some of the column names in Ti/Tv and Concordance modules so that they are clearer. Removed ValidationRate module (it was busted).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3564 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 15:53:06 +00:00
asivache
671ac00748
A simple utility class that implements a merging Iterator<GenomeLoc> built over an interval or bed file (this is NOT a rod, but rather a direct line-by-line file reader that converts strings to genome locs on the fly and merges overlapping intervals)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3546 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 15:54:37 +00:00
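A merging iterator of the kind described in the entry above might look roughly like this. It is a sketch under stated assumptions, not the committed class: `MergingIntervalIterator` and its nested `Interval` type are hypothetical stand-ins (the real code works with GenomeLoc), each line is assumed to have the form `contig:start-stop` with inclusive coordinates, and the input is assumed to be sorted by contig and start.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: reads interval strings lazily and merges overlapping ones. */
final class MergingIntervalIterator implements Iterator<MergingIntervalIterator.Interval> {

    /** Simplified stand-in for a genome location. */
    static final class Interval {
        final String contig;
        final int start;
        final int stop;

        Interval(String contig, int start, int stop) {
            this.contig = contig;
            this.start = start;
            this.stop = stop;
        }

        /** Parses "contig:start-stop" (assumed format). */
        static Interval parse(String text) {
            int colon = text.lastIndexOf(':');
            int dash = text.indexOf('-', colon);
            return new Interval(text.substring(0, colon),
                    Integer.parseInt(text.substring(colon + 1, dash)),
                    Integer.parseInt(text.substring(dash + 1)));
        }

        @Override
        public String toString() {
            return contig + ":" + start + "-" + stop;
        }
    }

    private final Iterator<String> lines;
    private Interval pending; // next interval read from the source but not yet emitted

    MergingIntervalIterator(Iterator<String> lines) {
        this.lines = lines;
        this.pending = readNext();
    }

    /** Converts the next non-blank line to an interval, or returns null at end of input. */
    private Interval readNext() {
        while (lines.hasNext()) {
            String line = lines.next().trim();
            if (!line.isEmpty()) {
                return Interval.parse(line);
            }
        }
        return null;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public Interval next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        Interval current = pending;
        pending = readNext();
        // Absorb every following interval that overlaps the one being built.
        while (pending != null && pending.contig.equals(current.contig)
                && pending.start <= current.stop) {
            current = new Interval(current.contig, current.start,
                    Math.max(current.stop, pending.stop));
            pending = readNext();
        }
        return current;
    }
}
```

Because only one look-ahead interval is held at a time, the whole file never needs to be loaded into memory, which is presumably the point of reading line by line rather than going through a ROD.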
ebanks
8c28be5933
Fixing a VCF bug for Sendu: we weren't emitting flags (booleans) correctly in VCF3.3 (rev'ed tribble for this).
...
Updated dbsnp/hapmap membership info fields to be flags now instead of ints.
While I was there, I added the change in the Annotator for Jan to force reads to be from a specific sample.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3536 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 16:42:06 +00:00
bthomas
99b684ea89
Adding new support for reference data. ReferenceDataSource is a new class that manages reference data, and allows IndexedFastaSequenceFile to be a simple reader. This checkin also includes FastaSequenceIndexBuilder, which reads a fasta file and creates an index, like samtools faidx. Right now this is not enabled, because we are still working out thread safety. So the only new UI change is that GATK can be run without a fai file. Soon, we will enable 1) GATK to be run without a dict file too, and 2) both dict and fai files will be saved on disk for future program executions. For more info, see ReferenceDataSource.java
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3527 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:10:23 +00:00
ebanks
ca4eab1d23
Now annotations that require reads return null if there's no alignment context, so that running without reads adds annotations only for the appropriate fields.
...
Added an integration test for the read-less case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3525 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 20:36:46 +00:00
ebanks
9b2fcc4711
Refactoring of the annotation system:
...
1. VA is now a ROD walker so it no longer requires reads (needs a little more testing)
2. Annotations can now represent multiple INFO fields (i.e. sets of key/value pairs)
3. The chromosome count annotations have been pulled out of UG and the VCF writer code and into VA where they belong. Fixed the headers too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3513 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:05:51 +00:00
aaron
6d5556939d
updating Tribble with a couple of important Tabix fixes, and updating the variant eval integration tests to run each test with both plain vcf and gzipped tabix (added the tabix version to the validation directory), using the same md5sum.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3509 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 01:47:04 +00:00
depristo
6eeb1693ca
JEXL2 upgrade. Improvements to JEXL processing including dynamically resolving variable -> value bindings instead of up front adding them to a map. Performance improvements and code cleanup throughout.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3494 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 00:33:02 +00:00
depristo
3ea506fe52
No more new Allele() -- must use create. Simple alleles are now cached for efficiency reasons. VCF4 codec optimizations -- 4x performance in general. Now working in general but hooked up to the ROD system now as VCF4. WARNING -- does not actually work with indels, genotype filters, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3489 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 23:03:55 +00:00
aaron
0b03e28b60
updating the tribble library to include the reference dictionary reading / writing. We now check the dictionaries of any tracks that have them against the reference (all new tribble tracks and out-of-date tracks will have this). Also renamed some classes to be more reflective of their function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3485 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 06:34:26 +00:00
depristo
e2b41082af
GATK now does automatic adaptor filtering in locus iterators (but not expt. downsampling iterator). General support for LocusIteratorFilters just like read filters but only applying at particular bases. Updated tools with new MD5 sums due to adaptor bases in their integrationtest data. Note that as a side effect here reads close to each other with odd orientations are also filtered out. Updated minor argument to VariantRecalibrator to change the qStep value on the command line
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3481 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 22:26:32 +00:00
aaron
8ec091d6d2
re-enabling regeneration of the tribble index if it's out of date. Also moved the class that can detect text in the log4j stream (useful in testing to make sure appropriate messages are generated).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3480 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 17:45:51 +00:00
depristo
21427211c0
Personal MD5 database system now live. WalkerTest now maintains a database of result files associated with MD5 results in integrationtest/, and provides command lines for diff-ing expected to current md5 results when encountering failed integration tests. The suite currently takes 200Mb to store. Update and run integrationtest to build your very own expectation database for future development work.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3466 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-31 16:06:16 +00:00
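The idea behind the MD5 database entry above can be illustrated with a short sketch. This is not WalkerTest's code: the class `Md5Expectation`, its methods, and in particular the assumption that each stored result file is named after its own MD5 inside the database directory are hypothetical choices made for the example.

```java
import java.math.BigInteger;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of comparing a result against an expected MD5. */
final class Md5Expectation {

    /** Returns the 32-character lowercase hex MD5 of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            // %032x left-pads with zeros so leading zero bytes are not lost.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available in this JVM", e);
        }
    }

    /** True when the result bytes hash to the expected MD5. */
    static boolean matches(String expectedMd5, byte[] result) {
        return expectedMd5.equalsIgnoreCase(md5Hex(result));
    }

    /**
     * Builds a diff command for a failed comparison, assuming (hypothetically)
     * that the database directory stores each result file under its MD5.
     */
    static String diffCommand(Path databaseDir, String expectedMd5, String actualMd5) {
        return "diff " + databaseDir.resolve(expectedMd5) + " " + databaseDir.resolve(actualMd5);
    }
}
```

Keying stored files by hash means a failing test only needs the two MD5 strings to locate both versions of the output for comparison.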
depristo
2b02324587
Support for detecting and automatically excluding reads reading into the adaptor sequence and, if desired, also only showing the first pair when two reads overlap in the fragment. Not enabled, an intermediate check in before updating and verifying the impact on locus walkers everywhere.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3465 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-30 18:00:12 +00:00
ebanks
ffeb3fd80d
Thanks to Guillermo, I found a bug in the Unified Genotyper output: GL was posteriors instead of likelihoods. Not a huge deal because the priors were flat, but fixed nonetheless.
Also, needed to update Tribble.
Minor updates to the Beagle input maker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3461 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 19:28:26 +00:00
rpoplin
4e268ef6ac
Removing the Variant Recalibration Performance test because it isn't ready yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3460 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 18:27:25 +00:00
rpoplin
522dd7a5b2
Adding the variantrecalibration classes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3459 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 18:21:27 +00:00
rpoplin
2014837f8a
VariantOptimizer package is moved to core, renamed as VariantRecalibration, and added to the binary release package. VariantOptimizer walker is renamed to GenerateVariantClustersWalker and ApplyVariantClustersWalker renamed to VariantRecalibrator. Integration tests added, performance tests still to be done.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3458 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 18:20:18 +00:00
aaron
871cf0f4f6
Call out ROD types by their record type, instead of the codec type (which was clumsy). So instead of:
@Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFCodec.class))
you'd say:
@Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFRecord.class))
Which is more in-line with what was done before. All instances in the existing codebase should be switched over.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3457 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 14:52:44 +00:00
depristo
cc2bf549c8
Removing my unnecessary optimization. 10 lines later in the code the same optimization was applied. A monumental waste of time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3455 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 14:10:48 +00:00
aaron
a4d834cc01
fixing the test I broke
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3454 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 02:06:20 +00:00
depristo
f2e7582cfc
Reorganization of SW code for clarity. Total failure at raw optimization. Discovered that ~50% of reads being cleaned were perfect reference matches. New code comes with flag to look at NM field and not clean perfect matches. Can be turned off with command line option (needed for 1KG bams with bad NM fields). Going to rerun cleaning jobs due to accidental rebuilding of stable codebase and loss of 2 days of runtime.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3452 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-27 23:16:00 +00:00
ebanks
058441fa39
Trivial renaming of test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3441 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 16:56:42 +00:00
aaron
a2fab07258
fixed the build problem: there were two copies of the AnnotatorInputTable Codec and Feature in two different spots.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3439 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 14:47:15 +00:00
chartl
88a06ad81f
Changes to Depth of Coverage:
- For speedup with a large number of samples, base counts are done on a per read group level, then merged into counts on larger partitions (samples, libraries, etc)
+ passed all integration tests before next item
- Added additional summary item, a coverage threshold. Set by (possibly multiple) -ct flags, the summary outputs will have columns for "%_bases_covered_to_X"; both the per sample and the per sample per interval summary files are affected (thus md5s changed for these)
NOTE:
This is the last revision that will include the per-gene summary files. Once DesignFileGenerator is sufficiently general, and has integration tests, it will be moved to core and the per-gene summary from Depth of Coverage will be retired.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3437 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 03:39:22 +00:00
ebanks
0607f76a15
commenting out this test until I can figure out what the hell is going on with the codecs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3436 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 01:12:10 +00:00
ebanks
ae6c014884
Fixed UG parallelization bug. Better integration test to catch this in the future.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3432 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-25 21:03:45 +00:00
ebanks
434e920da9
Oops, forgot to update integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3431 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-25 20:37:45 +00:00
delangel
a280a0ff0d
a) Made HaplotypeScore default annotation. This changed several integration tests, whose MD5 is now updated.
...
b) Disabled BaseQualRankSumTest, the returned p-values differ wildly from Matlab/R-provided ones, cause TBD.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3419 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 22:25:17 +00:00
chartl
745d7c582f
added integration test for intervals with no coverage due to filtering
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3414 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 16:52:42 +00:00
chartl
88cb93cc3c
Changes to Depth of Coverage (added maximum base and mapping quality flags, with new integration tests; because they use b36, and the other test uses hg18, they are in a different class, since the integration test system can't change refs on the fly). Initial change to VariantAnnotator to allow it to see extended event pileups; you currently have to throw the -dels flag, and it's specified as "very experimental". Yet all the integration tests pass.
...
Homopolymer Run now does the "right" thing (e.g. single bases are represented as HRun = 0 rather than HRun = 1) for indels. AlleleBalance now does something close enough to correct.
Added a convenience method to VariantContext that will return the indel length (or lengths if a site is not biallelic).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3409 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 13:02:01 +00:00
depristo
6faf101c6c
Minor improvements to Callable Loci for public consumption
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3408 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 12:50:11 +00:00
depristo
a10fca0d5c
Genotyper now is using bytes not chars. Passes all tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3406 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 21:02:44 +00:00
depristo
6ce3835622
Removing unused methods in QualityUtils; ReferenceContext now converting all bases to upper case, but can be disabled with static boolean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3399 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 12:38:06 +00:00
depristo
5abac5c057
A few more char -> byte cleanups
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3398 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 00:02:06 +00:00
depristo
8a725b6c93
Restructuring of ReferenceContext and ReadWalkers to accept a ReferenceContext. Now ReferenceContext is byte[] backed not char[]. Please no more chars for the reference. All of the tests pass now. Coming check-ins are going to clean up the char / byte problems in the GATK
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3397 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 23:27:55 +00:00
aaron
ca386439be
only emit a warning if the tribble index is out of date, don't remove and replace it for them. Added a test case where the log4j appender checks the logging messages for the appropriate output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3393 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 15:12:48 +00:00
hanna
017ab6b690
Experimental versions of downsampler and Ryan's deduper are now available either as walker attributes or from the command-line. Not ready yet! Downsampling/deduping works in a general sense, but this approach has not been completely optimized or validated. Use with caution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3392 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 05:40:05 +00:00
aaron
7cfb9ff3dc
updates for Tribble 82, fixes for Ryan's case where multiple processes would attempt to read/write to the same index, and a couple other Tribble-centric bug fixes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3382 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 19:34:45 +00:00
chartl
e016491a3d
Major refactoring of Depth of Coverage to allow for more extensible partitions of data (now can do read group, sample, and library; in any combination; adding more is fairly easy). Changed the by-gene code to use clones of stats objects, rather than munging the interval DoCs. (Fix for Avinash. Who, hilariously, thinks my name is Carl.) Added sorting methods to ensure static ordering of header and body fields.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3377 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 16:58:13 +00:00
hanna
0791beab8f
Checking in downsampling iterator alongside LocusIteratorByState, and removing the reference implementation. Also implemented a heap size monitor that can be used to programmatically report the current heap size.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3367 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 21:00:44 +00:00
chartl
b7d21627ab
Changes to DepthOfCoverage (JIRA items) and added back an integration test to cover it. Alterations to the design file generator to output all transcripts (rather than choosing one at random).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3366 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 17:23:00 +00:00
ebanks
32389dc0a9
Fixed GQ estimate when chosen genotype isn't the most likely according to the GLs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3362 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-14 19:17:46 +00:00
hanna
88bd7a2045
Reenabling UG parallelization performance tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3360 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 16:28:08 +00:00
hanna
0490909285
Fixed epic generic paths fail.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3359 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 15:59:57 +00:00
hanna
7ef87e5126
An integration test based on validating pileup to test parallelism in reads, reference, and RODs. This test runs in less than a minute and fell over instantly in the case of the Tribble parallelism issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3358 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 15:40:43 +00:00
hanna
ceec525420
Got rid of stray unicode characters in copyright message.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3357 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 14:47:39 +00:00
ebanks
c81b910f73
Commenting out the parallelization test which is failing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3354 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 18:39:53 +00:00
aaron
cac98ba5ef
a couple of small documentation fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3353 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 17:40:27 +00:00
aaron
2c55ac1374
fixes for parallel processing problems with Tribble, a small bug in the resource pool, and some more documentation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3349 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 06:13:26 +00:00
ebanks
34969f304c
Adding dbsnp to all UG performance tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3347 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-11 15:48:05 +00:00
ebanks
140e43b93b
Checking in to see whether it fails. If I start getting bombarded with Bamboo error reports, I'm commenting it out...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3346 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-11 15:39:42 +00:00
ebanks
572b383fe2
Make VA annotate dbsnp again
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3345 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-11 14:06:53 +00:00
depristo
64ccaa4c6a
Walkers and integration tests that calculate and compare callable bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3328 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 21:33:47 +00:00
aaron
7d2df3f511
example windowed ROD walker for Kristian, and updates to Tribble
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3325 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 17:12:50 +00:00
rpoplin
57f254b13a
VE integration test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3324 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 13:58:25 +00:00
aaron
78409dca0d
turned off the progress output from tribble when making an index, and fixing a case where the index file isn't writable so we instead make the index in memory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3312 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 16:36:58 +00:00
aaron
a0d71540df
speed-up for VCF, adding code to the VCF reader to automagically make an index if one doesn't already exist, and a change to the VCF writer unit test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3305 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 20:19:42 +00:00
aaron
a68f3b2e9c
VCF moved over to tribble.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3302 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 17:28:48 +00:00
aaron
ad11201235
adding more ROD pile-up tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3301 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 16:01:11 +00:00
aaron
f497213933
DbSNP moved over to tribble
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3288 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-03 06:02:35 +00:00
ebanks
9dff578706
Added PG tag to bam header to let people know it's been cleaned.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3284 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 17:30:30 +00:00
ebanks
850f36aa61
Changes to the Unified Genotyper's arguments:
...
1. User can specify 4 confidence thresholds: for calling vs. emitting and at standard vs. 'trigger' sites.
2. User can cap the base quality by the read's mapping quality (not done yet).
3. Default confidence threshold is now Q30.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3281 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 16:44:24 +00:00
aaron
cbed0b1ade
Adding GeliText tribble track as the first enabled Tribble track. This means 'Variants' is no longer valid for a ROD type; use GeliText instead. I've updated all the references in the codebase.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3271 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-29 22:50:17 +00:00
aaron
7fbfd34315
adding the GELI ROD validation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3270 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-29 21:43:00 +00:00
depristo
5dce16a8f1
Better genotype concordance module. Code refactoring for clarity (please see below/after for educational purposes). Now reports variant sensitivity, concordance, and genotype error rate by default. Also aggregates this data across all samples, so you get per-sample stats plus overall stats for each of these in the allSamples row.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3265 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-28 13:10:11 +00:00
ebanks
df31eeff9f
minor change
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3259 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-26 06:05:29 +00:00
depristo
7f4d5d9973
Ti/Tv by AC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3252 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 17:56:29 +00:00
rpoplin
e7c0ded40e
Fixed long-standing bug in GenotypeConcordance module of VariantEval which caused incorrect numbers to be displayed in the concordance table. The format of the concordance table has changed. Added a concordance summary table which gives overall genotype concordance summary stats by sample. None of the VE integration tests contained genotype information so I added a comp track with genotypes to one of the tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3247 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 15:48:41 +00:00
aaron
f050beada6
make sure we do delete the temp file we create
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3244 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 05:32:49 +00:00
aaron
536f22f3bd
adding VC adaptor for GELI, along with unit tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3243 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 05:28:39 +00:00
hanna
32d86cf457
Rev the reservoir downsampler to support partitioning through a functor.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3232 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 19:50:26 +00:00
ebanks
e9e844fbf5
1. Reverting: dbsnp automatically is a comp
...
2. Fixing logic for min Qscore calculation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3230 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 18:51:35 +00:00
asivache
532263ea25
Oooops, forgot to update the test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3229 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 18:38:24 +00:00
ebanks
4abd3b0b7b
Fixing known/novel calc now that dbsnp isn't a default comp track
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3223 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 05:43:59 +00:00
ebanks
3b5673d967
1. Removed -all; by default all modules are used; use -none for no modules.
...
2. Don't make dbsnp track be a comp by default (to cut back on output). Please let me know if someone wants this back for some reason.
3. Cleaned up dbsnp module output to print the right numbers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3220 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 02:46:42 +00:00
aaron
4e18c54bb8
fixing a couple of commented out portions of the VCFReader test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3219 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 22:20:35 +00:00
aaron
80c4f88a72
removing the Variation interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3216 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 18:56:45 +00:00
hanna
c1e53d407d
The copyright tag that I copied/pasted from a LaTeX document into IntelliJ had unicode quote characters embedded in it. These characters were invisible inside IntelliJ but cause compile warnings for Ryan and Aaron, who for whatever reason have a different default charset. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3203 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 15:26:32 +00:00
aaron
b5f6f54968
Almost done removing any trace of the old Variation and Genotype interfaces.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3202 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 14:52:15 +00:00
hanna
1bc26f69e9
An attempt to cleanup the Utils directory. Email to follow.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3198 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 23:00:08 +00:00
hanna
c08936d6f4
Added a reservoir downsampler which can sample elements in an iterator uniformly from a stream (see Vitter 1985). Thanks to Eric and Andrey for the pointer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3197 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 20:48:14 +00:00
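Uniform sampling from a stream of unknown length, as in the reservoir downsampler entry above, can be sketched with the basic reservoir scheme (often called Algorithm R). This is an illustration only and may differ from the committed implementation: the class `ReservoirSampler`, its `sample` method, and the limit of fewer than 2^31 elements are assumptions of the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of basic reservoir sampling over an iterator. */
final class ReservoirSampler {

    /**
     * Returns up to k elements chosen uniformly at random from the stream,
     * reading it once and holding at most k elements in memory.
     * Assumes the stream has fewer than 2^31 elements.
     */
    static <T> List<T> sample(Iterator<T> stream, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                // The first k elements fill the reservoir directly.
                reservoir.add(item);
            } else {
                // Element number `seen` replaces a random slot with probability k / seen.
                int slot = rng.nextInt(seen);
                if (slot < k) {
                    reservoir.set(slot, item);
                }
            }
        }
        return reservoir;
    }
}
```

When the stream holds k or fewer elements, every element is returned in its original order; otherwise each element ends up in the result with probability k divided by the stream length.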
ebanks
c44f63c846
Fixing the performance tests: we need to catch the RuntimeException (not samtools' RuntimeIOException). Also, CountCovariates doesn't need the catch.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3196 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 14:28:12 +00:00
ebanks
abf48cee05
Moving over to VariantContext from Variation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3195 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 06:56:29 +00:00
ebanks
d73c63a99a
Redoing the conversion to VariantContext: instead of walkers passing in a ref allele, they pass in the ref context and the adaptors create the allele. This is the right way of doing it.
...
Also, adding some more useful integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3194 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 05:47:17 +00:00
aaron
be7cbf948b
adding a catch for the exception thrown by samtools when it attempts to close /dev/null in the performance tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3186 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-16 17:41:48 +00:00
ebanks
7adff5b81a
Renaming for consistency
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3180 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 20:36:19 +00:00
ebanks
e702bea99f
Moving VE2 to core; calling it "VariantEval" (one more checkin coming)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3179 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 20:25:47 +00:00
chartl
ac6f6363ce
Execs() temporarily disabled after removal of bam file. New tests forthcoming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3178 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 20:11:56 +00:00
ebanks
ac9dc0b4b4
Removing VariantEval (v1); everyone should be using VE2 now. Docs coming ASAP.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3177 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 19:53:02 +00:00
ebanks
5f7564bf0a
Better naming of output columns
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3175 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 18:08:07 +00:00
aaron
e682460c1f
add a fix so that XL arguments won't cancel out -BTI arguments, fixed a bug for Ben where the ROD -> interval list conversion was throwing an exception, and some old code removal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3174 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 16:31:43 +00:00
ebanks
04909fa6ad
Removing arbitrary selects
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3169 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 17:46:39 +00:00
weisburd
b930dc52a5
Integration test for GenomicAnnotator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3167 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 14:43:25 +00:00
ebanks
dde092fb61
Added the ability in VE2 to select which eval modules to run, so that you aren't forced to use all of them. You can use --list to list all of the possible modules to run.
...
Heads up everyone: by default, *no* modules are run. Please add "-all" to your scripts to maintain the previous behavior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3161 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 22:15:58 +00:00
hanna
8573b0bc6f
Refactoring intervals, separating the process of parsing interval lists,
sorting and merging interval lists, and creating RODs from intervals. This
gives Doug the ability to keep using our interval list parsing code when
sorting intervals on our behalf.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3159 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 15:50:38 +00:00
ebanks
e413882302
Generalizing the SequenomValidationConverter to be able to take in any arbitrary rod type (provided it can be converted to VariantContext).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3155 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-12 20:42:18 +00:00
ebanks
d06c7835d8
Adding performance tests for the indel realigner; should take ~3 hours.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3151 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-11 04:45:22 +00:00
ebanks
961ca05abc
Removed outdated Sequenom rod and renamed HapMapGenotypeROD to HapMapROD.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3149 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-11 01:43:07 +00:00
ebanks
fa01876255
UnifiedGenotyper performance tests (WG, WEx); currently takes just over an hour.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3148 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 19:42:29 +00:00
rpoplin
c2a37e4b5c
Variant Quality Score modules in VariantEval2 no longer create huge lists which hold all of the quality scores encountered and instead cast the quality score to an integer and use hash tables. Bug fix for files in which all the quality scores are set to -1.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3146 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 18:36:06 +00:00
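The change in c2a37e4b5c replaces a list of every observed score with per-integer counts. A minimal sketch of that idea follows, assuming a plain HashMap keyed by the truncated score. The class and method names are hypothetical and the real VariantEval2 modules may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: count quality scores per integer bin instead of storing each one.
final class QualityHistogramSketch {
    private final Map<Integer, Long> counts = new HashMap<>();

    /** Records one score; the cast truncates toward zero, so -1.0 lands in bin -1. */
    void add(double qual) {
        counts.merge((int) qual, 1L, Long::sum);
    }

    /** Number of scores recorded in the given integer bin. */
    long count(int bin) {
        return counts.getOrDefault(bin, 0L);
    }

    /** Number of distinct bins seen so far; memory grows with this, not with the input size. */
    int distinctBins() {
        return counts.size();
    }
}
```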
ebanks
71f38a9199
Adding performance tests for the recalibrator (Whole Genome and Whole Exome tests).
...
Should take ~3 hours to run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3145 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 18:30:59 +00:00
ebanks
fba48b515a
Heads up everyone:
...
For consistency, these tools should be writing to the walker's output stream and no longer use the -vcf argument.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3140 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 05:37:25 +00:00
chartl
7025f5b51d
Added an auxiliary table to DepthOfCoverage, which is the cumulative equivalent of the locus table (got tired of doing the calculation by hand). Also took care of a trailing tab in the per-locus output table.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3138 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 19:37:17 +00:00
aaron
9f6377f7fb
added a performance test build option (for the upcoming performance test suite), and added a sample performance test for VariantEval.
...
IMPORTANT: it was really redundant that we had -Dsingle and -Dsingleintegration to run single unit tests and integration tests, now you can just use -Dsingle to run a single test for performance, unit, and integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3136 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 15:37:15 +00:00
aaron
4014a8a674
A long overdue correction; all unit tests now end in 'UnitTest'. This was something we wanted to do for a while, and now with the performance tests coming, it was a good time to clean-up. Please label any new test appropriately: *UnitTest and *IntegrationTest are the two valid file name patterns for tests.
...
Thanks!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3135 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 06:14:15 +00:00
aaron
8fd59c8823
Modified the report system based on Ryan's feedback: tables are now created independently to avoid the permutation problem when they were all compressed in rows, and removed our dependency on FreeMarker. The Grep format stays the same.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3130 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 20:39:55 +00:00
depristo
918b746798
More detailed validation output. Fixes for genotyping overflow -- these are temporary and need to be properly resolved
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3129 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 16:38:28 +00:00
rpoplin
60c227d67f
Added new VE2 module to create a plot of titv ratio by variant quality score
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3125 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 15:19:27 +00:00
chartl
d7880ef7ad
Forgot to uncomment the AlignerIntegrationTest before committing. And yes, matt, commenting it out is, in fact, easier than just setting my classpath.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3110 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 17:17:16 +00:00
chartl
f7d1b8f5de
CoverageStatistics has now replaced DepthOfCoverage -- old DoC is in the archive.
...
Also, I can't be bothered to fix the spelling of "oldepthofcoverage" to contain the necessary number of D's. Be content that it does, however, contain the requisite number of O's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3109 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:27:23 +00:00
aaron
585cc880a2
changed jexl expressions to jexl names in the VariantEval2 output, fixed integration test, and fixed a problem where a line was getting dropped in CSV output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3108 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:23:14 +00:00
bthomas
b4f6f54502
Reorganizing the way interval arguments are processed
...
Most of the changes occur in GenomeAnalysisEngine.java and GenomeLocParser.java:
-- parseIntervalRegion and parseGenomeLocs combined into parseIntervalArguments
-- initializeIntervals modified
-- some helper functions deprecated for cleanliness
Includes new set of unit tests, GenomeAnalysisEngineTest.java
New restrictions:
-- all interval arguments are now checked to be on the reference contig
-- all interval files must have one of the following extensions: .picard, .bed, .list, .intervals, .interval_list
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3106 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 12:47:48 +00:00
aaron
3d3d19a6a7
the last-mile commit for Tribble integration. The system is now ready for Tribble to be turned on, as soon as we've removed any dependencies in the ROD code on interfaces that aren't in the Tribble library (i.e. the Variation or Genotype interface on RODs). All of the walkers should be up to date.
...
A caveat: for anyone asking for all of the RODs back from the RefMetaDataTracker (if you're not using the facilities to get the track by name), you'll now be getting back a collection of GATKFeature objects. Each object will contain the track name and a method for getting the underlying object (getUnderlyingObject()), which will be the traditional RodVCF, rodDbSNP, etc. This layer is needed so we can integrate Tribble tracks (which don't natively have names). Calls that ask for RODs by name will still get back the traditional reference ordered data objects (RodVCF, rodDbSNP, etc).
Sorry for the inconvenience! More changes to come, but this is by far the largest (and has the greatest effect on end users).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3104 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 22:39:56 +00:00
chartl
dc802aa26f
Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
depristo
8ea98faf47
Deleting the pooled calculation model -- no longer supported.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3088 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 11:44:27 +00:00
aaron
074ec77dcc
First go of the new output system for VE2. There are three different report types supported right now (Table, Grep, CSV), which can be
specified with the reportType command line option in VE2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3083 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 03:59:32 +00:00
kshakir
20e3ba15ca
Added an optional argument -rgbl --read_group_black_list to filter read groups.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3079 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 19:38:57 +00:00
ebanks
73a14a985b
Moving VariantsToVCF to core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3078 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:55:12 +00:00
ebanks
14bf6923a8
HapMap-to-VCF now works fine within Variants-to-VCF. Added integration test for it and removed old code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3077 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:34:59 +00:00
ebanks
4398a8b370
Updated. Now uses VariantContext and is truly "variants" to vcf (i.e. not just GELI to vcf).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3074 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 04:53:31 +00:00
aaron
5079f35e40
better method names for read based reference ordered data access.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3069 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 16:13:31 +00:00
aaron
7462a0b2d1
cleaned up the VariantContextAdapter tests, fixed the double comparisons in equals() in RodGeliText (nice MathUtils.compareDoubles Kiran)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3064 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 15:18:30 +00:00
aaron
a69b8555dd
Geli to variant context.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3063 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 06:45:29 +00:00
aaron
eafdd047f7
GLF to variant context. Added some methods in GLF to aid testing; and added a test that reads GLF, converts to VC, writes GLF and reads back to compare.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3062 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 03:43:25 +00:00
asivache
ee1dc6092f
Test updated. Now we do not throw an exception when locus interval is out of bounds, we just return silently a reference context trimmed to the current shard boundaries. New test checks for trimming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3058 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 17:37:52 +00:00
aaron
439c34ed38
clean-up before annotating VariantEval2 for output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3055 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 07:39:20 +00:00
ebanks
4c4d048f14
Moving VariantFiltration over to use VariantContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3048 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 18:35:23 +00:00
ebanks
c88a2a3027
Fixing/cleaning up the vcf merge util
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3047 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 15:13:32 +00:00
ebanks
03480c955c
And now the UnifiedGenotyper can officially annotate genotype (FORMAT) fields too.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3039 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 04:58:37 +00:00
ebanks
0311980668
The VariantAnnotator can now officially annotate genotype (FORMAT) fields.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3037 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 03:30:14 +00:00
aaron
8a5f0b746e
some cleanup for the output system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3032 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 12:54:39 +00:00
ebanks
0247548400
Fixed one test and (temporarily) punted on another
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3030 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 06:22:48 +00:00
ebanks
ee0e833616
Some significant changes to the annotator:
...
1. Annotations can now be "decorated" with any arbitrary interface description - not just standard or experimental.
2. Users can now specify not only specific annotations to use, but also the interface names from #1. Any number of them can be specified, e.g. -G Standard -G Experimental -A RankSumTest.
3. These same arguments can be used with the Unified Genotyper for when it calls into the Annotator.
4. There are now two types of annotations: those that are applied to the INFO field and those that are applied to specific genotypes (the FORMAT field) in the VCF (however, I haven't implemented any of these latter annotations just yet; coming soon).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3029 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 05:38:32 +00:00
ebanks
4340601c26
-Pushed base quals back down into SAMRecord; if -OQ is used, the SAMRecord quals get updated automatically
...
-Better integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3020 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 16:00:10 +00:00
hanna
2525ecaa43
Oops. Commented out some tests to improve performance and then checked in the commented out tests. Reverted.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3012 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 16:34:50 +00:00
hanna
6dd5f192e7
Performance improvements for RODs in conjunction with new sharding system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3010 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 14:54:12 +00:00
aaron
10e76abbbc
adding some VE2 report infrastructure; work-in-progress.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3008 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 03:57:42 +00:00
ebanks
202231141c
-Push the --use_original_qualities argument into the engine.
...
-Check that base and qual strings are the same lengths
-Fix one more bug in the clipper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3006 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 02:06:11 +00:00
ebanks
411d25c8d1
-Integration tests for walkers that use original quals.
...
-framework for pushing -OQ into GATK (not done)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3004 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 18:46:31 +00:00
aaron
e365d308d4
add a new JEXLContext that lazy-evaluates JEXL expressions given the VariantContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3003 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 16:00:55 +00:00
ebanks
73d6167bd6
Fixing broken integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2998 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 23:18:49 +00:00
depristo
4dd7c5972c
Unit tests for -XL arguments; expt. annotation calculating the GC content within 100 bp of the current SNP
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2997 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 21:08:14 +00:00
aaron
ecb59f5d0d
removed old tests and old code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2995 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:57:01 +00:00
aaron
88a48821ea
removed the dependence on removeRegion() in GenomeLocSortedSet
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2993 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:35:49 +00:00
depristo
b39b5edca8
Bug fix in variant eval 2. Preliminary (slow and buggy) support for -XL exclude lists.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2991 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:23:12 +00:00
aaron
1eb5f97255
fixed dropping single base intervals from deleteRegion, moving onto performance fixes.
...
(stop - start is length-1 on closed intervals, so we need to check greater than OR equal to zero)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2990 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:14:21 +00:00
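The off-by-one described in 1eb5f97255 can be reproduced in a few lines. The sketch below is not the GenomeLocSortedSet code. It is a hypothetical helper that removes one closed interval from another and keeps single-base leftovers by testing `>= 0` rather than `> 0`. It assumes start <= stop for both intervals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of deleting a region from a closed (inclusive) interval.
final class ClosedIntervalSketch {
    private ClosedIntervalSketch() {}

    /** Pieces of [start, stop] left after removing [delStart, delStop]; all bounds inclusive. */
    static List<long[]> subtract(long start, long stop, long delStart, long delStop) {
        List<long[]> kept = new ArrayList<>();
        if (delStop < start || delStart > stop) {
            // No overlap: the original interval survives untouched.
            kept.add(new long[] {start, stop});
            return kept;
        }
        long leftStop = delStart - 1;
        // stop - start equals length - 1, so a one-base piece gives 0 and must pass.
        if (leftStop - start >= 0) kept.add(new long[] {start, leftStop});
        long rightStart = delStop + 1;
        if (stop - rightStart >= 0) kept.add(new long[] {rightStart, stop});
        return kept;
    }
}
```

For example, removing [11, 19] from [10, 20] leaves the single-base pieces [10, 10] and [20, 20]. With a strict `> 0` test both would be dropped.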
aaron
661a043cef
adding methods to get RODs by name or type in read traversals, performance improvements to RODs for Reads in general, and some more Tribble infrastructure.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2984 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 21:13:39 +00:00
hanna
a7ba88e649
Rework the way the MicroScheduler handles locus shards to handle intervals that span shards
with less memory consumption.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2981 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 18:40:31 +00:00
aaron
dde9fd8a15
some rods-for-reads cleaning and performance improvements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2979 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:54:58 +00:00
depristo
4f4555c80f
PPV and Sensitivity added to validation tool output; support for arbitrary -sample arguments to subset variant contexts by sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2978 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:28:31 +00:00
ebanks
40d305bc7e
Added test of Nway cleaning for Matt; thanks to Aaron for the help.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2977 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 21:00:41 +00:00
depristo
486bef9318
Support for validationRate calculation in variant eval 2; better error messages for failed genome loc parsing; tolerance to odd whitespace in plinkrod, and fix for monomorphic sites in vcf2variantcontext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2976 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 16:25:16 +00:00
ebanks
7ddd45d059
Hmm. I thought I removed this already.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2973 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 03:09:13 +00:00
ebanks
1a576525e9
misc improvements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2972 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 03:00:28 +00:00
ebanks
6e855809e1
Renaming and moving relevant tools into a sequenom directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2971 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 02:31:10 +00:00
chartl
0a49dffa8f
Row/Column names are now R-friendly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2966 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 19:01:03 +00:00
ebanks
e5475a7ba9
re-enabling PlinkToVCF integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2964 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 17:35:49 +00:00
ebanks
5a20bf0e64
3 changes to UG which break integration tests:
...
1. emit AA,AB,BB likelihoods in the FORMAT field for Mark
2. remove constraint that genotype alleles (in the GT field) need to be lexicographically sorted.
3. Add bam file(s) used by genotyper to header for Kiran
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2963 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 17:16:47 +00:00
ebanks
9f3b99c11b
Moving UnifiedGenotyper and VariantAnnotator over to VariantContext system.
...
Removing obsolete genotyping classes.
First stage of removing dependence on old Genotype class.
More changes to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2960 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 03:41:07 +00:00
chartl
bca9bdcc68
Add integration test for quartiles overflowing on interval reduce
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2957 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 16:18:45 +00:00
hanna
a7fe07c404
A few stopgap fixes to get the GATK to the point where the old sharding
infrastructure can be torn down:
1) New sharding system emulates old MonolithicSharding mechanism.
2) Better awareness of differences between fasta and BAM files when creating
shards.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2948 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 21:01:25 +00:00
hanna
dd6122f682
Fixed another bug in the original sharding system. Updated integration tests
as appropriate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2947 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 15:32:18 +00:00
hanna
ee2ec7ced9
Fix off-by-one error in original implementation of read sharding. Tested by
awking output of BamToFastq vs. samtools until the outputs matched exactly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2945 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-06 18:52:53 +00:00
depristo
ee913eca07
Forgot to check in fix this morning
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2943 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 21:07:19 +00:00
chartl
8738c544f1
Minor refactoring of CoverageStatistics to allow simultaneous output of per-sample and per-read group statistics.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2940 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 17:06:52 +00:00
hanna
7104a3a96c
Fix for accumulator exception when running reduce by interval walkers without
intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2935 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 01:04:08 +00:00
aaron
366771d5a6
another test-with-multiple outputs fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2934 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 22:46:15 +00:00
chartl
706d49d84c
Commit for Aaron
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2932 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:29:07 +00:00
aaron
54f04dc541
forgot to uncomment the auto-deletion of temp files...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2930 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 20:29:42 +00:00
aaron
80cc6bbeb4
add a way to test files generated by a walker that aren't command-line arguments; added some example code in CoverageStatisticsIntegrationTest for Chris.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2929 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 20:20:58 +00:00
ebanks
0dd65461a1
Various improvements to plink, variant context, and VCF code.
...
We almost completely support indels. Not yet done with plink stuff.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2926 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 17:58:01 +00:00
aaron
c8077b7a22
Waypoint check-in: a couple of changes for Tribble, and adding some options to the integration test for passing in auxiliary files that aren’t “%s” command line options.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2925 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 16:02:21 +00:00
aaron
ca2cd9d4f5
a little clean-up: move setting the bases of generated reads into Artificial SAM Utils now that the clean read injector test is gone.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2919 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 16:31:45 +00:00
aaron
790d2a7776
adding the initial ROD for Reads support; more convenience methods in ReadMetaDataTracker to come.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2918 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 15:56:44 +00:00
ebanks
0e9a6826b0
Update to VCF code to get it up to spec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2917 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 06:12:42 +00:00
ebanks
74a5223b11
oops - didn't mean to check this in
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2914 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:28:22 +00:00
ebanks
5f3c80d9aa
1. To make indel calls, we need to get rid of the SNP-centricity of our code. First step is to have the reference be a String, not a char in the Genotype. Note that this is just a temporary patch until the genotype code is ported over to use VariantContext.
...
2. Significant refactoring of Plink code to work in the rods and use VariantContext. More coming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2913 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:26:40 +00:00
aaron
d8fedd59be
docs, cleanup, and some improvements to the iterators.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2901 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 22:36:04 +00:00
chartl
87f8fb7282
Quick commit in advance of Aaron's. Just a bunch of refactoring (private classes separated out, put in proper package). Also support added for coverage by read group rather than sample.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2897 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:39:47 +00:00
depristo
9a6b384adb
Support for no qual fields in VCF; better support for Mendelian violation calculations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2893 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 00:29:17 +00:00
aaron
246fa28386
RODs for reads phase 2: modified RODRecordList to implement List<ReferenceOrderedDatum> so I could stub it out for testing, added a FlashBackIterator which is needed to prevent the ResourcePool from opening infinity+1 iterators, and some other interfaces to make unit testing much smoother.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2892 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 22:48:55 +00:00
chartl
3d92e5a737
Initial commit of integration test(s) for CoverageStatistics, currently in progress [midway commit is for Matt]
...
Modifications to CoverageStatistics - now includes and extends much of the behavior of DepthOfCoverage (per-base output, per-target output).
Additional functionality (coverage without deletions, base counts, by read group instead of by sample) is upcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2888 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 20:25:07 +00:00
hanna
553d39bb00
Clean up the code a bit following the introduction of reduceByInterval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2887 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 01:20:22 +00:00
hanna
199b43fcf2
Reduce by interval alterations to interface with new sharding system. This checkin will be followed by a
simplification of some of the locus traversal code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2886 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 00:16:50 +00:00
aaron
fef1154fc8
starting on RODs for Reads: made RODRecordList implement list<RODatum> (so we can sub in fake lists during testing), and removed unnecessary generic-ness. Removed BrokenRODSimulator, which isn't being used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2884 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 22:11:53 +00:00
aaron
5546aa4416
adding code to deal with the off-spec situation where our minimum likelihood is above the GLF max of 255.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2871 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 22:27:39 +00:00
ebanks
8b555ff17c
Killed the old cleaner code. Bye bye.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2868 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:49:58 +00:00