aaron
ad98512f6c
adding changes so that we look at the headers already loaded by the engine for samples and other VCF utils, and not create readers for each file to get them (this caused Tribble to regerenate indices if the index file can't be written to disk).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3518 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:21:12 +00:00
weisburd
c1b7bcc786
Fixed handling of mitochondrial genes - added special cases such as ATT being a start codon in mitochondria. Added warning if a gene doesn't start with Met or end in a stop codon
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3517 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:15:47 +00:00
weisburd
4f1181974b
Added toString() method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3516 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:12:57 +00:00
ebanks
9b2fcc4711
Refactoring of the annotation system:
...
1. VA is now a ROD walker so it no longer requires reads (needs a little more testing)
2. Annotations can now represent multiple INFO fields (i.e. sets of key/value pairs)
3. The chromosome count annotations have been pulled out of UG and the VCF writer code and into VA where they belong. Fixed the headers too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3513 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:05:51 +00:00
hanna
84563b37e5
Partial flattening of the hanger data structure. Hanger data structure is
...
not currently as flat as it could / should be, but it's already comparable
to the speed of the reference implementation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3512 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 16:28:49 +00:00
chartl
8f9e3e8ad7
Commit for Kiran; but this is now working, barring little exceptions that I've yet to run across...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3511 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 14:21:19 +00:00
aaron
6d5556939d
updating Tribble with a couple of important Tabix fixes, and updating the variant eval integration tests to run each test with both plain vcf and gzipped tabix (added the tabix version
...
to the vlidation directory), using the same md5sum.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3509 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 01:47:04 +00:00
hanna
c2858c8988
Minor performance enhancement. Checkpoint commit before major performance
...
overhaul.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3504 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 21:39:39 +00:00
chartl
5ed2818ffb
Forgot to commit code i relied upon
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3503 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 21:01:35 +00:00
chartl
736098b58d
A quick commit before running home. This is a re-factored version of the OppositeHomozygoteClassifier which will work with deNovo violations as well. Some code still needs to be migrated from OHC which is wy that walker isn't yet deleted. This'll be up and running tonight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3502 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 20:47:01 +00:00
delangel
de134c226d
Removed ability of users to specify annotations to recompute, cleanups.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3501 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 19:17:59 +00:00
ebanks
4d1a6b3d99
quick changes for G
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3500 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 16:33:27 +00:00
delangel
907931c902
a) Update annotations when creating new vcf with Beagle's imputed data. Since genotypes may (will) change based on imputation, several annotations need to be updated. By default, AC, AF, AN and AB will be updated. User can force extra annotaqtions to be updated with -A <annotation> argument.
...
b) Several cleanups and beautifications.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3499 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 15:12:04 +00:00
chartl
933133ee28
Initial commit of the opposite homozygote classifier. Currently does the following, given a trio vcf:
...
+ Identifies opposite homozygote sites
+ Identifies the parent from whom it is expected that a null allele was inherited (or whether it was a putative genotype error; e.g. mom=homref, dad=homref, child=homvar)
+ Labels each opposite homozygote with its homozygous region in the child (e.g. region 1, region 2)
+ Labels each opposite homozygote with the size of the homozygous region in which it was found, the number of child homozygotes in the region, and the number of opposite homozygote violations within that region
To come:
+ Classification of sites as likely tri-allelic
Note that this is very experimental
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3498 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 03:56:07 +00:00
hanna
199e4208cd
Bug fixes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3497 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 00:30:33 +00:00
hanna
52ab9f2417
Feature parity between LocusIteratorByState, DownsamplingLocusIteratorByState, including pushing mrl /
...
the LocusOverflowTracker into LocusIteratorByState. Note that the 'Matt Hanna exception', is still enabled
because I haven't yet validated the performance of the DownsamplingLocusIteratorByState when running
without downsampling.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3496 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 22:58:21 +00:00
hanna
5c4d070566
Push Mark's changes in LocusIteratorByState into DownsamplingLocusIteratorByState
...
in preparation for merging the two into one.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3495 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 17:29:30 +00:00
depristo
6eeb1693ca
JEXL2 upgrade. Improvements to JEXL processing including dynamically resolving variable -> value bindings instead of up front adding them to a map. Performance improvements and code cleanup throughout.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3494 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 00:33:02 +00:00
hanna
c1ecf75dd5
Update to the latest rev of the picard sharding patch. Includes updates reflecting
...
the imminent move of IlluminaUtil into picard public.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3493 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 20:33:21 +00:00
delangel
c503f01dcf
More cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3492 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 17:41:38 +00:00
delangel
d4c66d6191
a) Small cleanup
...
b) Fix major issue with Beagle likelihood converter: if likelihood triplets from UG end up being too low, then Beagle input file will be produced with 0.00,0.00,0.00 triplet. If all samples at a marker have this issue, Beagle will effectively produce junk. To fix, likelihoods are renormalized before converting to linear space.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3491 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 17:31:59 +00:00
depristo
cfa18f6743
Fixing missed update with new Allele in it
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3490 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 23:56:34 +00:00
depristo
3ea506fe52
No more new Allele() -- must use create. Allelel simple alleles are now cached for efficiency reasons. VCF4 codec optimizations -- 4x performance in general. Now working in general but hooked up to the ROD system now as VCF4. WARNING -- does not actually work with indels, genotype filters, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3489 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 23:03:55 +00:00
delangel
ef47a69c50
a) First fully functional (sort of) version of walker that parses Beagle imputation output files and produce a vcf with imputed genotypes.
...
More doc/info to follow shortly. Issues still to be solved:
a) Walker changes all genotypes based on Beagle data, but annotations on the original VCF are unchanged. They should in theory be recomputed based on new genotypes.
b) Current implementation is ugly, dirty unwieldy and will necessitate a refactoring soon so I can keep my pride. Most aesthetically affronting issue right now is that we read the full Beagle files at initialization and keep them in memory, but a more delicate implementation would just read from files on a marker by marker basis. Issue that currently prevents this is that BufferedReader() instances don't seem to play nice when called from the map() function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3488 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 20:37:25 +00:00
depristo
b811e61ae1
Optimized, nearly complete VCF4 reader 2-4x faster than the previous implementation, along with a VCF4 reader performance testing walker that can read 3/4 files, useful for benchmarking
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3487 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 18:11:38 +00:00
aaron
6482b87741
adding the super experimental, half-broken, generally crippled, awkwardly commented, header ignoring vcf4 code. Don't use this, unless you're a developer for VCF4. If so, remove the exception from the constructor so that it won't always exception out.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3486 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 07:38:46 +00:00
aaron
0b03e28b60
updating the tribble library to include the reference dictionary reading / writing. We now check the dictionaries of any tracks that have them against the reference (all new tribble tracks and out-of-date tracks will have this). Also renamed some classes to be more reflective of their function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3485 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 06:34:26 +00:00
hanna
3d055e3d16
Fail fast if users try to parallelize a read walker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3484 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-03 18:14:33 +00:00
hanna
7d79848f40
Better error message when bam file / list file with wrong extension is
...
supplied.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3483 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-03 17:52:48 +00:00
ebanks
597b3744ab
Always use phasing info when converting genotypes to strings
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3482 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-03 17:50:50 +00:00
depristo
e2b41082af
GATK now does automatic adaptor filtering in locus iterators (but not expt. downsampling iterator). General support for LocusIteratorFilters just like read filters but only applying at particular bases. Updated tools with new MD5 sums due to adaptor bases in their integrationtest data. Not that as a side effect here reads close to each other with odd orientations are also filtered out. Updated minor argument to VariantRecalibrator to change the qStep value on the command line
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3481 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 22:26:32 +00:00
aaron
8ec091d6d2
re-enabling regeneration of the tribble index if it's out of date. Also moved the class that can detect text in the log4j stream (useful in testing to make sure appropriate messages are generated).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3480 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 17:45:51 +00:00
asivache
f0c379dde8
Unconsequential changes in report formatting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3479 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 17:43:25 +00:00
weisburd
3ab936181c
Supports the join feature of GenomicAnnotator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3478 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:29:57 +00:00
weisburd
f5f7217413
Implemented joins
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3477 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:28:53 +00:00
weisburd
09c3b15af3
Implemented joins
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3476 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:28:06 +00:00
weisburd
e14ae471a0
Refactored some of the small utility methods
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3475 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:26:00 +00:00
weisburd
898a78e97d
Added toString()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3474 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:24:25 +00:00
weisburd
12c3e3ecda
Added back the check for values.size() != header.size(). Now exception will be thrown if number of columns in a record doesn't equal number of columns in the header
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3473 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:23:05 +00:00
rpoplin
290771a8c2
Automatic cutting of recalibrated variant calls using ApplyVariantCuts. VariantRecalibrator produces the tranches plot alongside the optimization curve. Specify the levels using -tranche 1.0 -tranche 5.0 etc
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3472 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 15:03:00 +00:00
ebanks
4a555827aa
Removing more toUpperCase sanity checks
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3471 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 14:38:39 +00:00
ebanks
56e504789a
trivial change: toUpperCase no longer necessary
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3470 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 14:00:47 +00:00
rpoplin
87fe60fe4f
Fix for Sendu. new Process and p.waitFor() don't seem to work on his farm. Throws an IOException. This was a problem way back with AnalyzeCovariates too.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3469 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 11:37:10 +00:00
ebanks
7f0c638653
Fix for the indel cleaner: I forgot to "unclip" the cigar string (even though the clipped bases were removed) before using it as an alternate consensus in a particular instance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3468 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-01 02:07:20 +00:00
depristo
21427211c0
Personal MD5 database system now live. WalkerTest now maintains a database of result files associated with MD5 results in integrationtest/, and provides command lines for diff-ing expected to current md5 results when encountering failed intergration tests. The suite currently takes 200Mb to store. Update and run intergrationtest to build your very own expectation database for future development work.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3466 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-31 16:06:16 +00:00
depristo
2b02324587
Support for detecting and automatically excluding reads reading into the adaptor sequence and, if desired, also only showing the first pair when two reads overlap in the fragment. Not enabled, an intermediate check in before updating and verifying the impact on locus walkers everywhere.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3465 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-30 18:00:12 +00:00
ebanks
eb25e41111
minor update to new tribble name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3462 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 20:23:25 +00:00
ebanks
ffeb3fd80d
Thanks to Guillermo, I found a bug in the Unified Genotyper output: GL was posteriors instead of likelihoods. Not a huge deal because the
...
priors were flat, but fixed nonetheless.
Also, needed to update Tribble.
Minor updates to the Beagle input maker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3461 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 19:28:26 +00:00
rpoplin
4e268ef6ac
Removing the Variant Recalibration Performance test because it isn't ready yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3460 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 18:27:25 +00:00
rpoplin
522dd7a5b2
Adding the variantrecalibration classes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3459 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 18:21:27 +00:00