weisburd
e26a273ef5
Turned the test back on
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3582 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 22:57:42 +00:00
hanna
48cbc5ce37
Merging the sharding-specific inherited classes down into the base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3581 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 22:36:13 +00:00
hanna
612c3fdd9d
First pass at eliminating the old sharding system. Classes required for the original sharding system
...
are gone where I could identify them, but hierarchies that split to support two sharding systems have
not yet been taken apart.
@Eric: ~4k lines.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3580 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 20:17:31 +00:00
delangel
b694ca9633
Cleanup: Don't require likelihood ROD in Beagle parameters when generating output VCF. Likelihoods file is only an input to Beagle but the Walker that generates a VCF doesn't need it, so it's silly to ask for it and it's error-prone.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3579 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 17:45:48 +00:00
hanna
c1595a383a
More bugfixes for cases where no sample name is present.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3578 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 16:46:02 +00:00
aaron
3d049204ed
some refactoring for the variant eval output system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3576 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 05:34:31 +00:00
hanna
db1383d0b2
Rev the latest version of Picard.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3575 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 23:55:07 +00:00
weisburd
5b370ffc62
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3574 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 20:42:58 +00:00
hanna
5972ad1199
Fixes to mrl integration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3573 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 20:40:10 +00:00
ebanks
b75ded61b8
Removing obsolete rod; no longer needed given previous addition to SampleUtils.
...
JIRA GSA-318
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3572 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 20:03:14 +00:00
kshakir
c671864228
Re-allowing blacklist by read group id.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3571 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:45:44 +00:00
ebanks
f003703912
Allow specification of particular rods for pulling out sample names.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3570 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:37:09 +00:00
ebanks
01ffa307c2
When going NWay out in the cleaner, use the new *merged* header (instead of the original one) for each bam file so that it matches the new uniquified read group ids in the reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3569 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:36:36 +00:00
kshakir
05c2f96bb4
Small update to the command line docs for read_group_black_list.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3568 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:23:34 +00:00
ebanks
d7f3102c3f
Fixed read group blacklist filter to look only at readgroups (and not the read's themselves). Otherwise, it fails when attribute tags with different meanings show up in both places (e.g. SM). Added performance improvement.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3567 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:14:37 +00:00
hanna
e77f76f8e1
Reenabled downsampling by sample after basic sanity testing and fixes of the
...
new implementation. Hard testing and performance enhancements are still
pending.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3566 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 17:23:27 +00:00
kshakir
c44fd05aa1
Fix for a reflection issue with generic types.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3565 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 15:58:38 +00:00
ebanks
7a91dbd490
Renamed some of the column names in Ti/Tv and Concordance modules so that they are clearer. Removed ValidationRate module (it was busted).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3564 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 15:53:06 +00:00
delangel
8cb16a1d45
a) Cleanup, remove -input argument from BeagleOutputToVCFWalker since it's not needed.
...
b) Added back old Beagle ROD to maintain backward compatibility (does anyone even use this???)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3563 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 02:13:08 +00:00
delangel
d319a28be7
Complete rewrite of the Beagle functionality to read from Beagle output files and produce VCF with modified genotypes. Now, a new ROD system using Tribble is in place. Beagle inputs are set using -B beagleType,Beagle,pathToBeagleFile, where beagleType can be either beagleR2, beagleLike, beaglePhased or beagleR2 (BeagleOutputToVCFWalker requires all of the above). Only pending items: -input argument is now unused and can be removed, will be cleaned later. Wiki will be updated with new usage shortly.
...
We can now run with a reduced memory footprint, and output VCF is exactly identical to previous version. Drawback is increased runtime because Tribble has to create an index for all the Beagle files when starting if the idx files are missing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3562 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 02:01:35 +00:00
aaron
d265397bf6
removing a reference to a unused internal Sun class
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3560 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-15 15:27:57 +00:00
asivache
42b8a8f295
slight change in output format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3559 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-15 14:52:04 +00:00
kshakir
32fc221ffe
Replaced pattern matched pipeline spec with annotated objects.
...
Old version is no longer available.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3558 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-15 04:43:46 +00:00
sjia
b99a5e06f3
Added option to only consider alleles of > specific allele frequency.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3557 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-15 02:09:35 +00:00
hanna
8a895f481f
Proper exception chaining for troubleshooting Sendu's issue.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3556 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-15 01:38:36 +00:00
sjia
8defb30796
Documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3555 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 21:31:01 +00:00
weisburd
c1046653a2
Fixed handling of records where gene-names are identical (eg. as in refseq NR_030638 in chr20)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3554 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 20:00:49 +00:00
weisburd
1e42984a16
Improved buffer-size arg handling
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3553 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 19:59:15 +00:00
sjia
b3c3023c3c
Allows callers to handle HLA reference files as input (rather than hard-coded paths)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3552 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 18:56:08 +00:00
asivache
9666d47d17
ooops, debug print now removed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3550 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 18:07:12 +00:00
sjia
abdc8521ea
Added debug options for FindClosestHLAWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3549 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 17:52:03 +00:00
sjia
c38390eabb
Added option for min number of matches between reads and alleles required to consider reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3548 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 16:08:49 +00:00
asivache
4ab1f440c3
A new argument: --targetIntervalsSorted (boolean flag). If specified, the interval file is assumed to be sorted (duh!) and it is NOT slurped into the memory but instead traversed directly on disk as needed. If the file turns out to be unsorted, an exception will be thrown at the point where inconsistency occurs (can be late into the processing!).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3547 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 16:00:22 +00:00
asivache
671ac00748
A simple utility class that implements a merging Iterator<GenomeLoc> built over an interval or bed file (this is NOT a rod, but rather a direct line-by-line file reader that converts strings to genome locs on the fly and merges overlapping intervals)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3546 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 15:54:37 +00:00
asivache
f137bf8f85
now adaptor silently skips empty lines in the underlying string iterator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3545 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 15:35:07 +00:00
sjia
d8c963c91c
Remove PhaselikelihoodsWalker.java
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3544 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 15:21:43 +00:00
sjia
5704294f9d
HLA caller updated - now searches all (common and rare) alleles, more efficient read filtering and allele comparison runs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3543 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 15:14:40 +00:00
asivache
d51e6c45a7
a utility class; turns string iterator into GenomeLoc iterator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3542 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 14:07:44 +00:00
asivache
7b7d3341f0
trivial refactoring: isFile renamed to isIntervalFile and made public
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3541 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 14:02:23 +00:00
hanna
c3b68cc58d
Rethinking DownsamplingLocusIteratorByState with a flattened read structure. Samples are kept
...
independent while processing, and only merged back in a priority queue if necessary in a special
variant of the ReadBackedPileup. This code is not live yet except in the case of naive deduping.
Downsampling by sample temporarily disabled, and the ReadBackedPileup variant is sketchy and
not well integrated with StratifiedAlignmentContext or the walkers. Cleanup to follow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3540 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-13 01:47:02 +00:00
kiran
804facb0cc
Removing these utilities as part of a hostage negotation with Matt. Can I have my journal club paper now?!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3539 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 21:41:29 +00:00
asivache
e6d8faf293
making 'parseLocation' public static - as simple as the logic is, it's better kept in one place and I need it!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3537 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 18:19:59 +00:00
ebanks
8c28be5933
Fixing a VCF bug for Sendu: we weren't emitting flags (booleans) correctly in VCF3.3 (rev'ed tribble for this).
...
Updated dbsnp/hapmap membership info fields to be flags now instead of ints.
While I was there, I added the change in the Annotator for Jan to force reads to be from a specific sample.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3536 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 16:42:06 +00:00
ebanks
22620ba95c
Adding "abi_solid" to the list of known platforms.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3534 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 13:37:19 +00:00
ebanks
63ad71cca6
Fix busted code. Note for all:
...
String.valueOf(byte[]) doesn't work. You must use new String(byte[]).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3533 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 05:01:48 +00:00
weisburd
338bb9adf4
CommandLineProgram for measuring java I/O speeds for large plain-text or gzipped files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3532 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 21:34:37 +00:00
weisburd
06fc5eecf8
Implemented TreeReducible - if num threads > 1, the output will be accumulated in memory and written to a vcf file at the end - in onTraveralDone(..). If num threads == 1, things will work as before - where vcf records are written to disk as soon as they are computed with map(..).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3530 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:57:23 +00:00
weisburd
3b375cb237
Sped up parseGenomeLoc(..) by replacing regexp with String.indexOf(..) - attempt 2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3529 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:54:36 +00:00
bthomas
99b684ea89
Adding new support for reference data. ReferenceDataSource is a new class that manages reference data, and allows IndexedFastaSequenceFile to be a simple reader. This checkin also includes FastaSequenceIndexBuilder, which reads a fasta file and creates an index, like samtools faidx. Right now this is not enabled, because we are still working out thread safety. So the only new UI change is that GATK can be run without a fai file. Soon, we will enable 1) GATK to be run without a dict file too, and 2) both dict and fai files will be saved on disk for future program executions. For more info, see ReferenceDataSource.java
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3527 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:10:23 +00:00
hanna
f55f32d4ee
Bug fix.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3526 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 01:53:26 +00:00
ebanks
ca4eab1d23
Now annotations that require reads return null if there's no alignment context, so that running without reads adds annotations only for the appropriate fields.
...
Added an integration test for the read-less case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3525 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 20:36:46 +00:00
aaron
6941c81bfa
reverting revision 3522 to the old code until we fix the tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3524 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 19:25:02 +00:00
hanna
dbee21a50f
Bugfixes for the case when no read groups / no samples are available.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3523 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 18:47:05 +00:00
weisburd
adc4c4e577
Sped up parseGenomeLoc(..) by replacing regexp with String.indexOf(..)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3522 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 18:11:43 +00:00
chartl
20167fd411
Final changes to MVC -- associates variants with regions of homozygosity in child and parents, corrects for genotype errors, and prints out a separate file with informationf or each region of homozygosity.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3521 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 18:05:37 +00:00
weisburd
fdded73861
Improved error reporting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3520 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:52:48 +00:00
aaron
4f00e265a8
quick update for a change I implemented for Ryan
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3519 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:23:31 +00:00
aaron
ad98512f6c
adding changes so that we look at the headers already loaded by the engine for samples and other VCF utils, and not create readers for each file to get them (this caused Tribble to regerenate indices if the index file can't be written to disk).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3518 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:21:12 +00:00
weisburd
c1b7bcc786
Fixed handling of mitochondrial genes - added special cases such as ATT being a start codon in mitochondria. Added warning if a gene doesn't start with Met or end in a stop codon
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3517 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:15:47 +00:00
weisburd
4f1181974b
Added toString() method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3516 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:12:57 +00:00
ebanks
9b2fcc4711
Refactoring of the annotation system:
...
1. VA is now a ROD walker so it no longer requires reads (needs a little more testing)
2. Annotations can now represent multiple INFO fields (i.e. sets of key/value pairs)
3. The chromosome count annotations have been pulled out of UG and the VCF writer code and into VA where they belong. Fixed the headers too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3513 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:05:51 +00:00
hanna
84563b37e5
Partial flattening of the hanger data structure. Hanger data structure is
...
not currently as flat as it could / should be, but it's already comparable
to the speed of the reference implementation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3512 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 16:28:49 +00:00
chartl
8f9e3e8ad7
Commit for Kiran; but this is now working, barring little exceptions that I've yet to run across...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3511 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 14:21:19 +00:00
aaron
6d5556939d
updating Tribble with a couple of important Tabix fixes, and updating the variant eval integration tests to run each test with both plain vcf and gzipped tabix (added the tabix version
...
to the vlidation directory), using the same md5sum.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3509 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 01:47:04 +00:00
hanna
c2858c8988
Minor performance enhancement. Checkpoint commit before major performance
...
overhaul.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3504 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 21:39:39 +00:00
chartl
5ed2818ffb
Forgot to commit code i relied upon
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3503 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 21:01:35 +00:00
chartl
736098b58d
A quick commit before running home. This is a re-factored version of the OppositeHomozygoteClassifier which will work with deNovo violations as well. Some code still needs to be migrated from OHC which is wy that walker isn't yet deleted. This'll be up and running tonight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3502 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 20:47:01 +00:00
delangel
de134c226d
Removed ability of users to specify annotations to recompute, cleanups.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3501 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 19:17:59 +00:00
ebanks
4d1a6b3d99
quick changes for G
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3500 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 16:33:27 +00:00
delangel
907931c902
a) Update annotations when creating new vcf with Beagle's imputed data. Since genotypes may (will) change based on imputation, several annotations need to be updated. By default, AC, AF, AN and AB will be updated. User can force extra annotaqtions to be updated with -A <annotation> argument.
...
b) Several cleanups and beautifications.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3499 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 15:12:04 +00:00
chartl
933133ee28
Initial commit of the opposite homozygote classifier. Currently does the following, given a trio vcf:
...
+ Identifies opposite homozygote sites
+ Identifies the parent from whom it is expected that a null allele was inherited (or whether it was a putative genotype error; e.g. mom=homref, dad=homref, child=homvar)
+ Labels each opposite homozygote with its homozygous region in the child (e.g. region 1, region 2)
+ Labels each opposite homozygote with the size of the homozygous region in which it was found, the number of child homozygotes in the region, and the number of opposite homozygote violations within that region
To come:
+ Classification of sites as likely tri-allelic
Note that this is very experimental
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3498 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 03:56:07 +00:00
hanna
199e4208cd
Bug fixes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3497 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 00:30:33 +00:00
hanna
52ab9f2417
Feature parity between LocusIteratorByState, DownsamplingLocusIteratorByState, including pushing mrl /
...
the LocusOverflowTracker into LocusIteratorByState. Note that the 'Matt Hanna exception', is still enabled
because I haven't yet validated the performance of the DownsamplingLocusIteratorByState when running
without downsampling.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3496 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 22:58:21 +00:00
hanna
5c4d070566
Push Mark's changes in LocusIteratorByState into DownsamplingLocusIteratorByState
...
in preparation for merging the two into one.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3495 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 17:29:30 +00:00
depristo
6eeb1693ca
JEXL2 upgrade. Improvements to JEXL processing including dynamically resolving variable -> value bindings instead of up front adding them to a map. Performance improvements and code cleanup throughout.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3494 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 00:33:02 +00:00
hanna
c1ecf75dd5
Update to the latest rev of the picard sharding patch. Includes updates reflecting
...
the imminent move of IlluminaUtil into picard public.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3493 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 20:33:21 +00:00
delangel
c503f01dcf
More cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3492 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 17:41:38 +00:00
delangel
d4c66d6191
a) Small cleanup
...
b) Fix major issue with Beagle likelihood converter: if likelihood triplets from UG end up being too low, then Beagle input file will be produced with 0.00,0.00,0.00 triplet. If all samples at a marker have this issue, Beagle will effectively produce junk. To fix, likelihoods are renormalized before converting to linear space.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3491 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 17:31:59 +00:00
depristo
cfa18f6743
Fixing missed update with new Allele in it
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3490 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 23:56:34 +00:00
depristo
3ea506fe52
No more new Allele() -- must use create. Allelel simple alleles are now cached for efficiency reasons. VCF4 codec optimizations -- 4x performance in general. Now working in general but hooked up to the ROD system now as VCF4. WARNING -- does not actually work with indels, genotype filters, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3489 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 23:03:55 +00:00
delangel
ef47a69c50
a) First fully functional (sort of) version of walker that parses Beagle imputation output files and produce a vcf with imputed genotypes.
...
More doc/info to follow shortly. Issues still to be solved:
a) Walker changes all genotypes based on Beagle data, but annotations on the original VCF are unchanged. They should in theory be recomputed based on new genotypes.
b) Current implementation is ugly, dirty unwieldy and will necessitate a refactoring soon so I can keep my pride. Most aesthetically affronting issue right now is that we read the full Beagle files at initialization and keep them in memory, but a more delicate implementation would just read from files on a marker by marker basis. Issue that currently prevents this is that BufferedReader() instances don't seem to play nice when called from the map() function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3488 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 20:37:25 +00:00
depristo
b811e61ae1
Optimized, nearly complete VCF4 reader 2-4x faster than the previous implementation, along with a VCF4 reader performance testing walker that can read 3/4 files, useful for benchmarking
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3487 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 18:11:38 +00:00
aaron
6482b87741
adding the super experimental, half-broken, generally crippled, awkwardly commented, header ignoring vcf4 code. Don't use this, unless you're a developer for VCF4. If so, remove the exception from the constructor so that it won't always exception out.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3486 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 07:38:46 +00:00
aaron
0b03e28b60
updating the tribble library to include the reference dictionary reading / writing. We now check the dictionaries of any tracks that have them against the reference (all new tribble tracks and out-of-date tracks will have this). Also renamed some classes to be more reflective of their function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3485 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 06:34:26 +00:00
hanna
3d055e3d16
Fail fast if users try to parallelize a read walker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3484 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-03 18:14:33 +00:00
hanna
7d79848f40
Better error message when bam file / list file with wrong extension is
...
supplied.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3483 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-03 17:52:48 +00:00
ebanks
597b3744ab
Always use phasing info when converting genotypes to strings
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3482 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-03 17:50:50 +00:00
depristo
e2b41082af
GATK now does automatic adaptor filtering in locus iterators (but not expt. downsampling iterator). General support for LocusIteratorFilters just like read filters but only applying at particular bases. Updated tools with new MD5 sums due to adaptor bases in their integrationtest data. Not that as a side effect here reads close to each other with odd orientations are also filtered out. Updated minor argument to VariantRecalibrator to change the qStep value on the command line
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3481 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 22:26:32 +00:00
aaron
8ec091d6d2
re-enabling regeneration of the tribble index if it's out of date. Also moved the class that can detect text in the log4j stream (useful in testing to make sure appropriate messages are generated).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3480 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 17:45:51 +00:00
asivache
f0c379dde8
Unconsequential changes in report formatting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3479 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 17:43:25 +00:00
weisburd
3ab936181c
Supports the join feature of GenomicAnnotator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3478 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:29:57 +00:00
weisburd
f5f7217413
Implemented joins
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3477 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:28:53 +00:00
weisburd
09c3b15af3
Implemented joins
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3476 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:28:06 +00:00
weisburd
e14ae471a0
Refactored some of the small utility methods
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3475 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:26:00 +00:00
weisburd
898a78e97d
Added toString()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3474 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:24:25 +00:00
weisburd
12c3e3ecda
Added back the check for values.size() != header.size(). Now exception will be thrown if number of columns in a record doesn't equal number of columns in the header
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3473 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:23:05 +00:00
rpoplin
290771a8c2
Automatic cutting of recalibrated variant calls using ApplyVariantCuts. VariantRecalibrator produces the tranches plot alongside the optimization curve. Specify the levels using -tranche 1.0 -tranche 5.0 etc
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3472 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 15:03:00 +00:00
ebanks
4a555827aa
Removing more toUpperCase sanity checks
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3471 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 14:38:39 +00:00
ebanks
56e504789a
trivial change: toUpperCase no longer necessary
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3470 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 14:00:47 +00:00
rpoplin
87fe60fe4f
Fix for Sendu. new Process and p.waitFor() don't seem to work on his farm. Throws an IOException. This was a problem way back with AnalyzeCovariates too.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3469 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 11:37:10 +00:00