3024 Commits (ae88630d521901bbd98f104a588e8a4115cb9bd1)

Author SHA1 Message Date
ebanks 1e06d2bf68 Initial HLA Caller integration tests. Kind of painful, but will improve with code refactoring.
This baby is now officially ours.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3593 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 20:35:27 +00:00
chartl f44d8b150f Mendelian Violation Classifier now filters violations on the fly via command line arguments; and closes unterminated homozygous regions at the end of a chromosome (so we see arms falling off in the file, rather than in the log)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3592 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 19:32:24 +00:00
ebanks aa1852575e Add -noVerbose flag to stop output of INFO data.
Cuts runtime by 30% and output from 65Mb to 1Kb.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3591 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 18:53:35 +00:00
rpoplin 724affc3cc Major bug fixes for the Variant Recalibrator. Covariance matrix values are now allowed to be negative. When probabilities are multiplied together the calculation is done in log space, normalized, then converted back to real valued probabilities. Clustering weights have been changed to only use HapMap and by-1000genomes sites. The -nI argument was removed and now clustering simply runs until convergence. Test cases seem to work best when using just two annotations (QD and SB). More changes are in the works and are being evaluated. Misc fixes to walkers that use RScript due to CentOS changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3590 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 17:37:11 +00:00
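The log-space step described in commit 724affc3cc above (multiply in log space, normalize, convert back to real-valued probabilities) can be sketched roughly as follows. This is an illustrative Python sketch, not the Variant Recalibrator code; the function name `normalize_log_probs` and the sample values are assumptions made for the example.

```python
import math

def normalize_log_probs(log_probs):
    """Convert log-space values to real-valued probabilities that sum to 1.

    Hypothetical helper for illustration only.
    """
    # Shift by the maximum so the largest term becomes exp(0) = 1,
    # which keeps the exponentials from underflowing to zero.
    max_log = max(log_probs)
    shifted = [math.exp(lp - max_log) for lp in log_probs]
    total = sum(shifted)
    return [s / total for s in shifted]

# A product of probabilities becomes a sum of logs. Multiplying these
# values directly would underflow to 0.0 in double precision.
log_a = sum(math.log(p) for p in [1e-200, 1e-200])
log_b = sum(math.log(p) for p in [1e-200, 2e-200])

probs = normalize_log_probs([log_a, log_b])
```

Here the second product is twice the first, so after normalization the two values come out at roughly one third and two thirds, even though neither product is representable directly.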
aaron c3434493b0 fixed integration test for VCF Header changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3589 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 16:31:48 +00:00
hanna 52477bd9e6 Add some missing methods to the pileup architecture.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3588 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 15:03:08 +00:00
hanna 5050b19457 We're unable to make the naive deduper more worldly, so we're killing it instead.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3587 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 13:54:27 +00:00
aaron 42e7ff4f28 forgot to update a test, the md5sum of the underlying file changed (which is recorded in the ROD tests).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3586 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 13:27:56 +00:00
aaron b978d5946b adding changes for VCF 4, mostly in the way we handle VCF headers. The header fields are now aware of the differences between different VCF formats. There was also a bunch of clean-up of out-of-spec VCF used in the tests (mismatched VCF file format fields, etc), and updates to the associated integration tests. Also some logging statements for BTI.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3584 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 08:23:23 +00:00
weisburd e26a273ef5 Turned the test back on
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3582 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 22:57:42 +00:00
hanna 48cbc5ce37 Merging the sharding-specific inherited classes down into the base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3581 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 22:36:13 +00:00
hanna 612c3fdd9d First pass at eliminating the old sharding system. Classes required for the original sharding system
are gone where I could identify them, but hierarchies that split to support two sharding systems have
not yet been taken apart.
@Eric: ~4k lines.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3580 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 20:17:31 +00:00
delangel b694ca9633 Cleanup: Don't require likelihood ROD in Beagle parameters when generating output VCF. Likelihoods file is only an input to Beagle but the Walker that generates a VCF doesn't need it, so it's silly to ask for it and it's error-prone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3579 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 17:45:48 +00:00
hanna c1595a383a More bugfixes for cases where no sample name is present.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3578 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 16:46:02 +00:00
aaron 3d049204ed some refactoring for the variant eval output system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3576 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 05:34:31 +00:00
hanna db1383d0b2 Rev the latest version of Picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3575 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 23:55:07 +00:00
weisburd 5b370ffc62
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3574 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 20:42:58 +00:00
hanna 5972ad1199 Fixes to mrl integration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3573 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 20:40:10 +00:00
ebanks b75ded61b8 Removing obsolete rod; no longer needed given previous addition to SampleUtils.
JIRA GSA-318


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3572 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 20:03:14 +00:00
kshakir c671864228 Re-allowing blacklist by read group id.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3571 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:45:44 +00:00
ebanks f003703912 Allow specification of particular rods for pulling out sample names.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3570 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:37:09 +00:00
ebanks 01ffa307c2 When going NWay out in the cleaner, use the new *merged* header (instead of the original one) for each bam file so that it matches the new uniquified read group ids in the reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3569 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:36:36 +00:00
kshakir 05c2f96bb4 Small update to the command line docs for read_group_black_list.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3568 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:23:34 +00:00
ebanks d7f3102c3f Fixed read group blacklist filter to look only at readgroups (and not the reads themselves). Otherwise, it fails when attribute tags with different meanings show up in both places (e.g. SM). Added performance improvement.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3567 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:14:37 +00:00
hanna e77f76f8e1 Reenabled downsampling by sample after basic sanity testing and fixes of the
new implementation.  Hard testing and performance enhancements are still
pending.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3566 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 17:23:27 +00:00
kshakir c44fd05aa1 Fix for a reflection issue with generic types.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3565 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 15:58:38 +00:00
ebanks 7a91dbd490 Renamed some of the column names in Ti/Tv and Concordance modules so that they are clearer. Removed ValidationRate module (it was busted).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3564 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 15:53:06 +00:00
delangel 8cb16a1d45 a) Cleanup, remove -input argument from BeagleOutputToVCFWalker since it's not needed.
b) Added back old Beagle ROD to maintain backward compatibility (does anyone even use this???)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3563 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 02:13:08 +00:00
delangel d319a28be7 Complete rewrite of the Beagle functionality to read from Beagle output files and produce VCF with modified genotypes. Now, a new ROD system using Tribble is in place. Beagle inputs are set using -B beagleType,Beagle,pathToBeagleFile, where beagleType can be either beagleR2, beagleLike, beaglePhased or beagleR2 (BeagleOutputToVCFWalker requires all of the above). Only pending items: -input argument is now unused and can be removed, will be cleaned later. Wiki will be updated with new usage shortly.
We can now run with a reduced memory footprint, and output VCF is exactly identical to previous version. Drawback is increased runtime because Tribble has to create an index for all the Beagle files when starting if the idx files are missing.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3562 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 02:01:35 +00:00
aaron d265397bf6 removing a reference to an unused internal Sun class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3560 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-15 15:27:57 +00:00
asivache 42b8a8f295 slight change in output format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3559 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-15 14:52:04 +00:00
kshakir 32fc221ffe Replaced pattern matched pipeline spec with annotated objects.
Old version is no longer available.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3558 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-15 04:43:46 +00:00
sjia b99a5e06f3 Added option to only consider alleles of > specific allele frequency.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3557 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-15 02:09:35 +00:00
hanna 8a895f481f Proper exception chaining for troubleshooting Sendu's issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3556 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-15 01:38:36 +00:00
sjia 8defb30796 Documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3555 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 21:31:01 +00:00
weisburd c1046653a2 Fixed handling of records where gene-names are identical (eg. as in refseq NR_030638 in chr20)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3554 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 20:00:49 +00:00
weisburd 1e42984a16 Improved buffer-size arg handling
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3553 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 19:59:15 +00:00
sjia b3c3023c3c Allows callers to handle HLA reference files as input (rather than hard-coded paths)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3552 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 18:56:08 +00:00
asivache 9666d47d17 ooops, debug print now removed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3550 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 18:07:12 +00:00
sjia abdc8521ea Added debug options for FindClosestHLAWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3549 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 17:52:03 +00:00
sjia c38390eabb Added option for min number of matches between reads and alleles required to consider reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3548 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 16:08:49 +00:00
asivache 4ab1f440c3 A new argument: --targetIntervalsSorted (boolean flag). If specified, the interval file is assumed to be sorted (duh!) and it is NOT slurped into the memory but instead traversed directly on disk as needed. If the file turns out to be unsorted, an exception will be thrown at the point where inconsistency occurs (can be late into the processing!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3547 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 16:00:22 +00:00
asivache 671ac00748 A simple utility class that implements a merging Iterator<GenomeLoc> built over an interval or bed file (this is NOT a rod, but rather a direct line-by-line file reader that converts strings to genome locs on the fly and merges overlapping intervals)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3546 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 15:54:37 +00:00
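The behaviour described in commits 4ab1f440c3 and 671ac00748 above (read a sorted interval file line by line, merge overlapping intervals on the fly, and fail only at the point where an ordering inconsistency is met) can be sketched along these lines. This is a Python illustration under stated assumptions, not the GATK utility class: the name `merge_sorted_intervals` and the `contig:start-stop` line format are assumed for the example, and contig order is not checked because that would need a reference dictionary.

```python
def merge_sorted_intervals(lines):
    """Lazily yield merged (contig, start, stop) tuples from interval lines.

    Hypothetical sketch: each line is assumed to look like 'chr1:100-200'
    with inclusive coordinates, sorted by start within each contig.
    """
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        contig, span = line.split(":")
        start, stop = (int(x) for x in span.split("-"))
        if current is None:
            current = (contig, start, stop)
            continue
        if contig == current[0]:
            if start < current[1]:
                # Raised only when the offending line is reached,
                # which may be late in processing.
                raise ValueError("interval file is not sorted at: " + line)
            if start <= current[2]:
                # Overlap: extend the pending interval instead of emitting it.
                current = (contig, current[1], max(current[2], stop))
                continue
        yield current
        current = (contig, start, stop)
    if current is not None:
        yield current

merged = list(merge_sorted_intervals(
    ["chr1:1-10", "", "chr1:5-20", "chr1:30-40", "chr2:1-5"]
))
```

Because the function is a generator, nothing is held in memory beyond the single pending interval, which is the trade-off the commit message describes: low memory use in exchange for a late failure on unsorted input.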
asivache f137bf8f85 now adaptor silently skips empty lines in the underlying string iterator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3545 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 15:35:07 +00:00
sjia d8c963c91c Remove PhaselikelihoodsWalker.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3544 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 15:21:43 +00:00
sjia 5704294f9d HLA caller updated - now searches all (common and rare) alleles, more efficient read filtering and allele comparison runs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3543 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 15:14:40 +00:00
asivache d51e6c45a7 a utility class; turns string iterator into GenomeLoc iterator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3542 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 14:07:44 +00:00
asivache 7b7d3341f0 trivial refactoring: isFile renamed to isIntervalFile and made public
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3541 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 14:02:23 +00:00
hanna c3b68cc58d Rethinking DownsamplingLocusIteratorByState with a flattened read structure. Samples are kept
independent while processing, and only merged back in a priority queue if necessary in a special
variant of the ReadBackedPileup.  This code is not live yet except in the case of naive deduping.
Downsampling by sample temporarily disabled, and the ReadBackedPileup variant is sketchy and
not well integrated with StratifiedAlignmentContext or the walkers.  Cleanup to follow.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3540 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-13 01:47:02 +00:00
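The idea in commit c3b68cc58d above (keep each sample's reads independent and merge them back through a priority queue only when needed) can be sketched as below. This is a Python illustration, not the DownsamplingLocusIteratorByState implementation; the function name `merge_sample_reads` and the `(start, name)` read representation are assumptions made for the example.

```python
import heapq

def merge_sample_reads(reads_by_sample):
    """Merge per-sample read streams into one stream ordered by start.

    Hypothetical sketch: reads_by_sample maps a sample name to a list of
    (start, read_name) tuples that is already sorted by start.
    """
    def tag(sample, reads):
        # Attach the sample name so it survives the merge.
        for start, name in reads:
            yield (start, sample, name)

    streams = [tag(sample, reads) for sample, reads in reads_by_sample.items()]
    # heapq.merge keeps a small heap holding one pending item per stream.
    return heapq.merge(*streams)

merged_reads = list(merge_sample_reads({
    "NA1": [(100, "r1"), (300, "r3")],
    "NA2": [(200, "r2")],
}))
```

Each per-sample stream can be processed (for example downsampled) on its own before the merge, and the merge itself is lazy, so it costs nothing when a consumer only needs one sample.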
kiran 804facb0cc Removing these utilities as part of a hostage negotiation with Matt. Can I have my journal club paper now?!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3539 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 21:41:29 +00:00