Commit Graph

3215 Commits (2a6c2d3098776bab18ccbffd9335932cf4e7c9dc)

Author SHA1 Message Date
aaron 2a6c2d3098 re-enable test; I was moving the input file in prep for my last commit around on Eric, so he rightfully removed the test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3838 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:14:59 +00:00
aaron 0108517b98 updating the Tribble track loading code to use the new shared locks, updated lots of new tests, add infrastructure for the TreeInterval, and removed the old locking class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3837 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:08:10 +00:00
ebanks f742980864 1. Refactoring of GenoypeWriters so that parallelization now works again with VCF4.0. We now have just a single reference to the old VCF classes, and that one will be purged soon.
2. Moved Jared's VCFTool code into archive so that everything would compile.
3. Added the vcf reference base (needed for indels) as an attribute to the VariantContext from the reader.
4. TribbleRMDTrackBuilderUnitTest was complaining that a validation file didn'r exist, so I commented it out.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3835 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 06:16:45 +00:00
depristo 70b07206a2 CombineVariants tests for Guillermo and Eric to explore the correctness of the in/out reader, writer behavior of the system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3834 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 22:41:48 +00:00
depristo c47a5ff5ab Official parallel CountCovariates, passes all integration tests. Now poster-child example of parallelism in GATK (Matt H). Apparent general performance improvements throughout too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3833 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 22:13:18 +00:00
rpoplin 0b56003d1a Remove stray commented out line
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3832 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 19:14:39 +00:00
rpoplin 8e31c01680 Solid processing in base quality recalibrator now has several options for how to handle no calls in the color space. --ignore_nocall_colorspace is removed and replace by --solid_nocall_strategy. Fixed some of the @Deprecated tags in BaseUtils. LocusWalkers now filter out FailsVendorQualityCheck reads. HLA caller integration test bam file had bad vendor reads so its integration test changed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3831 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 19:10:29 +00:00
aaron 18b0114e25 remove FixBAMSortOrder walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3830 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 17:27:23 +00:00
aaron f4cfb0f990 The first step in integrating Jim's tree based index scheme:
- changed to a better method for getting headers from Codecs
- some removal of old commented out code in the GATKAgrumentCollection
- changes for the rename of FeatureReader to FeatureSource
- removed the old Beagle ROD
- cleaned up some of the code in SampleUtils

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3826 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 04:49:27 +00:00
hanna 40a963541d Uniquify the registered MXBean by adding an instanceNumber=... tag to the
ObjectName.  In the Queue-enabled future, we might want to come up with GUIDs
(or at least semi-unique IDs) so that we could use JMX to track runtime
attributes for multiple jobs running simultaneously.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3825 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 00:58:54 +00:00
ebanks 5a1a3fc79a Fix bad VariantContext creation in unit test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3824 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 20:21:01 +00:00
depristo 7c42e6994f FindBugs fixes throughout the code base
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3823 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 16:29:59 +00:00
ebanks 693672a461 Refactoring the VCF writer code; now no longer uses VCFRecord or any of its related classes, instead writing directly to the writer. Integration tests pass, but some are actually broken and will be fixed this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3822 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 13:19:56 +00:00
ebanks 379584f1bf Re-enable (most of) these tests. Guillermo will re-enable the other one when the VCF->VC conversion is done for indels
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3821 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 03:24:28 +00:00
ebanks 982947d328 update to deal with partial indels (I/D with no bases) in the HM records
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3820 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 02:56:37 +00:00
depristo 414ec6f20a Removing version argument constructors that shouldn't be used. Temporary allow -- with global variant to indicate this should be removed -- header records without description fields. Real error checking in the headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3818 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:30:08 +00:00
depristo 14b21e487b always 4.0
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3817 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:28:48 +00:00
depristo d40299840c indenting clean up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3816 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:28:28 +00:00
hanna 9207c58b8f A fix for the integration test I broke on Friday on my way out the door --
some workflows using AlignmentContext were working with it in a way I didn't
expect and wound up treating extended pileups as base pileups.  I'll work to
make sure the AlignmentContext interface is crystal clear.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3815 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:22:44 +00:00
delangel 55b756f1cc First step in major cleanup/redo of VCF functionality. Specifically, now:
a) VCF track name can work again with 3.3 or 4.0 VCF's when specifying -B name,VCF,file. Code will read header and parse automatically the version. 
b) Old VCF codec is deprecated. Reader goes now direct from parsing VCF lines into producing VariantContext objects, with no intermediate VCF records. If anyone can't resist the urge to still input files using the old method, a new VCF3Codec is in place with the old code, but it will be eventually deleted.
c) VCF headers and VCF info fields no longer keep track of the version. They are parsed into an internal representation and will be output only in VCF4.0 format.
d) As a consequence, the existing GATK bug where files are produced with VCF4 body but VCF3.3 headers is solved.
e) Several VCF 4.0 writer bugs are now solved.
f) Integration test MD5's are changed, mostly because of corrected VCF4.0 headers and because validation data mostly uses now VCF4.0.
g) Several VCF files in the ValidationData/ directory have been converted to VCF 4.0 format. I kept the old versions, and the new versions have a .vcf4 extension.

Pending issues:
a) We are still not dealing with indels consistently or correctly when representing them. This will be a second part of the changes.
b) The VCF writer doesn't use VCFRecord but it does still use a lot of leftovers like VCFGenotypeEncoding, VCFGenotypeRecord, etc. This needs to be simplified and cleaned.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3813 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 22:49:16 +00:00
chartl 75bea4881a Modified SampleFilter to allow for multiple samples to be given. AminoAcidTransition now turns on when you give VariantEval the right commands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3812 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 21:27:32 +00:00
aaron 36ac73cf9a comment out broken test until it can be fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3810 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 20:04:40 +00:00
hanna 96034aee0e Cleanup for Steve Hershman's issue. In the midst of doing this, I discovered
that the semantics for which reads are in an extended event pileup are not
clear at this point.  Eric and I have planned a future clarification for this
and the two of us will discuss who will implement this clarification and when
it'll happen.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3809 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 18:57:58 +00:00
asivache 6aedede7f3 Added Type.MNP to allowed variant context types; this does not break the tests (yet)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3808 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 15:50:25 +00:00
asivache 1dd8a28a5d Added new query: isMNP(feature); returns true if dbsnp feature is multi-nucleotide polymorfism (e.g. a di-nuc TA ->CC)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3806 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 15:32:10 +00:00
aaron ec94cfdf05 remove unit test for VCF writer, it's not applicable now that we produce only VCF4. Guillermo, it's up to you if you want to adapt this or remove it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3803 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 14:33:25 +00:00
depristo b29eda83bb Parallelized CountCovarites! percent_ref_called_var now a standard genotype concordance module (for validation!). Really much smarter merging of headers for combineVariants. VCF codecs now actually look at the file version and blow up if they are the wrong versions. setHeaderVersion() in VCFHeaderLine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3802 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 14:10:18 +00:00
ebanks f293eb7de1 Fix for Kim: for some ungodly reason, I was initializing the bins that were maintaining counts to 1 instead of 0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3801 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 03:40:29 +00:00
ebanks e7e58d7129 The SAM spec has now officially reserved my new tags for original cigar and original alignment start... except that OS has been named OP ('original POS')
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3800 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 00:09:36 +00:00
ebanks ab84ed8c68 Fix for Mark: get rid of old program tags whose IDs clash with the recalibrator/realigner tag (including if the id has a .1 at the end, etc.). Keeping them around is dangerous because we don't know which one refers to the latest run of the tool on the bam.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3798 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-15 19:13:50 +00:00
hanna dfddf8fd75 - Bring the PaperGenotyper up to code.
- Remove some old debugging cruft regarding handling of threaded engine exceptions.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3796 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 22:31:21 +00:00
bthomas f65cba6b9a Adding support for shared file locking via a new class for file locking, FSLockWithShared. This will eventually take over for FSLock, the current file locking class - I'll work with Aaron to merge the tribble code that uses FSLock right now.
FYI: creating an exclusive lock on a file that does not exist will create that file as an empty file, and will NOT delete that file after the program terminates. So watch out if it's possible that the file you're locking does not exist - could end up leaving extra files that confuse users.  



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3795 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 20:45:51 +00:00
hanna a8caa20378 Previously the hierarchical microscheduler defensively coded around and reported exceptions of
the walker itself, but didn't do a great job of catching framework exceptions.  This became extremely
unfortunate in the case where walkers caused exceptions that manifested themselves in the framework,
such as when the walker opens more files than file handles are available.

Reworked the exception handling so that framework errors are treated like walker errors and the resulting
exception bubbles out of the walker.  Stack traces for threaded walkers are still convoluted and nasty.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3794 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 20:34:43 +00:00
ebanks bf384f48e1 Reverting previous change because it won't always work. More investigation needed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3793 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:13:17 +00:00
ebanks e4bfb06888 Check header type instead of rod type, since rod type will now be VC and not VCF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3792 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:10:09 +00:00
ebanks 0226412b11 Add GQ to list of genotype attributes for reg exp
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3791 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:01:11 +00:00
ebanks 78a4d8ec3d Removing more references to VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3790 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 16:34:15 +00:00
ebanks af23762778 Removing more references to VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3789 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 11:54:23 +00:00
ebanks a4f8d70d8d oops, forgot to update this integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3788 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 11:38:33 +00:00
ebanks 460283f6d2 No more manually converting VariantContexts to VCFRecords. You should be utilizing VCs and not VCFRecords.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3787 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 05:21:28 +00:00
ebanks 6b5c88d4d6 The GATK no longer writes vcf3.3; welcome to the world of vcf4.0. Needed to fix a few output bugs to get this to work, but it's looking great. Much more still to come. Guillermo: hopefully this doesn't break your local build too badly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3786 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 04:56:58 +00:00
chartl 9d2a485532 Update to AminoAcidTransition eval module
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3783 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 17:12:03 +00:00
rpoplin 3db7fbb5e9 Fix for added EOF in csv file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3781 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 16:09:48 +00:00
ebanks 9a05e8143d Move to 4.0 and away from VCFRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3780 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 15:54:54 +00:00
ebanks 6442dabf94 Deleting/archiving as instructed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3779 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 15:23:50 +00:00
ebanks 7e7da75d27 Moving over to 4.0 and away from VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3778 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 14:07:10 +00:00
ebanks d896d03554 Moving VF to vcf 4.0. Still need to fix genotype filters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3777 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 11:39:51 +00:00
ebanks 76b3b39720 Technically, Mark broke this with his commit earlier. But since I had an outstanding broken test, I lose and have to fix this one too...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3776 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 03:58:38 +00:00
ebanks 1bef7dd170 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3775 348d0f76-0448-11de-a6fe-93d51630548a 2010-07-13 00:56:12 +00:00
depristo de969f7cc7 logger != null check
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3774 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 23:07:14 +00:00