3168 Commits (f39dce1082bf388c88ba16707ca555b00a84f405)

Author SHA1 Message Date
hanna e9d243babb More improvements to exception handling during multithreaded runs based on
a bug reported by Ryan.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3855 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 22:13:01 +00:00
hanna 83798225ac Repackaged datasource-specific command-line tools into their own package. Added a tag renamer tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3854 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 19:50:34 +00:00
asivache 485023ba8e this.intersect(that) method added to GenomeLoc (returns intersection of two intervals or dies if the locations do not overlap)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3852 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 16:00:30 +00:00
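The intersect semantics described in 485023ba8e can be sketched as follows. This is a minimal Python illustration, not the Java GenomeLoc code: the `Interval` class, its field names, and the use of inclusive coordinates are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a genomic location (inclusive coordinates)."""
    contig: str
    start: int
    stop: int

    def overlaps(self, that: "Interval") -> bool:
        return (self.contig == that.contig
                and self.start <= that.stop
                and that.start <= self.stop)

    def intersect(self, that: "Interval") -> "Interval":
        # Mirror the "dies if the locations do not overlap" behaviour by raising.
        if not self.overlaps(that):
            raise ValueError(f"{self} and {that} do not overlap")
        return Interval(self.contig,
                        max(self.start, that.start),
                        min(self.stop, that.stop))
```

Raising instead of returning an empty interval forces callers to check for overlap first, which matches the "or dies" wording of the commit message.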
asivache 3308d956f4 Added utility shortcut method: getOriginalQualsInCycleOrder(read)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3851 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 15:44:25 +00:00
delangel 473ec91633 a) Bug fix in VCFHeader parsing - Info fields were not being parsed properly, with the result that the Count field was not being properly displayed in records (e.g. if Count=0 for a particular field, the INFO tag was still being displayed as ...;Field=x;... instead of ...;Field;...).
b) Bug fixes and update to how we represent indels and other complex events in a VariantContext object. Convention is now that all events are left aligned, with the first variant context location marking the common base before an event occurs. However, alleles in a VC don't have the common base in all VC's. Two new functions are now part of VariantContextUtils: CreateVariantContextWithPaddedAlleles and CreateVariantContextWithTrimmedAlleles. Both take a VC as an input and create a VC as an output.
Main flow is that a VCF reader would create a VC with trimmed alleles, all walkers would ideally work with these trimmed alleles, and then the VCF writer would pad back the alleles before writing. However, there are special cases where we need to pad alleles like for example when merging/combining VC's.

Pending issues:
- PED and DBSNP RODs have to be updated to create VC's for indels following the convention above. Changes will go in after Tribble location is moved and things are tested.
- Need to verify Indel genotyper and other modules that create VC's with indels.
- Wiki page describing convention above and how walkers should interpret indel VC's still needs updating/detailing.
 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3850 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 02:36:45 +00:00
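The trimmed/padded allele convention from 473ec91633 can be illustrated with a small Python sketch. The function names `trim_alleles` and `pad_alleles` are hypothetical and alleles are modelled as plain strings; the real CreateVariantContextWithTrimmedAlleles and CreateVariantContextWithPaddedAlleles operate on VariantContext objects and may handle more cases.

```python
from typing import List, Tuple


def trim_alleles(ref: str, alts: List[str]) -> Tuple[str, List[str]]:
    """Drop the shared leading base from indel alleles, if every allele has it."""
    alleles = [ref] + alts
    same_length = all(len(a) == len(ref) for a in alts)
    # Same-length events (SNP-like) and alleles without a common first base
    # are returned unchanged in this sketch.
    if same_length or any(not a or a[0] != ref[0] for a in alleles):
        return ref, list(alts)
    return ref[1:], [a[1:] for a in alts]


def pad_alleles(common_base: str, ref: str, alts: List[str]) -> Tuple[str, List[str]]:
    """Put the common base back in front of every allele before writing."""
    return common_base + ref, [common_base + a for a in alts]
```

For an insertion written in VCF as REF=A, ALT=AT, trimming gives an empty reference allele and the alternate allele T; padding with the common base A restores the original encoding.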
chartl b696c3ea98 No more traversal reduce results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3849 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 18:34:54 +00:00
chartl 365b42390d Support for generating (very basic) wiggle files for use with IGV (see UCSC for wiggle spec); and a walker to take in a variant track and create a transition transversion rate track for the whole genome (due to the wiggle spec, this has to be done by chromosome). It's interesting to see the effect of genes!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3848 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 18:04:30 +00:00
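As a rough illustration of why the wiggle output in 365b42390d has to be grouped by chromosome, here is a Python sketch that renders points in the `variableStep` layout of the UCSC wiggle format. The `to_wiggle` function and its input shape are assumptions for this example, and the optional `track` header line is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def to_wiggle(points: Iterable[Tuple[str, int, float]]) -> str:
    """Render (chrom, 1-based position, value) points as variableStep wiggle text."""
    by_chrom: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for chrom, pos, value in points:
        by_chrom[chrom].append((pos, value))

    lines = []
    for chrom, entries in by_chrom.items():
        # Each declaration line names exactly one chromosome, so values
        # must be emitted chromosome by chromosome.
        lines.append(f"variableStep chrom={chrom}")
        for pos, value in sorted(entries):
            lines.append(f"{pos} {value}")
    return "\n".join(lines) + "\n" if lines else ""
```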
depristo f7957bc7f2 Fixed memory leak in VariantEval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3845 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 12:35:46 +00:00
aaron 1cba81c16f updates to tribble with fixes for some bugs I've found in some new indexing code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3842 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 22:08:04 +00:00
ebanks c6ad26e04f 1) When quals/GQs are really integers (x.00), strip off the floating points.
2) Keep track of whether vcf records are unfiltered vs. pass filters in the variant context so we can regenerate the records on output.
3) No more "ID" hard-coded all over the code to set the VariantContext ID.  Use a static variable instead.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3840 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 18:01:45 +00:00
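Point 1 of c6ad26e04f (dropping the fraction when a QUAL or GQ is really an integer) could look roughly like the Python sketch below. The helper name `format_qual` and the two-decimal output precision are assumptions; the commit only says that values of the form x.00 lose their fractional part.

```python
def format_qual(qual: float) -> str:
    """Format to two decimals, dropping the fraction when it is exactly .00."""
    text = f"{qual:.2f}"
    return text[:-3] if text.endswith(".00") else text
```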
ebanks 0db7fab1a9 Fixing genotype filtering for VF and adding integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3839 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:30:21 +00:00
aaron 0108517b98 updating the Tribble track loading code to use the new shared locks, updated lots of new tests, add infrastructure for the TreeInterval, and removed the old locking class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3837 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:08:10 +00:00
ebanks f742980864 1. Refactoring of GenotypeWriters so that parallelization now works again with VCF4.0. We now have just a single reference to the old VCF classes, and that one will be purged soon.
2. Moved Jared's VCFTool code into archive so that everything would compile.
3. Added the vcf reference base (needed for indels) as an attribute to the VariantContext from the reader.
4. TribbleRMDTrackBuilderUnitTest was complaining that a validation file didn't exist, so I commented it out.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3835 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 06:16:45 +00:00
depristo c47a5ff5ab Official parallel CountCovariates, passes all integration tests. Now poster-child example of parallelism in GATK (Matt H). Apparent general performance improvements throughout too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3833 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 22:13:18 +00:00
rpoplin 0b56003d1a Remove stray commented out line
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3832 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 19:14:39 +00:00
rpoplin 8e31c01680 Solid processing in base quality recalibrator now has several options for how to handle no calls in the color space. --ignore_nocall_colorspace is removed and replaced by --solid_nocall_strategy. Fixed some of the @Deprecated tags in BaseUtils. LocusWalkers now filter out FailsVendorQualityCheck reads. HLA caller integration test bam file had bad vendor reads so its integration test changed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3831 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 19:10:29 +00:00
aaron 18b0114e25 remove FixBAMSortOrder walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3830 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 17:27:23 +00:00
aaron f4cfb0f990 The first step in integrating Jim's tree based index scheme:
- changed to a better method for getting headers from Codecs
- some removal of old commented out code in the GATKArgumentCollection
- changes for the rename of FeatureReader to FeatureSource
- removed the old Beagle ROD
- cleaned up some of the code in SampleUtils

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3826 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 04:49:27 +00:00
hanna 40a963541d Uniquify the registered MXBean by adding an instanceNumber=... tag to the
ObjectName.  In the Queue-enabled future, we might want to come up with GUIDs
(or at least semi-unique IDs) so that we could use JMX to track runtime
attributes for multiple jobs running simultaneously.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3825 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 00:58:54 +00:00
depristo 7c42e6994f FindBugs fixes throughout the code base
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3823 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 16:29:59 +00:00
ebanks 693672a461 Refactoring the VCF writer code; now no longer uses VCFRecord or any of its related classes, instead writing directly to the writer. Integration tests pass, but some are actually broken and will be fixed this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3822 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 13:19:56 +00:00
ebanks 982947d328 update to deal with partial indels (I/D with no bases) in the HM records
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3820 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 02:56:37 +00:00
depristo 414ec6f20a Removing version argument constructors that shouldn't be used. Temporarily allow -- with global variant to indicate this should be removed -- header records without description fields. Real error checking in the headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3818 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:30:08 +00:00
depristo 14b21e487b always 4.0
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3817 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:28:48 +00:00
depristo d40299840c indenting clean up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3816 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:28:28 +00:00
hanna 9207c58b8f A fix for the integration test I broke on Friday on my way out the door --
some workflows using AlignmentContext were working with it in a way I didn't
expect and wound up treating extended pileups as base pileups.  I'll work to
make sure the AlignmentContext interface is crystal clear.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3815 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:22:44 +00:00
delangel 55b756f1cc First step in major cleanup/redo of VCF functionality. Specifically, now:
a) VCF track name can work again with 3.3 or 4.0 VCF's when specifying -B name,VCF,file. The code reads the header and parses the version automatically.
b) Old VCF codec is deprecated. The reader now goes directly from parsing VCF lines to producing VariantContext objects, with no intermediate VCF records. If anyone can't resist the urge to still input files using the old method, a new VCF3Codec is in place with the old code, but it will eventually be deleted.
c) VCF headers and VCF info fields no longer keep track of the version. They are parsed into an internal representation and will be output only in VCF4.0 format.
d) As a consequence, the existing GATK bug where files are produced with VCF4 body but VCF3.3 headers is solved.
e) Several VCF 4.0 writer bugs are now solved.
f) Integration test MD5's are changed, mostly because of corrected VCF4.0 headers and because validation data now mostly uses VCF4.0.
g) Several VCF files in the ValidationData/ directory have been converted to VCF 4.0 format. I kept the old versions, and the new versions have a .vcf4 extension.

Pending issues:
a) We are still not dealing with indels consistently or correctly when representing them. This will be a second part of the changes.
b) The VCF writer doesn't use VCFRecord but it does still use a lot of leftovers like VCFGenotypeEncoding, VCFGenotypeRecord, etc. This needs to be simplified and cleaned.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3813 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 22:49:16 +00:00
chartl 75bea4881a Modified SampleFilter to allow for multiple samples to be given. AminoAcidTransition now turns on when you give VariantEval the right commands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3812 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 21:27:32 +00:00
hanna 96034aee0e Cleanup for Steve Hershman's issue. In the midst of doing this, I discovered
that the semantics for which reads are in an extended event pileup are not
clear at this point.  Eric and I have planned a future clarification for this
and the two of us will discuss who will implement this clarification and when
it'll happen.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3809 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 18:57:58 +00:00
asivache 6aedede7f3 Added Type.MNP to allowed variant context types; this does not break the tests (yet)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3808 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 15:50:25 +00:00
asivache 1dd8a28a5d Added new query: isMNP(feature); returns true if dbsnp feature is multi-nucleotide polymorphism (e.g. a di-nuc TA -> CC)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3806 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 15:32:10 +00:00
depristo b29eda83bb Parallelized CountCovariates! percent_ref_called_var now a standard genotype concordance module (for validation!). Really much smarter merging of headers for combineVariants. VCF codecs now actually look at the file version and blow up if they are the wrong versions. setHeaderVersion() in VCFHeaderLine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3802 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 14:10:18 +00:00
ebanks f293eb7de1 Fix for Kim: for some ungodly reason, I was initializing the bins that were maintaining counts to 1 instead of 0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3801 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 03:40:29 +00:00
ebanks e7e58d7129 The SAM spec has now officially reserved my new tags for original cigar and original alignment start... except that OS has been named OP ('original POS')
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3800 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 00:09:36 +00:00
ebanks ab84ed8c68 Fix for Mark: get rid of old program tags whose IDs clash with the recalibrator/realigner tag (including if the id has a .1 at the end, etc.). Keeping them around is dangerous because we don't know which one refers to the latest run of the tool on the bam.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3798 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-15 19:13:50 +00:00
hanna dfddf8fd75 - Bring the PaperGenotyper up to code.
- Remove some old debugging cruft regarding handling of threaded engine exceptions.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3796 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 22:31:21 +00:00
bthomas f65cba6b9a Adding support for shared file locking via a new class for file locking, FSLockWithShared. This will eventually take over for FSLock, the current file locking class - I'll work with Aaron to merge the tribble code that uses FSLock right now.
FYI: creating an exclusive lock on a file that does not exist will create that file as an empty file, and will NOT delete that file after the program terminates. So watch out if it's possible that the file you're locking does not exist - could end up leaving extra files that confuse users.  



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3795 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 20:45:51 +00:00
hanna a8caa20378 Previously the hierarchical microscheduler defensively coded around and reported exceptions of
the walker itself, but didn't do a great job of catching framework exceptions.  This became extremely
unfortunate in the case where walkers caused exceptions that manifested themselves in the framework,
such as when the walker opens more files than file handles are available.

Reworked the exception handling so that framework errors are treated like walker errors and the resulting
exception bubbles out of the walker.  Stack traces for threaded walkers are still convoluted and nasty.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3794 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 20:34:43 +00:00
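The behaviour described in a8caa20378, where a failure on a worker thread surfaces to the caller regardless of whether it originated in walker code or in the framework, can be sketched in Python with `concurrent.futures`. This is only an analogy to the Java microscheduler: `run_shards` and `process_shard` are hypothetical names, and the real implementation is not shown in the commit message.

```python
from concurrent.futures import ThreadPoolExecutor


def run_shards(process_shard, shards, threads=4):
    """Process shards on worker threads and re-raise the first failure in the caller."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(process_shard, shard) for shard in shards]
        # result() re-raises whatever the worker raised, whether it came from
        # user code or from the surrounding machinery, so both surface the same way.
        return [future.result() for future in futures]
```

A shard that fails with, say, an `OSError` for exhausted file handles then reaches the calling thread as that same `OSError` rather than being lost inside the pool.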
ebanks bf384f48e1 Reverting previous change because it won't always work. More investigation needed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3793 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:13:17 +00:00
ebanks e4bfb06888 Check header type instead of rod type, since rod type will now be VC and not VCF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3792 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:10:09 +00:00
ebanks 0226412b11 Add GQ to list of genotype attributes for reg exp
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3791 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:01:11 +00:00
ebanks 78a4d8ec3d Removing more references to VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3790 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 16:34:15 +00:00
ebanks af23762778 Removing more references to VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3789 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 11:54:23 +00:00
ebanks 460283f6d2 No more manually converting VariantContexts to VCFRecords. You should be utilizing VCs and not VCFRecords.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3787 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 05:21:28 +00:00
ebanks 6b5c88d4d6 The GATK no longer writes vcf3.3; welcome to the world of vcf4.0. Needed to fix a few output bugs to get this to work, but it's looking great. Much more still to come. Guillermo: hopefully this doesn't break your local build too badly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3786 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 04:56:58 +00:00
chartl 9d2a485532 Update to AminoAcidTransition eval module
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3783 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 17:12:03 +00:00
rpoplin 3db7fbb5e9 Fix for added EOF in csv file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3781 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 16:09:48 +00:00
ebanks 9a05e8143d Move to 4.0 and away from VCFRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3780 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 15:54:54 +00:00
ebanks 6442dabf94 Deleting/archiving as instructed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3779 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 15:23:50 +00:00
ebanks 7e7da75d27 Moving over to 4.0 and away from VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3778 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 14:07:10 +00:00
ebanks d896d03554 Moving VF to vcf 4.0. Still need to fix genotype filters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3777 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 11:39:51 +00:00
ebanks 1bef7dd170
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3775 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 00:56:12 +00:00
depristo de969f7cc7 logger != null check
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3774 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 23:07:14 +00:00
depristo 2e445262f2 Promotion to . for variable numbers of arguments
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3773 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 22:53:53 +00:00
delangel 297f15a60c Protect ProduceBeagleInputWalker against evil users who feed to it VCF's with indels, no variation sites or other interesting markers: Write to Beagle input only in biallelic SNP sites since that's the only thing Beagle can do.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3772 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 20:54:42 +00:00
ebanks 52c534a8f2 Updating to VCF 4.0
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3770 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 20:18:30 +00:00
delangel 5992b79159 a) Simplify normalization code in ProduceBeagleInputWalker so that it always normalizes, and use MathUtils.normalizeFromLog10 to do this.
b) Several improvements to BeagleOutputToVCFWalker:
1. If a Hapmap input track is provided (e.g. -B comp,VCF,file), Hapmap sites will be annotated with Hapmap Allele count and allele frequency (key ACH, AFH).
2. If probability of correct genotype is lower than ncthr (optional argument provided by user, default = 0.0), walker will keep original calls instead of using Beagle calls.
3. Instead of annotating just whether Beagle had modified a site, annotate instead HOW MANY genotypes in a site were actually changed by Beagle.

All three improvements are mostly for debugging and analysis only.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3769 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 19:54:58 +00:00
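For context on the normalization mentioned in 5992b79159, the following Python sketch shows one common way to turn log10-scaled likelihoods into probabilities. It is an assumption about what such a routine typically does, not a transcription of MathUtils.normalizeFromLog10, whose exact behaviour is not described in the commit message.

```python
from typing import List, Sequence


def normalize_from_log10(log10_values: Sequence[float]) -> List[float]:
    """Convert log10-scaled likelihoods into probabilities that sum to 1."""
    # Subtracting the maximum first keeps 10**x from underflowing to zero.
    peak = max(log10_values)
    scaled = [10.0 ** (v - peak) for v in log10_values]
    total = sum(scaled)
    return [s / total for s in scaled]
```

Shifting by the maximum leaves the ratios between values unchanged, so very small likelihoods such as 10^-300 still normalize sensibly.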
ebanks e50627a49e 1. Updated tests and added integration test for liftover code.
2. Updated liftover code (and scripts) to emit vcf 4.0 and no longer depend on VCFRecord.
3. Beagle walker now also emits vcf 4.0.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3767 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 17:58:18 +00:00
ebanks 2a7112302a More archiving
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3766 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 17:04:41 +00:00
ebanks 221e01fb27 deleting/archiving as instructed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3765 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 16:59:45 +00:00
ebanks 8086ab1f75 Pulled sample/header merging routines out of CombineVariants and into util classes. Added more generalized methods for retrieving samples. Updated the Beagle walkers to use these methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3764 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 16:51:54 +00:00
ebanks 0c4a32843c No longer uses VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3763 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 13:57:39 +00:00
ebanks f130d29318 No longer uses VCFRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3762 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 13:34:10 +00:00
ebanks 0427f3554b Bug fix: valid fields were being stripped off the FORMAT for samples because String.match was used instead of String.equals. Also, please use VCFConstants from now on instead of hard-coding e.g. missing values into the code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3760 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 03:06:51 +00:00
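The class of bug fixed in 0427f3554b, comparing with a regular-expression match where literal equality was intended, can be reproduced in Python. The commit does not say which pattern was involved; using the VCF missing-value marker "." as the pattern is an assumption chosen because it makes the difference obvious, and `re.fullmatch` stands in for Java's whole-string regex matching.

```python
import re

MISSING_VALUE = "."  # VCF's missing-value marker; also "any character" in a regex


def is_missing_buggy(field: str) -> bool:
    # Regex comparison: "." matches any single character, so "5" looks missing.
    return re.fullmatch(MISSING_VALUE, field) is not None


def is_missing(field: str) -> bool:
    # Literal comparison: only the exact string "." counts as missing.
    return field == MISSING_VALUE
```

With the regex version, any one-character value is treated as missing and would be stripped; the equality version only strips true missing values.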
ebanks fb717fe128 First pass needed to remove old VCF code: moving all VCF-related constants into a single unified class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3759 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 07:19:16 +00:00
ebanks 6b960bd9c5 Fix for Steve: genotype filters still want to see the values from the VC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3758 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 04:30:15 +00:00
depristo c3c66e853c Improvements for Jason
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3756 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 20:18:37 +00:00
ebanks 405be230d0 Various code improvements based on FindBugs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3755 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 15:04:48 +00:00
ebanks abaec13e38 Bug fix: if there are samples in the VCF but all of them are no-calls, we still need to emit GT for the FORMAT field to be on spec. Note that this is a holdover from 3.3 writing but can't easily be fixed there. Fortunately, that code is all going away soon...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3754 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 14:08:25 +00:00
chartl ea8fd506bf Update to PickSequenomProbes: Option to ignore mask sites within X bp of a variant (very useful for indels where dbSNP entries near the indel are almost always false SNP calls). Also fixed an integration test where the variant site itself, being in dbSNP, was represented as [N/C] rather than [A/C]. Added integration test for 1bp no-mask window.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3753 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 04:03:19 +00:00
depristo 179067e3f4 Support for . values in qual field
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3752 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 01:47:02 +00:00
depristo 45fb614296 Fixes to VE for obscure bug, as well as disabled integration test for CombineVariants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3749 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 00:13:07 +00:00
rpoplin 67f1589652 --fdr_filter_level isn't mandatory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3748 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 22:48:30 +00:00
rpoplin 5d39cd5db8 Added --fdr_filter_level to ApplyVariantCuts so that you can create beautiful tranche plots and also decide which tranche level to filter at. The previous version always filtered at the smallest tranche. The tranche filter names are appropriately added to the VCF header.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3747 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 22:44:10 +00:00
depristo 760aaeda88 Update to CombineVariants. Now splits merge options into variant and genotype options separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3746 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 20:09:48 +00:00
ebanks bd2ba3eb37 deal with very large known indels that fall off our ref context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3745 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 20:05:16 +00:00
aaron 12fecc8d8f remove the picard DbSNP ROD.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3743 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 17:46:00 +00:00
depristo 56a0c7ee6f All headers are now converted to VCF4 by default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3741 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 14:14:17 +00:00
ebanks 6e6ad36523 reallow MNP events through
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3740 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 06:26:52 +00:00
ebanks ed0d0d78fa corresponding fix for dealing with insertions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3739 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 05:25:03 +00:00
ebanks ada8c9931f We were never clipping the VCF-provided ref base off the left end of the alleles for insertions, so the reference allele was never null (and downstream walkers would fail). Didn't this get tested with insertions at some point?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3738 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 05:24:27 +00:00
ebanks 9a81f1d7ef Fixed this tool for chartl so that it now properly handles deletions. Added deletion case to integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3737 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:45:59 +00:00
ebanks 47a42b1507 trivial cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3736 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:42:32 +00:00
ebanks b7a3d1e61f Bug fix: if the FORMAT field consisted of just GT, we were exceptioning out. How did we not catch this until now?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3735 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:41:40 +00:00
ebanks 1c146aebe8 Fix logic bug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3734 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:32:46 +00:00
hanna 9fc05ac2ae eagerDecode is now false.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3733 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 22:51:48 +00:00
ebanks 4bc3ad2194 Shame on me: UG was emitting negative QUALs (-0) in all_bases mode. Thanks, Matt.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3732 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 20:30:22 +00:00
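One way a "-0" quality like the one mentioned in 4bc3ad2194 can arise is through IEEE 754 negative zero. The Python sketch below is an assumption about the mechanism, not a description of the UnifiedGenotyper code: `phred_qual` is a hypothetical helper that Phred-scales an error probability and normalizes the sign of zero.

```python
import math


def phred_qual(error_probability: float) -> float:
    """Phred-scale an error probability, never returning negative zero."""
    qual = -10.0 * math.log10(error_probability)
    # -10 * log10(1.0) is -0.0 in IEEE 754 arithmetic; adding 0.0 yields +0.0.
    return qual + 0.0
```

Without the final addition, an error probability of exactly 1.0 would format as "-0.0".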
ebanks 30714ec8d9 As per quick chat with Richard Durbin, don't increase the mapping quality of realigned reads too much; for now, arbitrarily increase the MQ by 10. We need to figure out a better solution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3731 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 20:12:59 +00:00
ebanks 8ff1a4b929 Don't try to clean reads that fail the PF, in preparation for Ryan
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3730 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 19:49:36 +00:00
depristo b934cc7554 Updates to fix some bugs in merger. Now able to merge into project wide indel VCF files. Integration tests coming tomorrow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3727 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:16:33 +00:00
kshakir 7be8c35eb2 Workaround for scala trait erasing parameterized types:
- Requiring explicit @ClassType on parameterized fields in traits.
- Scatter / Gather functions are now abstract classes since @ClassType can't be used on parameterized fields with type parameters.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3726 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:15:10 +00:00
hanna 120f90da5b Interval support for ref walkers while streaming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3725 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:14:59 +00:00
hanna 773a72e6ea An initial fix for performance issues when filtering UG with new StratifiedAlignmentContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3724 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 01:07:46 +00:00
delangel be75b087ec a) Add input argument (-ncrate) to BeagleOutputToVCFWalker. If the genotype posterior error probability is higher than this threshold, we declare No-call at this genotype.
b) Add "OG" annotation to genotypes. If Beagle changes genotypes, this annotation gets the original genotype call, to ease performance  comparisons. If not, this annotation gets an empty value.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3723 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-06 18:33:28 +00:00
hanna 4213e05aeb Fix for sharding ref walkers via monolithic sharding. Introduces the potential bug (for
monolithic sharding only) that when traversing by read, map() function will not be called for loci
off the end of the reference.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3722 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-06 04:34:38 +00:00
aaron 86031f4034 part two: todo's in combine variants, fixes for InferredGeneticContext, and some other tests and clean-up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3721 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 21:07:53 +00:00
ebanks 36edc60ccc Connected UG to the new comp track annotation system in VA. Also, when emit confidence is lower than call confidence (so that we emit records filtered with LowQual), add a corresponding FILTER header field to the VCF so that the validator doesn't complain.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3720 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 13:04:24 +00:00
aaron 3347d1ca7c part one of combining format and info header lines code into a single abstract class for Mark; plus some 'm' removals from access methods for Eric. Adding fixes for CombineVariants next.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3719 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 05:57:58 +00:00
ebanks e7220bc885 Variant Context simple merging routine should keep ID if one of the VCs has it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3718 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 01:10:15 +00:00
delangel 3016e1cf80 Fixes to increase robustness in vcf4 writer. We assume that only at most 1 base was clipped from beginning of allele encoding by reader, and improve the way we find if bases were clipped. We still can't deal with some corner cases, and duplicate records may follow, for example if a snp location is followed at the next base by an indel. Also, if we are reading from a 3.3 vcf and the reference is null (i.e. we have an insertion), the reference base is not computed correctly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3717 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-04 20:22:04 +00:00
ebanks 07945040f8 Set VariantFiltration's JEXL engine to silent for warning messages
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3716 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-04 18:11:19 +00:00
ebanks be8740b00d Another edge case in left alignment for indels: deal with cases when insertions are ambiguously placed at ends of reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3715 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-04 17:26:38 +00:00
weisburd f7593435eb Implemented decodeLoc(..)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3713 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 21:01:36 +00:00
depristo cd2e4b0a1e merging now very close to working. Bug todo in writer and vcf infrastructure. Can almost create merged snp and indel files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3712 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 20:09:25 +00:00
delangel b6bdd61283 a) Fix bug when multi-base reference is homopolymeric when writing a VCF4.0 variant context: computation of number of trailing bases was incorrect and we ended up with incorrect position.
b) Updated VCF4WriterTestWalker to take either VCF3 or VCF4 as inputs (this walker can also be used to convert from 3.3 to 4.0).
 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3711 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 15:19:42 +00:00
depristo 61e2b2e39b Nearly finalize merging capabilities for CombineVariants. Support for dealing with inconsistent indel alleles at loci. Improvements to Allele and removal of addAllele to MutableGenotype. We are close to being able to merge all of 1000 genomes -- snps and indels -- into a single combined vcf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3710 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 13:32:33 +00:00
hanna cab8394103 The sharding system now buffers reads, with a size determined by command-line argument. Will investigate whether/how this
impacts performance on low-pass data and, if it works well, will create a more automatic version of the tool.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3709 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 22:28:55 +00:00
aaron f967cae1aa tiny comment change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3708 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 22:04:25 +00:00
aaron 3093a20a55 fixing VCF header format and info fields so that they properly emit the unbounded count value for vcf4 or vcf3. Eric, we should update the vcf4 spec page to indicate format fields are allowed to use the unbounded count as well (if this is true).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3707 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 22:02:16 +00:00
delangel 61c07c6f90 Fixes for missing key values that can create null pointer exceptions when reading from 3.3-generated variant contexts. Also, chop missing genotype fields correctly from right to left
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3706 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 20:17:03 +00:00
rpoplin 255b036fb5 Variant Recalibrator MLE EM algorithm is moved over to variational Bayes EM in order to eliminate problems with singularities when clustering in higher than two dimensions. Because of this there is no longer a number of Gaussians parameter. Wiki will be updated shortly with new recommended command.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3704 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 18:51:07 +00:00
aaron 4903d1fb4f fix for a parallelization issue: moving the creation of iterators outside of the sync block so we don't wait for RMD tracks to seek to the correct location. Thanks to Ben for providing the test case!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3703 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 16:37:02 +00:00
aaron 43ca595d15 VCF headers now can be set to a particular VCF version after creation, which converts the header lines to the appropriate encoding on output. Plus some clean-up of the code.
Also commented out the Tribble index out-of-date tests, the timing seems to be troublesome from the farm.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3702 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 05:32:14 +00:00
hanna 4995950d04 IndexedFastaSequenceFile is now in Picard; transitioning to that implementation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3701 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 04:40:31 +00:00
hanna c9d5345150 Redo StratifiedAlignmentContext to use ReadBackedPileup's stratification options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3699 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 02:46:05 +00:00
delangel dc4715c9c6 Permit empty fields in INFO and FORMAT structures - not fully tested yet but at least failing cases before now pass. Also, corrected a bug where in case we were reading 3.3 VCF's, or VCFs with no original allele encodings, we'd always print 2 bases per allele.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3698 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 01:56:07 +00:00
depristo 5f2b2d860e Final stage of renaming
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3696 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 21:39:07 +00:00
depristo 6e7927a47d Continuing the renaming nightmare...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3695 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:25:01 +00:00
depristo 9d7d5f1747 Continuing the renaming nightmare...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3694 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:24:27 +00:00
depristo aa20c52b88 deleting vcf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3693 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:19:15 +00:00
depristo 4195fc5c4e renaming part 2...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3692 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:18:11 +00:00
depristo 6c9da5525d renaming starting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3691 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:16:51 +00:00
depristo b8d6a95e7a Preliminary commit of new VCFCombine, soon to be called CombineVariants (next commit) that support merging any number of VCF files via a general VC merge routine that support prioritization and merging of samples! It's now possible to merge the pilot1/2/3 call sets into a single (monster) VCF taking genotypes from pilot2, then pilot3, then pilot1 as needed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3690 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:13:03 +00:00
kshakir 178cf64a0c Refactored ArgumentDefinition to absorb functionality from ArgumentDefinition and ArgumentTypeDescriptor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3688 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 18:37:58 +00:00
chartl 569456850d Mark pointed out there's differentiation in the filter field. Rolling back.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3687 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 17:05:53 +00:00
chartl 52a474b27d Fixed an issue with VCF combine in sites like the following:
Broad: Filtered     BC: No call

These were being treated the same as

Broad: Call         BC: No call

Added some verbosity to separate them.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3686 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 16:49:31 +00:00
ebanks 944dbb94ce Refactored and generalized the database/comp annotations in VariantAnnotator. Now one can provide comp tracks as with VariantEval (e.g. compHapMap, comp1KG_CEU) and the INFO field will be annotated with the track name (without the 'comp') if the variant record overlaps a comp site (e.g. ...;1KG_CEU;...). This means that you can now pass 1kg calls to the Unified Genotyper and automatically have records annotated with their presence in 1kg.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3684 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 16:37:31 +00:00
ebanks 47c4a70ac1 It turns out that it is legitimately possible for there to be reads that won't overlap within a target interval for cleaning. While we don't want to attempt cleaning, we also don't want to fail.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3682 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 15:50:44 +00:00
ebanks ae33d8a2f2 I just wanted one more vote. It's settled: we die.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3681 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 14:00:56 +00:00
ebanks 8fb37f5f7a For Kiran: warn the user when the actual and vcf ref bases differ so that if an exception is generated later, he knows why. All: should we generate the actual exception here? Is there any reason to allow cases where the vcf record has a different ref base than the actual reference? I'd vote that we die here. Thoughts?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3680 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 13:56:16 +00:00
delangel d932322190 More necessary fixes for VCF4.0 - now results look more sensible in realistic, bigger VCF files produced by say Dindel and not just the small test VCF:
- Fixed and cleaned code to produce trailing and padding bases in alleles around indels.
- Deal better with missing fields.
Pending:
- Chopping missing fields at end of genotypes.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3679 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 02:59:30 +00:00
ebanks 12c0de6170 Added ability to clean using only known indels. Added integration test for it. Fixed vcf->vc conversion for indels which was busted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3678 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 01:20:56 +00:00
chartl 610cc7ae2b Cool package trick Kiran showed me. VariantEvaluator no longer public, AAT specifies the core package even though it lives in oneoffs. Disabled so integration tests pass.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3677 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 22:42:04 +00:00
chartl 4c6f4e41c6 Include making VariantEvaluator public within the package so my oneoffs can be seen (not included in previous submit specifically because I didn't want to break the build by changing anything in core...the road to hell is paved with good intentions)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3676 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 22:26:52 +00:00
chartl 9ac13b8f5d Name and body change for this module to reflect local code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3675 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 21:45:26 +00:00
aaron 844cb2ed33 fixing a bug that Eric found with RODs for reads, where some records could be omitted. Sorry Eric!
Also putting more tolerance into the timing on the Tribble index tests (that check to make sure we're deleting out of date indexes, and not deleting perfectly good indexes). It seems that some of the farm nodes aren't great with a stopwatch.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3674 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 21:38:55 +00:00
chartl 101c27294d Comment this guy out so we build again. (Hate it when my repository goes all funky.)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3673 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 21:16:33 +00:00
chartl 3017f82550 Initial commit of items for analyzing amino acid transitions in variant eval. Blew up my subversion by coding locally while I did not have internet. I hope this doesn't bust any integration tests since I changed no existing code but...who knows. Crossing my fingers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3672 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 20:57:18 +00:00
delangel e3fb4d5c70 Intermediate checkin, just to fix null pointer exception that happened when merging implementation with latest VCF4 decoder - field ORIGINAL_ALLELE_LIST in vc shouldn't be written in infoFields structure since this won't be output to file and there is no legal structure under this key.
Base encoding for complex events is still brittle and most probably still has issues, fixes upcoming.
 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3671 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 20:57:09 +00:00
ebanks baf9479c35 An addition for Sendu since he can't seem to tell when his CountCovariate jobs die in the middle of writing the CSVs. We now write an EOF marker at the end of the covariates table and look for it when reading in the file in TableRecalibrationWalker. By default, we warn the user if the EOF marker isn't present, but we exception out if the user provides the --fail_with_no_eof_marker option.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3670 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 18:50:07 +00:00
delangel 3ca2b7374b Fixes to better deal with the "Type" and "Number" field in the INFO and FORMAT header lines in VCF4.0. We now record these fields and provide appropriate conversions. This is the first version that passes fully the VCF validator.
Also, moved the flag indicating VCF4.0 to the VCFWriter constructor.

 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3669 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 16:43:00 +00:00
ebanks 801b47c6e9 For Sendu: a similar addition to the Indel Genotyper allowing it to emit a metrics file (which for now consists only of # of normal/tumor calls made)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3668 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 13:19:17 +00:00
ebanks ddf87e61c2 For Sendu: optionally emit a metrics file with callability info (including number of actual calls made) from UG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3667 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 12:57:28 +00:00
ebanks 929e5b9276 Fix possible null pointer exception
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3666 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 09:01:18 +00:00
hanna 2953c9f069 Efficiency improvement requested by the Picard team in IndexedFastaSequenceFile: improve the memory efficiency
(and loading time) of long reference sequences by better controlling the input buffer size.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3665 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 07:22:07 +00:00
delangel ed71e53dd4 1) Initial complete version of VCF4 writer. There are still issues (see below) but at least this version is fully functional. It incorporates getting rid of intermediate VCFRecord so we now operate from VariantContext objects directly to VCF 4.0 output.
See VCF4WriterTestWalker for usage example: it just amounts to adding
vcfWriter.add(vc,ref.getBases()) in walker.

add() method in VCFWriter is polymorphic and can also take a VCFRecord, although eventually this should be obsolete.
addRecord is still supported so all backward compatibility is maintained.

Resulting VCF4.0 are still not perfect, so additional changes are in progress. Specifically:
a) INFO codes of length 0 (e.g. HM, DB) are not emitted correctly (they should emit just "HM" but now they emit "HM=1").
b) Genotype values that are specified as Integer in header are ignored in type and are printed out as Doubles.

Both issues should be corrected with better header parsing.

2) Check in ability of Beagle to mask an additional percentage of genotype likelihoods (0 by default), for testing purposes.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3664 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 23:54:38 +00:00
ebanks 4a451949ba add parallel option to target creator for masking out reads with bad mates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3663 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 22:13:25 +00:00
chartl 20f5fdbcf7 Changes to MVC to make the header of its output VCF compliant with spec (give expected # of values for info field annotations)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3660 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 18:33:23 +00:00
aaron 62d22ff1aa adding the original allele list to a variant context (as the annotation ORIGINAL_ALLELE_LIST), in the case where the set alleles are the result of clipping. Added tests for both cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3658 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 17:23:46 +00:00
ebanks 1292c96e29 The cleaner now adds the OC (original cigar) and OS (original alignment start) tags as appropriate to reads that get realigned; this feature can be turned off. Also, improved integration tests (sorry, Kiran!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3657 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 16:46:47 +00:00