Commit Graph

1055 Commits (e8bceb1eaa4a1e64f420a23f4e225dc8271ca362)

Author SHA1 Message Date
Mark DePristo dd65ba5bae @Hidden for DocumentationTest and GATKDocsExample 2011-09-23 09:03:37 -04:00
Mark DePristo dfce301beb Looks for @Hidden annotation on all classes and excludes them from the docs 2011-09-23 09:03:04 -04:00
Mark DePristo 106a26c42d Minor file cleanup 2011-09-23 08:25:20 -04:00
Mark DePristo a9f073fa68 Genotype merging unit tests for simpleMerge
-- Remaining TODOs are all for GdA
2011-09-23 08:24:49 -04:00
Mark DePristo 4397ce8653 Moved removePLs to VariantContextUtils 2011-09-23 08:24:20 -04:00
Eric Banks a8e0fb26ea Updating md5 because the file changed 2011-09-23 07:33:20 -04:00
Mark DePristo c49cc623de Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-22 17:26:21 -04:00
Mark DePristo dab7232e9a simpleMerge UnitTest for not annotating and annotating to different info key 2011-09-22 17:26:11 -04:00
Mark DePristo 30ab3af0c8 A few more simpleMerge UnitTest tests for filtered vcs 2011-09-22 17:14:59 -04:00
Mark DePristo 5cf82f9236 simpleMerge UnitTest tests filtered VC merging 2011-09-22 17:05:12 -04:00
Mark DePristo 46ca33dc04 TestDataProvider now can be named 2011-09-22 17:04:32 -04:00
Mauricio Carneiro 96c875399c Merging many bug fixes to reduce reads 2011-09-22 17:04:11 -04:00
Mauricio Carneiro 39b54211d0 Fixed hard clipping soft clipped bases after hard clips
if soft clipped bases were after a hard clipped section of the read, the hard clip was clipping the left soft clip tail as if it were a right tail. Mayhem.
2011-09-22 15:46:55 -04:00
Mark DePristo 68da555932 UnitTest for simpleMerge for alleles 2011-09-22 15:16:37 -04:00
Mauricio Carneiro 1acf7945c5 Fixed hard clipped cigar and alignment start
* Hard clipped Cigar now includes all insertions that were hard clipped and not the deletions.
* The alignment start is now recalculated according to the new hard clipped cigar representation
2011-09-22 14:51:14 -04:00
Eric Banks 80d7300de4 Unit test was passing in FORMAT as one of the sample names. There used to be a hack in the VCFHeader to check for this and remove it and I couldn't figure out why, but now I know. Hack was removed and now the unit test passes in only the sample names as per the contract. 2011-09-22 13:28:42 -04:00
Mauricio Carneiro 4e9020c9f7 Fixed alignment start for hard clipping insertions 2011-09-22 13:28:25 -04:00
Eric Banks 9c1728416c Revert "Updating md5 for fixed file" because this was fixed properly in unstable (but will break SnpEff if put into Stable).
This reverts commit 6b4182c6ab3e214da4c73bc6f3687ac6d1c0b72c.
2011-09-22 13:16:42 -04:00
Eric Banks 888d8697b1 Merged bug fix from Stable into Unstable 2011-09-22 13:16:31 -04:00
Eric Banks 15a410b24b Updating md5 for fixed file 2011-09-22 13:15:41 -04:00
Mark DePristo ba5f83fee2 start of VariantContextUtils UnitTest
-- tests rsID merging
2011-09-22 12:10:39 -04:00
Mark DePristo 93dd1faa5f Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-22 11:20:10 -04:00
Mark DePristo a05c959e5a Empty unit tests for VariantContextUtils
-- will be expanded over the day
2011-09-22 11:20:07 -04:00
Mark DePristo 3fdee2b9ed Merge from stable into unstable 2011-09-22 11:19:43 -04:00
Christopher Hartl 4f4a0fc38a Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git 2011-09-22 11:01:58 -04:00
Christopher Hartl 982c47bfa7 Remove duplicate effort in ReadUtils (with apologies to Mauricio)
Big (but not major) cleanup of code in ILG - mostly excising the old likelihood model
Activated the early-abort check for ILG. I think it should be better this way.
2011-09-22 10:58:26 -04:00
Mark DePristo c514df6d18 Merge of stable into unstable 2011-09-22 10:34:27 -04:00
Mark DePristo f81a41b889 Updating MD5s for CombineVariants
-- Old version had broken RSIDs, new version is fixed.  No longer see rs1234,. as it is now just rs1234
2011-09-22 10:30:25 -04:00
Eric Banks b8ea9ceb68 Adding integration test that uses the -V:dbsnp binding to make sure it won't fail later on if someone messes with Tribble. 2011-09-21 22:43:31 -04:00
Eric Banks 8f8b59a932 My interpretation of the VCF spec is that the FORMAT field should only be present if there is genotype/sample data. So the VCFCodec now throws an exception when it encounters such a case. I had to fix one of the integration test VCFs. 2011-09-21 22:23:28 -04:00
Christopher Hartl dc96f6da79 Merge branch 'master' of ssh://chartl@gsa2/humgen/gsa-scr1/chartl/dev/git 2011-09-21 18:18:41 -04:00
Christopher Hartl f9cdc119af Added a method to ReadUtils that converts reads of the form 10S20M10S to 40M (just unclips the soft-clips).
Be careful when using this - if you're writing a bam file it will be potentially written out of order (since the previous alignment start was at the M, not the S).
2011-09-21 18:16:42 -04:00
Christopher Hartl faff6e4019 Failed to commit changes to the GATKReport required for easier access when using the files as data sources (read: histograms) for walkers 2011-09-21 18:15:23 -04:00
Mauricio Carneiro 96768c8a18 Sending latest bug fixes to Reduce Reads to the main repository 2011-09-21 17:43:11 -04:00
Mauricio Carneiro 70335b2b0a Hard clipping soft clipped reads to fix misalignments.
Pre-softclipped reads (with high qual) are a complicated event to deal with in the Reduced Reads environment. I chose to hard clip them out for now and added a todo item to bring them back on in the future, perhaps as a variant region.
2011-09-21 17:12:01 -04:00
Christopher Hartl ef05827c7b Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-21 16:40:47 -04:00
Christopher Hartl 3b51d9106a Adding in likelihood calculations for mendelian violations. Also fixing a minor and rare bug in SelectVariants when specifying family structure on the command line. 2011-09-21 16:40:29 -04:00
Mark DePristo 04968c88b3 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-21 15:43:25 -04:00
Mark DePristo 6bcfce225f Fix for dynamic type determination for bgzip files
-- GZipInputStream handles bgzip files under linux, but not mac
-- Added BlockCompressedInputStream test as well, which works properly on bgzip files
2011-09-21 15:39:19 -04:00
Mark DePristo 9f6f0c443c Marginally cleaner isVCFStream() function
-- cleanup trying to debug minor bug.  Failed to fix the bug, but the code is nicer now
2011-09-21 15:25:01 -04:00
Ryan Poplin 5fef6dc5d0 Merged bug fix from Stable into Unstable 2011-09-21 15:23:06 -04:00
Ryan Poplin 2585fc3d6c Updating Rscript path doc text for Broad users 2011-09-21 15:22:26 -04:00
Mark DePristo 74f9ccf6dd Merge 2011-09-21 11:30:11 -04:00
Mark DePristo 6592972f82 Putative fix for BAQ array out of bounds
-- Old code required qual to be <64, which isn't strictly necessary.  Now uses the Picard SAMUtils.MAX_PHRED_SCORE constant
-- Unittest to enforce this behavior
2011-09-21 11:25:08 -04:00
Eric Banks 174859fc68 Don't allow whitespace in the INFO field 2011-09-21 11:14:54 -04:00
Mark DePristo ecc7f34774 Putative fix for BAQ problem. 2011-09-21 11:09:54 -04:00
Mark DePristo 7d11f93b82 Final bugfix for CombineVariants
-- Now handles multiple records at a site, so that you don't see records like set=dbsnp-dbsnp-dbsnp when combining something with dbsnp
-- Proper handling of ids.  If you are merging files with multiple ids for the same record, the ids are merged into a comma separated list
2011-09-21 10:58:32 -04:00
Mark DePristo a91ac0c5db Intermediate commit of bugfixes to CombineVariants 2011-09-21 10:15:05 -04:00
David Roazen b04d8eab55 Merged bug fix from Stable into Unstable 2011-09-20 17:24:14 -04:00
Mauricio Carneiro 758ecf2d43 Bringing latest updates of ReduceReads to the master repository 2011-09-20 16:35:09 -04:00
David Roazen d9ea764611 SnpEff annotator now adds OriginalSnpEffVersion and OriginalSnpEffCmd lines to the header of the VCF output file.
This change is urgently required for production, which is why it's going into Stable+Unstable
instead of just Unstable.

The keys for the SnpEff version and command header lines in the VCF file output by
VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally
different from the keys for those same lines in the SnpEff output file (SnpEffVersion
and SnpEffCmd), so that output files from VariantAnnotator won't be confused
with output files from SnpEff itself.
2011-09-20 16:30:55 -04:00
Mark DePristo bffd3cca6f Bug fix for reduced read; only adds regular bases for calculation
-- No longer passes on deletions for genotyping
2011-09-20 15:07:06 -04:00
Mark DePristo a1b4cafe7a Bug fix for NPE when timer wasn't initialized 2011-09-20 13:59:59 -04:00
Mark DePristo b7511c5ff3 Fixed long-standing bug in tribble index creation
-- Previously, on the fly indices didn't have dictionary set on the fly, so the GATK would read, add dictionary, and rewrite the index.  This is now fixed, so that the on the fly index contains the reference dictionary when first written, avoiding the unnecessary read and write
-- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary.  This can be used conveniently everywhere, and is what's written into the Tribble index
-- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils
-- VCFWriter now requires the master sequence dictionary
-- Updated walkers that create VCFWriters to provide the master sequence dictionary
2011-09-20 10:53:18 -04:00
Mark DePristo 230e16d7c0 Merge branch 'master' into rodrewrite 2011-09-20 06:54:18 -04:00
Mark DePristo aa8afa3899 Merge 2011-09-19 21:16:47 -04:00
Mauricio Carneiro 56106d54ed Changing ReadUtils behavior to comply with GenomeLocParser
Now the functions getRefCoordSoftUnclippedStart and getRefCoordSoftUnclippedEnd will return getUnclippedStart if the read is all contained within an insertion. Updated the contracts accordingly. This should give the same behavior as the GenomeLocParser now.
2011-09-19 14:00:00 -04:00
Mauricio Carneiro 080c957547 Fixing contracts for SoftUnclippedEnd utils
Now accepts reads that are entirely contained inside an insertion.
2011-09-19 13:53:53 -04:00
Mauricio Carneiro 5e832254a4 Fixing ReadAndInterval overlap comments. 2011-09-19 13:28:41 -04:00
Christopher Hartl ecb8466662 Merged bug fix from Stable into Unstable 2011-09-19 12:32:08 -04:00
Christopher Hartl 8143def292 Fix the -T argument in the DepthOfCoverage docs
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 12:31:47 -04:00
Christopher Hartl 034b868588 Revert "Fix the -T argument in the DepthOfCoverage docs"
This reverts commit 0994efda998cf3a41b1a43696dbc852a441d5316.
2011-09-19 12:16:07 -04:00
Mark DePristo cfde0e674b Merge branch 'sgintervals' 2011-09-19 12:02:41 -04:00
Mark DePristo 3e93f246f7 Support for sample sets in AssignSomaticStatus
-- Also cleaned up SampleUtils.getSamplesFromCommandLine() to return a set, not a list, and trim the sample names.
2011-09-19 11:40:45 -04:00
Mark DePristo 41ffb25b74 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-19 10:55:18 -04:00
Christopher Hartl ca1b30e4a4 Fix the -T argument in the DepthOfCoverage docs
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 10:29:06 -04:00
Mark DePristo 4ad330008d Final intervals cleanup
-- No functional changes (my algorithm wouldn't work)
-- Major structural cleanup (returning more basic data structures that allow us to develop a new algorithm)
-- Unit tests for the efficiency of interval partitioning
2011-09-19 10:19:10 -04:00
Mark DePristo 6ea57bf036 Merge branch 'master' into sgintervals 2011-09-19 09:50:19 -04:00
Mark DePristo 6bd42c053d Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-18 20:18:39 -04:00
Roger Zurawicki 091c7197cd Fixed memory leak and bug with deletions in clipping
The ClippingOp clip cigar function would run into an endless loop if the parameters were out of the read's range; I fixed the bug.
* There is no check to make sure the read coordinates are covered by the read, though
When Hard clipping to interval, I added a check for deletions.
NOTE: method works for NA12878 WEx but needs to be more thoroughly tested/optimized
2011-09-18 19:21:51 -04:00
Guillermo del Angel 7fa1e237d9 Forgot to git stash pop new MD5's for CombineVariants integration test 2011-09-16 12:53:54 -04:00
Guillermo del Angel e7b9a009b7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-16 12:48:30 -04:00
Menachem Fromer b2e8e11128 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-16 00:52:27 -04:00
Christopher Hartl 57b3efa2e2 Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-15 21:06:38 -04:00
Christopher Hartl 939babc820 Updating formatting for ValidationAmplicons GATK docs 2011-09-15 21:05:51 -04:00
Christopher Hartl 9fdf1f8eb6 Fix some doc formatting for Depth of Coverage 2011-09-15 21:05:22 -04:00
Menachem Fromer e6e9b08c9a Must provide alleles VCF to UGCallVariants 2011-09-15 18:51:09 -04:00
David Roazen d78e00e5b2 Renaming VariantAnnotator SnpEff keys
This is to head off potential confusion with the output from the SnpEff tool itself,
which also uses a key named EFF.
2011-09-15 17:42:15 -04:00
Eric Banks 1971fb35d7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-15 16:55:33 -04:00
Eric Banks 9dc6354130 Oops didn't mean to touch this test before 2011-09-15 16:55:24 -04:00
Ryan Poplin 2a8b8efd2f Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-15 16:26:35 -04:00
Ryan Poplin 2f58fdb369 Adding expected output doc to CountCovariates 2011-09-15 16:26:11 -04:00
Eric Banks fd1831b4a5 Updating docs to include more details 2011-09-15 16:25:03 -04:00
Eric Banks 6d02a34bfb Updating docs to include output 2011-09-15 16:17:54 -04:00
Eric Banks 4ef6a4598c Updating docs to include output 2011-09-15 16:10:34 -04:00
Eric Banks fe474b77f8 Updating docs so printing looks nicer 2011-09-15 16:05:39 -04:00
Eric Banks f04e51c6c2 Adding docs from Andrey since his repo was all screwed up. 2011-09-15 15:38:56 -04:00
Guillermo del Angel 86480b2e13 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-15 15:31:07 -04:00
Eric Banks d369d10593 Adding documentation before the release for GATK wiki page 2011-09-15 13:56:23 -04:00
Eric Banks 202405b1a1 Updating the FunctionalClass stratification in VariantEval to handle the snpEff annotations; this change really needs to be in before the release so that the pipeline can output semi-meaningful plots. This commit maintains backwards compatibility with the crappy Genomic Annotator output. However, I did clean up the code a bit so that we now use an Enum instead of hard-coded values (so it's now much easier to change things if we choose to do so in the future). I do not see this as the final commit on this topic - I think we need to make some changes to the snpEff annotator to preferentially choose certain annotations within effect classes; Mark, let's chat about this for a bit when you get back next week. Also, for the record, I should be blamed for David's temporary commit the other day because I gave him the green light (since when do you care about backwards compatibility anyways?). In any case, at least now we have something that works for both the old and new annotations. 2011-09-15 13:52:31 -04:00
David Roazen 1e682deb26 Minor html-formatting-related documentation fix to the SnpEff class. 2011-09-15 13:07:50 -04:00
Guillermo del Angel a942fa38ef Refine the way we merge records of different types in CombineVariants. Previously, two records of different types were not combined and were kept separate. This is still the case, except when the alleles of one record are a strict subset of the alleles of another record. For example, a SNP with alleles {A*,T} and a mixed record with alleles {A*,T, AAT} are now combined when their start positions match. 2011-09-15 10:22:28 -04:00
David Roazen 3db457ed01 Revert "Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames"
After discussing this with Mark, it seems clear that the old version of the
VariantEval FunctionalClass stratification is preferable to this version.
By reverting, we maintain backwards compatibility with legacy output files
from the old GenomicAnnotator, and can add SnpEff support later without
breaking that backwards compatibility.

This reverts commit b44acd1abd9ab6eec37111a19fa797f9e2ca3326.
2011-09-14 10:47:28 -04:00
David Roazen e0c8c0ddcb Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames
This is a temporary and hopefully short-lived solution. I've modified
the FunctionalClass stratification to stratify by effect impact as
defined by SnpEff annotations (high, moderate, and low impact) rather
than by the silent/missense/nonsense categories.

If we want to bring back the silent/missense/nonsense stratification,
we should probably take the approach of asking the SnpEff author
to add it as a feature to SnpEff rather than coding it ourselves,
since the whole point of moving to SnpEff was to outsource genomic
annotation.
2011-09-14 07:09:47 -04:00
David Roazen 1213b2f8c6 SnpEff 2.0.2 support
-Rewrote SnpEff support in VariantAnnotator to support the latest SnpEff release (version 2.0.2)
-Removed support for SnpEff 1.9.6 (and associated tribble codec)
-Will refuse to parse SnpEff output files produced by unsupported versions (or without a version tag)
-Correctly matches ref/alt alleles before annotating a record, unlike the previous version
-Correctly handles indels (again, unlike the previous version)
2011-09-14 07:09:47 -04:00
Guillermo del Angel 5b1bf6e244 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-13 17:04:43 -04:00
Guillermo del Angel c6672f2397 Intermediate (but necessary) fix for Beagle walkers: if a marker is absent in the Beagle output files, but present in the input vcf, there's no reason why it should be omitted in the output vcf. Rather, the vc is written as is from the input vcf 2011-09-13 16:57:37 -04:00
Mark DePristo edf29d0616 Explicit info message about uploading S3 log 2011-09-12 22:16:52 -04:00
Mark DePristo 2316b6aad3 Trying to fix problems with S3 uploading behind firewalls
-- Cannot reproduce the very long waits reported by some users.
-- Fixed a problem where an exception might leave an undeleted file; this is now handled with deleteOnExit()
2011-09-12 22:02:42 -04:00
Matt Hanna 64707c33bb Merged bug fix from Stable into Unstable 2011-09-12 21:54:11 -04:00
Matt Hanna e63d9d8f8e Mauricio pointed out to me that dynamic merging the unmapped regions of multiple BAMs ('-L unmapped' with a BAM list)
was completely broken.  Sorry about this!  Fixed.
2011-09-12 21:50:59 -04:00
Eric Banks ec4b30de6d Patch from Laurent: typo leads to bad error messages. 2011-09-12 14:45:53 -04:00
David Roazen 9d9d438bc4 New VariantAnnotatorEngine capability: an initialize() method for all annotation classes.
All VariantAnnotator annotation classes may now have an (optional) initialize() method
that gets called by the VariantAnnotatorEngine ONCE before annotation starts.

As an example of how this can be used, the SnpEff annotation class will use the initialize()
method to check whether the SnpEff version number stored in the vcf header is a supported
version, and also to verify that its required RodBinding is present.
2011-09-12 13:00:53 -04:00
Ryan Poplin 981b78ea50 Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts. 2011-09-12 12:17:43 -04:00
Ryan Poplin 60ebe68aff Fixing issue in VariantEval in which insertion and deletion events weren't treated symmetrically. Added new option to require strict allele matching. 2011-09-12 09:43:23 -04:00
Guillermo del Angel 9344938360 Uncomment code to add deleted bases covering an indel to per-sample genotype reporting, update integration tests accordingly 2011-09-10 19:41:01 -04:00
Guillermo del Angel b399424a9c Fix integration test affected by non-calling all-zero PL samples, and add a more complicated multi-sample integration test from a phase 1 case, GBR with mixed technologies and complex input alleles 2011-09-09 20:44:47 -04:00
Guillermo del Angel e95d484757 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-09 18:31:14 -04:00
Guillermo del Angel a807205fc3 a) Minor optimization to softMax() computation to avoid redundant operations, results in about 5-10% increase in speed in indel calling.
b) Added (but left commented out since it may affect integration tests and to isolate commits) fix to per-sample DP reporting, so that deletions are included in count.
c) Bug fix to avoid having non-reference genotypes assigned to samples with PL=0,0,0. Correct behavior should be to no-call these samples, and to ignore these samples when computing AC distribution since their likelihoods are not informative.
2011-09-09 18:00:23 -04:00
Mauricio Carneiro 9e650dfc17 Fixing SelectVariants documentation
getting rid of messages telling users to go for the YAML file. The idea is to not support these anymore.
2011-09-09 16:25:31 -04:00
Mark DePristo 72536e5d6d Done 2011-09-09 15:44:47 -04:00
Mark DePristo 3c8445b934 Performance bugfix for GenomeLoc.hashcode
-- old version overflowed so most GenomeLocs had a 0 hashcode.  Now uses OR, not plus, to combine
2011-09-09 14:25:37 -04:00
Mark DePristo c6436ee5f0 Whitespace cleanup 2011-09-09 14:24:29 -04:00
Mark DePristo 87dc5cfb24 Whitespace cleanup 2011-09-09 14:23:42 -04:00
Ryan Poplin 1953edcd2d updating Validate Variants deletion integration test 2011-09-09 13:39:08 -04:00
Ryan Poplin 9ada9b3ed4 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-09 13:15:36 -04:00
Ryan Poplin 354529bff3 adding Validate Variants integration test with a deletion 2011-09-09 13:15:24 -04:00
Ryan Poplin 91c949db74 Fixing ValidateVariants so that it validates deletion records. Fixing GATKdocs. 2011-09-09 12:57:14 -04:00
Mark DePristo 06cb20f2a5 Intermediate commit cleaning up scatter intervals
-- Adding unit tests to ensure uniformity of intervals
2011-09-09 12:56:45 -04:00
Eric Banks 51eb95d638 Missed these tests before 2011-09-09 11:46:37 -04:00
Eric Banks 6ad8943ca0 CompOverlap no longer keeps track of the number of comp sites since it wasn't (and couldn't be) keeping track of them correctly. 2011-09-09 09:45:24 -04:00
Mark DePristo 507574b1c8 Merge branch 'cancer' 2011-09-08 16:10:02 -04:00
Mark DePristo 48461b34af Added TYPE argument to print out VariantType 2011-09-08 15:01:13 -04:00
Eric Banks eaaba6eb51 Confirming that when stratifying by sample in VE the monomorphic sites for a given sample are not counted for the relevant metrics. Adding integration test to cover it. 2011-09-08 13:17:34 -04:00
Ryan Poplin 2636d216de Adding indel vqsr integration test 2011-09-08 10:38:13 -04:00
Ryan Poplin 9cba1019c8 Another fix for genotype given alleles for indels. Expanding the indel integration tests to include multiallelics and indel records that overlap 2011-09-08 09:25:13 -04:00
Ryan Poplin e0020b2b29 Fixing PrintRODs. Now has input and only prints out one copy of each record 2011-09-08 08:58:37 -04:00
Ryan Poplin 29c968ab60 clean up 2011-09-08 08:42:43 -04:00
Ryan Poplin 59841f8232 Fixing genotype given alleles for indels. Only take the records that start at this locus. 2011-09-08 08:41:16 -04:00
Mark DePristo cd2c511c4a GCF improvements
-- Support for streaming VCF writing via the VCFWriter interface
-- GCF now has a header and a footer.  The header is minimal, and contains a forward pointer to the position of the footer in the file.
-- Readers now read the header, and then jump to the footer to get the rest of the "header" information
-- Version now a field in GCF
2011-09-07 23:28:46 -04:00
Mark DePristo fe5724b6ea Refactored indexing part of StandardVCFWriter into superclass
-- Now other implementations of the VCFWriter can easily share common functions, such as writing an index on the fly
2011-09-07 23:27:08 -04:00
Mark DePristo 01b6177ce1 Renaming GVCF -> GCF 2011-09-07 17:10:56 -04:00
Mark DePristo b220ed0d75 Merge branch 'master' into rodrewrite 2011-09-07 17:05:35 -04:00
Guillermo del Angel 45d54f6258 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-07 16:49:49 -04:00
Guillermo del Angel 9604fb2ba3 Necessary but not sufficient step to fix GenotypeGivenAlleles mode in UG which is now busted 2011-09-07 16:49:16 -04:00
Mark DePristo 2ded027762 Removed dysfunctional tranches support from VariantEval 2011-09-07 16:09:24 -04:00
Eric Banks aa9e32f2f1 Reverting Mark's previous commit as per the open discussion. Now the eval modules check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module not to error out if the indel isn't simple (that would have been bad). Only integration test that needed to be updated was the tranches one based on a separate commit from Mark. 2011-09-07 15:48:06 -04:00
Mark DePristo d7e355b4b6 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-07 14:54:16 -04:00
Mark DePristo 9127849f5d BugFix for unit test 2011-09-07 14:54:10 -04:00
Eric Banks 3a04955a30 We already had isPolymorphic and isMonomorphic in the VariantContext, but the implementation was incorrect for many edge cases (e.g. sites-only files, sites with samples who were no-called). Fixing. Moving on to VE now. 2011-09-07 14:01:42 -04:00
Guillermo del Angel 743bf7784c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-07 13:21:26 -04:00
Guillermo del Angel 5f22ef9a8c Added missing javadoc info to Beagle arguments 2011-09-07 13:21:11 -04:00
Mark DePristo 3bcbfa6e06 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-07 13:13:17 -04:00
Mark DePristo 430da23446 At least 2 minutes must pass before a status message is printed, further stabilizing time estimates 2011-09-07 13:13:07 -04:00
Mauricio Carneiro 6857d0324e Merge branch 'master' into rr 2011-09-07 12:59:08 -04:00
Mark DePristo 7e9e20fed0 Forgot to delete previous call 2011-09-07 12:54:52 -04:00
Mark DePristo d23d620494 Pushing traversal engine timer start to as close to actual start as possible
-- Should make initial timings more accurate
2011-09-07 12:52:33 -04:00
Mark DePristo 6ff432e1f2 BugFix for TF argument to VariantEval, actually making it work properly 2011-09-07 12:50:17 -04:00
Mauricio Carneiro 131cb7effd Bringing Reduce Reads bug fixes to the main repository 2011-09-07 12:25:53 -04:00
Mark DePristo a1920397e8 Major bugfix for per sample VariantEval
-- per sample stratification was not being calculated correctly.  The alt allele was always remaining, even if the genotype of the sample was hom-ref.  Although conceptually fine, this breaks the assumptions of all of the eval modules, so per sample stratifications actually included all variants for everything.  Eric is going to fix the system in general, so this commit may break the build.
2011-09-07 12:18:11 -04:00
Mark DePristo a02636a1ac Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/ebanks/Sting_rodrefactor into rodrewrite 2011-09-07 10:50:00 -04:00
Mark DePristo d5641cfac5 Merge branch 'variantEvalST' 2011-09-07 10:44:23 -04:00
Mark DePristo 2f4cf82e3b VariantEval cleanup. Added VariantType Stratification
-- ArrayLists replaced with Lists where possible
-- states refactored into VariantStratifier base class (reduces many lines of duplicate code)
-- Added VariantType stratification that partitions report by VariantContext.Type
2011-09-07 10:43:53 -04:00
Christopher Hartl 436f6eb52b Reverting Eric's change and pushing in some command-line-option documentation. 2011-09-07 08:53:30 -04:00
Eric Banks 1ef8a1750a I asked nicely and got nothing. Then I threatened and still got nothing. So I am carrying through on my threats. Guillermo, you have a short reprieve because you were away on vacation, but let's get yours done tomorrow afternoon. 2011-09-06 21:07:49 -04:00
Eric Banks da9c8ab386 Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly. 2011-09-06 20:39:42 -04:00
Mark DePristo 3db7ecb920 ReducedRead flag cached in GATKSAMRecord. 20% performance improvement 2011-09-06 15:11:38 -04:00
Roger Zurawicki 47607a7eff Fixed bug where deletions messed up interval clipping
- Instead of using readLength, the ReadUtils functions are used to get a proper read coordinate
 - Added debug info in interval clipping ( with -dl)

  NOTE: method might not be safe for production and checks need to be added to the ClippingOp code
2011-09-06 14:25:57 -04:00
Khalid Shakir 0adb388dee Fixed bug in SelectVariants that was annotating sample_file / exclude_sample_file as @Argument instead of @Input meaning they weren't tracked in Queue.
Updates for HybridSelectionPipeline:
- Use VQSR on SNPs for projects using bait set whole_exome_agilent_1 and applying cut at 98.5.
- If a whole_exome_agilent_1 project has fewer than 50 samples, also mix in 1000G samples to reach VQSR thresholds.
- Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches.
- Removed GSA_PRODUCTION_ONLY flag from indel caller.
- Updated indel hard filters based on delangel's analysis.
- Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.
2011-09-06 12:41:46 -04:00
Mark DePristo d471617c65 GATK binary VCF (gvcf) prototype format for efficiency testing
-- Very minimal working version that can read / write binary VCFs with genotypes
-- Already 10x faster for sites, 5x for fully parsed genotypes, and 1000x for skipping genotypes when reading
2011-09-02 21:15:19 -04:00
Mark DePristo 048202d18e Bugfix for cached quals 2011-09-02 21:13:28 -04:00
Mark DePristo 03aa04e37c Simple refactoring to make formatting functions public 2011-09-02 21:13:08 -04:00
Mark DePristo 124ef6c483 MISSING_VALUE now gets defaultValue in getAttribute functions 2011-09-02 21:12:28 -04:00
Mark DePristo 82f2131777 Simplified getAttributeAsX interfaces
-- Removed the getAttributeAsX(key) versions that throw an exception when the value is missing.
-- Removed getAttributeAsXNoException(key).
-- The only available accessors are now getAttributeAsX(key, default).
-- These accessors properly handle their argument types, so if the value is a double it is returned directly by getAttributeAsDouble(), and if it's a string it's converted to a double.  If the key isn't found, default is returned.
2011-09-02 12:27:11 -04:00
Mauricio Carneiro 08ae6c0c61 ReadClipper is now handling unmapped reads 2011-09-02 11:32:30 -04:00
Mark DePristo c57198a1b9 Optimizations in VCFCodec
-- Don't create an empty LinkedHashSet() for PASS fields.   Just return Collections.emptySet() instead.
-- For filter fields with actual values, returns an unmodifiableSet instead of one that can be changed
2011-09-02 08:46:17 -04:00
Mark DePristo c3ea96d856 Removing many unused functions of unquestionable purpose 2011-09-02 08:42:01 -04:00
Eric Banks d241f0e903 Adding docs for the pcr error rate argument. 2011-09-01 21:57:02 -04:00
Eric Banks 827fe6130c Adding hidden printing option. Also, always run UG in mode GENOTYPE_GIVEN_ALLELES given that we don't actually test for the correct alleles (otherwise UG may choose a different allele and we may falsely validate the wrong one). 2011-09-01 11:40:35 -04:00
Mark DePristo 1aa4b12ff0 Reduced the number of combinations being tested here, which was overkill 2011-09-01 10:42:43 -04:00
Mark DePristo ac49b8d26b Conditional support for PerformanceTrackingQuerySource to measure Tribble / GATK bridge performance
-- Removed DEBUG option, instead use MEASURE_TRIBBLE_QUERY_PERFORMANCE in RMDTrackerBuilder
2011-09-01 10:41:55 -04:00
Mauricio Carneiro 4b5a7046c5 Making ReadLengthDistribution Public
Found this neat little walker Kiran wrote stashed in the private tree. Very useful. Generalized it a bit, added GATKDocs and moved it to public. I might include it as a QC step on the pacbio processing pipeline.
* generalize it so it works with non-paired-end reads.
* generalize it to work with no read group information
2011-08-31 15:52:28 -04:00
Mauricio Carneiro 7d79de91c5 Merge branch 'master' into rr 2011-08-30 02:50:19 -04:00
Mauricio Carneiro 0cd9438ac2 fixed soft unclipped calculation
* getRefCoordSoftUnclippedEnd was not resetting the shift when hitting insertions. Fixed.
* getReadCoordinateForReferenceCoordinateBeforeAlignmentEnd was returning the wrong read coordinate position. Fixed.
2011-08-30 02:45:29 -04:00
Mauricio Carneiro fd540592ab Added RMS calculation for consensus MQ
Consensus MQ is now the average of the RMS of the mapping qualities of the reads making each site.
2011-08-30 02:45:20 -04:00
Mauricio Carneiro 6f9264d2b3 Hard Clipping no longer leaves indels on the tails
The clipper could leave an insertion or deletion as the start or end of a read after hardclipping a read if the element adjacent to the clipping point was an indel. Fixed.
2011-08-30 02:44:58 -04:00
Mauricio Carneiro 943876c6eb Added QUAL/MINVAR parameters to the walker 2011-08-30 02:44:46 -04:00
Mauricio Carneiro 7532be7f5a Allow clipping after AlignmentEnd if the end is soft clipped.
Read clipper now identifies and clips even if the requested coordinate is outside the alignment but the read contains soft clipped bases in that region.
2011-08-30 02:44:46 -04:00
Mauricio Carneiro 90a1f5e15c Several bug fixes
* When hard clipping a read that had insertions in it, the insertion was being added to the cigar string's hard clip element. This way, the old UnclippedStart() was being modified and so was the calculation of the new AlignmentStart(). Fixed it by subtracting the number of insertions clipped from the total number of hard clipped bases.
* Walker was sending read instead of filtered read when deleting a read that contains only Q2 bases
* Sliding the window was causing reads that started on the new start position to be entirely clipped.
2011-08-30 02:44:19 -04:00
Mauricio Carneiro 66a8b36cf5 Fixed most indexing bugs
* added bases and quals to consensus
* fixed consensus read cigar generation.
2011-08-30 02:43:41 -04:00
Mark DePristo 1e5001b447 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-29 17:04:21 -04:00
Mark DePristo 3af001fff2 Bugfix for file that must not exist on disk 2011-08-29 17:00:10 -04:00
Mark DePristo 3b09d42ed6 Now only prints 1 warning message about duplicate headers in simpleMerge 2011-08-29 14:41:29 -04:00
Eric Banks c2f0db969b Don't use the default deletion value from UG if not asking to have it set 2011-08-29 13:48:10 -04:00
Eric Banks bb7a37e8f2 We need to allow reference calls in the input VCF for the GenotypeAndValidate walker when using the BAM as truth so that we can test supposed monomorphic calls against the truth. 2011-08-29 13:19:35 -04:00
Ryan Poplin bc252a0d62 misc minor bug fixes in assembly. Increasing the minimum number of bad variants to be used in negative model training in the VQSR 2011-08-29 08:11:31 -04:00
Mark DePristo a5c65fc133 Debugging information to print out the Query tracks 2011-08-28 18:54:49 -04:00
Mark DePristo 7bf006278d Moved ResolveHostname to general utils as a static function 2011-08-28 12:04:16 -04:00
Mark DePristo ccec0b4d73 AnalyzeCovariates uses the general RScript system now
-- Convenience constructor for collection for testing
-- callRScript() now accepts Objects not Strings, for convenience
2011-08-27 12:54:13 -04:00
Mark DePristo 1ceb020fae UnitTests for RScript 2011-08-27 10:50:05 -04:00
Mark DePristo e37a638e09 Fix for disallowed characters in GATKReportTable
-- Illegal characters are automatically replaced with _
2011-08-26 13:24:06 -04:00
Mark DePristo c0503283df Spelling fix requires md5 updates 2011-08-26 07:40:44 -04:00
Mark DePristo eef1ac415a Merge branch 'master' into rodTesting
Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToTable.java
2011-08-26 00:35:41 -04:00
Eric Banks 9b7512fd94 Just because there's a ref base doesn't mean the VC needs to be padded 2011-08-25 22:42:14 -04:00
Mark DePristo e01273ca7c Queue now writes out queueJobReport.pdf
-- General purpose RScript executor in java (please use when invoking RScripts)
-- Removed groupName.  This is now analysisName
-- Explicitly added capability to enable/disable individual QFunction
2011-08-25 16:57:11 -04:00
Eric Banks 09a729da3a Removing incorrect comment 2011-08-25 15:42:52 -04:00
Eric Banks 8bbef79fc2 Create clipped alleles during allele parsing instead of creating a full VC, clipping alleles, and regenerating the VC from scratch. 2011-08-25 15:37:26 -04:00
Ryan Poplin 29c7b10f7b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-24 15:18:58 -04:00
Ryan Poplin e5008aba00 Output the top two haplotypes as a variant call by running Smith-Waterman alignment against the reference and calling any difference as variation. This is the first version that runs end-to-end, taking in reads as a BAM file and writing out variant calls in VCF. 2011-08-24 15:18:44 -04:00
Guillermo del Angel e618cb1e79 a) Renamed/expanded SelectVariants arguments that choose particular kinds of variants and particular allelic types, now instead of -Indels or -SNPs we can specify for example -selectType [MIXED|INDEL|SNP|MNP|SYMBOLIC]. To select biallelic, multiallelic variants, use -restrictAllelesTo [BIALLELIC|MULTIALLELIC]. Corresponding gatkdocs changes.
b) More useful AC,AF logging in VariantsToTable with multiallelic sites: instead of logging comma-separated values, log max value by default. Hidden, experimental argument -logACSum to log sum of ACs instead. This is due to extreme slowness of R in parsing strings to tokens and computing max/sum itself (~100x slower than gatk).
c) Added integration test for new SelectVariants commands
2011-08-24 12:25:50 -04:00
Mark DePristo 28ee6dac41 Fixed spelling mistake 2011-08-24 10:14:45 -04:00
Ryan Poplin f37875600a Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-24 09:02:44 -04:00
Khalid Shakir 1ecbf05aae Avoid segfaults due to out-of-date and possibly abandoned LSF DRMAA implementation when use'ing LSF instead of .combined_LSF_SGE 2011-08-23 23:49:36 -04:00
Mark DePristo 569e1a1089 Walker.isDone() aborts execution early
-- Useful if you want to have a parameter like MAX_RECORDS that wants the walker to stop after some number of map calls without having to resort to the old System.exit() call directly.
2011-08-23 16:53:06 -04:00
Ryan Poplin a1a1fac9e4 Likelihood engine now gives non-zero likelihoods. Using HMM function that can handle context specific gap open and gap continuation penalties 2011-08-23 13:43:07 -04:00
Guillermo del Angel 6e2552a9ef Merge fix 2011-08-23 12:40:43 -04:00
Guillermo del Angel 8b7a0b3b62 Two new arguments to SelectVariants to exclude either multiallelic or biallelic sites from input vcf 2011-08-23 12:40:01 -04:00
Roger Zurawicki ac36271457 Fixed extra reads showing up in Variable Sites
Reads that were not hard clipped for the variable site no longer show up in output file
Walker now uses unclippedStart of Read to determine position in the sliding Window
2011-08-23 11:26:00 -04:00
Mark DePristo 6d6feb5540 Better error message when you cannot determine a ROD type because the file doesn't exist or cannot be read 2011-08-23 10:56:37 -04:00
Mauricio Carneiro feeab6075f Merging ReduceReads development with unstable repo
It is time to bring the ReadClipper class to the main repo. Read Clipper has tested functionality for soft and hard clipping reads. I will prepare thorough documentation for it as it will be very useful for the assembler and the GATK in general.
2011-08-22 23:03:03 -04:00
Guillermo del Angel ee68713267 Further Bug fixes to CountVariants: stratifications were wrong in case genotypes had no-calls, for example if we stratified by sample and a sample had a no-call, this no-call was considered a true variant and counts were incorrectly increased 2011-08-22 20:42:47 -04:00
Guillermo del Angel c270384b2e Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-22 20:39:32 -04:00
Guillermo del Angel 8ae24912f4 a) Misc fixes in Phase1 indel vqsr script,
b) More R-friendly VariantsToTable printing of AC in case of multiple alt alleles
c) Rename FixPLOrderingWalker to FixGenotypesWalker and rewrote: no longer need older code, replaced with code to replace genotypes with all-zero PL's with a no-call.
2011-08-22 20:39:06 -04:00
Mark DePristo 85c5a6f890 Merge branch 'rodTesting'
Conflicts:
	private/java/src/org/broadinstitute/sting/gatk/walkers/performance/ProfileRodSystem.java
2011-08-22 17:43:47 -04:00
Mark DePristo 1eab9be35d Now with accurate javadoc 2011-08-22 17:25:15 -04:00
Mark DePristo 3612a3501d info, not warn, about dynamic type determination 2011-08-22 17:24:51 -04:00
Eric Banks dc42571dd9 Only create the genotype map when necessary 2011-08-22 15:40:36 -04:00
Khalid Shakir c4c90c8826 Updates to JobRunners from the Queue developer community and from running the WholeGenomePipeline:
- Ability to pass a different resident memory reservation and limits. Useful for large pileups of low pass genome data that sometimes need high -Xmx6g but usually don't exceed 2-3g in actual heap size.
- Fixed jobPriority to work for all job runners. It must now be an integer between 0 and 100 (even for GridEngine) and will be mapped to the correct values.
- Passing parallel environment and job resource requests to LSF and GridEngine. Useful for passing tokens like iodine_io=1 and -pe pe_slots 8
- Refactored GridEngine JobRunner to also provide basic support for other job dispatchers with DRMAA implementations such as Torque/PBS. Should work for basic running but advanced users must pass their own jobNativeArgs from the command line or in customized QScripts until someone maps properties like jobQueue, jobPriority, residentRequest, etc. into a Torque/PBS/etc. dispatcher.
2011-08-22 15:13:27 -04:00
Eric Banks 2c24b68a96 Working implementation of DecodeLoc for VCF parsing. Makes indexing 3x faster. 2011-08-22 15:11:21 -04:00
Eric Banks 518b3dd291 Don't let the genotypes map be null 2011-08-22 15:10:30 -04:00
Ryan Poplin f93a554b01 updating exome specific parameters in MDCP 2011-08-21 10:25:36 -04:00
Ryan Poplin dbff84c54e Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-21 10:09:19 -04:00
Khalid Shakir 22ca44c015 Fixed Queue's tagging of RodBindings.
Fixed argument definition names.
2011-08-21 02:34:20 -04:00
Eric Banks a8cbced71b Bug fix for Ryan: check for no context 2011-08-20 22:49:51 -04:00
Eric Banks 0ccd173967 Fixing the recent SelectVariants fix 2011-08-20 21:30:08 -04:00
Ryan Poplin b008676878 fixing the previous fix 2011-08-20 21:21:55 -04:00
Guillermo del Angel 782453235a Updated VariantEvalIntegrationTest since there's a new column separating nMixed and nComplex in CountVariants
Misc updates to WholeGenomeIndelCalling.scala
Bug fix in VariantEval (may be temporary, need more investigation): if -disc option is used in sites-only vcf's then a null pointer exception is produced, caused by recent introduction of -xl_sf options.
2011-08-20 12:24:22 -04:00
Ryan Poplin 539e157ecd Fixing misc parameters in MDCP. The pipeline now does VariantEval of output by default. Fix for NaN vqslod values in VQSR 2011-08-20 11:28:48 -04:00
Guillermo del Angel 4939648fd4 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-20 08:50:43 -04:00
Ryan Poplin a96ecbab71 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-19 19:30:05 -04:00
Ryan Poplin ddb5045e14 Updating the methods development calling pipeline for the new rod binding syntax and the new best practices. 2011-08-19 19:29:51 -04:00
Mark DePristo ff018c7964 Swapped argument order but not MD5 order 2011-08-19 16:55:56 -04:00
Mark DePristo 8b3cfb2f1c Final documented version of GATKDoclet and associated classes
-- Docs on everything.
-- Feature complete.  At this point only minor improvements and bugfixes are anticipated
2011-08-19 16:52:17 -04:00
Mark DePristo b08d63a6b8 Documentation and code cleanup for ClipReads, CallableLoci, and VariantsToTable
-- Swapped -o [summary] and -ob [bam] for more standard -o [bam] and -os [summary] arguments.
-- @Advanced arguments
2011-08-19 15:06:37 -04:00
Mark DePristo 49e831a13b Should have checked in 2011-08-19 14:35:16 -04:00
Mauricio Carneiro 7b5fa4486d GenotypeAndValidate - Added docs to the @Arguments 2011-08-19 13:35:11 -04:00
Mark DePristo 9f7d4beb89 Merge branch 'help' 2011-08-19 13:14:02 -04:00
Mark DePristo 4d1fd17a97 GATKDoclet cleanup and documentation
-- Fixed bug in the way ArgumentCollections were handled that led to failure in handling the dbsnp argument collection.
2011-08-19 13:13:41 -04:00
Ryan Poplin 0f25167efd minor fix in VariantEval docs 2011-08-19 11:01:04 -04:00
Mark DePristo 198955f752 GATKDoc descriptions for all standard codecs, or TODO for their owners
-- Also added vcf.gz support in the VCF codec.  This wasn't committed in the last round, because it was missed by the parallel documentation effort.
2011-08-19 09:57:21 -04:00
Guillermo del Angel 269ed1206c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-19 09:32:20 -04:00
Mark DePristo a5e279d697 Dynamic typing of vcf.gz files
-- CombineVariantsIntegrationTests now use dynamic typing of vcf.gz files
-- FeatureManagerUnitTests tests for correctness.
2011-08-19 09:05:11 -04:00
Eric Banks 40e67cff1b I like the @Advanced annotation 2011-08-18 22:27:34 -04:00
Mark DePristo 2457c7b8f5 Merge branch 'master' into help 2011-08-18 22:20:43 -04:00
Mark DePristo 5fbdf968f7 ArgumentSource no longer comparable. Arguments sorted by GATKDoclet 2011-08-18 22:20:14 -04:00
Eric Banks 77fa2c1546 Renaming read filters with a superfluous 'Read' in their names. Kept the ones that made sense to have it (e.g. MalformedReadFilter). 2011-08-18 22:01:33 -04:00
Mark DePristo 1d3799ddf7 Merge branch 'master' into help 2011-08-18 22:00:29 -04:00
Mark DePristo d1892cd0d7 Bug fixes
-- Sorting of ArgumentSources now done in GATKDoclet, not in the ParsingEngine, as the system depends on the LinkedTreeMap
-- Fixed broken exception throwing in the case where a file's type could not be determined
2011-08-18 21:58:36 -04:00
Mark DePristo c5efb6f40e Usability improvements to GATKDocs
-- ArgumentSources are now sorted by case insensitive names, so arguments are shown in alphabetical order (Ryan)
-- @Advanced annotation can be used to indicate that an argument is an advanced option and should be visually deemphasized in the GATKDocs.  There's now an advanced section.  Mauricio or Ryan -- could you figure out how to make this section less prominent in the style.css?
2011-08-18 21:39:11 -04:00
Mark DePristo d94da0b1cf Moved CG and SOAP codecs to private 2011-08-18 21:20:26 -04:00
Mark DePristo f7414e39bc Improvements to GATKDocs
-- Allowed values for RodBinding<T> are displayed in the GATKDocs
-- Longest name up to 30 characters is chosen for main argument list (suggested by Ryan/Mauricio)
-- Features are listed in alphabetical order
-- Moved useful getParameterizedType() function to JVMUtils
-- Tests of these features in the Documentation Test
2011-08-18 21:20:09 -04:00
Ryan Poplin 09d099cada Added GATKDocs to the UnifiedGenotyper. 2011-08-18 20:57:02 -04:00
Mauricio Carneiro 6ef01e40b8 Complete rewrite of Hard Clipping (ReadClipper)
Hard clipping is now completely independent from softclipping and plows through previously hard or soft clipped reads.
2011-08-18 18:35:45 -04:00
Guillermo del Angel 626cbf9411 Bug fixes and cleanups for IndelStatistics 2011-08-18 16:28:40 -04:00
Guillermo del Angel 58560a6d50 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 16:17:52 -04:00
Guillermo del Angel 3dfb60a46e Fixing up and refactoring usage of indel categories. On a variant context, isInsertion() and isDeletion() are now removed because behavior before was wrong in case of multiallelic sites. Now, methods isSimpleInsertion() and isSimpleDeletion() will return true only if sites are biallelic. For multiallelic sites, isComplex() will return true in all cases.
VariantEval module CountVariants is corrected and an additional column is added so that we log mixed events and complex indels separately (before they were being conflated).
VariantEval module IndelStatistics is considerably simplified as the sample stratification was wrong and redundant, now it should work with the VE-generic Sample stratification. Several columns are renamed or removed since they're not really useful
2011-08-18 16:17:38 -04:00
Chris Hartl 6b256a8ac5 Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git 2011-08-18 15:29:24 -04:00
Chris Hartl a8935c99fc Adding docs for DepthOfCoverage and ValidationAmplicons 2011-08-18 15:28:35 -04:00
Mark DePristo f2f51e35e3 Merge branch 'master' into help 2011-08-18 14:05:33 -04:00
Mark DePristo faa3f8b6f6 Only concrete classes are now documented 2011-08-18 14:04:47 -04:00
Ryan Poplin 7c4ce6d969 Added GATKDocs for the VQSR walkers. 2011-08-18 14:00:39 -04:00
Mark DePristo 5772766dd5 Improvements to GATKDocs
-- Now supports a static list of root classes / interfaces that should receive docs.  A complementary approach to documenting features to the DocumentedGATKFeature annotation
-- Tribble codecs are now documented!
-- No longer displayed sub and super classes
2011-08-18 14:00:09 -04:00
Mark DePristo e03db30ca0 New uses DocumentedGATKFeatureObject instead of annotation directly
-- Step 1 on the way to creating a static list of additional classes that we want to document.
2011-08-18 12:31:04 -04:00
Mark DePristo d4511807ed Merge branch 'master' into help 2011-08-18 11:53:37 -04:00
Mark DePristo c787fd0b70 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 11:52:45 -04:00
Mark DePristo c797616c65 If you have one sample in your BAM, getToolkit().getSamples().size() == 2
Also deleted double initialization, where a line of code was duplicated in creating the GATK engine.
2011-08-18 11:51:53 -04:00
Mark DePristo cbec69a130 Merge branch 'master' into help
Conflicts:
	public/java/src/org/broadinstitute/sting/utils/help/HelpUtils.java
2011-08-18 11:33:27 -04:00
Eric Banks aa21fc7c9c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 11:30:59 -04:00
Mark DePristo f5d7cabb20 Fix for reintroducing an already solved problem. 2011-08-18 11:20:12 -04:00
Eric Banks a45498150a Remove non-ascii char 2011-08-18 11:18:29 -04:00
Ryan Poplin c08a9964d4 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 10:58:04 -04:00
Ryan Poplin bb79d3edae Added GATKDocs for the BQSR walkers. 2011-08-18 10:57:48 -04:00
Mark DePristo 47bbddb724 Now provides type-specific user feedback
For RodBinding<VariantContext> error messages now list only the Tribble types that produce VariantContexts
2011-08-18 10:47:16 -04:00
Mark DePristo 2d41ba15a4 Vastly better Tribble help message
Here's a new example:
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.1-520-g76495cd):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to parse value /humgen/gsa-hpprojects/GATK/data/refGene_b37.filtered.sorted.txt for argument refSeqRodBinding. Message: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :TYPE listing the correct type from among the supported types:
##### ERROR        Name        FeatureType   Documentation
##### ERROR      BEAGLE      BeagleFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_beagle_BeagleCodec.html
##### ERROR         BED         BEDFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_bed_BEDCodec.html
##### ERROR    BEDTABLE       TableFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_BedTableCodec.html
##### ERROR       CGVAR     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_completegenomics_CGVarCodec.html
##### ERROR       DBSNP       DbSNPFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_dbsnp_DbSNPCodec.html
##### ERROR    GELITEXT    GeliTextFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_gelitext_GeliTextCodec.html
##### ERROR         MAF         MafFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_features_maf_MafCodec.html
##### ERROR MILLSDEVINE     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_MillsDevineCodec.html
##### ERROR   RAWHAPMAP   RawHapMapFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_hapmap_RawHapMapCodec.html
##### ERROR      REFSEQ      RefSeqFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_refseq_RefSeqCodec.html
##### ERROR   SAMPILEUP   SAMPileupFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html
##### ERROR     SAMREAD     SAMReadFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_samread_SAMReadCodec.html
##### ERROR      SNPEFF      SnpEffFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_snpEff_SnpEffCodec.html
##### ERROR     SOAPSNP     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_soapsnp_SoapSNPCodec.html
##### ERROR       TABLE       TableFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_TableCodec.html
##### ERROR         VCF     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCFCodec.html
##### ERROR        VCF3     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCF3Codec.html
##### ERROR ------------------------------------------------------------------------------------------
2011-08-18 10:31:32 -04:00
Mark DePristo c2287c93d7 Cleanup of codec locations. No more dbSNPHelper
-- refdata/features now in utils/codecs with the other codecs
-- Deleted dbsnpHelper.  rsID function now in VCFutils.  Remaining code either deleted or put into VariantContextAdaptors
-- Many associated import updates due to code move
2011-08-18 10:02:46 -04:00
Mark DePristo 9c17d54cb6 getFeatureClass() now returns Class<T> not Class to avoid yesterday's runtime error 2011-08-18 09:39:20 -04:00
Mark DePristo c30e1db744 Better location for help utils 2011-08-18 09:38:51 -04:00
Mark DePristo 4da42d9f39 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 09:32:57 -04:00
Eric Banks c91a442be1 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 22:40:16 -04:00
Eric Banks b75a1807e3 Adding integration test to cover sample exclusion 2011-08-17 22:40:09 -04:00
Eric Banks a7b70e6bb4 Adding feature for Khalid: ability to exclude particular samples. 2011-08-17 22:28:22 -04:00
Mauricio Carneiro cc3df8f11a Moving GAV walker to public
Walker is updated to the new RodBinding system and has the new GATKDocs layout.
2011-08-17 21:55:17 -04:00
Eric Banks fa1db3913b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 21:49:25 -04:00
Eric Banks 8e83b6646b Bug fix for Chris: don't validate ref base for complex events. 2011-08-17 21:49:14 -04:00
Matt Hanna c104dd7a09 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 16:59:12 -04:00
Matt Hanna 81a792afeb Reverting optimization disable in unstable. 2011-08-17 16:58:24 -04:00
Mark DePristo 2e35592295 GATKDocs for CallableLoci 2011-08-17 16:32:01 -04:00
Guillermo del Angel c193f52e5d Fixed up examples: pasting from wiki still had old rod syntax 2011-08-17 16:29:45 -04:00
Matt Hanna 2b2a4e0795 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable 2011-08-17 16:26:45 -04:00
Matt Hanna 297c9e513c Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable into unstable 2011-08-17 16:24:02 -04:00
Matt Hanna a210a62ab9 Merged bug fix from Stable into Unstable 2011-08-17 16:23:31 -04:00
Mark DePristo d59e6ed274 Fix for RefSeqCodec bug and better error messages
-- RefSeqCodec bug: getFeatureClass() returned RefSeqCodec.class, not RefSeqFeature.class.  Really should change this in Tribble to require Class<T extends Feature> to get compile time type checking
-- Better error messages that actually list the available tribble types, when there's a type error
2011-08-17 16:22:07 -04:00
Matt Hanna d170187896 Disable optimization that increases marginal speed of the GATK slightly but
can produce data loss in a narrow corner case where the BGZF block(s) locations
and offsets in the last index bucket of contig n overlap exactly with the BGZF
block locations and offset in the last index bucket of contig n+1.

A proper fix that keeps the optimization has already been introduced into
unstable, but disabling the optimization is a low risk way to make sure that
users of stable experience no data loss.
2011-08-17 16:16:05 -04:00
David Roazen 53006da9a5 Improved descriptions for the SnpEff annotations in the VCF header
(based on Eric's feedback).
2011-08-17 16:09:10 -04:00
Guillermo del Angel 784fb148b9 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 15:47:01 -04:00
Guillermo del Angel 671330950d Updated Beagle walker for gatkdocs format. Pushed unsupported, undocumented arguments to @Hidden 2011-08-17 15:46:31 -04:00
Andrey Sivachenko 0af68e052a Merge branch 'master' of ssh://cga1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 15:17:47 -04:00
Andrey Sivachenko a423546cdd fix: RefSeq contains records with zero coding length and the RefSeq codec/feature used to crash on those; now such records are ignored, with a warning printed (once) 2011-08-17 15:17:31 -04:00
Andrey Sivachenko 710d34633e now the reads that are too long are truly ignored (fix of the fix) 2011-08-17 15:16:23 -04:00
Eric Banks 2f19046f0c Adding docs to the 2 beasts. Saved the worst for last. 2011-08-17 14:19:14 -04:00
Andrey Sivachenko 069554efe5 somatic indel detector does not die on reads that are too long (likely contain a huge deletion) anymore; instead print a warning and ignore the read 2011-08-17 14:05:19 -04:00
Eric Banks c405a75f54 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 13:28:25 -04:00
Eric Banks 575303ae6b Renaming for consistency and bringing up to speed with new rod system 2011-08-17 13:28:19 -04:00
Eric Banks 6d629c176c Adding docs 2011-08-17 13:27:36 -04:00
Eric Banks a21e193a9e Adding docs to 3 more walkers 2011-08-17 12:35:08 -04:00
Menachem Fromer 98acb546a9 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 12:22:29 -04:00
Menachem Fromer d1bb302d12 Added GatkDocs documentation 2011-08-17 12:21:37 -04:00
Mark DePristo 3da71a9bb6 Clean up summary 2011-08-17 12:04:45 -04:00
Mark DePristo c6fb215faf GATKDocs for VariantsToTable
-- Made a previously required argument optional, as this was a long-standing bug
2011-08-17 12:02:41 -04:00
Mark DePristo 5f794d16a7 Fixed bad character in documentation 2011-08-17 12:01:08 -04:00
Mark DePristo 9d1d5bd27a Revert "Fixed bad character in documentation"
This reverts commit a1f50c82d3cb25e5e83d36e9054d74cdee957d87.
2011-08-17 11:57:31 -04:00
Mark DePristo 78deb3f195 Fixed bad character in documentation 2011-08-17 11:57:00 -04:00
Mark DePristo 79dcfca25f Fixed bad character in documentation 2011-08-17 11:56:51 -04:00
Eric Banks b3b5d608ca Adding docs to yet more walkers 2011-08-17 09:57:19 -04:00
Eric Banks fadcbf68fd Adding docs to QC walkers 2011-08-17 09:39:33 -04:00
Mauricio Carneiro 5d6a6fab98 Renamed softUnclipped functions to refCoord*
These functions return reference coordinates, so they should be named accordingly.
2011-08-16 18:56:28 -04:00
Mauricio Carneiro ed8f769dce Fixed index for getSoftUnclippedEnd()
Unclipped end can be calculated simply by looking at the last cigar element and adding its length in case it's a soft clip.
2011-08-16 18:54:28 -04:00
Eric Banks 5f3f46aad1 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-16 16:26:33 -04:00
Eric Banks 946f5c53fe Adding docs to more walkers 2011-08-16 16:26:26 -04:00
Mark DePristo 6e828260a0 Removed -B support. Now explodes with error if -B provided. 2011-08-16 16:13:47 -04:00
Ryan Poplin 2d5bbecd9e Merged bug fix from Stable into Unstable 2011-08-16 14:19:04 -04:00
Mauricio Carneiro 07c1e113cd Fixed interval traversal for previously hard clipped reads.
If a read was hard clipped for being low quality and no longer overlaps the interval, this read will now be discarded instead of being treated as an error by the GATK traversal engine.
2011-08-16 14:18:05 -04:00
Ryan Poplin 9d4add3268 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-08-16 14:18:03 -04:00
Ryan Poplin 170d1ff7b6 Fix in UG for trying to call indels at IUPAC code bases when in EMIT_ALL_SITES mode 2011-08-16 14:17:46 -04:00
Mauricio Carneiro b135565183 Added low quality clipping
Clips both tails of a read if the tails are below a given quality threshold (default Q2).
*Added special treatment for reads that get completely clipped.
2011-08-16 13:51:25 -04:00
Andrey Sivachenko 9f3328db53 fixing read group name collision: before writing the read into respective stream in nway-out mode we now retrieve the original rg, not the merged/modified one 2011-08-16 13:45:40 -04:00
Eric Banks ab0b56ed11 Minor doc fixes 2011-08-16 12:55:45 -04:00
Eric Banks 125ad0bcfa Added docs to RTC 2011-08-16 12:46:48 -04:00
Eric Banks ef9216011e Added docs to IR 2011-08-16 12:24:53 -04:00
Eric Banks ab1e3d6a98 Use the right set of sample names 2011-08-16 01:03:05 -04:00
Eric Banks 36c7f83208 Refactoring VE stratifications so that they don't pass around bulky data; instead they just pull needed data from the VE parent. This allows us to stop using deprecated features of the rod system. 2011-08-15 16:31:57 -04:00
Eric Banks 1246b89049 Forgot to initialize variants on the merge 2011-08-15 16:00:43 -04:00
Mauricio Carneiro 993ecb85da Added Hard Clipping Tail Ends
Added functionality to hard clip the low quality tail ends of reads (lowQual <= 2)
2011-08-15 15:22:54 -04:00
Eric Banks 045e8a045e Updating random walkers to new rod system; removing unused GenotypeAndValidateWalker 2011-08-15 14:05:23 -04:00
Eric Banks fc2c21433b Updating random walkers to new rod system 2011-08-15 13:29:31 -04:00
Eric Banks 3d56bbf087 Resolving merge conflicts 2011-08-15 12:28:05 -04:00
Eric Banks 9ddbfdcb9f Check filtered status before applying to alt reference 2011-08-15 12:25:23 -04:00
Mauricio Carneiro 0d976d6211 Fixed second time clipping
When a read is clipped once, and then in the second operation, because of indels, it doesn't reach the coordinate initially set for hard clipping, the indices were wrong. This should fix it.
2011-08-15 12:04:53 -04:00
Mauricio Carneiro 489c15b99d Fixed indexing issue in coordinate conversion
When a read had been previously soft clipped, the UnclippedEnd could not be used directly as the reference coordinate for clipping, because the read does not go that far.
2011-08-15 01:42:34 -04:00
Mauricio Carneiro c7b69a4574 Fixed integration tests 2011-08-14 16:38:20 -04:00
Mauricio Carneiro 6ae3f9e322 Wrapped clipping op information
The clipping op extra information being kept by this walker was specific to the walker, not to the read clipper. Created a wrapper ReadClipperWithData class that keeps the extra information and leaves the ReadClipper slim.

(this is a quick commit to unbreak the build, performing integration tests and will make further commits if necessary)
2011-08-14 15:44:48 -04:00
Mauricio Carneiro 8a51732049 Fixes to ReadClipper and added Reference Coordinate clipping.
* Added reference coordinate based hard clipping functions. This allows you to set a hard cut on where you need the read to be trimmed despite indels.
* soft clipping was messing up cigar string if there was already a hard clip at the beginning of the read. Fixed.
* hard clipping now works with previously hard clipped reads.
2011-08-14 14:54:33 -04:00
Mauricio Carneiro 291d8c7596 Fixed HardClipping and Interval containment
* Hard clipping was wrongfully hard clipping unmapped reads while soft clipping then hard clipping mapped reads. Now we throw an exception if we try to hard/soft clip unmapped reads and use the soft->hard clip procedure for every mapped read.

* Interval containment needed a <= and >= to make sure it caught the borders right.
2011-08-14 14:54:33 -04:00
Mauricio Carneiro 0be1dacddb Refactored interval clipping utility
Reads are clipped in map() and now we cover almost all cases. Left behind is the case where the read stretches through two intervals. This will need special treatment later.
2011-08-14 14:54:33 -04:00
David Roazen 9d2cda3d41 Removed a public -> private dependency in our test suite. 2011-08-12 17:29:10 -04:00
David Roazen bb4ced3201 SnpEff-related fixes.
-To correctly handle indels and MNPs, only consider features that start at the current locus,
rather than features that span the current locus, when selecting the most significant effect.

-Throw a UserException when a SnpEff rodbinding is not provided instead of simply not adding
any annotations and silently returning.
2011-08-12 15:26:24 -04:00
Mauricio Carneiro 10e873d9c6 Merge branch 'repval' 2011-08-12 15:24:31 -04:00
Guillermo del Angel 31dc831531 Merged bug fix from Stable into Unstable 2011-08-12 13:26:41 -04:00
Menachem Fromer 9121b8ed65 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-12 12:24:19 -04:00
Menachem Fromer 7ed120361d Fixed bug that required symbolic alleles to be padded with reference base and added integration test to test parsing and output of symbolic alleles 2011-08-12 12:23:44 -04:00