Commit Graph

549 Commits (e01273ca7cbdcfab86fb228d8bfd6faf4c8a84d0)

Author SHA1 Message Date
Guillermo del Angel 3dfb60a46e Fixing up and refactoring usage of indel categories. On a variant context, isInsertion() and isDeletion() are now removed because behavior before was wrong in case of multiallelic sites. Now, methods isSimpleInsertion() and isSimpleDeletion() will return true only if sites are biallelic. For multiallelic sites, isComplex() will return true in all cases.
VariantEval module CountVariants is corrected and an additional column is added so that we log mixed events and complex indels separately (before they were being conflated).
VariantEval module IndelStatistics is considerably simplified as the sample stratification was wrong and redundant, now it should work with the VE-generic Sample stratification. Several columns are renamed or removed since they're not really useful
2011-08-18 16:17:38 -04:00
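As a rough illustration of the biallelic rule described in the commit above, the following minimal sketch classifies an indel site from its reference allele and alternate alleles. The class `IndelCategorySketch` and its method signatures are hypothetical and are not the GATK `VariantContext` API. The sketch assumes the site is already known to be an indel site, and uses a length comparison in place of a real allele comparison.

```java
import java.util.List;

// Hypothetical helper for illustration only; not the GATK VariantContext API.
final class IndelCategorySketch {
    private IndelCategorySketch() {}

    // A site is biallelic when it has exactly one alternate allele.
    static boolean isBiallelic(List<String> alts) {
        return alts.size() == 1;
    }

    // True only for biallelic sites whose single alt allele is longer than the ref.
    static boolean isSimpleInsertion(String ref, List<String> alts) {
        return isBiallelic(alts) && alts.get(0).length() > ref.length();
    }

    // True only for biallelic sites whose single alt allele is shorter than the ref.
    static boolean isSimpleDeletion(String ref, List<String> alts) {
        return isBiallelic(alts) && alts.get(0).length() < ref.length();
    }

    // Models only the multiallelic rule from the commit message;
    // biallelic complex events are not modeled in this sketch.
    static boolean isComplex(String ref, List<String> alts) {
        return alts.size() > 1;
    }
}
```

Under these assumptions, a site with ref `A` and alts `AT`, `ATT` is neither a simple insertion nor a simple deletion, and is reported as complex.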
Chris Hartl 6b256a8ac5 Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git 2011-08-18 15:29:24 -04:00
Chris Hartl a8935c99fc Adding docs for DepthOfCoverage and ValidationAmplicons 2011-08-18 15:28:35 -04:00
Mark DePristo f2f51e35e3 Merge branch 'master' into help 2011-08-18 14:05:33 -04:00
Mark DePristo faa3f8b6f6 Only concrete classes are now documented 2011-08-18 14:04:47 -04:00
Ryan Poplin 7c4ce6d969 Added GATKDocs for the VQSR walkers. 2011-08-18 14:00:39 -04:00
Mark DePristo 5772766dd5 Improvements to GATKDocs
-- Now supports a static list of root classes / interfaces that should receive docs.  A complementary approach to documenting features to the DocumentedGATKFeature annotation
-- Tribble codecs are now documented!
-- No longer displays sub and super classes
2011-08-18 14:00:09 -04:00
Mark DePristo e03db30ca0 Now uses DocumentedGATKFeatureObject instead of annotation directly
-- Step 1 on the way to creating a static list of additional classes that we want to document.
2011-08-18 12:31:04 -04:00
Mark DePristo d4511807ed Merge branch 'master' into help 2011-08-18 11:53:37 -04:00
Mark DePristo c787fd0b70 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 11:52:45 -04:00
Mark DePristo c797616c65 If you have one sample in your BAM, getToolkit().getSamples().size() == 2
Also deleted double initialization, where a line of code was duplicated in creating the GATK engine.
2011-08-18 11:51:53 -04:00
Mark DePristo cbec69a130 Merge branch 'master' into help
Conflicts:
	public/java/src/org/broadinstitute/sting/utils/help/HelpUtils.java
2011-08-18 11:33:27 -04:00
Eric Banks aa21fc7c9c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 11:30:59 -04:00
Mark DePristo f5d7cabb20 Fix for reintroducing an already solved problem. 2011-08-18 11:20:12 -04:00
Eric Banks a45498150a Remove non-ascii char 2011-08-18 11:18:29 -04:00
Ryan Poplin c08a9964d4 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 10:58:04 -04:00
Ryan Poplin bb79d3edae Added GATKDocs for the BQSR walkers. 2011-08-18 10:57:48 -04:00
Mark DePristo 47bbddb724 Now provides type-specific user feedback
For RodBinding<VariantContext> error messages now list only the Tribble types that produce VariantContexts
2011-08-18 10:47:16 -04:00
Mark DePristo 2d41ba15a4 Vastly better Tribble help message
Here's a new example:
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.1-520-g76495cd):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to parse value /humgen/gsa-hpprojects/GATK/data/refGene_b37.filtered.sorted.txt for argument refSeqRodBinding. Message: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :TYPE listing the correct type from among the supported types:
##### ERROR        Name        FeatureType   Documentation
##### ERROR      BEAGLE      BeagleFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_beagle_BeagleCodec.html
##### ERROR         BED         BEDFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_bed_BEDCodec.html
##### ERROR    BEDTABLE       TableFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_BedTableCodec.html
##### ERROR       CGVAR     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_completegenomics_CGVarCodec.html
##### ERROR       DBSNP       DbSNPFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_dbsnp_DbSNPCodec.html
##### ERROR    GELITEXT    GeliTextFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_gelitext_GeliTextCodec.html
##### ERROR         MAF         MafFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_features_maf_MafCodec.html
##### ERROR MILLSDEVINE     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_MillsDevineCodec.html
##### ERROR   RAWHAPMAP   RawHapMapFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_hapmap_RawHapMapCodec.html
##### ERROR      REFSEQ      RefSeqFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_refseq_RefSeqCodec.html
##### ERROR   SAMPILEUP   SAMPileupFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html
##### ERROR     SAMREAD     SAMReadFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_samread_SAMReadCodec.html
##### ERROR      SNPEFF      SnpEffFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_snpEff_SnpEffCodec.html
##### ERROR     SOAPSNP     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_soapsnp_SoapSNPCodec.html
##### ERROR       TABLE       TableFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_TableCodec.html
##### ERROR         VCF     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCFCodec.html
##### ERROR        VCF3     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCF3Codec.html
##### ERROR ------------------------------------------------------------------------------------------
2011-08-18 10:31:32 -04:00
Mark DePristo c2287c93d7 Cleanup of codec locations. No more dbSNPHelper
-- refdata/features now in utils/codecs with the other codecs
-- Deleted dbsnpHelper.  rsID function now in VCFutils.  Remaining code either deleted or put into VariantContextAdaptors
-- Many associated import updates due to code move
2011-08-18 10:02:46 -04:00
Mark DePristo 9c17d54cb6 getFeatureClass() now returns Class<T> not Class to avoid yesterday's runtime error 2011-08-18 09:39:20 -04:00
Mark DePristo c30e1db744 Better location for help utils 2011-08-18 09:38:51 -04:00
Mark DePristo 4da42d9f39 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 09:32:57 -04:00
Eric Banks c91a442be1 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 22:40:16 -04:00
Eric Banks b75a1807e3 Adding integration test to cover sample exclusion 2011-08-17 22:40:09 -04:00
Eric Banks a7b70e6bb4 Adding feature for Khalid: ability to exclude particular samples. 2011-08-17 22:28:22 -04:00
Mauricio Carneiro cc3df8f11a Moving GAV walker to public
Walker is updated to the new RodBinding system and has the new GATKDocs layout.
2011-08-17 21:55:17 -04:00
Eric Banks fa1db3913b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 21:49:25 -04:00
Eric Banks 8e83b6646b Bug fix for Chris: don't validate ref base for complex events. 2011-08-17 21:49:14 -04:00
Matt Hanna c104dd7a09 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 16:59:12 -04:00
Matt Hanna 81a792afeb Reverting optimization disable in unstable. 2011-08-17 16:58:24 -04:00
Mark DePristo 2e35592295 GATKDocs for CallableLoci 2011-08-17 16:32:01 -04:00
Guillermo del Angel c193f52e5d Fixed up examples: pasting from wiki still had old rod syntax 2011-08-17 16:29:45 -04:00
Matt Hanna 297c9e513c Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable into unstable 2011-08-17 16:24:02 -04:00
Matt Hanna a210a62ab9 Merged bug fix from Stable into Unstable 2011-08-17 16:23:31 -04:00
Mark DePristo d59e6ed274 Fix for RefSeqCodec bug and better error messages
-- RefSeqCodec bug: getFeatureClass() returned RefSeqCodec.class, not RefSeqFeature.class.  Really should change this in Tribble to require Class<T extends Feature> to get compile time type checking
-- Better error messages that actually list the available tribble types, when there's a type error
2011-08-17 16:22:07 -04:00
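The compile-time check suggested in the commit above can be sketched as follows. The interfaces `SketchFeature` and `SketchCodec` and the two implementing classes are hypothetical stand-ins, not the Tribble types. The point is only that declaring the return type as `Class<T>` with `T extends SketchFeature` turns the "returned the codec class" mistake into a compiler error.

```java
// Hypothetical stand-ins for illustration; these are not the Tribble interfaces.
interface SketchFeature {}

interface SketchCodec<T extends SketchFeature> {
    // Typed return: the compiler rejects any class that is not Class<T>.
    Class<T> getFeatureClass();
}

final class SketchRefSeqFeature implements SketchFeature {}

final class SketchRefSeqCodec implements SketchCodec<SketchRefSeqFeature> {
    @Override
    public Class<SketchRefSeqFeature> getFeatureClass() {
        // Writing "return SketchRefSeqCodec.class;" here would not compile,
        // because Class<SketchRefSeqCodec> is not a Class<SketchRefSeqFeature>.
        return SketchRefSeqFeature.class;
    }
}
```

With a raw `Class` return type, the same mistake would compile and surface only at runtime.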
Matt Hanna d170187896 Disable optimization that increases marginal speed of the GATK slightly but
can produce data loss in a narrow corner case where the BGZF block(s) locations
and offsets in the last index bucket of contig n overlap exactly with the BGZF
block locations and offset in the last index bucket of contig n+1.

A proper fix that keeps the optimization has already been introduced into
unstable, but disabling the optimization is a low risk way to make sure that
users of stable experience no data loss.
2011-08-17 16:16:05 -04:00
David Roazen 53006da9a5 Improved descriptions for the SnpEff annotations in the VCF header
(based on Eric's feedback).
2011-08-17 16:09:10 -04:00
Guillermo del Angel 784fb148b9 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 15:47:01 -04:00
Guillermo del Angel 671330950d Updated Beagle walker for gatkdocs format. Pushed unsupported, undocumented arguments to @Hidden 2011-08-17 15:46:31 -04:00
Andrey Sivachenko 0af68e052a Merge branch 'master' of ssh://cga1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 15:17:47 -04:00
Andrey Sivachenko a423546cdd fix: RefSeq contains records with zero coding length and the refseq codec/feature used to crash on those; now such records are ignored, with warning printed (once) 2011-08-17 15:17:31 -04:00
Andrey Sivachenko 710d34633e now the reads that are too long are truly ignored (fix of the fix) 2011-08-17 15:16:23 -04:00
Eric Banks 2f19046f0c Adding docs to the 2 beasts. Saved the worst for last. 2011-08-17 14:19:14 -04:00
Andrey Sivachenko 069554efe5 somatic indel detector does not die on reads that are too long (likely contain a huge deletion) anymore; instead print a warning and ignore the read 2011-08-17 14:05:19 -04:00
Eric Banks c405a75f54 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 13:28:25 -04:00
Eric Banks 575303ae6b Renaming for consistency and bringing up to speed with new rod system 2011-08-17 13:28:19 -04:00
Eric Banks 6d629c176c Adding docs 2011-08-17 13:27:36 -04:00
Eric Banks a21e193a9e Adding docs to 3 more walkers 2011-08-17 12:35:08 -04:00
Menachem Fromer 98acb546a9 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 12:22:29 -04:00
Menachem Fromer d1bb302d12 Added GatkDocs documentation 2011-08-17 12:21:37 -04:00
Mark DePristo 3da71a9bb6 Clean up summary 2011-08-17 12:04:45 -04:00
Mark DePristo c6fb215faf GATKDocs for VariantsToTable
-- Made a previously required argument optional, as this was a long-standing bug
2011-08-17 12:02:41 -04:00
Mark DePristo 5f794d16a7 Fixed bad character in documentation 2011-08-17 12:01:08 -04:00
Mark DePristo 9d1d5bd27a Revert "Fixed bad character in documentation"
This reverts commit a1f50c82d3cb25e5e83d36e9054d74cdee957d87.
2011-08-17 11:57:31 -04:00
Mark DePristo 78deb3f195 Fixed bad character in documentation 2011-08-17 11:57:00 -04:00
Mark DePristo 79dcfca25f Fixed bad character in documentation 2011-08-17 11:56:51 -04:00
Eric Banks b3b5d608ca Adding docs to yet more walkers 2011-08-17 09:57:19 -04:00
Eric Banks fadcbf68fd Adding docs to QC walkers 2011-08-17 09:39:33 -04:00
Mauricio Carneiro 5d6a6fab98 Renamed softUnclipped functions to refCoord*
These functions return reference coordinates, so they should be named accordingly.
2011-08-16 18:56:28 -04:00
Mauricio Carneiro ed8f769dce Fixed index for getSoftUnclippedEnd()
Unclipped end can be calculated simply by looking at the last cigar element and adding its length in case it's a soft clip.
2011-08-16 18:54:28 -04:00
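The calculation described in the commit above can be sketched as below. The class `SoftUnclippedEndSketch`, its nested `Element` type, and the method signature are hypothetical and are not the GATK or samtools API. The sketch inspects only the last CIGAR element, as the message states, so a read whose CIGAR ends in a hard clip after a soft clip is not handled here.

```java
import java.util.List;

// Hypothetical illustration; not the GATK ReadUtils or samtools Cigar API.
final class SoftUnclippedEndSketch {
    private SoftUnclippedEndSketch() {}

    // Minimal CIGAR element: an operator character (e.g. 'M', 'S', 'H') and a length.
    static final class Element {
        final char op;
        final int length;

        Element(char op, int length) {
            this.op = op;
            this.length = length;
        }
    }

    // Returns the alignment end extended by the trailing soft clip, if there is one.
    static int softUnclippedEnd(int alignmentEnd, List<Element> cigar) {
        if (cigar.isEmpty()) {
            return alignmentEnd;
        }
        Element last = cigar.get(cigar.size() - 1);
        return last.op == 'S' ? alignmentEnd + last.length : alignmentEnd;
    }
}
```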
Eric Banks 5f3f46aad1 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-16 16:26:33 -04:00
Eric Banks 946f5c53fe Adding docs to more walkers 2011-08-16 16:26:26 -04:00
Mark DePristo 6e828260a0 Removed -B support. Now explodes with error if -B provided. 2011-08-16 16:13:47 -04:00
Ryan Poplin 2d5bbecd9e Merged bug fix from Stable into Unstable 2011-08-16 14:19:04 -04:00
Mauricio Carneiro 07c1e113cd Fixed interval traversal for previously hard clipped reads.
If a read was hard clipped for being low quality and does not overlap the interval anymore, this read will now be discarded instead of treated as an error by the GATK traversal engine.
2011-08-16 14:18:05 -04:00
Ryan Poplin 9d4add3268 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-08-16 14:18:03 -04:00
Ryan Poplin 170d1ff7b6 Fix in UG for trying to call indels at IUPAC code bases when in EMIT_ALL_SITES mode 2011-08-16 14:17:46 -04:00
Mauricio Carneiro b135565183 Added low quality clipping
Clips both tails of a read if the tails are below a given quality threshold (default Q2).
*Added special treatment for reads that get completely clipped.
2011-08-16 13:51:25 -04:00
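A minimal sketch of the tail clipping described in the commit above, assuming base qualities are available as a `byte[]`. The class `LowQualTailSketch` and the method `keptRange` are hypothetical and are not the ReadClipper API. The comparison `<= threshold` follows the "lowQual <= 2" wording of the earlier hard clipping commit, and that is an assumption about the intended boundary. A completely clipped read is reported as an empty range.

```java
// Hypothetical illustration; not the GATK ReadClipper API.
final class LowQualTailSketch {
    private LowQualTailSketch() {}

    // Returns {start, end}: the half-open range of bases that survive clipping.
    // Bases with quality <= threshold are removed from both tails only;
    // low-quality bases in the middle of the read are kept.
    static int[] keptRange(byte[] quals, int threshold) {
        int start = 0;
        int end = quals.length;
        while (start < end && quals[start] <= threshold) {
            start++;
        }
        while (end > start && quals[end - 1] <= threshold) {
            end--;
        }
        // start == end means the whole read was clipped.
        return new int[] {start, end};
    }
}
```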
Andrey Sivachenko 9f3328db53 fixing read group name collision: before writing the read into respective stream in nway-out mode we now retrieve the original rg, not the merged/modified one 2011-08-16 13:45:40 -04:00
Eric Banks ab0b56ed11 Minor doc fixes 2011-08-16 12:55:45 -04:00
Eric Banks 125ad0bcfa Added docs to RTC 2011-08-16 12:46:48 -04:00
Eric Banks ef9216011e Added docs to IR 2011-08-16 12:24:53 -04:00
Eric Banks ab1e3d6a98 Use the right set of sample names 2011-08-16 01:03:05 -04:00
Eric Banks 36c7f83208 Refactoring VE stratifications so that they don't pass around bulky data; instead just pull needed data from the VE parent. This allows us to stop using deprecated features of the rod system. 2011-08-15 16:31:57 -04:00
Eric Banks 1246b89049 Forgot to initialize variants on the merge 2011-08-15 16:00:43 -04:00
Mauricio Carneiro 993ecb85da Added Hard Clipping Tail Ends
Added functionality to hard clip the low quality tail ends of reads (lowQual <= 2)
2011-08-15 15:22:54 -04:00
Eric Banks 045e8a045e Updating random walkers to new rod system; removing unused GenotypeAndValidateWalker 2011-08-15 14:05:23 -04:00
Eric Banks fc2c21433b Updating random walkers to new rod system 2011-08-15 13:29:31 -04:00
Eric Banks 3d56bbf087 Resolving merge conflicts 2011-08-15 12:28:05 -04:00
Eric Banks 9ddbfdcb9f Check filtered status before applying to alt reference 2011-08-15 12:25:23 -04:00
Mauricio Carneiro 0d976d6211 Fixed second time clipping
When a read is clipped once, and then in the second operation, because of indels, it doesn't reach the coordinate initially set for hard clipping, the indices were wrong. This should fix it.
2011-08-15 12:04:53 -04:00
Mauricio Carneiro 489c15b99d Fixed indexing issue in coordinate conversion
When a read had been previously soft clipped, the UnclippedEnd could not be used directly as Reference Coordinate for clipping, because the read does not go that far.
2011-08-15 01:42:34 -04:00
Mauricio Carneiro c7b69a4574 Fixed integration tests 2011-08-14 16:38:20 -04:00
Mauricio Carneiro 6ae3f9e322 Wrapped clipping op information
The clipping op extra information being kept by this walker was specific to the walker, not to the read clipper. Created a wrapper ReadClipperWithData class that keeps the extra information and leaves the ReadClipper slim.

(this is a quick commit to unbreak the build, performing integration tests and will make further commits if necessary)
2011-08-14 15:44:48 -04:00
Mauricio Carneiro 8a51732049 Fixes to ReadClipper and added Reference Coordinate clipping.
* Added reference coordinate based hard clipping functions. This allows you to set a hard cut on where you need the read to be trimmed despite indels.
* soft clipping was messing up cigar string if there was already a hard clip at the beginning of the read. Fixed.
* hard clipping now works with previously hard clipped reads.
2011-08-14 14:54:33 -04:00
Mauricio Carneiro 291d8c7596 Fixed HardClipping and Interval containment
* Hard clipping was wrongfully hard clipping unmapped reads while soft clipping then hard clipping mapped reads. Now we throw an exception if we try to hard/soft clip unmapped reads and use the soft->hard clip procedure for every mapped read.

* Interval containment needed a <= and >= to make sure it caught the borders right.
2011-08-14 14:54:33 -04:00
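The border issue in the commit above comes down to using inclusive comparisons on both ends of a closed interval. The following is a minimal sketch assuming 1-based, closed intervals. The class `IntervalSketch` and its methods are hypothetical and are not the GATK `GenomeLoc` API.

```java
// Hypothetical illustration; not the GATK GenomeLoc API.
final class IntervalSketch {
    final int start; // inclusive
    final int stop;  // inclusive

    IntervalSketch(int start, int stop) {
        this.start = start;
        this.stop = stop;
    }

    // Inclusive on both borders: a position equal to start or stop is contained.
    boolean contains(int pos) {
        return pos >= start && pos <= stop;
    }

    // A read is fully contained when both of its ends fall within the borders.
    boolean containsRead(int readStart, int readEnd) {
        return readStart >= start && readEnd <= stop;
    }
}
```

With strict `<` and `>` comparisons, a read starting exactly at `start` or ending exactly at `stop` would be wrongly treated as outside the interval.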
Mauricio Carneiro 0be1dacddb Refactored interval clipping utility
reads are clipped in map() and now we cover almost all cases. Left behind the case where the read stretches through two intervals. This will need special treatment later.
2011-08-14 14:54:33 -04:00
David Roazen 9d2cda3d41 Removed a public -> private dependency in our test suite. 2011-08-12 17:29:10 -04:00
David Roazen bb4ced3201 SnpEff-related fixes.
-To correctly handle indels and MNPs, only consider features that start at the current locus,
rather than features that span the current locus, when selecting the most significant effect.

-Throw a UserException when a SnpEff rodbinding is not provided instead of simply not adding
any annotations and silently returning.
2011-08-12 15:26:24 -04:00
Mauricio Carneiro 10e873d9c6 Merge branch 'repval' 2011-08-12 15:24:31 -04:00
Guillermo del Angel 31dc831531 Merged bug fix from Stable into Unstable 2011-08-12 13:26:41 -04:00
Menachem Fromer 9121b8ed65 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-12 12:24:19 -04:00
Menachem Fromer 7ed120361d Fixed bug that required symbolic alleles to be padded with reference base and added integration test to test parsing and output of symbolic alleles 2011-08-12 12:23:44 -04:00
Eric Banks 7ea9196321 Better error message for name/type clashes. 2011-08-12 11:18:14 -04:00
Eric Banks 27f0748b33 Renaming the HapMap codec and feature to RawHapMap so that we don't get esoteric errors when trying to bind a rod with the name 'hapmap' (since it was also a feature). 2011-08-12 11:11:56 -04:00
Eric Banks 005bd71be3 Working too quickly earlier. Fixing syntax. 2011-08-12 10:29:36 -04:00
Menachem Fromer c7ca33cbff Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-12 10:12:09 -04:00
Eric Banks 639a01f382 Updating integration test now that VE has been updated 2011-08-12 07:15:08 -04:00
Eric Banks 41f3da75d7 Implementation in VE was confusing 'variant' status vs. 'polymorphic' status. This led to issues because we now match types of eval and comp; specifically, subsetting a VC to a monomorphic sample can't change the 'variant' status of the VC (it's still a variant site or otherwise we'll never match the comps, which breaks GenotypeConcordance). CountVariants really got this wrong. Fixed. VE now passes all integration tests. 2011-08-12 02:22:44 -04:00