Commit Graph

117 Commits (5be8bc5dfc87bcd34ee42bf34643da412019eeb2)

Author SHA1 Message Date
Geraldine Van der Auwera 5d8b9a7c20 Moved MQ0 out of HC exclusion and into StandardUGAnnotation 2015-05-03 01:04:49 +02:00
Geraldine Van der Auwera 071d82d1bf Un-exclude SD and TRA from HC annotators; resolves #966
Exclude MQ0BySample
Move SD and TRA to new StandardUGAnnotation interface
There is now annotation interface (StandardUGAnnotation) holding annots that are standard in UG but should't be used as they are now with HC. This allows us to not have to exclude these annotations explicitly in HC, but still be able to use them for development purposes.
2015-05-03 00:45:53 +02:00
Geraldine Van der Auwera e49f6dfd0f Merge pull request #970 from broadinstitute/gg_minor_docfixes
Fairly minor if plentiful fixes to various gatkdocs. Merging this without formal review since all tests pass, the gatkdocs build, and no one really wants to review corrections to grammar, typos and layout for 120+ documents. Review will be done by users in production ;-)
2015-05-03 00:36:12 +02:00
Geraldine Van der Auwera 919c3eaa2e Numerous doc fixes; mostly formatting and clarifications 2015-05-03 00:28:46 +02:00
Ron Levine 9ff827c83a More allele trimming for VariantAnnotator 2015-04-29 21:11:49 -04:00
Ron Levine d5f98e99f0 Bypass reads with a bad CIGAR length 2015-04-21 11:55:56 -04:00
Khalid Shakir 90b579c78e CatVariants now allows different input / output file types.
Escaping the CatVariantsIntegrationTest classpaths for possible spaces in the directory names.
2015-04-13 14:39:46 -03:00
Ron Levine fe87484074 Update -mv example documentation
Made general doc fixes
2015-04-01 02:37:42 -04:00
Geraldine Van der Auwera d7f7022dce Merge pull request #904 from broadinstitute/pd_orig_dp
Added keepOriginalDP argument to SelectVariants
2015-03-30 09:01:33 -04:00
ldgauthier 0101003138 Merge pull request #899 from broadinstitute/ldg_M2_tandemRepeatsAndContamination
Lots of changes to M2:
2015-03-30 07:58:35 -04:00
Geraldine Van der Auwera 87b3dddb39 Merge pull request #894 from broadinstitute/gg_ami_docs_license
Edited ASEReadCounter documentation
2015-03-28 13:15:24 -04:00
Laura Gauthier 5a10758e2e Annotation changes for M2:
Build a ReferenceContext in ActiveRegionWalkers to pass in to annotation engine so we can call the TandemRepeatAnnotator from M2
Make TandemRepeatAnnotator default annotation for M2.
Setup (but don't use yet) HC-style contamination downsampling.
New HC integration test with TandemRepeatAnnotator
2015-03-27 18:25:23 -04:00
Ron Levine aef0a83c52 Automatically choose indexing strategy by file extension 2015-03-27 11:10:35 -04:00
Geraldine Van der Auwera 9b812308b1 Edited ASEReadCounter documentation
Also changed output file variable type from String to Enum
2015-03-26 02:43:53 -04:00
Phillip Dexheimer c97c253ec8 Added keepOriginalDP argument to SelectVariants
Fixes #830
2015-03-25 22:45:31 -04:00
Ami Levy-Moonshine c5fc5c4f8c create 2 new tools:
- ASEReadCounter (public tool) replce Tuuli's script to produce the input to Manny's tool.
   It count the number of reads that support the ref allele and the alt allele, filtereing low qual reads and bases and keep only properPaired reads
- ASECaller (private tool) take both RNA and DNA, and produce ontingencyTables ** still under development **

minor changes in other tools:
- update RNA HC variant calling scala script
- expose FS method pValueForContingencyTable to be able to call it from ASEcaller

In ASEReadCounter:
- allow different option to deal with overlaping read from the same fragment
- add option to ignore or include indels in the pileups
- add option to disabled DuplicateRead

add ASEReadCounterIntegrationTest.java and files for the test
2015-03-21 16:56:00 -04:00
Phillip Dexheimer 4d4d33404e Added gsa.reshape.concordance.table function to gsalib 2015-03-16 22:52:27 -04:00
Geraldine Van der Auwera aa4084d42f Switched VQSR tranches plot ordering rule 2015-03-12 19:57:03 -04:00
Ron Levine bee7f655b7 Log a warning if using incompatible arguments in DepthOfCoverage
Add reference gene list file
2015-03-10 18:14:21 -04:00
Phillip Dexheimer 92c7c103c1 GenotypeConcordance: monomorphic sites in truth are no longer called "Mismatching Alleles" when the comp genotype has an alternate allele
* PT 84700606
2015-02-07 15:54:38 -05:00
Ron Levine 9d4b876ccd Process X and = CIGAR operators
Add simple BaseRecalibrator integration test for CIGAR = and X operators
2015-01-29 17:00:00 -05:00
Phillip Dexheimer 72f76add71 Added -trimAlternates argument to SelectVariants
* PT 84021222
 * -trimAlternates removes all unused alternate alleles from variants.  Note that this is pretty aggressive for monomorphic sites
2015-01-21 21:33:35 -05:00
Ron Levine 804b2a36b7 Fix SplitNCigar reads exception by making the list of RNAReadTransformer non-abstract, add test for -fixNDN
Includes documentation changes for -fixNDN argument and the read transformer documentation.

Documentation changes to CombineVariants
2015-01-14 22:22:05 -05:00
Phillip Dexheimer b73e9d506a Added GATKVCFConstants and GATKVCFHeaderLines to consolidate the GATK-specific VCF annotations
* Removed unused annotations (CCC and HWP)
 * Renamed one of the two GC annotations to "IGC" (for Interval GC)
 * Revved picard & htsjdk (GATK constants are now removed from htsjdk)
 * PT 82046038
2015-01-13 21:32:09 -05:00
Ron Levine 08790e1dab Fix mmultiallelic info field annotation for VariantAnnotator
Add multi-allele test for info field annotations

Fix to process all types of INFO annotations

roll back to previous version, removes INFO and FORMAT

Correct @return for VariantAnnotatorEngine.getNonReferenceAlleles()

Enhance comments and clean up multi-allelic logic, handle header info number = R

only parse counts of A & R

Add INFO for AC

update MD5

Performance enhancement, only parse multiallelic with a count A or R

Make argument final in getNonReferenceAlleles()

Code cleanup, add exceptions for bad expression/allele size mismatch and missing header info for an expression

Change exception to warning for expression value/number of alleles check

remove adevertised exceptions
2014-12-17 22:21:00 -05:00
Phillip Dexheimer 71bdfbe465 Fix VariantsToTable output of FORMAT record lists when -SMA is specified
* PT 84242218
 * Note that FORMAT fields behave the same as INFO fields - if the annotation has a count of A (one entry per Alt Allele), it is split across the multiple output lines.  Otherwise, the entire list is output with each field
2014-12-10 21:41:15 -05:00
Phillip Dexheimer a5dee8a42e Fix NPE in SplitSamFile
* PT 82892316
  * Added integration test
  * Fixed similar error in debug output of HC
2014-12-07 10:37:30 -05:00
rpoplin 00027e1555 Merge pull request #774 from broadinstitute/ldg_makeSelectVariantsTrimAlleles
Add -trim argument to SelectVariants to trim alleles to minimal represen...
2014-11-13 13:58:13 -05:00
Ron Levine 67656bab23 Resolved conflict during rebasing
Add more logging to annotators, change loggers from info to warn

Add comments to testStrandBiasBySample()

Clarify comments in testStrandBiasBySample

remove logic for not prcossing an indel if strand bias (SB) was not computed

remove per variant warnings in annotate()

Log warnings if using the wrong annotator or missing a pedgree file

Log test failures once in annotate(), because HaplotypeCaller does not call initialize(). Avoid using exceptions

Fix so only log once in annotate(), Hardey-Weinberg does not require pedigree files, fix test MD5s so pass

Check if founderIds == null

Update MD5s from HaplotypeCaller integrations tests and clean up code

Change logic so SnpEff does not throw excpetions, change engine to utils in imports

Update test MD5s, return immediately if cannot annotate in SnpEff.initialization()

Post peer review, add more logging warnings

Update MD5 for testHaplotypeCallerMultiSampleComplex1, return null if PossibleDeNovo.annotate() is not called by VariantAnnotator
2014-11-12 02:45:49 -05:00
Laura Gauthier 783a4fd651 Change default behavior of SelectVariants to trim remaining alleles when samples are subset. -noTrim argument preserves original alleles. Add test for trimming. 2014-11-11 16:32:25 -05:00
rpoplin 2ff88d17ca Merge pull request #764 from broadinstitute/ldg_fixCombineVariantsError
Minor change to new CombineVariants error check so identical samples don...
2014-10-31 15:23:23 -04:00
Laura Gauthier 7bae70ec1a Minor change to new CombineVariants error check so identical samples don't need genotypeMergeOption 2014-10-28 08:17:49 -04:00
Khalid Shakir 5c9fe1a06d Split all imports of tools|engine from utils, and all tools from engine.
Second of two commits, modifying actual files.
2014-10-24 20:59:46 +08:00
Khalid Shakir bb7151192a Split all imports of tools|engine from utils, and all tools from engine.
First of two commits, renaming files only.
2014-10-24 20:59:45 +08:00
Geraldine Van der Auwera 3ba94b987c Minor documentation clarifications 2014-10-22 17:54:11 -04:00
rpoplin 0f89d1a362 Merge pull request #755 from broadinstitute/sc_Annotation_Docs_73647570
Improvements to documentation of variant annotations
2014-10-22 13:41:00 -04:00
Khalid Shakir 26ba4c11aa Minor fixups for previous commit once tests (only runnable at Broad) were run.
Fixed off by one error in size calculation IntervalUtils.scatterContigIntervals().
In test for fewer files than intervals, adjusted expected intervals.
In test for more files than intervals, adjusted expected exception.
2014-10-22 17:37:37 +08:00
Chris Smowton a62dc84795 Improved scatter contigs algorithm to be fairer when splitting a large number of contigs into a small number of parts.
See also: http://gatkforums.broadinstitute.org/discussion/comment/16010

Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-10-22 16:26:17 +08:00
Sheila Chandran b3c5ed4414 Improvements to documentation of variant annotations
- Added or modified explanations for majority of variant annotations
	- Generalized NBaseCount to include all tech platforms (not just SOLiD)
2014-10-21 18:20:04 -04:00
Laura Gauthier 0f08065ebc Throw UserException if input VCFs have duplicate samples but no genotypemergeoption is specified 2014-10-15 16:03:10 -04:00
Geraldine Van der Auwera e7e8052f84 Updated license information
- Updated license files (private/protected) for version, address and a couple of legal clauses
- Updated license snippet throught the codebase
2014-10-14 17:10:12 -04:00
rpoplin 426907ddd0 Merge pull request #744 from broadinstitute/gg_gatkdocs_annots_and_GSON
Output JSON version of docs for Galaxy
2014-10-14 11:41:16 -04:00
ldgauthier d259f3c84f Merge pull request #745 from broadinstitute/ldg_VariantAnnotatorDocs
Added docs to VariantFiltration is accordance with new htsjdk changes.  ...
2014-10-10 14:11:24 -04:00
Laura Gauthier 0ecb85d321 Added docs to VariantFiltration is accordance with new htsjdk changes. Fixed typo in VariantAnnotator docs. 2014-10-10 11:54:24 -04:00
Ron Levine 36c27155af Made the threshold for the probability of a state being active a command line argument
remove TODO comment after activeProbThreshold

recover static ACTIVE_PROB_THRESHOLD for unit tests

Add min/max values for active_probability_threshold parameter

Move activeProbThreshold parameter to GATKArguemtnCollection

define ACTIVE_PROB_THRESHOLD in unit tests

add construction of argCollection in in ctor

Move arguments from GATKArgumentCollection to ActiveRegionWalker

Throw exception if threshold < 0 or > 1 in ActivityProfile ctor

max propogation distance parameter to ActiveRegionWalker for AcrtivityProfile

Use polymorphic getMaxProbPropagationDistance() so BandPassActivityProfile computes the crrect region size cutoff

Get the maxProbPropagationDistance from the super class's method, instead of directly, this is safer

Removed extraneous command line imports and make maxProbPropagationDistance a hidden argument

remove limit check for activeProbThreshold, not necessary because the check is made when imput as a command line arg

Remove extra 'region' in the doxygen param description for maxProbPropagationDistance
2014-10-10 10:36:02 -04:00
Geraldine Van der Auwera 3f21f63161 Output JSON version of docs for Galaxy 2014-10-09 06:42:25 -04:00
Ryan Poplin ac1a397024 This warning message actually happens all the time in AssessNA12878 when we subset down to biallelic events but I've verified that it is working as intended. Moving the logging level up to debug. 2014-09-29 11:40:38 -04:00
Phillip Dexheimer 1482a53aba Added -writeFullFormat engine-level argument
* This argument forces GATK to always write every record in the VCF format field, even if some records at the end are missing and could be removed
  * Revved htsjdk and picard
  * PT 70993484
2014-09-17 08:25:27 -04:00
Valentin Ruano-Rubio 95b45443ae Updated test according to changes in the AF calculator framework.
Changes:
-------

* Updated current unit and integration test to use the new API components.
* Added unit tests for new classes AFPriorProvider and AFCalculatorProviders.
* Added integration test for mixed ploidy GenotypeGVCFs and CombineGVCFs
2014-09-12 14:59:47 -04:00
Valentin Ruano-Rubio 3cdeab6e9e GenotypingEngines and walkers now use AFCalc(ulator) providers rathern than instanciate their own (fixed) calculators directly.
Changes:
-------

* GenotypingEngine uses now a AFCalc provider instead of
  its own thread-local with one-time initialized and fixed
  AF calculator.

* All walkers that use a GenotypingEngine now are passing
  the appropiate AF calculator provider. For now most
  just use a fix calculator (FixedAFCalculatorProvider)
  except GenotypeGVCFs as this one now can cope with
  mixture of ploidies failing-over to a general-ploidy
  calculator when the preferred implementation is not
  capable to handle a site's analysis.
2014-09-12 14:25:09 -04:00