230 Commits (e8b42c4dfd7b2ebbdceafd28b0e5af420c77219e)

Author SHA1 Message Date
ldgauthier dcc6c0f2aa Merge pull request #1306 from broadinstitute/rhl_doc_overlapping_genes
Output coverage for all overlapping genes in DepthOfCoverage
2016-03-08 13:28:30 -05:00
Geraldine Van der Auwera 9a306ca221 Update licenses 2016-03-05 01:09:43 -08:00
Geraldine Van der Auwera 2b70f14740 Misc documentation improvements
Added caveat to VariantFiltration documentation
Fixed PON creation example in M2 doc
Improved MalformedReadFilter doc
Updated N CIGAR error message
2016-03-03 15:48:54 -08:00
seru71 4d203b895a added support for overlapping exons/genes in DepthOfCoverage 2016-03-03 15:09:54 -05:00
Ron Levine 40a5adf767 Change error output to use the correct argument 2016-02-29 13:21:03 -05:00
Geraldine Van der Auwera c93a611ea3 Remove unneeded dependency
Addresses https://github.com/broadgsa/gatk/pull/15 for Guillermo
2016-01-21 16:51:01 -05:00
Laura Gauthier 593c9ddf01 Allow VariantsToTable to evaluate the type of each split variant when -F TYPE and -SMA are specified 2016-01-12 08:12:29 -05:00
Ron Levine d16ed98c9e Backport maxNoCall functionality from GATK4 2016-01-06 11:09:38 -05:00
Ron Levine 9c8f035780 LeftAlignAndTrimVariants --splitMultiallelics keeps GT if valid 2015-12-14 10:42:32 -05:00
Geraldine Van der Auwera 4767a83d8a Update pom versions to mark the start of GATK 3.6 development 2015-11-25 01:52:51 -05:00
Geraldine Van der Auwera 9749adf22a Merge pull request #1236 from broadinstitute/gvda_prep_M2_release_1201
Prep MuTect2 for release
2015-11-24 20:12:57 -05:00
Geraldine Van der Auwera bf875974d1 Prep MuTect2 and ContEst for release
Renamed M2 to MuTect2
Renamed ContaminationWalker to ContEst
Refactored related tests and usages (including in Queue scripts)
Moved M2 and ContEst + accompanying classes from private to protected
Made QSS a StandardSomaticAnnotation (new annotation group/interface) to prevent it from being sucked in with the rest of the StandardAnnotation group
2015-11-24 16:43:20 -05:00
Geraldine Van der Auwera 88a0514ec7 Fix bug where gatkdocs of RodWalkers reported default LocusWalker downsampling settings 2015-11-23 17:53:19 -05:00
Geraldine Van der Auwera 22fa1511be Merge pull request #1235 from broadinstitute/gvda_deprecate_useless_tools_1192
Deprecate tools that were outdated or redundant
2015-11-21 14:58:00 -05:00
Geraldine Van der Auwera 1cf66addaa Deprecate tools that were outdated or redundant
ReadAdaptorTrimmer (unsound and untested)
BaseCoverageDistribution (redundant with DiagnoseTargets)
CoveredByNSamplesSites (redundant with DiagnoseTargets)
FindCoveredIntervals (redundant with DiagnoseTargets)
VariantValidationAssessor (has a scary TODO -- REWRITE THIS TO WORK WITH VARIANT CONTEXT comment and zero tests)
LiftOverVariants, FilterLiftedVariants and liftOverVCF.pl (in #1106) (use Picard liftover tool)
sortByRef.pl (use Picard SortVCF)
ListAnnotations (useless)

Also deleted the java archive from the private repository (old junk we never use)
2015-11-20 22:49:40 -05:00
meganshand 2570cab24c Assorted documentation fixes, enhancements and reorganization.
See issues referenced by the pull request for details.
2015-11-20 22:44:46 -05:00
Yossi Farjoun 4da0d1300c adding fraction informative reads annotation. 2015-11-18 08:39:47 -05:00
Laura Gauthier 25b8ba45f4 More allele-specific annotations: AS_QD and AS_InbreedingCoeff
Grouped default output annotations to keep them from getting dropped when -A is specified; addresses #918
Also refactored code shared by ExcessHet and InbreedingCoeff
2015-11-09 16:38:31 -05:00
meganshand e4627ed5c3 Addressing comments 2015-11-04 11:00:01 -05:00
meganshand b5165b8d30 Fix for out of date VCF version output 2015-11-03 17:35:47 -05:00
ldgauthier 3d1dc303b3 Merge pull request #1197 from broadinstitute/ts_ve_nullPointer
Prevent null pointer exception in PrintMissingComp module
2015-11-02 14:42:50 -05:00
Takuto Sato 33462c7b50 Removed the line that caused a null pointer, as the information it logged was not useful. Updated docs and added an integration test to ensure the code no longer throws the exception. 2015-11-02 12:45:09 -05:00
Laura Gauthier f7eb5d3082 Enable family-level stratification (if a ped file is provided) 2015-10-28 09:55:04 -04:00
Laura Gauthier fcaf37279c Finished draft of code for new map-combine-reduce annotation framework
All VQSR annotations can be generated in allele-specific mode
Pull out allele-specific annotations in AS_Standard annotation group
2015-10-27 09:23:29 -04:00
Ron Levine 36ca9fe898 Allow LeftAlignAndTrimVariants to handle alleles longer than the default processing window 2015-10-25 20:33:56 -04:00
Ron Levine 795fe75886 Update doc for multiallelics, trimming is the default behavior 2015-10-22 04:04:09 -04:00
Takuto Sato df7a482335 VariantAnnotator now supports annotating FILTER field from an external resource.
Updated the docs.
2015-10-14 14:26:21 -04:00
Ron Levine 2bcded11cb VariantAnnotator checks alleles when annotating with external resource 2015-10-08 17:01:30 -04:00
Kate Noblett 506958a0b7 Implemented a new VariantEval evaluation module, MetricsCollection. Fixed null pointer exception, updated tests. 2015-09-30 17:21:30 -04:00
Geraldine Van der Auwera 118c559278 Trivial doc typo fix 2015-09-25 18:15:29 -04:00
Ami Levy Moonshine 1ad00cc9d4 fix typo in the ASEReadCounter document 2015-09-21 15:30:06 -04:00
Ron Levine 3ecabf7e45 Allow overriding ValidateVariants' hard-coded cutoff for allele length 2015-09-17 10:49:14 -04:00
Ron Levine 83a7012d69 Mask snps with --snpmask 2015-09-09 16:20:48 -04:00
Ron Levine 29ac64f6ce Calculate GenotypeAnnotations before InfoFieldAnnotations 2015-09-03 09:22:46 -04:00
Ron Levine 2afe3f7a21 Make GenotypeGVCFs subset Strand Allele Counts intelligently 2015-08-22 08:33:09 -04:00
Ron Levine 900fe3f675 Merge pull request #1132 from broadinstitute/rhl_rev_htsjdk
Move htsjdk & picard to rev 1.138
2015-08-20 11:58:41 -04:00
Bertrand Haas eae4c875a9 Logistic transform of MQ + jitter to capped MQ in VariantDataManager 2015-08-20 11:10:45 -04:00
Ron Levine beec624a63 Move htsjdk & picard to rev 1.138 2015-08-20 10:42:25 -04:00
Khalid Shakir 9bee183f6c Switched to using CRAM's SamReader.Indexing implementation.
CRAM now requires .bai index, just like BAM.
Test updates:
- Updated existing MD5s, as TLEN has changed.
- Tests multiple contigs.
- Tests several intervals per contig.
- Tests when `.cram.bai` is missing, even when `.cram.crai` is present.
Updated gatk docs for CRAM support, including:
- Arguments that work for both BAM and CRAM listed as such.
- Arguments that don't work for CRAM either explicitly say "BAM" or "doesn't work for CRAM".
- Instructions on how to recreate a `.cram.bai` using cramtools.
Cleaned up IntelliJ IDEA warnings regarding `Arrays.asList()` -> `Collections.singletonList()`.
2015-08-11 17:52:49 -03:00
Geraldine Van der Auwera 19bbe45cbc Updated licenses for 2015 2015-08-06 15:23:11 -04:00
David Benjamin ddb01058d3 moved DiffObjects 2015-08-05 21:19:02 -04:00
Geraldine Van der Auwera 875c7ffa1a Fixed typos and made some argument docs improvements 2015-07-29 23:06:19 -04:00
Louis Bergelson 9d9827f176 Merge pull request #1031 from broadinstitute/lb_update_for_java8
Updated gatk so it compiles with java 8
2015-07-28 11:09:19 -04:00
Joseph White 3bd988825f Removed walkers for handling Beagle data
Added deprecation statements to DeprecatedToolChecks.java
Removed integration test for Beagle walker
Added URL for Beagle documentation
2015-07-21 18:36:08 -04:00
Eric Banks 178bf12b27 Merge pull request #1046 from broadinstitute/rhl_catvariants_sort
Fix for mis-sorted VCF files in CatVariants
2015-07-21 17:37:27 -04:00
Ron Levine 6e46b3696e Merge contiguous intervals properly 2015-07-14 15:23:37 -04:00
John Wallace 8fc631b7ae Fix for mis-sorted VCF files in CatVariants
When using CatVariants, VCF files were being sorted solely on the base
pair position of the first record, ignoring the chromosome.  This can
become problematic when merging files from different chromosomes,
especially if you have multiple VCFs per chromosome.

As an example, assume the following 3 lines are all in separate files:
1       10
1       100
2       20

The merged VCF from CatVariants (without -assumeSorted) would read:
1       10
2       20
1       100

This has the potential to break tools that expect chromosomes to be
contiguous within a VCF file.

This commit changes the comparator from one of Pair<Integer, File> to
one of Pair<VariantContext, File>.  We construct a
VariantContextComparator from the provided reference, which will sort
the first record by chromosome and position properly.  Additionally, if
-assumeSorted is given, we simply use a null VariantContext as the first
record, which will all be equal (as all will be null)
2015-07-14 14:12:31 -04:00
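The ordering change described in the commit above can be sketched as follows. This is only an illustration, not the CatVariants code: `FirstRecord` is a hypothetical stand-in for the first VariantContext of each input file, and the contig order is assumed to come from the reference as a plain list of names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; class and method names are hypothetical.
final class FirstRecordOrderingSketch {

    /** First record of one input file: contig, position, and the file it came from. */
    static final class FirstRecord {
        final String contig;
        final int position;
        final String file;

        FirstRecord(String contig, int position, String file) {
            this.contig = contig;
            this.position = position;
            this.file = file;
        }
    }

    /** Old behaviour: order files by the position of the first record only. */
    static Comparator<FirstRecord> byPositionOnly() {
        return Comparator.comparingInt((FirstRecord r) -> r.position);
    }

    /**
     * New behaviour: order by contig (in reference order), then by position.
     * Contigs missing from contigOrder get index -1 and therefore sort first.
     */
    static Comparator<FirstRecord> byContigThenPosition(List<String> contigOrder) {
        return Comparator
                .comparingInt((FirstRecord r) -> contigOrder.indexOf(r.contig))
                .thenComparingInt(r -> r.position);
    }

    /** Returns the file names in the order the comparator would merge them. */
    static List<String> sortedFiles(List<FirstRecord> records, Comparator<FirstRecord> cmp) {
        List<FirstRecord> copy = new ArrayList<>(records);
        copy.sort(cmp);
        List<String> names = new ArrayList<>();
        for (FirstRecord r : copy) {
            names.add(r.file);
        }
        return names;
    }

    public static void main(String[] args) {
        // The three single-line files from the commit message.
        List<FirstRecord> inputs = Arrays.asList(
                new FirstRecord("1", 10, "a.vcf"),
                new FirstRecord("1", 100, "b.vcf"),
                new FirstRecord("2", 20, "c.vcf"));
        System.out.println(sortedFiles(inputs, byPositionOnly()));
        System.out.println(sortedFiles(inputs, byContigThenPosition(Arrays.asList("1", "2"))));
    }
}
```

With the position-only comparator the file starting at 2:20 lands between the two chromosome 1 files, which is the mis-sorted output shown in the message; the contig-aware comparator keeps chromosome 1 contiguous.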
Louis Bergelson e1c41b2c38 Updated gatk so it compiles on java 8
updated cofoja to 1.2 from 1.0
added explicit type casts in places that java 8 required them
2015-06-26 15:59:46 -04:00
Ron Levine b35085ca28 Indexing parameters not required if output file has the g.vcf.gz extension 2015-06-13 11:46:56 -04:00
Geraldine Van der Auwera 95f2899f05 User (mnw21cam) patch to fix DoC slowdown in 3.4 2015-06-05 21:12:46 -04:00
Ron Levine a6ca97ef14 Site-level selection based on genotype filter status 2015-05-21 11:27:20 -04:00
Geraldine Van der Auwera d1a7edd796 Update pom versions to mark the start of GATK 3.5 development 2015-05-15 00:44:54 -04:00
Geraldine Van der Auwera f19618653a Update pom versions for the 3.4 release 2015-05-15 00:40:39 -04:00
Geraldine Van der Auwera 8b20523f5e Merge pull request #979 from broadinstitute/ami-fixASE-bug
Fix bug: now also works when reads do not have a mate
2015-05-14 21:09:52 -04:00
David Roazen caafe84e74 Rev htsjdk to version 1.132 and picard to version 1.131, and switch to using the versions in maven central
-We now pull htsjdk and picard from maven central.

-Updated the GATK codebase as necessary to adapt to changes in the Feature
 interface.

-Since VCFHeader now requires that all header lines have unique keys, uniquified
 the keys of GVCFBlock header lines by including the min/max GQ in the key.
 Updated MD5s accordingly.

-Other MD5s changed as a result of an htsjdk fix to eliminate "-0" in VCF output.
2015-05-14 15:26:23 -04:00
Ami Levy-Moonshine 536d550794 Fix bug: now also works when reads do not have a mate
reads with no mate will be counted as valid reads
2015-05-12 17:51:01 -04:00
Ron Levine 4a75d54e65 Added invert and exclude flags for variant selection queries 2015-05-12 15:08:28 -04:00
Joseph White abb6bc6f57 Correct errant array element swap in FAM file output.
dad and mom are swapped; paternal first, then maternal

updated MD5 chksums for test files

remove commented lines
2015-05-11 20:45:50 -04:00
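For reference, the column order that the commit above restores can be sketched as below. The six-column layout (family, individual, paternal, maternal, sex, phenotype) is the standard PLINK FAM layout; the class name, the method name, and the tab delimiter are assumptions made for this illustration and are not taken from the GATK source.

```java
// Illustrative sketch only; names and the tab delimiter are assumptions.
final class FamLineSketch {

    /**
     * Builds one FAM line. The paternal ID comes before the maternal ID,
     * which is the order the commit message describes.
     */
    static String famLine(String familyId, String individualId,
                          String paternalId, String maternalId,
                          int sex, String phenotype) {
        return String.join("\t",
                familyId, individualId, paternalId, maternalId,
                Integer.toString(sex), phenotype);
    }
}
```

Swapping the third and fourth arguments reproduces the kind of error the commit corrects: the file still parses, but every parent is assigned the wrong role.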
Geraldine Van der Auwera 5d8b9a7c20 Moved MQ0 out of HC exclusion and into StandardUGAnnotation 2015-05-03 01:04:49 +02:00
Geraldine Van der Auwera 071d82d1bf Un-exclude SD and TRA from HC annotators; resolves #966
Exclude MQ0BySample
Move SD and TRA to new StandardUGAnnotation interface
There is now an annotation interface (StandardUGAnnotation) holding annotations that are standard in UG but shouldn't be used as they are now with HC. This allows us to avoid excluding these annotations explicitly in HC, while still being able to use them for development purposes.
2015-05-03 00:45:53 +02:00
Geraldine Van der Auwera e49f6dfd0f Merge pull request #970 from broadinstitute/gg_minor_docfixes
Fairly minor if plentiful fixes to various gatkdocs. Merging this without formal review since all tests pass, the gatkdocs build, and no one really wants to review corrections to grammar, typos and layout for 120+ documents. Review will be done by users in production ;-)
2015-05-03 00:36:12 +02:00
Geraldine Van der Auwera 919c3eaa2e Numerous doc fixes; mostly formatting and clarifications 2015-05-03 00:28:46 +02:00
Ron Levine 9ff827c83a More allele trimming for VariantAnnotator 2015-04-29 21:11:49 -04:00
Ron Levine d5f98e99f0 Bypass reads with a bad CIGAR length 2015-04-21 11:55:56 -04:00
Khalid Shakir 90b579c78e CatVariants now allows different input / output file types.
Escaping the CatVariantsIntegrationTest classpaths for possible spaces in the directory names.
2015-04-13 14:39:46 -03:00
Ron Levine fe87484074 Update -mv example documentation
Made general doc fixes
2015-04-01 02:37:42 -04:00
Geraldine Van der Auwera d7f7022dce Merge pull request #904 from broadinstitute/pd_orig_dp
Added keepOriginalDP argument to SelectVariants
2015-03-30 09:01:33 -04:00
ldgauthier 0101003138 Merge pull request #899 from broadinstitute/ldg_M2_tandemRepeatsAndContamination
Lots of changes to M2:
2015-03-30 07:58:35 -04:00
Geraldine Van der Auwera 87b3dddb39 Merge pull request #894 from broadinstitute/gg_ami_docs_license
Edited ASEReadCounter documentation
2015-03-28 13:15:24 -04:00
Laura Gauthier 5a10758e2e Annotation changes for M2:
Build a ReferenceContext in ActiveRegionWalkers to pass in to annotation engine so we can call the TandemRepeatAnnotator from M2
Make TandemRepeatAnnotator default annotation for M2.
Setup (but don't use yet) HC-style contamination downsampling.
New HC integration test with TandemRepeatAnnotator
2015-03-27 18:25:23 -04:00
Ron Levine aef0a83c52 Automatically choose indexing strategy by file extension 2015-03-27 11:10:35 -04:00
Geraldine Van der Auwera 9b812308b1 Edited ASEReadCounter documentation
Also changed output file variable type from String to Enum
2015-03-26 02:43:53 -04:00
Phillip Dexheimer c97c253ec8 Added keepOriginalDP argument to SelectVariants
Fixes #830
2015-03-25 22:45:31 -04:00
Ami Levy-Moonshine c5fc5c4f8c create 2 new tools:
- ASEReadCounter (public tool) replaces Tuuli's script to produce the input to Manny's tool.
   It counts the number of reads that support the ref allele and the alt allele, filtering low-quality reads and bases and keeping only properly paired reads
- ASECaller (private tool) takes both RNA and DNA, and produces contingency tables ** still under development **

minor changes in other tools:
- update RNA HC variant calling scala script
- expose FS method pValueForContingencyTable to be able to call it from ASEcaller

In ASEReadCounter:
- allow different options to deal with overlapping reads from the same fragment
- add option to ignore or include indels in the pileups
- add option to disable DuplicateRead

add ASEReadCounterIntegrationTest.java and files for the test
2015-03-21 16:56:00 -04:00
Phillip Dexheimer 4d4d33404e Added gsa.reshape.concordance.table function to gsalib 2015-03-16 22:52:27 -04:00
Geraldine Van der Auwera aa4084d42f Switched VQSR tranches plot ordering rule 2015-03-12 19:57:03 -04:00
Ron Levine bee7f655b7 Log a warning if using incompatible arguments in DepthOfCoverage
Add reference gene list file
2015-03-10 18:14:21 -04:00
Phillip Dexheimer 92c7c103c1 GenotypeConcordance: monomorphic sites in truth are no longer called "Mismatching Alleles" when the comp genotype has an alternate allele
* PT 84700606
2015-02-07 15:54:38 -05:00
Ron Levine 9d4b876ccd Process X and = CIGAR operators
Add simple BaseRecalibrator integration test for CIGAR = and X operators
2015-01-29 17:00:00 -05:00
Phillip Dexheimer 72f76add71 Added -trimAlternates argument to SelectVariants
* PT 84021222
 * -trimAlternates removes all unused alternate alleles from variants.  Note that this is pretty aggressive for monomorphic sites
2015-01-21 21:33:35 -05:00
Ron Levine 804b2a36b7 Fix SplitNCigar reads exception by making the list of RNAReadTransformer non-abstract, add test for -fixNDN
Includes documentation changes for -fixNDN argument and the read transformer documentation.

Documentation changes to CombineVariants
2015-01-14 22:22:05 -05:00
Phillip Dexheimer b73e9d506a Added GATKVCFConstants and GATKVCFHeaderLines to consolidate the GATK-specific VCF annotations
* Removed unused annotations (CCC and HWP)
* Renamed one of the two GC annotations to "IGC" (for Interval GC)
* Revved picard & htsjdk (GATK constants are now removed from htsjdk)
* PT 82046038
2015-01-13 21:32:09 -05:00
Ron Levine 08790e1dab Fix multiallelic info field annotation for VariantAnnotator
Add multi-allele test for info field annotations

Fix to process all types of INFO annotations

roll back to previous version, removes INFO and FORMAT

Correct @return for VariantAnnotatorEngine.getNonReferenceAlleles()

Enhance comments and clean up multi-allelic logic, handle header info number = R

only parse counts of A & R

Add INFO for AC

update MD5

Performance enhancement, only parse multiallelic with a count A or R

Make argument final in getNonReferenceAlleles()

Code cleanup, add exceptions for bad expression/allele size mismatch and missing header info for an expression

Change exception to warning for expression value/number of alleles check

remove advertised exceptions
2014-12-17 22:21:00 -05:00
Phillip Dexheimer 71bdfbe465 Fix VariantsToTable output of FORMAT record lists when -SMA is specified
* PT 84242218
 * Note that FORMAT fields behave the same as INFO fields - if the annotation has a count of A (one entry per Alt Allele), it is split across the multiple output lines.  Otherwise, the entire list is output with each field
2014-12-10 21:41:15 -05:00
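The splitting rule noted in the commit above can be sketched as follows. This is a simplified illustration, not the VariantsToTable implementation: the class and method names are hypothetical, and falling back to the whole list when the value count does not match the number of alternate alleles is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and the size-mismatch fallback are assumptions.
final class SplitMultiAllelicSketch {

    /**
     * Returns the text written for one field on each of the per-allele output lines.
     * A field with count A (one entry per alt allele) contributes one entry per line;
     * any other field is written out whole on every line.
     */
    static List<String> fieldPerOutputLine(List<String> values, int numAltAlleles, boolean countIsA) {
        List<String> perLine = new ArrayList<>();
        boolean canSplit = countIsA && values.size() == numAltAlleles;
        for (int i = 0; i < numAltAlleles; i++) {
            perLine.add(canSplit ? values.get(i) : String.join(",", values));
        }
        return perLine;
    }
}
```

For a site with two alternate alleles, a count-A field holding `3,7` yields `3` on the first line and `7` on the second, while a three-value field such as `10,3,7` is repeated unchanged on both lines.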
Phillip Dexheimer a5dee8a42e Fix NPE in SplitSamFile
* PT 82892316
* Added integration test
* Fixed similar error in debug output of HC
2014-12-07 10:37:30 -05:00
rpoplin 00027e1555 Merge pull request #774 from broadinstitute/ldg_makeSelectVariantsTrimAlleles
Add -trim argument to SelectVariants to trim alleles to minimal represen...
2014-11-13 13:58:13 -05:00
Ron Levine 67656bab23 Resolved conflict during rebasing
Add more logging to annotators, change loggers from info to warn

Add comments to testStrandBiasBySample()

Clarify comments in testStrandBiasBySample

remove logic for not processing an indel if strand bias (SB) was not computed

remove per variant warnings in annotate()

Log warnings if using the wrong annotator or missing a pedigree file

Log test failures once in annotate(), because HaplotypeCaller does not call initialize(). Avoid using exceptions

Fix so we only log once in annotate(), Hardy-Weinberg does not require pedigree files, fix test MD5s so tests pass

Check if founderIds == null

Update MD5s from HaplotypeCaller integrations tests and clean up code

Change logic so SnpEff does not throw exceptions, change engine to utils in imports

Update test MD5s, return immediately if cannot annotate in SnpEff.initialization()

Post peer review, add more logging warnings

Update MD5 for testHaplotypeCallerMultiSampleComplex1, return null if PossibleDeNovo.annotate() is not called by VariantAnnotator
2014-11-12 02:45:49 -05:00
Laura Gauthier 783a4fd651 Change default behavior of SelectVariants to trim remaining alleles when samples are subset. -noTrim argument preserves original alleles. Add test for trimming. 2014-11-11 16:32:25 -05:00
Khalid Shakir 0092a0b9eb Faster builds, with updates to documentation generation.
Reading the multiple GATKText files as a single stream, especially with new top level target executable jar files pointing to a lib folder.
Don't dirty the build with a new GATKText.properties if input files are unmodified.
Stop warning on undocumented abstract classes.
Fixed ClassNotFoundException/NoClassDefFoundError by fixing ResourceBundleExtractorDoclet artifact.
Excluding Exceptions from documentation.
Removed custom log4j dependency from ResourceBundleExtractorDoclet.
Stop generating the dependency reduced pom during shade.
Stop regenerating gsalib when the files are already up to date.
Disabled mvn site generation from external-example.
2014-11-05 00:32:23 +08:00
rpoplin 2ff88d17ca Merge pull request #764 from broadinstitute/ldg_fixCombineVariantsError
Minor change to new CombineVariants error check so identical samples don...
2014-10-31 15:23:23 -04:00
Laura Gauthier 7bae70ec1a Minor change to new CombineVariants error check so identical samples don't need genotypeMergeOption 2014-10-28 08:17:49 -04:00
Khalid Shakir 5c9fe1a06d Split all imports of tools|engine from utils, and all tools from engine.
Second of two commits, modifying actual files.
2014-10-24 20:59:46 +08:00
Khalid Shakir bb7151192a Split all imports of tools|engine from utils, and all tools from engine.
First of two commits, renaming files only.
2014-10-24 20:59:45 +08:00
Geraldine Van der Auwera b69b256003 Update pom versions to mark the start of GATK 3.4 development 2014-10-23 22:31:44 -04:00
Geraldine Van der Auwera eee94ec81f Update pom versions for the 3.3 release 2014-10-23 22:25:17 -04:00
Geraldine Van der Auwera 3ba94b987c Minor documentation clarifications 2014-10-22 17:54:11 -04:00
rpoplin 0f89d1a362 Merge pull request #755 from broadinstitute/sc_Annotation_Docs_73647570
Improvements to documentation of variant annotations
2014-10-22 13:41:00 -04:00
Khalid Shakir 26ba4c11aa Minor fixups for previous commit once tests (only runnable at Broad) were run.
Fixed off by one error in size calculation IntervalUtils.scatterContigIntervals().
In test for fewer files than intervals, adjusted expected intervals.
In test for more files than intervals, adjusted expected exception.
2014-10-22 17:37:37 +08:00
Chris Smowton a62dc84795 Improved scatter contigs algorithm to be fairer when splitting a large number of contigs into a small number of parts.
See also: http://gatkforums.broadinstitute.org/discussion/comment/16010

Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-10-22 16:26:17 +08:00
Sheila Chandran b3c5ed4414 Improvements to documentation of variant annotations
- Added or modified explanations for majority of variant annotations
- Generalized NBaseCount to include all tech platforms (not just SOLiD)
2014-10-21 18:20:04 -04:00