Commit Graph

13483 Commits (c191103326d2f515a4ec08033eb4d0463affafdb)

Author SHA1 Message Date
Ryan Poplin 8d5a7d412b Merge pull request #615 from broadinstitute/ami-createCigarDNFilter
create a new read filter (transformer) that refactors NDN cigar elements ...
2014-04-28 13:31:04 -04:00
Carlos Borroto b7a59e01aa Removed setting of a default queue in PbsEngineJobRunner. Discussed here: http://gatkforums.broadinstitute.org/discussion/3959/would-it-be-possible-for-pbsengine-jobrunner-not-to-set-a-default-queue
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-04-29 00:44:12 +08:00
Ami Levy-Moonshine 13dd755468 create a new read transformer that refactors NDN cigar elements into one N element.
story:
https://www.pivotaltracker.com/story/show/69648104

description:
This read transformer will refactor cigar strings that contain N-D-N elements to one N element (with total length of the three refactored elements).
This is intended primarily for users of RNA-Seq data handling programs such as TopHat2.
Currently we consider that the internal N-D-N motif is illegal and we error out when we encounter it. By refactoring the cigar string of
those specific reads, users of TopHat and other tools can circumvent this problem without affecting the rest of their dataset.

edit: address review comments - change the tool's name and change the tool to be a readTransformer instead of read filter
2014-04-28 11:29:00 -04:00
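
The N-D-N refactoring described in the commit above can be sketched as follows. This is a minimal Python illustration, not the GATK read transformer itself: the function name `merge_ndn` is hypothetical, and collapsing longer N-D-N-D-N chains in a single pass is an assumption about the desired behaviour.

```python
import re

# One CIGAR element: a length followed by an operator.
_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def merge_ndn(cigar):
    """Collapse every N-D-N run in a CIGAR string into a single N element."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    elements = [(int(length), op) for length, op in _ELEMENT.findall(cigar)]

    merged = []
    i = 0
    while i < len(elements):
        ops = [op for _, op in elements[i:i + 3]]
        if ops == ["N", "D", "N"]:
            # The new N spans the total length of the three elements.
            total = sum(length for length, _ in elements[i:i + 3])
            # Write the merged N back in place so that a longer chain
            # such as N-D-N-D-N also collapses (assumed behaviour).
            elements[i + 2] = (total, "N")
            i += 2
        else:
            merged.append(elements[i])
            i += 1
    return "".join(f"{length}{op}" for length, op in merged)
```

Reads whose cigar contains no N-D-N motif pass through unchanged, which matches the stated goal of not affecting the rest of the dataset.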
Eric Banks 385fe5fb56 Merge pull request #614 from broadinstitute/rp_fix_GenotypeGVCF_VCF_headers
GenotypeGVCF was pulling the headers from all input rods including DBsnp...
2014-04-25 15:36:35 -04:00
Ryan Poplin 221b999cb0 GenotypeGVCF was pulling the headers from all input rods including DBsnp. Now it pulls from just the input variant rods. 2014-04-25 13:16:28 -04:00
ldgauthier 147ae21253 Merge pull request #606 from broadinstitute/ldg_CalibrateLikelihoodsForCGP
Improvements to CalculateGenotypePosteriors and CalibrateGenotypeLikelih...
2014-04-24 10:58:40 -04:00
Laura Gauthier 9f3cbb2ef1 Improvements to CalculateGenotypePosteriors and CalibrateGenotypeLikelihoods
CalculateGenotypePosteriors now only computes posterior probs for SNP sites with SNP priors
(other sites have flat priors applied)

CalibrateGenotypeLikelihoods originally applied the HOM_REF/HET/HOM_VAR frequencies in the callset as priors before the empirical quality analysis. It now has an option (-noPriors) to skip those priors and apply flat priors instead. It also accepts new external probabilities files, such as those generated by CGP, from which the genotype posterior probability qualities are read.

Integration test was changed to account for new SNP-only behavior and default behavior to not use missing priors.

(Also, new numRefIfMissing is 0, which should only matter in cases using few samples when you probably don't want to be doing that anyway!)
2014-04-24 08:49:42 -04:00
amilev 92a3aa35d5 Merge pull request #613 from broadinstitute/ami-RNAEdttingTool
create a new tool CountMutationTypes
2014-04-23 17:17:02 -04:00
Ami Levy-Moonshine 9e5333f1d1 create a new tool CountMutationTypes
The new tool takes a VCF file as input and creates a GATK report with the percentage of each mutation type (e.g. A->G, A->T...).
It allows the user to filter the sites that will be counted, based on JEXL or on the variant quals.
A user can also print 12 VCF files (one for each mutation type) with the VCF lines of the mutations that were counted.
2014-04-23 14:22:33 -04:00
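
The per-type percentages that CountMutationTypes reports can be illustrated with a short Python sketch. It is not the tool's code: the function name `mutation_type_percentages` is hypothetical, it assumes tab-separated VCF data lines, and it counts only biallelic single-base substitutions (the 12 possible REF->ALT pairs), skipping everything else.

```python
from collections import Counter

BASES = "ACGT"


def mutation_type_percentages(vcf_lines):
    """Return {"A->G": percent, ...} over the biallelic SNP records given."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3].upper(), fields[4].upper()
        # Keep single-base substitutions only; indels and multi-allelic
        # sites (ALT such as "G,T") are ignored in this sketch.
        if len(ref) == 1 and len(alt) == 1 and ref in BASES and alt in BASES and ref != alt:
            counts[f"{ref}->{alt}"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: 100.0 * n / total for kind, n in counts.items()}
```

Filtering by JEXL expression or by variant quality, as the commit describes, would happen before a record is counted and is left out here.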
droazen 58c8b2dd84 Merge pull request #611 from broadinstitute/mm_otf_sample_rename_support_whitespacing_sample_names
Allow for whitespace in sample names when performing on-the-fly sample-renaming.
2014-04-22 13:01:15 -04:00
Michael McCowan 8290d3c8ac Allow for non-tab whitespace in sample names when performing on-the-fly sample-renaming. 2014-04-22 11:07:13 -04:00
Valentin Ruano Rubio d38835822e Merge pull request #612 from broadinstitute/vrr_integration_test_error_quickfix
Fixed integration test problems from previous premature merge
2014-04-20 18:40:22 -04:00
Valentin Ruano-Rubio e610373169 Fixed integration test problems from previous premature merge 2014-04-20 17:11:51 -04:00
MauricioCarneiro f03e5ffeb1 Merge pull request #604 from broadinstitute/vrr_hc_omniploidy_general_api
Disentangle UG and HC Genotyper engines.
2014-04-20 07:43:23 -04:00
Valentin Ruano-Rubio 4e5850966a Reengineer engine constructors 2014-04-19 17:58:14 -04:00
Valentin Ruano-Rubio 7455ac9796 Addressed revisions 2014-04-19 16:48:48 -04:00
Ryan Poplin a9a48f2459 Merge pull request #607 from broadinstitute/mm_bugfix_raise_mathutils_n_ceiling
Support more samples in math utilities.
2014-04-17 13:32:34 -04:00
jmthibault79 b840cf6b3f Merge pull request #610 from broadinstitute/jt_block_compressed_vcfs
Enable reading of other extensions for block-compressed VCFs
2014-04-17 12:32:49 -04:00
Joel Thibault 1ab50f4ba8 CatVariants now handles BCF and Block-Compressed VCF
[Delivers #67461500]
2014-04-17 12:31:38 -04:00
Kristian Cibulskis 6b9e38c8bb incorporated comments from review, made variables final, made AF parameter hidden, and added bounds checking to AF value 2014-04-16 19:29:25 -04:00
Kristian Cibulskis 7115cadbd8 extended SimulateReadsForVariants to optionally use the AF field to indicate allele fraction of the simulated event, useful in cancer and other variable ploidy use cases 2014-04-16 16:20:02 -04:00
Joel Thibault 4c74319578 Update for Picard refactoring which improves block-compressed VCF reading
[Delivers #69215404]
2014-04-16 14:39:23 -04:00
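
The commits around this point make block-compressed VCFs readable under more file extensions. One way to avoid relying on the extension at all is to inspect the BGZF block header, sketched below in Python. This is a different technique from whatever the Picard refactoring does, and the function name `looks_block_compressed` is hypothetical. The sketch also assumes that the BGZF "BC" subfield is the first gzip extra subfield, which is typical but not guaranteed.

```python
import io


def looks_block_compressed(stream):
    """Return True if the stream starts with a BGZF block header."""
    header = stream.read(16)
    return bool(
        len(header) == 16
        and header[0:2] == b"\x1f\x8b"    # gzip magic number
        and header[2] == 8                # compression method: deflate
        and header[3] & 4                 # FEXTRA flag is set
        and header[12:14] == b"BC"        # BGZF extra subfield identifier
        and header[14:16] == b"\x02\x00"  # subfield length of 2 (little endian)
    )
```

A plain gzip file fails the FEXTRA check, so it would be treated as ordinary gzip rather than as block-compressed input.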
Joel Thibault fd09cb7143 Rev Picard 1.111.1920 2014-04-16 14:39:19 -04:00
Joel Thibault f98df5c071 Integration test for the file extensions CatVariants should handle 2014-04-16 13:25:47 -04:00
Joel Thibault bdd7024d00 Integration test for block-compressed VCF reading 2014-04-16 13:09:40 -04:00
Joel Thibault ce770b032a Move execAndCheck() to ProcessController 2014-04-16 13:09:40 -04:00
Joel Thibault b197618d13 This comment is no longer true 2014-04-15 15:42:39 -04:00
MauricioCarneiro 34ece31f4a Merge pull request #605 from broadinstitute/ks_escape_dir_names
Quoting -out parameter during resource bundle creation
2014-04-15 05:56:35 -04:00
Khalid Shakir 218fe3875a Quoting -out parameter during resource bundle (StingText.properties) creation.
Fixes the case where the directory has parentheses in it, like "Dropbox (Broad Dropbox1)".
2014-04-15 17:06:49 +08:00
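
Why quoting matters for a path like "Dropbox (Broad Dropbox1)" can be shown with a small Python sketch. It does not reproduce the actual build change: the `build_command` helper and the tool name passed to it are hypothetical, and only the `-out` parameter comes from the commit message.

```python
import shlex


def build_command(tool, out_path):
    """Build a shell command line with the -out value safely quoted."""
    # Unquoted, the shell would split the path at the space and try to
    # interpret the parentheses; shlex.quote keeps it as one argument.
    return f"{tool} -out {shlex.quote(out_path)}"
```

Parsing the result back with `shlex.split` yields the tool name, `-out`, and the unmodified path as three separate arguments.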
Mike f0732d386c Support more samples in math utilities.
- Amend `MathUtils`' constants so that they support calling on more than 70,000 samples (the new limit is 100,000).
2014-04-14 12:05:38 -04:00
Valentin Ruano-Rubio 08203b516e Disentangle UG and HC Genotyper engines.
Description:

  Transforms the delegation dependency from the HC to the UG genotyping engine into reuse by inheritance, where the HC and UG engines inherit from a common superclass GenotyperEngine
  that implements the common parts. As a side effect, some of the code is now clearer and redundant code has been removed.

  The changes have a few consequences for the end user. HC now has a few more user arguments: those that control the functionality that HC was borrowing directly from UGE.

     Added -ploidy argument, although it is constrained to be 2 for now.
     Added -out_mode EMIT_ALL_SITES|EMIT_VARIANTS_ONLY ...
     Added -allSitePLs flag.

Stories:

   https://www.pivotaltracker.com/story/show/68017394

Changes:

   - Moved (HC's) GenotyperEngine to HaplotypeCallerGenotyperEngine (HCGE). Then created an engine superclass GenotypingEngine (GE) that contains the parts common to HCGE and the UG counterpart 'UnifiedGenotypingEngine' (UGE). Simplified the code and applied the template pattern to accommodate small differences in behaviour between the two caller
   engines. (There is still room for improvement though.)

   - Moved inner classes and enums to top-level components for various reasons, including giving them shorter and simpler names to refer to.

   - Created a HomoSapiens class for human-specific constants; even if they are good defaults for most users, we need to clearly identify the human assumption across the code if we want to make
   GATK work with any species in general; i.e. any reference to HomoSapiens, except as a default value for a user argument, should smell.

   - Fixed a bug deep in the genotyping calculation: we were using fixed values for SNP and indel heterozygosity (the human defaults) and ignoring the user arguments.

   - GenotypingLikehooldCalculationCModel.Model to Gen.*Like.*Calc.*Model.Name; not a definitive solution though as names are used often in conditionals that perhaps should be member methods of the
     GenLikeCalc classes.

   - Renamed LikelihoodCalculationEngine to ReadLikelihoodCalculationEngine to distinguish them clearly from Genotype likelihood calculation engines.

   - Changed copy by explicit argument listing to a clone/reflection solution for casting between genotyper argument collection classes.

   - Created GenotypeGivenAllelesUtils to collect methods needed nearly exclusively by the GGA mode.

Tests :

    - StandardCallerArgumentCollectionUnitTest (check copy by cloning/reflection).
    - All existing integration and unit tests for modified classes.
2014-04-13 03:09:55 -04:00
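
The shape of the refactoring in the commit above (a common superclass holding the shared steps, with the two caller engines overriding only what differs) is the template method pattern. Below is a toy Python sketch of that pattern. All class and method names are hypothetical, and the prior formula is an illustrative assumption rather than GATK's calculation. The sketch also shows heterozygosity arriving as a user argument instead of a hard-coded human default, which is the kind of bug the commit mentions fixing.

```python
import math
from abc import ABC, abstractmethod

GENOTYPES = ("HOM_REF", "HET", "HOM_VAR")


class GenotypingEngineSketch(ABC):
    """Shared genotyping steps; subclasses supply the caller-specific part."""

    def __init__(self, snp_heterozygosity):
        if not 0 < snp_heterozygosity < 2 / 3:
            raise ValueError("heterozygosity must be in (0, 2/3) for this toy prior")
        # Taken from the user argument, never from a species-specific constant.
        self.snp_heterozygosity = snp_heterozygosity

    def call_genotype(self, log10_likelihoods):
        """Template method: common logic plus one overridable hook."""
        priors = self.log10_priors()
        posteriors = [l + p for l, p in zip(log10_likelihoods, priors)]
        best = max(range(len(GENOTYPES)), key=lambda i: posteriors[i])
        return self.format_call(GENOTYPES[best])

    def log10_priors(self):
        # Toy diploid prior (an assumption made for this sketch only).
        h = self.snp_heterozygosity
        return [math.log10(1 - 1.5 * h), math.log10(h), math.log10(h / 2)]

    @abstractmethod
    def format_call(self, genotype):
        """Hook for the small behavioural differences between callers."""


class UnifiedEngineSketch(GenotypingEngineSketch):
    def format_call(self, genotype):
        return f"UG:{genotype}"


class HaplotypeEngineSketch(GenotypingEngineSketch):
    def format_call(self, genotype):
        return f"HC:{genotype}"
```

With the same likelihoods, a low heterozygosity favours HOM_REF and a high one can favour HET, so ignoring the user's value would change the call.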
Ryan Poplin 4b140c9e48 Merge pull request #600 from broadinstitute/rp_random_forest_no_QUAL
Improvements to the Random Forest pipeline based on Marathon results.
2014-04-11 13:41:05 -04:00
Ryan Poplin 04ddbac585 Improvements to the Random Forest pipeline based on Marathon results.
-- We no longer use QUAL because it scales insidiously with AC.
-- By default we exclude sites in which NA12878 is polymorphic to prevent overfitting to the knowledgebase.
-- Tweaks to training parameters were required because of the QUAL change.
-- We now test for model convergence instead of specifying the number of iterations at the command line.
2014-04-11 12:16:05 -04:00
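
Testing for convergence instead of taking an iteration count from the command line can be sketched generically in Python. This is not the Random Forest pipeline's code: the function name, the `step` callback, the tolerance, and the safety cap on iterations are all assumptions made for illustration.

```python
def train_until_converged(step, tolerance=1e-3, max_iterations=1000):
    """Call step() until its returned error changes by less than tolerance.

    step is a hypothetical callback that runs one training iteration and
    returns the current error estimate. Returns (iterations_run, final_error).
    """
    previous = step()
    for iteration in range(2, max_iterations + 1):
        current = step()
        if abs(previous - current) < tolerance:
            return iteration, current
        previous = current
    raise RuntimeError(f"no convergence within {max_iterations} iterations")
```

The iteration cap is kept only as a guard against a model that never settles; the stopping decision itself comes from the change in error.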
kshakir 6d58e61f23 Merge pull request #603 from broadinstitute/ks_specify_columns_analyzerunreports
Mapping fields to explicit column names in analyzeRunReports.py
2014-04-11 04:30:31 +08:00
Khalid Shakir c84235c17c Mapping fields to explicit column names in analyzeRunReports.py.
Removed SQLSetupTable support.
2014-04-11 04:28:33 +08:00
Eric Banks e38a295ebd Merge pull request #601 from broadinstitute/ami-updateScalaScript
update scala scripts to include more of the pipeline steps
2014-04-10 16:01:02 -04:00
droazen 1590f06322 Merge pull request #602 from broadinstitute/use_version_controlled_scripts_for_s3_dl
Use version-controlled copies of scripts in GATKReports downloader
2014-04-10 15:40:37 -04:00
David Roazen 147bd88d26 Use version-controlled copies of scripts in GATKReports downloader 2014-04-10 15:39:06 -04:00
Ami Levy-Moonshine 40360ddb56 update scala scripts to include more of the pipeline steps
Add a new script for evaluating the RNA-seq downsample results
2014-04-10 15:29:17 -04:00
jmthibault79 c275d76a3e Merge pull request #599 from broadinstitute/jt_logging_test
Integration test for logging to stderr
2014-04-09 15:31:51 -04:00
Joel Thibault c84126205b Test that stdout redirects and log files do not affect output 2014-04-09 13:52:42 -04:00
Joel Thibault 1103fd231a Better exception message 2014-04-09 10:51:45 -04:00
Ryan Poplin 1001a75d0e Merge pull request #598 from broadinstitute/rp_random_forest_fix_tranches
Bug fix for correctly parsing the tranche tag in the RandomForestWalker.
2014-04-09 09:28:23 -04:00
kshakir 5b32b7b191 Merge pull request #595 from broadinstitute/ks_picard_matecigar_update
After comments from @nh13, updated latest picard and setMateInfo call.
2014-04-09 10:30:22 +08:00
Ryan Poplin edd15add7c Bug fix for correctly parsing the tranche tag in the RandomForestWalker. 2014-04-08 15:39:17 -04:00
Khalid Shakir a6b0754990 After comments from @nh13, updated latest picard and setMateInfo call. 2014-04-08 15:22:45 -04:00
kshakir cc580ac75f Merge pull request #593 from broadinstitute/ks_bqsrgatherer_missing_readgroups_68720468
BQSRGatherer handles missing read groups from some input files.
2014-04-09 03:17:53 +08:00
Khalid Shakir 3047d6ff32 BQSRGatherer handles missing read groups from some input files. [#68720468] 2014-04-08 23:58:54 +08:00
Eric Banks b07c0a6b4c Merge pull request #594 from broadinstitute/dr_vcf_sample_renaming
Extend on-the-fly sample renaming feature to vcfs
2014-04-08 11:47:45 -04:00
David Roazen af6a897479 Extend on-the-fly sample renaming feature to vcfs
-Only works with single-sample vcfs

-As with bams, the user must provide a file mapping the absolute path to
 each vcf whose samples are to be renamed to the new sample name for that
 vcf. The argument is the same as for bams: --sample_rename_mapping_file,
 and the mapping file may contain a mix of bam and vcf files should the
 user wish.

-It's an error to attempt to remap the sample names of a multi-sample
 or sites-only vcf

-Implemented at the codec level at the instant the vcf header is first
 read in to minimize the chances of downstream code examining vcf
 headers/records before renaming occurs.

-Integration tests are in sting, unit tests are in picard

-Rev picard et al. to 1.111.1902
2014-04-08 11:07:00 -04:00
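
The renaming rules in the commit above can be sketched in Python. This is an illustration, not the codec-level implementation: both function names are hypothetical, and the sketch assumes a mapping file with two tab-separated columns (absolute file path, new sample name), which would also allow non-tab whitespace inside sample names.

```python
import posixpath


def load_rename_map(lines):
    """Parse mapping lines of the assumed form: <absolute path> TAB <new name>."""
    mapping = {}
    for lineno, line in enumerate(lines, 1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected two tab-separated columns")
        path, new_name = parts[0].strip(), parts[1].strip()
        if not posixpath.isabs(path):
            raise ValueError(f"line {lineno}: path must be absolute: {path}")
        if path in mapping:
            raise ValueError(f"line {lineno}: duplicate entry for {path}")
        mapping[path] = new_name
    return mapping


def rename_header_samples(path, header_samples, mapping):
    """Return the sample list for a vcf header after on-the-fly renaming."""
    if path not in mapping:
        return list(header_samples)
    # Renaming is only defined for single-sample files; multi-sample and
    # sites-only (zero-sample) vcfs are an error.
    if len(header_samples) != 1:
        raise ValueError(f"cannot rename samples in {path}: "
                         f"{len(header_samples)} samples in header")
    return [mapping[path]]
```

Applying the rename when the header is first read, as the commit describes, means later code only ever sees the new sample name.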