Commit Graph

13332 Commits (e690ed1a6793c4192336bf6e774980566c43e37b)

Author SHA1 Message Date
Eric Banks e690ed1a67 The contig is named MT not M in b36. Delivers PT68890442. 2014-04-08 10:03:47 -04:00
Eric Banks 85f68f610e Merge pull request #596 from broadinstitute/eb_fix_na12878_roc_curve_maker
Don't error out with ArithmeticException in ROC maker when using small sets
2014-04-08 09:56:07 -04:00
Eric Banks ad336375dc Merge pull request #590 from broadinstitute/vrr_validate_variants_unused_alleles_fix
Addresses issue with strict validation on GVCF files.
2014-04-07 22:10:49 -04:00
Ryan Poplin 416ccef0c5 Merge pull request #592 from broadinstitute/rp_random_forest_improvements
Balancing training classes between SNP/Indel and TP/FP.
2014-04-07 21:22:45 -04:00
Valentin Ruano-Rubio 5afcc8e05f Change in the command line interface of ValidateVariants.
Following reviewers' comments, the command line interface has been simplified.
All extra strict validations are performed by default (as before) and the
user has to indicate which ones they do not want to use with --validationTypeToExclude.

Previously the user could indicate the only ones to apply with --validationType, but that option has been removed.

Stories:

    - https://www.pivotaltracker.com/story/show/68725164

Changes:

    - Removed validationType argument.
    - Improved documentation.
    - Added some warning log messages on suspicious argument combinations.

Tests:

    - ValidateVariantsIntegrationTest#*
2014-04-07 16:27:11 -04:00
Ryan Poplin 7d11b4d5f1 Balancing training classes between SNP/Indel and TP/FP.
-- This results in much more consistent distribution of LOD scores for SNPs and Indels.
-- Removing genotype summary stats since they are now produced by default.
-- Added functionality to specify certain subsets of the training data to be used in Tranche file generation, -good:tranche=true set.vcf
2014-04-07 15:23:53 -04:00
Eric Banks de2a2442d9 Merge pull request #591 from broadinstitute/rp_add_genotype_summary_annotations
Adding GenotypeSummaries as INFO field annotations.
2014-04-07 09:21:07 -04:00
Ryan Poplin f058224b3e Adding GenotypeSummaries as INFO field annotations.
-- This is needed so the ref model pipeline can cut down to sites-only files without losing these useful statistics.
-- Added new unit test to test this info field annotation.
-- GenotypeGVCF integration tests change because new annotations are present in the output
2014-04-06 11:50:10 -04:00
Eric Banks d5edb53906 Don't error out with ArithmeticException in ROC maker when using small sets 2014-04-05 23:34:40 -04:00
MauricioCarneiro 84861fa10a Merge pull request #587 from broadinstitute/eb_actually_fail_on_reduced_bams
Make sure to fail in all cases where the BAM being used was created by ReduceReads.
2014-04-04 17:27:57 -04:00
Eric Banks 267603f9a9 Merge pull request #589 from broadinstitute/ldg_SelVarXsampleFile
Added check to make sure file passed in with sample IDs is valid (used i...
2014-04-04 15:56:16 -04:00
Laura Gauthier ff25b656e1 Added check to make sure file passed in with sample IDs is valid (used in SelectVariants) -- throws UserException. Corresponding test checks for UserException. 2014-04-04 15:38:50 -04:00
Valentin Ruano-Rubio 18deeec6b0 Addresses issue with strict validation on GVCF files.
More concretely, Picard's strict VCF validation rejects alternative alleles that do not participate in any genotype call across samples.

This is an issue with GVCF in the single-sample pipeline, where this is certainly expected with <NON_REF> and other relatively unlikely alleles.

To solve this issue we allow the user to exclude some of the strict validations using a new argument --validationTypeToExclude. In order to avoid the validation
issue with GVCF the user needs to add the following to the command line: '--validationTypeToExclude ALLELES'

Story:

    https://www.pivotaltracker.com/story/show/68725164

Changes:

    - Added validationTypeToExclude argument to ValidateVariants walker.
    - Implemented the selective exclusion of validation types.
    - Added new info and improved existing documentation of the ValidateVariants walker.

Tests:

    - ValidateVariantsIntegrationTest#testUnusedAlleleError
    - ValidateVariantsIntegrationTest#testUnusedAlleleFix
2014-04-04 14:37:10 -04:00
Laura Gauthier 06d78ba068 Expanded documentation to include description of which callsets are being compared in what order and more definitions 2014-04-04 10:35:53 -04:00
Eric Banks 9be07e0838 Merge pull request #588 from broadinstitute/eb_fix_ir_exception
IndelRealigner throws a user error when it encounters reads with I opera...
2014-04-04 10:11:51 -04:00
Eric Banks 7174f8cfeb IndelRealigner throws a user error when it encounters reads with I operators greater than the number of read bases.
Added test to ensure it works.
2014-04-03 18:16:24 -04:00
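The check described in commit 7174f8cfeb can be illustrated with a minimal sketch. This is not the IndelRealigner code: the class name `CigarInsertionCheck`, both method names, and the use of `IllegalArgumentException` (standing in for a GATK user error) are assumptions made for illustration. The sketch sums the lengths of all `I` operators in a CIGAR string and rejects the read when that total exceeds the number of read bases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the GATK code base.
public final class CigarInsertionCheck {
    // One CIGAR element: a length followed by a single operator character.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Returns the total number of bases covered by I (insertion) operators. */
    static int insertedBases(String cigar) {
        Matcher m = ELEMENT.matcher(cigar);
        int total = 0;
        while (m.find()) {
            if (m.group(2).equals("I")) {
                total += Integer.parseInt(m.group(1));
            }
        }
        return total;
    }

    /** Throws if the I operators claim more bases than the read contains. */
    static void validate(String cigar, int readLength) {
        int inserted = insertedBases(cigar);
        if (inserted > readLength) {
            // Stand-in for a user-facing error; the real exception type is not shown in the log.
            throw new IllegalArgumentException(
                "CIGAR " + cigar + " inserts " + inserted
                + " bases but the read has only " + readLength);
        }
    }

    public static void main(String[] args) {
        validate("10M5I20M", 35); // passes silently
        validate("50I", 50);      // an all-insertion read is still consistent
        try {
            validate("60I", 50);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```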
Eric Banks a3d55b3341 Make sure to fail in all cases where the BAM being used was created by ReduceReads.
In some cases, the program records were being removed from the BAM headers by the GATK engine
before we applied the check for reduced reads (so we did not fail appropriately).  Pushed up the
check to happen before the PG tags are modified and added a unit test to ensure it stays that way.
It turns out that some UG tests still used reduced bams so I switched to use different ones.

Based on reviewer feedback, made it more generic so that it's easy to add new unsupported tools.
2014-04-03 16:52:41 -04:00
Geraldine Van der Auwera 890f4e8873 Merge pull request #586 from broadinstitute/eb_allow_users_to_specify_iupac_sample
Slightly modifying the way to use the IUPAC ambiguity codes in the Fasta...
2014-04-03 09:29:56 -04:00
Eric Banks 0b73573abc Slightly modifying the way to use the IUPAC ambiguity codes in the FastaAlternateReferenceMaker.
Previously it required you to create a single sample VCF and then to pass that in to the tool, but
Geraldine convinced me that this was a pain for users (because they usually have multi-sample VCFs).
Instead now you can pass in a multi-sample VCF and specify which sample's genotypes should be used
for the IUPAC encoding.  Therefore the argument changed from '--useIUPAC' to '--use_IUPAC_sample NA12878'.
2014-04-02 21:34:25 -04:00
Eric Banks 6bba8d7147 Merge pull request #585 from broadinstitute/ks_variantqc_patch
Resuscitated from git and copy/pasted in old gsalib methods needed for the private script variantCallQC.R to run, for now.
2014-04-02 16:48:42 -04:00
Khalid Shakir 0647824e75 Resuscitated from git and copy/pasted in old gsalib methods needed for the private script variantCallQC.R to run, for now. 2014-04-03 04:22:11 +08:00
Valentin Ruano Rubio 45c192bb6d Merge pull request #580 from broadinstitute/vrr_graphbase_infinite_likelihoods_reprise
Fixed bug using GraphBased due to infinite likelihoods resulting from th...
2014-04-02 00:45:17 -04:00
Valentin Ruano-Rubio 84711b8e90 Fixed bug using GraphBased due to infinite likelihoods resulting from the calculation of alignment cost of very long insertions or deletions (done in linear scale)
Stories:

  https://www.pivotaltracker.com/story/show/66263868

Bug:

  The problem was due to the way we were calculating the fixed penalty of a large deletion or insertion. In this case we calculate the alignment likelihood of the deleted or inserted
  portion of the read or haplotype as the penalty of that deletion/insertion without going through the full pair-hmm process. For large events this resulted in a 0
  in linear-scale computations, which is transformed into an infinity in log scale.

Changes:

  - Changed to use log10 scale to calculate those penalties.
  - Minor addition of .gitignore to hide ./public/external-example/target which is generated by the building process.
2014-04-01 16:14:52 -04:00
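The underflow described in commit 84711b8e90 can be reproduced in a few lines. The following is a minimal sketch, not the GATK implementation: the class name, the method names, and the per-base probability `PER_BASE_PROB` are hypothetical values chosen for illustration. It contrasts multiplying probabilities in linear scale, which underflows to 0 for long events and becomes infinite after taking the logarithm, with summing the same terms in log10 scale.

```java
// Hypothetical illustration only; not taken from the GATK code base.
public final class IndelPenaltyDemo {
    // Assumed per-base probability, picked only to make the underflow easy to trigger.
    static final double PER_BASE_PROB = 1e-3;

    /** Multiplies in linear scale, then converts to log10. Underflows for long events. */
    static double penaltyLinear(int eventLength) {
        double p = 1.0;
        for (int i = 0; i < eventLength; i++) {
            p *= PER_BASE_PROB;
        }
        // If p has underflowed to 0.0, Math.log10 returns negative infinity.
        return Math.log10(p);
    }

    /** Accumulates directly in log10 scale, so the result stays finite. */
    static double penaltyLog10(int eventLength) {
        return eventLength * Math.log10(PER_BASE_PROB);
    }

    public static void main(String[] args) {
        for (int length : new int[] {50, 200}) {
            System.out.println("length " + length
                + ": linear=" + penaltyLinear(length)
                + ", log10=" + penaltyLog10(length));
        }
    }
}
```

With these assumed values, a 200-base event needs a probability of about 1e-600, which is far below the smallest positive `double`, so the linear-scale version returns negative infinity while the log10 version returns a finite penalty.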
droazen c0286853b7 Merge pull request #584 from broadinstitute/dr_update_queue_test_script_for_naming_change
Update queue test runner script for upcoming naming changes
2014-04-01 11:52:22 -04:00
David Roazen ef8f91a5be Update queue test runner script for upcoming naming changes
Use both the old and new names for now, until the transition
is complete.
2014-04-01 11:49:55 -04:00
jmthibault79 8703bd7ad4 Merge pull request #583 from broadinstitute/jt_tabix
Create Tabix indices for block-compressed VCFs
2014-03-31 16:17:25 -04:00
Joel Thibault 70fe7f72f1 Return a TabixIndexCreator for appropriate file types
[Fixes #68291082]
2014-03-31 16:15:34 -04:00
Joel Thibault ab5634cbac Test that a Tabix index is created for block-compressed output formats
- Replace .idx and .tbi with appropriate constants
2014-03-31 14:36:48 -04:00
Joel Thibault a2d40c84ba Keep the list of zipped suffixes in sync with Variant 2014-03-31 14:36:41 -04:00
Joel Thibault a2cd9703fa Rev Picard 1.110.1773 2014-03-31 14:15:06 -04:00
Eric Banks 821fbe7260 Merge pull request #582 from broadinstitute/vrr_hc_bugfixes_dangling_heads
Fix loss of key alternative haplotypes due to a change on threading star...
2014-03-31 10:42:08 -04:00
jmthibault79 313d20d849 Merge pull request #581 from broadinstitute/jt_picard
Rev Picard because it's needed by another branch
2014-03-30 11:05:50 -04:00
Joel Thibault 2049eb1658 Rev Picard 1.110.1763
- SamPairUtils migrated in Picard r1737
- Revert IndelRealigner changes made in commit 4f4b85
-- Those changes were based on Picard revision 1722 to net/sf/picard/sam/SamPairUtil.java
-- Picard revision 1723 reverts these changes, so we also revert to match
2014-03-30 09:33:57 -04:00
Valentin Ruano-Rubio 258b2bce28 Fix loss of key alternative haplotypes due to a change on threading start policy required when recovering dangling heads.
Story:

  - https://www.pivotaltracker.com/story/show/67601310

Change:

  - Unless recover-dangling-heads is active, the threading starting location policy is the original one, i.e. just at already existing unique kmer vertices.

Tests:

  - HaplotypeCallerIntegrationTest#testMissingKeyAlternativeHaplotypesBugFix
2014-03-29 22:40:26 -04:00
MauricioCarneiro 5abb7ea2db Merge pull request #579 from broadinstitute/rp_fix_DP_annotation
Fix for dropping of reference sample depth in the DP annotation.
2014-03-24 15:49:20 -04:00
Ryan Poplin 6566dd6ca9 Fix for dropping of reference sample depth in the DP annotation.
-- In the case of hierarchical merge we can't assume that we have only one genotype.
-- Removed use of deprecated VC annotation access functions.
2014-03-24 14:01:50 -04:00
Ryan Poplin c61a791914 Merge pull request #578 from broadinstitute/eb_trivial_fix_to_IR
Fix for reads that are all insertions (e.g. 50I) and causing the IndelRe...
2014-03-21 15:24:58 -04:00
Ryan Poplin b8581d7d3a Merge pull request #576 from broadinstitute/rp_fix_AssessNA12878_dropping_contigs
Bug fix in AssessNA12878 when working with more than one contig.
2014-03-21 15:23:59 -04:00
Eric Banks 32a96e3ab3 Fix for reads that are all insertions (e.g. 50I) and causing the IndelRealigner to error out. 2014-03-21 15:01:34 -04:00
Ryan Poplin dd1b0a48db Bug fix in AssessNA12878 when working with more than one contig.
-- SmartSiteIterator needs to know to span across Chunks when iterating by polling the Chunk list
-- Added KB test to test for this case
-- Removed the maxSites argument in ExtractConsensusSites because it is counterintuitive and not useful.
2014-03-21 14:52:10 -04:00
Ryan Poplin 69eaf7c82d Merge pull request #577 from broadinstitute/eb_minor_fixes_for_fragment_utils
Fixed docs for method and fixed the edge case optimization to properly u...
2014-03-21 14:01:44 -04:00
Ryan Poplin ce39fcd8a3 Merge pull request #575 from broadinstitute/eb_various_fixes_for_gvcfs
Eb various fixes for gvcfs
2014-03-21 09:47:08 -04:00
Eric Banks 0d82a70633 Fixed docs for method and fixed the edge case optimization to properly use equals() on Integers.
Shouldn't affect actual results at all.
2014-03-20 15:55:09 -04:00
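The kind of problem that commit 0d82a70633 refers to can be sketched as follows. The actual fragment-utils code is not shown in this log, so the class and method names here are hypothetical. In Java, `==` on two `Integer` objects compares references, while `equals()` compares the wrapped values; for boxed values outside the small-integer cache, the reference comparison may be false even when the values match.

```java
// Hypothetical illustration only; not taken from the GATK code base.
public final class IntegerEqualsDemo {
    /** Value comparison: true whenever both Integers hold the same number. */
    static boolean sameValue(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // 100000 lies outside the default Integer cache range (-128 to 127).
        Integer a = Integer.valueOf(100000);
        Integer b = Integer.valueOf(100000);

        // Reference comparison: the result depends on whether the JVM returned
        // the same object for both calls, so it should not be relied upon.
        System.out.println("a == b:      " + (a == b));

        // Value comparison: independent of caching.
        System.out.println("a.equals(b): " + sameValue(a, b));
    }
}
```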
Eric Banks 7c8ce3cd6a Several improvements to GenotypeGVCFs: --includeNonVariantSites now actually works and we propagate AD to hom ref samples 2014-03-20 00:35:54 -04:00
Eric Banks 824983af1d Enable CombineGVCFs to process gVCFs that were created with basepair resolution. 2014-03-19 19:23:05 -04:00
Eric Banks 3b1c337401 Have CombineVariants throw a UserError when trying to combine GVCFs from the HaplotypeCaller.
Was previously throwing an IllegalArgumentException (in the wrong place in the code).
Error message tells users to use CombineGVCFs.
2014-03-19 19:11:40 -04:00
Ryan Poplin 7117bebb5e Merge pull request #572 from broadinstitute/rp_fix_manual_reviews_fix
Forgot to change the padding ref base as well.
2014-03-19 13:58:58 -04:00
Ryan Poplin a3aa68e626 Forgot to change the padding ref base as well. 2014-03-19 13:58:21 -04:00
Ryan Poplin c737b8ed1e Merge pull request #571 from broadinstitute/rp_fix_manual_reviews
Fixing the reference base in one of the manual review files.
2014-03-19 13:49:51 -04:00
Ryan Poplin 523fd40a07 Fixing the reference base in one of the manual review files. 2014-03-19 11:08:06 -04:00