12565 Commits (3db02e5ef18a130365daa2f786979d8e2a466206)

Author SHA1 Message Date
Mark DePristo 3db02e5ef1 Merge pull request #315 from broadinstitute/md_ref_conf_hc
Reference confidence model for the haplotype caller
2013-07-02 13:04:33 -07:00
droazen 2e87d09c26 Merge pull request #319 from broadinstitute/dr_packaging_system_fail_gracefully_when_bcel_not_installed
Fail gracefully in the packaging system when bcel is not installed
2013-07-02 13:01:45 -07:00
David Roazen 75d1f64416 Fail gracefully in the packaging system when bcel is not installed
Packaging the GATK requires bcel to be installed. Detect when it's not,
and output instructions on how to install it.
2013-07-02 15:50:51 -04:00
Mark DePristo 35cdc16822 Merge pull request #318 from broadinstitute/dr_improve_dcov_documentation
Improve -dcov documentation to address recent user confusion
2013-07-02 12:47:29 -07:00
Mark DePristo 5f34054cc1 Remove filtering of MAPQ 0 reads from CalledHaplotypeBAMWriter 2013-07-02 15:46:49 -04:00
Mark DePristo 7be01777f6 Bugfix for incPos in GenomeLoc
-- Shouldn't have taken a GenomeLoc as an argument, as it's an instance method, not a public static
2013-07-02 15:46:49 -04:00
Mark DePristo ed0b1c5aba Fix bug in ReadThreadingAssembler in cycle failures causing NPE 2013-07-02 15:46:48 -04:00
Mark DePristo e3e8631ff5 Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences
-- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was successfully built but didn't have variation, or truly failed in construction.  This fixes an annoying bug where you'd perfectly assemble the sequence into the reference graph, but then return a null graph because of this, and you'd increase your kmer because null was also used to indicate assembly failure
--
-- Output format looks like:
20      10026072        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,120
20      10026073        .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,119
20      10026074        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,121
20      10026075        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,119
20      10026076        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,120
20      10026077        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,120
20      10026078        .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:5,0:5:15:0,15,217
20      10026079        .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:6,0:6:18:0,18,240
20      10026080        .       G       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:6,0:6:18:0,18,268
20      10026081        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:7,0:7:21:0,21,267

We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values.  Currently these are calculated as ref vs. any non-ref value (mismatch or insertion), but they don't yet account properly for alignment uncertainty.
-- Can be enabled for single samples with --emitRefConfidence (-ERC).
-- This is accomplished by realigning each read to its most likely haplotype, and then evaluating the resulting pileups over the active region interval.  The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads
-- Provide access to the more raw LocusIteratorByState constructor so we can more easily make them programmatically without constructing lots of misc. GATK data structures.  Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class.
-- Includes GVCF writer
-- Add 1 mb of WEx data to private/testdata
-- Integration tests for reference model output for WGS and WEx data
-- Emit GQ block information into VCF header for GVCF mode
-- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as it's no longer relevant for HC
-- Control max indel size for the reference confidence model from the command line.  Increase default to 10
-- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest
-- Unittests for ReferenceConfidenceModel
-- Unittests for new MathUtils functions
2013-07-02 15:46:38 -04:00
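The reference-confidence records shown in the commit above can be read with a few lines of code. The following is a minimal sketch, not part of the GATK: the function name `parse_sample_fields` is hypothetical, and it assumes a single-sample record whose columns are separated by whitespace, as in the listing. It only illustrates how the GT:AD:DP:GQ:PL fields line up against the symbolic <NON_REF> allele.

```python
def parse_sample_fields(vcf_line):
    """Split one single-sample VCF record into typed fields (hypothetical helper)."""
    cols = vcf_line.split()
    keys = cols[8].split(":")      # FORMAT column, e.g. GT:AD:DP:GQ:PL
    values = cols[9].split(":")    # the sample column
    sample = dict(zip(keys, values))
    return {
        "chrom": cols[0],
        "pos": int(cols[1]),
        "ref": cols[3],
        "alt": cols[4],
        "GT": sample["GT"],
        "AD": [int(x) for x in sample["AD"].split(",")],  # ref depth, non-ref depth
        "DP": int(sample["DP"]),
        "GQ": int(sample["GQ"]),
        "PL": [int(x) for x in sample["PL"].split(",")],
    }


# First record from the commit message above.
line = "20 10026072 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120"
rec = parse_sample_fields(line)
```

In the rows listed above, GQ matches the second-smallest PL value (for example GQ 9 with PL 0,9,120); the sketch makes that easy to check with `sorted(rec["PL"])[1]`.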
Mark DePristo 41aba491c0 Critical bugfix for adapter clipping in HaplotypeCaller
-- The previous code would adapter clip before reverting soft clips, so because we only clip the adapter when it's actually aligned (i.e., not in the soft clips) we were actually not removing bases in the adapter unless at least 1 bp of the adapter was aligned to the reference.  Terrible.
-- Removed the broken logic of determining whether a read adaptor is too long.
-- Doesn't require isProperPairFlag to be set for a read to be adapter clipped
-- Update integration tests for new adapter clipping code
2013-07-02 15:46:36 -04:00
David Roazen cdea744b95 Improve -dcov documentation to address recent user confusion
-Explicitly state that -dcov does not produce an unbiased random sampling from all available reads
 at each locus, and that instead it tries to maintain an even representation of reads from
 all alignment start positions (which, of course, is a form of bias)

-Recommend -dfrac for users who want a true across-the-board unbiased random sampling
2013-07-02 15:33:28 -04:00
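The distinction drawn in the commit above can be illustrated with a toy model. This is a sketch under stated assumptions, not the GATK downsampler: both function names are hypothetical, reads are modelled as (name, alignment_start) tuples, and the "even representation" strategy is approximated by a simple round-robin over start positions.

```python
import random
from collections import defaultdict


def downsample_fraction(reads, fraction, rng):
    """Unbiased sampling: keep each read independently with probability `fraction`."""
    return [r for r in reads if rng.random() < fraction]


def downsample_even_starts(reads, target, rng):
    """Keep at most `target` reads, drawn round-robin across alignment starts."""
    by_start = defaultdict(list)
    for read in reads:
        by_start[read[1]].append(read)
    for group in by_start.values():
        rng.shuffle(group)
    kept = []
    while len(kept) < target and any(by_start.values()):
        for start in sorted(by_start):
            if by_start[start] and len(kept) < target:
                kept.append(by_start[start].pop())
    return kept


# 90 reads start at position 100, only 10 reads start at position 105.
reads = [("a%d" % i, 100) for i in range(90)] + [("b%d" % i, 105) for i in range(10)]
rng = random.Random(0)
even = downsample_even_starts(reads, 10, rng)
frac = downsample_fraction(reads, 0.1, rng)
```

In this toy input the round-robin version keeps five reads from each start position even though position 105 contributes only 10% of the reads, which is the kind of bias the documentation change describes. The fraction-based version keeps reads in proportion to their abundance on average.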
David Roazen c3d59d890d Update licenses for new PbsEngine* classes 2013-07-01 15:50:20 -04:00
droazen 2964ebaa4e Merge pull request #314 from broadinstitute/ks_francesco_pbs_patch
Ks francesco pbs patch
2013-07-01 12:39:38 -07:00
Khalid Shakir f0c36e2890 Fixing failed test for HSP by changing dcov from 60 to 200. 2013-07-01 15:13:04 -04:00
Khalid Shakir ec206eccfc Switch "all" test pipeline job runners to mean the job runners that run at The Broad. 2013-07-01 15:12:55 -04:00
Francesco acf90ca027 corrected number of arguments passed to PbsEngineJobRunner when requesting multiple cores
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2013-07-01 15:08:15 -04:00
Francesco 948b2fca20 added PbsEngine plugin into engine folders, to be called in Queue with -jobRunner PbsEngine; the plugin is written modifying the existing GridEngine plugin, used as a template
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2013-07-01 15:08:14 -04:00
Mark DePristo 4ec50caea2 Merge pull request #313 from broadinstitute/mc_generalize_dt_scala_script
Added all the parameters to the scala script for DiagnoseTargets
2013-06-29 11:00:00 -07:00
Mauricio Carneiro a6b569b395 Added all the parameters to the scala script for DiagnoseTargets 2013-06-29 11:28:25 -04:00
David Roazen 31827022db Fix pipeline tests that were not respecting the pipeline test dry run setting
There are a few pipeline test classes that do not run Queue, but are
classified as pipeline tests because they submit farm jobs. Make these
unconventional pipeline tests respect the pipeline test dry run setting.
2013-06-28 15:27:17 -04:00
Ryan Poplin 1ec56c9e64 Merge pull request #311 from broadinstitute/eb_require_min_mq_for_FN_in_kb
We need to enforce a minimum base and mapping quality threshold to penal...
2013-06-27 16:49:45 -07:00
Mark DePristo 5717f3dc1c Merge pull request #312 from broadinstitute/eb_fix_assessna12878_and_add_integration_tests
I allowed a bad push yesterday with the KB because there weren't any int...
2013-06-27 13:51:37 -07:00
Eric Banks 7aa1f56dff We need to enforce a minimum base and mapping quality threshold to penalize a callset for FNs.
This is used in conjunction with the -BAM argument in AssessNA12878 and is necessary for the
Jenkins assessment to work properly (Ryan's commit wasn't enough).
2013-06-27 15:58:39 -04:00
Eric Banks 1f1da56d28 I allowed a bad push yesterday with the KB because there weren't any integration tests for the assessment.
Added one and fixed up the code so that the headers are more accurate for the -badSites output.
2013-06-27 15:42:53 -04:00
Ryan Poplin 825b603acb Merge pull request #298 from broadinstitute/md_likelihood_rank_sum
Md likelihood rank sum
2013-06-27 11:14:25 -07:00
Mark DePristo de7fe2e086 Merge pull request #308 from broadinstitute/rp_assessment_low_coverage
Don't count no coverage sites as false negatives in the assessment again...
2013-06-27 06:46:23 -07:00
Eric Banks 9f08718636 Merge pull request #309 from jsilter/master
Add "isComplexEvent" as attribute
2013-06-26 16:00:51 -07:00
Jacob Silterra beb834e849 Add "isComplexEvent" as attribute to VariantContextBuilder for MongoVariantContext 2013-06-26 17:12:32 -04:00
Ryan Poplin fe5348ea5d Don't count no coverage sites as false negatives in the assessment against the knowledge base 2013-06-26 16:02:44 -04:00
Mark DePristo a514dd0643 Merge pull request #307 from broadinstitute/eb_rr_off_by_one_error
Proper fix for previous RR -cancer_mode fix.
2013-06-26 13:02:23 -07:00
Eric Banks 876e40466a Proper fix for previous RR -cancer_mode fix.
I "fixed" this once before but instead of testing with unit tests I used integration tests.
Bad decision.

The proper fix is in now, with a bona fide unit test included.
2013-06-26 14:48:09 -04:00
Eric Banks 95eab80f9b Merge pull request #306 from broadinstitute/eb_make_assessreducedquals_hidden
Make this walker @Hidden
2013-06-26 08:47:28 -07:00
Eric Banks f242be12c0 Make this walker @Hidden 2013-06-26 11:45:21 -04:00
Mark DePristo 28d4c3debc Merge pull request #305 from broadinstitute/dr_move_DownsampleReadsQC_to_private
Move DownsampleReadsQC walker to private
2013-06-25 16:33:20 -07:00
David Roazen 94294ed6c4 Move DownsampleReadsQC walker to private 2013-06-25 15:48:44 -04:00
Mark DePristo d13ed06e9d Merge pull request #303 from broadinstitute/eb_update_kb_to_use_exome_intervals
Various updates to have the KB use the expanded exome intervals (from D ...
2013-06-24 13:06:52 -07:00
Eric Banks 6dc816beee Various updates to have the KB use the expanded exome intervals (from D MacArthur) in addition to chr20.
1. MergeIntervalLists should take the global interval padding into account when merging.
2. Update the name of the imported callsets in the setup script because of renaming for expanded intervals.
3. If there are too many intervals to process, MongoDB falls apart.  Refactored the site selection code so
   that in such cases we pull out all records from the DB and the GATK itself does the interval filtering.
4. Add isComplex to callset summary for the consensus summarizer.
5. Remove the check for out of order records in the SiteIterator since records now do come out of order
   (since contigs are sorted lexicographically in MongoDB).

Results:
Iteration over the gencode intervals (90 MB) in AssessNA12878 now takes 90 seconds.  I can't tell you how
much time it took before because it kept crashing Mongo (but it was a long, long time).
2013-06-24 14:57:35 -04:00
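Item 5 of the commit above follows from how string sorting orders contig names. The snippet below is a small illustration with made-up contig names; the `karyotype_key` helper is hypothetical and only shows one way a client could restore numeric order after receiving lexicographically sorted records.

```python
contigs = ["1", "2", "10", "20", "X"]

# Plain string sort: "10" sorts before "2" because comparison is per character.
lexicographic = sorted(contigs)


def karyotype_key(name):
    """Sort numeric contigs by value, then non-numeric ones by name (hypothetical)."""
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


numeric_order = sorted(contigs, key=karyotype_key)
```

Here `lexicographic` is `['1', '10', '2', '20', 'X']` while `numeric_order` is `['1', '2', '10', '20', 'X']`, so a check that assumes records arrive in reference order would trip on the first list.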
Mark DePristo ff76d0c877 Merge pull request #304 from broadinstitute/eb_rr_header_negative_fix_again
Fixing the 'header is negative' problem in Reduce Reads... again.
2013-06-24 11:55:52 -07:00
Eric Banks 165b936fcd Fixing the 'header is negative' problem in Reduce Reads... again.
Previous fixes and tests only covered trailing soft-clips.  Now that up front
hard-clipping is working properly though, we were failing on those in the tool.

Added a patch for this as well as a separate test independent of the soft-clips
to make sure that it's working properly.
2013-06-24 14:06:21 -04:00
Valentin Ruano-Rubio b97f9a487d Merged bug fix from Stable into Unstable 2013-06-24 14:00:01 -04:00
Mark DePristo 521d9c1df5 Merge pull request #302 from broadinstitute/mc_processing_pipeline2
quick updates to the techdev processing pipeline scala script
2013-06-24 09:52:55 -07:00
Mauricio Carneiro c38b8065d8 quick fixes to the scala script
* Increase the memory limit for HTSLIB - Bam shuffling just eats up a ton of memory.
* Concurrent HTSLIB processes need unique temp files. The bam shuffling step was messing up the temporary files and failing without returning zero. Fixed it by giving a unique name to each process.
2013-06-24 12:44:47 -04:00
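The second fix above relies on each concurrent process getting its own temporary file. A minimal sketch of that idea, using Python's standard `tempfile` module rather than the Scala script itself; the helper name `unique_temp_path` and the `shuffle_` prefix are assumptions made for the example.

```python
import os
import tempfile


def unique_temp_path(prefix, suffix=".bam", directory=None):
    """Create an empty temp file with a unique name and return its path."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=suffix, dir=directory)
    os.close(fd)  # the caller reopens the path as needed
    return path


# Two jobs asking for a temp file never receive the same name.
path_a = unique_temp_path("shuffle_")
path_b = unique_temp_path("shuffle_")

# Clean up the files created for the demonstration.
os.remove(path_a)
os.remove(path_b)
```

Because `mkstemp` creates the file atomically, two processes calling it at the same time cannot end up writing to the same path.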
Mark DePristo 191e4ca251 Merge pull request #300 from broadinstitute/mc_move_qualify_intervals_to_protected
Few bug fixes to this tool now that it is in protected
2013-06-24 09:35:45 -07:00
Yossi Farjoun d8ca4d3e6d Merge pull request #299 from broadinstitute/eb_mate_fixer_confused_by_nonprimary_alignment
Another fix for the Indel Realigner that arises because of secondary alignments.
2013-06-24 06:58:27 -07:00
Valentin Ruano-Rubio 3e5ff6095f Added the pertinent DocumentedGATKFeature annotation to AnalyzeCovariates 2013-06-21 17:02:26 -04:00
Eric Banks d976aae2b1 Another fix for the Indel Realigner that arises because of secondary alignments.
This time we don't accidentally drop reads (phew), but this bug does cause us not to
update the alignment start of the mate.  Fixed and added unit test to cover it.
2013-06-21 16:59:22 -04:00
Mark DePristo 8caf39cb65 Experimental LikelihoodRankSum annotation
-- Added experimental LikelihoodRankSum, which required slightly more detailed access to the information managed by the base class, so added an overloaded getElementForRead that also provides access to the MostLikelyAllele class
-- Added base class default implementation of getElementForPileupElement() which returns null, indicating that the pileup version isn't supported.
-- Added @Override to many of the RankSum classes for safety's sake

-- Updates to GeneralCallingPipeline: annotate sites with dbSNP IDs
-- R script to assess the value of annotations for VQSR
2013-06-21 13:57:11 -04:00
Mark DePristo f726d8130a VariantRecalibrator bugfix for bad log10sumlog10 values
-- The VR, when the model is bad, may evaluate log10sumlog10 where some of the values in the vector are NaN. This case is now trapped in VR and handled as previously -- indicating that the model has failed and evaluation continues.
2013-06-21 12:28:53 -04:00
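The failure mode described in the commit above can be sketched as follows. This is an illustration only, not the VariantRecalibrator code: the function names are hypothetical, and the NaN check stands in for whatever trap the real code uses. The sum itself uses the standard max-shift trick for adding values held in log10 space.

```python
import math


def log10_sum_log10(values):
    """Return log10(sum(10**v for v in values)), rejecting NaN inputs."""
    if any(math.isnan(v) for v in values):
        raise ValueError("NaN in log10 vector")
    m = max(values)
    if m == float("-inf"):
        return m  # every term is zero in real space
    # Shift by the maximum so the exponentials cannot overflow.
    return m + math.log10(sum(10 ** (v - m) for v in values))


def model_is_usable(log10_values):
    """Treat a NaN in the vector as a failed model instead of crashing (hypothetical)."""
    try:
        log10_sum_log10(log10_values)
        return True
    except ValueError:
        return False


ok = model_is_usable([-1.0, -1.0])
bad = model_is_usable([-1.0, float("nan")])
```

With these inputs `ok` is true and `bad` is false, so the caller can mark the model as failed and carry on.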
Mark DePristo dee51c4189 Error out when NCT and BAMOUT are used with the HaplotypeCaller
-- Currently we don't support writing a BAM file from the haplotype caller when nct is enabled.  Check in initialize if this is the case, and throw a UserException
2013-06-21 09:25:57 -04:00
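The check described in the commit above amounts to validating an argument combination up front. A minimal sketch in Python, assuming hypothetical parameter names; the `UserException` class here is a local stand-in that only borrows the name used in the commit message.

```python
class UserException(Exception):
    """Stand-in for an error caused by an invalid argument combination."""


def check_bam_output_supported(num_cpu_threads, bam_output_path):
    """Fail fast when BAM output is requested together with more than one CPU thread."""
    if bam_output_path is not None and num_cpu_threads > 1:
        raise UserException(
            "Writing a BAM file is not supported when nct is enabled"
        )


# A single-threaded run with BAM output passes the check.
check_bam_output_supported(1, "out.bam")
```

Running the check during initialization means the user sees the error before any processing starts rather than partway through a run.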
David Roazen e03a5e9486 Update source release script in attempt to work around intermittent github issues
Github was intermittently rejecting large pushes that were in fact
fast-forward updates as being non-fast-forward. Try to prevent this
by ensuring that all refs are up-to-date and properly checked out
after branch filtering and before doing a source release.
2013-06-20 16:58:01 -04:00
David Roazen 0018af0c0a Update README file for the 2.6 release 2013-06-20 13:08:29 -04:00