12588 Commits (aba55dbb23375fc9b6c40c2d99a594efec3f6fa3)

Author SHA1 Message Date
Guillermo del Angel aba55dbb23 Moved some HC parameters related to active region extensions to command line arguments so that they're more easily modified. Some of these parameters need tinkering in order to call some large indels. See GSA-891 and subtasks for particular examples thereof. 2013-07-10 14:31:10 -04:00
droazen 2d81234ed8 Merge pull request #326 from broadinstitute/dr_dependency_analyzer_add_manual_dependencies
Dependency analyzer improvements
2013-07-09 10:12:50 -07:00
David Roazen de06eda6ac Dependency analyzer improvements
-Add ability to manually specify dependencies on the command line. This allows one
 to specify, for example, that all walkers depend on the GeneralCallingPipeline
 QScript, even though they don't have any compile-time dependencies on that QScript.

-Check that the provided walker class is valid in DependencyAnalyzer.xml

-Check ant exit status in the front-end script

-Fix bug where analyzer would give incorrect results if the list of changed
 Java classes was empty
2013-07-09 13:12:12 -04:00
Eric Banks 5dbb582be7 Merge pull request #310 from broadinstitute/mc_interval_list_to_fastq
Walker to create a fastq file from an interval list
2013-07-08 14:30:43 -07:00
Valentin Ruano Rubio ac77a4c699 Merge pull request #316 from broadinstitute/md_filter_counting
Bugfix for counting of applied filters
2013-07-08 10:58:47 -07:00
Eric Banks 380a01ddc0 Merge pull request #322 from broadinstitute/eb_add_review_indexes_to_repo
It was annoying me that these index files kept showing up in 'git status'
2013-07-08 10:15:46 -07:00
Eric Banks 4fe26ea2cf Merge pull request #323 from broadinstitute/eb_fix_sorting_in_reduce_reads
Reduce Reads output should never be expected to be sorted (hence the nee...
2013-07-08 10:15:28 -07:00
Eric Banks c5a2a8f39f Merge pull request #324 from broadinstitute/eb_AnalyzeCovariates_is_not_deprecated
AnalyzeCovariates is no longer a deprecated tool.
2013-07-08 10:15:10 -07:00
David Roazen bc28a1f236 pipeline test runner: create temp dir if it doesn't exist 2013-07-08 12:34:20 -04:00
Ryan Poplin 1c28bd2ffd Merge pull request #325 from broadinstitute/dr_general_calling_pipeline_walker_list
List of walkers used by the GeneralCallingPipeline
2013-07-08 09:28:13 -07:00
David Roazen 46b453a69d List of walkers used by the GeneralCallingPipeline
For use by the dependency analyzer
2013-07-08 11:53:58 -04:00
Eric Banks 73fc7f6ab1 Reduce Reads output should never be expected to be sorted (hence the need to sort on disk) but for some reason it was with -nwayout mode. 2013-07-08 10:33:36 -04:00
Eric Banks 921f551426 AnalyzeCovariates is no longer a deprecated tool. 2013-07-08 09:48:12 -04:00
Eric Banks 3e357ac8cf It was annoying me that these index files kept showing up in 'git status' 2013-07-08 09:46:08 -04:00
Mark DePristo 0aa9d02570 Merge pull request #321 from broadinstitute/eb_fix_annotating_multiple_comps
Fix bug introduced recently in the VariantAnnotator where only the last ...
2013-07-05 05:05:09 -07:00
Eric Banks 5f5c90e65c Fix bug introduced recently in the VariantAnnotator where only the last -comp was being annotated at a site.
Trivial fix, added integration test to cover it.
2013-07-05 00:04:52 -04:00
David Roazen 6d69c7dc71 Disable RetryMemoryLimit pipeline test
-This test is failing intermittently for unexplained reasons (see GSA-943)

-In the interest of keeping the rest of the pipeline test suite running, it's
 best to disable this one test until GSA-943 is resolved
2013-07-03 13:38:28 -04:00
Tadeusz Jordan 8d00e558fb Merge pull request #317 from broadinstitute/dr_dependency_analyzer
Ant-based walker dependency analyzer
2013-07-03 08:46:37 -07:00
Mark DePristo 7cdb7ac572 Merge pull request #320 from broadinstitute/gg_cleanup_licenses_PT51255639
Deleted old license files
2013-07-03 04:55:15 -07:00
Geraldine Van der Auwera d55dddfdba deleted old license files 2013-07-02 16:36:47 -04:00
Mark DePristo 3db02e5ef1 Merge pull request #315 from broadinstitute/md_ref_conf_hc
Reference confidence model for the haplotype caller
2013-07-02 13:04:33 -07:00
droazen 2e87d09c26 Merge pull request #319 from broadinstitute/dr_packaging_system_fail_gracefully_when_bcel_not_installed
Fail gracefully in the packaging system when bcel is not installed
2013-07-02 13:01:45 -07:00
David Roazen 75d1f64416 Fail gracefully in the packaging system when bcel is not installed
Packaging the GATK requires bcel to be installed. Detect when it's not,
and output instructions on how to install it.
2013-07-02 15:50:51 -04:00
Mark DePristo 35cdc16822 Merge pull request #318 from broadinstitute/dr_improve_dcov_documentation
Improve -dcov documentation to address recent user confusion
2013-07-02 12:47:29 -07:00
Mark DePristo 5f34054cc1 Remove filtering of MAPQ 0 reads from CalledHaplotypeBAMWriter 2013-07-02 15:46:49 -04:00
Mark DePristo 7be01777f6 Bugfix for incPos in GenomeLoc
-- Shouldn't have taken a GenomeLoc as an argument, as it's an instance method, not a public static one
2013-07-02 15:46:49 -04:00
Mark DePristo ed0b1c5aba Fix bug in ReadThreadingAssembler in cycle failures causing NPE 2013-07-02 15:46:48 -04:00
Mark DePristo e3e8631ff5 Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences
-- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was successfully built but didn't have variation, or truly failed in construction.  This fixes an annoying bug where you'd perfectly assemble the sequence into the reference graph but then return a null graph, and the kmer size would be increased because null was also used to indicate assembly failure
--
-- Output format looks like:
20      10026072        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,120
20      10026073        .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,119
20      10026074        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,121
20      10026075        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,119
20      10026076        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,120
20      10026077        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,120
20      10026078        .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:5,0:5:15:0,15,217
20      10026079        .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:6,0:6:18:0,18,240
20      10026080        .       G       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:6,0:6:18:0,18,268
20      10026081        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:7,0:7:21:0,21,267

We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values.  Currently these are calculated as ref vs. any non-ref value (mismatch or insertion) but they don't yet account properly for alignment uncertainty.
-- Can be enabled for single samples with --emitRefConfidence (-ERC).
-- This is accomplished by realigning each read to its most likely haplotype, and then evaluating the resulting pileups over the active region interval.  The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads
-- Provide access to the more raw LocusIteratorByState constructor so we can more easily make them programmatically without constructing lots of misc. GATK data structures.  Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class.
-- Includes GVCF writer
-- Add 1 mb of WEx data to private/testdata
-- Integration tests for reference model output for WGS and WEx data
-- Emit GQ block information into VCF header for GVCF mode
-- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as it's no longer relevant for HC
-- Control max indel size for the reference confidence model from the command line.  Increase default to 10
-- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest
-- Unittests for ReferenceConfidenceModel
-- Unittests for new MathUtils functions
2013-07-02 15:46:38 -04:00
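The reference-confidence records quoted in the e3e8631ff5 message above carry their per-sample values in the last two columns (FORMAT keys, then the sample's values). A minimal sketch of unpacking one such line follows. The class and method names (RefConfidenceRecord, genotypeFields, intList) are hypothetical and are not part of the GATK; the sketch also splits on any whitespace because the lines are shown space-aligned here, whereas a real VCF file is tab-delimited.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not GATK code: unpacks the FORMAT/sample columns
// of a single-sample record like the ones quoted in the commit message.
public class RefConfidenceRecord {

    /** Maps each FORMAT key (GT, AD, DP, GQ, PL) to the sample's raw value. */
    public static Map<String, String> genotypeFields(String line) {
        // Assumption: columns are separated by runs of whitespace.
        String[] cols = line.trim().split("\\s+");
        if (cols.length < 10) {
            throw new IllegalArgumentException("expected 10 columns, got " + cols.length);
        }
        String[] keys = cols[8].split(":");    // FORMAT column
        String[] values = cols[9].split(":");  // single sample column
        if (keys.length != values.length) {
            throw new IllegalArgumentException("FORMAT and sample fields differ in length");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            fields.put(keys[i], values[i]);
        }
        return fields;
    }

    /** Parses a comma-separated integer list such as an AD or PL value. */
    public static int[] intList(String csv) {
        String[] parts = csv.split(",");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "20 10026072 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120";
        Map<String, String> f = genotypeFields(line);
        System.out.println("GT=" + f.get("GT") + " GQ=" + f.get("GQ")
                + " PL=" + Arrays.toString(intList(f.get("PL"))));
    }
}
```

In these records AD and PL each describe the reference allele against the symbolic <NON_REF> allele, which is why AD has two entries and PL has three.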
Mark DePristo 41aba491c0 Critical bugfix for adapter clipping in HaplotypeCaller
-- The previous code would adapter clip before reverting soft clips, so because we only clip the adapter when it's actually aligned (i.e., not in the soft clips) we were actually not removing bases in the adapter unless at least 1 bp of the adapter was aligned to the reference.  Terrible.
-- Removed the broken logic of determining whether a read adapter is too long.
-- Doesn't require isProperPairFlag to be set for a read to be adapter clipped
-- Update integration tests for new adapter clipping code
2013-07-02 15:46:36 -04:00
David Roazen cdea744b95 Improve -dcov documentation to address recent user confusion
-Explicitly state that -dcov does not produce an unbiased random sampling from all available reads
 at each locus, and that instead it tries to maintain an even representation of reads from
 all alignment start positions (which, of course, is a form of bias)

-Recommend -dfrac for users who want a true across-the-board unbiased random sampling
2013-07-02 15:33:28 -04:00
David Roazen 8eab59419d Ant-based walker dependency analyzer
-Given a list of walkers and a pair of git commits, determines whether each of the
 walkers has compile-time dependencies on the Java classes changed between the two
 commits.

-Output is in the form of a Java properties file, and can be easily loaded via
 the Properties class. Example output:

 org.broadinstitute.sting.gatk.walkers.bed.MergeIntervalLists=true
 org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper=false
 org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller=false
 org.broadinstitute.sting.gatk.walkers.na12878kb.NA12878DBWalker=true
 org.broadinstitute.sting.gatk.walkers.readutils.PrintReads=false

 "true" indicates that the walker does have compile-time dependencies on one or
 more of the changed Java classes, "false" indicates no dependencies

-Considers classes within changed jar files as well, provided the jars are stored
 in our git repository (as they are with tribble, picard, etc.)

-Ant-based solution with a shell script frontend. The previous Java-based solution
 had several issues and introduced problematic dependencies into the GATK.
2013-07-02 13:49:04 -04:00
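Since the 8eab59419d message above says the analyzer's output is a Java properties file that loads via the Properties class, a consumer could look roughly like the sketch below. The class and method names (WalkerDependencyReport, parse, affectedWalkers) are hypothetical and are not taken from the GATK; the sample text reuses two lines from the example output in the commit message.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical consumer of the analyzer output, not GATK code.
public class WalkerDependencyReport {

    /** Loads analyzer output (properties-format text) into a Properties object. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked for simplicity.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns, in sorted order, the walkers whose value is "true". */
    public static List<String> affectedWalkers(Properties props) {
        List<String> affected = new ArrayList<>();
        for (String walker : props.stringPropertyNames()) {
            if (Boolean.parseBoolean(props.getProperty(walker))) {
                affected.add(walker);
            }
        }
        Collections.sort(affected);
        return affected;
    }

    public static void main(String[] args) {
        String sample =
            "org.broadinstitute.sting.gatk.walkers.bed.MergeIntervalLists=true\n"
          + "org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper=false\n";
        for (String walker : affectedWalkers(parse(sample))) {
            System.out.println(walker + " depends on a changed class");
        }
    }
}
```

To read the output from disk instead, the same load call accepts any Reader or InputStream, so a file reader can be swapped in for the StringReader.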
Mark DePristo 9df58314ab Bugfix for counting of applied filters
-- LocusWalkers have multiple filtering streams, each counting filtering independently, and the close() function was calling setFilter on the global result rather than on the private counter, which is then incorporated into the global result (thereby incrementing the counts of each filter).
-- [delivers #52667213]
2013-07-01 21:09:48 -04:00
David Roazen c3d59d890d Update licenses for new PbsEngine* classes 2013-07-01 15:50:20 -04:00
droazen 2964ebaa4e Merge pull request #314 from broadinstitute/ks_francesco_pbs_patch
Ks francesco pbs patch
2013-07-01 12:39:38 -07:00
Khalid Shakir f0c36e2890 Fixing failed test for HSP by changing dcov from 60 to 200. 2013-07-01 15:13:04 -04:00
Khalid Shakir ec206eccfc Switch "all" test pipeline job runners to mean the job runners that run at The Broad. 2013-07-01 15:12:55 -04:00
Francesco acf90ca027 corrected number of arguments passed to PbsEngineJobRunner when requesting multiple cores
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2013-07-01 15:08:15 -04:00
Francesco 948b2fca20 added PbsEngine plugin into engine folders, to be called in Queue with -jobRunner PbsEngine; the plugin was written by modifying the existing GridEngine plugin, used as a template
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2013-07-01 15:08:14 -04:00
Mark DePristo 4ec50caea2 Merge pull request #313 from broadinstitute/mc_generalize_dt_scala_script
Added all the parameters to the scala script for DiagnoseTargets
2013-06-29 11:00:00 -07:00
Mauricio Carneiro a6b569b395 Added all the parameters to the scala script for DiagnoseTargets 2013-06-29 11:28:25 -04:00
Mauricio Carneiro 815f119f7c Walker to create a fastq file from an interval list
Useful for converting bait and target interval lists into actual sequences that we can align with bwa and test for mappability.
2013-06-29 11:24:16 -04:00
David Roazen 31827022db Fix pipeline tests that were not respecting the pipeline test dry run setting
There are a few pipeline test classes that do not run Queue, but are
classified as pipeline tests because they submit farm jobs. Make these
unconventional pipeline tests respect the pipeline test dry run setting.
2013-06-28 15:27:17 -04:00
Ryan Poplin 1ec56c9e64 Merge pull request #311 from broadinstitute/eb_require_min_mq_for_FN_in_kb
We need to enforce a minimum base and mapping quality threshold to penal...
2013-06-27 16:49:45 -07:00
Mark DePristo 5717f3dc1c Merge pull request #312 from broadinstitute/eb_fix_assessna12878_and_add_integration_tests
I allowed a bad push yesterday with the KB because there weren't any int...
2013-06-27 13:51:37 -07:00
Eric Banks 7aa1f56dff We need to enforce a minimum base and mapping quality threshold to penalize a callset for FNs.
This is used in conjunction with the -BAM argument in AssessNA12878 and is necessary for the
Jenkins assessment to work properly (Ryan's commit wasn't enough).
2013-06-27 15:58:39 -04:00
Eric Banks 1f1da56d28 I allowed a bad push yesterday with the KB because there weren't any integration tests for the assessment.
Added one and fixed up the code so that the headers are more accurate for the -badSites output.
2013-06-27 15:42:53 -04:00
Ryan Poplin 825b603acb Merge pull request #298 from broadinstitute/md_likelihood_rank_sum
Md likelihood rank sum
2013-06-27 11:14:25 -07:00
Mark DePristo de7fe2e086 Merge pull request #308 from broadinstitute/rp_assessment_low_coverage
Don't count no coverage sites as false negatives in the assessment again...
2013-06-27 06:46:23 -07:00
Eric Banks 9f08718636 Merge pull request #309 from jsilter/master
Add "isComplexEvent" as attribute
2013-06-26 16:00:51 -07:00
Jacob Silterra beb834e849 Add "isComplexEvent" as attribute to VariantContextBuilder for MongoVariantContext 2013-06-26 17:12:32 -04:00