gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	1575fdaab4	Merge pull request #330 from broadinstitute/yf_adding_per_sample_allele_biased_downsampling_to_HC AlleleBiasedDownsampling for HaplotypeCaller	2013-07-13 18:50:32 -07:00
Yossi Farjoun	afcf7b96db	- Added per-sample AlleleBiasedDownsampling capability to HaplotypeCaller - Added integration test to show that providing a contamination value and providing same value via a file results in the same VCF - overrode default contamination value in test	2013-07-12 16:22:02 -04:00
delangel	7ddf85c040	Merge pull request #329 from broadinstitute/eb_more_sensitivity_improvements_to_the_HC A whole slew of improvements to the Haplotype Caller and related code.	2013-07-12 10:37:43 -07:00
Eric Banks	b16c7ce050	A whole slew of improvements to the Haplotype Caller and related code. 1. Some minor refactorings and cleanup (e.g. removing unused imports) throughout. 2. Updates to the KB assessment functionality: a. Exclude duplicate reads when checking to see whether there's enough coverage to make a call. b. Lower the threshold on FS for FPs that would easily be filtered since it's only single sample calling. 3. Make the HC consistent in how it treats the pruning factor. As part of this I removed and archived the DeBruijn assembler. 4. Improvements to the likelihoods for the HC a. We now include a "tristate" correction in the PairHMM (just like we do with UG). Basically, we need to divide e by 3 because the observed base could have come from any of the non-observed alleles. b. We now correct overlapping read pairs. Note that the fragments are not merged (which we know is dangerous). Rather, the overlapping bases are just down-weighted so that their quals are not more than Q20 (or more specifically, half of the phred-scaled PCR error rate); mismatching bases are turned into Q0s for now. c. We no longer run contamination removal by default in the UG or HC. The exome tends to have real sites with off-kilter allele balances and we occasionally lose them to contamination removal. 5. Improved the dangling tail merging implementation.	2013-07-12 10:09:10 -04:00
Eric Banks	3e1a96844b	Merge pull request #328 from broadinstitute/dr_dependency_analyzer_output_loader DependencyAnalyzerOutputLoader: allows output from the dependency analyzer to be loaded and queried	2013-07-11 20:31:17 -07:00
David Roazen	8ef4e6c9f7	DependencyAnalyzerOutputLoader: allows output from the dependency analyzer to be loaded and queried	2013-07-11 16:15:02 -04:00
Eric Banks	ccc0ee5b4d	Merge pull request #327 from broadinstitute/gda_large_indel_improvement Moved some HC parameters related to active region extensions to command ...	2013-07-11 06:58:13 -07:00
Guillermo del Angel	aba55dbb23	Moved some HC parameters related to active region extensions to command line arguments so that they're more easily modified. Some of these parameters need tinkering in order to call some large indels. See GSA-891 and subtasks for particular examples thereof.	2013-07-10 14:31:10 -04:00
droazen	2d81234ed8	Merge pull request #326 from broadinstitute/dr_dependency_analyzer_add_manual_dependencies Dependency analyzer improvements	2013-07-09 10:12:50 -07:00
David Roazen	de06eda6ac	Dependency analyzer improvements -Add ability to manually specify dependencies on the command line. This allows one to specify, for example, that all walkers depend on the GeneralCallingPipeline QScript, even though they don't have any compile-time dependencies on that QScript. -Check that the provided walker class is valid in DependencyAnalyzer.xml -Check ant exit status in the front-end script -Fix bug where analyzer would give incorrect results if the list of changed Java classes was empty	2013-07-09 13:12:12 -04:00
Eric Banks	5dbb582be7	Merge pull request #310 from broadinstitute/mc_interval_list_to_fastq Walker to create a fastq file from an interval list	2013-07-08 14:30:43 -07:00
Valentin Ruano Rubio	ac77a4c699	Merge pull request #316 from broadinstitute/md_filter_counting Bugfix for counting of applied filters	2013-07-08 10:58:47 -07:00
Eric Banks	380a01ddc0	Merge pull request #322 from broadinstitute/eb_add_review_indexes_to_repo It was annoying me that these index files kept showing up in 'git status'	2013-07-08 10:15:46 -07:00
Eric Banks	4fe26ea2cf	Merge pull request #323 from broadinstitute/eb_fix_sorting_in_reduce_reads Reduce Reads output should never be expected to be sorted (hence the nee...	2013-07-08 10:15:28 -07:00
Eric Banks	c5a2a8f39f	Merge pull request #324 from broadinstitute/eb_AnalyzeCovariates_is_not_deprecated AnalyzeCovariates is no longer a deprecated tool.	2013-07-08 10:15:10 -07:00
David Roazen	bc28a1f236	pipeline test runner: create temp dir if it doesn't exist	2013-07-08 12:34:20 -04:00
Ryan Poplin	1c28bd2ffd	Merge pull request #325 from broadinstitute/dr_general_calling_pipeline_walker_list List of walkers used by the GeneralCallingPipeline	2013-07-08 09:28:13 -07:00
David Roazen	46b453a69d	List of walkers used by the GeneralCallingPipeline For use by the dependency analyzer	2013-07-08 11:53:58 -04:00
Eric Banks	73fc7f6ab1	Reduce Reads output should never be expected to be sorted (hence the need to sort on disk) but for some reason it was with -nwayout mode.	2013-07-08 10:33:36 -04:00
Eric Banks	921f551426	AnalyzeCovariates is no longer a deprecated tool.	2013-07-08 09:48:12 -04:00
Eric Banks	3e357ac8cf	It was annoying me that these index files kept showing up in 'git status'	2013-07-08 09:46:08 -04:00
Mark DePristo	0aa9d02570	Merge pull request #321 from broadinstitute/eb_fix_annotating_multiple_comps Fix bug introduced recently in the VariantAnnotator where only the last ...	2013-07-05 05:05:09 -07:00
Eric Banks	5f5c90e65c	Fix bug introduced recently in the VariantAnnotator where only the last -comp was being annotated at a site. Trivial fix, added integration test to cover it.	2013-07-05 00:04:52 -04:00
David Roazen	6d69c7dc71	Disable RetryMemoryLimit pipeline test -This test is failing intermittently for unexplained reasons (see GSA-943) -In the interest of keeping the rest of the pipeline test suite running, it's best to disable this one test until GSA-943 is resolved	2013-07-03 13:38:28 -04:00
Tadeusz Jordan	8d00e558fb	Merge pull request #317 from broadinstitute/dr_dependency_analyzer Ant-based walker dependency analyzer	2013-07-03 08:46:37 -07:00
Mark DePristo	7cdb7ac572	Merge pull request #320 from broadinstitute/gg_cleanup_licenses_PT51255639 Deleted old license files	2013-07-03 04:55:15 -07:00
Geraldine Van der Auwera	d55dddfdba	deleted old license files	2013-07-02 16:36:47 -04:00
Mark DePristo	3db02e5ef1	Merge pull request #315 from broadinstitute/md_ref_conf_hc Reference confidence model for the haplotype caller	2013-07-02 13:04:33 -07:00
droazen	2e87d09c26	Merge pull request #319 from broadinstitute/dr_packaging_system_fail_gracefully_when_bcel_not_installed Fail gracefully in the packaging system when bcel is not installed	2013-07-02 13:01:45 -07:00
David Roazen	75d1f64416	Fail gracefully in the packaging system when bcel is not installed Packaging the GATK requires bcel to be installed. Detect when it's not, and output instructions on how to install it.	2013-07-02 15:50:51 -04:00
Mark DePristo	35cdc16822	Merge pull request #318 from broadinstitute/dr_improve_dcov_documentation Improve -dcov documentation to address recent user confusion	2013-07-02 12:47:29 -07:00
Mark DePristo	5f34054cc1	Remove filtering of MAPQ 0 reads from CalledHaplotypeBAMWriter	2013-07-02 15:46:49 -04:00
Mark DePristo	7be01777f6	Bugfix for incPos in GenomeLoc -- Shouldn't have taken a GenomeLoc as an argument, as it's an instance method, not a public static	2013-07-02 15:46:49 -04:00
Mark DePristo	ed0b1c5aba	Fix bug in ReadThreadingAssembler in cycle failures causing NPE	2013-07-02 15:46:48 -04:00
Mark DePristo	e3e8631ff5	Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences -- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was successfully built but didn't have variation, or truly failed in construction. Fixing an annoying bug where you'd perfectly assemble the sequence into the reference graph, but then return a null graph because of this, and you'd increase your kmer because null was also used to indicate assembly failure -- Output format looks like: 20 10026072 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120 20 10026073 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119 20 10026074 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,121 20 10026075 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119 20 10026076 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120 20 10026077 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120 20 10026078 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:5,0:5:15:0,15,217 20 10026079 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,240 20 10026080 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,268 20 10026081 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:7,0:7:21:0,21,267 We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values. Currently these are calculated as ref vs. any non-ref value (mismatch or insertion) but don't yet account properly for alignment uncertainty. -- Can be enabled for single samples with --emitRefConfidence (-ERC). -- This is accomplished by realigning each read to its most likely haplotype, and then evaluating the resulting pileups over the active region interval. The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads -- Provide access to the more raw LocusIteratorByState constructor so we can more easily make them programmatically without constructing lots of misc. GATK data structures. Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class. -- Includes GVCF writer -- Add 1 mb of WEx data to private/testdata -- Integration tests for reference model output for WGS and WEx data -- Emit GQ block information into VCF header for GVCF mode -- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as it's no longer relevant for HC -- Control max indel size for the reference confidence model from the command line. Increase default to 10 -- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest -- Unittests for ReferenceConfidenceModel -- Unittests for new MathUtils functions	2013-07-02 15:46:38 -04:00
Mark DePristo	41aba491c0	Critical bugfix for adapter clipping in HaplotypeCaller -- The previous code would adapter clip before reverting soft clips, so because we only clip the adapter when it's actually aligned (i.e., not in the soft clips) we were actually not removing bases in the adapter unless at least 1 bp of the adapter was aligned to the reference. Terrible. -- Removed the broken logic of determining whether a read adaptor is too long. -- Doesn't require isProperPairFlag to be set for a read to be adapter clipped -- Update integration tests for new adapter clipping code	2013-07-02 15:46:36 -04:00
David Roazen	cdea744b95	Improve -dcov documentation to address recent user confusion -Explicitly state that -dcov does not produce an unbiased random sampling from all available reads at each locus, and that instead it tries to maintain an even representation of reads from all alignment start positions (which, of course, is a form of bias) -Recommend -dfrac for users who want a true across-the-board unbiased random sampling	2013-07-02 15:33:28 -04:00
David Roazen	8eab59419d	Ant-based walker dependency analyzer -Given a list of walkers and a pair of git commits, determines whether each of the walkers has compile-time dependencies on the Java classes changed between the two commits. -Output is in the form of a Java properties file, and can be easily loaded via the Properties class. Example output: org.broadinstitute.sting.gatk.walkers.bed.MergeIntervalLists=true org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper=false org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller=false org.broadinstitute.sting.gatk.walkers.na12878kb.NA12878DBWalker=true org.broadinstitute.sting.gatk.walkers.readutils.PrintReads=false "true" indicates that the walker does have compile-time dependencies on one or more of the changed Java classes, "false" indicates no dependencies -Considers classes within changed jar files as well, provided the jars are stored in our git repository (as they are with tribble, picard, etc.) -Ant-based solution with a shell script frontend. The previous Java-based solution had several issues and introduced problematic dependencies into the GATK.	2013-07-02 13:49:04 -04:00
Mark DePristo	9df58314ab	Bugfix for counting of applied filters -- Because LocusWalkers have multiple filtering streams, each counting filtering independent, and the close() function set calling setFilter on the global result, not on the private counter, which is incorporated into the global (thereby incrementing the counts of each filter). -- [delivers #52667213]	2013-07-01 21:09:48 -04:00
David Roazen	c3d59d890d	Update licenses for new PbsEngine* classes	2013-07-01 15:50:20 -04:00
droazen	2964ebaa4e	Merge pull request #314 from broadinstitute/ks_francesco_pbs_patch Ks francesco pbs patch	2013-07-01 12:39:38 -07:00
Khalid Shakir	f0c36e2890	Fixing failed test for HSP by changing dcov from 60 to 200.	2013-07-01 15:13:04 -04:00
Khalid Shakir	ec206eccfc	Switch "all" test pipeline job runners to mean the job runners that run at The Broad.	2013-07-01 15:12:55 -04:00
Francesco	acf90ca027	corrected number of arguments passed to PbsEngineJobRunner when requesting multiple cores Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2013-07-01 15:08:15 -04:00
Francesco	948b2fca20	added PbsEngine plugin into engine folders, to be called in Queue with -jobRunner PbsEngine; the plugin is written modifying the existing GridEngine plugin, used as a template Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2013-07-01 15:08:14 -04:00
Mark DePristo	4ec50caea2	Merge pull request #313 from broadinstitute/mc_generalize_dt_scala_script Added all the parameters to the scala script for DiagnoseTargets	2013-06-29 11:00:00 -07:00
Mauricio Carneiro	a6b569b395	Added all the parameters to the scala script for DiagnoseTargets	2013-06-29 11:28:25 -04:00
Mauricio Carneiro	815f119f7c	Walker to create a fastq file from an interval list useful to convert bait and target interval lists into actual sequences that we can align with bwa and test for mappability.	2013-06-29 11:24:16 -04:00
David Roazen	31827022db	Fix pipeline tests that were not respecting the pipeline test dry run setting There are a few pipeline test classes that do not run Queue, but are classified as pipeline tests because they submit farm jobs. Make these unconventional pipeline tests respect the pipeline test dry run setting.	2013-06-28 15:27:17 -04:00
Ryan Poplin	1ec56c9e64	Merge pull request #311 from broadinstitute/eb_require_min_mq_for_FN_in_kb We need to enforce a minimum base and mapping quality threshold to penal...	2013-06-27 16:49:45 -07:00

12595 Commits (1575fdaab45df4bb493b774fa8e2d84a4e71ca74)