gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Guillermo del Angel	9dd109b79a	Last feature request from Reich/Paavo labs: the allSitePLs feature in UG worked but not quite filled requirements. What's needed is the ability to have all 10 PLs for EVERY site, regardless of whether they are variant or not. Previous version only emitted the 10 PLs in reference sites. Problem is that, if all PLs are emitted in all sites and every single site is quad-allelic (only way to have the PLs printed out in a valid way) then the ability to filter variants and to use the INFO fields may be compromised. So, compromise solution is to go back to having biallelic PLs but emit a new FORMAT field, called APL, which has the 10 values, but all other statistics and regular PLs are computed as before. Note that integration test had to be disabled, as the BCF2 codec apparently doesn't support writing into genotype fields other than PL,DP,AD,GQ,FT and GT.	2013-07-18 12:54:52 -04:00
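The figure of 10 values per site matches the number of unordered diploid genotypes over four alleles, n(n+1)/2 with n = 4. The following is a minimal sketch of that count and of the standard VCF likelihood ordering, in which genotype j/k (j <= k) sits at index k(k+1)/2 + j. The class and method names are hypothetical and are not GATK code; whether APL follows this exact ordering is an assumption the commit message does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GATK: enumerates diploid genotypes in VCF PL order.
final class DiploidGenotypeOrder {

    // Number of unordered diploid genotypes over n alleles: n(n+1)/2.
    static int genotypeCount(int alleleCount) {
        return alleleCount * (alleleCount + 1) / 2;
    }

    // Index of genotype j/k in the VCF likelihood ordering (alleles are 0-based).
    static int plIndex(int j, int k) {
        int lo = Math.min(j, k);
        int hi = Math.max(j, k);
        return hi * (hi + 1) / 2 + lo;
    }

    // Genotype labels in PL order for the given alleles.
    static List<String> orderedGenotypes(String[] alleles) {
        List<String> out = new ArrayList<>();
        for (int k = 0; k < alleles.length; k++) {
            for (int j = 0; j <= k; j++) {
                out.add(alleles[j] + "/" + alleles[k]);
            }
        }
        return out;
    }
}
```

With two alleles the same formula gives the familiar three biallelic PLs, which is why the regular PL field stays short while APL carries the full set.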
Eric Banks	234f564009	Merge pull request #335 from broadinstitute/eb_improvements_to_KB_assessment Several improvements to the NA12878 knowledge base.	2013-07-18 08:47:03 -07:00
Eric Banks	5d1454c6b0	Several improvements to the NA12878 knowledge base. 1. All NA12878DBWalkers that export/emit sites need to do so in order; also one should be able to use -L with them and not have it iterate over all possible sites. Updated ExportReviews and ExtractConsensusSites to adhere to these constraints. 2. Added the option to AssessNA12878 to have it ignore FNs that overlap with a provided VCF. This is useful if you have a list of sites from reviews that are okay to be missed in particular techs only (because for some reason there is coverage but no evidence of the alternate allele in them) - intended to be used with Jenkins. 3. Hooked up the logic of complex events all the way through the KB. Now the consensus incorporates whether a call is complex and the assessor does not penalize for them. 4. Fixed long-standing bug that I managed to find accidentally: AssessNA12878 was closing its DB connection before its final call to includeMissingCalls(). 5. Hooked up the per-call confidences through the KB. We no longer have a 2-tiered priority system in the KB (reviews and everything else) but instead use a quasi-Bayesian estimator (will update to proper Bayesian treatment if needed). Now ImportCallset and ImportReviews assign confidences as appropriate. Also needed to fix up the consensus logic for calls with UNKNOWN status.	2013-07-18 11:05:36 -04:00
David Roazen	6440f926d3	Parallel tests: use /broad/hptmp as working dir instead of /broad/classA-test -Class A test filesystem was getting slow, and wasn't suitable for long-term use anyway	2013-07-15 16:08:48 -04:00
Eric Banks	f2fca40b2b	Merge pull request #333 from broadinstitute/dr_fix_SAMReaderID_hashing SAMReaderID: fix bug with hash code and equals() method	2013-07-15 12:20:50 -07:00
chartl	28b8815688	Merge pull request #301 from broadinstitute/mc_tsca_project Barcodes per Amplicon count tool (private tool)	2013-07-15 11:21:41 -07:00
David Roazen	c15751e41e	SAMReaderID: fix bug with hash code and equals() method -Two SAMReaderIDs that pointed at the same underlying bam file through a relative vs. an absolute path were not being treated as equal, and had different hash codes. This was causing problems in the engine, since SAMReaderIDs are often used as the keys of HashMaps. -Fix: explicitly use the absolute path to the encapsulated bam file in hashCode() and equals() -Added tests to ensure this doesn't break again	2013-07-15 13:57:00 -04:00
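The fix described above can be sketched as follows. This is a minimal illustration, not the actual SAMReaderID source: the class name `BamFileKey` and its field are hypothetical, and the only point taken from the commit message is that both `hashCode()` and `equals()` use the absolute path of the wrapped file.

```java
import java.io.File;

// Hypothetical stand-in for SAMReaderID; illustrates the fix, not the actual GATK source.
final class BamFileKey {
    private final File bamFile;

    BamFileKey(File bamFile) {
        this.bamFile = bamFile;
    }

    // Compare on the absolute path so relative and absolute spellings agree.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof BamFileKey)) return false;
        BamFileKey that = (BamFileKey) other;
        return bamFile.getAbsolutePath().equals(that.bamFile.getAbsolutePath());
    }

    // Must hash the same value that equals() compares, or HashMap lookups miss.
    @Override
    public int hashCode() {
        return bamFile.getAbsolutePath().hashCode();
    }
}
```

Note that `File.getAbsolutePath()` does not resolve `.` or `..` segments or symbolic links; `getCanonicalPath()` would, at the cost of a checked `IOException`. The commit message mentions only the absolute path, so the sketch stays with that.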
Eric Banks	51b95589e5	Merge pull request #331 from broadinstitute/gda_pool_caller_paper Committing pool caller script changes for posterity: mostly updating the...	2013-07-15 09:06:14 -07:00
Guillermo del Angel	464cf33dd3	Committing pool caller script changes for posterity: mostly updating the reference sample calls to latest gold standard, adding filtering tweaks and redo of R scripts.	2013-07-15 11:52:06 -04:00
Scott Thibault	5d198d3400	Added write to likelihoods.txt for batch hmm	2013-07-15 10:16:39 -05:00
Mauricio Carneiro	3459eab413	Barcodes per Amplicon count tool Tool to count the number of barcodes (TSCA degenerate bases) per amplicon (not interval/target). Useful for TSCA quality control.	2013-07-15 11:04:13 -04:00
sathibault	0a8f75b953	Merge branch 'master' into st_fpga_hmm Conflicts: protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java	2013-07-15 08:17:32 -05:00
Eric Banks	1575fdaab4	Merge pull request #330 from broadinstitute/yf_adding_per_sample_allele_biased_downsampling_to_HC AlleleBiasedDownsampling for HaplotypeCaller	2013-07-13 18:50:32 -07:00
Mauricio Carneiro	8c07614321	QualifyMissingIntervals: support different formats Problem ------- Qualify Missing Intervals only accepted GATK formatted interval files for its coding sequence and bait parameters. Solution ------- There is no reason for such limitation, I erased all the code that did the parsing and used IntervalUtils to parse it (therefore, now it handles any type of interval file that the GATK can handle). ps: Also added an average depth column to the output	2013-07-12 17:32:53 -04:00
Yossi Farjoun	afcf7b96db	- Added per-sample AlleleBiasedDownsampling capability to HaplotypeCaller - Added integration test to show that providing a contamination value and providing same value via a file results in the same VCF - overrode default contamination value in test	2013-07-12 16:22:02 -04:00
delangel	7ddf85c040	Merge pull request #329 from broadinstitute/eb_more_sensitivity_improvements_to_the_HC A whole slew of improvements to the Haplotype Caller and related code.	2013-07-12 10:37:43 -07:00
Eric Banks	b16c7ce050	A whole slew of improvements to the Haplotype Caller and related code. 1. Some minor refactorings and cleanup (e.g. removing unused imports) throughout. 2. Updates to the KB assessment functionality: a. Exclude duplicate reads when checking to see whether there's enough coverage to make a call. b. Lower the threshold on FS for FPs that would easily be filtered since it's only single sample calling. 3. Make the HC consistent in how it treats the pruning factor. As part of this I removed and archived the DeBruijn assembler. 4. Improvements to the likelihoods for the HC a. We now include a "tristate" correction in the PairHMM (just like we do with UG). Basically, we need to divide e by 3 because the observed base could have come from any of the non-observed alleles. b. We now correct overlapping read pairs. Note that the fragments are not merged (which we know is dangerous). Rather, the overlapping bases are just down-weighted so that their quals are not more than Q20 (or more specifically, half of the phred-scaled PCR error rate); mismatching bases are turned into Q0s for now. c. We no longer run contamination removal by default in the UG or HC. The exome tends to have real sites with off kilter allele balances and we occasionally lose them to contamination removal. 5. Improved the dangling tail merging implementation.	2013-07-12 10:09:10 -04:00
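The two likelihood changes in item 4 of the commit above can be illustrated with a short sketch. It is not the PairHMM or GATK code: the class and method names are hypothetical, and the PCR error quality of 40 used in the usage note is an assumption inferred from the Q20 cap the message quotes.

```java
// Hypothetical helpers, not GATK code: sketch the two likelihood tweaks in item 4.
final class BaseLikelihoodSketch {

    // Convert a phred-scaled quality to an error probability: e = 10^(-Q/10).
    static double errorProbability(int qual) {
        return Math.pow(10.0, -qual / 10.0);
    }

    // "Tristate" correction: a miscalled base could be any of 3 other bases, so each gets e/3.
    static double baseLikelihood(char readBase, char haplotypeBase, int qual) {
        double e = errorProbability(qual);
        return readBase == haplotypeBase ? 1.0 - e : e / 3.0;
    }

    // Overlapping mates: cap agreeing bases at half the PCR error qual, zero out disagreeing ones.
    static int adjustOverlapQual(int qual, boolean matesAgree, int pcrErrorQual) {
        if (!matesAgree) return 0;
        return Math.min(qual, pcrErrorQual / 2);
    }
}
```

For example, `adjustOverlapQual(35, true, 40)` yields 20, matching the "not more than Q20" wording, while a disagreeing base drops to 0 regardless of its original quality.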
Eric Banks	3e1a96844b	Merge pull request #328 from broadinstitute/dr_dependency_analyzer_output_loader DependencyAnalyzerOutputLoader: allows output from the dependency analyzer to be loaded and queried	2013-07-11 20:31:17 -07:00
David Roazen	8ef4e6c9f7	DependencyAnalyzerOutputLoader: allows output from the dependency analyzer to be loaded and queried	2013-07-11 16:15:02 -04:00
sathibault	23fe3e449a	Revert "Fixed batching bug." This reverts commit 3e56c83d0eec7c374e5f187d1ef124d42ecc071e.	2013-07-11 11:30:37 -05:00
sathibault	7458b59bb3	Fixed batching bug.	2013-07-11 11:08:46 -05:00
Eric Banks	ccc0ee5b4d	Merge pull request #327 from broadinstitute/gda_large_indel_improvement Moved some HC parameters related to active region extensions to command ...	2013-07-11 06:58:13 -07:00
Guillermo del Angel	aba55dbb23	Moved some HC parameters related to active region extensions to command line arguments so that they're more easily modified. Some of these parameters need tinkering in order to call some large indels. See GSA-891 and subtasks for particular examples thereof.	2013-07-10 14:31:10 -04:00
droazen	2d81234ed8	Merge pull request #326 from broadinstitute/dr_dependency_analyzer_add_manual_dependencies Dependency analyzer improvements	2013-07-09 10:12:50 -07:00
David Roazen	de06eda6ac	Dependency analyzer improvements -Add ability to manually specify dependencies on the command line. This allows one to specify, for example, that all walkers depend on the GeneralCallingPipeline QScript, even though they don't have any compile-time dependencies on that QScript. -Check that the provided walker class is valid in DependencyAnalyzer.xml -Check ant exit status in the front-end script -Fix bug where analyzer would give incorrect results if the list of changed Java classes was empty	2013-07-09 13:12:12 -04:00
Eric Banks	5dbb582be7	Merge pull request #310 from broadinstitute/mc_interval_list_to_fastq Walker to create a fastq file from an interval list	2013-07-08 14:30:43 -07:00
Valentin Ruano Rubio	ac77a4c699	Merge pull request #316 from broadinstitute/md_filter_counting Bugfix for counting of applied filters	2013-07-08 10:58:47 -07:00
Eric Banks	380a01ddc0	Merge pull request #322 from broadinstitute/eb_add_review_indexes_to_repo It was annoying me that these index files kept showing up in 'git status'	2013-07-08 10:15:46 -07:00
Eric Banks	4fe26ea2cf	Merge pull request #323 from broadinstitute/eb_fix_sorting_in_reduce_reads Reduce Reads output should never be expected to be sorted (hence the nee...	2013-07-08 10:15:28 -07:00
Eric Banks	c5a2a8f39f	Merge pull request #324 from broadinstitute/eb_AnalyzeCovariates_is_not_deprecated AnalyzeCovariates is no longer a deprecated tool.	2013-07-08 10:15:10 -07:00
David Roazen	bc28a1f236	pipeline test runner: create temp dir if it doesn't exist	2013-07-08 12:34:20 -04:00
Ryan Poplin	1c28bd2ffd	Merge pull request #325 from broadinstitute/dr_general_calling_pipeline_walker_list List of walkers used by the GeneralCallingPipeline	2013-07-08 09:28:13 -07:00
David Roazen	46b453a69d	List of walkers used by the GeneralCallingPipeline For use by the dependency analyzer	2013-07-08 11:53:58 -04:00
Eric Banks	73fc7f6ab1	Reduce Reads output should never be expected to be sorted (hence the need to sort on disk) but for some reason it was with -nwayout mode.	2013-07-08 10:33:36 -04:00
Eric Banks	921f551426	AnalyzeCovariates is no longer a deprecated tool.	2013-07-08 09:48:12 -04:00
Eric Banks	3e357ac8cf	It was annoying me that these index files kept showing up in 'git status'	2013-07-08 09:46:08 -04:00
Mark DePristo	0aa9d02570	Merge pull request #321 from broadinstitute/eb_fix_annotating_multiple_comps Fix bug introduced recently in the VariantAnnotator where only the last ...	2013-07-05 05:05:09 -07:00
Eric Banks	5f5c90e65c	Fix bug introduced recently in the VariantAnnotator where only the last -comp was being annotated at a site. Trivial fix, added integration test to cover it.	2013-07-05 00:04:52 -04:00
David Roazen	6d69c7dc71	Disable RetryMemoryLimit pipeline test -This test is failing intermittently for unexplained reasons (see GSA-943) -In the interest of keeping the rest of the pipeline test suite running, it's best to disable this one test until GSA-943 is resolved	2013-07-03 13:38:28 -04:00
Tadeusz Jordan	8d00e558fb	Merge pull request #317 from broadinstitute/dr_dependency_analyzer Ant-based walker dependency analyzer	2013-07-03 08:46:37 -07:00
Mark DePristo	7cdb7ac572	Merge pull request #320 from broadinstitute/gg_cleanup_licenses_PT51255639 Deleted old license files	2013-07-03 04:55:15 -07:00
Geraldine Van der Auwera	d55dddfdba	deleted old license files	2013-07-02 16:36:47 -04:00
Mark DePristo	3db02e5ef1	Merge pull request #315 from broadinstitute/md_ref_conf_hc Reference confidence model for the haplotype caller	2013-07-02 13:04:33 -07:00
droazen	2e87d09c26	Merge pull request #319 from broadinstitute/dr_packaging_system_fail_gracefully_when_bcel_not_installed Fail gracefully in the packaging system when bcel is not installed	2013-07-02 13:01:45 -07:00
David Roazen	75d1f64416	Fail gracefully in the packaging system when bcel is not installed Packaging the GATK requires bcel to be installed. Detect when it's not, and output instructions on how to install it.	2013-07-02 15:50:51 -04:00
Mark DePristo	35cdc16822	Merge pull request #318 from broadinstitute/dr_improve_dcov_documentation Improve -dcov documentation to address recent user confusion	2013-07-02 12:47:29 -07:00
Mark DePristo	5f34054cc1	Remove filtering of MAPQ 0 reads from CalledHaplotypeBAMWriter	2013-07-02 15:46:49 -04:00
Mark DePristo	7be01777f6	Bugfix for incPos in GenomeLoc -- Shouldn't have taken a GenomeLoc as an argument, as it's an instance method, not a public static	2013-07-02 15:46:49 -04:00
Mark DePristo	ed0b1c5aba	Fix bug in ReadThreadingAssembler in cycle failures causing NPE	2013-07-02 15:46:48 -04:00
Mark DePristo	e3e8631ff5	Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences -- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was successfully built but didn't have variation, or truly failed in construction. Fixing an annoying bug where you'd perfectly assemble the sequence into the reference graph, but then return a null graph because of this, and you'd increase your kmer because null was also used to indicate assembly failure -- Output format looks like: 20 10026072 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120 20 10026073 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119 20 10026074 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,121 20 10026075 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119 20 10026076 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120 20 10026077 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120 20 10026078 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:5,0:5:15:0,15,217 20 10026079 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,240 20 10026080 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,268 20 10026081 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:7,0:7:21:0,21,267 We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values. Currently these are calculated as ref vs. any non-ref value (mismatch or insertion) but don't yet account properly for alignment uncertainty. -- Can be enabled for single samples with --emitRefConfidence (-ERC). -- This is accomplished by realigning each read to its most likely haplotype, and then evaluating the resulting pileups over the active region interval. The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads -- Provide access to the more raw LocusIteratorByState constructor so we can more easily make them programmatically without constructing lots of misc. GATK data structures. Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class. -- Includes GVCF writer -- Add 1 mb of WEx data to private/testdata -- Integration tests for reference model output for WGS and WEx data -- Emit GQ block information into VCF header for GVCF mode -- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as it's no longer relevant for HC -- Control max indel size for the reference confidence model from the command line. Increase default to 10 -- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest -- Unittests for ReferenceConfidenceModel -- Unittests for new MathUtils functions	2013-07-02 15:46:38 -04:00
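The reference-confidence records quoted in the commit above pair a FORMAT string (GT:AD:DP:GQ:PL) with a colon-separated sample column. The sketch below splits one such pair into named fields and recomputes GQ as the gap between the two smallest PLs, capped at 99. The class and method names are hypothetical and this is not GATK's VCF codec. The GQ rule is an assumed convention: it agrees with every record quoted above, but the commit message does not state it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser, not GATK's VCF codec: reads one FORMAT/sample pair from the records above.
final class RefConfidenceRecordSketch {

    // Pair each FORMAT key with the matching colon-separated sample value.
    static Map<String, String> parseSample(String format, String sample) {
        String[] keys = format.split(":");
        String[] values = sample.split(":");
        if (keys.length != values.length) {
            throw new IllegalArgumentException("FORMAT and sample field counts differ");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            fields.put(keys[i], values[i]);
        }
        return fields;
    }

    // Difference between the two smallest PLs, capped at 99 (assumed GQ convention).
    static int gqFromPls(String pl) {
        int best = Integer.MAX_VALUE;
        int second = Integer.MAX_VALUE;
        for (String token : pl.split(",")) {
            int value = Integer.parseInt(token);
            if (value < best) {
                second = best;
                best = value;
            } else if (value < second) {
                second = value;
            }
        }
        return Math.min(99, second - best);
    }
}
```

Applied to the first record, `parseSample("GT:AD:DP:GQ:PL", "0/0:3,0:3:9:0,9,120")` maps GQ to "9", and `gqFromPls("0,9,120")` gives the same value.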


12817 Commits (f22ab033f6de11053a33bb7bbfa2e2e856d5ee57) All Branches Search
