gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Guillermo del Angel	b0e7ffb931	Committing staging/calling 1000g script to posterity	2013-07-19 12:16:53 -04:00
Eric Banks	b9e3f56c5d	Merge pull request #338 from broadinstitute/eb_allow_ceu_trio_in_assessments_again Allow the CEU trio best practices in the assessments again (it is assign...	2013-07-18 21:42:28 -07:00
droazen	b992dcd9c2	Merge pull request #337 from broadinstitute/dr_runtime_sample_renaming_GSA-974 GATK engine: add ability to do on-the-fly BAM file sample renaming at runtime	2013-07-18 12:51:02 -07:00
David Roazen	605a5ac2e3	GATK engine: add ability to do on-the-fly BAM file sample renaming at runtime -User must provide a mapping file via new --sample_rename_mapping_file argument. Mapping file must contain a mapping from absolute bam file path to new sample name (format is described in the docs for the argument). -Requires that each bam file listed in the mapping file contain only one sample in their headers (they may contain multiple read groups for that sample, however). The engine enforces this, and throws a UserException if on-the-fly renaming is requested for a multi-sample bam. -Not all bam files for a traversal need to be listed in the mapping file. -On-the-fly renaming is done as the VERY first step after creating the SAMFileReaders in SAMDataSource (before the headers are even merged), to prevent possible consistency issues. -Renaming is done ONCE at traversal start for each SAMReaders resource creation in the SAMResourcePool; this effectively means once per -nt thread -Comprehensive unit/integration tests Known issues: -if you specify the absolute path to a bam in the mapping file, and then provide a path to that same bam to -I using SYMLINKS, the renaming won't work. The absolute paths will look different to the engine due to the symlink being present in one path and not in the other path. GSA-974 #resolve	2013-07-18 15:48:42 -04:00
Eric Banks	ca09000584	Allow the CEU trio best practices in the assessments again (it is assigned a low confidence)	2013-07-18 15:18:00 -04:00
Eric Banks	9121c70510	Fixing merge conflicts. Merged bug fix from Stable into Unstable Conflicts: protected/java/test/org/broadinstitute/sting/gatk/walkers/compression/reducereads/SlidingWindowUnitTest.java	2013-07-18 14:43:03 -04:00
Eric Banks	ba531bd5e6	Fixing the 'header is negative' problem in Reduce Reads... again. Previous fixes and tests only covered trailing soft-clips. Now that up front hard-clipping is working properly though, we were failing on those in the tool. Added a patch for this as well as a separate test independent of the soft-clips to make sure that it's working properly.	2013-07-18 13:56:46 -04:00
Eric Banks	bf5ce41321	Merge pull request #336 from broadinstitute/gda_ancient_dna_pls Last feature request from Reich/Paavo labs: the allSitePLs feature in UG...	2013-07-18 10:47:52 -07:00
Guillermo del Angel	9dd109b79a	Last feature request from Reich/Paavo labs: the allSitePLs feature in UG worked but not quite filled requirements. What's needed is the ability to have all 10 PLs for EVERY site, regardless of whether they are variant or not. Previous version only emitted the 10 PLs in reference sites. Problem is that, if all PLs are emitted in all sites and every single site is quad-allelic (only way to have the PLs printed out in a valid way) then the ability to filter variants and to use the INFO fields may be compromised. So, compromise solution is to go back to having biallelic PLs but emit a new FORMAT field, called APL, which has the 10 values, but all other statistics and regular PLs are computed as before. Note that integration test had to be disabled, as the BCF2 codec apparently doesn't support writing into genotype fields other than PL,DP,AD,GQ,FT and GT.	2013-07-18 12:54:52 -04:00
Eric Banks	234f564009	Merge pull request #335 from broadinstitute/eb_improvements_to_KB_assessment Several improvements to the NA12878 knowledge base.	2013-07-18 08:47:03 -07:00
Eric Banks	5d1454c6b0	Several improvements to the NA12878 knowledge base. 1. All NA12878DBWalkers that export/emit sites need to do so in order; also one should be able to use -L with them and not have it iterate over all possible sites. Updated ExportReviews and ExtractConsensusSites to adhere to these constraints. 2. Added the option to AssessNA12878 to have it ignore FNs that overlap with a provided VCF. This is useful if you have a list of sites from reviews that are okay to be missed in particular techs only (because for some reason there is coverage but no evidence of the alternate allele in them) - intended to be used with Jenkins. 3. Hooked up the logic of complex events all the way through the KB. Now the consensus incorporates whether a call is complex and the assessor does not penalize for them. 4. Fixed long-standing bug that I managed to find accidentally: AssessNA12878 was closing its DB connection before its final call to includeMissingCalls(). 5. Hooked up the per-call confidences through the KB. We no longer have a 2-tiered priority system in the KB (reviews and everything else) but instead use a quasi-Bayesian estimator (will update to proper Bayesian treatment if needed). Now ImportCallset and ImportReviews assign confidences as appropriate. Also needed to fix up the consensus logic for calls with UNKNOWN status.	2013-07-18 11:05:36 -04:00
David Roazen	6440f926d3	Parallel tests: use /broad/hptmp as working dir instead of /broad/classA-test -Class A test filesystem was getting slow, and wasn't suitable for long-term use anyway	2013-07-15 16:08:48 -04:00
Eric Banks	f2fca40b2b	Merge pull request #333 from broadinstitute/dr_fix_SAMReaderID_hashing SAMReaderID: fix bug with hash code and equals() method	2013-07-15 12:20:50 -07:00
chartl	28b8815688	Merge pull request #301 from broadinstitute/mc_tsca_project Barcodes per Amplicon count tool (private tool)	2013-07-15 11:21:41 -07:00
David Roazen	c15751e41e	SAMReaderID: fix bug with hash code and equals() method -Two SAMReaderIDs that pointed at the same underlying bam file through a relative vs. an absolute path were not being treated as equal, and had different hash codes. This was causing problems in the engine, since SAMReaderIDs are often used as the keys of HashMaps. -Fix: explicitly use the absolute path to the encapsulated bam file in hashCode() and equals() -Added tests to ensure this doesn't break again	2013-07-15 13:57:00 -04:00
Eric Banks	51b95589e5	Merge pull request #331 from broadinstitute/gda_pool_caller_paper Committing pool caller script changes for posterity: mostly updating the...	2013-07-15 09:06:14 -07:00
Guillermo del Angel	464cf33dd3	Committing pool caller script changes for posterity: mostly updating the reference sample calls to latest gold standard, adding filtering tweaks and redo of R scripts.	2013-07-15 11:52:06 -04:00
Scott Thibault	5d198d3400	Added write to likelihoods.txt for batch hmm	2013-07-15 10:16:39 -05:00
Mauricio Carneiro	3459eab413	Barcodes per Amplicon count tool Tool to count the number of barcodes (TSCA degenerate bases) per amplicon (not interval/target). Useful for TSCA quality control.	2013-07-15 11:04:13 -04:00
sathibault	0a8f75b953	Merge branch 'master' into st_fpga_hmm Conflicts: protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java	2013-07-15 08:17:32 -05:00
Eric Banks	1575fdaab4	Merge pull request #330 from broadinstitute/yf_adding_per_sample_allele_biased_downsampling_to_HC AlleleBiasedDownsampling for HaplotypeCaller	2013-07-13 18:50:32 -07:00
Mauricio Carneiro	8c07614321	QualifyMissingIntervals: support different formats Problem ------- Qualify Missing Intervals only accepted GATK formatted interval files for its coding sequence and bait parameters. Solution ------- There is no reason for such limitation, I erased all the code that did the parsing and used IntervalUtils to parse it (therefore, now it handles any type of interval file that the GATK can handle). ps: Also added an average depth column to the output	2013-07-12 17:32:53 -04:00
Yossi Farjoun	afcf7b96db	- Added per-sample AlleleBiasedDownsampling capability to HaplotypeCaller - Added integration test to show that providing a contamination value and providing same value via a file results in the same VCF - overrode default contamination value in test	2013-07-12 16:22:02 -04:00
delangel	7ddf85c040	Merge pull request #329 from broadinstitute/eb_more_sensitivity_improvements_to_the_HC A whole slew of improvements to the Haplotype Caller and related code.	2013-07-12 10:37:43 -07:00
Eric Banks	b16c7ce050	A whole slew of improvements to the Haplotype Caller and related code. 1. Some minor refactorings and cleanup (e.g. removing unused imports) throughout. 2. Updates to the KB assessment functionality: a. Exclude duplicate reads when checking to see whether there's enough coverage to make a call. b. Lower the threshold on FS for FPs that would easily be filtered since it's only single sample calling. 3. Make the HC consistent in how it treats the pruning factor. As part of this I removed and archived the DeBruijn assembler. 4. Improvements to the likelihoods for the HC a. We now include a "tristate" correction in the PairHMM (just like we do with UG). Basically, we need to divide e by 3 because the observed base could have come from any of the non-observed alleles. b. We now correct overlapping read pairs. Note that the fragments are not merged (which we know is dangerous). Rather, the overlapping bases are just down-weighted so that their quals are not more than Q20 (or more specifically, half of the phred-scaled PCR error rate); mismatching bases are turned into Q0s for now. c. We no longer run contamination removal by default in the UG or HC. The exome tends to have real sites with off kilter allele balances and we occasionally lose them to contamination removal. 5. Improved the dangling tail merging implementation.	2013-07-12 10:09:10 -04:00
Eric Banks	3e1a96844b	Merge pull request #328 from broadinstitute/dr_dependency_analyzer_output_loader DependencyAnalyzerOutputLoader: allows output from the dependency analyzer to be loaded and queried	2013-07-11 20:31:17 -07:00
David Roazen	8ef4e6c9f7	DependencyAnalyzerOutputLoader: allows output from the dependency analyzer to be loaded and queried	2013-07-11 16:15:02 -04:00
sathibault	23fe3e449a	Revert "Fixed batching bug." This reverts commit 3e56c83d0eec7c374e5f187d1ef124d42ecc071e.	2013-07-11 11:30:37 -05:00
sathibault	7458b59bb3	Fixed batching bug.	2013-07-11 11:08:46 -05:00
Eric Banks	ccc0ee5b4d	Merge pull request #327 from broadinstitute/gda_large_indel_improvement Moved some HC parameters related to active region extensions to command ...	2013-07-11 06:58:13 -07:00
Guillermo del Angel	aba55dbb23	Moved some HC parameters related to active region extensions to command line arguments so that they're more easily modified. Some of these parameters need tinkering in order to call some large indels. See GSA-891 and subtasks for particular examples thereof.	2013-07-10 14:31:10 -04:00
droazen	2d81234ed8	Merge pull request #326 from broadinstitute/dr_dependency_analyzer_add_manual_dependencies Dependency analyzer improvements	2013-07-09 10:12:50 -07:00
David Roazen	de06eda6ac	Dependency analyzer improvements -Add ability to manually specify dependencies on the command line. This allows one to specify, for example, that all walkers depend on the GeneralCallingPipeline QScript, even though they don't have any compile-time dependencies on that QScript. -Check that the provided walker class is valid in DependencyAnalyzer.xml -Check ant exit status in the front-end script -Fix bug where analyzer would give incorrect results if the list of changed Java classes was empty	2013-07-09 13:12:12 -04:00
Eric Banks	5dbb582be7	Merge pull request #310 from broadinstitute/mc_interval_list_to_fastq Walker to create a fastq file from an interval list	2013-07-08 14:30:43 -07:00
Valentin Ruano Rubio	ac77a4c699	Merge pull request #316 from broadinstitute/md_filter_counting Bugfix for counting of applied filters	2013-07-08 10:58:47 -07:00
Eric Banks	380a01ddc0	Merge pull request #322 from broadinstitute/eb_add_review_indexes_to_repo It was annoying me that these index files kept showing up in 'git status'	2013-07-08 10:15:46 -07:00
Eric Banks	4fe26ea2cf	Merge pull request #323 from broadinstitute/eb_fix_sorting_in_reduce_reads Reduce Reads output should never be expected to be sorted (hence the nee...	2013-07-08 10:15:28 -07:00
Eric Banks	c5a2a8f39f	Merge pull request #324 from broadinstitute/eb_AnalyzeCovariates_is_not_deprecated AnalyzeCovariates is no longer a deprecated tool.	2013-07-08 10:15:10 -07:00
David Roazen	bc28a1f236	pipeline test runner: create temp dir if it doesn't exist	2013-07-08 12:34:20 -04:00
Ryan Poplin	1c28bd2ffd	Merge pull request #325 from broadinstitute/dr_general_calling_pipeline_walker_list List of walkers used by the GeneralCallingPipeline	2013-07-08 09:28:13 -07:00
David Roazen	46b453a69d	List of walkers used by the GeneralCallingPipeline For use by the dependency analyzer	2013-07-08 11:53:58 -04:00
Eric Banks	73fc7f6ab1	Reduce Reads output should never be expected to be sorted (hence the need to sort on disk) but for some reason it was with -nwayout mode.	2013-07-08 10:33:36 -04:00
Eric Banks	921f551426	AnalyzeCovariates is no longer a deprecated tool.	2013-07-08 09:48:12 -04:00
Eric Banks	3e357ac8cf	It was annoying me that these index files kept showing up in 'git status'	2013-07-08 09:46:08 -04:00
Mark DePristo	0aa9d02570	Merge pull request #321 from broadinstitute/eb_fix_annotating_multiple_comps Fix bug introduced recently in the VariantAnnotator where only the last ...	2013-07-05 05:05:09 -07:00
Eric Banks	5f5c90e65c	Fix bug introduced recently in the VariantAnnotator where only the last -comp was being annotated at a site. Trivial fix, added integration test to cover it.	2013-07-05 00:04:52 -04:00
David Roazen	6d69c7dc71	Disable RetryMemoryLimit pipeline test -This test is failing intermittently for unexplained reasons (see GSA-943) -In the interest of keeping the rest of the pipeline test suite running, it's best to disable this one test until GSA-943 is resolved	2013-07-03 13:38:28 -04:00
Tadeusz Jordan	8d00e558fb	Merge pull request #317 from broadinstitute/dr_dependency_analyzer Ant-based walker dependency analyzer	2013-07-03 08:46:37 -07:00
Mark DePristo	7cdb7ac572	Merge pull request #320 from broadinstitute/gg_cleanup_licenses_PT51255639 Deleted old license files	2013-07-03 04:55:15 -07:00
Geraldine Van der Auwera	d55dddfdba	deleted old license files	2013-07-02 16:36:47 -04:00

12725 Commits (cdfd07f9eb4e2ca18b2b6b10d00797cd3a156ebd)