Commit Graph

12725 Commits (cdfd07f9eb4e2ca18b2b6b10d00797cd3a156ebd)

Author SHA1 Message Date
Guillermo del Angel b0e7ffb931 Committing staging/calling 1000g script for posterity 2013-07-19 12:16:53 -04:00
Eric Banks b9e3f56c5d Merge pull request #338 from broadinstitute/eb_allow_ceu_trio_in_assessments_again
Allow the CEU trio best practices in the assessments again (it is assign...
2013-07-18 21:42:28 -07:00
droazen b992dcd9c2 Merge pull request #337 from broadinstitute/dr_runtime_sample_renaming_GSA-974
GATK engine: add ability to do on-the-fly BAM file sample renaming at runtime
2013-07-18 12:51:02 -07:00
David Roazen 605a5ac2e3 GATK engine: add ability to do on-the-fly BAM file sample renaming at runtime
-User must provide a mapping file via new --sample_rename_mapping_file argument.
 Mapping file must contain a mapping from absolute bam file path to new sample name
 (format is described in the docs for the argument).

-Requires that each bam file listed in the mapping file contain only one sample
 in its header (it may contain multiple read groups for that sample, however).
 The engine enforces this, and throws a UserException if on-the-fly renaming is
 requested for a multi-sample bam.

-Not all bam files for a traversal need to be listed in the mapping file.

-On-the-fly renaming is done as the VERY first step after creating the SAMFileReaders
 in SAMDataSource (before the headers are even merged), to prevent possible consistency
 issues.

-Renaming is done ONCE at traversal start for each creation of a SAMReaders resource in the
 SAMResourcePool; this effectively means once per -nt thread.

-Comprehensive unit/integration tests

Known issues: -if you specify the absolute path to a bam in the mapping file, and then
               provide a path to that same bam to -I using SYMLINKS, the renaming won't
               work. The absolute paths will look different to the engine due to the
               symlink being present in one path and not in the other path.

GSA-974 #resolve
2013-07-18 15:48:42 -04:00
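The renaming rules above can be sketched in Java as follows. This is a minimal illustration, not the engine's code: the class and method names are hypothetical, and the mapping-file layout (one absolute bam path and one new sample name per line, separated by a tab) is an assumption, since the real format is only described in the argument's docs.

```java
import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of on-the-fly sample renaming; not the GATK implementation.
public class SampleRenameSketch {

    // Parses lines of the ASSUMED form "<absolute bam path>\t<new sample name>".
    // Blank lines are skipped; a bam path listed twice is rejected.
    public static Map<String, String> parseMapping(List<String> lines) {
        Map<String, String> mapping = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length != 2) {
                throw new IllegalArgumentException("Malformed mapping line: " + raw);
            }
            String path = new File(fields[0]).getAbsolutePath();
            if (mapping.put(path, fields[1]) != null) {
                throw new IllegalArgumentException("Duplicate entry for " + path);
            }
        }
        return mapping;
    }

    // Returns the new sample name for a bam, or Optional.empty() if the bam is not
    // listed (not every input has to be renamed). A listed bam whose header declares
    // more than one sample is rejected, mirroring the engine's UserException.
    public static Optional<String> newSampleName(String bamPath, Set<String> headerSamples,
                                                 Map<String, String> mapping) {
        String newName = mapping.get(new File(bamPath).getAbsolutePath());
        if (newName == null) {
            return Optional.empty();
        }
        if (headerSamples.size() != 1) {
            throw new IllegalArgumentException(
                "On-the-fly renaming requires a single-sample bam: " + bamPath);
        }
        return Optional.of(newName);
    }
}
```

Keying the map on `File.getAbsolutePath()` reproduces the symlink limitation listed under Known issues, because that method does not resolve symbolic links. `File.getCanonicalPath()` typically would (on UNIX platforms), at the cost of a checked `IOException` and a filesystem lookup.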
Eric Banks ca09000584 Allow the CEU trio best practices in the assessments again (it is assigned a low confidence) 2013-07-18 15:18:00 -04:00
Eric Banks 9121c70510 Fixing merge conflicts.
Merged bug fix from Stable into Unstable

Conflicts:
	protected/java/test/org/broadinstitute/sting/gatk/walkers/compression/reducereads/SlidingWindowUnitTest.java
2013-07-18 14:43:03 -04:00
Eric Banks ba531bd5e6 Fixing the 'header is negative' problem in Reduce Reads... again.
Previous fixes and tests only covered trailing soft-clips.  Now that up front
hard-clipping is working properly though, we were failing on those in the tool.

Added a patch for this as well as a separate test independent of the soft-clips
to make sure that it's working properly.
2013-07-18 13:56:46 -04:00
Eric Banks bf5ce41321 Merge pull request #336 from broadinstitute/gda_ancient_dna_pls
Last feature request from Reich/Paavo labs: the allSitePLs feature in UG...
2013-07-18 10:47:52 -07:00
Guillermo del Angel 9dd109b79a Last feature request from Reich/Paavo labs: the allSitePLs feature in UG worked but did not quite fulfill the requirements. What's needed is the ability to have all 10 PLs for EVERY site, regardless of whether it is variant or not. The previous version only emitted the 10 PLs at reference sites. The problem is that, if all PLs are emitted at all sites and every single site is quad-allelic (the only way to have the PLs printed out in a valid way), then the ability to filter variants and to use the INFO fields may be compromised.
So, the compromise solution is to go back to having biallelic PLs but emit a new FORMAT field, called APL, which has the 10 values, while all other statistics and regular PLs are computed as before.
Note that the integration test had to be disabled, as the BCF2 codec apparently doesn't support writing into genotype fields other than PL, DP, AD, GQ, FT and GT.
2013-07-18 12:54:52 -04:00
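To see why the APL field carries 10 values: a diploid genotype over n alleles is an unordered pair, so there are n(n+1)/2 of them, which gives 3 PLs at a biallelic site and 10 when all four bases are considered. The sketch below (hypothetical class name) assumes the 10 values follow the VCF specification's standard genotype ordering, in which genotype j/k with j <= k sits at index k(k+1)/2 + j.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating diploid genotype counting and VCF ordering.
public class DiploidGenotypeOrder {

    // Number of unordered diploid genotypes over nAlleles alleles: n(n+1)/2.
    public static int genotypeCount(int nAlleles) {
        return nAlleles * (nAlleles + 1) / 2;
    }

    // VCF likelihood index of diploid genotype j/k (order of j and k does not matter).
    public static int genotypeIndex(int j, int k) {
        if (j > k) {
            int tmp = j;
            j = k;
            k = tmp;
        }
        return k * (k + 1) / 2 + j;
    }

    // Lists genotypes in VCF order; the loop order visits indices 0, 1, 2, ... in turn.
    public static List<String> orderedGenotypes(String alleles) {
        int n = alleles.length();
        List<String> out = new ArrayList<>(genotypeCount(n));
        for (int k = 0; k < n; k++) {
            for (int j = 0; j <= k; j++) {
                out.add("" + alleles.charAt(j) + alleles.charAt(k));
            }
        }
        return out;
    }
}
```

For alleles `"ACGT"` this yields AA, AC, CC, AG, CG, GG, AT, CT, GT, TT, i.e. the 10 genotypes whose likelihoods an all-bases PL vector would hold.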
Eric Banks 234f564009 Merge pull request #335 from broadinstitute/eb_improvements_to_KB_assessment
Several improvements to the NA12878 knowledge base.
2013-07-18 08:47:03 -07:00
Eric Banks 5d1454c6b0 Several improvements to the NA12878 knowledge base.
1. All NA12878DBWalkers that export/emit sites need to do so in order; also one should be able
to use -L with them and not have it iterate over all possible sites.
Updated ExportReviews and ExtractConsensusSites to adhere to these constraints.

2. Added the option to AssessNA12878 to have it ignore FNs that overlap with a provided VCF.
This is useful if you have a list of sites from reviews that are okay to be missed in
particular techs only (because for some reason there is coverage but no evidence of the
alternate allele in them) - intended to be used with Jenkins.

3. Hooked up the logic of complex events all the way through the KB.
Now the consensus incorporates whether a call is complex and the assessor does not penalize for them.

4. Fixed long-standing bug that I managed to find accidentally:
AssessNA12878 was closing its DB connection before its final call to includeMissingCalls().

5. Hooked up the per-call confidences through the KB.
We no longer have a 2-tiered priority system in the KB (reviews and everything else) but instead
use a quasi-Bayesian estimator (will update to proper Bayesian treatment if needed).
Now ImportCallset and ImportReviews assign confidences as appropriate.
Also needed to fix up the consensus logic for calls with UNKNOWN status.
2013-07-18 11:05:36 -04:00
David Roazen 6440f926d3 Parallel tests: use /broad/hptmp as working dir instead of /broad/classA-test
-Class A test filesystem was getting slow, and wasn't suitable for long-term
 use anyway
2013-07-15 16:08:48 -04:00
Eric Banks f2fca40b2b Merge pull request #333 from broadinstitute/dr_fix_SAMReaderID_hashing
SAMReaderID: fix bug with hash code and equals() method
2013-07-15 12:20:50 -07:00
chartl 28b8815688 Merge pull request #301 from broadinstitute/mc_tsca_project
Barcodes per Amplicon count tool (private tool)
2013-07-15 11:21:41 -07:00
David Roazen c15751e41e SAMReaderID: fix bug with hash code and equals() method
-Two SAMReaderIDs that pointed at the same underlying bam file through
 a relative vs. an absolute path were not being treated as equal, and
 had different hash codes. This was causing problems in the engine, since
 SAMReaderIDs are often used as the keys of HashMaps.

-Fix: explicitly use the absolute path to the encapsulated bam file in
 hashCode() and equals()

-Added tests to ensure this doesn't break again
2013-07-15 13:57:00 -04:00
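A minimal sketch of the fix described above, using a hypothetical `ReaderIdSketch` class rather than the real `SAMReaderID`: both `equals()` and `hashCode()` are derived from the same absolute path, so a relative and an absolute reference to one bam file land on the same `HashMap` key.

```java
import java.io.File;

// Hypothetical stand-in for SAMReaderID; only the path-handling idea is shown.
public final class ReaderIdSketch {
    private final File samFile;

    public ReaderIdSketch(File samFile) {
        this.samFile = samFile;
    }

    // Resolves a relative path against the working directory.
    private String absolutePath() {
        return samFile.getAbsolutePath();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ReaderIdSketch)) {
            return false;
        }
        return absolutePath().equals(((ReaderIdSketch) other).absolutePath());
    }

    // Must use the same key as equals() to keep the hashCode/equals contract.
    @Override
    public int hashCode() {
        return absolutePath().hashCode();
    }
}
```

Note that `File.getAbsolutePath()` only prepends the working directory; it does not remove `.` or `..` segments, so `./x.bam` and `x.bam` would still compare unequal. `Path.toAbsolutePath().normalize()` from `java.nio.file` would collapse those segments.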
Eric Banks 51b95589e5 Merge pull request #331 from broadinstitute/gda_pool_caller_paper
Committing pool caller script changes for posterity: mostly updating the...
2013-07-15 09:06:14 -07:00
Guillermo del Angel 464cf33dd3 Committing pool caller script changes for posterity: mostly updating the reference sample calls to latest gold standard, adding filtering tweaks and redo of R scripts. 2013-07-15 11:52:06 -04:00
Scott Thibault 5d198d3400 Added write to likelihoods.txt for batch hmm 2013-07-15 10:16:39 -05:00
Mauricio Carneiro 3459eab413 Barcodes per Amplicon count tool
Tool to count the number of barcodes (TSCA degenerate bases) per amplicon (not interval/target). Useful for TSCA quality control.
2013-07-15 11:04:13 -04:00
sathibault 0a8f75b953 Merge branch 'master' into st_fpga_hmm
Conflicts:
	protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java
2013-07-15 08:17:32 -05:00
Eric Banks 1575fdaab4 Merge pull request #330 from broadinstitute/yf_adding_per_sample_allele_biased_downsampling_to_HC
AlleleBiasedDownsampling for HaplotypeCaller
2013-07-13 18:50:32 -07:00
Mauricio Carneiro 8c07614321 QualifyMissingIntervals: support different formats
Problem
-------
Qualify Missing Intervals only accepted GATK formatted interval files for its coding sequence and bait parameters.

Solution
-------
There is no reason for such a limitation, so I erased all the code that did the parsing and used IntervalUtils to parse it (therefore, it now handles any type of interval file that the GATK can handle).

ps: Also added an average depth column to the output
2013-07-12 17:32:53 -04:00
Yossi Farjoun afcf7b96db - Added per-sample AlleleBiasedDownsampling capability to HaplotypeCaller
- Added integration test to show that providing a contamination value and providing same value via a file results in the same VCF

- overrode default contamination value in test
2013-07-12 16:22:02 -04:00
delangel 7ddf85c040 Merge pull request #329 from broadinstitute/eb_more_sensitivity_improvements_to_the_HC
A whole slew of improvements to the Haplotype Caller and related code.
2013-07-12 10:37:43 -07:00
Eric Banks b16c7ce050 A whole slew of improvements to the Haplotype Caller and related code.
1. Some minor refactorings and cleanup (e.g. removing unused imports) throughout.

2. Updates to the KB assessment functionality:
   a. Exclude duplicate reads when checking to see whether there's enough coverage to make a call.
   b. Lower the threshold on FS for FPs that would easily be filtered since it's only single sample calling.

3. Make the HC consistent in how it treats the pruning factor.  As part of this I removed and archived
   the DeBruijn assembler.

4. Improvements to the likelihoods for the HC
   a. We now include a "tristate" correction in the PairHMM (just like we do with UG).  Basically, we need
      to divide the error probability e by 3 because the observed base could have come from any of the non-observed alleles.
   b. We now correct overlapping read pairs.  Note that the fragments are not merged (which we know is
      dangerous).  Rather, the overlapping bases are just down-weighted so that their quals are not more
      than Q20 (or more specifically, half of the phred-scaled PCR error rate); mismatching bases are
      turned into Q0s for now.
   c. We no longer run contamination removal by default in the UG or HC.  The exome tends to have real
      sites with off-kilter allele balances and we occasionally lose them to contamination removal.

5. Improved the dangling tail merging implementation.
2013-07-12 10:09:10 -04:00
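Items 4a and 4b can be illustrated with a small Java sketch. The class and method names are hypothetical, the quality cap is passed in as a parameter (the commit message gives Q20), and the overlap routine assumes both mates' bases and quals have already been aligned position-by-position over the overlapping region; none of this is the actual PairHMM code.

```java
// Hypothetical illustration of the tristate correction and overlap down-weighting.
public final class HcLikelihoodSketch {

    // Phred quality -> error probability e = 10^(-Q/10).
    public static double errorProbability(int phred) {
        return Math.pow(10.0, -phred / 10.0);
    }

    // "Tristate" base likelihood: on a mismatch, e is split evenly over the
    // three bases that were not observed.
    public static double baseLikelihood(byte readBase, byte haplotypeBase, int phred) {
        double e = errorProbability(phred);
        return readBase == haplotypeBase ? 1.0 - e : e / 3.0;
    }

    // Down-weights the overlapping bases of a read pair in place: agreeing bases are
    // capped at `cap`, disagreeing bases become Q0 in both mates.
    public static void adjustOverlap(byte[] bases1, byte[] quals1,
                                     byte[] bases2, byte[] quals2, int cap) {
        if (bases1.length != bases2.length || quals1.length != bases1.length
                || quals2.length != bases2.length) {
            throw new IllegalArgumentException("Overlap arrays must be aligned and equal length");
        }
        for (int i = 0; i < bases1.length; i++) {
            if (bases1[i] == bases2[i]) {
                quals1[i] = (byte) Math.min(quals1[i], cap);
                quals2[i] = (byte) Math.min(quals2[i], cap);
            } else {
                quals1[i] = 0;
                quals2[i] = 0;
            }
        }
    }
}
```

Capping rather than merging keeps both fragments intact while preventing the same physical molecule from being counted as two high-confidence observations.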
Eric Banks 3e1a96844b Merge pull request #328 from broadinstitute/dr_dependency_analyzer_output_loader
DependencyAnalyzerOutputLoader: allows output from the dependency analyzer to be loaded and queried
2013-07-11 20:31:17 -07:00
David Roazen 8ef4e6c9f7 DependencyAnalyzerOutputLoader: allows output from the dependency analyzer to be loaded and queried 2013-07-11 16:15:02 -04:00
sathibault 23fe3e449a Revert "Fixed batching bug."
This reverts commit 3e56c83d0eec7c374e5f187d1ef124d42ecc071e.
2013-07-11 11:30:37 -05:00
sathibault 7458b59bb3 Fixed batching bug. 2013-07-11 11:08:46 -05:00
Eric Banks ccc0ee5b4d Merge pull request #327 from broadinstitute/gda_large_indel_improvement
Moved some HC parameters related to active region extensions to command ...
2013-07-11 06:58:13 -07:00
Guillermo del Angel aba55dbb23 Moved some HC parameters related to active region extensions to command line arguments so that they're more easily modified. Some of these parameters need tinkering in order to call some large indels. See GSA-891 and subtasks for particular examples thereof. 2013-07-10 14:31:10 -04:00
droazen 2d81234ed8 Merge pull request #326 from broadinstitute/dr_dependency_analyzer_add_manual_dependencies
Dependency analyzer improvements
2013-07-09 10:12:50 -07:00
David Roazen de06eda6ac Dependency analyzer improvements
-Add ability to manually specify dependencies on the command line. This allows one
 to specify, for example, that all walkers depend on the GeneralCallingPipeline
 QScript, even though they don't have any compile-time dependencies on that QScript.

-Check that the provided walker class is valid in DependencyAnalyzer.xml

-Check ant exit status in the front-end script

-Fix bug where analyzer would give incorrect results if the list of changed
 Java classes was empty
2013-07-09 13:12:12 -04:00
Eric Banks 5dbb582be7 Merge pull request #310 from broadinstitute/mc_interval_list_to_fastq
Walker to create a fastq file from an interval list
2013-07-08 14:30:43 -07:00
Valentin Ruano Rubio ac77a4c699 Merge pull request #316 from broadinstitute/md_filter_counting
Bugfix for counting of applied filters
2013-07-08 10:58:47 -07:00
Eric Banks 380a01ddc0 Merge pull request #322 from broadinstitute/eb_add_review_indexes_to_repo
It was annoying me that these index files kept showing up in 'git status'
2013-07-08 10:15:46 -07:00
Eric Banks 4fe26ea2cf Merge pull request #323 from broadinstitute/eb_fix_sorting_in_reduce_reads
Reduce Reads output should never be expected to be sorted (hence the nee...
2013-07-08 10:15:28 -07:00
Eric Banks c5a2a8f39f Merge pull request #324 from broadinstitute/eb_AnalyzeCovariates_is_not_deprecated
AnalyzeCovariates is no longer a deprecated tool.
2013-07-08 10:15:10 -07:00
David Roazen bc28a1f236 pipeline test runner: create temp dir if it doesn't exist 2013-07-08 12:34:20 -04:00
Ryan Poplin 1c28bd2ffd Merge pull request #325 from broadinstitute/dr_general_calling_pipeline_walker_list
List of walkers used by the GeneralCallingPipeline
2013-07-08 09:28:13 -07:00
David Roazen 46b453a69d List of walkers used by the GeneralCallingPipeline
For use by the dependency analyzer
2013-07-08 11:53:58 -04:00
Eric Banks 73fc7f6ab1 Reduce Reads output should never be expected to be sorted (hence the need to sort on disk) but for some reason it was with -nwayout mode. 2013-07-08 10:33:36 -04:00
Eric Banks 921f551426 AnalyzeCovariates is no longer a deprecated tool. 2013-07-08 09:48:12 -04:00
Eric Banks 3e357ac8cf It was annoying me that these index files kept showing up in 'git status' 2013-07-08 09:46:08 -04:00
Mark DePristo 0aa9d02570 Merge pull request #321 from broadinstitute/eb_fix_annotating_multiple_comps
Fix bug introduced recently in the VariantAnnotator where only the last ...
2013-07-05 05:05:09 -07:00
Eric Banks 5f5c90e65c Fix bug introduced recently in the VariantAnnotator where only the last -comp was being annotated at a site.
Trivial fix, added integration test to cover it.
2013-07-05 00:04:52 -04:00
David Roazen 6d69c7dc71 Disable RetryMemoryLimit pipeline test
-This test is failing intermittently for unexplained reasons (see GSA-943)

-In the interest of keeping the rest of the pipeline test suite running, it's
 best to disable this one test until GSA-943 is resolved
2013-07-03 13:38:28 -04:00
Tadeusz Jordan 8d00e558fb Merge pull request #317 from broadinstitute/dr_dependency_analyzer
Ant-based walker dependency analyzer
2013-07-03 08:46:37 -07:00
Mark DePristo 7cdb7ac572 Merge pull request #320 from broadinstitute/gg_cleanup_licenses_PT51255639
Deleted old license files
2013-07-03 04:55:15 -07:00
Geraldine Van der Auwera d55dddfdba deleted old license files 2013-07-02 16:36:47 -04:00