Commit Graph

12658 Commits (00cedd0bd3a185b238ae057ab783baf3b3c8cfb7)

Author SHA1 Message Date
Yossi Farjoun 00cedd0bd3 Merge pull request #352 from broadinstitute/yf_SNPEFF_Stratifier
moved SnpEffUtilUnitTest to public tree
2013-07-30 14:52:33 -07:00
Yossi Farjoun 284176cd7b moved SnpEffUtilUnitTest to public tree 2013-07-30 17:51:40 -04:00
droazen b8709b1942 Merge pull request #332 from broadinstitute/st_fpga_hmm
FPGA support for PairHMM
2013-07-30 14:21:21 -07:00
Eric Banks ac06829194 Merge pull request #349 from broadinstitute/yf_SNPEFF_Stratifier
Adding a representation of the hierarchy of flags output by snpEff (Yoss...
2013-07-30 12:42:25 -07:00
Joseph Rose d2860a5486 Adding a representation of the hierarchy of flags output by snpEff (Yossi) and a stratifier whose output states are coding regions, genes, stop_gain, stop_lost and splice sites, all determined by the snpEff hierarchy (J. Rose) 2013-07-30 15:38:32 -04:00
Mauricio Carneiro 7b731dd596 Removed native method call
and fixed indentation.
2013-07-30 13:59:58 -04:00
chartl cf46256356 Merge pull request #350 from broadinstitute/chartl_genotypeconcordance_doc_cleanup
Add <pre> tags to the Genotype Concordance docs. Tables were not being d...
2013-07-29 16:17:26 -07:00
Chris Hartl 464a5b229d Add <pre> tags to the Genotype Concordance docs. Tables were not being displayed properly. 2013-07-29 15:48:17 -07:00
Eric Banks 678d038c76 Merge pull request #348 from broadinstitute/gg_gatkdoc_fixes
Gg gatkdoc fixes
2013-07-26 13:17:51 -07:00
Geraldine Van der Auwera 3063d82797 Fixed example in CallableLoci gatkdoc 2013-07-26 15:51:31 -04:00
Geraldine Van der Auwera fc4a8b1dd0 Fixed example in DoC gatkdoc 2013-07-26 15:51:30 -04:00
Geraldine Van der Auwera 660b075900 Added deprecation notice for SomaticIndelDetector 2013-07-26 15:51:30 -04:00
Geraldine Van der Auwera 5ad99c362d Added caveat to gatkdocs for MAPQ read transformers & cleaned up AB annotation gatkdocs 2013-07-26 15:51:30 -04:00
Geraldine Van der Auwera 0ea3f8ca58 Added function to gatkdocs to specify what VCF field an annotation goes in (INFO or FORMAT) 2013-07-26 15:51:30 -04:00
Geraldine Van der Auwera edbd17b8e0 Added note of caution to VQSR gatkdocs for option BOTH of recalibration mode 2013-07-26 15:51:29 -04:00
Ryan Poplin f52196496d Merge pull request #347 from broadinstitute/eb_more_dnagling_tail_improvements
More specific fix for the dangling tail edge case with a single leading deletion.
2013-07-26 07:25:47 -07:00
Ryan Poplin 66db412ad0 Merge pull request #345 from broadinstitute/rp_vqsr_sort_annotations
Automatically order the annotation dimensions in the VQSR by their stand...
2013-07-26 07:23:42 -07:00
Ryan Poplin 8c205dda1b Automatically order the annotation dimensions in the VQSR by their standard deviation instead of the order they were specified on the command line. 2013-07-26 10:22:43 -04:00
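A minimal sketch of ordering annotation dimensions by standard deviation rather than by command-line order. This is not the VQSR code: the class and method names, the placeholder annotation names, and the sample values are hypothetical. Descending order is an assumption, since the commit message does not state the sort direction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not taken from the GATK source.
final class AnnotationOrderDemo {
    // Population standard deviation of one annotation's values.
    static double standardDeviation(double[] values) {
        double mean = 0.0;
        for (double v : values) mean += v;
        mean /= values.length;
        double sumSq = 0.0;
        for (double v : values) sumSq += (v - mean) * (v - mean);
        return Math.sqrt(sumSq / values.length);
    }

    // Returns annotation names sorted by standard deviation.
    // ASSUMPTION: largest standard deviation first.
    static List<String> orderByStandardDeviation(Map<String, double[]> valuesByAnnotation) {
        List<String> names = new ArrayList<>(valuesByAnnotation.keySet());
        names.sort(Comparator.comparingDouble(
                (String n) -> standardDeviation(valuesByAnnotation.get(n))).reversed());
        return names;
    }

    public static void main(String[] args) {
        // Insertion order stands in for the order given on the command line.
        Map<String, double[]> values = new LinkedHashMap<>();
        values.put("A", new double[] {1.0, 1.0, 1.0}); // standard deviation 0
        values.put("B", new double[] {0.0, 10.0});     // standard deviation 5
        values.put("C", new double[] {1.0, 3.0});      // standard deviation 1
        System.out.println(orderByStandardDeviation(values)); // [B, C, A]
    }
}
```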
Eric Banks 924d9b7ef4 Merge pull request #344 from lbergelson/lb_library_read_filter
Adding LibraryReadFilter.
2013-07-26 06:44:53 -07:00
Louis Bergelson 7c43b5f26a Adding LibraryReadFilter.
--Moving LibraryReadFilter which has been part of Mutect into gatk public.
--Added an additional check for null values.
2013-07-26 09:32:14 -04:00
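A rough sketch of what a library-based read filter with a null check might look like. The class, method, and parameter names are hypothetical and do not come from the GATK source. Treating a read with no library as non-matching is an assumption made only for illustration.

```java
// Hypothetical illustration; not the LibraryReadFilter implementation.
final class LibraryFilterSketch {
    // Returns true when the read should be filtered OUT.
    static boolean filterOut(String readLibrary, String libraryToKeep) {
        // ASSUMPTION: a read without library information cannot match, so it is filtered out.
        // Checking for null first avoids a NullPointerException on equals().
        return readLibrary == null || !readLibrary.equals(libraryToKeep);
    }

    public static void main(String[] args) {
        System.out.println(filterOut("lib1", "lib1")); // false: read is kept
        System.out.println(filterOut("lib2", "lib1")); // true: different library
        System.out.println(filterOut(null, "lib1"));   // true: no library information
    }
}
```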
Eric Banks 9372c5ef41 Merge pull request #334 from broadinstitute/mc_generic_input_for_qualify_missing_intervals
QualifyMissingIntervals: support different formats
2013-07-25 12:39:26 -07:00
sathibault 71eb944e62 Adding CnyPairHMMUnitTest 2013-07-25 14:19:50 -05:00
Eric Banks 1b25cf471c Merge pull request #341 from broadinstitute/eb_make_all_rr_stranded
Eb make all rr stranded
2013-07-25 11:50:43 -07:00
Eric Banks 5dfa863caa Fully stranded implementation of RR (plus bug fix for insertions and het compression).
Now only filtered reads are unstranded.  All consensus reads have strand, so that we
emit 2 consensus reads in general now: one for each strand.

This involved some refactoring of the sliding window which cleaned it up a lot.

Also included is a bug fix:
insertions downstream of a variant region weren't triggering a stop to the compression.
2013-07-25 14:48:53 -04:00
Eric Banks 0a2b5ddadf More specific fix for the dangling tail edge case with a single leading deletion.
The previous fix was too general (and therefore incorrect) and caused the HC to throw an exception.
Added "unit" test for this exact case.
2013-07-25 12:24:46 -04:00
Mauricio Carneiro 31ab0824b1 quick indentation fixes to FPGA code 2013-07-24 14:09:49 -04:00
Ryan Poplin e5aab22680 Merge pull request #342 from broadinstitute/eb_fix_mq_in_rbp
Fixing ReadBackedPileup to represent mapping qualities as ints, not (signed) bytes.
2013-07-24 09:42:13 -07:00
Eric Banks 6df43f730a Fixing ReadBackedPileup to represent mapping qualities as ints, not (signed) bytes.
Having them as bytes caused problems for downstream programmers who had data with high MQs.
2013-07-23 23:47:15 -04:00
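The problem with signed bytes can be reproduced in a few lines. This sketch is not the ReadBackedPileup code, and the class and helper names are hypothetical. It only shows that a Java byte cannot hold values above 127, so a high mapping quality wraps to a negative number.

```java
// Hypothetical illustration of the signed-byte overflow; not taken from the GATK source.
final class MappingQualityDemo {
    // Narrows a mapping quality to a signed byte, as a byte-based representation would.
    static byte asByte(int mappingQuality) {
        return (byte) mappingQuality;
    }

    public static void main(String[] args) {
        System.out.println(asByte(60));  // 60: fits in a signed byte
        System.out.println(asByte(200)); // -56: wraps around
        System.out.println(asByte(255)); // -1: the largest MAPQ value becomes negative
    }
}
```

Keeping the value as an int avoids the wraparound without masking each read with `& 0xFF`.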
Eric Banks 71222bff45 Merge pull request #340 from broadinstitute/eb_fix_okaytomiss_arg
Various updates for KB, mostly so that reviews through IGV work properly...
2013-07-21 19:01:01 -07:00
Eric Banks 672fbe437e Various updates for KB, mostly so that reviews through IGV work properly.
1. Fix for the -okayToMiss argument for indels.
In cases where we make calls with different alleles, it wasn't allowing us to skip the site for FNs.

2. Need to add confidence and isComplexEvent attributes to the equality and duplicate checks in MVC.

3. Treat unknown confidences as reviewed for now.
We need this until IGV gets updated to use confidences for reviews.
2013-07-21 21:20:05 -04:00
David Roazen 224fec7379 parallel test runner: automount triggers
Try to reduce the number of tests failing with file not found
errors due to random automount failures by cd'ing into a
preset list of directories at the start of each job in an
effort to trigger automount.
2013-07-19 15:13:11 -04:00
David Roazen c72880f1d0 build.xml: make ant -p output only important build targets
ant -p outputs only targets that have description attributes.
Modify build.xml so only important targets that users might actually
want to use are output by ant -p.
2013-07-19 14:35:00 -04:00
Eric Banks c7c65502b4 Merge pull request #339 from broadinstitute/gda_1000g_calling
Committing staging/calling 1000g script to posterity
2013-07-19 09:47:39 -07:00
Guillermo del Angel b0e7ffb931 Committing staging/calling 1000g script to posterity 2013-07-19 12:16:53 -04:00
Eric Banks b9e3f56c5d Merge pull request #338 from broadinstitute/eb_allow_ceu_trio_in_assessments_again
Allow the CEU trio best practices in the assessments again (it is assign...
2013-07-18 21:42:28 -07:00
droazen b992dcd9c2 Merge pull request #337 from broadinstitute/dr_runtime_sample_renaming_GSA-974
GATK engine: add ability to do on-the-fly BAM file sample renaming at runtime
2013-07-18 12:51:02 -07:00
David Roazen 605a5ac2e3 GATK engine: add ability to do on-the-fly BAM file sample renaming at runtime
-User must provide a mapping file via new --sample_rename_mapping_file argument.
 Mapping file must contain a mapping from absolute bam file path to new sample name
 (format is described in the docs for the argument).

-Requires that each bam file listed in the mapping file contain only one sample
 in its header (it may contain multiple read groups for that sample, however).
 The engine enforces this, and throws a UserException if on-the-fly renaming is
 requested for a multi-sample bam.

-Not all bam files for a traversal need to be listed in the mapping file.

-On-the-fly renaming is done as the VERY first step after creating the SAMFileReaders
 in SAMDataSource (before the headers are even merged), to prevent possible consistency
 issues.

-Renaming is done ONCE at traversal start for each SAMReaders resource creation in the
 SAMResourcePool; this effectively means once per -nt thread

-Comprehensive unit/integration tests

Known issues: -if you specify the absolute path to a bam in the mapping file, and then
               provide a path to that same bam to -I using SYMLINKS, the renaming won't
               work. The absolute paths will look different to the engine due to the
               symlink being present in one path and not in the other path.

GSA-974 #resolve
2013-07-18 15:48:42 -04:00
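The symlink known issue above can be illustrated with plain java.nio calls. This sketch is not the engine's code: the class and method names and the temporary file names are hypothetical. It shows that two absolute paths to the same file differ when one goes through a symlink, while the resolved real paths match.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not taken from the GATK source.
final class SymlinkPathDemo {
    // Compares absolute paths only; symlinks are NOT resolved.
    static boolean sameAbsolutePath(Path a, Path b) {
        return a.toAbsolutePath().equals(b.toAbsolutePath());
    }

    // Compares real paths; symlinks ARE resolved, and both files must exist.
    static boolean sameRealPath(Path a, Path b) throws IOException {
        return a.toRealPath().equals(b.toRealPath());
    }

    // Creates a temporary file plus a symlink to it and runs both comparisons.
    // Needs a filesystem (and permissions) that allow creating symlinks.
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("rename-demo");
            Path bam = Files.createFile(dir.resolve("real.bam"));
            Path link = Files.createSymbolicLink(dir.resolve("link.bam"), bam);
            return new boolean[] { sameAbsolutePath(bam, link), sameRealPath(bam, link) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("absolute paths equal: " + result[0]);
        System.out.println("real paths equal: " + result[1]);
    }
}
```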
Eric Banks ca09000584 Allow the CEU trio best practices in the assessments again (it is assigned a low confidence) 2013-07-18 15:18:00 -04:00
Eric Banks 9121c70510 Fixing merge conflicts.
Merged bug fix from Stable into Unstable

Conflicts:
	protected/java/test/org/broadinstitute/sting/gatk/walkers/compression/reducereads/SlidingWindowUnitTest.java
2013-07-18 14:43:03 -04:00
Eric Banks ba531bd5e6 Fixing the 'header is negative' problem in Reduce Reads... again.
Previous fixes and tests only covered trailing soft-clips.  Now that up front
hard-clipping is working properly though, we were failing on those in the tool.

Added a patch for this as well as a separate test independent of the soft-clips
to make sure that it's working properly.
2013-07-18 13:56:46 -04:00
Eric Banks bf5ce41321 Merge pull request #336 from broadinstitute/gda_ancient_dna_pls
Last feature request from Reich/Paavo labs: the allSitePLs feature in UG...
2013-07-18 10:47:52 -07:00
Guillermo del Angel 9dd109b79a Last feature request from Reich/Paavo labs: the allSitePLs feature in UG worked but did not quite fulfill the requirements. What's needed is the ability to have all 10 PLs for EVERY site, regardless of whether it is variant or not. The previous version only emitted the 10 PLs at reference sites. The problem is that, if all PLs are emitted at all sites and every single site is quad-allelic (the only way to have the PLs printed out in a valid way), then the ability to filter variants and to use the INFO fields may be compromised.
So, the compromise solution is to go back to having biallelic PLs but emit a new FORMAT field, called APL, which has the 10 values; all other statistics and regular PLs are computed as before.
Note that the integration test had to be disabled, as the BCF2 codec apparently doesn't support writing genotype fields other than PL, DP, AD, GQ, FT and GT.
2013-07-18 12:54:52 -04:00
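The figure of 10 PLs follows from counting genotypes. Assuming diploid samples (an assumption; the commit message does not state ploidy), a site with n alleles has n(n+1)/2 unordered genotypes, so a quad-allelic site has 10 and a biallelic site has 3. The class and method names below are hypothetical.

```java
// Hypothetical illustration of the genotype count; not taken from the GATK source.
final class GenotypeCountDemo {
    // Number of unordered diploid genotypes over alleleCount alleles: n * (n + 1) / 2.
    static int diploidGenotypeCount(int alleleCount) {
        return alleleCount * (alleleCount + 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(diploidGenotypeCount(2)); // 3: biallelic site
        System.out.println(diploidGenotypeCount(4)); // 10: quad-allelic site
    }
}
```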
Eric Banks 234f564009 Merge pull request #335 from broadinstitute/eb_improvements_to_KB_assessment
Several improvements to the NA12878 knowledge base.
2013-07-18 08:47:03 -07:00
Eric Banks 5d1454c6b0 Several improvements to the NA12878 knowledge base.
1. All NA12878DBWalkers that export/emit sites need to do so in order; also one should be able
to use -L with them and not have it iterate over all possible sites.
Updated ExportReviews and ExtractConsensusSites to adhere to these constraints.

2. Added the option to AssessNA12878 to have it ignore FNs that overlap with a provided VCF.
This is useful if you have a list of sites from reviews that are okay to be missed in
particular techs only (because for some reason there is coverage but no evidence of the
alternate allele in them) - intended to be used with Jenkins.

3. Hooked up the logic of complex events all the way through the KB.
Now the consensus incorporates whether a call is complex and the assessor does not penalize for them.

4. Fixed long-standing bug that I managed to find accidentally:
AssessNA12878 was closing its DB connection before its final call to includeMissingCalls().

5. Hooked up the per-call confidences through the KB.
We no longer have a 2-tiered priority system in the KB (reviews and everything else) but instead
use a quasi-Bayesian estimator (will update to proper Bayesian treatment if needed).
Now ImportCallset and ImportReviews assign confidences as appropriate.
Also needed to fix up the consensus logic for calls with UNKNOWN status.
2013-07-18 11:05:36 -04:00
David Roazen 6440f926d3 Parallel tests: use /broad/hptmp as working dir instead of /broad/classA-test
-Class A test filesystem was getting slow, and wasn't suitable for long-term
 use anyway
2013-07-15 16:08:48 -04:00
Eric Banks f2fca40b2b Merge pull request #333 from broadinstitute/dr_fix_SAMReaderID_hashing
SAMReaderID: fix bug with hash code and equals() method
2013-07-15 12:20:50 -07:00
chartl 28b8815688 Merge pull request #301 from broadinstitute/mc_tsca_project
Barcodes per Amplicon count tool (private tool)
2013-07-15 11:21:41 -07:00
David Roazen c15751e41e SAMReaderID: fix bug with hash code and equals() method
-Two SAMReaderIDs that pointed at the same underlying bam file through
 a relative vs. an absolute path were not being treated as equal, and
 had different hash codes. This was causing problems in the engine, since
 SAMReaderIDs are often used as the keys of HashMaps.

-Fix: explicitly use the absolute path to the encapsulated bam file in
 hashCode() and equals()

-Added tests to ensure this doesn't break again
2013-07-15 13:57:00 -04:00
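The fix described above can be sketched as follows. This is not the SAMReaderID class: `ReaderId` and its members are hypothetical names. The idea shown is to base both `equals()` and `hashCode()` on the absolute path, so that a relative and an absolute path to the same file land on the same HashMap key.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not taken from the GATK source.
final class ReaderId {
    private final File file;

    ReaderId(String path) {
        this.file = new File(path);
    }

    // Single source of truth for identity: the absolute path.
    // Note: getAbsolutePath() does not resolve symlinks or "." and ".." segments.
    private String key() {
        return file.getAbsolutePath();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ReaderId && key().equals(((ReaderId) other).key());
    }

    @Override
    public int hashCode() {
        return key().hashCode();
    }

    public static void main(String[] args) {
        String relative = "sample.bam";
        String absolute = new File(relative).getAbsolutePath();

        Map<ReaderId, String> readers = new HashMap<>();
        readers.put(new ReaderId(relative), "reader");

        // The lookup succeeds because both IDs share the same absolute path.
        System.out.println(readers.containsKey(new ReaderId(absolute))); // true
    }
}
```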
Eric Banks 51b95589e5 Merge pull request #331 from broadinstitute/gda_pool_caller_paper
Committing pool caller script changes for posterity: mostly updating the...
2013-07-15 09:06:14 -07:00
Guillermo del Angel 464cf33dd3 Committing pool caller script changes for posterity: mostly updating the reference sample calls to latest gold standard, adding filtering tweaks and redo of R scripts. 2013-07-15 11:52:06 -04:00