Commit Graph

12333 Commits (d242f1bba3fa0578ec9cfa7aa35a5bce3e99616e)

Author SHA1 Message Date
Eric Banks d242f1bba3 Secondary alignments were not handled correctly in IndelRealigner
 * This is emerging now because BWA-MEM produces lots of reads that are not primary alignments
 * The ConstrainedMateFixingManager class used by IndelRealigner was mis-adjusting SAM flags because it
   was getting confused by these secondary alignments
 * Added unit test to cover this case
2013-05-06 19:09:10 -04:00
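For readers unfamiliar with the flag involved: the SAM specification marks a secondary alignment with bit 0x100 of the FLAG field. The sketch below only shows how such records can be detected. The class name and the `participatesInMateFixing` policy are illustrative assumptions, not the actual change made to ConstrainedMateFixingManager.

```java
final class SecondaryAlignmentSketch {
    // Flag bit defined by the SAM specification: "secondary alignment".
    static final int SECONDARY_ALIGNMENT_FLAG = 0x100;

    /** True if the SAM FLAG field marks this record as a secondary alignment. */
    static boolean isSecondary(final int samFlags) {
        return (samFlags & SECONDARY_ALIGNMENT_FLAG) != 0;
    }

    /**
     * Hypothetical policy (an assumption, not the actual fix): only primary
     * alignments take part in mate-information fixing.
     */
    static boolean participatesInMateFixing(final int samFlags) {
        return !isSecondary(samFlags);
    }
}
```

For example, a FLAG of 99 (paired, properly paired, mate reverse strand, first in pair) is a primary alignment, while 355 (99 + 256) is the same record marked as secondary.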
Eric Banks b53336c2d0 Added hidden mode for BQSR to force all read groups to be the same one.
 * Very useful for debugging sample-specific issues
 * This argument got lost in the transition from BQSR v1 to v2
 * Added unit test to cover this case
2013-05-06 19:09:10 -04:00
chartl 98021db264 Merge pull request #208 from broadinstitute/yf_fix_molten_GenotypeConcordance
- Fixed a small bug in the printout of molten data in GenotypeConcordanc...
2013-05-06 08:42:06 -07:00
Mark DePristo 084a464779 Merge pull request #213 from broadinstitute/gda_md5_fix
Re-fix md5's that changed due to conflicting pushes
2013-05-03 16:21:46 -07:00
Guillermo del Angel 874dc8f9c1 Re-fix md5's that changed due to conflicting pushes 2013-05-03 14:59:16 -04:00
Mark DePristo 8bac61f9fb Merge pull request #212 from broadinstitute/md_misc_improvements
Several important bugfixes / improvements to general GATK infrastructure
2013-05-03 08:20:55 -07:00
Mark DePristo f42bb86bdd Only try to clip adaptors when both reads of the pair are on opposite strands

-- Read pairs that have unusual alignments, such as two reads both oriented like:

  <-----
     <-----

were previously having their adaptors clipped as though the standard calculation of the insert size was meaningful, which it is not for such oddly oriented pairs.  This caused us to clip extra good bases from reads.
-- Update MD5s due to the change in adaptor clipping, which adds some coverage in some places
2013-05-03 11:19:14 -04:00
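The rule described in this commit can be sketched as a simple guard on the strands of the two reads. This is a minimal illustration with hypothetical names, not the GATK implementation; the real code presumably also checks that both reads are mapped and computes the adaptor boundary from the insert size.

```java
final class AdaptorClipSketch {
    /**
     * Insert-size based adaptor clipping is only meaningful when the two
     * reads of a pair point in opposite directions (one on each strand).
     */
    static boolean canClipAdaptor(final boolean readOnNegativeStrand,
                                  final boolean mateOnNegativeStrand) {
        return readOnNegativeStrand != mateOnNegativeStrand;
    }
}
```

With this guard, the oddly oriented pair drawn in the message (both reads on the negative strand) is left unclipped.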
Mark DePristo 0587a145bf Utils.dupString should allow 0 number of duplicates to produce empty string 2013-05-03 09:32:05 -04:00
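The behavior requested here can be sketched as follows. The signature and argument checks are assumptions for illustration; the actual Utils.dupString may differ.

```java
final class DupStringSketch {
    /** Returns s repeated nCopies times; zero copies yields the empty string. */
    static String dupString(final String s, final int nCopies) {
        if (s == null) throw new IllegalArgumentException("s must not be null");
        if (nCopies < 0) throw new IllegalArgumentException("nCopies must be >= 0 but got " + nCopies);
        final StringBuilder builder = new StringBuilder(s.length() * nCopies);
        for (int i = 0; i < nCopies; i++) {
            builder.append(s);
        }
        return builder.toString();
    }
}
```

Only negative counts are rejected, so `dupString("ab", 0)` returns `""` rather than throwing.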
Mark DePristo f5a301fb63 Bugfix for AlignmentUtils.trimCigarByBases
-- Previous version would trim down 2M2D2M into 2M if you asked for the first 2 bases, but this can result in incorrect alignment of the bases to the reference as the bases no longer span the full reference interval expected.  Fixed and added unit tests
2013-05-03 09:32:05 -04:00
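To see why trimming 2M2D2M down to 2M is a problem, it helps to compare the number of reference bases each CIGAR spans. The helper below is an illustrative sketch following the SAM specification (M, D, N, = and X consume reference bases; M, I, S, = and X consume read bases). It is not the AlignmentUtils code and does not reproduce the fix itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CigarSpanSketch {
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Number of reference bases covered by the CIGAR string. */
    static int referenceLength(final String cigar) {
        return sumLengths(cigar, "MDN=X");
    }

    /** Number of read bases covered by the CIGAR string. */
    static int readLength(final String cigar) {
        return sumLengths(cigar, "MIS=X");
    }

    // Adds up the lengths of all elements whose operator is in consumingOps.
    private static int sumLengths(final String cigar, final String consumingOps) {
        int total = 0;
        final Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            if (consumingOps.indexOf(m.group(2).charAt(0)) >= 0) {
                total += Integer.parseInt(m.group(1));
            }
        }
        return total;
    }
}
```

Here 2M2D2M covers 4 read bases over 6 reference bases, whereas 2M covers only 2 reference bases, so a trim that silently drops the deletion changes the reference interval the remaining bases are aligned against.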
Mark DePristo 2bcbdd469f leftAlignCigarSequentially now supports haplotypes with insertions and deletions where the deletion allele was previously removed by leftAlignSingleIndel during its cleanup phase. 2013-05-03 09:32:05 -04:00
Eric Banks aefddaa219 Merge pull request #210 from broadinstitute/gda_QUAL_fix_GSA-910
Rev'd up Picard to get PL fix: PLs were saturated to 32767 (Short.MAX_VA...
2013-05-03 06:28:10 -07:00
Mark DePristo 4f627e96c0 Merge pull request #211 from broadinstitute/md_kb_script
Starting NA12878 KB requires the use of Java 1.7
2013-05-03 04:30:23 -07:00
Mark DePristo b89d97cb9c Starting NA12878 KB requires the use of Java 1.7 2013-05-03 07:29:29 -04:00
Guillermo del Angel 0c30a5ebc6 Rev'd up Picard to get PL fix: PLs were saturated to 32767 (Short.MAX_VALUE) when converting from GL to integers. Increase capping to Integer.MAX_VALUE (2^31-1) which should be enough for reasonable sites now. Integration tests change because some tests have some hyper-deep pileups where this case was hit 2013-05-02 16:31:43 -04:00
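The saturation described above can be illustrated with a small sketch of the GL to PL conversion, assuming the usual Phred scaling PL = round(-10 * GL) for a log10 likelihood GL. The method name and the explicit `cap` parameter are hypothetical, and normalization against the best genotype is omitted; this is not Picard's code.

```java
final class PlCapSketch {
    /**
     * Converts a log10 likelihood into a Phred-scaled integer, clamped to
     * the range [0, cap].
     */
    static int toPhredScaled(final double log10Likelihood, final int cap) {
        final long phred = Math.round(-10.0 * log10Likelihood);
        return (int) Math.max(0L, Math.min(phred, (long) cap));
    }
}
```

With a GL of -5000, as might occur in a hyper-deep pileup, a cap of Short.MAX_VALUE yields 32767, while a cap of Integer.MAX_VALUE preserves the value 50000.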
Eric Banks c6df20cff5 Merge pull request #209 from broadinstitute/eb_more_fixes_to_bundle_script
Now that we don't generate dict and fai files, the resource script needs...
2013-05-02 12:23:41 -07:00
Eric Banks d981fd01b8 Now that we don't generate dict and fai files, the resource script needs to copy them to the bundle. 2013-05-02 15:18:13 -04:00
David Roazen 13bfa963da Revert changes to exampleFASTA.fasta.fai for now to get tests passing again 2013-05-02 12:59:20 -04:00
Mark DePristo dfdc0df4f1 Merge pull request #207 from broadinstitute/eb_fixing_bundle_script
Fixing the bundle script
2013-05-02 08:18:19 -07:00
Eric Banks f88a964e2c Adding .fai file to example fasta since we don't generate it anymore 2013-05-02 10:54:32 -04:00
Eric Banks 6d0e383a60 Fixing the bundle script
1. someone out there busted it when adding high confidence 1000G calls
2. new path to NA12878 bam
3. updated clashing version argument
2013-05-02 09:40:36 -04:00
Yossi Farjoun 4b8b411b92 - Fixed a small bug in the printout of molten data in GenotypeConcordance
The output didn't "mix up" the genotypes: it printed the same HET vs HET (e.g.) 3 times rather than the combinations of HET vs {HET, HOM, HOM_REF}, etc.
This was only a problem in the text, _not_ the actual numbers, which were outputted correctly.

- Updated MD5's after looking at diffs to verify that the change is what I expected.
2013-05-02 09:16:07 -04:00
Mark DePristo 803f666fd5 Merge pull request #204 from broadinstitute/rp_cgl_force_sample_name
Adding argument to CGL to not stratify by sample name. This is useful wh...
2013-05-01 13:59:20 -07:00
Ryan Poplin ad84f15572 Adding argument to CGL to not stratify by sample name. This is useful when running with 1000G so that you don't get 1000s of lines on the plots. 2013-05-01 16:25:11 -04:00
droazen 19b59f0fb9 Merge pull request #206 from broadinstitute/dr_update_tests_for_java7
Update expected test output for Java 7
2013-05-01 13:23:25 -07:00
David Roazen f3c94a3c87 Update expected test output for Java 7
-Changes in Java 7 related to comparators / sorting produce a large number
 of innocuous differences in our test output. Updating expectations now
 that we've moved to using Java 7 internally.

-Also incorporate Eric's fix to the GATKSAMRecordUnitTest to prevent
 intermittent failures.
2013-05-01 16:18:01 -04:00
David Roazen d0980e236a Merged bug fix from Stable into Unstable 2013-05-01 01:08:29 -04:00
David Roazen f57256b6c2 Delete unused FastaSequenceIndexBuilder class and accompanying test
This class, being unused, was no longer getting packaged into the
GATK release jar by bcel, and so attempting to run its unit test
on the release jar was producing an error.
2013-05-01 01:02:01 -04:00
David Roazen 2edb286d1c Merged bug fix from Stable into Unstable 2013-04-30 22:33:36 -04:00
David Roazen 3390fc7d67 Include cofoja jar in classpath when testing release jars
-Even though we're no longer compiling/using contracts in tests,
 we still need the cofoja jar in the classpath when testing the
 release jars due to some bad behavior on the part of TestNG in
 not being able to handle missing annotation classes.

-We don't need to package the cofoja classes in the actual GATK
 jar, however (and we never have).
2013-04-30 22:16:58 -04:00
Eric Banks e29b52b9a5 Merge remote-tracking branch 'unstable/master' 2013-04-30 15:31:33 -04:00
Mark DePristo a0a1a366e3 Merge pull request #201 from broadinstitute/eb_fix_reduced_count_tagging
Setting the reduce reads count tag was all wrong in a previous commit; fixing
2013-04-30 12:14:40 -07:00
Eric Banks 58424e56be Setting the reduce reads count tag was all wrong in a previous commit; fixing.
RR counts are represented as offsets from the first count, but that wasn't being done
correctly when counts are adjusted on the fly.  Also, we were triggering the expensive
conversion and writing to binary tags even when we weren't going to write the read
to disk.

The code has been updated so that unconverted counts are passed to the GATKSAMRecord
and it knows how to encode the tag correctly.  Also, there are now methods to write
to the reduced counts array without forcing the conversion (and methods that do force
the conversion).

Also:
1. counts are now maintained as ints whenever possible.  Only the GATKSAMRecord knows
about the internal encoding.
2. as discussed in meetings today, we updated the encoding so that it can now handle
a range of values that extends to 255 instead of 127 (and is backwards compatible).
3. tests have been moved from SyntheticReadUnitTest to GATKSAMRecordUnitTest accordingly.
2013-04-30 13:45:42 -04:00
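One way to picture an offset-based encoding that covers 0 to 255 in a single byte per position is sketched below. This is an assumption-laden illustration (class name, clamping and modulo-256 wrap-around are all hypothetical choices), not the encoding actually used by GATKSAMRecord, and it makes no claim about backwards compatibility.

```java
final class ReducedCountCodecSketch {
    private static final int MAX_COUNT = 255;

    private static int clamp(final int count) {
        return Math.max(0, Math.min(count, MAX_COUNT));
    }

    /** Stores the first count as an unsigned byte and the rest as offsets from it. */
    static byte[] encode(final int[] counts) {
        final byte[] encoded = new byte[counts.length];
        if (counts.length == 0) return encoded;
        final int first = clamp(counts[0]);
        encoded[0] = (byte) first;
        for (int i = 1; i < counts.length; i++) {
            // The narrowing cast wraps the offset modulo 256.
            encoded[i] = (byte) (clamp(counts[i]) - first);
        }
        return encoded;
    }

    /** Recovers int counts; callers never see the byte-level representation. */
    static int[] decode(final byte[] encoded) {
        final int[] counts = new int[encoded.length];
        if (encoded.length == 0) return counts;
        final int first = encoded[0] & 0xFF;
        counts[0] = first;
        for (int i = 1; i < encoded.length; i++) {
            counts[i] = (first + encoded[i]) & 0xFF;
        }
        return counts;
    }
}
```

Keeping the counts as ints everywhere except inside `encode` and `decode` mirrors point 1 above: only one place needs to know about the byte-level representation.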
delangel 15266da51c Merge pull request #203 from broadinstitute/gda_poolcaller_paper
Updates to pool caller scala script due to new paths and cleanup, hopefu...
2013-04-30 07:12:02 -07:00
Guillermo del Angel 95637e03a0 Updates to pool caller scala script due to new paths and cleanup, hopefully with final changes for paper.
Also added the R script used to process everything into a couple of ggplot-friendly data frames.
Functionality is basically the same. Enhancements:
-- Add annotation to log axiom and Exome Chip AC along with LOF results for concordance comparisons.
-- General Cleanup.
-- Used base path for files as a variable in case directory structure in gsa-hpprojects changes again.
-- Output also per-pool data by subsetting genotypes per pool and comparing with corresponding genotypes from Axiom, exome chip and omni.
-- Commit R scripts that load all tables and crunch them to analyze them.
2013-04-30 10:05:35 -04:00
Mark DePristo 6ea2bceb55 Merge pull request #202 from broadinstitute/yf_remove_getlength_from_every_GATKBAMIndex_read
GATKBAMIndex calls buffer.length() on every read.
2013-04-30 06:24:34 -07:00
Mark DePristo 73fcacbf1b Change Long to long 2013-04-30 09:21:10 -04:00
Eric Banks a3a2ec5a1c Merge pull request #200 from broadinstitute/gda_ug_rr_bug_48742591
Fix for indel calling with UG in presence of reduced reads: When a read ...
2013-04-29 17:54:43 -07:00
Guillermo del Angel 20d3137928 Fix for indel calling with UG in presence of reduced reads: When a read is long enough that there's no reference context available, the read gets clipped so that it falls within the reference context range again. However, the clipping was incorrect, as it made the read end precisely at the end of the reference context coordinates. This could lead to a read spanning beyond the haplotype if one of the candidate haplotypes is shorter than the reference context (as is the case with deletions, for example). In this case, the HMM does not work properly and the likelihood is bad, since "insertions" at the end of the read, past the end of the haplotype, are penalized and the likelihood is much lower than it should be.
-- Added check to see if the read spans beyond the reference window MINUS padding and event length. This guarantees that the read will always be contained in the haplotype.
-- Updated md5's that changed because long reads from old 454 data have their likelihoods changed by the extra base clipping.
2013-04-29 19:33:02 -04:00
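The check described in the first bullet can be pictured roughly as follows. All names and the exact coordinate conventions are assumptions for illustration; the sketch works purely in reference coordinates and ignores indels inside the read.

```java
final class ReadWindowClipSketch {
    /**
     * Last reference position a read may cover so that it still fits inside
     * the shortest candidate haplotype (hypothetical parameter names).
     */
    static int maxReadStop(final int refWindowStop, final int padding, final int eventLength) {
        return refWindowStop - padding - Math.abs(eventLength);
    }

    /** Number of reference positions by which the read overshoots, or 0 if it fits. */
    static int basesToClip(final int readStop, final int refWindowStop,
                           final int padding, final int eventLength) {
        return Math.max(0, readStop - maxReadStop(refWindowStop, padding, eventLength));
    }
}
```

Clipping to the window end alone corresponds to `padding = 0` and `eventLength = 0`, which is the situation the commit describes as insufficient when a haplotype carries a deletion.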
Yossi Farjoun 0e7e6d35d8 GATKBAMIndex calls buffer.length() on every read. This is causing much pain.
Optimized by getting the length of the file upon opening the index-file and using that instead.
2013-04-29 12:49:02 -04:00
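The optimization amounts to looking the length up once and reusing it. A minimal sketch of that pattern, with hypothetical names and a `LongSupplier` standing in for the expensive length lookup (GATKBAMIndex itself is not reproduced here):

```java
import java.util.function.LongSupplier;

final class CachedLengthSketch {
    // Primitive long, computed once when the index is opened.
    private final long fileLength;

    CachedLengthSketch(final LongSupplier expensiveLengthLookup) {
        this.fileLength = expensiveLengthLookup.getAsLong();
    }

    /** Bounds check for a read of nBytes at position, using the cached length. */
    boolean canRead(final long position, final int nBytes) {
        return position >= 0 && nBytes >= 0 && position + nBytes <= fileLength;
    }
}
```

Every subsequent bounds check then costs a comparison against a field instead of a call into the underlying buffer or file.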
Eric Banks c5701a9ade Merge pull request #199 from broadinstitute/md_clipped_reduced_reads
Bugfix for ReadClipper with ReducedReads
2013-04-29 09:14:43 -07:00
Mark DePristo 0387ea8df9 Bugfix for ReadClipper with ReducedReads
-- The previous version of the read clipping operations wouldn't modify the reduced reads counts, so hardClipToRegion would result in a read with, say, 50 bp of sequence and base qualities but 250 bp of reduced read counts.  Updated the hardClip operation to handle reduced reads, and added a unit test to make sure this works properly.  Also had to update GATKSAMRecord.emptyRead() to set the reduced count to new byte[0] if the template read is a reduced read.
-- Update md5s, where the new code recovers a TP variant with count 2 that was missed previously
2013-04-29 11:12:09 -04:00
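The invariant being restored is that bases, qualities and reduced counts always have the same length after a hard clip. A simplified sketch with a hypothetical class (not ReadClipper or GATKSAMRecord) that clips all three arrays with the same range:

```java
import java.util.Arrays;

final class ReducedReadClipSketch {
    final byte[] bases;
    final byte[] quals;
    final int[] reducedCounts;

    ReducedReadClipSketch(final byte[] bases, final byte[] quals, final int[] reducedCounts) {
        if (bases.length != quals.length || bases.length != reducedCounts.length) {
            throw new IllegalArgumentException("bases, quals and counts must have equal length");
        }
        this.bases = bases;
        this.quals = quals;
        this.reducedCounts = reducedCounts;
    }

    /** Keeps read positions [start, end) in all three arrays and drops the rest. */
    ReducedReadClipSketch hardClipTo(final int start, final int end) {
        if (start < 0 || end > bases.length || start > end) {
            throw new IllegalArgumentException("invalid clip range " + start + "-" + end);
        }
        return new ReducedReadClipSketch(
                Arrays.copyOfRange(bases, start, end),
                Arrays.copyOfRange(quals, start, end),
                Arrays.copyOfRange(reducedCounts, start, end));
    }
}
```

Because the constructor enforces equal lengths, a clip that forgot one of the arrays would fail immediately instead of producing a read with 50 bp of sequence and 250 bp of counts.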
Mark DePristo 5dd73ba2d1 Merge pull request #198 from broadinstitute/mc_reduce_reads_ds_doc
Updates GATKDocs for ReduceReads downsampling
2013-04-27 05:49:47 -07:00
delangel 651e1f23b1 Merge pull request #194 from broadinstitute/gda_ancient_dna_newPipeline
Add feature to specify Allele frequency priors by command line when call...
2013-04-27 04:59:09 -07:00
Mauricio Carneiro 76e997895e Updates GATKDocs for ReduceReads downsampling
[fixes #48258295]
2013-04-26 23:33:44 -04:00
Guillermo del Angel 4168aaf280 Add feature to specify Allele frequency priors by command line when calling variants.
Use case:
The default AF priors used (infinite sites model, neutral variation) are appropriate in the case where the reference allele is ancestral, and the called allele is a derived allele.
Most of the time this is true, but in several population studies and in ancient DNA analyses this might introduce reference biases, and in some other cases it's hard to ascertain what the ancestral allele is (normally requiring a lookup of the homologous chimp sequence).
Specifying no prior is one solution, but this may introduce a lot of artifactual het calls in shallower coverage regions.
With this option, users can specify what the prior for each AC should be according to their needs, subject to the restrictions documented in the code and in GATK docs.
-- Updated ancient DNA single sample calling script with filtering options and other cleanups.
-- Added integration test. Removed old -noPrior syntax.
2013-04-26 19:06:39 -04:00
Mark DePristo 759c531d1b Merge pull request #197 from broadinstitute/dr_disable_snpeff_version_check
Add support for snpEff "GATK compatibility mode" (-o gatk)
2013-04-26 13:55:14 -07:00
David Roazen 7d90bbab08 Add support for snpEff "GATK compatibility mode" (-o gatk)
-Do not throw an exception when parsing snpEff output files
 generated by not-officially-supported versions of snpEff,
 PROVIDED that snpEff was run with -o gatk

-Requested by the snpEff author

-Relevant integration tests updated/expanded
2013-04-26 15:47:15 -04:00
Mark DePristo ec8fb9860a Merge pull request #196 from broadinstitute/rp_cgl_allele_matching
In CGL ensure that the alleles match exactly between the comp track and ...
2013-04-26 12:38:59 -07:00
Ryan Poplin 93fc48739a In CGL ensure that the alleles match exactly between the comp track and the external likelihoods track.
-- Mostly important for indels.
-- Added integration tests to cover this and the new skipFiltered argument.
2013-04-26 15:01:04 -04:00
Mark DePristo 071fd67d55 Merge pull request #193 from broadinstitute/eb_contamination_fixing_for_reduced_reads
Eb contamination fixing for reduced reads
2013-04-26 09:48:45 -07:00