1300 Commits (919c3eaa2e0c4038f3a09ef8440dadd814de4e31)

Author SHA1 Message Date
Geraldine Van der Auwera 919c3eaa2e Numerous doc fixes; mostly formatting and clarifications 2015-05-03 00:28:46 +02:00
Laura Gauthier 97caf94807 Fix implementation of allowNonUniqueKmersInRef so that it applies to all kmer sizes 2015-04-23 13:01:47 -04:00
Ron Levine d5f98e99f0 Bypass reads with a bad CIGAR length 2015-04-21 11:55:56 -04:00
Kristian Cibulskis 45610a142c initial refactoring of arguments into individual argument collections
fix blasted license blurbs

updates based on PR comments (abstractify HaplotypeCallerArgumentCollection into AssemblyBasedCallerArgumentCollection)

comments on comments from PR review
2015-04-07 16:55:32 -04:00
Geraldine Van der Auwera 2053afe52a Merge pull request #914 from broadinstitute/ldg_fixDitheringRandomness
Initialize annotations so that --disableDithering actually works
2015-04-06 15:40:30 -04:00
Yossi Farjoun d30a6258bc added the missing file to the error message 2015-04-06 08:21:55 -04:00
Laura Gauthier 9c842df3a3 Initialize annotations so that --disableDithering actually works 2015-04-02 17:34:46 -04:00
Geraldine Van der Auwera d7f7022dce Merge pull request #904 from broadinstitute/pd_orig_dp
Added keepOriginalDP argument to SelectVariants
2015-03-30 09:01:33 -04:00
Laura Gauthier 5a10758e2e Annotation changes for M2:
Build a ReferenceContext in ActiveRegionWalkers to pass in to annotation engine so we can call the TandemRepeatAnnotator from M2
Make TandemRepeatAnnotator default annotation for M2.
Setup (but don't use yet) HC-style contamination downsampling.
New HC integration test with TandemRepeatAnnotator
2015-03-27 18:25:23 -04:00
Ron Levine aef0a83c52 Automatically choose indexing strategy by file extension 2015-03-27 11:10:35 -04:00
Phillip Dexheimer c97c253ec8 Added keepOriginalDP argument to SelectVariants
Fixes #830
2015-03-25 22:45:31 -04:00
Phillip Dexheimer 9e63696315 Remove indel-length normalization of QD for GGVCFs
* Fixes #848
* length normalization is now only applied if the annotation is calculated in UG
2015-03-24 08:22:19 -04:00
Geraldine Van der Auwera 0a45b2d79d Merge pull request #883 from broadinstitute/rhl_hc_mq0
Exclude MappingQualityZero from default annotations
2015-03-23 12:59:08 -04:00
Ami Levy-Moonshine c5fc5c4f8c create 2 new tools:
- ASEReadCounter (public tool) replaces Tuuli's script to produce the input to Manny's tool.
  It counts the number of reads that support the ref allele and the alt allele, filtering out low-quality reads and bases and keeping only properly paired reads
- ASECaller (private tool) takes both RNA and DNA and produces contingency tables ** still under development **

minor changes in other tools:
- update RNA HC variant calling scala script
- expose FS method pValueForContingencyTable so that it can be called from ASECaller

In ASEReadCounter:
- allow different options for dealing with overlapping reads from the same fragment
- add option to ignore or include indels in the pileups
- add option to disable the DuplicateRead filter

add ASEReadCounterIntegrationTest.java and files for the test
2015-03-21 16:56:00 -04:00
Ron Levine 46668d469a Exclude MappingQualityZero from default annotations 2015-03-17 21:46:18 -04:00
Kristian Cibulskis ab1053e83c It compiles, and produces results!
fixed NPE when normal contains no reads

first integration test (micro) and unit tests, also rename of MuTectHC -> M2

adding in standard GATK license terms

incorporated HOSTILE mode to PCR Error Correction

removed tumor and normal name parameters and cleaned up internal name handling

changes to allow for calling without a matched normal (technically, not true 'tumor-only' calling).  Used for panel-of-normals creation

additional regression tests, based on DREAM data.  Removed accidental addition of TandemRepeatAnnotator to default annotations

updated MD5 based on run from GSA4 to fix bamboo issue

reverted unneeded visibility changes
2015-03-13 18:28:01 -04:00
Geraldine Van der Auwera 39a972f348 Merge pull request #872 from broadinstitute/eb_create_rgq_format_field
Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs. Fixes #870
2015-03-13 13:59:53 -04:00
Eric Banks 1ff9463285 Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs.
Now, instead of stripping out the GQs for mono sites, we transfer them to the RGQ.
This is extremely useful for people who want to know how confident the hom ref genotype calls are.
Perhaps this is just what CRSP needs for pertinent negatives.

Note that I also changed the tool to no longer use the GenotypeSummaries annotation by default since
it was adding some seemingly unnecessary annotations (like mean GQ now that we keep the GQ around and
number of no-calls).  Let me know if this was a mistake (although Laura gave me a thumbs up).
2015-03-13 10:27:20 -04:00
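
The GQ-to-RGQ transfer described in commit 1ff9463285 above can be illustrated with a minimal Python sketch. This is not the GenotypeGVCFs code: the function name `transfer_gq_to_rgq`, the dictionary representation of a sample's FORMAT fields, and the `is_monomorphic` flag are all assumptions made for illustration.

```python
def transfer_gq_to_rgq(format_fields, is_monomorphic):
    """Return a copy of a sample's FORMAT fields with GQ moved to RGQ.

    Hypothetical helper: at a monomorphic site the GQ value is kept
    under the RGQ key instead of being stripped; at variant sites the
    fields are returned unchanged.
    """
    fields = dict(format_fields)  # copy so the caller's dict is untouched
    if is_monomorphic and "GQ" in fields:
        fields["RGQ"] = fields.pop("GQ")
    return fields


# Example: a hom-ref call at a monomorphic site keeps its confidence as RGQ.
mono = transfer_gq_to_rgq({"GT": "0/0", "DP": 94, "GQ": 99}, is_monomorphic=True)
poly = transfer_gq_to_rgq({"GT": "0/1", "DP": 94, "GQ": 99}, is_monomorphic=False)
```

Keeping the value under a separate key lets downstream users tell a reference-genotype confidence apart from the GQ of a called variant.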
Phillip Dexheimer 6ffa295963 Regression: The new 'includeUnmapped' PartitionBy annotation was incorrectly set for HC
Fixes #828
2015-03-13 00:24:57 -04:00
Eric Banks ea8a1edeb6 Adding option to CombineGVCFs to have it break blocks at every N sites.
Using --breakBandsAtMultiplesOf N will ensure that no reference blocks span across
genomic positions that are multiples of N.  This is especially important in the
case of scatter-gather where you don't want your scatter intervals to start in the
middle of blocks (because of a limitation in the way -L works in the GATK for VCF
records with the END tag).

For example, running with --breakBandsAtMultiplesOf 5 on this record:
1       69491   .       G       <NON_REF>       .       .       END=69523       GT:DP:GQ:MIN_DP:MIN_GQ:PL       ./.:94:99:82:99:0,120,1800

Will produce the following records:
1       69491   .       G       <NON_REF>       .       .       END=69494       GT:DP:GQ:MIN_DP:MIN_GQ:PL       ./.:94:99:82:99:0,120,1800
1       69495   .       C       <NON_REF>       .       .       END=69499       GT:DP:GQ:MIN_DP:MIN_GQ:PL       ./.:94:99:82:99:0,120,1800
1       69500   .       T       <NON_REF>       .       .       END=69504       GT:DP:GQ:MIN_DP:MIN_GQ:PL       ./.:94:99:82:99:0,120,1800
etc.

Added docs and a new test.
2015-03-12 14:42:10 -04:00
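
The block-splitting rule from commit ea8a1edeb6 above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming that a new block starts at every position that is a multiple of N (as in the example records); the function name `split_block` is hypothetical, and the sketch ignores the reference-base lookup that the real tool needs for the REF column of each new record.

```python
def split_block(start, end, n):
    """Split the inclusive interval [start, end] so that a new block
    begins at every position that is a multiple of n.

    Hypothetical helper; returns a list of (start, end) tuples.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    blocks = []
    pos = start
    while pos <= end:
        # First multiple of n strictly greater than pos.
        next_multiple = (pos // n + 1) * n
        # The block ends just before that multiple, or at the original END.
        block_end = min(end, next_multiple - 1)
        blocks.append((pos, block_end))
        pos = block_end + 1
    return blocks


# The record from the commit message: position 69491 with END=69523, N=5.
blocks = split_block(69491, 69523, 5)
```

With these inputs the first three blocks are 69491-69494, 69495-69499 and 69500-69504, matching the records shown in the commit message.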
Valentin Ruano Rubio f8f2680142 Merge pull request #812 from broadinstitute/ldg_combineData_submit
New walker to combine WGS and WES data
2015-03-02 15:12:31 -05:00
Laura Gauthier aaf952469e Change UG @PartitionBy to fix Queue tests 2015-03-01 14:42:43 -05:00
Laura Gauthier 6ebcba5234 New walker to combine data for different formats of same sample that were called and VQSRed together; has functionality to combine only specified samples, omitting others (e.g. combine the uniquified NA12878s with -usn NA12878.variant51 -usn NA12878.variant102)
GenotypeGVCFs now has the ability to unique-ify samples so I can genotype together two different datasets containing the same sample
Modify InbreedingCoeff so that it works when genotyping uniquified samples
2015-03-01 12:44:32 -05:00
ldgauthier 8efaa97d84 Merge pull request #815 from broadinstitute/ldg_updateMulitallelicVAtestData
Update test data so it better reflects the multiallelic AC/AF annotation...
2015-03-01 12:10:25 -05:00
Ron Levine 44e5965a4b Change GC Content value type from Integer to Float 2015-02-25 13:56:42 -05:00
Laura Gauthier 4a493a7900 Update test data so it better reflects the multiallelic AC/AF annotation use case 2015-02-20 19:02:42 -05:00
Ron Levine 2cbaef2fb2 Throw exception for -dcov argument given to ActiveRegionWalkers 2015-02-19 08:24:39 -05:00
Ron Levine c3ff6df252 StrandAlleleCountsBySample can only be called from HaplotypeCaller 2015-02-12 13:43:48 -05:00
Phillip Dexheimer 92c7c103c1 GenotypeConcordance: monomorphic sites in truth are no longer called "Mismatching Alleles" when the comp genotype has an alternate allele
* PT 84700606
2015-02-07 15:54:38 -05:00
rpoplin b8b23b931e Merge pull request #807 from broadinstitute/rhl_handle_cigar
Process X and = CIGAR operators
2015-02-01 11:09:52 -05:00
Phillip Dexheimer 3354c07b1c Added optional element "includeUnmapped" to the PartitionBy annotation
* The value of this element (default true) determines whether Queue will explicitly run this walker over unmapped reads
* This patch fixes a runtime error when FindCoveredIntervals was used with Queue
* PT 81777160
2015-01-31 15:47:57 -05:00
Ron Levine 9d4b876ccd Process X and = CIGAR operators
Add simple BaseRecalibrator integration test for CIGAR = and X operators
2015-01-29 17:00:00 -05:00
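
As background for commit 9d4b876ccd above: in the SAM specification, the `=` (sequence match) and `X` (sequence mismatch) operators consume both read and reference bases, exactly like `M`. The Python sketch below shows what handling them consistently means for length calculations. It is not the GATK implementation, and the function name `cigar_lengths` is an assumption.

```python
import re

# Operators that consume read (query) bases and reference bases,
# following the SAM specification.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")

CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar):
    """Return (read_length, reference_span) implied by a CIGAR string."""
    # Reject anything that is not a sequence of <length><operator> pairs.
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError("malformed CIGAR: %r" % cigar)
    query = ref = 0
    for length, op in CIGAR_ELEMENT.findall(cigar):
        if op in CONSUMES_QUERY:
            query += int(length)
        if op in CONSUMES_REF:
            ref += int(length)
    return query, ref


# "4=1X5=" describes the same alignment footprint as "10M".
extended = cigar_lengths("4=1X5=")
classic = cigar_lengths("10M")
```

A read whose sequence length disagrees with the first value returned here has an inconsistent CIGAR, which is the kind of record a caller may choose to skip.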
Khalid Shakir 1808c90d2a Added introductory CRAM support.
Replaced usage of GATKSamRecordFactory with calls to wrapper GATKSAMRecord extending SAMRecord.
Minor other updates for test changes.
Added exampleCRAM.cram generated by GATK, with .bai and .crai indexes generated by CRAMTools.
CRAM-to-CRAM test disabled due to https://github.com/samtools/htsjdk/issues/148
Using exampleBAM.bam input, outputs of GATK's generated CRAM match CRAMTools generated CRAM, but not samtools/PrintReads SAM output, as things like insert sizes are different.
If required for other tools, CRAM indexes must be generated via CRAMTools until we can generate them via CRAMFileWriter.

Generation of exampleCRAM.cram:
* java -jar target/executable/GenomeAnalysisTK.jar -T PrintReads -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -o public/gatk-utils/src/test/resources/exampleCRAM.cram
* java -jar cramtools-2.1.jar index -I public/gatk-utils/src/test/resources/exampleCRAM.cram
* java -jar cramtools-2.1.jar index -I public/gatk-utils/src/test/resources/exampleCRAM.cram --bam-style-index

CRAM generation by existing tools:
* samtools view -C -T public/gatk-utils/src/test/resources/exampleFASTA.fasta -o testSamtools.cram public/gatk-utils/src/test/resources/exampleBAM.bam
* java -jar cramtools-2.1.jar cram --ignore-md5-mismatch --capture-all-tags -Q -n -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -O testCRAMTools.cram
* java -jar target/executable/GenomeAnalysisTK.jar -T PrintReads -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -o testGATK.cram

CRAMTools view of the above:
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleCRAM.cram | tail -n 1
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testSamtools.cram | tail -n 1
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testCRAMTools.cram | tail -n 1
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testGATK.cram | tail -n 1
2015-01-26 14:47:39 -03:00
Phillip Dexheimer 72f76add71 Added -trimAlternates argument to SelectVariants
* PT 84021222
* -trimAlternates removes all unused alternate alleles from variants.  Note that this is pretty aggressive for monomorphic sites
2015-01-21 21:33:35 -05:00
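
The idea behind -trimAlternates in commit 72f76add71 above, dropping alternate alleles that no genotype uses, can be sketched as follows. This is a simplified illustration and not the SelectVariants code: the function name `trim_alternates` and the representation of genotypes as tuples of allele indices (0 for the reference, `None` for a no-call) are assumptions, and any annotation updates the real tool may perform are ignored.

```python
def trim_alternates(alts, genotypes):
    """Drop alternate alleles not referenced by any genotype.

    alts:      list of alternate allele strings (allele index 1, 2, ...)
    genotypes: list of tuples of allele indices (0 = ref, None = no-call)
    Returns (kept_alts, genotypes re-indexed against kept_alts).
    """
    # Alternate allele indices that appear in at least one genotype.
    used = sorted({i for gt in genotypes for i in gt if i is not None and i > 0})
    # Map old allele indices to their new positions; the reference stays 0.
    remap = {0: 0}
    for new_index, old_index in enumerate(used, start=1):
        remap[old_index] = new_index
    kept_alts = [alts[i - 1] for i in used]
    new_genotypes = [
        tuple(None if i is None else remap[i] for i in gt) for gt in genotypes
    ]
    return kept_alts, new_genotypes


# Only the second alternate ("T") is used, so "A" and "G" are dropped.
trimmed = trim_alternates(["A", "T", "G"], [(0, 2), (2, 2), (None, None)])
# At a monomorphic site every alternate is unused, so all are removed.
mono = trim_alternates(["A"], [(0, 0)])
```

The second call shows why the commit message calls the option aggressive for monomorphic sites: no alternate allele survives.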
Ron Levine 804b2a36b7 Fix SplitNCigar reads exception by making the list of RNAReadTransformer non-abstract, add test for -fixNDN
Includes documentation changes for -fixNDN argument and the read transformer documentation.

Documentation changes to CombineVariants
2015-01-14 22:22:05 -05:00
rpoplin 0292d49842 Merge pull request #801 from broadinstitute/pd_gatkvcfconstants
Collected VCF IDs and header lines into one place
2015-01-14 09:43:48 -05:00
Phillip Dexheimer 6190d660e0 Edits to work with the latest htsjdk release:
* TextCigarCodec.decode() is now static, and the getSingleton() method is gone
* MergingSamRecordIterator now wants a Collection<SamReader> rather than Collection<SAMFileReader> in the constructor
* SeekableBufferedStream now correctly reads the requested number of bytes, removed workaround in GATKBAMIndex
2015-01-13 21:32:10 -05:00
Phillip Dexheimer b73e9d506a Added GATKVCFConstants and GATKVCFHeaderLines to consolidate the GATK-specific VCF annotations
* Removed unused annotations (CCC and HWP)
* Renamed one of the two GC annotations to "IGC" (for Interval GC)
* Revved picard & htsjdk (GATK constants are now removed from htsjdk)
* PT 82046038
2015-01-13 21:32:09 -05:00
Laura Gauthier 6b2bd5ed09 Address user-reported bug featuring "trio" family with two children, one parent
Add test to cover case with family of one parent, two children
2015-01-13 18:35:44 -05:00
Ryan Poplin 2e5f9db758 Raising per-sample limits on the number of reads in ART and HC.
-- Active Region Traversal was using per sample limits on the number of reads that were too low, especially now that we are running one sample at a time. This caused issues with high confidence variants being dropped in high coverage data.
-- HaplotypeCallerGVCFIntegrationTest PL/annotation changes due to using more reads in those tests
-- Removed a CountReadsInActiveRegionsIntegrationTest test for excessive coverage because the read coverage no longer goes over the limits in ART
2015-01-09 11:21:42 -05:00
rpoplin 03203e249e Merge pull request #792 from broadinstitute/rhl_pairhmm_log_stderr
Rhl pairhmm log stderr
2015-01-07 12:41:10 -05:00
Valentin Ruano-Rubio aae04b6122 Fixes explicit limitation of the maximum ploidy of the reference-confidence model
Story:
=====

 - https://www.pivotaltracker.com/story/show/83803796

Changes:
=======

  - From a fixed-maximum-ploidy indel RCM likelihood cache to a
    dynamically resizable one.
  - Used the occasion to remove an unused and deprecated method from ReferenceConfidenceModel

Testing:
=======

  - Added integration test to check on ploidies larger than the previous limit of 20.
2015-01-07 10:43:22 -05:00
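
The change in commit aae04b6122 above, from a cache with a fixed maximum ploidy to one that resizes on demand, follows a common pattern that can be sketched in Python. This is a generic illustration rather than the ReferenceConfidenceModel code: the class name `PloidyCache`, the `compute` callback, and the doubling growth policy are all assumptions. The initial capacity of 20 only echoes the previous limit mentioned in the commit message.

```python
class PloidyCache:
    """Cache of per-ploidy values that grows when a larger ploidy is requested.

    `compute` is a hypothetical callback producing the value for a ploidy;
    it must not return None, since None marks an empty slot.
    """

    def __init__(self, compute, initial_capacity=20):
        self._compute = compute
        self._entries = [None] * (initial_capacity + 1)

    def capacity(self):
        """Largest ploidy that currently fits without resizing."""
        return len(self._entries) - 1

    def get(self, ploidy):
        if ploidy < 0:
            raise ValueError("ploidy must be non-negative")
        if ploidy >= len(self._entries):
            # Grow instead of failing: at least double, and always
            # enough to hold the requested ploidy.
            new_size = max(ploidy + 1, 2 * len(self._entries))
            self._entries.extend([None] * (new_size - len(self._entries)))
        if self._entries[ploidy] is None:
            self._entries[ploidy] = self._compute(ploidy)
        return self._entries[ploidy]


# Usage: record which ploidies were actually computed.
computed = []
cache = PloidyCache(lambda p: computed.append(p) or [0.0] * (p + 1))
small = cache.get(3)
large = cache.get(50)   # beyond the initial capacity of 20
again = cache.get(50)   # served from the cache, not recomputed
```

Growing geometrically keeps the number of reallocations small when requests arrive in increasing order of ploidy.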
Ron Levine b4fda38922 Use logging system instead of stderr 2015-01-05 14:04:10 -05:00
Laura Gauthier 88b6f3aa50 Change []-type arrays to lists so argument parsing works in VCF header commandline output 2015-01-05 10:21:06 -05:00
rpoplin 3240b3538a Merge pull request #794 from broadinstitute/rhl_read_backed_phasing
Rhl read backed phasing
2015-01-05 09:47:25 -05:00
Ron Levine c6840124fe clean up, add final 2015-01-04 23:01:24 -05:00
Ron Levine 85dc703461 Add TestMergeIntoMNP() and TestReallyMergeIntoMNP() 2015-01-01 09:51:20 -05:00
Ron Levine bb94833750 Add more tests 2014-12-30 22:45:44 -05:00
Ron Levine 714d575e3b correct reference file name 2014-12-25 14:00:39 -05:00
Ron Levine a7fba5c209 restructure and add more tests 2014-12-25 13:57:54 -05:00