Commit Graph

13849 Commits (c374d126d7c8b40cdec76c2fa7283c8c372d7ea7)

Author SHA1 Message Date
Geraldine Van der Auwera c374d126d7 Merge pull request #808 from broadinstitute/pd_gsalib_concordance
Added gsa.reshape.concordance.table function to gsalib
2015-03-17 00:05:30 -04:00
Phillip Dexheimer 4d4d33404e Added gsa.reshape.concordance.table function to gsalib 2015-03-16 22:52:27 -04:00
Geraldine Van der Auwera 517320092c Merge pull request #863 from broadinstitute/kc_m2_initial_commit
Seeking comments on visibility changes to HaplotypeCaller-related classes

Welcome to GATK-master, MuTect2!
2015-03-13 21:05:39 -04:00
Kristian Cibulskis ab1053e83c It compiles, and produces results!
fixed NPE when normal contains no reads

first integration test (micro) and unit tests, also rename of MuTectHC -> M2

adding in standard GATK license terms

incorporated HOSTILE mode to PCR Error Correction

removed tumor and normal name parameters and cleaned up internal name handling

changes to allow for calling without a matched normal (technically, not true 'tumor-only' calling).  Used for panel-of-normals creation

additional regression tests, based on DREAM data.  Removed accidental addition of TandemRepeatAnnotator to default annotations

updated MD5 based on run from GSA4 to fix bamboo issue

reverted unneeded visibility changes
2015-03-13 18:28:01 -04:00
Geraldine Van der Auwera 1d39ed9156 Merge pull request #814 from broadinstitute/biocyberman_maven_patches
Biocyberman maven patches
2015-03-13 16:26:02 -04:00
Geraldine Van der Auwera 39a972f348 Merge pull request #872 from broadinstitute/eb_create_rgq_format_field
Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs. Fixes #870
2015-03-13 13:59:53 -04:00
Geraldine Van der Auwera 7681e89454 Merge pull request #869 from broadinstitute/gg_fix_vqsr_plots_GSA-860
Switched VQSR tranches plot ordering rule
2015-03-13 10:46:55 -04:00
Geraldine Van der Auwera 3276a964f4 Merge pull request #871 from broadinstitute/pd_queue_unmapped_regression
Regression: The new 'includeUnmapped' PartitionBy annotation was incorrectly set for HC
2015-03-13 10:42:52 -04:00
Eric Banks 1ff9463285 Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs.
Now, instead of stripping out the GQs for mono sites, we transfer them to the RGQ.
This is extremely useful for people who want to know how confident the hom ref genotype calls are.
Perhaps this is just what CRSP needs for pertinent negatives.

Note that I also changed the tool to no longer use the GenotypeSummaries annotation by default since
it was adding some seemingly unnecessary annotations (like mean GQ now that we keep the GQ around and
number of no-calls).  Let me know if this was a mistake (although Laura gave me a thumbs up).
2015-03-13 10:27:20 -04:00
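The GQ-to-RGQ transfer described in the commit above can be illustrated with a minimal sketch. It assumes a genotype is represented as a plain dictionary of FORMAT fields; the function name `transfer_gq_to_rgq` and the `is_monomorphic` flag are hypothetical and are not taken from the GATK code base.

```python
def transfer_gq_to_rgq(genotype, is_monomorphic):
    """Return a copy of the FORMAT fields with GQ moved to RGQ at monomorphic sites.

    Hypothetical helper for illustration only; not the GenotypeGVCFs implementation.
    """
    fields = dict(genotype)  # copy so the caller's record is left untouched
    if is_monomorphic and "GQ" in fields:
        # Keep the confidence value, but under the RGQ key instead of dropping it.
        fields["RGQ"] = fields.pop("GQ")
    return fields


mono = transfer_gq_to_rgq({"GT": "0/0", "DP": 30, "GQ": 99}, is_monomorphic=True)
variant = transfer_gq_to_rgq({"GT": "0/1", "DP": 30, "GQ": 99}, is_monomorphic=False)
```

In this sketch the monomorphic genotype ends up with an RGQ field and no GQ field, while the variant genotype is returned unchanged.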
Phillip Dexheimer 6ffa295963 Regression: The new 'includeUnmapped' PartitionBy annotation was incorrectly set for HC
Fixes #828
2015-03-13 00:24:57 -04:00
Geraldine Van der Auwera aa4084d42f Switched VQSR tranches plot ordering rule 2015-03-12 19:57:03 -04:00
ldgauthier f5ec870964 Merge pull request #867 from broadinstitute/eb_combinegvcfs_option_to_break_blocks
Adding option to CombineGVCFs to have it break blocks at every N sites.
2015-03-12 16:16:03 -04:00
Eric Banks ea8a1edeb6 Adding option to CombineGVCFs to have it break blocks at every N sites.
Using --breakBandsAtMultiplesOf N will ensure that no reference blocks span across
genomic positions that are multiples of N.  This is especially important in the
case of scatter-gather where you don't want your scatter intervals to start in the
middle of blocks (because of a limitation in the way -L works in the GATK for VCF
records with the END tag).

For example, running with --breakBandsAtMultiplesOf 5 on this record:
1       69491   .       G       <NON_REF>       .       .       END=69523       GT:DP:GQ:MIN_DP:MIN_GQ:PL       ./.:94:99:82:99:0,120,1800

Will produce the following records:
1       69491   .       G       <NON_REF>       .       .       END=69494       GT:DP:GQ:MIN_DP:MIN_GQ:PL       ./.:94:99:82:99:0,120,1800
1       69495   .       C       <NON_REF>       .       .       END=69499       GT:DP:GQ:MIN_DP:MIN_GQ:PL       ./.:94:99:82:99:0,120,1800
1       69500   .       T       <NON_REF>       .       .       END=69504       GT:DP:GQ:MIN_DP:MIN_GQ:PL       ./.:94:99:82:99:0,120,1800
etc.

Added docs and a new test.
2015-03-12 14:42:10 -04:00
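The block-breaking rule from the --breakBandsAtMultiplesOf commit above can be sketched as follows. This is only an illustration of the coordinate arithmetic implied by the example records (each new block starts at a multiple of N); the function name `break_block` is hypothetical, and the sketch does not model the reference base, which changes for each emitted record.

```python
def break_block(start, end, n):
    """Split the inclusive interval [start, end] so that every block after the
    first begins at a multiple of n.

    Hypothetical helper for illustration; not the CombineGVCFs implementation.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    blocks = []
    cur = start
    while cur <= end:
        # First multiple of n strictly greater than the current start.
        next_multiple = (cur // n + 1) * n
        # The block stops just before that multiple, or at the original END.
        block_end = min(end, next_multiple - 1)
        blocks.append((cur, block_end))
        cur = block_end + 1
    return blocks


# The record from the commit message: POS=69491, END=69523, N=5.
blocks = break_block(69491, 69523, 5)
```

The first three tuples match the POS and END values of the three records listed in the commit message.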
Geraldine Van der Auwera 5673ba9209 Merge pull request #866 from broadinstitute/gg_public_doc_dir_GSA-838
Updated readme in public/doc to just point to the website. Fixes #838
2015-03-12 13:28:31 -04:00
Geraldine Van der Auwera f8a081a262 Updated readme in public/doc to just point to the website 2015-03-12 11:52:48 -04:00
Geraldine Van der Auwera e1862a04a8 Merge pull request #862 from broadinstitute/rhl_doc_args_incompatibility
Log a warning if using incompatible arguments in DepthOfCoverage
2015-03-11 13:08:53 -04:00
Ron Levine bee7f655b7 Log a warning if using incompatible arguments in DepthOfCoverage
Add reference gene list file
2015-03-10 18:14:21 -04:00
ldgauthier 6645669eca Merge pull request #818 from broadinstitute/mf_CheckNameInAssessNA12878
Mf check name in assess na12878
2015-03-10 10:33:05 -04:00
Mark Fleharty 957946c73c Change to AssessNA12878 to allow analysis of files that specify a sample name other than NA12878.
Added an integration test to test the sampleNameToCompare option, and a VCF file for this test to run on.
2015-03-10 10:15:02 -04:00
Geraldine Van der Auwera 4bff024107 Merge pull request #819 from broadinstitute/rhl_fix_docs_npa_filter
Fix NotPrimaryAlignmentFilter documentation
2015-03-06 07:46:33 -05:00
Ron Levine 71d68c3d93 Fix NotPrimaryAlignmentFilter documentation 2015-03-05 20:30:46 -05:00
biocyberman ff6e288241 Upgrade SLF4J to allow new convenient logging syntaxes
Signed-off-by: David Roazen <droazen@broadinstitute.org>
2015-03-02 17:01:10 -05:00
Valentin Ruano Rubio f8f2680142 Merge pull request #812 from broadinstitute/ldg_combineData_submit
New walker to combine WGS and WES data
2015-03-02 15:12:31 -05:00
ldgauthier 83bd85d8de Merge pull request #817 from broadinstitute/ldg_fixQueueTests
Change UG @PartitionBy to fix Queue tests
2015-03-02 09:49:17 -05:00
Laura Gauthier aaf952469e Change UG @PartitionBy to fix Queue tests 2015-03-01 14:42:43 -05:00
Laura Gauthier 6ebcba5234 New walker to combine data for different formats of same sample that were called and VQSRed together; has functionality to combine only specified samples, omitting others (e.g. combine the uniquified NA12878s with -usn NA12878.variant51 -usn NA12878.variant102)
GenotypeGVCFs now has the ability to unique-ify samples so I can genotype together two different datasets containing the same sample
Modify InbreedingCoeff so that it works when genotyping uniquified samples
2015-03-01 12:44:32 -05:00
Laura Gauthier 2d992ad818 Modify assessment/site reporting criteria for better bookkeeping
Make sure -allSites outputs TPs that have discordant genotypes (although we won't know they're discordant)
Make AssessNA12878 output report record the name of the VCF from which the assessment was derived
2015-03-01 12:44:32 -05:00
ldgauthier 8efaa97d84 Merge pull request #815 from broadinstitute/ldg_updateMulitallelicVAtestData
Update test data so it better reflects the multiallelic AC/AF annotation...
2015-03-01 12:10:25 -05:00
Geraldine Van der Auwera 21390575dd Merge pull request #816 from broadinstitute/rhl_gc_content_value_type
Change GC Content value type from Integer to Float
2015-02-26 15:26:28 -05:00
Ron Levine 44e5965a4b Change GC Content value type from Integer to Float 2015-02-25 13:56:42 -05:00
Geraldine Van der Auwera f3a57a6b07 Merge pull request #811 from broadinstitute/seru71_fix_MateSameStrandFilter
Corrected logical expression in MateSameStrandFilter
2015-02-23 17:57:10 -05:00
Laura Gauthier 4a493a7900 Update test data so it better reflects the multiallelic AC/AF annotation use case 2015-02-20 19:02:42 -05:00
jmthibault79 9491c0333f Merge pull request #813 from broadinstitute/rhl_throw_exception_dcov
Throw exception for -dcov argument given to ActiveRegionWalkers
2015-02-19 13:43:32 -05:00
Ron Levine 2cbaef2fb2 Throw exception for -dcov argument given to ActiveRegionWalkers 2015-02-19 08:24:39 -05:00
rpoplin b5f20bbb00 Merge pull request #806 from broadinstitute/ldg_updateNISTinNA12878KB
Update NA12878KB NIST Genomes-in-a-Bottle from v2.17 to v2.18
2015-02-17 11:37:44 -05:00
Laura Gauthier 72166eee5c Update NA12878KB NIST Genomes-in-a-Bottle from v2.17 to v2.18
Use all sites, not just high confidence
2015-02-17 08:17:57 -05:00
jmthibault79 207f0a69df Merge pull request #809 from broadinstitute/rhl_annotate_strand_allele_counts
StrandAlleleCountsBySample can only be called from HaplotypeCaller
2015-02-12 16:33:35 -05:00
Ron Levine c3ff6df252 StrandAlleleCountsBySample can only be called from HaplotypeCaller 2015-02-12 13:43:48 -05:00
seru71 3ee0311fdb corrected logical expression in MateSameStrandFilter
Signed-off-by: David Roazen <droazen@broadinstitute.org>
2015-02-12 12:21:44 -05:00
rpoplin 893e8ff9c4 Merge pull request #810 from broadinstitute/pd_monoallelic_concordance
GenotypeConcordance: monomorphic sites in truth are no longer...
2015-02-10 15:42:40 -05:00
Phillip Dexheimer 92c7c103c1 GenotypeConcordance: monomorphic sites in truth are no longer called "Mismatching Alleles" when the comp genotype has an alternate allele
* PT 84700606
2015-02-07 15:54:38 -05:00
rpoplin b8b23b931e Merge pull request #807 from broadinstitute/rhl_handle_cigar
Process X and = CIGAR operators
2015-02-01 11:09:52 -05:00
pdexheimer cf58d671d2 Merge pull request #803 from broadinstitute/pd_toggle_unmapped_scatter
Added optional element "includeUnmapped" to the PartitionBy annotation
2015-01-31 16:02:15 -05:00
Phillip Dexheimer 3354c07b1c Added optional element "includeUnmapped" to the PartitionBy annotation
* The value of this element (default true) determines whether Queue will explicitly run this walker over unmapped reads
* This patch fixes a runtime error when FindCoveredIntervals was used with Queue
* PT 81777160
2015-01-31 15:47:57 -05:00
rpoplin d561fc5edc Merge pull request #805 from broadinstitute/ks_gatk_cram
Introductory GATK CRAM support
2015-01-30 12:55:47 -05:00
Ron Levine 9d4b876ccd Process X and = CIGAR operators
Add simple BaseRecalibrator integration test for CIGAR = and X operators
2015-01-29 17:00:00 -05:00
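As background for the X and = CIGAR commit above: in the SAM format, the = (sequence match) and X (sequence mismatch) operators consume both read and reference bases, exactly as M does. The sketch below shows one way to account for them; the function names are hypothetical and the code is not taken from GATK or htsjdk.

```python
import re

_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")
# Operators that advance the position on the reference, per the SAM specification.
_CONSUMES_REFERENCE = set("MDN=X")


def parse_cigar(cigar):
    """Parse a CIGAR string into (length, operator) pairs, rejecting malformed input."""
    if re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar) is None:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in _CIGAR_TOKEN.findall(cigar)]


def reference_length(cigar):
    """Number of reference bases spanned by the alignment."""
    return sum(length for length, op in parse_cigar(cigar) if op in _CONSUMES_REFERENCE)


def collapse_to_m(cigar):
    """Rewrite = and X as M and merge adjacent elements that share an operator."""
    merged = []
    for length, op in parse_cigar(cigar):
        if op in "=X":
            op = "M"
        if merged and merged[-1][1] == op:
            merged[-1] = (merged[-1][0] + length, op)
        else:
            merged.append((length, op))
    return "".join(f"{length}{op}" for length, op in merged)
```

For instance, `collapse_to_m("10=1X5=2I3M")` yields `"16M2I3M"`, and `reference_length` gives the same answer for a CIGAR before and after collapsing.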
Khalid Shakir 1808c90d2a Added introductory CRAM support.
Replaced usage of GATKSamRecordFactory with calls to wrapper GATKSAMRecord extending SAMRecord.
Minor other updates for test changes.
Added exampleCRAM.cram generated by GATK, with .bai and .crai indexes generated by CRAMTools.
CRAM-to-CRAM test disabled due to https://github.com/samtools/htsjdk/issues/148
With exampleBAM.bam as input, the CRAM generated by GATK matches the CRAM generated by CRAMTools, but not the samtools/PrintReads SAM output, because values such as insert sizes differ.
If required for other tools, CRAM indexes must be generated via CRAMTools until we can generate them via CRAMFileWriter.

Generation of exampleCRAM.cram:
* java -jar target/executable/GenomeAnalysisTK.jar -T PrintReads -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -o public/gatk-utils/src/test/resources/exampleCRAM.cram
* java -jar cramtools-2.1.jar index -I public/gatk-utils/src/test/resources/exampleCRAM.cram
* java -jar cramtools-2.1.jar index -I public/gatk-utils/src/test/resources/exampleCRAM.cram --bam-style-index

CRAM generation by existing tools:
* samtools view -C -T public/gatk-utils/src/test/resources/exampleFASTA.fasta -o testSamtools.cram public/gatk-utils/src/test/resources/exampleBAM.bam
* java -jar cramtools-2.1.jar cram --ignore-md5-mismatch --capture-all-tags -Q -n -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -O testCRAMTools.cram
* java -jar target/executable/GenomeAnalysisTK.jar -T PrintReads -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -o testGATK.cram

CRAMTools view of the above:
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleCRAM.cram | tail -n 1
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testSamtools.cram | tail -n 1
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testCRAMTools.cram | tail -n 1
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testGATK.cram | tail -n 1
2015-01-26 14:47:39 -03:00
Khalid Shakir de3ca65232 Bumping HTSJDK version to pick up a bug fix for CRAM. 2015-01-26 14:47:39 -03:00
Valentin Ruano Rubio e26e55efe1 Merge pull request #802 from broadinstitute/pd_selectvariants_subset
Added -trimAlternates argument to SelectVariants
2015-01-22 05:05:42 -05:00
Phillip Dexheimer 72f76add71 Added -trimAlternates argument to SelectVariants
* PT 84021222
* -trimAlternates removes all unused alternate alleles from variants.  Note that this is pretty aggressive for monomorphic sites
2015-01-21 21:33:35 -05:00
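The idea behind -trimAlternates, dropping alternate alleles that no genotype uses, can be sketched as below. The data layout (an allele list with the reference first, and genotypes as lists of allele indices with None for no-calls) and the function name `trim_alternates` are assumptions made for illustration; this covers only the allele bookkeeping and is not the SelectVariants implementation.

```python
def trim_alternates(alleles, genotypes):
    """Drop alternate alleles that no genotype references and renumber the rest.

    alleles:   list of allele strings, reference allele first.
    genotypes: list of genotypes, each a list of allele indices (None = no-call).
    Hypothetical helper for illustration only.
    """
    # Alternate allele indices (index > 0) that appear in at least one genotype.
    used = sorted({i for gt in genotypes for i in gt if i is not None and i > 0})
    # Map old indices to new ones; the reference allele always stays at index 0.
    remap = {0: 0}
    for new_index, old_index in enumerate(used, start=1):
        remap[old_index] = new_index
    new_alleles = [alleles[0]] + [alleles[i] for i in used]
    new_genotypes = [[None if i is None else remap[i] for i in gt] for gt in genotypes]
    return new_alleles, new_genotypes


trimmed = trim_alternates(["A", "C", "G", "T"], [[0, 2], [2, 2], [None, None]])
monomorphic = trim_alternates(["A", "C"], [[0, 0], [0, 0]])
```

The second call shows why the commit calls this aggressive for monomorphic sites: when every genotype is homozygous reference, every alternate allele is removed and only the reference allele remains.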