gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Kristian Cibulskis	ab1053e83c	It compiles, and produces results! fixed NPE when normal contains no reads first integration test (micro) and unit tests, also rename of MuTectHC -> M2 adding in standard GATK license terms incorporated HOSTILE mode to PCR Error Correction removed tumor and normal name parameters and cleaned up internal name handling changes to allow for calling without a matched normal (technically, not true 'tumor-only' calling). Used for panel-of-normals creation additional regression tests, based on DREAM data. Removed accidental addition of TandemRepeatAnnotator to default annotations updated MD5 based on run from GSA4 to fix bamboo issue reverted unneeded visibility changes	2015-03-13 18:28:01 -04:00
Geraldine Van der Auwera	39a972f348	Merge pull request #872 from broadinstitute/eb_create_rgq_format_field Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs. Fixes #870	2015-03-13 13:59:53 -04:00
Eric Banks	1ff9463285	Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs. Now, instead of stripping out the GQs for mono sites, we transfer them to the RGQ. This is extremely useful for people who want to know how confident the hom ref genotype calls are. Perhaps this is just what CRSP needs for pertinent negatives. Note that I also changed the tool to no longer use the GenotypeSummaries annotation by default since it was adding some seemingly unnecessary annotations (like mean GQ now that we keep the GQ around and number of no-calls). Let me know if this was a mistake (although Laura gave me a thumbs up).	2015-03-13 10:27:20 -04:00
Phillip Dexheimer	6ffa295963	Regression: The new 'includeUnmapped' PartitionBy annotation was incorrectly set for HC Fixes #828	2015-03-13 00:24:57 -04:00
Eric Banks	ea8a1edeb6	Adding option to CombineGVCFs to have it break blocks at every N sites. Using --breakBandsAtMultiplesOf N will ensure that no reference blocks span across genomic positions that are multiples of N. This is especially important in the case of scatter-gather where you don't want your scatter intervals to start in the middle of blocks (because of a limitation in the way -L works in the GATK for VCF records with the END tag). For example, running with --breakBandsAtMultiplesOf 5 on this record: 1 69491 . G <NON_REF> . . END=69523 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800 Will produce the following records: 1 69491 . G <NON_REF> . . END=69494 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800 1 69495 . C <NON_REF> . . END=69499 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800 1 69500 . T <NON_REF> . . END=69504 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800 etc. Added docs and a new test.	2015-03-12 14:42:10 -04:00
Valentin Ruano Rubio	f8f2680142	Merge pull request #812 from broadinstitute/ldg_combineData_submit New walker to combine WGS and WES data	2015-03-02 15:12:31 -05:00
Laura Gauthier	aaf952469e	Change UG @PartitionBy to fix Queue tests	2015-03-01 14:42:43 -05:00
Laura Gauthier	6ebcba5234	New walker to combine data for different formats of same sample that were called and VQSRed together; has functionality to combine only specified samples, omitting others (e.g. combine the uniquified NA12878s with -usn NA12878.variant51 -usn NA12878.variant102) GenotypeGVCFs now has the ability to unique-ify samples so I can genotype together two different datasets containing the same sample Modify InbreedingCoeff so that it works when genotyping uniquified samples	2015-03-01 12:44:32 -05:00
ldgauthier	8efaa97d84	Merge pull request #815 from broadinstitute/ldg_updateMulitallelicVAtestData Update test data so it better reflects the multiallelic AC/AF annotation...	2015-03-01 12:10:25 -05:00
Ron Levine	44e5965a4b	Change GC Content value type from Integer to Float	2015-02-25 13:56:42 -05:00
Laura Gauthier	4a493a7900	Update test data so it better reflects the multiallelic AC/AF annotation use case	2015-02-20 19:02:42 -05:00
Ron Levine	2cbaef2fb2	Throw exception for -dcov argument given to ActiveRegionWalkers	2015-02-19 08:24:39 -05:00
Ron Levine	c3ff6df252	StrandAlleleCountsBySample can only be called from HaplotypeCaller	2015-02-12 13:43:48 -05:00
Phillip Dexheimer	92c7c103c1	GenotypeConcordance: monomorphic sites in truth are no longer called "Mismatching Alleles" when the comp genotype has an alternate allele * PT 84700606	2015-02-07 15:54:38 -05:00
rpoplin	b8b23b931e	Merge pull request #807 from broadinstitute/rhl_handle_cigar Process X and = CIGAR operators	2015-02-01 11:09:52 -05:00
Phillip Dexheimer	3354c07b1c	Added optional element "includeUnmapped" to the PartitionBy annotation * The value of this element (default true) determines whether Queue will explicitly run this walker over unmapped reads * This patch fixes a runtime error when FindCoveredIntervals was used with Queue * PT 81777160	2015-01-31 15:47:57 -05:00
Ron Levine	9d4b876ccd	Process X and = CIGAR operators Add simple BaseRecalibrator integration test for CIGAR = and X operators	2015-01-29 17:00:00 -05:00
Khalid Shakir	1808c90d2a	Added introductory CRAM support. Replaced usage of GATKSamRecordFactory with calls to wrapper GATKSAMRecord extending SAMRecord. Minor other updates for test changes. Added exampleCRAM.cram generated by GATK, with .bai and .crai indexes generated by CRAMTools. CRAM-to-CRAM test disabled due to https://github.com/samtools/htsjdk/issues/148 Using exampleBAM.bam input, outputs of GATK's generated CRAM match CRAMTools generated CRAM, but not samtools/PrintReads SAM output, as things like insert sizes are different. If required for other tools, CRAM indexes must be generated via CRAMTools until we can generate them via CRAMFileWriter. Generation of exampleCRAM.cram: * java -jar target/executable/GenomeAnalysisTK.jar -T PrintReads -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -o public/gatk-utils/src/test/resources/exampleCRAM.cram * java -jar cramtools-2.1.jar index -I public/gatk-utils/src/test/resources/exampleCRAM.cram * java -jar cramtools-2.1.jar index -I public/gatk-utils/src/test/resources/exampleCRAM.cram --bam-style-index CRAM generation by existing tools: * samtools view -C -T public/gatk-utils/src/test/resources/exampleFASTA.fasta -o testSamtools.cram public/gatk-utils/src/test/resources/exampleBAM.bam * java -jar cramtools-2.1.jar cram --ignore-md5-mismatch --capture-all-tags -Q -n -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -O testCRAMTools.cram * java -jar target/executable/GenomeAnalysisTK.jar -T PrintReads -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -o testGATK.cram CRAMTools view of the above: * java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleCRAM.cram | tail -n 1 * java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testSamtools.cram | tail -n 1 * java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testCRAMTools.cram | tail -n 1 * java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testGATK.cram | tail -n 1	2015-01-26 14:47:39 -03:00
Phillip Dexheimer	72f76add71	Added -trimAlternates argument to SelectVariants * PT 84021222 * -trimAlternates removes all unused alternate alleles from variants. Note that this is pretty aggressive for monomorphic sites	2015-01-21 21:33:35 -05:00
Ron Levine	804b2a36b7	Fix SplitNCigar reads exception by making the list of RNAReadTransformer non-abstract, add test for -fixNDN Includes documentation changes for -fixNDN argument and the read transformer documentation. Documentation changes to CombineVariants	2015-01-14 22:22:05 -05:00
rpoplin	0292d49842	Merge pull request #801 from broadinstitute/pd_gatkvcfconstants Collected VCF IDs and header lines into one place	2015-01-14 09:43:48 -05:00
Phillip Dexheimer	6190d660e0	Edits to work with the latest htsjdk release: * TextCigarCodec.decode() is now static, and the getSingleton() method is gone * MergingSamRecordIterator now wants a Collection<SamReader> rather than Collection<SAMFileReader> in the constructor * SeekableBufferedStream now correctly reads the requested number of bytes, removed workaround in GATKBAMIndex	2015-01-13 21:32:10 -05:00
Phillip Dexheimer	b73e9d506a	Added GATKVCFConstants and GATKVCFHeaderLines to consolidate the GATK-specific VCF annotations * Removed unused annotations (CCC and HWP) * Renamed one of the two GC annotations to "IGC" (for Interval GC) * Revved picard & htsjdk (GATK constants are now removed from htsjdk) * PT 82046038	2015-01-13 21:32:09 -05:00
Laura Gauthier	6b2bd5ed09	Address user-reported bug featuring "trio" family with two children, one parent Add test to cover case with family of one parent, two children	2015-01-13 18:35:44 -05:00
Ryan Poplin	2e5f9db758	Raising per-sample limits on the number of reads in ART and HC. -- Active Region Traversal was using per sample limits on the number of reads that were too low, especially now that we are running one sample at a time. This caused issues with high confidence variants being dropped in high coverage data. -- HaplotypeCallerGVCFIntegrationTest PL/annotation changes due to using more reads in those tests -- Removed a CountReadsInActiveRegionsIntegrationTest test for excessive coverage because the read coverage no longer goes over the limits in ART	2015-01-09 11:21:42 -05:00
rpoplin	03203e249e	Merge pull request #792 from broadinstitute/rhl_pairhmm_log_stderr Rhl pairhmm log stderr	2015-01-07 12:41:10 -05:00
Valentin Ruano-Rubio	aae04b6122	Fixes explicit limitation of the maximum ploidy of the reference-confidence model Story: ===== - https://www.pivotaltracker.com/story/show/83803796 Changes: ======= - From a fixed maximum ploidy indel RCM likelihood cache to a dynamically resizable one. - Used the occasion to remove an unused and deprecated method from ReferenceConfidenceModel Testing: ======= - Added integration test to check on ploidies larger than the previous limit of 20.	2015-01-07 10:43:22 -05:00
Ron Levine	b4fda38922	Use logging system instead of stderr	2015-01-05 14:04:10 -05:00
Laura Gauthier	88b6f3aa50	Change []-type arrays to lists so argument parsing works in VCF header commandline output	2015-01-05 10:21:06 -05:00
rpoplin	3240b3538a	Merge pull request #794 from broadinstitute/rhl_read_backed_phasing Rhl read backed phasing	2015-01-05 09:47:25 -05:00
Ron Levine	c6840124fe	clean up, add final	2015-01-04 23:01:24 -05:00
Ron Levine	85dc703461	Add TestMergeIntoMNP() and TestReallyMergeIntoMNP()	2015-01-01 09:51:20 -05:00
Ron Levine	bb94833750	Add more tests	2014-12-30 22:45:44 -05:00
Ron Levine	714d575e3b	correct reference file name	2014-12-25 14:00:39 -05:00
Ron Levine	a7fba5c209	restructure and add more tests	2014-12-25 13:57:54 -05:00
Ron Levine	64375f6341	Messages that were going to stdout now going to stderr Make PairHMM outputs go to stderr instead of stdout Change output from stdout to stderr in close() Updated lib with output going to stderr	2014-12-23 11:03:29 -05:00
Ron Levine	069398ad46	Added more tests and documentation	2014-12-19 12:57:43 -05:00
Laura Gauthier	a9694951d2	Add error handling for genotypes that are called but have no PLs	2014-12-18 15:03:20 -05:00
Geraldine Van der Auwera	b0e615251b	Updated VQSR tool docs	2014-12-18 12:59:37 -05:00
rpoplin	4a2ac38308	Merge pull request #790 from broadinstitute/rp_nsubtil_fix-snp-detection BQSR bug fix from @nsubtil	2014-12-18 09:19:53 -05:00
Ron Levine	08790e1dab	Fix multiallelic info field annotation for VariantAnnotator Add multi-allele test for info field annotations Fix to process all types of INFO annotations roll back to previous version, removes INFO and FORMAT Correct @return for VariantAnnotatorEngine.getNonReferenceAlleles() Enhance comments and clean up multi-allelic logic, handle header info number = R only parse counts of A & R Add INFO for AC update MD5 Performance enhancement, only parse multiallelic with a count A or R Make argument final in getNonReferenceAlleles() Code cleanup, add exceptions for bad expression/allele size mismatch and missing header info for an expression Change exception to warning for expression value/number of alleles check remove advertised exceptions	2014-12-17 22:21:00 -05:00
Ron Levine	ba949389c5	matchHaplotypeAlleles() no longer calls alleleSegregationIsKnown(), added a TODO to investigate	2014-12-17 14:02:24 -05:00
Ryan Poplin	d84970ff75	BQSR bug fix from @nsubtil -- Ignore SNP matches that lie outside the clipped read window -- This fixes an issue where GATK would skip the entire read if a SNP is entirely contained within a sequencing adapter.	2014-12-17 10:04:37 -05:00
Ron Levine	56f8e4f9cf	Add comments, alleleSegregationIsKnown() check is added to matchHaplotypeAlleles()	2014-12-17 03:25:26 -05:00
Laura Gauthier	011843c569	Fixed huge bug from 9895005a (CombineGVCFs used to stop after the first contig)	2014-12-16 12:43:32 -05:00
rpoplin	bcc6b73e9b	Merge pull request #786 from broadinstitute/pd_variantstotable_sma Fix VariantsToTable output of FORMAT record lists when -SMA is specified	2014-12-16 10:37:22 -05:00
Valentin Ruano-Rubio	736a857e82	Fixing CombineGVCFs that writes out the wrong REF allele Story: ===== - https://www.pivotaltracker.com/story/show/83259038 Changes: ======= - Made minimal changes for the fix after an arduous attempt to understand CombineGVCFs code. Test: ==== - Added an integration test to explicitly test for the bug. - Updated an md5, as the bug was actually affecting one of the existing integration tests.	2014-12-13 22:38:24 -05:00
Phillip Dexheimer	71bdfbe465	Fix VariantsToTable output of FORMAT record lists when -SMA is specified * PT 84242218 * Note that FORMAT fields behave the same as INFO fields - if the annotation has a count of A (one entry per Alt Allele), it is split across the multiple output lines. Otherwise, the entire list is output with each field	2014-12-10 21:41:15 -05:00
rpoplin	bf2911d62c	Merge pull request #783 from broadinstitute/pd_splitsamfile Fix NPE in SplitSamFile	2014-12-08 09:39:03 -05:00
Valentin Ruano-Rubio	385186e11b	Makes GQ of Hom-Ref Blocks in GVCF output consistent with PLs Story: ----- - https://www.pivotaltracker.com/story/show/83800586 Changes: ------- - In GVCFWriter GQ is now recalculated out of the final PL array for the block. Testing: ------- - Updated affected integration test md5s	2014-12-07 16:45:32 -05:00
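
The block-splitting behaviour described for --breakBandsAtMultiplesOf in commit ea8a1edeb6 can be illustrated with a short sketch. This is not the CombineGVCFs implementation; it only reproduces the interval arithmetic implied by the example in that commit message, where a new block begins at every position that is a multiple of N. The function name split_block and its tuple-based interface are hypothetical and chosen for illustration only.

```python
def split_block(start, end, n):
    """Split the inclusive interval [start, end] into pieces such that a new
    piece begins at every position that is a multiple of n.

    Hypothetical helper: mirrors the example in the commit message, not the
    actual CombineGVCFs code.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if start > end:
        raise ValueError("start must not exceed end")

    pieces = []
    piece_start = start
    while piece_start <= end:
        # Smallest multiple of n that is strictly greater than piece_start.
        next_break = (piece_start // n + 1) * n
        # The current piece stops just before that multiple, or at the block end.
        piece_end = min(end, next_break - 1)
        pieces.append((piece_start, piece_end))
        piece_start = piece_end + 1
    return pieces


if __name__ == "__main__":
    # The reference block from the commit message: 1:69491, END=69523, N=5.
    for piece_start, piece_end in split_block(69491, 69523, 5):
        print(f"1\t{piece_start}\tEND={piece_end}")
```

With the record from the commit message (start 69491, END=69523, N=5), the first three pieces are 69491-69494, 69495-69499 and 69500-69504, matching the records listed there. The sketch does not model the per-record REF base or genotype fields, which the real tool carries over to each piece.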


1285 Commits (c374d126d7c8b40cdec76c2fa7283c8c372d7ea7)