When a sample has multiple spanning deletions and we are asked to assign
likelihoods to the spanning deletion allele, we currently choose the first
deletion. Valentin pointed out that this isn't desired behavior. I
promised Valentin that I would address this issue, so here it is.
I do not believe that the correct thing to do is to sum the likelihoods
over all spanning deletions (I came up with problematic cases where this
breaks down).
So instead I'm using a simple heuristic approach: using the hom alt PLs, find
the most likely spanning deletion for this position and use its likelihoods.
In the 10K-sample VCF from Monkol there were only 2 cases where this problem
popped up. In both cases the heuristic approach works well.
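To illustrate the heuristic, here is a minimal sketch (class and method names are
hypothetical, not the actual GATK code). Since PLs are phred-scaled, the smallest
hom-alt PL identifies the most likely spanning deletion:

    import java.util.Arrays;
    import java.util.List;

    // Hypothetical sketch of the heuristic: PLs are phred-scaled, so the
    // smallest hom-alt PL marks the most likely spanning deletion.
    public class SpanningDeletionHeuristic {
        static int indexOfMostLikelyDeletion(final List<Integer> homAltPLs) {
            int best = 0;
            for (int i = 1; i < homAltPLs.size(); i++) {
                if (homAltPLs.get(i) < homAltPLs.get(best)) {
                    best = i;
                }
            }
            return best;
        }

        public static void main(String[] args) {
            // Two spanning deletions with hom-alt PLs of 45 and 12: the second
            // is the more likely one, so its likelihoods would be used.
            System.out.println(indexOfMostLikelyDeletion(Arrays.asList(45, 12))); // prints 1
        }
    }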
-We now pull htsjdk and picard from Maven Central.
-Updated the GATK codebase as necessary to adapt to changes in the Feature
interface.
-Since VCFHeader now requires that all header lines have unique keys, uniquified
the keys of GVCFBlock header lines by including the min/max GQ in the key.
Updated MD5s accordingly.
-Other MD5s changed as a result of an htsjdk fix to eliminate "-0" in VCF output.
In the case where there's a low-quality SNP under a spanning deletion in the GVCFs:
if the SNP is not genotyped by GenotypeGVCFs (because it's just noise), we used to
emit a record with just the symbolic DEL allele (because that allele is high quality).
We no longer do that.
Previously, if a SNP occurred in sample A at a position that was in the middle of a deletion for sample B,
sample B would be genotyped as homozygous reference there (but it's NOT reference - there's a deletion).
Now, sample B is genotyped as having a symbolic DEL allele.
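For illustration only (coordinates, qualities, and values are made up; VCF 4.2
writes the spanning-deletion allele as "*"), a joint call might now look like:
1 12345 . A G,* 50 . . GT:GQ 0/1:99 2/2:60
where sample A carries the SNP and sample B carries the spanning-deletion allele
rather than a bogus hom-ref call.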
Some minor cleanup is included. Note that I also removed Laura's previous fix for this problem.
Existing integration tests change because I've added a new header line to the VCF being output.
I also added several tests for the new functionality showing:
1. genotyping from separate and already-combined GVCFs gives the same output
2. genotyping over multiple spanning deletions works
3. combining works too
Existing unit tests also cover this case.
New unit test for deprecated mergeVariantsViaLD
Update HaplotypeCallerIntegrationTest.java
Delete duplicate testHaplotypeCallerMergeVariantsViaLDException test.
Exclude MQ0BySample
Move SD and TRA to new StandardUGAnnotation interface
There is now an annotation interface (StandardUGAnnotation) holding annotations that are standard in UG but shouldn't be used as-is with HC. This allows us to avoid excluding these annotations explicitly in HC while still being able to use them for development purposes. (A sketch of the pattern follows.)
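As a rough, self-contained sketch of the pattern (every name except StandardUGAnnotation
is invented for the example), a marker interface lets the engine select annotation sets
by type rather than via an explicit exclusion list:

    import java.util.Arrays;
    import java.util.List;

    // Self-contained sketch; only StandardUGAnnotation matches a real name.
    interface Annotation { String key(); }
    interface StandardAnnotation extends Annotation {}      // standard for all callers
    interface StandardUGAnnotation extends Annotation {}    // standard for UG only

    class SpanningDeletionsAnnot implements StandardUGAnnotation {
        public String key() { return "SD"; }
    }
    class CoverageAnnot implements StandardAnnotation {
        public String key() { return "DP"; }
    }

    public class AnnotationFilterDemo {
        public static void main(String[] args) {
            List<Annotation> all = Arrays.asList(new SpanningDeletionsAnnot(), new CoverageAnnot());
            // HC picks up only StandardAnnotation; the UG-only annotations are
            // skipped automatically, with no hard-coded exclusion list.
            all.stream()
               .filter(a -> a instanceof StandardAnnotation)
               .forEach(a -> System.out.println("HC default: " + a.key()));
        }
    }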
Build a ReferenceContext in ActiveRegionWalkers to pass in to annotation engine so we can call the TandemRepeatAnnotator from M2
Make TandemRepeatAnnotator default annotation for M2.
Set up (but don't yet use) HC-style contamination downsampling.
New HC integration test with TandemRepeatAnnotator
Now, instead of stripping out the GQs for mono sites, we transfer them to the RGQ.
This is extremely useful for people who want to know how confident the hom ref genotype calls are.
Perhaps this is just what CRSP needs for pertinent negatives.
Note that I also changed the tool to no longer use the GenotypeSummaries annotation by default,
since it was adding some seemingly unnecessary annotations (like mean GQ, now that we keep the
GQ around, and the number of no-calls). Let me know if this was a mistake (although Laura gave me a thumbs up).
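For illustration (values made up), a monomorphic site whose GQ used to be stripped:
1 10000 . A . 50 . . GT 0/0
would now carry that confidence in the RGQ field instead:
1 10000 . A . 50 . . GT:RGQ 0/0:99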
Using --breakBandsAtMultiplesOf N will ensure that no reference blocks span across
genomic positions that are multiples of N. This is especially important in the
case of scatter-gather where you don't want your scatter intervals to start in the
middle of blocks (because of a limitation in the way -L works in the GATK for VCF
records with the END tag).
For example, running with --breakBandsAtMultiplesOf 5 on this record:
1 69491 . G <NON_REF> . . END=69523 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800
will produce the following records:
1 69491 . G <NON_REF> . . END=69494 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800
1 69495 . C <NON_REF> . . END=69499 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800
1 69500 . T <NON_REF> . . END=69504 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800
etc.
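The banding arithmetic is simple; here is a minimal sketch (hypothetical helper, not
the GATK implementation) that reproduces the blocks above:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: split [start, end] so that every position that is
    // a multiple of n starts a new block, mirroring --breakBandsAtMultiplesOf.
    public class BandBreaker {
        static List<int[]> breakBands(final int start, final int end, final int n) {
            final List<int[]> blocks = new ArrayList<>();
            int cur = start;
            while (cur <= end) {
                final int nextMultiple = (cur / n + 1) * n;  // first multiple of n after cur
                final int blockEnd = Math.min(end, nextMultiple - 1);
                blocks.add(new int[]{cur, blockEnd});
                cur = blockEnd + 1;
            }
            return blocks;
        }

        public static void main(String[] args) {
            // Reproduces the example: 69491-69494, 69495-69499, 69500-69504, ...
            for (int[] b : breakBands(69491, 69523, 5)) {
                System.out.println(b[0] + "-" + b[1]);
            }
        }
    }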
Added docs and a new test.
GenotypeGVCFs now has the ability to unique-ify samples, so I can genotype together two different datasets containing the same sample.
Modify InbreedingCoeff so that it works when genotyping uniquified samples
* TextCigarCodec.decode() is now static, and the getSingleton() method is gone (see the sketch after this list)
* MergingSamRecordIterator now wants a Collection<SamReader> rather than Collection<SAMFileReader> in the constructor
* SeekableBufferedStream now correctly reads the requested number of bytes, removed workaround in GATKBAMIndex
* Removed unused annotations (CCC and HWP)
* Renamed one of the two GC annotations to "IGC" (for Interval GC)
* Revved picard & htsjdk (GATK constants are now removed from htsjdk)
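For the first item above, the call-site change looks roughly like this (the CIGAR
string is arbitrary):

    import htsjdk.samtools.Cigar;
    import htsjdk.samtools.TextCigarCodec;

    public class CigarDecodeDemo {
        public static void main(String[] args) {
            // Old form, no longer available:
            //   final Cigar c = TextCigarCodec.getSingleton().decode("10M1I5M");
            // New static form:
            final Cigar cigar = TextCigarCodec.decode("10M1I5M");
            System.out.println(cigar.getReadLength()); // 16
        }
    }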
* PT 82046038
-- Active Region Traversal was using per-sample limits on the number of reads that were too low, especially now that we are running one sample at a time. This caused high-confidence variants to be dropped in high-coverage data.
-- HaplotypeCallerGVCFIntegrationTest PL/annotation changes due to using more reads in those tests
-- Removed a CountReadsInActiveRegionsIntegrationTest test for excessive coverage because the read coverage no longer goes over the limits in ART
Story:
=====
- https://www.pivotaltracker.com/story/show/83803796
Changes:
=======
- Changed the indel RCM likelihood cache from one with a fixed maximum
ploidy to a dynamically resizable one (see the sketch below).
- Used the occasion to remove an unused and deprecated method from ReferenceConfidenceModel.
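A minimal sketch of the resizing idea (hypothetical class and method names, not the
actual RCM code): grow the ploidy-indexed cache on demand rather than failing past
a fixed maximum.

    import java.util.Arrays;

    // Hypothetical sketch: a ploidy-indexed likelihood cache that grows on
    // demand instead of imposing a fixed maximum ploidy.
    public class ResizableLikelihoodCache {
        private double[][] byPloidy = new double[8][]; // initial capacity, not a cap

        public double[] get(final int ploidy, final int alleleCount) {
            if (ploidy >= byPloidy.length) {
                // Double the capacity until the requested ploidy fits.
                byPloidy = Arrays.copyOf(byPloidy, Math.max(byPloidy.length * 2, ploidy + 1));
            }
            if (byPloidy[ploidy] == null) {
                byPloidy[ploidy] = computeLikelihoods(ploidy, alleleCount);
            }
            return byPloidy[ploidy];
        }

        // Placeholder for the real likelihood computation.
        private double[] computeLikelihoods(final int ploidy, final int alleleCount) {
            return new double[alleleCount];
        }
    }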
Testing:
=======
- Added an integration test to check ploidies larger than the previous limit of 20.