gatk-3.8

Commit Graph

Author	SHA1	Message	Date
vruano	604fb7aaf8	Faster implementation of the active state profile value calculation when running HC with a single sample. Find out about a dev-bug and added TODOs (reported in #1096). Addresses issue #1095. Conflicts: protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCaller.java	2015-07-30 10:56:05 -04:00
Valentin Ruano Rubio	bb4c9fa1d3	Merge pull request #1099 from broadinstitute/vrr_magic_numbers Extracted some constant expressions involved HC variation discovery a…	2015-07-29 13:38:23 -04:00
vruano	02c7876c72	Extracted some constant expressions involved in HC variation discovery and genotyping. Addresses issue #1092.	2015-07-29 11:58:13 -04:00
meganshand	4d4de27ba3	Removes unique(int maxSize) from KBestHaplotypeFinder	2015-07-28 15:54:21 -04:00
Louis Bergelson	9d9827f176	Merge pull request #1031 from broadinstitute/lb_update_for_java8 Updated gatk so it compiles with java 8	2015-07-28 11:09:19 -04:00
Valentin Ruano Rubio	3a3ff558c4	Merge pull request #1085 from broadinstitute/vrr_path_builder ReferenceConfidenceModel likelihood calculation in non…	2015-07-28 10:48:03 -04:00
Geraldine Van der Auwera	43a37fc746	Merge pull request #1075 from broadinstitute/ldg_bamoutDocs Add info about multiple input samples (as relevant for M2)	2015-07-27 16:56:36 -04:00
Geraldine Van der Auwera	5939b4c100	Merge pull request #1073 from broadinstitute/ldg_SV-MVtestNameFix Fix logging name on SelectVariantsIntegrationTest::testInvertMendelia…	2015-07-27 16:54:59 -04:00
vruano	8f6daf70db	Refactoring of ReferenceConfidenceModel likelihood calculation in non variant sites Changed a division by -10.0 to a multiplication by -.1 in QualUtils (typically multiplication is faster than division). Addresses performance issue #1081.	2015-07-26 08:33:46 -04:00
vruano	047aea9707	Address performance issue #1077	2015-07-23 13:44:10 -04:00
Laura Gauthier	4fefedfb0b	Fix logging name on SelectVariantsIntegrationTest::testInvertMendelianViolationSelection()	2015-07-23 09:48:15 -04:00
Laura Gauthier	85b340caed	Add info about multiple input samples (as relevant for M2) Also generalize references to the tool/caller since this code is now shared by HC and M2	2015-07-23 09:46:10 -04:00
Valentin Ruano Rubio	66cf22b28f	Merge pull request #1069 from broadinstitute/vrr_ad_genotype_gvcfs_bugfix Fix AD propagation when subsetting alleles in non-diploid GenotypeGVCF.	2015-07-22 18:53:43 -04:00
vruano	315e193e51	Fix AD propagation when subsetting alleles in non-diploid GenotypeGVCF. Addresses issue #913. Also remove some commented out code and toxic debugging code that uses System.out/err.println.	2015-07-22 17:08:13 -04:00
Geraldine Van der Auwera	75081bee2b	Merge pull request #1068 from broadinstitute/gvda_remove_beagle_walkers_971 Removed walkers for handling Beagle data	2015-07-22 15:47:19 -04:00
Joseph White	3bd988825f	Removed walkers for handling Beagle data Added deprecation statements to DeprecatedToolChecks.java Removed integration test for Beagle walker Added URL for Beagle documentation	2015-07-21 18:36:08 -04:00
Geraldine Van der Auwera	ca082bfb76	Updated license text and fixed a couple of typos in doc block	2015-07-21 17:55:48 -04:00
Valentin Ruano Rubio	9360e1d293	Merge pull request #1059 from broadinstitute/vrr_true_false_list_removal More efficient implementation of the indel read qualities recalculati…	2015-07-21 17:13:45 -04:00
vruano	82f1236633	More efficient implementation of the indel read qualities recalculation for the PCR error model. Addresses #1054.	2015-07-21 14:25:11 -04:00
Geraldine Van der Auwera	a4dde8f500	Merge pull request #1040 from broadinstitute/rhl_fasta_ref_maker Merge contiguous intervals properly, closes #1035	2015-07-21 14:19:09 -04:00
Geraldine Van der Auwera	da0c8c73fb	Merge pull request #1055 from broadinstitute/ldg_TRAdocs Updated TandemRepeatAnnotator docs	2015-07-21 14:16:20 -04:00
Laura Gauthier	8c18ead5e4	Clarify VCF version for supporting population alleles files Clarify DeNovoPrior definition on PbyT	2015-07-20 13:42:57 -04:00
Laura Gauthier	7b29c55eb6	Updated TandemRepeatAnnotator docs	2015-07-17 17:26:56 -04:00
vruano	7f74303f2b	Removes a very inefficient way to iterate in ReferenceConfidenceModel.isReadInformativeAboutIndelsOfSize(...) Addresses performance issue #1048.	2015-07-16 12:04:12 -04:00
Ron Levine	6e46b3696e	Merge contiguous intervals properly	2015-07-14 15:23:37 -04:00
Geraldine Van der Auwera	c109a953f8	Merge pull request #1029 from broadinstitute/rhl_vqslod_definition Make VQSLOD definition accurate	2015-07-06 19:52:15 -04:00
Ron Levine	1a7e83fa50	Merge if both GT are phased	2015-06-30 13:03:16 -04:00
Eric Banks	f994220617	Update the allele remapping code to handle the new spanning deletion allele. Now that Ron updated the GATK so that we use star to represent spanning deletions, we need to catch those cases in the code that remaps alleles. Otherwise, we try to pad the stars and that's just bad. Added test from actual failing data.	2015-06-29 17:58:22 -04:00
Louis Bergelson	e1c41b2c38	Updated gatk so it compiles on java 8 updated cofoja to 1.2 from 1.0 added explicit type casts in places that java 8 required them	2015-06-26 15:59:46 -04:00
Ron Levine	09686f4595	Make VQSLOD definition accurate	2015-06-25 16:47:50 -04:00
Geraldine Van der Auwera	719bb15340	Merge pull request #1019 from broadinstitute/rhl_var_index_param_gz Indexing parameters not required if output file has the g.vcf.gz exte…	2015-06-17 14:30:20 -04:00
Geraldine Van der Auwera	697c4b0cf1	Added else clause to handle symbolic alleles Add test for createAlleleMapping	2015-06-17 10:52:56 -04:00
Eric Banks	29ebfc32c3	Merge pull request #1020 from broadinstitute/eb_handle_multiple_spanning_dels Handle cases where a given sample has multiple spanning deletions.	2015-06-16 14:20:46 -04:00
Eric Banks	fe0b5e0fbe	Handle cases where a given sample has multiple spanning deletions. When a sample has multiple spanning deletions and we are asked to assign likelihoods to the spanning deletion allele, we currently choose the first deletion. Valentin pointed out that this isn't desired behavior. I promised Valentin that I would address this issue, so here it is. I do not believe that the correct thing to do is to sum the likelihoods over all spanning deletions (I came up with problematic cases where this breaks down). So instead I'm using a simple heuristic approach: using the hom alt PLs, find the most likely spanning deletion for this position and use its likelihoods. In the 10K-sample VCF from Monkol there were only 2 cases that this problem popped up. In both cases the heuristic approach works well.	2015-06-16 12:20:43 -04:00
Laura Gauthier	ce5ecf1383	Enable contamination correction via downsampling (as for HaplotypeCaller), added test Add oxoG read count annotation and add as default annotation Add ##SAMPLE VCF header line in accordance with TCGA VCF spec, specifying "File" line in sample header with BAM file name and "SampleName" with BAM sample name (Don't print sample file path if --no_cmdline_in_header is specified to help with test consistency) Turn on active region assembly-based physical phasing for M2 Clean up M2-related annotations so UG doesn't crash if M2 annotations are called	2015-06-15 07:59:15 -04:00
Ron Levine	b35085ca28	Indexing parameters not required if output file has the g.vcf.gz extension	2015-06-13 11:46:56 -04:00
Ron Levine	dbed660183	Add spanning deletions allele	2015-06-12 16:43:06 -04:00
Geraldine Van der Auwera	526f7c0d07	Merge pull request #985 from broadinstitute/sa_refactor_cleansing_hack_negative_zeros_973_depends_on_841 removed in-line conditional (hack) that changed the result from 0.0 to -0.0; see issue #841	2015-05-23 00:02:52 -04:00
Sheila Chandran	dac0b8ddfc	Added QD calculation	2015-05-22 11:59:10 -04:00
Ron Levine	a6ca97ef14	Site-level selection based on genotype filter status	2015-05-21 11:27:20 -04:00
melonistic	8d25b2ba40	removed in-line conditional (hack) that changed the result from 0.0 to -0.0; see issue #841 removed irrelevant -0 comments as specified in issue #841 but committed in #973	2015-05-16 23:12:09 -04:00
Geraldine Van der Auwera	d1a7edd796	Update pom versions to mark the start of GATK 3.5 development	2015-05-15 00:44:54 -04:00
Geraldine Van der Auwera	f19618653a	Update pom versions for the 3.4 release	2015-05-15 00:40:39 -04:00
David Roazen	caafe84e74	Rev htsjdk to version 1.132 and picard to version 1.131, and switch to using the versions in maven central -We now pull htsjdk and picard from maven central. -Updated the GATK codebase as necessary to adapt to changes in the Feature interface. -Since VCFHeader now requires that all header lines have unique keys, uniquified the keys of GVCFBlock header lines by including the min/max GQ in the key. Updated MD5s accordingly. -Other MD5s changed as a result of an htsjdk fix to eliminate "-0" in VCF output.	2015-05-14 15:26:23 -04:00
Geraldine Van der Auwera	f6b3d8e862	Merge pull request #947 from broadinstitute/rhl_invert_selection Added --invert_selection flag for variant selection queries	2015-05-13 13:40:32 -04:00
Eric Banks	c752b9bca6	Fixed a small feature/bug that I introduced with the spanning deletions genotyping. In the case where there's a low quality SNP under a spanning deletion in the gvcfs: if the SNP is not genotyped by GenotypeGVCFs (because it's just noise) we were still emitting a record with just the symbolic DEL allele (because that allele is high quality). We no longer do that.	2015-05-13 11:19:40 -04:00
Ron Levine	4a75d54e65	Added invert and exclude flags for variant selection queries	2015-05-12 15:08:28 -04:00
Geraldine Van der Auwera	7a75f4ae79	Merge pull request #974 from broadinstitute/jw_Var2BinPEDSwap Correct errant array element swap in FAM file output.	2015-05-12 08:49:16 -04:00
Eric Banks	53a34cea4a	Merge pull request #938 from broadinstitute/eb_fix_spanning_deletions_in_genotyping Added a fix for genotyping positions over spanning deletions.	2015-05-11 23:11:47 -04:00
Joseph White	abb6bc6f57	Correct errant array element swap in FAM file output. dad and mom are swapped; paternal first, then maternal updated MD5 chksums for test files remove commented lines	2015-05-11 20:45:50 -04:00
Eric Banks	530e0e5ea6	Added a fix for combining/genotyping positions over spanning deletions. Previously, if a SNP occurred in sample A at a position that was in the middle of a deletion for sample B, sample B would be genotyped as homozygous reference there (but it's NOT reference - there's a deletion). Now, sample B is genotyped as having a symbolic DEL allele. Minor cleanup added. Note that I also removed Laura's previous fix for this problem. Existing integration tests change because I've added a new header line to the VCF being output. I also added several tests for the new functionality showing: 1. genotyping from separate and already combined gvcfs give the same output 2. genotyping over multiple spanning deletions works 3. combining works too Existing unit tests also cover this case.	2015-05-11 15:11:16 -04:00
Joseph White	5be8bc5dfc	Deprecate --mergeVariantsViaLD in HC New unit test for deprecated mergeVariantsViaLD Update HaplotypeCallerIntegrationTest.java Delete duplicate testHaplotypeCallerMergeVariantsViaLDException test.	2015-05-08 17:50:25 -04:00
Geraldine Van der Auwera	5d8b9a7c20	Moved MQ0 out of HC exclusion and into StandardUGAnnotation	2015-05-03 01:04:49 +02:00
Geraldine Van der Auwera	071d82d1bf	Un-exclude SD and TRA from HC annotators; resolves #966 Exclude MQ0BySample Move SD and TRA to new StandardUGAnnotation interface There is now an annotation interface (StandardUGAnnotation) holding annots that are standard in UG but shouldn't be used as they are now with HC. This allows us to not have to exclude these annotations explicitly in HC, but still be able to use them for development purposes.	2015-05-03 00:45:53 +02:00
Geraldine Van der Auwera	e49f6dfd0f	Merge pull request #970 from broadinstitute/gg_minor_docfixes Fairly minor if plentiful fixes to various gatkdocs. Merging this without formal review since all tests pass, the gatkdocs build, and no one really wants to review corrections to grammar, typos and layout for 120+ documents. Review will be done by users in production ;-)	2015-05-03 00:36:12 +02:00
Geraldine Van der Auwera	919c3eaa2e	Numerous doc fixes; mostly formatting and clarifications	2015-05-03 00:28:46 +02:00
Ron Levine	9ff827c83a	More allele trimming for VariantAnnotator	2015-04-29 21:11:49 -04:00
Laura Gauthier	97caf94807	Fix implementation of allowNonUniqueKmersInRef so that it applies to all kmer sizes	2015-04-23 13:01:47 -04:00
Ron Levine	d5f98e99f0	Bypass reads with a bad CIGAR length	2015-04-21 11:55:56 -04:00
Kristian Cibulskis	45610a142c	initial refactoring of arguments into individual argument collections fix blasted license blurbs updates based on PR comments (abstractify HaplotypeCallerArgumentCollection into AssemblyBasedCallerArgumentCollection) comments on comments from PR review	2015-04-07 16:55:32 -04:00
Geraldine Van der Auwera	2053afe52a	Merge pull request #914 from broadinstitute/ldg_fixDitheringRandomness Initialize annotations so that --disableDithering actually works	2015-04-06 15:40:30 -04:00
Yossi Farjoun	d30a6258bc	added the missing file to the error message	2015-04-06 08:21:55 -04:00
Laura Gauthier	9c842df3a3	Initialize annotations so that --disableDithering actually works	2015-04-02 17:34:46 -04:00
Geraldine Van der Auwera	d7f7022dce	Merge pull request #904 from broadinstitute/pd_orig_dp Added keepOriginalDP argument to SelectVariants	2015-03-30 09:01:33 -04:00
Laura Gauthier	5a10758e2e	Annotation changes for M2: Build a ReferenceContext in ActiveRegionWalkers to pass in to annotation engine so we can call the TandemRepeatAnnotator from M2 Make TandemRepeatAnnotator default annotation for M2. Setup (but don't use yet) HC-style contamination downsampling. New HC integration test with TandemRepeatAnnotator	2015-03-27 18:25:23 -04:00
Ron Levine	aef0a83c52	Automatically choose indexing strategy by file extension	2015-03-27 11:10:35 -04:00
Phillip Dexheimer	c97c253ec8	Added keepOriginalDP argument to SelectVariants Fixes #830	2015-03-25 22:45:31 -04:00
Phillip Dexheimer	9e63696315	Remove indel-length normalization of QD for GGVCFs * Fixes #848 * length normalization is now only applied if the annotation is calculated in UG	2015-03-24 08:22:19 -04:00
Geraldine Van der Auwera	0a45b2d79d	Merge pull request #883 from broadinstitute/rhl_hc_mq0 Exclude MappingQualityZero from default annotations	2015-03-23 12:59:08 -04:00
Ami Levy-Moonshine	c5fc5c4f8c	create 2 new tools: - ASEReadCounter (public tool) replaces Tuuli's script to produce the input to Manny's tool. It counts the number of reads that support the ref allele and the alt allele, filtering low qual reads and bases and keeping only properPaired reads - ASECaller (private tool) takes both RNA and DNA, and produces contingency tables; still under development minor changes in other tools: - update RNA HC variant calling scala script - expose FS method pValueForContingencyTable to be able to call it from ASECaller In ASEReadCounter: - allow different options to deal with overlapping reads from the same fragment - add option to ignore or include indels in the pileups - add option to disable DuplicateRead add ASEReadCounterIntegrationTest.java and files for the test	2015-03-21 16:56:00 -04:00
Ron Levine	46668d469a	Exclude MappingQualityZero from default annotations	2015-03-17 21:46:18 -04:00
Kristian Cibulskis	ab1053e83c	It compiles, and produces results! fixed NPE when normal contains no reads first integration test (micro) and unit tests, also rename of MuTectHC -> M2 adding in standard GATK license terms incorporated HOSTILE mode to PCR Error Correction removed tumor and normal name parameters and cleaned up internal name handling changes to allow for calling without a matched normal (technically, not true 'tumor-only' calling). Used for panel-of-normals creation additional regression tests, based on DREAM data. Removed accidental addition of TandemRepeatAnnotator to default annotations updated MD5 based on run from GSA4 to fix bamboo issue reverted unneeded visibility changes	2015-03-13 18:28:01 -04:00
Geraldine Van der Auwera	39a972f348	Merge pull request #872 from broadinstitute/eb_create_rgq_format_field Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs. Fixes #870	2015-03-13 13:59:53 -04:00
Eric Banks	1ff9463285	Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs. Now, instead of stripping out the GQs for mono sites, we transfer them to the RGQ. This is extremely useful for people who want to know how confident the hom ref genotype calls are. Perhaps this is just what CRSP needs for pertinent negatives. Note that I also changed the tool to no longer use the GenotypeSummaries annotation by default since it was adding some seemingly unnecessary annotations (like mean GQ now that we keep the GQ around and number of no-calls). Let me know if this was a mistake (although Laura gave me a thumbs up).	2015-03-13 10:27:20 -04:00
Phillip Dexheimer	6ffa295963	Regression: The new 'includeUnmapped' PartitionBy annotation was incorrectly set for HC Fixes #828	2015-03-13 00:24:57 -04:00
Eric Banks	ea8a1edeb6	Adding option to CombineGVCFs to have it break blocks at every N sites. Using --breakBandsAtMultiplesOf N will ensure that no reference blocks span across genomic positions that are multiples of N. This is especially important in the case of scatter-gather where you don't want your scatter intervals to start in the middle of blocks (because of a limitation in the way -L works in the GATK for VCF records with the END tag). For example, running with --breakBandsAtMultiplesOf 5 on this record: 1 69491 . G <NON_REF> . . END=69523 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800 Will produce the following records: 1 69491 . G <NON_REF> . . END=69494 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800 1 69495 . C <NON_REF> . . END=69499 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800 1 69500 . T <NON_REF> . . END=69504 GT:DP:GQ:MIN_DP:MIN_GQ:PL ./.:94:99:82:99:0,120,1800 etc. Added docs and a new test.	2015-03-12 14:42:10 -04:00
Valentin Ruano Rubio	f8f2680142	Merge pull request #812 from broadinstitute/ldg_combineData_submit New walker to combine WGS and WES data	2015-03-02 15:12:31 -05:00
Laura Gauthier	aaf952469e	Change UG @PartitionBy to fix Queue tests	2015-03-01 14:42:43 -05:00
Laura Gauthier	6ebcba5234	New walker to combine data for different formats of same sample that were called and VQSRed together; has functionality to combine only specified samples, omitting others (e.g. combine the uniquified NA12878s with -usn NA12878.variant51 -usn NA12878.variant102) GenotypeGVCFs now has the ability to unique-ify samples so I can genotype together two different datasets containing the same sample Modify InbreedingCoeff so that it works when genotyping uniquified samples	2015-03-01 12:44:32 -05:00
ldgauthier	8efaa97d84	Merge pull request #815 from broadinstitute/ldg_updateMulitallelicVAtestData Update test data so it better reflects the multiallelic AC/AF annotation...	2015-03-01 12:10:25 -05:00
Ron Levine	44e5965a4b	Change GC Content value type from Integer to Float	2015-02-25 13:56:42 -05:00
Laura Gauthier	4a493a7900	Update test data so it better reflects the multiallelic AC/AF annotation use case	2015-02-20 19:02:42 -05:00
Ron Levine	2cbaef2fb2	Throw exception for -dcov argument given to ActiveRegionWalkers	2015-02-19 08:24:39 -05:00
Ron Levine	c3ff6df252	StrandAlleleCountsBySample can only be called from HaplotypeCaller	2015-02-12 13:43:48 -05:00
Phillip Dexheimer	92c7c103c1	GenotypeConcordance: monomorphic sites in truth are no longer called "Mismatching Alleles" when the comp genotype has an alternate allele * PT 84700606	2015-02-07 15:54:38 -05:00
rpoplin	b8b23b931e	Merge pull request #807 from broadinstitute/rhl_handle_cigar Process X and = CIGAR operators	2015-02-01 11:09:52 -05:00
Phillip Dexheimer	3354c07b1c	Added optional element "includeUnmapped" to the PartitionBy annotation * The value of this element (default true) determines whether Queue will explicitly run this walker over unmapped reads * This patch fixes a runtime error when FindCoveredIntervals was used with Queue * PT 81777160	2015-01-31 15:47:57 -05:00
Ron Levine	9d4b876ccd	Process X and = CIGAR operators Add simple BaseRecalibrator integration test for CIGAR = and X operators	2015-01-29 17:00:00 -05:00
Khalid Shakir	1808c90d2a	Added introductory CRAM support. Replaced usage of GATKSamRecordFactory with calls to wrapper GATKSAMRecord extending SAMRecord. Minor other updates for test changes. Added exampleCRAM.cram generated by GATK, with .bai and .crai indexes generated by CRAMTools. CRAM-to-CRAM test disabled due to https://github.com/samtools/htsjdk/issues/148 Using exampleBAM.bam input, outputs of GATK's generated CRAM match CRAMTools generated CRAM, but not samtools/PrintReads SAM output, as things like insert sizes are different. If required for other tools, CRAM indexes must be generated via CRAMTools until we can generate them via CRAMFileWriter. Generation of exampleCRAM.cram: * java -jar target/executable/GenomeAnalysisTK.jar -T PrintReads -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -o public/gatk-utils/src/test/resources/exampleCRAM.cram * java -jar cramtools-2.1.jar index -I public/gatk-utils/src/test/resources/exampleCRAM.cram * java -jar cramtools-2.1.jar index -I public/gatk-utils/src/test/resources/exampleCRAM.cram --bam-style-index CRAM generation by existing tools: * samtools view -C -T public/gatk-utils/src/test/resources/exampleFASTA.fasta -o testSamtools.cram public/gatk-utils/src/test/resources/exampleBAM.bam * java -jar cramtools-2.1.jar cram --ignore-md5-mismatch --capture-all-tags -Q -n -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -O testCRAMTools.cram * java -jar target/executable/GenomeAnalysisTK.jar -T PrintReads -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -o testGATK.cram CRAMTools view of the above: * java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleCRAM.cram | tail -n 1 * java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testSamtools.cram | tail -n 1 * java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testCRAMTools.cram | tail -n 1 * java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testGATK.cram | tail -n 1	2015-01-26 14:47:39 -03:00
Phillip Dexheimer	72f76add71	Added -trimAlternates argument to SelectVariants * PT 84021222 * -trimAlternates removes all unused alternate alleles from variants. Note that this is pretty aggressive for monomorphic sites	2015-01-21 21:33:35 -05:00
Ron Levine	804b2a36b7	Fix SplitNCigar reads exception by making the list of RNAReadTransformer non-abstract, add test for -fixNDN Includes documentation changes for -fixNDN argument and the read transformer documentation. Documentation changes to CombineVariants	2015-01-14 22:22:05 -05:00
rpoplin	0292d49842	Merge pull request #801 from broadinstitute/pd_gatkvcfconstants Collected VCF IDs and header lines into one place	2015-01-14 09:43:48 -05:00
Phillip Dexheimer	6190d660e0	Edits to work with the latest htsjdk release: * TextCigarCodec.decode() is now static, and the getSingleton() method is gone * MergingSamRecordIterator now wants a Collection<SamReader> rather than Collection<SAMFileReader> in the constructor * SeekableBufferedStream now correctly reads the requested number of bytes, removed workaround in GATKBAMIndex	2015-01-13 21:32:10 -05:00
Phillip Dexheimer	b73e9d506a	Added GATKVCFConstants and GATKVCFHeaderLines to consolidate the GATK-specific VCF annotations * Removed unused annotations (CCC and HWP) * Renamed one of the two GC annotations to "IGC" (for Interval GC) * Revved picard & htsjdk (GATK constants are now removed from htsjdk) * PT 82046038	2015-01-13 21:32:09 -05:00
Laura Gauthier	6b2bd5ed09	Address user-reported bug featuring "trio" family with two children, one parent Add test to cover case with family of one parent, two children	2015-01-13 18:35:44 -05:00
Ryan Poplin	2e5f9db758	Raising per-sample limits on the number of reads in ART and HC. -- Active Region Traversal was using per sample limits on the number of reads that were too low, especially now that we are running one sample at a time. This caused issues with high confidence variants being dropped in high coverage data. -- HaplotypeCallerGVCFIntegrationTest PL/annotation changes due to using more reads in those tests -- Removed a CountReadsInActiveRegionsIntegrationTest test for excessive coverage because the read coverage no longer goes over the limits in ART	2015-01-09 11:21:42 -05:00
rpoplin	03203e249e	Merge pull request #792 from broadinstitute/rhl_pairhmm_log_stderr Rhl pairhmm log stderr	2015-01-07 12:41:10 -05:00
Valentin Ruano-Rubio	aae04b6122	Fixes explicit limitation of the maximum ploidy of the reference-confidence model Story: ===== - https://www.pivotaltracker.com/story/show/83803796 Changes: ======= - Changed from a fixed-maximum-ploidy indel RCM likelihood cache to a dynamically resizable one. - Used the occasion to remove an unused and deprecated method from ReferenceConfidenceModel Testing: ======= - Added integration test to check on ploidies larger than the previous limit of 20.	2015-01-07 10:43:22 -05:00
Ron Levine	b4fda38922	Use logging system instead of stderr	2015-01-05 14:04:10 -05:00
Laura Gauthier	88b6f3aa50	Change []-type arrays to lists so argument parsing works in VCF header commandline output	2015-01-05 10:21:06 -05:00
rpoplin	3240b3538a	Merge pull request #794 from broadinstitute/rhl_read_backed_phasing Rhl read backed phasing	2015-01-05 09:47:25 -05:00
Ron Levine	c6840124fe	clean up, add final	2015-01-04 23:01:24 -05:00
Ron Levine	85dc703461	Add TestMergeIntoMNP() and TestReallyMergeIntoMNP()	2015-01-01 09:51:20 -05:00
Ron Levine	bb94833750	Add more tests	2014-12-30 22:45:44 -05:00
Ron Levine	714d575e3b	correct reference file name	2014-12-25 14:00:39 -05:00
Ron Levine	a7fba5c209	restructure and add more tests	2014-12-25 13:57:54 -05:00
Ron Levine	64375f6341	Messages that were going to stdout now going to stderr Make PairHMM outputs go to stderr instead of stdout Change output from stdout to stderr in close() Updated lib with output going to stderr	2014-12-23 11:03:29 -05:00
Ron Levine	069398ad46	Added more tests and documentation	2014-12-19 12:57:43 -05:00
Laura Gauthier	a9694951d2	Add error handling for genotypes that are called but have no PLs	2014-12-18 15:03:20 -05:00
Geraldine Van der Auwera	b0e615251b	Updated VQSR tool docs	2014-12-18 12:59:37 -05:00
rpoplin	4a2ac38308	Merge pull request #790 from broadinstitute/rp_nsubtil_fix-snp-detection BQSR bug fix from @nsubtil	2014-12-18 09:19:53 -05:00
Ron Levine	08790e1dab	Fix multiallelic info field annotation for VariantAnnotator Add multi-allele test for info field annotations Fix to process all types of INFO annotations roll back to previous version, removes INFO and FORMAT Correct @return for VariantAnnotatorEngine.getNonReferenceAlleles() Enhance comments and clean up multi-allelic logic, handle header info number = R only parse counts of A & R Add INFO for AC update MD5 Performance enhancement, only parse multiallelic with a count A or R Make argument final in getNonReferenceAlleles() Code cleanup, add exceptions for bad expression/allele size mismatch and missing header info for an expression Change exception to warning for expression value/number of alleles check remove advertised exceptions	2014-12-17 22:21:00 -05:00
Ron Levine	ba949389c5	matchHaplotypeAlleles() no longer calls alleleSegregationIsKnown(), added a TODO to investigate	2014-12-17 14:02:24 -05:00
Ryan Poplin	d84970ff75	BQSR bug fix from @nsubtil -- Ignore SNP matches that lie outside the clipped read window -- This fixes an issue where GATK would skip the entire read if a SNP is entirely contained within a sequencing adapter.	2014-12-17 10:04:37 -05:00
Ron Levine	56f8e4f9cf	Add comments, alleleSegregationIsKnown() check is added to matchHaplotypeAlleles()	2014-12-17 03:25:26 -05:00
Laura Gauthier	011843c569	Fixed huge bug from 9895005a (CombineGVCFs used to stop after the first contig)	2014-12-16 12:43:32 -05:00
rpoplin	bcc6b73e9b	Merge pull request #786 from broadinstitute/pd_variantstotable_sma Fix VariantsToTable output of FORMAT record lists when -SMA is specified	2014-12-16 10:37:22 -05:00
Valentin Ruano-Rubio	736a857e82	Fixing CombineGVCFs that writes out the wrong REF allele Story: ===== - https://www.pivotaltracker.com/story/show/83259038 Changes: ======= - Made minimal changes to make the fix after an arduous attempt to understand CombineGVCFs code. Test: ==== - Added an integration test to explicitly test for the bug. - Updated an md5 that changed, as the bug was actually affecting one of the existing integration tests.	2014-12-13 22:38:24 -05:00
Phillip Dexheimer	71bdfbe465	Fix VariantsToTable output of FORMAT record lists when -SMA is specified * PT 84242218 * Note that FORMAT fields behave the same as INFO fields - if the annotation has a count of A (one entry per Alt Allele), it is split across the multiple output lines. Otherwise, the entire list is output with each field	2014-12-10 21:41:15 -05:00
rpoplin	bf2911d62c	Merge pull request #783 from broadinstitute/pd_splitsamfile Fix NPE in SplitSamFile	2014-12-08 09:39:03 -05:00
Valentin Ruano-Rubio	385186e11b	Makes GQ of Hom-Ref Blocks in GVCF output consistent with PLs Story: ----- - https://www.pivotaltracker.com/story/show/83800586 Changes: ------- - In GVCFWriter GQ is now recalculated out of the final PL array for the block. Testing: ------- - Updated affected integration test md5s	2014-12-07 16:45:32 -05:00
Phillip Dexheimer	a5dee8a42e	Fix NPE in SplitSamFile * PT 82892316 * Added integration test * Fixed similar error in debug output of HC	2014-12-07 10:37:30 -05:00
Ron Levine	c9175eeee8	Renamed PhasingUtilitiesUnitTest to PhasingUtilsUnitTest	2014-12-02 18:20:12 -05:00
Ron Levine	b8f0f3fdd2	Add argument for loading the vector HMM library once	2014-12-02 10:13:56 -05:00
Ron Levine	386aeda022	Add HaplotypeCaller argument so integration tests can specify the hardware dependent PairHMM sub-implementation	2014-11-25 21:53:53 -05:00
Ron Levine	34241a62f6	Use a publicly accessible sequence file	2014-11-24 11:18:21 -05:00
Ron Levine	6ff698c556	Added HP and non-HP tests for matchHaplotypeAlleles(), added a nominal test for mergeIntoMNPvalidationCheck()	2014-11-24 11:08:04 -05:00
Ron Levine	61e1a3ecd1	Added the framework for testing the PhasingUtilies methods matchHaplotypeAlleles() and reallyMergeIntoMNP()	2014-11-22 22:01:39 -05:00
Menachem Fromer	9b73c8a841	Fix MNP merging bugs	2014-11-21 06:42:51 -05:00
rpoplin	00027e1555	Merge pull request #774 from broadinstitute/ldg_makeSelectVariantsTrimAlleles Add -trim argument to SelectVariants to trim alleles to minimal represen...	2014-11-13 13:58:13 -05:00
Ron Levine	67656bab23	Resolved conflict during rebasing Add more logging to annotators, change loggers from info to warn Add comments to testStrandBiasBySample() Clarify comments in testStrandBiasBySample remove logic for not processing an indel if strand bias (SB) was not computed remove per variant warnings in annotate() Log warnings if using the wrong annotator or missing a pedigree file Log test failures once in annotate(), because HaplotypeCaller does not call initialize(). Avoid using exceptions Fix so only log once in annotate(), Hardy-Weinberg does not require pedigree files, fix test MD5s so pass Check if founderIds == null Update MD5s from HaplotypeCaller integration tests and clean up code Change logic so SnpEff does not throw exceptions, change engine to utils in imports Update test MD5s, return immediately if cannot annotate in SnpEff.initialization() Post peer review, add more logging warnings Update MD5 for testHaplotypeCallerMultiSampleComplex1, return null if PossibleDeNovo.annotate() is not called by VariantAnnotator	2014-11-12 02:45:49 -05:00
Laura Gauthier	783a4fd651	Change default behavior of SelectVariants to trim remaining alleles when samples are subset. -noTrim argument preserves original alleles. Add test for trimming.	2014-11-11 16:32:25 -05:00
Valentin Ruano-Rubio	c5977e5c8f	Correct wrong left-alignment of reads in HC bamout Story: ----- https://www.pivotaltracker.com/story/show/80684230 Changes: ------- - Corrected the bug: AlignmentUtils#createReadAlignedToRef was not realigning against the reference but the best haplotype for the read. Test: ---- - Added integration test in HaplotypeCallerIntegrationTest to check that the bug has been fixed. - Fixed md5s modified by this change; these are caused by small changes in the state of the random-number generator and read vs variant site overlapping.	2014-11-10 10:09:58 -05:00
Laura Gauthier	c09667a20d	Fix bug in CombineGVCFs so now sample 2 variants occurring within sample 1 deletions get merged properly. CombineGVCFs now outputs ref conf for the duration of deletions so that SNPs occurring in other samples aligned with those deletions will be genotyped correctly	2014-11-05 09:11:47 -05:00
Khalid Shakir	0092a0b9eb	Faster builds, with updates to documentation generation. Reading the multiple GATKText files as a single stream, especially with new top level target executable jar files pointing to a lib folder. Don't dirty the build with a new GATKText.properties if input files are unmodified. Stop warning on undocumented abstract classes. Fixed ClassNotFoundException/NoClassDefFoundError by fixing ResourceBundleExtractorDoclet artifact. Excluding Exceptions from documentation. Removed custom log4j dependency from ResourceBundleExtractorDoclet. Stop generating the dependency reduced pom during shade. Stop regenerating gsalib when the files are already up to date. Disabled mvn site generation from external-example.	2014-11-05 00:32:23 +08:00
Khalid Shakir	1cb4b99548	Added faster built executable, non-packaged jars. Moved top level target symlinks to package jar files to under target/package. Executable jar files are placed under target/executable with the new target[/lib] directories. Under top level target, symlinks to either the package or the executable jars replace what was a symlink to the package jar path. Allow disabling of the shade package. ant-bridge.sh by default only builds executable jars, and doesn't package by default, as did the old ant build.xml. Added a new package_path.sh utility script for other scripts to use instead of anything in the target folder.	2014-11-05 00:30:46 +08:00
Phillip Dexheimer	10f99cbe04	Added StrandAlleleCountsBySample annotation This annotation outputs the number of reads supporting each allele, stratified by sample and read strand. Addresses PT 76958712	2014-11-03 21:35:58 -05:00
Khalid Shakir	8b81031bf8	Disabling tests for Lsf706 specific functionality.	2014-11-04 01:31:18 +08:00
Phillip Dexheimer	bcfd9ce19a	Moved platform flow information into NGSPlatform * Explicitly added a type for rarely used platforms * PT 81767718	2014-10-31 22:27:34 -04:00
rpoplin	c84805c402	Merge pull request #768 from broadinstitute/pd_bcf_failures Fix BCF writing when FORMAT annotations contain arrays	2014-10-31 15:30:56 -04:00
rpoplin	eecb56e0ae	Merge pull request #766 from broadinstitute/ldg_StrandBiasForMultiallelics Calculate StrandBiasBySample using all alternate alleles as ref vs. any ...	2014-10-31 15:26:07 -04:00
Phillip Dexheimer	fc67e50faa	Revved Picard/htsjdk Removed inefficient array->List conversion in AlleleCountBySample	2014-10-30 21:16:25 -04:00
Laura Gauthier	bc7202fff7	Calculate StrandBiasBySample using all alternate alleles as ref vs. any alt	2014-10-30 11:52:06 -04:00
Khalid Shakir	5c9fe1a06d	Split all imports of tools|engine from utils, and all tools from engine. Second of two commits, modifying actual files.	2014-10-24 20:59:46 +08:00
Khalid Shakir	bb7151192a	Split all imports of tools|engine from utils, and all tools from engine. First of two commits, renaming files only.	2014-10-24 20:59:45 +08:00
Geraldine Van der Auwera	b69b256003	Update pom versions to mark the start of GATK 3.4 development	2014-10-23 22:31:44 -04:00
Geraldine Van der Auwera	eee94ec81f	Update pom versions for the 3.3 release	2014-10-23 22:25:17 -04:00
Geraldine Van der Auwera	3ba94b987c	Minor documentation clarifications	2014-10-22 17:54:11 -04:00
rpoplin	0f89d1a362	Merge pull request #755 from broadinstitute/sc_Annotation_Docs_73647570 Improvements to documentation of variant annotations	2014-10-22 13:41:00 -04:00
Sheila Chandran	b3c5ed4414	Improvements to documentation of variant annotations - Added or modified explanations for majority of variant annotations - Generalized NBaseCount to include all tech platforms (not just SOLiD)	2014-10-21 18:20:04 -04:00
Geraldine Van der Auwera	895b8c5931	Minor fix for missing INFO key definition in VCF header	2014-10-21 16:50:37 -04:00
rpoplin	c4fcd70a88	Merge pull request #754 from broadinstitute/rhl_variant_array_exception Do not process a variant if it is too large (> readLength), and log an e...	2014-10-21 12:01:52 -04:00
rpoplin	bcf6be0b08	Merge pull request #753 from broadinstitute/ldg_HCzeroDepth Fix GenotypeGVCF bugs in -allSites mode	2014-10-21 12:00:04 -04:00
Laura Gauthier	2b848ad859	Variants that become hom-ref after regenotyping in GenotypeGVCFs are now getting output in -allSites mode.	2014-10-21 08:21:53 -04:00
Laura Gauthier	5465e4484e	For GenotypeGVCFs -allSites mode, make genotypes no-call if depth is zero.	2014-10-21 08:21:43 -04:00
Ron Levine	239151ac7b	Do not process a variant if it is too large (> readLength), and log an error remove final keyword before refMap and altMap, constructHaplotype() changes their values return ArtificialHaplotype from constructHaplotype instead of passing as an argument Add logic so arraycopy does not throw an IndexOutOfBoundsException, add test for a long insert	2014-10-20 15:51:32 -04:00
Phillip Dexheimer	b348ce8f25	Added -disableOptimizations argument to HaplotypeCaller. * This argument is intended to be used in conjunction with -bamout, and disable early-exit optimizations to allow reference regions to be contained in the output bam * Also forcibly includes the reference haplotype in the set of haplotypes given to the BAMWriter * Made -dontTrimActiveRegions visible, as it is likely also desirable in this use case * Addresses PT 77731660	2014-10-16 21:11:20 -04:00
Laura Gauthier	0f08065ebc	Throw UserException if input VCFs have duplicate samples but no genotypemergeoption is specified	2014-10-15 16:03:10 -04:00
Laura Gauthier	81482138ca	Decrease interval on CGP integration test to reduce test execution time	2014-10-15 11:28:27 -04:00
Geraldine Van der Auwera	e7e8052f84	Updated license information - Updated license files (private/protected) for version, address and a couple of legal clauses - Updated license snippet throughout the codebase	2014-10-14 17:10:12 -04:00
Ron Levine	36c27155af	Made the threshold for the probability of a state being active a command line argument remove TODO comment after activeProbThreshold recover static ACTIVE_PROB_THRESHOLD for unit tests Add min/max values for active_probability_threshold parameter Move activeProbThreshold parameter to GATKArgumentCollection define ACTIVE_PROB_THRESHOLD in unit tests add construction of argCollection in ctor Move arguments from GATKArgumentCollection to ActiveRegionWalker Throw exception if threshold < 0 or > 1 in ActivityProfile ctor max propagation distance parameter to ActiveRegionWalker for ActivityProfile Use polymorphic getMaxProbPropagationDistance() so BandPassActivityProfile computes the correct region size cutoff Get the maxProbPropagationDistance from the super class's method, instead of directly, this is safer Removed extraneous command line imports and make maxProbPropagationDistance a hidden argument remove limit check for activeProbThreshold, not necessary because the check is made when input as a command line arg Remove extra 'region' in the doxygen param description for maxProbPropagationDistance	2014-10-10 10:36:02 -04:00
Ron Levine	645d418015	Changed hardcoded downsampling max/min coverage values to parameters Rename parameters using camel case and add to integration test Correct documentation for maxReadsInRegionPerSample and minReadsPerAlignmentStart Change the argument --minReadsPerAlignmentStart in the integration test from 50 to 5 'each genomic location' only pertains to minReadsPerAlignmentStart, not maxReadsInRegionPerSample	2014-10-09 17:09:26 -04:00
Valentin Ruano-Rubio	a3ad6f63bd	Reduce execution time of various integration tests Story: https://www.pivotaltracker.com/story/show/79461912	2014-09-30 13:28:55 -04:00
rpoplin	329bd081b7	Merge pull request #736 from broadinstitute/rhl_remove_line removed an unneeded import that broke maven	2014-09-29 15:03:55 -04:00
Ron Levine	1c9d60c9a0	removed an unneeded import that broke maven	2014-09-29 12:57:33 -04:00
Valentin Ruano-Rubio	311b6815b3	Fixed the QUAL calculation of the EXACT_INDEPENDENT. The QUAL value calculated by this Exact AF Calculator is very underestimated when there are more than one alternative allele (non-biallelic sites). The reason is that the QUAL was roughly calculated by adding the QUALs resulting from each alternative allele vs all other alleles, reference and alts, collapsed. This is ok for MLEAC calculations but not for QUAL. Now, for calculating the QUAL we collapse all the alternatives as only one. This change improves sensitivity with a cost of additional false positives, but this is naturally expected. The resulting QUAL column is much closer to the one returned by the reference implementation. Story: https://www.pivotaltracker.com/story/show/75926368. Changes: Changed the QUAL calculation as described above. Updated MD5s. Fixed MD5s	2014-09-29 11:04:52 -04:00
Valentin Ruano-Rubio	0e52b8ba5a	Fixed MLEAC and QUAL inaccuracy in GeneralPloidyExactAFCalculator. The problem was that the MLE table calculation aborted "unlikely" genotype combinations too aggressively. This also uncovered another bug where GeneralPloidyExactAFCalculation makes a slightly different use of StateTracker as compared to DiploidExactAFCalculation. We have changed StateTracker generalizing it to be able to work with both using code behaviors. Story: ----- * https://www.pivotaltracker.com/story/show/78920568 Changes: ------- * Fixes in GeneralPloidyExactAFCalculator. * Needed changes in StateTracker API and its consequences in DiploidExactAFCalculation. * Updated affected integrated tests' MD5s after fixing the GeneralPloidyExactAF.	2014-09-23 15:40:54 -04:00
Valentin Ruano-Rubio	f6cb83d476	Renamed AFCalc to AFCalculator for a better class naming	2014-09-12 14:59:58 -04:00
Valentin Ruano-Rubio	95b45443ae	Updated test according to changes in the AF calculator framework. Changes: ------- * Updated current unit and integration test to use the new API components. * Added unit tests for new classes AFPriorProvider and AFCalculatorProviders. * Added integration test for mixed ploidy GenotypeGVCFs and CombineGVCFs	2014-09-12 14:59:47 -04:00
Valentin Ruano-Rubio	3cdeab6e9e	GenotypingEngines and walkers now use AFCalc(ulator) providers rather than instantiate their own (fixed) calculators directly. Changes: ------- * GenotypingEngine now uses an AFCalc provider instead of its own thread-local with one-time initialized and fixed AF calculator. * All walkers that use a GenotypingEngine now are passing the appropriate AF calculator provider. For now most just use a fixed calculator (FixedAFCalculatorProvider) except GenotypeGVCFs as this one now can cope with mixture of ploidies failing-over to a general-ploidy calculator when the preferred implementation is not capable of handling a site's analysis.	2014-09-12 14:25:09 -04:00
Valentin Ruano-Rubio	935bd1394b	AFCalculatorProvider components to allow for dynamic instantiation of different AFCalc(ulators) to cope with dynamic ploidy and max-alt-allele counts (the latter not used for now).	2014-09-12 14:23:45 -04:00
Valentin Ruano-Rubio	ce8e93fa51	Made the AF prior probability distribution dynamic with respect to the total-ploidy (added ploidy across samples). Changes: -------- * Instead of calculating a fixed log10 prior array with a fixed total likelihood we use a new component, the AFPriorProvider to generate the priors for different total ploidies on demand; these are cached however so there is no unnecessary recompute involved.	2014-09-12 14:23:37 -04:00
Valentin Ruano-Rubio	31e58ae4ec	Refactored AFCalc to remove unnecessary capability limits allowing to deal with mixed ploidies and max-alt-allele number changes dynamically. Changes: -------- * Moved the AFCalcFactory.Calculation enum in a top level class AFCalculatorImplementation. * Given more responsibilities to the enum like resolving the constructor method once per implementation and the best-model selection algorithm. * Removed test-code only fields and methods from AFCalc; just used to perform unit-testing and not any actual functionality of this component. * Removed the fixed ploidy constraint of GeneralPloidyExactAFCalc implementation... now can deal with mixed ploidies that may change per site and sample. * Removed the fixed maxAltAllele restriction by allowing resizing of the stateTracker structures. * Due to previous two points now calls to the AFCalc object are passed the default-ploidy to assume in case some genotype in the input VC does not have it and the max-alt-allele. * Also due to those changes, removed the now totally useless 3 int parameters from all AFCalc constructors. * Cleaned the code a bit from no further used components and methods.	2014-09-12 14:17:36 -04:00
Ryan Poplin	48252897b4	Added ignore all filters options to VQSR walkers	2014-09-11 15:11:41 -04:00
Eric Banks	31cea25c36	Merge pull request #730 from broadinstitute/eb_inbreeding_coeff_unit_test Cleaned up and fleshed out unit tests for the Inbreeding Coefficient annotation class	2014-09-10 09:32:49 -04:00
Eric Banks	5e490362ca	Cleaned up and fleshed out unit tests for the Inbreeding Coefficient annotation class.	2014-09-08 11:40:39 -04:00
Eric Banks	cc175bad40	Improve the accuracy of dangling head merging in the HC assembler. Dangling head merging (like with tails) is now enabled by default. The --recoverDanglingHeads argument is now deprecated so that users know not to use it anymore. We now also allow the user to set the minimum branch length for merging. This will be different for exomes and RNA (see below). The other changes in the code itself: 1. We no longer allow an arbitrarily large number of mismatches in the dangling head for merging 2. The max number of mismatches allowed in a dangling head is proportional to the kmer size There will be a difference in the RNA calling pipeline. Instead of invoking '--recoverDanglingHeads' the user will instead want to use '--minDanglingBranchLength 0'. Below are the knowledgebase results of the master branch vs. this one. For NA12878 DNA Exome: master SNPS TRUE_POSITIVE 36722 master SNPS CALLED_NOT_IN_DB_AT_ALL 2699 master SNPS REASONABLE_FILTERS_WOULD_FILTER_FP_SITE 292 master SNPS FALSE_POSITIVE_SITE_IS_FP 70 branch SNPS TRUE_POSITIVE 36867 branch SNPS CALLED_NOT_IN_DB_AT_ALL 2952 branch SNPS REASONABLE_FILTERS_WOULD_FILTER_FP_SITE 387 branch SNPS FALSE_POSITIVE_SITE_IS_FP 94 As I discussed with Ryan in person, there are a good number of FPs that are called in the new code, but they nearly all have bad strand bias and should be easily filtered by VQSR. Note that there is no change for indels. For NA12878 RNA from Ami: master SNPS TRUE_POSITIVE 11055 master SNPS CALLED_NOT_IN_DB_AT_ALL 831 master SNPS REASONABLE_FILTERS_WOULD_FILTER_FP_SITE 44 master SNPS FALSE_POSITIVE_SITE_IS_FP 96 branch SNPS TRUE_POSITIVE 11113 branch SNPS CALLED_NOT_IN_DB_AT_ALL 874 branch SNPS REASONABLE_FILTERS_WOULD_FILTER_FP_SITE 47 branch SNPS FALSE_POSITIVE_SITE_IS_FP 92 Again, there's basically no change for indels.	2014-09-07 08:55:59 -04:00
Phillip Dexheimer	a35f5b8685	Moved arguments controlling options in output files into the engine * Arguments involved are --no_cmdline_in_header, --sites_only, and --bcf for VCF files and --bam_compression, --simplifyBAM, --disable_bam_indexing, and --generate_md5 for BAM files * PT 52740563 * Removed ReadUtils.createSAMFileWriterWithCompression(), replaced with ReadUtils.createSAMFileWriter(), which applies all appropriate engine-level arguments * Replaced hard-coded field names in ArgumentDefinitionField (Queue extension generator) with a Reflections-based lookup that will fail noisily during extension generation if there's an error	2014-09-05 21:18:11 -04:00
droazen	5c4a3eb89c	Merge pull request #727 from broadinstitute/ks_gatk_queue_package_test_updates Various fixes for package tests.	2014-09-05 10:17:32 -04:00
Ryan Poplin	a45acdfb89	StrandOddsRatio is now a standard annotation.	2014-09-05 08:33:37 -04:00
Khalid Shakir	376592f423	Various fixes for package tests. Explicitly including gatk/queue test-jar artifacts in package test classpaths. SelectVariantsIntegrationTest#testInvalidJexl now resets the JexlEngine silent flag that VariantFiltration.initialize() toggles. External example no longer tries to unpack nonexistent gatk artifact jars during package tests.	2014-09-04 15:30:31 -04:00
Ryan Poplin	1b809268d5	fixing a few small typos in the HaplotypeCaller and related classes	2014-09-04 14:48:27 -04:00
droazen	5c087a6e1f	Merge pull request #724 from broadinstitute/ks_remove_test_qscript_symbolic_links Removed symlink creation for tests and qscripts	2014-09-04 09:10:54 -04:00
Eric Banks	538537dbf1	Merge pull request #718 from broadinstitute/mf_rbp_fix Fix MNP merging code to work with explicit HP phase representation	2014-09-02 20:39:22 -04:00
Eric Banks	01e725cd1a	Merge pull request #723 from broadinstitute/eb_fix_rna_splitting_PT77878554 Make sure that the OverhangFixingManager (used for splitting RNA reads) ...	2014-09-02 20:39:01 -04:00
Menachem Fromer	10f9001738	Fix MNP merging code to work with explicit HP phase representation	2014-09-02 17:25:08 -04:00
Eric Banks	ff91ab8ba2	Make sure that the OverhangFixingManager (used for splitting RNA reads) handles unmapped reads.	2014-09-02 16:56:17 -04:00
Valentin Ruano Rubio	c7925f6e5c	Merge pull request #719 from broadinstitute/vrr_generalize_ploidy_in_genotype_gvcfs Adds support for omniploidy to GenotypeGVCFs and CombineGVCFs.	2014-09-02 16:51:02 -04:00
Valentin Ruano-Rubio	d363725b4b	Adds support for omniploidy to GenotypeGVCFs and CombineGVCFs. Same changes fixed the problem for GenotypeGVCFs and CombineGVCFs. Stories: - https://www.pivotaltracker.com/story/show/77626044 - https://www.pivotaltracker.com/story/show/77626854 Changes: - Generalized the code for the merging in GATKVariantContextUtils to cope with ploidy != 2. - GenotypeGVCFs now check that the input's ploidy conforms to the '-ploidy' argument. - Moved out Reference Confidence VC merging code from GATKVariantContextUtils so that we can keep new code in protected. Caveats: - GenotypeGVCFs only can deal with input files that have the same ploidy in all positions; the one that the user MUST indicate in the -ploidy argument (if different to the default 2). - CombineGVCFs won't necessarily complain if it's passed mixed ploidy inputs but you won't be able to genotype it with GenotypeGVCFs. Test: - Removed deprecated unit tests for GATKVariantContextUtils. - Moved unit-tests regarding GVCF merging from GATKVariantContextUtilsUnitTest to ReferenceConfidenceVariantContextUtilsUnitTest. - Added unit test for new code for mapping genotype indices between allele index encoding in GenotypeLikelihoodCalculator. - GenotypeGVCFs and CombineGVCFs original integration test are unaffected by the change. - Added tetraploid run integration tests to check on non-diploid execution of GenotypeGVCFs and CombineGVCFs.	2014-09-02 15:06:47 -04:00
Khalid Shakir	fcb0eca203	Now passing in the path to the GATK directory to tests. Changed tests and scripts to use gatkdir full path instead of relative testdata/qscripts symbolic links. Although symlinks not created, left the symlink deletion script execution with a comment about future removal. Re-enabled example UG pipeline queue test. Replaced all hardcoded strings of {public,private}/testdata with BaseTest variables. Refactored temp list creation method from ListFileUtilsUnitTest to BaseTest.createTempListFile. Removed list files with hardcoded paths, now using createTempListFile instead with private test dir variable.	2014-09-02 01:40:59 +08:00
Khalid Shakir	2d28972c88	The 'after' files are @Input files and committed in git, so don't delete them after tests.	2014-08-30 03:04:54 +08:00
Eric Banks	5b087c9897	Changed the functionality of the physical phasing in the HC: now hom vars are output as 0|1. We do this for technical reasons, mostly because we don't genotype in the HC anymore; it's all done downstream by GenotypeGVCFs so we can't be sure that the genotype will be hom var. Also, there are steps in the downstream pipeline where genotypes can change, so assuming anything in the HC is a bad idea, and if we have phasing info in the het state, we want to propagate that forward. Now, PGT tag fixing happens downstream in GenotypeGVCFs. While I was in there I also cleaned up the code a bit and fixed a bug where annotation was happening before genotype creation when using the --includeNonVariantSites argument. Added tests accordingly.	2014-08-25 21:40:14 -04:00
Valentin Ruano-Rubio	6dc5cf0be0	Fixes some mismerged md5 updates from a previous merge into master	2014-08-24 20:47:07 -04:00
Eric Banks	9009c1e996	Merge pull request #715 from broadinstitute/vrr_disable_physical_phasing_for_nondiploid_hc Disable physical phasing for non-diploid HC calling.	2014-08-23 20:58:51 -04:00
Valentin Ruano-Rubio	6695aeafd9	Disable physical phasing for non-diploid HC calling. Story: https://www.pivotaltracker.com/story/show/77452256 Changes: If ploidy != 2, disable physical phasing and log an info message to let the user know. Tests: Change md5s affected by this change.	2014-08-23 10:52:07 -04:00
Phillip Dexheimer	931890915f	Add the --sample_name argument to HaplotypeCaller * This is a shortcut for people who have multi-sample BAMs but would like to use GVCF mode. Rather than creating single-sample BAMs with PrintReads, one could use the --sample_name argument to HaplotypeCaller to specify the single sample to make calls on * Completes PT 73075482	2014-08-22 23:22:03 -04:00
Valentin Ruano-Rubio	fc5ce4b662	Created the stand-alone AC and AF annotation AlleleCountBySample Story: https://www.pivotaltracker.com/story/show/77250524 Changes: - Remove the annotating code in GeneralPloidyExactAFCalc (GPEAFC) class. - Added the asAlleleList to GenotypeAlleleCounts class and get (GPEAFC) to use that instead of implementing its own (nicer and more reusable code). - Removed the explicit addition of AlleleCountBySample fields to the VCF header by the walker initialize - Added utility methods in Utils to wrap an int[] array into a List<Integer>, and double[] array into a List<Double> efficiently. Test: - Added unit-testing for asAlleleList in GenotypeAlleleCountsUnitTest (within testFirst and testNext). - Added unit-testing for new methods in Utils : asList(int[]) and asList(double[]) - Changed UG General Ploidy test to add explicitly those annotations. - Non-trivial changes in integration tests involving non-diploid runs (namely haploid and tetraploid) as they are not showing those annotations any longer, so the MD5s have been changed accordingly.	2014-08-22 20:33:25 -04:00
Eric Banks	36bdfa3918	Merge pull request #712 from broadinstitute/eb_physical_phasing_bug_PT77248992 Fixing bug in the physical phasing code, found by Valentin.	2014-08-21 15:25:51 -04:00
Eric Banks	b1cb6196be	Fixing bug in the physical phasing code, found by Valentin. It turns out that there can be some really complex situations even with a single sample where there are lots of unphasable hets around a hom. Previously we were trying to phase each of the hets against the hom, but that wasn't correct. Instead we now detect that situation and don't attempt to phase anything. Added a unit test to cover this situation.	2014-08-21 15:24:09 -04:00
Laura Gauthier	9a5da41dd4	Add bells and whistles for Genotype Refinement Pipeline New annotation for low- and high-confidence de novos (only annotates biallelics) FamilyLikelihoodsUtils now add joint likelihood and joint posterior annotations Restrict population priors based on discovered allele count to be valid for 10 or more samples.	2014-08-21 11:20:40 -04:00
Valentin Ruano-Rubio	d31c5536aa	Fixed the bug first by indicating the actual possible number of alternative alleles considering the extra <NON_REF> and second by resizing the StateTracker capacity when invoked by GeneralPloidyExactAFCalc deep within its implementation of computeLog10PNonRef which is ultimately what gets rid of the exception. Story: https://www.pivotaltracker.com/story/show/74471252	2014-08-20 14:42:42 -04:00
Laura Gauthier	b512c7eac9	Refactor StrandBiasTest (using template method) and add warnings for when annotations may not be calculated successfully. VariantAnnotator/FS behavior changes slightly: VA used to output zeros for FS if there was no strand bias info, now skips FS output (but will still show FS in header)	2014-08-20 08:18:53 -04:00
Valentin Ruano-Rubio	8d9a55ae60	Moving new omniploidy likelihood calculation classes to their final package (as far as this pull-request is concerned) in org.broadinstitute.gatk.tools.walkers.genotyper	2014-08-19 11:54:29 -04:00
Valentin Ruano-Rubio	611b7f25ea	Adds unit-test and integration test for new omniploidy likelihood calculation components Added md5 to HaplotypeCallerIntegrationTest.testHaplotypeCallerSingleSampleWithDbsnp	2014-08-19 11:53:19 -04:00
Valentin Ruano-Rubio	9ee9da36bb	Generalize the calculation of the genotype likelihoods in HC to cope with haploid and multiploidy Changes in several walker to use new sample, allele closed lists and new GenotypingEngine constructors signatures Rebase adoption of new calculation system in walkers	2014-08-19 11:53:06 -04:00
Valentin Ruano-Rubio	f08dcbc160	Added the genotype likelihoods model interface and implementation for the random specimen sample from an infinite population with homogeneous ploidy across samples.	2014-08-19 11:50:13 -04:00
Valentin Ruano-Rubio	4f993e8dbe	Added read-likelihoods array base structure to substitute existing Map-of-Map-of-Maps.	2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio	242cd0e58f	Added genotype allele counts and likelihood calculator utilities for arbitrary ploidy and number of alleles	2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio	b0a4cb9f0c	Added close sample and allele list data-structures and utility classes	2014-08-19 11:50:12 -04:00
Eric Banks	d3f06024f8	Updated the physical phasing in the Haplotype Caller to address requests from ATGU. 1. It is now turned on by default 2. It now phases homozygous variants 3. Most importantly, it also phases variants that are always on opposite haplotypes Changed the INFO keys to be PID and PGT, as described in the header.	2014-08-18 14:38:29 -04:00
Eric Banks	7e0c326e1c	Merge pull request #706 from broadinstitute/vrr_reduce_hc_integration_test_time Reduce intervals of integration tests in HaplotypeCallerIntegrationTest ...	2014-08-15 17:37:57 -04:00
Valentin Ruano-Rubio	2f79042dee	Reduce intervals of integration tests in HaplotypeCallerIntegrationTest class Story: https://www.pivotaltracker.com/story/show/74858854 Changes: Intervals have been shrunk so that the test run in 15s or less.	2014-08-15 14:20:10 -04:00
Eric Banks	eb84091702	Update the --keepOriginalAC functionality in SelectVariants to work for sites that lose alleles in the selection.	2014-08-14 15:34:09 -04:00
Ryan Poplin	3a9a78c785	Removing an assumption that ADs were in the same order if the number of alleles matched. This happens for example when one sample is C->T and another sample is C->G.	2014-08-13 13:26:40 -04:00
Eric Banks	27193c5048	Merge pull request #700 from broadinstitute/eb_phase_HC_variants_PT74816060 Initial implementation of functionality to add physical phasing informat...	2014-08-13 12:30:32 -04:00
Eric Banks	4512940e87	Initial implementation of functionality to add physical phasing information to the output of the HaplotypeCaller. If any pair of variants occurs on all used haplotypes together, then we propagate that information into the gVCF. Can be enabled with the --tryPhysicalPhasing argument.	2014-08-13 12:25:31 -04:00
Valentin Ruano-Rubio	b39508cd15	ReadLikelihoods class introduction final changes before merging Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: Done some changes that I missed in relation with making sure that all PairHMM implementations use the same interface; as a consequence we were running always the standard PairHMM. Fixed some additional bugs detected when running it on full wgs single sample and exome multi sample data set. Updated some integration test md5s.	2014-08-11 17:47:25 -04:00
Valentin Ruano-Rubio	9a9a68409e	ReadLikelihoods class introduction final changes before merging Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: Done some changes that I missed in relation with making sure that all PairHMM implementations use the same interface; as a consequence we were running always the standard PairHMM. Fixed some additional bugs detected when running it on full wgs single sample and exome multi sample data set. Updated some integration test md5s. Fixing GraphBased bugs with new master code Fixed ReadLikelihoods.changeReads difficult to spot bug. Changed PairHMM interface to fix a bug Fixed missing changes for various PairHMM implementations to get them to use the new structure. Fixed various bugs only detectable when running with full sample(s). Believe to have fixed the lack of annotations in UG runs Fixed integration test MD5s Updating some md5s Fixed yet another md5 probably left out by mistake	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	0b472f6bff	Added new test to verify the functionality of ReadLikelihoods.java and its use in HC. Updated existing integration test md5s. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	2914ecb585	Change the Map-of-maps-of-maps for an array based implementation ReadLikelihoods to hold read likelihoods. The array structure should be faster to populate and query (not properly benchmarked) and reduce memory footprint considerably. Nevertheless removing PairHMM factor (using likelihoodEngine Random) it only achieves a speed up of 15% in some example WGS dataset i.e. there are other bigger bottlenecks in the system. Bamboo tests also seem to run significantly faster with this change. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: - ReadLikelihoods added to substitute Map<String,PerSampleReadLikelihoods> - Operation that involve changes in full sets of ReadLikelihoods have been moved into that class. - Simplified a bit the code that handles the downsampling of reads based on contamination Caveats: - Still we keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators..., didn't feel like change the interface of so many public classes in this pull-request.	2014-08-11 17:46:28 -04:00
Ryan Poplin	c56e493f98	Merge pull request #622 from broadinstitute/ldg_SORanalysis Add StrandOddsRatio to default annotations produced by GenotypeGVCFs	2014-08-11 09:45:27 -04:00
Tim Fennell	5695f22da8	Changed the default GVCF Q Bands from 5,20,60 to be 1..60 by 1s, 60...90 by 10s and 99 in order to give finer resolution for homref PLs and ADs at lower confidences and somewhat higher resolution at higher confidences.	2014-08-08 14:31:35 -04:00
Laura Gauthier	35de598e4b	Modify StrandOddsRatio calculation to take on lower values in cases where reference +/- reads are skewed but alt reads are not. Add SOR to default annotations produced by GenotypeGVCFs. Add jitter to minimum SOR values	2014-08-07 12:09:19 -04:00
Laura Gauthier	f532f1f843	Fix nullPointerException	2014-08-07 10:13:17 -04:00
Laura Gauthier	74affcc077	Update inbreeding coefficient calculation to give a better estimate for multiallelic sites Add unit test for compound het and for multiallelic hets	2014-08-07 08:12:47 -04:00
Eric Banks	b9486f5b4d	Merge pull request #693 from broadinstitute/ldg_SORfromHC Allow SOR to be calculated from HC	2014-08-06 21:48:09 -04:00
Phillip Dexheimer	593663d9b6	Improved detection of missing argument values In particular, it was possible to specify arguments for Files or Compound types without values Added a special "none" value for annotations, since a bare "-A" is no longer allowed Delivers PT 71792842 and 59360374	2014-08-05 20:31:31 -04:00
Laura Gauthier	5533199402	Allow SOR to be calculated from HC Refactor StrandBiasTest classes	2014-08-01 20:47:58 -04:00
Ryan Poplin	63b3f7dfd3	Fixing typos in AnalyzeCovariates	2014-07-31 10:36:18 -04:00
Valentin Ruano-Rubio	750eb4b5a6	Add diploid only support message to HaplotypeCaller Story: https://www.pivotaltracker.com/story/show/73440292 Changes: - Just add the conditional in HaplotypeCaller#initialize Testing: - Nothing added, checked locally, trivial change that would eventually be removed anyway.	2014-07-29 17:05:36 -04:00
David Roazen	0798a4b768	Update pom versions to mark the start of GATK 3.3 development	2014-07-17 12:09:33 -04:00
David Roazen	323f22f852	Update pom versions for the 3.2 release	2014-07-17 12:06:22 -04:00
Eric Banks	98d88eb07e	Fixed IndexOutOfBounds error associated with tail merging. Don't expand out source nodes for tail merging, since that's a head merging action only. This shows up as a bug only because we now allow merging tails against non-reference paths.	2014-07-17 12:04:22 -04:00
Geraldine Van der Auwera	a6f632874b	Various documentation improvements - Edited intervals merging docs for correctness & clarity - Edited VQSR arg docs and made mode required (+added -mode SNP to VQSR tests) - Moved PaperGenotyper to Toy Walkers to declutter the actually useful docs - Moved GenotypeGVCFs to Variant Discovery category and clarified a few points - Clarified that the -resource argument depends on using the -V:tag format - Clarified how the pcr indel model works - Added caveat for -U ALLOW_N_CIGAR_READS - Added MathJax support for displaying equations in GATKDocs - Updated HC example commands and caveats	2014-07-14 12:03:03 -04:00
droazen	db53d096c9	Merge pull request #684 from broadinstitute/ks_add_cofoja_to_gatk_packages Added cofoja to the gatk packages for tests to pass.	2014-07-14 11:15:49 -04:00
Eric Banks	ecefcb383d	Disable the complex variant merging for now, as requested by ATGU	2014-07-11 17:27:40 -04:00
Khalid Shakir	c7e357eb59	Added cofoja to the gatk packages for tests to pass.	2014-07-11 23:19:42 +08:00
droazen	b8751ad598	Merge pull request #680 from broadinstitute/ldg_VQSRscript Update VQSR Rnd BQSR script generation code for compatibility with late...	2014-07-11 10:16:37 -04:00
Eric Banks	1d97b4a191	Improved tail merging: now tails can be merged to branches that are not entirely reference. This is useful for e.g. cases where there are SNPs on insertions. Before tails were forced to be merged (incorrectly) only to a reference node, but now they can be merged to any path in the graph from which they directly branch. Also, I've transferred over Ryan's code to refuse to process kmer sizes such that there are non-unique kmers in the reference sequence with them.	2014-07-10 08:57:01 -04:00
Ryan Poplin	5eee065133	Merge pull request #674 from broadinstitute/rp_improve_genotyping Improvements to genotyping accuracy.	2014-07-09 16:03:09 -04:00
Laura Gauthier	99026eb51b	Update VQSR Rnd BQSR script generation code for compatibility with latest ggplot version. Update queueJobReport.R and public/gsalib/src/R/R/gsa.variantqc.utils.R also	2014-07-09 15:36:58 -04:00
Ryan Poplin	74a7674d70	Improvements to genotyping accuracy. -- Global mismapping penalty was only applied to the reference haplotype. This led to problems with overlapping events, mostly STR haplotypes. Now the penalty is applied to every haplotype. -- We subset the reads down to only those which overlap the event (after assembly based realignment) for likelihood calculations.	2014-07-09 13:11:07 -04:00
David Roazen	719e685759	Remove junit imports in the test suite	2014-07-09 12:09:27 -04:00
Eric Banks	bad7865078	When converting a haplotype to a set of variants we now check for cases that are overly complex. In these cases, where the alignment contains multiple indels, we output a single complex variant instead of the multiple partial indels. We also re-enable dangling tail recovery by default.	2014-07-01 14:18:59 -04:00
Ryan Poplin	e14bff212d	SB tables should be created even if the ref or alt columns have no counts. This is so that FS/SOR will still be calculated when the variant is extremely high or low frequency. -- Removed long running HC integration test... sorry	2014-06-30 15:19:15 -04:00
Ryan Poplin	0127799cba	Reads are now realigned to the most likely haplotype before being used by the annotations. -- AD,DP will now correspond directly to the reads that were used to construct the PLs -- RankSumTests, etc. will use the bases from the realigned reads instead of the original alignments -- There is now no additional runtime cost to realign the reads when using bamout or GVCF mode -- bamout mode no longer sets the mapping quality to zero for uninformative reads, instead the read will not be given an HC tag	2014-06-30 10:35:50 -04:00
Phillip Dexheimer	06d619e9aa	Removed redundant SelectVariantsIntegrationTest, merged its only test into protected version	2014-06-24 18:59:59 -04:00
Eric Banks	2df2a153e6	Merge pull request #658 from broadinstitute/ldg_PbyTwithPriors Updated CalculateGenotypePosteriors to compute genotype posteriors using...	2014-06-18 15:04:39 -04:00
Laura Gauthier	2356d5d63f	Updated CalculateGenotypePosteriors to compute genotype posteriors using likelihoods from all members of the trio. (Right now it only works if all members of the trio are called.) Takes posteriors as input, defaulting to PLs Added annotations for possible de novos for us in full genotype refinement pipeline Added family priors to CGP integration test. Changed CGP to use PP tag instead of GP tag because posteriors are Phred-scaled. Updated CGP integration test md5s to reflect change.	2014-06-18 11:17:15 -04:00
Phillip Dexheimer	2e78815055	Added missing arguments to GenotypeGVCFs - New arguments are nda, hets, indelHeterozygosity, stand_call_conf, stand_emit_conf, ploidy, and maxAltAlleles - Addresses PT 70110918 - To do this, moved those arguments out of the StandardCallerArgumentCollection into a new GenotypeCalculationArgumentCollection, which is now included as a member of SCAC	2014-06-16 08:10:54 -04:00

1556 Commits (4fe4ace232c6f01a9d1066ed4e0219d83b446bf3)