gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Guillermo del Angel	20d3137928	Fix for indel calling with UG in presence of reduced reads: When a read is long enough that there's no reference context available for it, the read gets clipped so that it falls within the reference context range again. However, the clipping is incorrect, as it makes the read end precisely at the end of the reference context coordinates. This can lead to a read spanning beyond the haplotype when one of the candidate haplotypes is shorter than the reference context (as is the case, e.g., with deletions). In that case the HMM does not work properly and the likelihood is bad, since "insertions" at the end of reads past the end of the haplotype are penalized and the likelihood ends up much lower than it should be. -- Added check to see if read spans beyond reference window MINUS padding and event length. This guarantees that the read will always be contained in the haplotype. -- Updated MD5s that change because long reads from old 454 data have their likelihoods altered by the extra base clipping.	2013-04-29 19:33:02 -04:00
Eric Banks	c5701a9ade	Merge pull request #199 from broadinstitute/md_clipped_reduced_reads Bugfix for ReadClipper with ReducedReads	2013-04-29 09:14:43 -07:00
Mark DePristo	0387ea8df9	Bugfix for ReadClipper with ReducedReads -- The previous version of the read clipping operations wouldn't modify the reduced read counts, so hardClipToRegion would result in a read with, say, 50 bp of sequence and base qualities but 250 bp of reduced read counts. Updated the hardClip operation to handle reduced reads, and added a unit test to make sure this works properly. Also had to update GATKSAMRecord.emptyRead() to set the reduced count to new byte[0] if the template read is a reduced read -- Update md5s, where the new code recovers a TP variant with count 2 that was missed previously	2013-04-29 11:12:09 -04:00
Mark DePristo	5dd73ba2d1	Merge pull request #198 from broadinstitute/mc_reduce_reads_ds_doc Updates GATKDocs for ReduceReads downsampling	2013-04-27 05:49:47 -07:00
delangel	651e1f23b1	Merge pull request #194 from broadinstitute/gda_ancient_dna_newPipeline Add feature to specify Allele frequency priors by command line when call...	2013-04-27 04:59:09 -07:00
Mauricio Carneiro	76e997895e	Updates GATKDocs for ReduceReads downsampling [fixes #48258295]	2013-04-26 23:33:44 -04:00
Guillermo del Angel	4168aaf280	Add feature to specify Allele frequency priors by command line when calling variants. Use case: The default AF priors used (infinite sites model, neutral variation) are appropriate when the reference allele is ancestral and the called allele is derived. Most of the time this is true, but in several population studies and in ancient DNA analyses this might introduce reference biases, and in other cases it's hard to ascertain what the ancestral allele is (normally requiring a lookup of the homologous chimp sequence). Specifying no prior is one solution, but this may introduce a lot of artifactual het calls in shallower coverage regions. With this option, users can specify what the prior for each AC should be according to their needs, subject to the restrictions documented in the code and in GATK docs. -- Updated ancient DNA single sample calling script with filtering options and other cleanups. -- Added integration test. Removed old -noPrior syntax.	2013-04-26 19:06:39 -04:00
Mark DePristo	759c531d1b	Merge pull request #197 from broadinstitute/dr_disable_snpeff_version_check Add support for snpEff "GATK compatibility mode" (-o gatk)	2013-04-26 13:55:14 -07:00
David Roazen	7d90bbab08	Add support for snpEff "GATK compatibility mode" (-o gatk) -Do not throw an exception when parsing snpEff output files generated by not-officially-supported versions of snpEff, PROVIDED that snpEff was run with -o gatk -Requested by the snpEff author -Relevant integration tests updated/expanded	2013-04-26 15:47:15 -04:00
Mark DePristo	ec8fb9860a	Merge pull request #196 from broadinstitute/rp_cgl_allele_matching In CGL ensure that the alleles match exactly between the comp track and ...	2013-04-26 12:38:59 -07:00
Ryan Poplin	93fc48739a	In CGL ensure that the alleles match exactly between the comp track and the external likelihoods track. -- Mostly important for indels. -- Added integration tests to cover this and the new skipFiltered argument.	2013-04-26 15:01:04 -04:00
Mark DePristo	071fd67d55	Merge pull request #193 from broadinstitute/eb_contamination_fixing_for_reduced_reads Eb contamination fixing for reduced reads	2013-04-26 09:48:45 -07:00
Mark DePristo	92a6c7b561	Merge pull request #195 from broadinstitute/eb_exclude_sample_file_bug_in_select_variants Fixed bug reported on the forum where using the --exclude_sample_file ar...	2013-04-26 09:47:38 -07:00
Eric Banks	360e2ba87e	Fixed bug reported on the forum where using the --exclude_sample_file argument in SV was giving bad results. Added integration test. https://www.pivotaltracker.com/s/projects/793457/stories/47399245	2013-04-26 12:23:11 -04:00
Eric Banks	021adf4220	WTF - I thought we had disabled the randomized dithering of rank sum tests for integration tests?! Well, it wasn't done so I went ahead and did so. Lots of MD5 changes accordingly.	2013-04-26 11:24:05 -04:00
Eric Banks	ba2c3b57ed	Extended the allele-biased down-sampling functionality to handle reduced reads. Note that this works only in the case of pileups (i.e. coming from UG); allele-biased down-sampling for RR just cannot work for haplotypes. Added lots of unit tests for new functionality.	2013-04-26 11:23:17 -04:00
droazen	b749f06ba6	Merge pull request #192 from broadinstitute/dr_rev_picard_for_2.5_release Rev picard, sam-jdk, tribble, and variant jars to version 1.90.1442	2013-04-25 17:59:27 -07:00
David Roazen	7cb1247164	Rev picard, sam-jdk, tribble, and variant jars to version 1.90.1442 -This is mainly to get the new "0-length cigar element" check in the sam-jdk	2013-04-25 14:05:24 -04:00
Mark DePristo	528c3d083a	Merge pull request #191 from broadinstitute/dr_fix_rod_system_locking Detect stuck lock-acquisition calls, and disable file locking for tests	2013-04-25 09:32:54 -07:00
Ryan Poplin	3c7db87527	Merge pull request #189 from broadinstitute/md_hc_clip_before_merging Bugfix for FragmentUtils.mergeOverlappingPairedFragments	2013-04-25 08:42:03 -07:00
Mark DePristo	d20be41fee	Bugfix for FragmentUtils.mergeOverlappingPairedFragments -- The previous version was unclipping soft clipped bases, and these were sometimes adaptor sequences. If the two reads successfully merged, we'd lose all of the information necessary to remove the adaptor, producing a very high quality read that matched reference. Updated the code to first clip the adapter sequences from the incoming fragments -- Update MD5s	2013-04-25 11:11:15 -04:00
David Roazen	4d56142163	Detect stuck lock-acquisition calls, and disable file locking for tests -Acquire file locks in a background thread with a timeout of 30 seconds, and throw a UserException if a lock acquisition call times out * should solve the locking issue for most people provided they RETRY failed farm jobs * since we use NON-BLOCKING lock acquisition calls, any call that takes longer than a second or two indicates a problem with the underlying OS file lock support * use daemon threads so that stuck lock acquisition tasks don't prevent the JVM from exiting -Disable both auto-index creation and file locking for integration tests via a hidden GATK argument --disable_auto_index_creation_and_locking_when_reading_rods * argument not safe for general use, since it allows reading from an index file without first acquiring a lock * this is fine for the test suite, since all index files already exist for test files (or if they don't, they should!) -Added missing indices for files in private/testdata -Had to delete most of RMDTrackBuilderUnitTest, since it mostly tested auto-index creation, which we can't test with locking disabled, but I replaced the deleted tests with some tests of my own. -Unit test for FSLockWithShared to test the timeout feature	2013-04-24 22:49:02 -04:00
Mark DePristo	43f1746eb9	Merge pull request #190 from broadinstitute/mc_processing_pipeline TechDev version of the Data Processing Pipeline	2013-04-24 17:04:46 -07:00
Mauricio Carneiro	95ac9b6a33	TechDev version of the Data Processing Pipeline For those times when you need to re-process using the latest and greatest aligners out there...	2013-04-24 19:51:34 -04:00
Mark DePristo	55ead98d3d	Merge pull request #188 from broadinstitute/gda_cgl_fix Small fixes for CalibrateGenotypeLikelihoods.	2013-04-24 16:17:29 -07:00
Guillermo del Angel	3d49f524ee	Small fixes for CalibrateGenotypeLikelihoods. -- If we are using an external vcf, do not consider filtered out records when argument -ignoreFiltered is set. -- Fix for R script: it uses ddply, but some R installations don't include the plyr library by default.	2013-04-24 19:10:59 -04:00
MauricioCarneiro	27bb699e8b	Merge pull request #181 from broadinstitute/eb_yet_more_rr_improvements_GSA-930 Various bug fixes for recent Reduce Reads additions plus solution implemented for low MQ reads.	2013-04-24 15:40:03 -07:00
Eric Banks	379a9841ce	Various bug fixes for recent Reduce Reads additions plus solution implemented for low MQ reads. 1. Using cumulative binomial probability was not working at high coverage sites (because p-values quickly got out of hand) so instead we use a hybrid system for determining significance: at low coverage sites use binomial prob and at high coverage sites revert to using the old base proportions. Then we get the best of both worlds. As a note, coverage refers to just the individual base counts and not the entire pileup. 2. Reads were getting lost because of the comparator being used in the SlidingWindow. When read pairs had the same alignment end position the 2nd one encountered would get dropped (but added to the header!). We now use a PriorityQueue instead of a TreeSet to allow for such cases. 3. Each consensus keeps track of its own number of softclipped bases. There was no reason that that number should be shared between them. 4. We output consensus filtered (i.e. low MQ) reads whenever they are present for now. Don't lose that information. Maybe we'll decide to change this in the future, but for now we are conservative. 5. Also implemented various small performance optimizations based on profiling. Added unit tests to cover these changes; systematic assessment now tests against low MQ reads too.	2013-04-24 18:18:50 -04:00
MauricioCarneiro	45fec382e7	Merge pull request #180 from broadinstitute/mc_diagnosetargets_missing_targets DiagnoseTargets Global Refactor	2013-04-24 14:54:55 -07:00
Eric Banks	3f52f55c55	Merge pull request #186 from broadinstitute/md_libs_canonical_cigar Performance optimizations and caliper benchmarks code for consolidateCigar	2013-04-24 12:58:32 -07:00
Mauricio Carneiro	367f0c0ac1	Split class names into stratification and metrics Calling everything statistics was very confusing. Diagnose Targets stratifies the data three ways: Interval, Sample and Locus. Each stratification then has its own set of metrics (plugin system) to calculate -- LocusMetric, SampleMetric, IntervalMetric. Metrics are generalized by the Metric interface (for generic access). Stratifications are generalized by the AbstractStratification abstract class (to aggressively limit code duplication).	2013-04-24 14:15:49 -04:00
Mark DePristo	91d5674cc5	Merge pull request #187 from broadinstitute/rp_new_bundle_for_release Adding the 1000G_phase1.snps.high_confidence callset to the GATK resourc...	2013-04-24 08:44:38 -07:00
Ryan Poplin	80131ac996	Adding the 1000G_phase1.snps.high_confidence callset to the GATK resource bundle for use in the April 2013 updated best practices.	2013-04-24 11:41:32 -04:00
Mark DePristo	df90597bfc	Performance optimizations and caliper benchmarking code for consolidateCigar -- Now that this function is used in the core of LIBS it needed some basic optimizations, which are now complete, pass all unit tests. -- Added caliper benchmark for AlignmentUtils to assess performance (showing new version is 3x-10x faster) -- Remove unused import in ReadStateManager	2013-04-24 11:36:43 -04:00
Mark DePristo	df6ba74395	Merge pull request #185 from broadinstitute/gda_poolcaller_fix_47921867 Corner case fix to General Ploidy SNP likelihood model.	2013-04-24 04:14:05 -07:00
Guillermo del Angel	2ab270cf3f	Corner case fix to General Ploidy SNP likelihood model. -- In case there are no informative bases in a pileup but the pileup isn't empty (like when all bases have Q < min base quality), the GLs were still computed (but were all zeros) and fed to the exact model. Now mimic the diploid GL computation, where GLs are only added if # good bases > 0 -- I believe the general case where only non-informative GLs are fed into the AF calc model is broken and yields bogus QUAL; will investigate separately.	2013-04-23 21:13:18 -04:00
Mauricio Carneiro	8f8f339e4b	Abstract class for the statistics Addressing the code duplication issue raised by Mark.	2013-04-23 18:02:27 -04:00
Ryan Poplin	e83d9bef59	Merge pull request #182 from broadinstitute/md_hc_vqsr_best_practices Updates to GeneralCallingPipeline	2013-04-23 12:49:10 -07:00
Mark DePristo	90fc249c8d	Merge pull request #184 from broadinstitute/rp_revert_sw_params After debate reverting SW parameter changes temporarily while we explore...	2013-04-23 12:32:13 -07:00
Mark DePristo	cb2a8f83de	Merge pull request #183 from jsilter/master Add additional necessary class files to na12878kb.jar build target	2013-04-23 11:59:52 -07:00
Jacob Silterra	75184614c6	Add additional necessary class files to na12878kb.jar target	2013-04-23 14:03:48 -04:00
Mauricio Carneiro	38662f1d47	Limiting access to the DT classes * Make most classes final, others package local * Move to diagnostics.diagnosetargets package * Aggregate statistics and walker classes on the same package for simplified visibility. * Make status list a LinkedList instead of a HashSet	2013-04-23 14:01:43 -04:00
Mark DePristo	d5d87c50e6	Updates to GeneralCallingPipeline -- GCP: use 1% bad variants and 1000 min bad variants -- Don't use project consensus for SNP recal -- Update GCP to assess hapmap and omni sensitivity -- Update the Eval command to use the right hapmap and omni comparisons (per sample) -- Update GCP to use current best filtering parameters -- SNPs: QD, FS, DP, ReadPosRankSum, MQRankSum -- indels: FS, DP, ReadPosRankSum, MQRankSum	2013-04-23 13:55:59 -04:00
Ryan Poplin	cb4ec3437a	After debate reverting SW parameter changes temporarily while we explore global SW plans.	2013-04-23 13:32:06 -04:00
Mauricio Carneiro	fdd16dc6f9	DiagnoseTargets refactor A plugin enabled implementation of DiagnoseTargets Summarized Changes: ------------------- * move argument collection into Thresholder object * make thresholder object private member of all statistics classes * rework the logic of the mate pairing thresholds * update unit and integration tests to reflect the new behavior * Implements Locus Statistic plugins * Extend Locus Statistic plugins to determine sample status * Export all common plugin functionality into utility class * Update tests accordingly [fixes #48465557]	2013-04-22 23:53:10 -04:00
Mauricio Carneiro	eb6308a0e4	General DiagnoseTargets documentation cleanup * remove interval statistic low_median_coverage -- it is already captured by low coverage and coverage gaps. * add gatkdocs to all the parameters * clean up the logic on callable status a bit (still need to be re-worked into a plugin system) * update integration tests	2013-04-22 23:53:09 -04:00
Mauricio Carneiro	b3c0abd9e8	Remove REF_N status from DiagnoseTargets This is not really feasible with the current mandate of this walker. We would have to traverse by reference and that would make the runtime much higher, and we are not really interested in the status 99% of the time anyway. There are other walkers that can report this, and just this, status more cheaply. [fixes #48442663]	2013-04-22 23:53:09 -04:00
Mauricio Carneiro	2b923f1568	fix for DiagnoseTargets multiple filter output Problem ------- Diagnose targets is outputting both LOW_MEDIAN_COVERAGE and NO_READS when no reads are covering the interval Solution -------- Only allow low median coverage check if there are reads [fixes #48442675]	2013-04-22 23:53:09 -04:00
Mauricio Carneiro	cf7afc1ad4	Fixed "skipped intervals" bug on DiagnoseTargets Problem ------- Diagnose targets was skipping intervals when they were not covered by any reads. Solution -------- Rework the interval iteration logic to output all intervals as they're skipped over by the traversal, as well as adding a loop on traversal done to finish outputting intervals past the coverage of the BAM file. Summarized Changes ------------------ * Outputs all intervals it iterates over, even if uncovered * Outputs leftover intervals at the end of the traversal * Updated integration tests [fixes #47813825]	2013-04-22 23:53:09 -04:00
Ryan Poplin	ff430c821e	Merge pull request #178 from broadinstitute/md_common_suffix_bugfix Bugfix for CommonSuffixSplitter	2013-04-22 06:53:01 -07:00
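
Item 2 of commit 379a9841ce turns on a general property of Java's sorted collections: a TreeSet decides membership purely through its comparator, so two elements for which compare() returns 0 are treated as duplicates and the second add() is silently rejected, whereas a PriorityQueue orders by the same comparator but keeps every element. The sketch below reproduces that behavior in isolation. It is not GATK's SlidingWindow code; the Read class, the sample read names, and the end-coordinate-only comparator are hypothetical stand-ins chosen to mirror the read-pair case described in the commit.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;

public class ComparatorDedupDemo {

    // Hypothetical minimal read type: a name and an alignment end coordinate.
    static final class Read {
        final String name;
        final int alignmentEnd;

        Read(String name, int alignmentEnd) {
            this.name = name;
            this.alignmentEnd = alignmentEnd;
        }
    }

    // Orders reads by alignment end only, so two mates that end at the same
    // position compare as equal (compare(...) == 0).
    static final Comparator<Read> BY_END = Comparator.comparingInt((Read r) -> r.alignmentEnd);

    // Hypothetical read pair whose mates share the same alignment end.
    static List<Read> samplePair() {
        return Arrays.asList(new Read("pair1/1", 1000), new Read("pair1/2", 1000));
    }

    // TreeSet uses the comparator for equality, so the second read is dropped.
    static int countWithTreeSet(List<Read> reads) {
        TreeSet<Read> set = new TreeSet<>(BY_END);
        set.addAll(reads);
        return set.size();
    }

    // PriorityQueue orders by the same comparator but keeps elements that compare equal.
    static int countWithPriorityQueue(List<Read> reads) {
        PriorityQueue<Read> queue = new PriorityQueue<>(BY_END);
        queue.addAll(reads);
        return queue.size();
    }

    public static void main(String[] args) {
        List<Read> reads = samplePair();
        System.out.println("TreeSet keeps " + countWithTreeSet(reads) + " of " + reads.size());
        System.out.println("PriorityQueue keeps " + countWithPriorityQueue(reads) + " of " + reads.size());
    }
}
```

Running it prints "TreeSet keeps 1 of 2" followed by "PriorityQueue keeps 2 of 2", which matches the symptom in the commit message: the second read of a pair with a tied end position never makes it into the set.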

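Commit 4d56142163 describes acquiring file locks in a background thread with a 30-second timeout, relying on non-blocking lock calls and daemon threads so that a stuck call can neither hang the caller nor keep the JVM alive. A minimal sketch of that pattern with standard java.util.concurrent and java.nio classes follows, assuming a shared lock over the whole file. It is not the FSLockWithShared implementation: the class and method names are hypothetical, and IOException stands in for GATK's UserException.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedLockDemo {

    // Daemon threads, so a lock call stuck inside the OS cannot keep the JVM from exiting.
    private static final ExecutorService LOCK_EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "file-lock-acquisition");
        t.setDaemon(true);
        return t;
    });

    // Runs a non-blocking shared tryLock in a background thread and waits at most timeoutMillis.
    // Returns null if another process holds a conflicting lock; throws if the call itself hangs.
    static FileLock tryLockWithTimeout(FileChannel channel, long timeoutMillis) throws IOException {
        Future<FileLock> pending = LOCK_EXECUTOR.submit(() -> channel.tryLock(0L, Long.MAX_VALUE, true));
        try {
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true);
            // Stand-in for GATK's UserException: a non-blocking call this slow suggests broken OS lock support.
            throw new IOException("Timed out after " + timeoutMillis + " ms acquiring a file lock", e);
        } catch (ExecutionException e) {
            throw new IOException("File lock acquisition failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while acquiring a file lock", e);
        }
    }

    // Locks and unlocks a fresh temp file with the 30-second timeout; true if a valid lock was obtained.
    static boolean demoAcquireAndRelease() {
        try {
            File tmp = File.createTempFile("timed-lock-demo", ".idx");
            tmp.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                FileLock lock = tryLockWithTimeout(raf.getChannel(), 30_000L);
                if (lock == null) {
                    return false;
                }
                boolean valid = lock.isValid();
                lock.release();
                return valid;
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Lock acquired and released: " + demoAcquireAndRelease());
    }
}
```

A cached pool is used here rather than a single thread so that one stuck lock call does not queue every later call behind it; that choice is part of this sketch, not something the commit message specifies.
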
12295 Commits (20d3137928abb2ee2031e41f69d8e672b8df7c64)