gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mauricio Carneiro	1db2d1ba82	Do not add the first and last 4 cycles to the recalibration tables.	2012-04-27 15:18:07 -04:00
Mauricio Carneiro	08dbd756f3	Quick QC walkers to look at the error profile of indels in the read	2012-04-27 15:18:07 -04:00
Guillermo del Angel	730208133b	Several fixes and improvements to Pool caller with ancillary test functions (not done yet): a) Utility class called Probability Vector that holds a log-probability vector and has the ability to clip ends that deviate largely from max value. b) Used this class to hold site error model, since likelihoods of error model away from peak are so far down that it's not worth computing with them and just wastes time. c) Expand unit tests and add an exhaustive test for ErrorModel class. d) Corrected major math bug in ErrorModel uncovered by exhaustive test: log(e^x) is NOT x if log's base = 10. e) Refactored utility functions that created artificial pileups for testing into separate class ArtificialPileupTestProvider. Right now functionality is limited (one artificial contig of 10 bp; can only specify pileups in one position with a given number of matches and mismatches to ref) but functionality will be expanded in future to cover more test cases. f) Use this utility class for IndelGenotypeLikelihoods unit test and for PoolGenotypeLikelihoods unit test (the latter testing functionality still not done). g) Linearized implementation of biallelic exact model (very simple approach, similar to diploid exact model, just abort if we're past the max value of AC distribution and below a threshold). Still need to add unit tests for this and to expand to multiallelic model. h) Update integration test md5's due to minor differences stemming from linearized exact model and better error model math	2012-04-27 14:41:17 -04:00
Eric Banks	0439047269	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-27 10:49:45 -04:00
Eric Banks	05b44dd017	The genotypeCounts array wasn't always being initialized before it was accessed, leading to a NPE (which got caught and thrown as a JEXL expression when used in selection). Added unit test to cover all genotype count methods.	2012-04-27 10:49:36 -04:00
Khalid Shakir	9801dd114f	Bug fix for: https://getsatisfaction.com/gsa/topics/problem_with_indelrealigner_and_l_unmapped The GATK -L unmapped is for GenomeLocs with SAMRecord.NO_ALIGNMENT_REFERENCE_NAME, not SAMRecord.getReadUnmappedFlag() Previously unmapped flag reads in the last bin were being printed while also seeking for the reads without a reference contig.	2012-04-27 09:58:38 -04:00
Guillermo del Angel	2f86ccb086	Correct md5's for previous code change	2012-04-26 16:20:41 -04:00
Guillermo del Angel	972d6531b6	Corner case fix for indel GL computation: sometimes (depending on surrounding context) reads which are not informative of two candidate haplotypes end up having marginally higher likelihoods with one haplotype as opposed to another, depending on uncertainty on alignments in surrounding regions. So, a sample whose GL is -0.0001,-0.0005,-0.001 may have its genotype set to 1/1 due to this statistical noise. We already have a tolerance comparing max(gl)-min(gl) to avoid genotyping, so this tolerance is now increased from 0.001 to 0.1 (equivalent to 1 PL unit) to avoid genotyping a sample if all PLs are within this threshold. Changed 2 integration test md5s that hit this case.	2012-04-26 10:15:26 -04:00
Laurent Francioli	ab2a952ad1	PED support for Inbreeding Coefficient annotation Signed-off-by: Eric Banks <ebanks@broadinstitute.org>	2012-04-25 12:56:47 -04:00
Laurent Francioli	219b0a128b	PED support for ChromosomeCounts annotation Signed-off-by: Eric Banks <ebanks@broadinstitute.org>	2012-04-25 12:50:04 -04:00
Laurent Francioli	19d5213d5a	Added function to get founders IDs in SampleDB Signed-off-by: Eric Banks <ebanks@broadinstitute.org>	2012-04-25 12:49:36 -04:00
Mauricio Carneiro	902277856e	fix for RBP getPileupsForSamples() do not differentiate per sample pileups from generic pileups. Do the same for both -- it's O(n) either way.	2012-04-24 17:20:30 -04:00
Mauricio Carneiro	82b4798913	CountBasesWalker -- a quick QC walker.	2012-04-24 17:20:30 -04:00
Mauricio Carneiro	e440d0ce69	BQSR triage #4 * fixed queue script plot file names * updated the ReadGroupCovariate to use the platform unit instead of sample + lane. * fixed plotting of marginalized reported qualities	2012-04-24 17:19:54 -04:00
Eric Banks	d6277b70d8	Forgot to consider the optimized case in hasAllele	2012-04-24 11:32:28 -04:00
Eric Banks	91bad244d5	Using a VCF whose ALT is the reference in GGA mode is a User Error	2012-04-24 11:08:37 -04:00
Eric Banks	74ad008163	Adding VariantContext.hasAlternateAllele functionality	2012-04-24 11:07:46 -04:00
Eric Banks	66f3315548	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-24 09:39:55 -04:00
Eric Banks	bcb93dda5f	Fixing docs (rank sum test values are not phred-scaled)	2012-04-24 09:39:42 -04:00
Mauricio Carneiro	e39a59594a	BQSR triage and test routines * updated BQSR queue script for faster turnaround * implemented plot generation for scatter/gathered runs * adjusted output file names to be cooperative with the queue script * added the recalibration report file to the argument table in the report * added ReadCovariates unit test -- guarantees that all the covariates are being generated for every base in the read * added RecalibrationReport unit test -- guarantees the integrity of the delta tables	2012-04-23 11:23:00 -04:00
Eric Banks	a733723439	Merged bug fix from Stable into Unstable	2012-04-23 10:30:30 -04:00
Eric Banks	2761da975e	Handle null VCs (which can arise when indels are present in the file)	2012-04-23 10:30:00 -04:00
Eric Banks	cd63bcb1b8	Fixing unit tests to register the user exception being thrown (instead of the NumberFormatException)	2012-04-23 10:06:51 -04:00
Eric Banks	63aa79df82	Slightly better error message	2012-04-23 09:37:28 -04:00
Eric Banks	7b5fbf9567	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-23 09:34:08 -04:00
Eric Banks	4edb005411	Catch poorly formatted PL/GL fields	2012-04-23 09:33:50 -04:00
Ryan Poplin	35bb55f562	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-22 13:23:36 -04:00
Ryan Poplin	18e4532d10	Turning down the amount of assembly graph pruning slightly in the case of low coverage.	2012-04-22 13:23:24 -04:00
Eric Banks	1f23d99dfa	If we are subsetting alleles in the UG (either because there were too many or because some were not polymorphic), then we may need to trim the alleles (because the original VariantContext may have had to pad at the end). Thanks to Ryan for reporting this. Only one of the integration tests had even partially covered this case, so I added one that did.	2012-04-20 17:00:05 -04:00
Eric Banks	4b81c75642	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-20 14:30:19 -04:00
Eric Banks	f1c5510ec0	When running SelectVariants with the excludeNonVariants option, remove alleles from the ALT field that are no longer polymorphic.	2012-04-20 14:30:04 -04:00
Ryan Poplin	a1596791af	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-20 14:03:04 -04:00
Ryan Poplin	a57295eb75	Fixing a bug when breaking up active regions where the resulting regions would overlap by one base. Adding quality score manipulation from the UG into the haplotype caller (qual capped by mapping quality, min qual threshold).	2012-04-20 14:02:55 -04:00
Guillermo del Angel	de68363c23	Removed experimental feature (aka hack) that was meant for 1000G consensus but remained in VQSR data manager - QD was being scaled by indel length. There's no evidence any more that QD is length-dependent, neither in CEU trio data nor in latest 1000G P2 calls	2012-04-20 10:58:34 -04:00
Guillermo del Angel	d2488dfb81	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-19 19:40:03 -04:00
Guillermo del Angel	c44c7b9a97	Restored optimization in Pair HMM only to compute HMM matrices starting in index where haplotypes start to diverge - saves about 15-20% of runtime which is what we lost by disabling banding in latest version, so runtime should be now about the same as what it was before refactoring. Output is bit-true to previous commit	2012-04-19 19:39:43 -04:00
Mauricio Carneiro	0f8c77391d	BQSR bug triage #3 * fixed context covariate famous "off by one" error * reduced maximum quality score to Q50 (following Eric/Ryan's suggestion) * remove context downsampling in BQSR R script	2012-04-19 17:31:04 -04:00
Khalid Shakir	df5dd841af	AC strat now checks if evals will be merged before throwing an error on multiple eval files. Minor tweaks to WGP script based on new recal VCF format.	2012-04-19 16:08:55 -04:00
Guillermo del Angel	1ae2ab5b63	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-19 12:50:29 -04:00
Guillermo del Angel	0e6e0cb907	Merging bug fixes	2012-04-19 12:49:30 -04:00
Eric Banks	79272c5e15	Thanks to Menachem for pointing out that the docs for genotyping_mode and output_mode were the same (and unclear). Fixed.	2012-04-19 12:48:09 -04:00
Guillermo del Angel	02ff930f6a	My changes	2012-04-19 12:45:18 -04:00
Eric Banks	2485cef5b8	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-19 11:46:06 -04:00
Eric Banks	76a6e37f4f	Don't output callability metrics by default anymore; one can still have them output to the 'metrics' file (which is now @Hidden because they are really for GSA use). Added a TODO to move UG from @By reference to reads and rods once LIBS is cleaned up.	2012-04-19 11:45:56 -04:00
Ryan Poplin	1ea4e48a27	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-19 11:32:32 -04:00
Ryan Poplin	11001ab9a2	Adding option to HaplotypeCaller to genotype the events on the chosen haplotypes as independent events. The filtered reads are now kept around so they can be passed to the variant annotations. Unfortunately the filtered reads aren't assigned a likelihood yet so they are all thrown in the Allele.NO_CALL bin.	2012-04-19 11:32:10 -04:00
Mauricio Carneiro	eb22cd7222	Unit test to guarantee BQSR sequential calculation accuracy This test brings together the old and the new BQSR, building a recalibration table using the two separate frameworks and performing the recalibration calculation using the two different frameworks for 10,000+ bases and asserting that the calculations match in every case.	2012-04-19 09:33:40 -04:00
Mauricio Carneiro	68d0211fa1	Improved BQSR plotting and some new parameters * Refactored CycleCovariate to be a fragment covariate instead of a per read covariate * Refactored the CycleCovariateUnitTest to test the pairing information * Updated BQSR Integration tests accordingly * Made quantization levels parameter not hidden anymore * Added hidden option to keep intermediate plotting files for debug purposes (they're automatically deleted) * Added hidden option not to generate the plots automatically (important for scatter/gathering)	2012-04-19 09:31:41 -04:00
Guillermo del Angel	143e92b797	Rebasing	2012-04-18 20:05:43 -04:00
Guillermo del Angel	960e7e6aaf	Changes to integration tests	2012-04-18 19:53:42 -04:00
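
Two of the commits listed above rest on log10 arithmetic: 730208133b notes that log(e^x) is not x when the log's base is 10, and 972d6531b6 equates a GL tolerance of 0.1 with 1 PL unit. The Python sketch below only illustrates that arithmetic; it is not the GATK code. The function names are hypothetical, and the PL convention (PL = -10 * log10 likelihood, normalised so the best genotype is 0) and the strict `<` comparison are assumptions.

```python
import math


def log10_of_exp(x: float) -> float:
    """Return log10(e**x), computed as x * log10(e) (about 0.4343 * x, not x)."""
    return x * math.log10(math.e)


def gl_to_pl(gls):
    """Convert log10 genotype likelihoods (GLs) to phred-scaled PLs.

    Assumed convention: PL = -10 * (GL - max GL), rounded to an integer,
    so the most likely genotype gets PL 0.
    """
    best = max(gls)
    return [round(-10.0 * (gl - best)) for gl in gls]


def is_uninformative(gls, tolerance=0.1):
    """True when max(GL) - min(GL) falls below the tolerance.

    A tolerance of 0.1 in log10 units corresponds to 10 * 0.1 = 1 PL unit.
    """
    return max(gls) - min(gls) < tolerance


# The GL triple quoted in commit 972d6531b6.
noisy_gls = [-0.0001, -0.0005, -0.001]
print(log10_of_exp(1.0))
print(gl_to_pl(noisy_gls), is_uninformative(noisy_gls))
```

With the GLs quoted in the commit message, all three PLs round to 0, so a sample like this would be left ungenotyped under the 0.1 tolerance.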


1983 Commits (1db2d1ba82dc84ec8e5c435ddeb89a6ca7af795f)