gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	58c470a6c5	Rev'ing Tribble from 53 to 94 -- Other tribble contributors did major refactoring / simplification of tribble, which required some changes to GATK code -- Integrationtests pass without modification, though some very old index files (callable loci beds) were apparently corrupt and no longer tolerated by the newer tribble codebase	2012-05-03 07:31:47 -04:00
Eric Banks	e448cfcc59	Forgot to update these md5s	2012-05-02 21:09:50 -04:00
Khalid Shakir	b8b7f28aa9	Revving Picard to pick up new SamFileHeaderMerger. Updated ReadFilter abstract class to implement (via UnsupportedOperationException) the new SamRecordFilter.filterOut(). In IndelRealignerIntegrationTest updates for Picard fixes to SAMRecord.getInferredInsertSize() in svn r1115 & r1124. - Ran FixMates to create new input BAM since running IR with variable maxReadsInMemory means all reads weren't realigned leading to different outputs. - Updated md5s to match new expectations after looking at TLEN diff engine output.	2012-05-02 16:47:28 -04:00
Mauricio Carneiro	f51a1d0d61	Better error message to the BAMScheduler In the case where the BAM file was aligned using a reference but analysis is being attempted with a different reference.	2012-05-02 16:10:00 -04:00
Mauricio Carneiro	940029fa5d	Fixing on-the-fly recalibration (caught by Ryan) low quality bases in the tails were being turned to N's in the final read.	2012-05-02 16:06:04 -04:00
Eric Banks	623b36fbc4	Add header lines for AC,AF, and AN tags	2012-05-02 15:33:34 -04:00
Guillermo del Angel	429800a192	Fix corner case rounding issue in MathUtils unit test: 10^logFactorial(4)) was 23.999999... which if cast directly yielded 23 - so, do pre-rounding to ensure correct integer result if caller will cast value.	2012-05-02 09:57:06 -04:00
Guillermo del Angel	76a95fdedf	Full implementation of multiallelic exact model for pools. Still super-linear so not useable at scale but it should be a gold standard to compare to. Unit tests are not exhaustive yet, will be expanded to provide better test coverage. Small inconsequential optimization in MathUtils: we're already caching log10(factorial(n)) for large n, so might as well use the cached values to compute binomial and multinomial coefficients instead of the log-gamma approximation which is more expensive (doesn't seem to save much time either in PoolCaller nor in UG though).	2012-05-02 09:24:28 -04:00
Joel Thibault	4d732fa586	Move all MongoDB files into private/java/src/org/broadinstitute/sting/mongodb	2012-05-01 18:23:51 -04:00
Eric Banks	619a69a5f1	As promised in the release notes for 1.6, I am removing the old deprecated genotyping framework revolving around the misordering of alleles and have moved the fixed version in its place in preparation for release 1.7 (or 2.0?).	2012-05-01 16:18:24 -04:00
Joel Thibault	c255dd5917	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-05-01 16:10:38 -04:00
Ryan Poplin	51af61b5d7	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-05-01 16:07:23 -04:00
Ryan Poplin	fc55dcec3c	Unfortunately the reverse trimming of alleles still doesn't work with mixed records in some corner cases. Turning it off for now.	2012-05-01 16:02:36 -04:00
Ryan Poplin	20a0078f23	Merging active regions across shard boundaries if they are contiguous, have the same active status and don't grow too big.	2012-05-01 15:51:36 -04:00
Eric Banks	0f3af9555b	Adding an option to SelectVariants which allows the user to re-genotype through the exact model (if PLs are present) the samples in order to recalculate the QUAL and genotypes. This is really the correct way to select a subset of samples, especially when originally called from low coverage data. Also added integration test to cover this case.	2012-05-01 14:58:06 -04:00
Joel Thibault	aa4d41cce0	Minor cleanup before push	2012-05-01 14:16:44 -04:00
Joel Thibault	b101b9c30b	Add Mongo switch	2012-05-01 14:00:48 -04:00
Joel Thibault	1b609e9075	Move Mongo to server couchdb	2012-05-01 13:59:47 -04:00
Joel Thibault	fd57d27f45	Move MongoDB connection handling to a separate class	2012-05-01 13:59:37 -04:00
Joel Thibault	db3cd1abd5	Use 2 MongoDB collections (tables): one for INFO/attributes, one for samples/genotypes.	2012-05-01 13:57:23 -04:00
Joel Thibault	04e1be9106	Better handling of Mongo errors + exceptions	2012-05-01 13:57:23 -04:00
Joel Thibault	ca737479cf	Query for stop locations because we don't have that information in the reference	2012-05-01 13:57:23 -04:00
Joel Thibault	1cda87a4ad	Set ROD priority list to input	2012-05-01 13:57:23 -04:00
Joel Thibault	a7fe847faf	Set the priority list and don't bother combining if not needed	2012-05-01 13:57:23 -04:00
Joel Thibault	f739305f43	Combine the variants found at a location	2012-05-01 13:57:23 -04:00
Joel Thibault	020f884d5a	Use new key of source ROD plus alleles	2012-05-01 13:57:23 -04:00
Joel Thibault	221ce9c3d6	Add alleles to the primary key	2012-05-01 13:57:23 -04:00
Joel Thibault	3198ce5471	Can have multiple variants at a location	2012-05-01 13:57:22 -04:00
Joel Thibault	11ed8e61c9	Add referenceBaseForIndel to the Mongo VariantContext objects	2012-05-01 13:53:44 -04:00
Joel Thibault	7ed0ee7ed0	Skip locations with no genotypes instead of throwing a NPE	2012-05-01 13:53:44 -04:00
Joel Thibault	4bdfeacdaa	Handle multiple samples/genotypes per location TODO: sample selection	2012-05-01 13:53:43 -04:00
Joel Thibault	1f7c628796	Insert the ROD filename into MongoDB as part of the primary key	2012-05-01 13:53:43 -04:00
Joel Thibault	bb8a6e9b0a	Initial test of write and read from MongoDB	2012-05-01 13:53:43 -04:00
David Roazen	c0084c741b	Pilot BCF2 Implementation: Checkpointing the code * Not working yet, still very much a work-in-progress with lots of placeholders * Needed to check this in to enable possible collaboration, since it's going slower than anticipated and the conference deadline looms.	2012-05-01 12:23:10 -04:00
Eric Banks	0c8e801021	Removing public to private dependency	2012-05-01 11:04:11 -04:00
Eric Banks	e964d17518	Removing public to private dependency	2012-05-01 11:02:28 -04:00
Mauricio Carneiro	462450c3e3	disabling all BQSR unit tests with the changes to the cycle covariate, some tests need updates, others need to be completely re-written.	2012-04-30 14:39:55 -04:00
Guillermo del Angel	e185632013	Exhaustive unit tests for Pool SNP genotype likelihoods: a) Add ability for ErrorModel to be specified by external log-probability vector for testing. b) For a given depth and ploidy(=2*samples/pool), create artificial high quality pileup testing from AC=0 to AC=ploidy, and test that pool GL's have expected content.Misc. refactorings and cleanups c) Misc. cleanups and beautification.	2012-04-30 14:29:46 -04:00
Christopher Hartl	7d029b9a28	Merge branch 'master' of ssh://ni.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-30 12:16:30 -04:00
Christopher Hartl	944a7d815e	Bringing VQSRV3 up to date. Lots of new features (un-classifying the worst-performing training sites, treating the x% best/worst sites as positive/negative points, ability to pass in a monomorphic track to see ROC curves output). Minor changes to AlleleBalance: weighted average was incorrectly specified (using logscale actually biased the average towards the AB of low-quality genotypes), and breaking out AB by het, hom, and diploid to bring it in line with some (private) changes to the indel likelihood model that (correctly) computes these values for indels.	2012-04-28 11:31:03 -04:00
Ryan Poplin	54a9bc2da2	Bug fix in reverse trim alleles for the case of mixed records that become non-mixed after subsetting the alleles.	2012-04-28 09:12:26 -04:00
Ryan Poplin	e332aeaf70	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-27 16:21:21 -04:00
Ryan Poplin	2b5dd28550	Bug fix in reverse trim alleles for the case of mixed records.	2012-04-27 16:21:02 -04:00
Mauricio Carneiro	1db2d1ba82	Do not add the first and last 4 cycles to the recalibration tables.	2012-04-27 15:18:07 -04:00
Mauricio Carneiro	08dbd756f3	Quick QC walkers to look at the error profile of indels in the read	2012-04-27 15:18:07 -04:00
Guillermo del Angel	730208133b	Several fixes and improvements to Pool caller with ancillary test functions (not done yet): a) Utility class called Probability Vector that holds a log-probability vector and has the ability to clip ends that deviate largely from max value. b) Used this class to hold site error model, since likelihoods of error model away from peak are so far down that it's not worth computing with them and just wastes time. c) Expand unit tests and add an exhaustive test for ErrorModel class. d) Corrected major math bug in ErrorModel uncovered by exhaustive test: log(e^x) is NOT x if log's base = 10. e) Refactored utility functions that created artificial pileups for testing into separate class ArtificialPileupTestProvider. Right now functionality is limited (one artificial contig of 10 bp), can only specify pileups in one position with a given number of matches and mismatches to ref) but functionality will be expanded in future to cover more test cases. f) Use this utility class for IndelGenotypeLikelihoods unit test and for PoolGenotypeLikelihoods unit test (the latter testing functionality still not done). g) Linearized implementation of biallelic exact model (very simple approach, similar to diploid exact model, just abort if we're past the max value of AC distribution and below a threshold). Still need to add unit tests for this and to expand to multiallelic model. h) Update integration test md5's due to minor differences stemming from linearized exact model and better error model math	2012-04-27 14:41:17 -04:00
Eric Banks	0439047269	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-27 10:49:45 -04:00
Eric Banks	05b44dd017	The genotypeCounts array wasn't always being initialized before it was accessed, leading to a NPE (which got caught and thrown as a JEXL expression when used in selection). Added unit test to cover all genotype count methods.	2012-04-27 10:49:36 -04:00
Khalid Shakir	9801dd114f	Bug fix for: https://getsatisfaction.com/gsa/topics/problem_with_indelrealigner_and_l_unmapped The GATK -L unmapped is for GenomeLocs with SAMRecord.NO_ALIGNMENT_REFERENCE_NAME, not SAMRecord.getReadUnmappedFlag() Previously unmapped flag reads in the last bin were being printed while also seeking for the reads without a reference contig.	2012-04-27 09:58:38 -04:00
Guillermo del Angel	2f86ccb086	Correct md5's for previous code change	2012-04-26 16:20:41 -04:00
Guillermo del Angel	972d6531b6	Corner case fix for indel GL computation: sometimes (depending on surrounding context) reads which are not informative of two candidate haplotypes end up having marginally higher likelihoods with one haplotype as opposed to another, depending on uncertainty on alignments in surrounding regions. So, a sample whose GL is -0.0001,-0.0005,-0.001 may have its genotype set to 1/1 due to this statistical noise. We already have a tolerance comparing max(gl)-min(gl) to avoid genotyping, so this tolerance is now increased from 0.001 to 0.1 (equivalent to 1 PL unit) to avoid genotyping a sample if all PLs are within this threshold. Changed 2 integration test md5s that hit this case.	2012-04-26 10:15:26 -04:00
Laurent Francioli	ab2a952ad1	PED support for Inbreeding Coefficient annotation Signed-off-by: Eric Banks <ebanks@broadinstitute.org>	2012-04-25 12:56:47 -04:00
Laurent Francioli	219b0a128b	PED support for ChromosomeCounts annotation Signed-off-by: Eric Banks <ebanks@broadinstitute.org>	2012-04-25 12:50:04 -04:00
Laurent Francioli	19d5213d5a	Added function to get founders IDs in SampleDB Signed-off-by: Eric Banks <ebanks@broadinstitute.org>	2012-04-25 12:49:36 -04:00
Mauricio Carneiro	902277856e	fix for RBP getPileupsForSamples() do not differentiate per sample pileups from generic pileups. Do the same for both -- it's O(n) either way.	2012-04-24 17:20:30 -04:00
Mauricio Carneiro	82b4798913	CountBasesWalker -- a quick QC walker.	2012-04-24 17:20:30 -04:00
Mauricio Carneiro	e440d0ce69	BQSR triage #4 * fixed queue script plot file names * updated the ReadGroupCovariate to use the platform unit instead of sample + lane. * fixed plotting of marginalized reported qualities	2012-04-24 17:19:54 -04:00
Eric Banks	d6277b70d8	Forgot to consider the optimized case in hasAllele	2012-04-24 11:32:28 -04:00
Eric Banks	91bad244d5	Using a VCF whose ALT is the reference in GGA mode is a User Error	2012-04-24 11:08:37 -04:00
Eric Banks	74ad008163	Adding VariantContext.hasAlternateAllele functionality	2012-04-24 11:07:46 -04:00
Eric Banks	66f3315548	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-24 09:39:55 -04:00
Eric Banks	bcb93dda5f	Fixing docs (rank sum test values are not phred-scaled)	2012-04-24 09:39:42 -04:00
Mauricio Carneiro	e39a59594a	BQSR triage and test routines * updated BQSR queue script for faster turnaround * implemented plot generation for scatter/gathered runs * adjusted output file names to be cooperative with the queue script * added the recalibration report file to the argument table in the report * added ReadCovariates unit test -- guarantees that all the covariates are being generated for every base in the read * added RecalibrationReport unit test -- guarantees the integrity of the delta tables	2012-04-23 11:23:00 -04:00
Eric Banks	a733723439	Merged bug fix from Stable into Unstable	2012-04-23 10:30:30 -04:00
Eric Banks	2761da975e	Handle null VCs (which can arise when indels are present in the file)	2012-04-23 10:30:00 -04:00
Eric Banks	cd63bcb1b8	Fixing unit tests to register the user exception being thrown (instead of the NumberFormatException)	2012-04-23 10:06:51 -04:00
Eric Banks	63aa79df82	Slightly better error message	2012-04-23 09:37:28 -04:00
Eric Banks	7b5fbf9567	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-23 09:34:08 -04:00
Eric Banks	4edb005411	Catch poorly formatted PL/GL fields	2012-04-23 09:33:50 -04:00
Ryan Poplin	35bb55f562	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-22 13:23:36 -04:00
Ryan Poplin	18e4532d10	Turning down the amount of assembly graph pruning slightly in the case of low coverage.	2012-04-22 13:23:24 -04:00
Eric Banks	1f23d99dfa	If we are subsetting alleles in the UG (either because there were too many or because some were not polymorphic), then we may need to trim the alleles (because the original VariantContext may have had to pad at the end). Thanks to Ryan for reporting this. Only one of the integration tests had even partially covered this case, so I added one that did.	2012-04-20 17:00:05 -04:00
Eric Banks	4b81c75642	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-20 14:30:19 -04:00
Eric Banks	f1c5510ec0	When running SelectVariants with the excludeNonVariants option, remove alleles from the ALT field that are no longer polymorphic.	2012-04-20 14:30:04 -04:00
Ryan Poplin	a1596791af	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-20 14:03:04 -04:00
Ryan Poplin	a57295eb75	Fixing a bug when breaking up active regions where the resulting regions would overlap by one base. Adding quality score manipulation from the UG into the haplotype caller (qual capped by mapping quality, min qual threshold).	2012-04-20 14:02:55 -04:00
Guillermo del Angel	de68363c23	Removed experimental feature (aka hack) that was meant for 1000G consensus but remained in VQSR data manager - QD was being scaled by indel length. There's no evidence any more that QD is length-dependent, neither in CEU trio data nor in latest 1000G P2 calls	2012-04-20 10:58:34 -04:00
Guillermo del Angel	d2488dfb81	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-19 19:40:03 -04:00
Guillermo del Angel	c44c7b9a97	Restored optimization in Pair HMM only to compute HMM matrices starting in index where haplotypes start to diverge - saves about 15-20% of runtime which is what we lost by disabling banding in latest version, so runtime should be now about the same as what it was before refactoring. Output is bit-true to previous commit	2012-04-19 19:39:43 -04:00
Mauricio Carneiro	0f8c77391d	BQSR bug triage #3 * fixed context covariate famous "off by one" error * reduced maximum quality score to Q50 (following Eric/Ryan's suggestion) * remove context downsampling in BQSR R script	2012-04-19 17:31:04 -04:00
Khalid Shakir	df5dd841af	AC strat now checks if evals will be merged before throwing an error on multiple eval files. Minor tweaks to WGP script based on new recal VCF format.	2012-04-19 16:08:55 -04:00
Guillermo del Angel	1ae2ab5b63	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-19 12:50:29 -04:00
Guillermo del Angel	0e6e0cb907	Merging bug fixes	2012-04-19 12:49:30 -04:00
Eric Banks	79272c5e15	Thanks to Menachem for pointing out that the docs for genotyping_mode and output_mode were the same (and unclear). Fixed.	2012-04-19 12:48:09 -04:00
Guillermo del Angel	02ff930f6a	My changes	2012-04-19 12:45:18 -04:00
Eric Banks	2485cef5b8	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-19 11:46:06 -04:00
Eric Banks	76a6e37f4f	Don't output callability metrics by default anymore; one can still have them output to the 'metrics' file (which is now @Hidden because they are really for GSA use). Added a TODO to move UG from @By reference to reads and rods once LIBS is cleaned up.	2012-04-19 11:45:56 -04:00
Ryan Poplin	1ea4e48a27	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-19 11:32:32 -04:00
Ryan Poplin	11001ab9a2	Adding option to HaplotypeCaller to genotype the events on the chosen haplotypes as independent events. The filtered reads are now kept around so they can be passed to the variant annotations. Unfortunately the filtered reads aren't assigned a likelihood yet so they are all thrown in the Allele.NO_CALL bin.	2012-04-19 11:32:10 -04:00
Mauricio Carneiro	eb22cd7222	Unit test to guarantee BQSR sequential calculation accuracy This test brings together the old and the new BQSR, building a recalibration table using the two separate frameworks and performing the recalibration calculation using the two different frameworks for 10,000+ bases and asserting that the calculations match in every case.	2012-04-19 09:33:40 -04:00
Mauricio Carneiro	68d0211fa1	Improved BQSR plotting and some new parameters * Refactored CycleCovariate to be a fragment covariate instead of a per read covariate * Refactored the CycleCovariateUnitTest to test the pairing information * Updated BQSR Integration tests accordingly * Made quantization levels parameter not hidden anymore * Added hidden option to keep intermediate plotting files for debug purposes (they're automatically deleted) * Added hidden option not to generate the plots automatically (important for scatter/gathering)	2012-04-19 09:31:41 -04:00
Guillermo del Angel	143e92b797	Rebasing	2012-04-18 20:05:43 -04:00
Guillermo del Angel	960e7e6aaf	Changes to integration tests	2012-04-18 19:53:42 -04:00
Guillermo del Angel	82efd4457e	Revert some bad merge changes	2012-04-18 16:35:09 -04:00
Guillermo del Angel	31c394d588	Resolve merge conflicts	2012-04-18 16:25:03 -04:00
Ryan Poplin	4999ae87ad	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-18 15:02:42 -04:00
Ryan Poplin	dcc4871468	minor misc optimizations to PairHMM	2012-04-18 15:02:26 -04:00
Eric Banks	d3c84e7b1f	This should be a User Error since it's provided from the DoC command-line arguments	2012-04-18 13:09:23 -04:00
Eric Banks	392f1903f7	Handling some of the NumberFormatExceptions seen via Tableau that are really user errors.	2012-04-18 12:57:37 -04:00
Ryan Poplin	8a84456626	Following Eric's awesome update to change the VQSR recal file into a VCF file, the ApplyRecalibration step is now scatter/gather-able and tree reducible.	2012-04-18 11:24:04 -04:00
Eric Banks	4448a3ea76	Final tweaks. Added an integration test to cover the case of SNPs and indels that start at the same position.	2012-04-17 23:54:10 -04:00
Eric Banks	c1f52b773a	Minor tweaks and updated integration tests MD5s	2012-04-17 23:17:28 -04:00
Eric Banks	6d03bce0d3	Important refactoring of the VQSR recal file format: we now use a VCF instead of a CSV file. The most important reason for this change is that we no longer need to read the entire recal file into memory up front in ApplyRecalibration. For 1000G calling this was prohibitive in terms of memory requirements. Now we go through the rod system and pull in just the records we need at a given position. As an added bonus, once BCF2 is live we can drastically cut down the sizes of these recal files (which can grow large for whole genome calling).	2012-04-17 22:38:18 -04:00
Eric Banks	ea793d8e27	Khalid pressured me into adding an integration test that makes sure we don't fail on reads with adjacent I and D events.	2012-04-17 21:21:29 -04:00
Mauricio Carneiro	46a212d8e9	Added "simplify reads" option to PrintReads.	2012-04-17 19:32:34 -04:00
Mauricio Carneiro	f0c81b59b0	Implementation of the new BQSR plotting infrastructure * removed low quality bases from the recalibration report. * refactored the Datum (Recal and Accuracy) class structure * created a new plotting csv table for optimized performance with the R script * added a datum object that carries the accuracy information (AccuracyDatum) for plotting * added mean reported quality score to all covariates * added QualityScore as a covariate for plotting purposes * added unit test to the key manager to operate with one required covariate and multiple optional covariates * integrated the plotting into BQSR (automatically generates the pdf with the recalibration tearsheet)	2012-04-17 19:23:55 -04:00
Ryan Poplin	952280bef1	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-17 17:00:14 -04:00
Ryan Poplin	cf705f6c62	Adding read position rank sum test to the list of annotations that get produced with the HaplotypeCaller	2012-04-17 17:00:00 -04:00
Eric Banks	13c800417e	Handle NPE in UG indel code: deletions immediately preceding insertions were not handled well in the code.	2012-04-17 15:51:23 -04:00
Guillermo del Angel	c78b0eee3a	Refactoring/fixing up UG HMM code: a) Make code use PairHMM class instead of having duplicated code. That way UG and HaplotypeCaller now use same core code. Changes to be able to do this: 1. Compute context-dependent GOP as a function of read, not of haplotype, b) Extracted code to initialize HMM arrays into separate method, c) Move PairHMM class and unit test to public, d) Reenable banded code in PairHMM, inverted sense of flag (true=enable feature) but leave off in HaplotypeCaller.	2012-04-17 14:22:48 -04:00
Khalid Shakir	91cb654791	AggregateMetrics: - By porting from jython to java now accessible to Queue via automatic extension generation. - Better handling for problematic sample names by using PicardAggregationUtils. GATKReportTable looks up keys using arrays instead of dot-separated strings, which is useful when a sample has a period in the name. CombineVariants has option to suppress the header with the command line, which is now invoked during VCF gathering. Added SelectHeaders walker for filtering headers for dbGAP submission. Generated command line for read filters now correctly prefixes the argument name as --read_filter instead of -read_filter. Latest WholeGenomePipeline. Other minor cleanup to utility methods.	2012-04-17 11:45:32 -04:00
Ryan Poplin	1a2e92f8db	Merged bug fix from Stable into Unstable	2012-04-17 10:23:05 -04:00
Ryan Poplin	adad76b36f	Fixing NPE in VQSR for the case of very small callsets.	2012-04-17 10:20:43 -04:00
Mark DePristo	3f6b2423d8	Update VE IT to reflect new fields and bugfixes	2012-04-13 17:00:37 -04:00
Mark DePristo	f9190b6fcd	VariantEvalUnitTest is better named VariantEvalWalkerUnitTest	2012-04-13 17:00:37 -04:00
Mark DePristo	23ccf772d4	IndelSummary now emits all of the underlying counts for ratios, percentages, etc it computes	2012-04-13 17:00:36 -04:00
Mark DePristo	84d1e8713a	Infrastructure for combining VariantEvaluations -- Not hooked up yet, so the output of VariantEval should be the same as before -- Implemented a VariantEvalUnitTest that tests the low level strat / eval combinatorics and counting routines -- Better docs throughout	2012-04-13 17:00:36 -04:00
Mark DePristo	38986e4240	Documentation for StratificationManager	2012-04-13 17:00:36 -04:00
Mark DePristo	ab06d53867	Useful test constructor or Unit tests in RefMetaDataTracker	2012-04-13 17:00:36 -04:00
Mark DePristo	285e61a227	Bugfix for IndelSummary -- multi allelic count should be % not ratio	2012-04-13 17:00:35 -04:00
Mark DePristo	e6d5cb46d2	Improvements and bugfixes to IndelSummary -- Now properly includes both bi and multi-allelic variants. These are actually counted as well, and emitted as counts and % of sites with multiple alleles -- Bug fix for gold standard rate	2012-04-13 17:00:35 -04:00
Mark DePristo	bfa966a4e9	Bugfix for OneBPIndel -- Previously was only including 1 bp insertions in stratification	2012-04-13 17:00:35 -04:00
Mark DePristo	2aa2d9aec0	Merged bug fix from Stable into Unstable	2012-04-13 09:25:43 -04:00
Mark DePristo	27e7e17dc7	New way to handle exceptions in multi-threaded GATK -- HMS no longer tries to grab and throw all exceptions. Exceptions are just thrown directly now. -- Proper error handling is handled by functions in HMS, which are used by ShardTraverser and TreeReducer -- Better printing of stack traces in WalkerTest	2012-04-13 09:23:33 -04:00
Mark DePristo	e85e9a8cf5	More extensive testing of type of error thrown in multi-threaded walker test -- Unfortunately the result of the multi-threaded test is non-deterministic so run the test 10 times to see if the right exception is always thrown -- Now prints the stack trace and exception message of the caught exception of the wrong type, if this occurs	2012-04-13 09:23:33 -04:00
Eric Banks	297afc7911	Added unit test to ensure that we genotype correctly cases with really large GLs	2012-04-12 15:43:14 -04:00
Eric Banks	818e8c2fb9	Resolving merge conflicts	2012-04-12 15:19:44 -04:00
Eric Banks	0dd571928d	Let's not have the indel model emit more than the max possible number of genotypable alt alleles (since we may not be able to subset down to the best ones).	2012-04-12 15:16:29 -04:00
Eric Banks	f77a6d18b8	Bad conflict merge before	2012-04-12 09:56:49 -04:00
Eric Banks	33a8bdd75f	Resolving merge conflicts	2012-04-12 09:51:55 -04:00
Eric Banks	b659b16b31	Generate User Error for bad POS value	2012-04-12 09:49:35 -04:00
Eric Banks	cc71baf691	Don't allow users to try to genotype more than the max possible value (catch and throw a User Error at startup). Better docs explaining that users shouldn't play with this value unless they know what they are doing.	2012-04-12 09:18:44 -04:00
Eric Banks	5bf9dd2def	A framework to get annotations working in the HaplotypeCaller (and ART walkers in general). Adding support for active-region-based annotation for most standard annotations. I need to discuss with Ryan what to do about tests that require offsets into the reads (since I don't have access to the offsets) like e.g. the ReadPosRankSumTest. IMPORTANT NOTE: this is still very much a dev effort and can only be accessed through private walkers (i.e. the HaplotypeCaller). The interface is in flux and so we are making no attempt at all to make it clean or to merge this with the Locus-Traversal-based annotation system. When we are satisfied that it's working properly and have settled on the proper interface, we will clean it up then.	2012-04-11 16:22:12 -04:00
Guillermo del Angel	f9f8589692	Refactoring/fixing up UG HMM code: a) Make code use PairHMM class instead of having duplicated code. That way UG and HaplotypeCaller now use same core code. Changes to be able to do this: 1. Compute context-dependent GOP as a function of read, not of haplotype, b) Extracted code to initialize HMM arrays into separate method, c) Move PairHMM class and unit test to public, d) Reenable banded code in PairHMM, inverted sense of flag (true=enable feature) but leave off in HaplotypeCaller.	2012-04-11 13:56:51 -04:00
Eric Banks	5b7da3831f	Not sure why this didn't make it into the last push, but here's a working MD5 for the NDA annotation in UG	2012-04-11 13:49:50 -04:00
Eric Banks	7aa654d13f	New interface for some dev work that Ryan and I are doing; only accessible from private walkers right now	2012-04-11 13:49:09 -04:00
Eric Banks	dc90508104	Adding a new annotation to UG calls: NDA = number of discovered (but not necessarily genotyped) alleles for the site. This could help downstream analysis esp. of indels for wonky sites (since we only use the top 2-3 alleles). Not enabled by default but we can change that if this turns out to be useful.	2012-04-11 13:47:10 -04:00
Eric Banks	d2142c3aa7	Adding integration test for Flag Stat	2012-04-10 22:40:38 -04:00
Eric Banks	f560611fe8	Merged bug fix from Stable into Unstable	2012-04-10 22:26:53 -04:00
Eric Banks	f46f7d0590	Fix the stats coming out of FlagStat. I will add an integration test in unstable	2012-04-10 22:26:10 -04:00
Mauricio Carneiro	cd842b650e	Optimizing DiagnoseTargets * Fixed output format to get a valid vcf * Optimized the per sample pileup routine O(n^2) => O(n) pileup for samples * Added support to overlapping intervals * Removed expand target functionality (for now) * Removed total depth (pointless metric)	2012-04-10 17:43:59 -04:00
Ryan Poplin	1df0adf862	Fixing ActivityProfile unit test.	2012-04-10 15:28:27 -04:00
Ryan Poplin	e3cc7cc59c	Resolving merge conflict.	2012-04-10 14:50:27 -04:00
Ryan Poplin	a4634624b7	There are now three triggering options in the HaplotypeCaller. The default (mismatches, insertions, deletions, high quality soft clips), an external alleles file (from the UG for example), or extended triggers which include low quality soft clips, bad mates and unmapped mates. Added better algorithm for band pass filtering an ActivityProfile and breaking them apart when they get too big. Greatly increased the specificity of the caller by battening down the hatches on things like base quality and mapping quality thresholds for both the assembler and the likelihood function.	2012-04-10 14:48:23 -04:00
Eric Banks	10e74a71eb	We now allow arbitrary annotations other than dbSNP (e.g. HM3) to come out of the Unified Genotyper. This was already set up in the Variant Annotator Engine and was just a matter of hooking UG up to it. Added integration test to ensure correct behavior.	2012-04-10 12:30:35 -04:00
Mark DePristo	b43d21056b	Merged bug fix from Stable into Unstable	2012-04-10 09:42:09 -04:00
Mark DePristo	6885e2d065	UserException fixes for GATK_logs recent errors -- SamFileReader.java:525 -- BlockCompressedInputStream:376 These were both instances where we weren't catching and rethrowing picard exceptions as UserExceptions.	2012-04-10 07:37:42 -04:00
Mark DePristo	8507cd7440	Throw UserException for bad dict / chain files	2012-04-10 07:22:43 -04:00
Ryan Poplin	cd9bf1bfc3	Changing IndelSummary eval module so that PostCallingQC.scala can run with MIXED-record VCFs.	2012-04-10 00:22:40 -04:00
Roger Zurawicki	9ece93ae9c	DiagnoseTargets now outputs a VCF file - refactored the statistics classes - concurrent callable statuses by sample are now available. Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2012-04-09 16:40:20 -04:00

2126 Commits (c4f7df4dce3b5acdb8b2487d3e87d81ca6cd1e23)
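
Two of the numeric pitfalls mentioned in the log above (the cast-after-exponentiation rounding issue in commit 429800a192, and the "log(e^x) is NOT x if log's base = 10" bug in commit 730208133b) can be illustrated with a minimal sketch. This is not GATK's MathUtils or ErrorModel code; the function names below are hypothetical and chosen only for illustration, and Python is used purely for brevity.

```python
import math


def log10_factorial(n: int) -> float:
    """log10(n!) as a sum of logs (illustrative stand-in, not GATK's implementation)."""
    return sum(math.log10(k) for k in range(2, n + 1))


def factorial_from_log10(n: int) -> int:
    """Recover n! from its log10 value.

    10 ** log10(n!) can land slightly below the true integer in floating
    point, so truncating directly may give an off-by-one result. Rounding
    first makes the integer conversion safe.
    """
    return int(round(10.0 ** log10_factorial(n)))


def ln_to_log10(x_ln: float) -> float:
    """Convert a natural-log value to log10 space.

    log10(e ** x) equals x * log10(e), not x, so values computed with
    natural logs must be rescaled before being mixed with log10 values.
    """
    return x_ln * math.log10(math.e)


if __name__ == "__main__":
    print(factorial_from_log10(4))
    print(ln_to_log10(math.log(100.0)))
```

Rounding before the integer conversion is the same idea as the "pre-rounding" described in the commit message; whether the unrounded value falls just below or just above the integer depends on the platform's floating-point arithmetic.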