gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Laurent Francioli	025bdfe2cc	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-12 12:19:44 +01:00
Mauricio Carneiro	ed91461c49	Data Processing Pipeline Test * Added standard pipeline test for the DPP * Added a full BWA pipeline test for the DPP * Included the extra files for the reference needed by BWA (to be used by DPP and PPP tests)	2011-12-12 00:24:51 -05:00
Mauricio Carneiro	cca8a18608	PPP pipeline test * added a pipeline test to the Pacbio Processing Pipeline. * updated exampleBAM with more complete RG information so we can use it in a wider variety of pipeline tests * added exampleDBSNP.vcf file with only chromosome 1 in the range of the exampleFASTA.fasta reference for pipeline tests	2011-12-11 17:32:21 -05:00
Eric Banks	7b6338c742	Merge branch 'master' into trialleles	2011-12-11 00:28:46 -05:00
Eric Banks	7c4b9338ad	The old bi-allelic implementation of the Exact model has been completely deprecated - you can only use the multi-allelic implementation now.	2011-12-11 00:23:33 -05:00
Eric Banks	044f211a30	Don't collapse likelihoods over all alt alleles - that's just not right. For now, the QUAL is calculated for just the most likely of the alt alleles; I need to think about the right way to handle this properly.	2011-12-10 23:57:14 -05:00
Mauricio Carneiro	21ac3b59d7	Merged bug fix from Stable into Unstable	2011-12-09 16:51:46 -05:00
Mauricio Carneiro	13905c00b3	Updating PacbioProcessingPipeline to new Queue standards	2011-12-09 16:51:02 -05:00
Eric Banks	364f1a030b	Plumbing added so that the UG engine can handle multiple alleles and they can successfully be genotyped. Alleles that aren't likely are not allowed to be used when assigning genotypes, but otherwise the greedy PL-based approach is what is used. Moved assign genotypes code to UG engine since it has nothing to do with the Exact model. Still have some TODOs in here before I can push this out to everyone.	2011-12-09 14:25:28 -05:00
Mauricio Carneiro	8475328b2c	Turning off test that breaks read clipper until we define what is the desired behavior for clipping this particular case.	2011-12-09 11:53:12 -05:00
Roger Zurawicki	4cbd1f0dec	Reorganized the testing code and created ClipReadsTestUtils. Tests are more rigorous and include many more test cases. We can test custom cigars and the generated cigars. *Still needs debugging because code is not working. Created test classes to be used across several tests. Some cases are still commented out. Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2011-12-09 11:52:34 -05:00
Roger Zurawicki	0e9c2cefa2	testHardClipSoftClippedBases works with Matches and Deletions. Insertions are a problem so cigar cases with "I" are commented out. The test works with multiple deletions and matches. This is still not a complete test. A lot of cigar test cases are commented out. Added insertions to ReadClipperUnitTest. ReadClipper now tests for all indels. Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2011-12-09 11:43:37 -05:00
Eric Banks	64dad13e2d	Don't carry around an extra copy of the code for the Haplotype Caller	2011-12-09 11:09:40 -05:00
Eric Banks	442ceb6ad9	The Exact model now computes both the likelihoods and posteriors (in separate arrays); likelihoods are used for assigning genotypes, not the posteriors.	2011-12-09 10:16:44 -05:00
Laurent Francioli	a79144f7db	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-09 15:57:24 +01:00
Laurent Francioli	72fbfba97d	Added UnitTests for getFamilies() and getChildrenWithParents()	2011-12-09 15:57:07 +01:00
Laurent Francioli	5a06170804	Corrected bug causing getChildrenWithParents() to not take the last family member into consideration.	2011-12-09 14:51:34 +01:00
Eric Banks	aa4a8c5303	No dynamic programming solution for assigning genotypes; just done greedily now. Fixed QualByDepth to skip no-call genotypes. No-calls are no longer given annotations (attributes).	2011-12-09 02:25:06 -05:00
Eric Banks	2fe50c64da	Updating md5s	2011-12-09 00:47:01 -05:00
Eric Banks	8777288a9f	Don't throw a UserException if too many alt alleles are trying to be genotyped. Instead, I've added an argument that allows the user to set the max number of alt alleles to genotype and the UG warns and skips any sites with more than that number.	2011-12-09 00:00:20 -05:00
Eric Banks	3e7714629f	Scrapped the whole idea of an int/long as an index into the ACset: with lots of alternate alleles we run into overflow issues. Instead, simply use the ACcounts array as the hash key since it is unique for each AC conformation. To do this, it needed to be wrapped inside an object so hashcode() would work.	2011-12-08 23:50:54 -05:00
Eric Banks	4aebe99445	Need to use longs for the set index (because we can run out of ints when there are too many alternate alleles). Integration tests now use the multiallelic implementation.	2011-12-08 15:31:02 -05:00
Eric Banks	7750bafb12	Fixed bug where last dependent set index wasn't properly being transferred for sites with many alleles. Adding debugging output.	2011-12-08 13:50:50 -05:00
Guillermo del Angel	252e0f3d0a	Merged bug fix from Stable into Unstable	2011-12-08 13:11:39 -05:00
Guillermo del Angel	1bfe28067f	Don't try to genotype an indel even bigger than the reference window size, or else we'll be out of bounds. Necessary to handle Phase 1 integrated callset with large deletions. Better error indication when validating a GenomeLoc.	2011-12-08 12:54:08 -05:00
Mark DePristo	50c4436f90	scales=free shows variance within analysis better	2011-12-07 14:09:32 -05:00
Mark DePristo	9def841275	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-07 13:36:16 -05:00
Mark DePristo	69b19047ba	Fix bad path	2011-12-07 12:08:25 -05:00
Mark DePristo	4055877708	Prints 0.0 TiTv not NaN when there are no variants -- Updated md5	2011-12-07 12:07:54 -05:00
Matt Hanna	44bc8766d7	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-07 12:03:27 -05:00
Matt Hanna	15533e08df	Fixed issue with RODWalker parallelization. Turns out that someone previously upped the declared size of a ROD shard to 100M bases, making each ROD shard larger than the size of chr20. Why didn't we see this in Stable? Because the ShardStrategy/ShardStrategyFactory mechanism was dutifully ignoring the shard size specification. When I rolled the ShardStrategy/ShardStrategyFactory mechanics back into the DataSources as part of the async I/O project, I inadvertently reenabled this specifier.	2011-12-07 11:55:42 -05:00
Ryan Poplin	831010e72f	Misc minor updates and added comments to the HaplotypeCaller. Merging branches in prep for work on active region traversal.	2011-12-07 10:08:29 -05:00
Mark DePristo	e17a1923fb	Plots runtimes by analysis name and exechosts. Useful to understand the performance of analysis jobs by hosts, and to debug problematic nodes.	2011-12-07 09:24:47 -05:00
Mark DePristo	5d2212bc8e	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-07 09:03:17 -05:00
Mark DePristo	6bf18899df	Fix for variant summary -- now treats all 50 bp deletions or insertions as CNVs	2011-12-07 09:02:49 -05:00
Matt Hanna	5869a87e48	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-06 18:12:12 -05:00
Matt Hanna	c9b2cd8ba5	Fix for chartl's stale null representation issue.	2011-12-06 18:05:17 -05:00
Eric Banks	79d18dc078	Fixing indexing bug on the ACsets. Added unit tests for the Exact model code.	2011-12-06 16:17:18 -05:00
Khalid Shakir	b4b7ae1bd9	Revved Picard to incorporate tfennell's AsyncSAMFileWriter. Removed DbSnpFileGenerator and related files as they were removed from PPP r2063 by ktibbett.	2011-12-06 10:37:42 -05:00
Matt Hanna	f5b977fc88	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-06 10:11:35 -05:00
Matt Hanna	4001c22a11	Better file count / buffering variation in test suite. Parameterized read shard buffering. Misc cleanup.	2011-12-06 10:10:38 -05:00
Khalid Shakir	677bea0abd	Right aligning GATKReport numeric columns and updated MD5s in tests. PreQC parses file with spaces in sample names by using tabs only. PostQC allows passing the file names for the evals so that flanks can be evaled. BaseTest's network temp dir now adds the user name to the path so files aren't created in the root. HybridSelectionPipeline: - Updated to latest versions of reference data. - Refactored Picard parsing code replacing YAML.	2011-12-05 23:22:15 -05:00
Eric Banks	7a0f6feda4	Make sure that too many alternate alleles aren't being passed to the genotyper (10 for now) and exit with a UserError if there are.	2011-12-05 16:18:52 -05:00
Eric Banks	7fac4afab3	Fixed priors (now initialized upon engine startup in a multi-dimensional array) and cell coefficients (properly handles the generalized closed form representation for multiple alleles).	2011-12-05 15:57:25 -05:00
David Roazen	1ba03a5e72	Use optional() instead of required() to construct javaMemoryLimit argument in JavaCommandLineFunction	2011-12-05 14:06:00 -05:00
Eric Banks	a7cb941417	The posteriors vector is now 2 dimensional so that it supports multiple alleles (although the UG is still hard-coded to use only array[0] for now); the exact model now collapses probabilities for all conformations over a given AC into the posteriors array (in the appropriate dimension). Fixed a bug where the priors and posteriors were being passed in swapped.	2011-12-04 13:02:53 -05:00
Eric Banks	eab2b76c9b	Added loads of comments for future reference	2011-12-03 23:54:42 -05:00
Eric Banks	29662be3d7	Fixed bug where k=2N case wasn't properly being computed. Added optimization for BB genotype case not in old model. At this point, integration tests pass except for 1 case where QUALs differ by 0.01 (this is okay because I occasionally need to compute extra cells in the matrix which affects the approximations) and 2 cases where multi-allelic indels are being genotyped (some work still needs to be done to support them).	2011-12-03 23:12:04 -05:00
Eric Banks	71f793b71b	First partially working version of the multi-allelic version of the Exact AF calculation	2011-12-02 14:13:14 -05:00
David Roazen	d014c7faf9	Queue now properly escapes all shell arguments in generated shell scripts. This has implications for both Qscript authors and CommandLineFunction authors. Qscript authors: You no longer need to (and in fact must not) manually escape String values to avoid interpretation by the shell when setting up Walker parameters. Queue will safely escape all of your Strings for you so that they'll be interpreted literally. E.g., Old way: filterSNPs.filterExpression = List("\"QD<2.0\"", "\"MQ<40.0\"", "\"HaplotypeScore>13.0\"") New way: filterSNPs.filterExpression = List("QD<2.0", "MQ<40.0", "HaplotypeScore>13.0") CommandLineFunction authors: If you're writing a one-off CommandLineFunction in a Qscript and don't really care about quoting issues, just keep doing things the direct, simple way: def commandLine = "cat %s | grep -v \"#\" > %s".format(files, out) If you're writing a CommandLineFunction that will become part of Queue and will be used by other QScripts, however, it's advisable to do things the newer, safer way, i.e.: When you construct your commandLine, you should do so ONLY using the API methods required(), optional(), conditional(), and repeat(). These will manage quoting and whitespace separation for you, so you shouldn't insert quotes/extraneous whitespace in your Strings. By default you get both (quoting and whitespace separation), but you can disable either of these via parameters. E.g., override def commandLine = super.commandLine + required("eff") + conditional(verbose, "-v") + optional("-c", config) + required("-i", "vcf") + required("-o", "vcf") + required(genomeVersion) + required(inVcf) + required(">", escape=false) + // This will be shell-interpreted required(outVcf) I've ported the Picard/Samtools/SnpEff CommandLineFunction classes to the new system, so you'll get free shell escaping when you use those in Qscripts just like with walkers.	2011-12-01 18:13:44 -05:00
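
The escaping idea described in commit d014c7faf9 can be illustrated with a small sketch. This is not Queue's implementation or API (Queue is written in Scala). It is a Python analogy built on the standard library's `shlex.quote`, and the helper names `required`, `optional`, `conditional`, the `build_command` function, and the `filter_tool` program name are all hypothetical stand-ins chosen to mirror the commit message. The sketch assumes a POSIX shell.

```python
import shlex


def required(*parts, escape=True):
    """Always emit the given parts, separated by single spaces.

    Each part is shell-quoted unless escape=False (hypothetical helper).
    """
    rendered = [shlex.quote(str(p)) if escape else str(p) for p in parts]
    return " " + " ".join(rendered)


def optional(flag, value, escape=True):
    """Emit 'flag value' only when a value is actually present."""
    if value is None:
        return ""
    return required(flag, value, escape=escape)


def conditional(condition, flag, escape=True):
    """Emit the flag only when the condition is true."""
    return required(flag, escape=escape) if condition else ""


def build_command(expressions, out_path, verbose=False, config=None):
    """Assemble a command line for a hypothetical 'filter_tool' program."""
    cmd = "filter_tool"
    cmd += conditional(verbose, "-v")
    cmd += optional("-c", config)
    for expression in expressions:
        # Callers pass the raw expression; quoting happens here, once.
        cmd += required("--filterExpression", expression)
    # The redirection operator must stay unquoted so the shell interprets it.
    cmd += required(">", escape=False)
    cmd += required(out_path)
    return cmd


print(build_command(["QD<2.0", "MQ<40.0"], "out.vcf"))
# prints: filter_tool --filterExpression 'QD<2.0' --filterExpression 'MQ<40.0' > out.vcf
```

As in the commit message, the caller supplies plain strings such as `"QD<2.0"` with no manual quoting, and only the explicitly unescaped `>` is left for the shell to interpret.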

8308 Commits (025bdfe2cc5d0639bbfc8ddd986dba496899661d)