-- QualQuantizer now tracks merge order and level in the QualInterval for debugging / visualization
-- Write out QualIntervals tree for visualization
-- visualizeQuantizedQuals.R script for basic visualization of the quality score quantization
Several of the unit tests for the new key authorization feature require
read access to the GATK master private key file. Since this file is only
readable by members of the group gsagit, this makes it hard for people
outside the group to run the test suite.
We now skip tests that require the master private key if the key
exists (a missing key would be a true error) but is not readable by the
user running the test suite, as sketched below.
Bamboo, of course, will always be able to run these tests.
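A minimal sketch of the skip logic, assuming a JUnit-style test and a purely
hypothetical key-file path (the real location is internal):

    import java.io.File;
    import org.junit.Assert;
    import org.junit.Assume;
    import org.junit.Test;

    public class KeyAuthorizationTestSketch {
        // Hypothetical path; the actual master private key location is internal.
        private static final File MASTER_PRIVATE_KEY = new File("/path/to/GATK_master_private.key");

        @Test
        public void testRequiringMasterKey() {
            // A missing key file is a true error, so fail outright:
            Assert.assertTrue("GATK master private key file is missing",
                              MASTER_PRIVATE_KEY.exists());
            // An unreadable key just means this user isn't in gsagit; skip instead of failing:
            Assume.assumeTrue(MASTER_PRIVATE_KEY.canRead());
            // ... test body that reads the master private key ...
        }
    }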
-Running the GATK with the -et NO_ET or -et STDOUT options now
requires a key issued by us. Our reasons for doing this, and the
procedure for our users to request keys, are documented here:
http://www.broadinstitute.org/gsa/wiki/index.php/Phone_home
-A GATK user key is an email address plus a cryptographic signature of
that address created with our private key, all wrapped in a GZIP
container. User keys are validated using the public key we now
distribute with the GATK; our private key is kept in a secure location.
(See the validation sketch after this list.)
-Keys are cryptographically secure in that valid keys definitely
came from us and keys cannot be fabricated; however, keys are not
"copy-protected" in any way.
-Includes private, standalone utilities to create a new GATK user key
(GenerateGATKUserKey) and to create a new master public/private key
pair (GenerateKeyPair). Usage of these tools will be documented on
the internal wiki shortly.
-Comprehensive unit/integration tests, including tests to ensure the
continued integrity of the GATK master public/private key pair.
-Generation of new user keys and the new unit/integration tests both
require access to the GATK private key, which can only be read by
members of the group "gsagit".
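For illustration, a rough sketch of how such a key could be validated with
the standard java.security APIs. The actual container layout and signature
algorithm are internal details; this sketch assumes, hypothetically, that the
GZIP payload is the UTF-8 email, a zero byte, then the raw signature bytes:

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.security.PublicKey;
    import java.security.Signature;
    import java.util.zip.GZIPInputStream;

    public class UserKeyValidationSketch {
        public static boolean isValidKey(File keyFile, PublicKey gatkPublicKey) throws Exception {
            byte[] payload = readFully(new GZIPInputStream(new FileInputStream(keyFile)));

            // Split the assumed "email NUL signature" layout:
            int sep = 0;
            while (sep < payload.length && payload[sep] != 0)
                sep++;
            if (sep >= payload.length)
                return false; // malformed key: no separator found

            byte[] emailBytes = new byte[sep];
            byte[] signatureBytes = new byte[payload.length - sep - 1];
            System.arraycopy(payload, 0, emailBytes, 0, sep);
            System.arraycopy(payload, sep + 1, signatureBytes, 0, signatureBytes.length);

            // The signature algorithm here is an assumption for the sketch.
            Signature verifier = Signature.getInstance("SHA1withDSA");
            verifier.initVerify(gatkPublicKey);
            verifier.update(emailBytes);
            return verifier.verify(signatureBytes);
        }

        private static byte[] readFully(InputStream in) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int n; (n = in.read(buffer)) > 0; )
                out.write(buffer, 0, n);
            in.close();
            return out.toByteArray();
        }
    }

Because only the email is signed, anyone holding a valid key file can copy
it, which is why keys are not "copy-protected".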
-- Just infrastructure at this point (but with UnitTests!).
-- Capable of taking a histogram of quality scores and a target number of levels (8, for example), and mapping the full range of input quality scores down to just those levels.
-- The selected quality scores are chosen to minimize the miscalibration rate of the resulting bins. I believe this adaptive approach is vastly better than the current systems being developed by EBI and NCBI.
-- This infrastructure is designed to work with BQSRv2. I envision a system where we feed in the projected empirical quality score distribution from the BQSRv2 table, compute the required deleveling for each of the B, I, and D qualities, and emit calibrated, compressed quality scores on the fly.
-- Note that the algorithm for determining the best intervals is currently both greedy (i.e., it may miss the best overall choice) and potentially extremely slow, but it is enough for me to play with (see the sketch below).
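For illustration, a minimal sketch of the greedy merging idea: start with one
interval per observed quality score, then repeatedly merge the adjacent pair
with the smallest penalty until only the target number of levels remains. The
penalty function below is a hypothetical stand-in for the real
miscalibration-rate calculation:

    import java.util.ArrayList;
    import java.util.List;

    public class GreedyQualQuantizerSketch {
        static class Interval {
            long observations, errors;
            Interval(long observations, long errors) {
                this.observations = observations;
                this.errors = errors;
            }
            double errorRate() {
                return observations == 0 ? 0.0 : (double) errors / observations;
            }
        }

        static List<Interval> quantize(long[] obsByQual, long[] errByQual, int nLevels) {
            List<Interval> intervals = new ArrayList<Interval>();
            for (int q = 0; q < obsByQual.length; q++)
                intervals.add(new Interval(obsByQual[q], errByQual[q]));

            // Greedy: each pass rescans all adjacent pairs, so this is
            // quadratic overall -- consistent with "potentially extremely slow".
            while (intervals.size() > nLevels) {
                int best = 0;
                double bestPenalty = Double.MAX_VALUE;
                for (int i = 0; i + 1 < intervals.size(); i++) {
                    double p = mergePenalty(intervals.get(i), intervals.get(i + 1));
                    if (p < bestPenalty) { bestPenalty = p; best = i; }
                }
                Interval merged = intervals.get(best);
                Interval removed = intervals.remove(best + 1);
                merged.observations += removed.observations;
                merged.errors += removed.errors;
            }
            return intervals;
        }

        // Hypothetical penalty: error-rate mismatch weighted by interval size.
        static double mergePenalty(Interval a, Interval b) {
            return Math.abs(a.errorRate() - b.errorRate()) * (a.observations + b.observations);
        }
    }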
-- Use the general function type.convert from read.table to automagically convert the string data to booleans, factors, and numeric types as appropriate. Vastly better than the previous behavior, which in some cases only worked for numerics.
-- Includes paired end status (T/F)
-- Includes count of reads used in calculation
-- Includes simple read type (2x76 for example)
-- Better handling of insert size and read length when there's no data or the data isn't paired-end, by emitting NA instead of 0
-- ReadGroupProperties: Emits a GATKReport containing read group, sample, library, platform, center, median insert size, and median read length for each read group in every BAM file.
-- Median tool that collects up to a given maximum number of elements and returns the median of the elements (sketched after this list).
-- Unit and integration tests for everything.
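A sketch of the capped median collector; silently ignoring elements beyond
the cap is an assumption about the overflow policy:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class MedianSketch<T extends Comparable<T>> {
        private final int maxElements;
        private final List<T> values = new ArrayList<T>();

        public MedianSketch(int maxElements) {
            this.maxElements = maxElements;
        }

        /** Returns true if the element was kept, false once the cap is reached. */
        public boolean add(T value) {
            if (values.size() >= maxElements)
                return false;
            values.add(value);
            return true;
        }

        public T median() {
            if (values.isEmpty())
                throw new IllegalStateException("no values collected");
            Collections.sort(values);
            return values.get(values.size() / 2); // upper median for even counts
        }
    }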
-- Making the name of TestProvider protected so subclasses can override the name more easily
* All contexts with 'N' bases are now collapsed as uninformative
* Context size is now represented internally as a BitSet but output as a DNA string
* Temporarily disabled sorted outputs because of null objects
* Turns DNA sequences (for context covariates) into bit sets for maximum compression
* Allows variable context size representation while guaranteeing uniqueness (see the sketch below).
* Works with long precision, so it is limited to a context size of 31 bases (can be extended with BigInteger precision if necessary).
* Unit Tests added
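To see how a variable-length context can be packed into a long while
guaranteeing uniqueness, here is one illustrative scheme: 2 bits per base
plus a leading sentinel bit, so that, e.g., "A" and "AA" can never collide.
With 1 sentinel bit and 2 bits per base in a 64-bit long, the context is
capped at 31 bases, matching the limit above. The sentinel trick is an
assumption for illustration; the actual encoding may differ:

    public class ContextEncodingSketch {
        public static long encode(String context) {
            if (context.length() > 31)
                throw new IllegalArgumentException("context longer than 31 bases");
            long key = 1L; // sentinel bit marking the start of the context
            for (int i = 0; i < context.length(); i++) {
                key <<= 2;
                switch (context.charAt(i)) {
                    case 'A': key |= 0; break;
                    case 'C': key |= 1; break;
                    case 'G': key |= 2; break;
                    case 'T': key |= 3; break;
                    default:  return -1; // contexts containing 'N' are uninformative
                }
            }
            return key;
        }

        public static String decode(long key) {
            StringBuilder sb = new StringBuilder();
            while (key > 1) { // unwind until only the sentinel bit remains
                sb.append("ACGT".charAt((int) (key & 3)));
                key >>= 2;
            }
            return sb.reverse().toString();
        }
    }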
-- Now include combinatorial testing for all input parameters: base quality, indel quality, continuation penalty, base identity, and indel length (see the sketch after this list)
-- Disabled by default, as the results coming back are not correct
-- Currently disabled as the likelihood function doesn't pass basic unit tests
-- Also make low-level function in LikelihoodCalculationEngine protected
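A sketch of how such a combinatorial grid might be generated with a TestNG
DataProvider; the specific parameter values here are hypothetical:

    import java.util.ArrayList;
    import java.util.List;
    import org.testng.annotations.DataProvider;

    public class LikelihoodCombinatorialTestSketch {
        // Hypothetical parameter grids; the real tests may use different values.
        private static final byte[] BASE_QUALS = {10, 20, 30, 40};
        private static final byte[] INDEL_QUALS = {30, 40, 50};
        private static final byte[] CONTINUATION_PENALTIES = {10, 20};
        private static final char[] BASES = {'A', 'C', 'G', 'T'};
        private static final int[] INDEL_LENGTHS = {1, 2, 5, 10};

        @DataProvider(name = "likelihoodParameters")
        public Object[][] likelihoodParameters() {
            List<Object[]> tests = new ArrayList<Object[]>();
            for (byte bq : BASE_QUALS)
                for (byte iq : INDEL_QUALS)
                    for (byte cp : CONTINUATION_PENALTIES)
                        for (char base : BASES)
                            for (int len : INDEL_LENGTHS)
                                tests.add(new Object[]{bq, iq, cp, base, len});
            return tests.toArray(new Object[tests.size()][]);
        }
    }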
-- These represent the bulk of the StingExceptions coming from BAMSchedule, and they are caused by simple problems like the user providing bad input tmp directories, etc.