gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	5f5edeca63	Reverting move of BQSR tests to public, as per DR's email	2012-07-19 10:02:05 -04:00
Eric Banks	d46ccec04e	Adding Unit Tests to cover the exception catching for Picard errors: because we are using String matching, we want to ensure that we know if/when the exception text changes underneath us.	2012-07-18 21:48:58 -04:00
Eric Banks	9c1ab1b0c0	Move BQSR integration test and its dependent files into public; previously there was a protected->private dependency.	2012-07-18 21:11:33 -04:00
Eric Banks	33306d2e20	Changing the logic of the -standard argument; the way it stands currently one can never turn off the cycle or context covariates. Now they are on by default and users must opt out of them to turn them off.	2012-07-04 00:21:21 -04:00
Mark DePristo	66337a9899	Moved most testdata from public to private	2012-06-21 15:17:19 -04:00
Mark DePristo	c4e0233ba3	Updating testdata to have proper VCF headers	2012-06-21 15:16:31 -04:00
Mark DePristo	0e81484f0f	More testdata test files	2012-06-14 16:42:35 -04:00
Mark DePristo	6a0e0951af	In upgrade to VCF4 replace 0 with PASS	2012-06-14 16:42:33 -04:00
Mark DePristo	5894d045cb	Bugfixes and code cleanup throughout so BCF2 passes VC -> BCF -> VC tests -- This version of BCF should actually work properly for most files, assuming headers are properly defined. -- Lots of bug fixes to BCF2 codec -- Genotype getPhredScaledQual is now an int, returning -1 if there's no QUAL. NOTE THIS SEMANTICS change -- Equals() method for GenotypeLikelihoods, using PLs. -- VCFCodec no longer adds empty bindings to missing input field values. NOTE THIS CHANGE -- VCs can be marked as fully decoded, so that when fullyDecode() is called it returns itself, instead of doing the decoding work. The BCF2 codec now makes VCs marked as fully decoded -- stringToBytes returns empty list for null or "" string in BCF2Encoder -- Proper handling of genotype ordering in BCF2 reader / writer -- Removed the crazy slow noDups and sameSamples tests that were slowing down unit and integration tests totally unnecessarily -- Many failing MD5s now due to double -> int change in GQ, will update later	2012-05-27 11:17:17 -04:00
Mark DePristo	0a86564669	Updated test files didn't make it into last push	2012-05-24 13:29:44 -04:00
Mark DePristo	57a1ac0888	Fix up bad paths to public/testdata files	2012-05-24 10:59:00 -04:00
Mark DePristo	6ca71fe3b4	GATK tests use public/testdata not /humgen/ as much as possible	2012-05-24 10:58:58 -04:00
Mark DePristo	cad608c07f	Adding as many test data files into public/testdata as possible	2012-05-24 10:58:30 -04:00
Mark DePristo	ac7460ef8c	New complex VCF files for testing	2012-05-24 10:57:05 -04:00
Eric Banks	a26b04ba17	Extensive refactoring of the GATKReports. This was a beast. The practical differences between version 1.0 and this one (v1.1) are: * the underlying data structure now uses arrays instead of hashes, which should drastically reduce the memory overhead required to create large tables. * no more primary keys; you can still create arbitrary IDs to index into rows, but there is no special cased primary key column in the table. * no more dangerous/ugly table operations supported except to increment a cell's value (if an int) or to concatenate 2 tables. Integration tests change because table headers are different. Old classes are still lying around. Will clean those up in a subsequent commit.	2012-05-18 01:11:26 -04:00
Mauricio Carneiro	a19c27297f	continuing the BQSR triage... * fixed the loading of the new reduced size reports * reduced BQSR scala script memory to 2Gb * removed dcov parameter from BQSR scala script * fixed estimatedQReported calculation from -log10(pe) to -10log10(pe). updated md5's with the proper PHRED scaled EstimatedQReported	2012-04-05 14:34:15 -04:00
Mauricio Carneiro	7c3b3650bb	BQSR bug triage * fixed bug where some keys were using the same recal datum objects * fixed quantization qual calculations when combining multiple reports * fixed rounding error with empirical quality reported when combining reports * fixed combine routine in the gatk reports due to the primary keys being out of order * added auto-recalibration option to BQSR scala script * reduced the size of the recalibration report by ~15% * updated md5's	2012-04-05 09:32:18 -04:00
Mark DePristo	1ccea866d8	VariantEval now includes -keepAC0 argument to include sites with alt alleles but AC 0 in analyses -- Updated EvalModules to work with new parameter -- adding test file for keepAC0 to public/testdata and integration tests	2012-04-04 15:37:12 -04:00
Mauricio Carneiro	1b75663178	BQSR Gatherer implementation and integration tests * restructured the hash tables into one class (RecalibrationReport) that has all the functionality for the different tables and key managers * optimized empirical qual calculation when merging recalibration reports * centralized the quality score quantization functionalities * unified the creating/loading of all the key manager/hash table structures. * added unit tests for the gatherer (disabled because gatk report needs to be sorted for automated testing) * added integration tests for BQSR and on-the-fly recalibration	2012-03-27 13:50:22 -05:00
Mauricio Carneiro	9f74969e3a	BQSR with GATKReport implementation * restructured BQSR to report recalibrated tables. * implemented empirical quality calculation to the BQSR stage (instead of on-the-fly recalibration) * linked quality score quantization to the BQSR stage, outputting a quantization histogram * included the arguments used in BQSR to the GATK Report * included all three tables (RG, QUAL and COVARIATES) to the GATK Report with empirical qualities On-the-fly recalibration with GATK Report * loads all tables from the GATKReport using existing infrastructure (with minor updates) * implemented initialization of the covariates using BQSR's argument list * reduced memory usage significantly by loading only the empirical quality and estimated quality reported for each bit set key * applied quality quantization to the base recalibration * excluded low quality bases from on-the-fly recalibration for mismatches, insertions or deletions	2012-03-23 15:42:32 -04:00
Mauricio Carneiro	e4cbeddf2d	adding on-the-fly recalibration test data	2012-03-16 13:18:16 -04:00
Mark DePristo	a63d1f58b6	analyzeRunReports cleanup for new minimal GATKRunReport structure -- No more command lines or working directories -- Added failing and successful gatkrunreports to public/testdata for testing	2012-03-12 09:46:26 -04:00
Mark DePristo	e0c189909f	Added support for breakpoint alleles -- See https://getsatisfaction.com/gsa/topics/support_vcf_4_1_structural_variation_breakend_alleles?utm_content=topic_link&utm_medium=email&utm_source=new_topic -- Added integrationtest to ensure that we can parse and write out breakpoint example	2012-02-23 12:14:48 -05:00
Mauricio Carneiro	ed91461c49	Data Processing Pipeline Test * Added standard pipeline test for the DPP * Added a full BWA pipeline test for the DPP * Included the extra files for the reference needed by BWA (to be used by DPP and PPP tests)	2011-12-12 00:24:51 -05:00
Mauricio Carneiro	cca8a18608	PPP pipeline test * added a pipeline test to the Pacbio Processing Pipeline. * updated exampleBAM with more complete RG information so we can use it in a wider variety of pipeline tests * added exampleDBSNP.vcf file with only chromosome 1 in the range of the exampleFASTA.fasta reference for pipeline tests	2011-12-11 17:32:21 -05:00
Mark DePristo	714cac21c9	Testdata for IntervalStratification	2011-11-10 11:08:34 -05:00
Mark DePristo	a3c5a31686	Oops, forgot the PED test file	2011-10-05 21:09:08 -07:00
Mark DePristo	c6ba944719	Adding bgzip vcf file for unit tests	2011-09-21 15:39:45 -04:00
Mark DePristo	bc902e8421	GZIP VCF for testing	2011-08-19 09:05:39 -04:00
Mark DePristo	449bf1b539	Testdata for diffObjects. PipelineTest updated to point to MD5DB.java	2011-07-18 10:47:03 -04:00
Mark A. DePristo	38740b0ff5	First working version of the DiffNode readers for VCF and BAM files. Unit tests confirm the readers are approximately working. Skeleton of a working DiffObjects walker that will be able to provide detailed information about how exactly two files of the same type differ, so long as the files are supported by the DiffNode structure.	2011-07-04 16:11:42 -04:00
David Roazen	3c9497788e	Reorganized the codebase beneath top-level public and private directories, removing the playground and oneoffprojects directories in the process. Updated build.xml accordingly.	2011-06-28 06:55:19 -04:00

32 Commits (7796ba7601689ea864934853e4509d2abd4dae9a)