9032 Commits (e86ce8f3d66f7cffc9e349e2b207d77ccab9f9c3)

Author SHA1 Message Date
Ryan Poplin e86ce8f3d6 updating HaplotypeCaller integration tests to reflect all the recent changes. 2012-03-15 14:56:35 -04:00
Ryan Poplin 0c6b34e9df Fixing a bug identified by the ActivityProfile unit tests 2012-03-15 14:24:30 -04:00
Ryan Poplin 252b830aa8 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-15 11:56:04 -04:00
Ryan Poplin 0fa5a7af05 Adding contracts and unit tests for HaplotypeCaller GenotypingEngine 2012-03-15 11:55:48 -04:00
Ryan Poplin c1f454fbe6 cleaning up and expanding LikelihoodCalculationEngine unit tests 2012-03-15 08:53:11 -04:00
Mauricio Carneiro c865950923 fixing my typo on the md5. 2012-03-14 22:00:03 -04:00
Ryan Poplin 1212a65140 Adding contracts and unit tests for HaplotypeCaller LikelihoodCalculationEngine 2012-03-14 21:26:01 -04:00
Ryan Poplin 1429ddcf55 Adding contracts and unit tests for HaplotypeCaller LikelihoodCalculationEngine 2012-03-14 21:25:43 -04:00
Mauricio Carneiro c045542442 ReduceReads default downsampling strategy is now NORMAL
The adaptive downsampler behaved undesirably in strange regions of the genome. This is a temporary fix; both downsamplers will be made obsolete when the engine's positional downsampler gets generalized to read walkers.
2012-03-14 17:29:47 -04:00
Mark DePristo 7c5cdb51c2 UnitTests for ActivityProfile and minor ART cleanup
-- TODO for ryan -- there are bugs in ActivityProfile code that I cannot fix right now :-(
-- UnitTesting framework for ActivityProfile -- needs to be expanded
-- Minor helper functions for ActiveRegion to help with unit tests
2012-03-14 17:26:37 -04:00
Mark DePristo e440c9be98 Clean up logic for adding reads to ART cache
-- No longer has duplicate code
2012-03-14 17:26:37 -04:00
Mark DePristo 5bcb5c7433 Preliminary refactoring of ART
-- Refactored ART into clearer, simpler procedures.  Attempted to merge shared code into utility classes.
-- Added some docs
-- Created a new, testable ActivityProfile that represents as a class the probability of a base being active or inactive
-- Separated band-pass filtering from creation of active regions.  Now you can band pass filter a profile to make another profile, and then that is explicitly converted to active regions
-- Misc. utility functions in ActiveRegionWalker such as hasPresetActiveRegions()
-- Many TODOs in ActivityProfile.
2012-03-14 17:26:37 -04:00
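The two-step flow described in the commit above (filter a profile to get another profile, then explicitly convert that profile to regions) can be sketched roughly as follows. This is a minimal illustration and not the GATK code: the function names `smooth` and `to_regions`, the 0.5 threshold, and the plain weighted-average kernel (a stand-in for the band-pass filter the commit mentions) are all assumptions.

```python
def smooth(profile, kernel):
    """Return a NEW profile; each value is a kernel-weighted average of its neighbors.

    Hypothetical stand-in for the band-pass step: a plain smoothing kernel,
    renormalized at the edges where part of the kernel falls off the profile.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(profile)):
        total = 0.0
        weight = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(profile):
                total += w * profile[j]
                weight += w
        out.append(total / weight if weight else 0.0)
    return out


def to_regions(profile, threshold=0.5):
    """Convert a profile into half-open (start, end, is_active) intervals.

    The threshold is an assumed parameter, not taken from the commit.
    """
    regions = []
    start = 0
    for i in range(1, len(profile) + 1):
        at_end = i == len(profile)
        if at_end or (profile[i] >= threshold) != (profile[start] >= threshold):
            regions.append((start, i, profile[start] >= threshold))
            start = i
    return regions


# Filtering yields another profile; conversion to regions is a separate step.
raw = [0.0, 0.0, 1.0, 0.0, 0.0]
filtered = smooth(raw, [1.0, 1.0, 1.0])
regions = to_regions(filtered, threshold=0.3)
```

Keeping the two steps as separate functions mirrors the separation the commit describes and makes each one testable on its own.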
Mark DePristo e73406b9b5 CountReadsInActiveRegions now emits a detailed GATK report
-- This report details which intervals are coming in and how many reads they contain
-- Added integration test to verify that the intervals aren't changing, before heading into the ART refactor
2012-03-14 17:26:37 -04:00
Mark DePristo 86eed6de07 Updating 1000G summary table to use new CNVs list 2012-03-14 17:26:36 -04:00
Ryan Poplin 66411ea1e9 misc minor cleanup 2012-03-14 16:10:25 -04:00
Ryan Poplin 1da8928407 HC GenotypingEngine marginalizes over haplotypes when outputting events that were found on a subset of the called haplotypes. 2012-03-14 15:22:21 -04:00
Guillermo del Angel eca055ccad Add option in ValidationAmplicons to only output SNPs and INDELs, ignoring complex variants (or SVs, etc.) 2012-03-14 14:26:40 -04:00
Mark DePristo 8e96969744 Support for exception-class in analyzeRunReports.py 2012-03-14 12:27:11 -04:00
Mark DePristo 6a40ca6bec Merged bug fix from Stable into Unstable 2012-03-14 12:19:33 -04:00
Mark DePristo bb2c10b785 Capture the class of the exception in GATKRunReport
-- As suggested by David.
2012-03-14 12:16:22 -04:00
Ryan Poplin 78a4e7e45e Major restructuring of HaplotypeCaller's LikelihoodCalculationEngine and GenotypingEngine. We no longer create an ugly event dictionary and genotype events found on haplotypes independently by finding the haplotype with the max likelihood. Lots of code has been rewritten to be much cleaner. 2012-03-14 12:05:05 -04:00
Eric Banks 47e5a80d0f Trivial submission script that's useful to have for next time 2012-03-14 10:17:14 -04:00
Mark DePristo 06340a3c48 Code cleanup now that we have Tableau analysis
-- Stop looking at exceptions in daily digest
-- Remove old code analyzeRunReports that wasn't being maintained
2012-03-14 08:06:31 -04:00
Mark DePristo 841d200688 Always use long format 2012-03-13 17:05:29 -04:00
Eric Banks 77243d0df1 Splitting up the MultiallelicSummary module into the standard part for use by all and the dev piece used just by me 2012-03-13 16:31:51 -04:00
Eric Banks f76da1efd2 Updating md5s because MultiallelicSummary is now standard 2012-03-13 16:31:13 -04:00
Eric Banks ae65d86b81 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-13 16:26:51 -04:00
Eric Banks 568a1362f5 Splitting up the MultiallelicSummary module into the standard part for use by all and the dev piece used just by me 2012-03-13 16:19:15 -04:00
Ryan Poplin 2d5ca8bcfe Adding my AnalyzeCovariates R script for Mauricio to use 2012-03-13 13:05:10 -04:00
Eric Banks 6e18ecfc9a Adding integration test to cover errors from my previous commit (GENOTYPE_GIVEN_ALLELE bugs reported by Sara Pulit and Chris Hartl) 2012-03-13 12:43:40 -04:00
Eric Banks 5d7c761784 Merged bug fix from Stable into Unstable 2012-03-13 11:01:03 -04:00
Eric Banks 5200f7f919 When creating a synthetic VC based on the passed in alleles, set the reference base for indel. 2012-03-13 10:59:58 -04:00
Eric Banks 1675bd4dd7 When creating a synthetic VC based on the passed in alleles, set the length correctly. 2012-03-13 10:55:52 -04:00
Eric Banks a514257318 Output to stdout, not a file 2012-03-13 10:46:12 -04:00
Eric Banks ed69f4ff7c Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-13 09:28:16 -04:00
Eric Banks 9b9856ead5 quick todo for next time we make a bundle 2012-03-13 09:28:11 -04:00
Mark DePristo 6dd341ea22 Generalized runtime simplifying routines in analyzeRunReports
-- Handles 2012/3/12 as well as 2012-3-12
-- UnitTests to test it's working correctly
2012-03-13 08:06:11 -04:00
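As an illustration of accepting both date spellings named in the commit above, a parser might look like the sketch below. It is not the code from analyzeRunReports.py; the function name `parse_run_date` and the choice to reject mixed separators are assumptions.

```python
import re
from datetime import date

# Year, one separator ('/' or '-'), month, the SAME separator again, day.
_DATE_RE = re.compile(r"(\d{4})([/-])(\d{1,2})\2(\d{1,2})")


def parse_run_date(text):
    """Parse 'YYYY/M/D' or 'YYYY-M-D' into a datetime.date (hypothetical helper)."""
    match = _DATE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized date: {text!r}")
    year, _sep, month, day = match.groups()
    return date(int(year), int(month), int(day))
```

Both `parse_run_date("2012/3/12")` and `parse_run_date("2012-3-12")` yield the same `date` object, so downstream code only has to handle one representation.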
David Roazen 5d6a686474 Restoring key-related unit/integration tests
The recent GATKReport commit accidentally clobbered a few tests -- this
restores them.
2012-03-13 00:58:24 -04:00
Eric Banks 42ace8bf7b Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-12 23:20:14 -04:00
Eric Banks 6e9b8559d8 Unfortunately need to bump up memory needed for liftover to get Omni file sorted 2012-03-12 23:20:00 -04:00
Roger Zurawicki 7887a06703 GATKReport v1.0
GATKReport format changes:

 - All non-data header lines are preceded by a single pound sign (#:)
 - Every report now has a report header containing the version number and number of tables
 - Every table has two lines of table header: The first explains the size of the table and the data types of each column, the second contains the table name and description.
 - This new format will allow reports in the future to be gatherable.
 - Changed the header format to include an end-of-line string ":;"

Added features:

 - Simplified GATK Reports:

	The constructor for a simplified GATK Report. Simplified GATK Reports are designed for reports that do not need the advanced functionality of a full GATK Report.

	A simple GATK Report consists of:
		- A single table
		- No primary key (it is hidden)
	    Optional:
		- Only untyped columns. As long as the data is an Object, it will be accepted.
		- Default column values being empty strings.
	Limitations:
		- A simple GATK report cannot contain multiple tables.
		- It cannot contain typed columns, which prevents arithmetic gathering.

       - Added a constructor to generate simplified GATK reports.
       - Added a method to easily add data to simple GATK reports.

 - Upgraded the input parser to take advantage of the new file format (v1).
 - Added the GATKReportGatherer; more usability is coming in the next version of GATK Report. Currently, it can only add rows from one table to another. Added private methods in GATKReport to combine Tables and Reports. It is very conservative and will only gather if the table columns, as well as everything else, match. At the column level, it uses the (redundant) row ids to add new rows. It will throw an exception if it is overwriting data.
 - Made some GATKReport methods public, and added more setters and getters.
 - Added method that compares formats of two GATKReports, and added an equals method to verify all data inside.
 - The gsalib for R now supports reading GATKReport v1 files in addition to legacy formats (v0.*)
 - Added a GATKReportDataType enum to give a column a certain data type. This must be specified when making a gatherable report. This enum contains several methods including a reverse lookup map.
 - Added a data type field in GATKColumn; when a type is not specified, the unknown type is used. Unknown types should not be gathered.

Test changes:

 - Updated Unit Tests for GATK Report v1. Added a test for the gatherer. Left one test disabled while we transition from v0 to v1.
 - Updated the MD5 hashes in integration tests throughout the GATK.

Other changes:

 - Added the gatherer functions to CoverageByRG
 - Also added the scatterCount parameter in the Interval Coverage script
 - Dropped support for reading in legacy GATKReport formats (v0.*)
 - Updated VariantEvalWalker to work with GATK Report v1, added a format String to all applicable DataPoints.
 - Rewrote the read file method for GATK report files.
 - Optimized the equals methods within GATKReport. The protected functions should only be called by the GATKReport methods.

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-03-12 23:09:19 -04:00
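The conservative gathering behavior described in the commit above (gather only when the columns match, add rows by row id, raise rather than overwrite) could be sketched along these lines. This is not the GATKReportGatherer implementation, which is Java; the dict-based table layout and the name `gather_rows` are assumptions made for illustration.

```python
def gather_rows(target, source):
    """Add the rows of `source` to `target`, refusing anything ambiguous.

    Hypothetical table layout: {"columns": [names...], "rows": {row_id: [values...]}}.
    """
    if target["columns"] != source["columns"]:
        raise ValueError("column mismatch; refusing to gather")
    # Check for clashes before mutating, so a failed gather leaves target untouched.
    clashes = target["rows"].keys() & source["rows"].keys()
    if clashes:
        raise ValueError(f"rows would be overwritten: {sorted(clashes)}")
    for row_id, values in source["rows"].items():
        target["rows"][row_id] = list(values)
    return target


part1 = {"columns": ["readGroup", "coverage"], "rows": {1: ["rgA", 10]}}
part2 = {"columns": ["readGroup", "coverage"], "rows": {2: ["rgB", 20]}}
merged = gather_rows(part1, part2)
```

Checking for row-id clashes up front is a design choice of this sketch: it guarantees that an exception never leaves the target table half-updated.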
Eric Banks 10995d349e Fix old error message 2012-03-12 22:56:08 -04:00
Eric Banks 2314787767 Generalizing to avoid JDK 1.7 incompatibilities 2012-03-12 22:50:59 -04:00
Eric Banks 359090c4b7 Updating dbsnp to v135 2012-03-12 13:17:58 -04:00
Eric Banks 7e9a535c4d Updated the bundle to use the official filtered (final) indel calls 2012-03-12 12:12:24 -04:00
Eric Banks 05ef5863cf Don't assume files have .bai and .bas associated with them 2012-03-12 11:47:48 -04:00
Mark DePristo 6bc92d2bbf Bugfix for analyzeRunReports.py: now handles case where hostname is missing 2012-03-12 09:46:29 -04:00
Mark DePristo a63d1f58b6 analyzeRunReports cleanup for new minimal GATKRunReport structure
-- No more command lines or working directories
-- Added failing and successful gatkrunreports to public/testdata for testing
2012-03-12 09:46:26 -04:00
Ryan Poplin 03223029e3 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-12 09:42:37 -04:00
Eric Banks 04cafffaa7 Merge remote-tracking branch 'unstable/master' 2012-03-12 08:43:43 -04:00