- Updated the documentation on the code
- Made the table.write() method private and updated necessary files.
- Added a constructor to GATKReport that takes GATKReportTables
- Optimized my code
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
This is important for quick turnaround in the analysis cycle of the new covariates. Also added a dummy unit test that doesn't really test anything (disabled), but helps in debugging.
Pulled out the functionality from Indel Realigner and Table Recalibrator into Utils.setupWriter to make everyone else's lives easier if they want to include the PG tag in their walkers.
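For illustration only, here is a rough sketch of what such a helper could look like using the samtools API of that era; the class name, signature, and parameter list are my guesses, not the actual Utils.setupWriter code:

import net.sf.samtools.*;
import java.io.File;

public final class WriterSetupSketch {
    // Hypothetical helper: builds a SAM/BAM writer whose header carries a @PG record for the walker.
    public static SAMFileWriter setupWriter(File output, SAMFileHeader header, boolean presorted,
                                            String programId, String version, String commandLine) {
        SAMProgramRecord pg = new SAMProgramRecord(programId); // @PG ID
        pg.setProgramVersion(version);                         // VN tag
        pg.setCommandLine(commandLine);                        // CL tag
        header.addProgramRecord(pg);
        return new SAMFileWriterFactory().makeSAMOrBAMWriter(header, presorted, output);
    }
}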
Infrastructure:
* Added static interface to all different clipping algorithms of low quality tail clipping
* Added reverse direction pileup element event lookup (indels) to the PileupElement and LocusIteratorByState
* Complete refactor of the KeyManager. Much cleaner implementation that handles keys with no optional covariates (necessary for on-the-fly recalibration)
* EventType is now an independent enum with added capabilities. All functionality is now centralized.
BQSR and RecalibrateBases:
* On-the-fly recalibration is now generic and uses the same bit set structure as BQSR for a reduced memory footprint
* Refactored the object creation to take advantage of the compact key structure
* Replaced nested hash maps with single hash maps indexed by bitsets (see the sketch after this list)
* Eliminated low quality tails from the context covariate (using ReadClipper's write N's algorithm).
* Excluded contexts with N's from the output file.
* Fixed cycle covariate for discrete platforms (need to check flow cycle platforms now!)
* Redefined error for indels to look at the previous base in negative strand reads (using new PE functionality)
* Added the covariate ID (for optional covariates) to the output for disambiguation purposes
* Refactored CovariateKeySet -- eventType functionality is now handled by the EventType enum.
* Reduced memory usage of the BQSR script to 4
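To illustrate the single-map idea from the list above (the bit widths, field order, and datum type here are placeholders, not the actual BQSR layout): each covariate contributes a small integer, the integers are packed into one long with shifts, and that long keys a single HashMap instead of nested maps.

import java.util.HashMap;
import java.util.Map;

public class PackedKeyExample {
    // Illustrative widths: 8 bits of quality, 16 bits of context, 2 bits of event type.
    static long pack(int qual, int context, int eventType) {
        return ((long) qual << 18) | ((long) context << 2) | eventType;
    }

    public static void main(String[] args) {
        Map<Long, Integer> counts = new HashMap<Long, Integer>(); // one flat map
        long key = pack(30, 12345, 0);   // replaces counts.get(qual).get(context).get(event)
        Integer prev = counts.get(key);
        counts.put(key, prev == null ? 1 : prev + 1);
        System.out.println(counts.get(key)); // 1
    }
}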
Tests:
* Refactored BQSRKeyManagerUnitTest to handle the new implementation of the key manager
* Added tests for keys without optional covariates
* Added tests for on-the-fly recalibration (but more tests are necessary)
Infrastructure:
* Generic BitSet implementation with any precision (up to long)
* Two's complement implementation of the bit set handles negative numbers (cycle covariate)
* Memoized implementation of the BitSet utils for better performance.
* All exponents are now calculated with bit shifts, fixing numerical precision issues with the double Math.pow.
* Replaced log/sqrt with bitwise logic to get rid of numerical issues
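A hedged sketch of the two tricks above (the 8-bit cycle field is an assumption for illustration, not the real covariate width):

public class BitTricksExample {
    // Exact power of two via a shift, avoiding Math.pow's double rounding.
    static long pow2(int n) {
        return 1L << n;
    }

    // Store a possibly negative cycle value in an 8-bit two's complement field.
    static long encodeCycle(int cycle) {
        return cycle & 0xFFL;   // keep the low 8 bits
    }

    static int decodeCycle(long bits) {
        return (byte) bits;     // sign-extend the 8-bit field back to an int
    }

    public static void main(String[] args) {
        System.out.println(pow2(10));                      // 1024
        System.out.println(decodeCycle(encodeCycle(-3)));  // -3
    }
}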
BQSR:
* All covariates output BitSets and have the functionality to decode them back into Object values.
* Covariates are responsible for determining the size of the key they will use (number of bits).
* Generalized KeyManager implementation combines any arbitrary number of covariates into one bitset key with event type
* No more NestedHashMaps. The single key system now fits in one hash map, reducing hash table object overhead
Tests:
* Unit tests added to every method of BitSetUtils
* Unit tests added to the generalized key system infrastructure of BQSRv2 (KeyManager)
* Unit tests added to the cycle and context covariates (will add unit tests to all covariates)
-- TODO for ryan -- there are bugs in ActivityProfile code that I cannot fix right now :-(
-- UnitTesting framework for ActivityProfile -- needs to be expanded
-- Minor helper functions for ActiveRegion to help with unit tests
-- Refactored ART into clearer, simpler procedures. Attempted to merge shared code into utility classes.
-- Added some docs
-- Created a new, testable ActivityProfile that represents as a class the probability of a base being active or inactive
-- Separated band-pass filtering from creation of active regions. Now you can band pass filter a profile to make another profile, and then that is explicitly converted to active regions
-- Misc. utility functions in ActiveRegionWalker such as hasPresetActiveRegions()
-- Many TODOs in ActivityProfile.
-- This report details which intervals are coming in and how many reads they contain
-- Added integration test to verify that the intervals aren't changing, before heading into the ART refactor
GATKReport format changes:
- All non-data header lines are preceded by a single pound sign and colon (#:)
- Every report now has a report header containing the version number and number of tables
- Every table has two lines of table header: the first explains the size of the table and the data types of each column; the second contains the table name and description.
- This new format will allow reports in the future to be gatherable.
- Changed the header format to include an end-of-line string ":;"
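For concreteness, a purely illustrative mock-up of how these pieces could fit together; the tokens, counts, and format strings below are made up for this example, not the spec:

#:GATKReport.v1.0:1:;
#:GATKTable:2:2:%s:%d:;
#:GATKTable:ExampleTable:An illustrative two-column table:;
Sample    Reads
NA12878   100
NA12891   250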
Added features:
- Simplified GATK Reports:
The constructor for a simplified GATK Report. Simplified GATK Reports are designed for reports that do not need the advanced functionality of a full GATK Report.
A simple GATK Report consists of:
- A single table
- No primary key (it is hidden)
Optional:
- Only untyped columns. As long as the data is an Object, it will be accepted.
- Default column values being empty strings.
Limitations:
- A simple GATK report cannot contain multiple tables.
- It cannot contain typed columns, which prevents arithmetic gathering.
- Added a constructor to generate simplified GATK reports.
- Added a method to easily add data to simple GATK reports.
- Upgraded the input parser to take advantage of the new file format (v1).
- Added the GATKReportGatherer; more usability coming in the next version of GATK Report. Currently, it can only add rows from one table to another. Added private methods in GATKReport to combine Tables and Reports. It is very conservative and will only gather if the table columns, as well as everything else, match. At the column level, it uses the (redundant) row ids to add new rows. It will throw an exception if it would overwrite data.
- Made some GATKReport methods public, and added more setters and getters.
- Added method that compares formats of two GATKReports, and added an equals method to verify all data inside.
- The gsalib for R now supports reading GATKReport v1 files in addition to legacy formats (v0.*)
- Added a GATKReportDataType enum to give each column a data type. This must be specified when making a gatherable report. This enum contains several methods, including a reverse lookup map.
- Added a data type field to GATKColumn; when a type is not specified, the unknown type is used. Unknown types should not be gathered.
Test changes:
- Updated Unit Tests for GATK Report v1. Added a test for the gatherer. Left one test disabled while we transition from v0 to v1.
- Updated the MD5 hashes in integration tests throughout the GATK.
Other changes:
- Added the gatherer functions to CoverageByRG
- Also added the scatterCount parameter in the Interval Coverage script
- Dropped support for reading in legacy GATKReport formats (v0.*)
- Updated VariantEvalWalker to work with GATK Report v1, added a format String to all applicable DataPoints.
- Rewrote the read file method for GATK report files.
- Optimized the equals methods within GATKReport. The protected functions should only be called by the GATKReport methods.
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
Now looks like:
<GATK-run-report>
<id>D7D31ULwTSxlAwnEOSmW6Z4PawXwMxEz</id>
<start-time>2012/03/10 20.21.19</start-time>
<end-time>2012/03/10 20.21.19</end-time>
<run-time>0</run-time>
<walker-name>CountReads</walker-name>
<svn-version>1.4-483-g63ecdb2</svn-version>
<total-memory>85000192</total-memory>
<max-memory>129957888</max-memory>
<user-name>depristo</user-name>
<host-name>10.0.1.10</host-name>
<java>Apple Inc.-1.6.0_26</java>
<machine>Mac OS X-x86_64</machine>
<iterations>105</iterations>
</GATK-run-report>
No longer capturing command line or directory information, to minimize people's concerns with phone home and privacy
-- Refactored some duplicated code (FYI, code duplication = root of all evil) into shared functions
-- Added long-missing integration tests
-- CHRIS/RYAN -- it would be very good to add an integration test covering external VCF files as I believe we rely on this functionality and it's not tested at all
This is a quick-and-dirty patch for the null pointer error Mauricio reported earlier.
Later on we might want to address in a more general way the fact that we validate user intervals
against the reference but not against the merged BAM header produced by the engine at runtime.
This fix is similar, but distinct from the earlier fix to GATKBAMIndex. If we fail to read in
a complete 3-integer bin header from the BAM schedule file that the engine has written, throw a
ReviewedStingException (since this is our problem, not the user's) rather than allowing a
cryptic buffer underflow error to occur.
Note that this change does not fix the underlying problem in the engine, if there is one
(there may be an as-yet-undetected bug in the code that writes the bam schedule). It will
just make it easier for us to identify what's going wrong in the future.
GATKBAMIndex would allow an extremely confusing BufferUnderflowException to be
thrown when a BAM index file was truncated or corrupt. Now, a UserException is
thrown in this situation instructing the user to re-index the BAM.
Added a unit test for this case as well.
-- A cleaner table output (molten). For those interested in seeing how this can be done with GATKReports, look here for a nice clean example
-- Integration tests
-- Minor improvements to GATKReportTable with methods to getPrimaryKeys
-- We weren't properly handling the case where a site had both a SNP and an indel in both eval and comp. These would naturally pair off as SNP x SNP and INDEL x INDEL in eval, but we'd still invoke update2 with (null, SNP) and (null, INDEL), showing up most conspicuously as incorrect false negatives in the validation report.
-- Updating misc. integration tests, as the counting of comps (in particular for dbSNP) was inflated because of this effect.
Several of the unit tests for the new key authorization feature require
read access to the GATK master private key file. Since this file is only
readable by members of the group gsagit, this makes it hard for people
outside the group to run the test suite.
Now, we skip tests that require the master private key if the private
key exists (since not existing would be a true error) but is not readable
by the user running the test suite.
Bamboo, of course, will always be able to run these tests.
-Running the GATK with the -et NO_ET or -et STDOUT options now
requires a key issued by us. Our reasons for doing this, and the
procedure for our users to request keys, are documented here:
http://www.broadinstitute.org/gsa/wiki/index.php/Phone_home
-A GATK user key is an email address plus a cryptographic signature
signed using our private key, all wrapped in a GZIP container.
User keys are validated using the public key we now distribute with
the GATK (see the verification sketch after this list). Our private key is kept in a secure location.
-Keys are cryptographically secure in that valid keys definitely
came from us and keys cannot be fabricated; however, keys are not
"copy-protected" in any way.
-Includes private, standalone utilities to create a new GATK user key
(GenerateGATKUserKey) and to create a new master public/private key
pair (GenerateKeyPair). Usage of these tools will be documented on
the internal wiki shortly.
-Comprehensive unit/integration tests, including tests to ensure the
continued integrity of the GATK master public/private key pair.
-Generation of new user keys and the new unit/integration tests both
require access to the GATK private key, which can only be read by
members of the group "gsagit".
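To make the key layout described earlier in this list concrete, here is a hedged sketch of the verification step; the byte layout inside the GZIP container, the signature algorithm, and all names below are assumptions for illustration, not the actual GATK key format:

import java.io.*;
import java.security.PublicKey;
import java.security.Signature;
import java.util.zip.GZIPInputStream;

public class KeyCheckSketch {
    // Assumed layout: gzip( emailLength:int, emailBytes, signatureBytes ).
    static boolean verify(File keyFile, PublicKey gatkPublicKey) throws Exception {
        DataInputStream in = new DataInputStream(new GZIPInputStream(new FileInputStream(keyFile)));
        try {
            byte[] email = new byte[in.readInt()];
            in.readFully(email);
            ByteArrayOutputStream sig = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1)
                sig.write(b);
            Signature verifier = Signature.getInstance("SHA1withRSA"); // algorithm is an assumption
            verifier.initVerify(gatkPublicKey);
            verifier.update(email);
            return verifier.verify(sig.toByteArray());
        } finally {
            in.close();
        }
    }
}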
-- Includes paired end status (T/F)
-- Includes count of reads used in calculation
-- Includes simple read type (2x76 for example)
-- Better handling of insert size and read length when there's no data or the data isn't paired end, by emitting NA instead of 0
-- ReadGroupProperties: Emits a GATKReport containing read group, sample, library, platform, center, median insert size and median read length for each read group in every BAM file.
-- Median tool that collects up to a given maximum number of elements and returns the median of the elements (see the sketch after this list)
-- Unit and integration tests for everything.
-- Making name of TestProvider protected so subclasses can override name more easily
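A minimal sketch of the Median helper mentioned above, under the stated cap on collected elements; the class shape and method names are guesses, not the actual GATK utility:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Median<T extends Comparable<T>> {
    private final int maxElements;                     // stop collecting after this many values
    private final List<T> values = new ArrayList<T>();

    public Median(int maxElements) {
        this.maxElements = maxElements;
    }

    // Returns true while the value is stored, false once the cap is reached.
    public boolean add(T value) {
        if (values.size() >= maxElements)
            return false;
        return values.add(value);
    }

    // Median of the collected values (lower of the two middle elements for even sizes).
    public T median() {
        if (values.isEmpty())
            throw new IllegalStateException("no values collected");
        Collections.sort(values);
        return values.get((values.size() - 1) / 2);
    }
}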
* All contexts with 'N' bases are now collapsed as uninformative
* Context size is now represented internally as a BitSet but output as a dna string
* Temporarily disabled sorted outputs because of null objects
* Turns DNA sequences (for context covariates) into bit sets for maximum compression
* Allows variable context size representation guaranteeing uniqueness.
* Works with long precision, so it is limited to a context size of 31 bases (can be extended with BigNumber precision if necessary).
* Unit Tests added
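A hedged sketch of the 2-bits-per-base idea behind the list above (my own illustration, not the GATK ContextCovariate code); the explicit length prefix used here costs a couple of bases relative to the 31-base limit mentioned above:

public class ContextEncodingExample {
    // Packs a DNA context (no Ns) into a long, 2 bits per base, with a length prefix
    // so that e.g. "A" and "AA" map to different keys.
    static long encode(String context) {
        long key = context.length();   // length prefix keeps keys unique across sizes
        for (int i = 0; i < context.length(); i++) {
            key <<= 2;
            switch (context.charAt(i)) {
                case 'A': key |= 0; break;
                case 'C': key |= 1; break;
                case 'G': key |= 2; break;
                case 'T': key |= 3; break;
                default: return -1;    // contexts containing N are dropped
            }
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println(encode("ACGT"));  // distinct from encode("CGT") and encode("ACG")
    }
}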
-- As these represent the bulk of the StingExceptions coming from BAMSchedule and are caused by simple problems like the user providing bad input tmp directories, etc.
-- DoC now by default ignores bases with reference Ns, so these are not included in the coverage calculations at any stage.
-- Added option --includeRefNSites that will include them in the calculation
-- Added integration tests that ensures the per base tables (and so all subsequent calculations) work with and without reference N bases included
-- Reorganized command line options, tagging advanced options with @Advanced
* The tailSet generated every time we flush the reads stash is still being affected by subsequent clears because it is just a view backed by the original TreeSet. This is dangerous, and there is a weird condition where the clear will affect it.
* Fixed by creating a new set from the tailSet instead of trying to do magic with just the view.
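For reference, a tiny demonstration of the underlying java.util behavior (plain JDK semantics, not GATK code): tailSet returns a live view backed by the parent set, so clearing the parent empties the view too, while copying into a new TreeSet detaches it.

import java.util.SortedSet;
import java.util.TreeSet;

public class TailSetExample {
    public static void main(String[] args) {
        TreeSet<Integer> stash = new TreeSet<Integer>();
        stash.add(1); stash.add(5); stash.add(9);

        SortedSet<Integer> tail = stash.tailSet(5);                      // live view backed by stash
        TreeSet<Integer> copy = new TreeSet<Integer>(stash.tailSet(5));  // independent copy (the fix)

        stash.clear();
        System.out.println(tail.size());  // 0 -- clearing the stash emptied the view too
        System.out.println(copy.size());  // 2 -- the copy is unaffected
    }
}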
When aggregating raw BAM file spans into shards, the IntervalSharder tries to combine
file spans when it can. Unfortunately, the method that combines two BAM file
spans was seriously flawed, and would produce a truncated union if the file spans
overlapped in certain ways. This could cause entire regions of the BAM file containing
reads within the requested intervals to be dropped.
Modified GATKBAMFileSpan.union() to correct this problem, and added unit tests
to verify that the correct union is produced regardless of how the file spans
happen to overlap.
Thanks to Khalid, who did at least as much work on this bug as I did.
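For illustration, a generic sketch of the kind of union the fix needs (an illustrative Chunk type with virtual-file start/stop offsets, not the actual GATKBAMFileSpan code): sort both spans' chunks together and merge anything that overlaps or abuts, rather than assuming one span ends before the other begins.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SpanUnionExample {
    static class Chunk {
        final long start, stop;   // virtual file offsets, start inclusive, stop exclusive
        Chunk(long start, long stop) { this.start = start; this.stop = stop; }
    }

    // Union of two chunk lists that tolerates arbitrary overlap between them.
    static List<Chunk> union(List<Chunk> a, List<Chunk> b) {
        List<Chunk> all = new ArrayList<Chunk>(a);
        all.addAll(b);
        Collections.sort(all, new Comparator<Chunk>() {
            public int compare(Chunk x, Chunk y) { return Long.compare(x.start, y.start); }
        });
        List<Chunk> merged = new ArrayList<Chunk>();
        for (Chunk c : all) {
            Chunk last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && c.start <= last.stop)   // overlapping or adjacent chunks collapse
                merged.set(merged.size() - 1, new Chunk(last.start, Math.max(last.stop, c.stop)));
            else
                merged.add(c);
        }
        return merged;
    }
}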
so Ryan can work on the recalibration on the fly without breaking the build. Supposedly all the secret sauce is in the BQSR walker, which sits in private.
* added support to base before deletion in the pileup
* refactored covariates to operate on mismatches, insertions and deletions at the same time
* all code is in private so original BQSR is still working as usual in public
* outputs a molten CSV with mismatches, insertions and deletions, time to play!
* barely tested, passes my very simple tests... haven't tested edge cases.
Premature push on my part. Roger is still working on the new format and we need to update the other tools to operate correctly with the new GATKReport.
This reverts commit aea0de314220810c2666055dc75f04f9010436ad.
- Added the GATKReportGatherer
- Added private methods in GATKReport to combine Tables and Reports
- It is very conservative and will only gather if the table columns match.
- At the column level it uses the (redundant) row ids to add new rows. It will throw an exception if it is overwriting data.
Added the gatherer functions to CoverageByRG
Also added the scatterCount parameter in the Interval Coverage script
Made some more GATKReport methods public
The UnitTest included shows that the merging methods work
Added a getter for the PrimaryKeyName
Fixed bugs that prevented the gatherer from working
Working GATKReportGatherer
Has only the functionality to add lines
The input file parser assumes that the first column is the primary key
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
* Adding the context covariate standard in both modes (including old CountCovariates) with parameters
* Updating all covariates and modules to use GATKSAMRecord throughout the code.
* BQSR now processes indels in the pileup (but doesn't do anything with them yet)
* calculates and interprets the coverage of a given interval track
* allows expanding intervals by a specified number of bases
* classifies targets as CALLABLE, LOW_COVERAGE, EXCESSIVE_COVERAGE and POOR_QUALITY.
* outputs text file for now (testing purposes only), soon to be VCF.
* filters are overly aggressive for now.
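The classification step could look roughly like this; the thresholds below are placeholders for illustration, not the walker's real defaults:

public class TargetClassifierSketch {
    enum TargetStatus { CALLABLE, LOW_COVERAGE, EXCESSIVE_COVERAGE, POOR_QUALITY }

    // Placeholder thresholds -- the real walker's defaults may differ.
    static final int MIN_COVERAGE = 20;
    static final int MAX_COVERAGE = 1000;
    static final double MIN_FRACTION_HIGH_QUAL_BASES = 0.5;

    static TargetStatus classify(int meanCoverage, double fractionHighQualBases) {
        if (fractionHighQualBases < MIN_FRACTION_HIGH_QUAL_BASES)
            return TargetStatus.POOR_QUALITY;
        if (meanCoverage < MIN_COVERAGE)
            return TargetStatus.LOW_COVERAGE;
        if (meanCoverage > MAX_COVERAGE)
            return TargetStatus.EXCESSIVE_COVERAGE;
        return TargetStatus.CALLABLE;
    }
}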