Commit Graph

294 Commits (69b19047ba1e40a1bd4acdbf11e0a42394de48bb)

Author SHA1 Message Date
Mark DePristo 4055877708 Prints 0.0 TiTv not NaN when there are no variants
-- Updated md5
2011-12-07 12:07:54 -05:00
Matt Hanna f5b977fc88 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-06 10:11:35 -05:00
Matt Hanna 4001c22a11 Better file count / buffering variation in test suite. Parameterized read shard buffering. Misc cleanup. 2011-12-06 10:10:38 -05:00
Khalid Shakir 677bea0abd Right aligning GATKReport numeric columns and updated MD5s in tests.
PreQC parses file with spaces in sample names by using tabs only.
PostQC allows passing the file names for the evals so that flanks can be evaled.
BaseTest's network temp dir now adds the user name to the path so files aren't created in the root.
HybridSelectionPipeline:
- Updated to latest versions of reference data.
- Refactored Picard parsing code replacing YAML.
2011-12-05 23:22:15 -05:00
Ryan Poplin 110298322c Adding Transmission Disequilibrium Test annotation to VariantAnnotator and integration test to test it. 2011-11-29 09:29:18 -05:00
Mark DePristo e60272975a Fix for changed MD5 in streaming VCF test 2011-11-23 19:01:33 -05:00
Mark DePristo 12f09d88f9 Removing references to SimpleMetricsByAC 2011-11-23 16:08:18 -05:00
Mark DePristo 4107636144 VariantEval updates
-- Performance optimizations
-- Tables now are cleanly formatted (floats are %.2f printed)
-- VariantSummary is a standard report now
-- Removed CompEvalGenotypes (it didn't do anything)
-- Deleted unused classes in GenotypeConcordance
-- Updates integration tests as appropriate
2011-11-23 13:02:07 -05:00
Mark DePristo 29ca24694a UG now encoding NO_CALLs as ./. not ./.:.:4:0,0,0
A few updated UG integration tests
2011-11-22 08:22:32 -05:00
Mark DePristo 2b51c01df4 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-21 19:16:06 -05:00
Mark DePristo 5443d3634a Again, fixing the add call when we really mean replace
-- Updating MD5s for UG to reflect that what was previously called ./.:.:10:0,0,0 is now just ./.  Eric will fix long-standing bug in QD observed from this change
-- VFW MD5s restored to their old correct values.  There was a bug in my implementation that caused the genotypes to not be parsed from the lazy output even though the header was incorrect.
2011-11-21 19:15:56 -05:00
Mark DePristo 2c501364b8 GenotypesContext no longer has immutability in constructor
-- additional bug fixes throughout VariantContext and GenotypesContext objects
2011-11-21 14:34:31 -05:00
David Roazen 1296dd41be Removing the legacy -L "interval1;interval2" syntax
This syntax predates the ability to have multiple -L arguments, is
inconsistent with the syntax of all other GATK arguments, requires
quoting to avoid interpretation by the shell, and was causing
problems in Queue.

A UserException is now thrown if someone tries to use this syntax.
2011-11-21 13:18:53 -05:00
Mark DePristo 9cb3fe3a59 Vastly better way of doing on-demand genotype loading
-- With our GenotypesContext class we can naturally create a LazyGenotypesContext subclass that does the on-demand loading.
-- This new class has replaced all of the old, complex functionality
-- Better still, there were many cases where the genotypes were being loaded unnecessarily, resulting in inefficiency.  This was detected because some of the integration tests changed as the genotypes were no longer being parsed unnecessarily
-- Misc. bug fixes throughout the system
-- Bug fixes for PhaseByTransmission with new GenotypesContext
2011-11-20 08:23:09 -05:00
Mark DePristo 7d09c0064b Bug fixes and code cleanup throughout
-- chromosomeCounts now takes builder as well, cleaning up a lot of code throughout the codebase.
2011-11-19 18:40:15 -05:00
Mark DePristo b7b57ef39a Updating MD5 to reflect canonical ordering of calculation
-- We should no longer have md5s changing because of hashmaps changing their sort order on us
-- Added GenotypeLikelihoodsUnitTests
-- Refactored ExactAFCalculation to put the PL -> QUAL calculation in the GenotypeLikelihoods class to avoid the code copy.
2011-11-19 15:57:33 -05:00
Mark DePristo 73119c8e3c Merge with master
-- A few bug fixes
2011-11-19 09:56:06 -05:00
Mark DePristo f685fff79b Killing the final versions of old new VariantContext interface 2011-11-18 21:32:43 -05:00
Matt Hanna 8bb4d4dca3 First pass of the asynchronous block loader.
Block loads are only triggered on queue empty at this point.  Disabled by
default (enable with nt:io=?).
2011-11-18 15:02:59 -05:00
Mark DePristo 7490dbb6eb First version of VariantContextBuilder 2011-11-18 11:06:15 -05:00
Eric Banks 16a021992b Updated header description for the INFO and FORMAT DP fields to be more accurate. 2011-11-17 13:17:53 -05:00
Mark DePristo e56d52006a Continuing bugfixes to get new VC working 2011-11-16 10:39:17 -05:00
Eric Banks c2ebe58712 Merge remote-tracking branch 'Laurent/master' 2011-11-16 09:34:47 -05:00
David Roazen 0d163e3f52 SnpEff 2.0.4 support
-Modified the SnpEff parser to work with the SnpEff 2.0.4 VCF output format
-Assigning functional classes and effect impacts now handled directly
 by SnpEff rather than the GATK
-Removed support for SnpEff 2.0.2, as we no longer trust the output of that
 version since it doesn't exclude effects associated with certain nonsensical
 transcripts. These effects are excluded as of 2.0.4.
-Updated unit and integration tests

This support is based on a *release-candidate* of SnpEff 2.0.4, and so is subject
to change between now and the next GATK release.
2011-11-15 18:36:22 -05:00
Mark DePristo df415da4ab More bug fixes on the way to passing all tests 2011-11-15 17:38:12 -05:00
Laurent Francioli fb685f88ec Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-15 16:23:53 -05:00
Mark DePristo 460a51f473 ID field now stored in the VariantContext itself, not the attributes 2011-11-15 14:56:33 -05:00
Mark DePristo 4ff8225d78 GenotypeMap -> GenotypeCollection part 3
-- Test code actually builds
2011-11-14 17:51:41 -05:00
Mark DePristo f0234ab67f GenotypeMap -> GenotypeCollection part 2
-- Code actually builds
2011-11-14 17:42:55 -05:00
Mark DePristo 2e9d5363e7 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-14 15:32:06 -05:00
Eric Banks 7b2a7cfbe7 Transfer headers from the resource VCF when possible when using expressions. While there, VA was modified so that it didn't assume that the ID field was present in the VC's info map in preparation for Mark's upcoming changes. 2011-11-14 14:31:27 -05:00
Mark DePristo 79987d685c GenotypeMap contains a Map, not extends it
-- On path to replacing it with GenotypeCollection
2011-11-14 12:55:03 -05:00
Laurent Francioli 1347beef40 Merge branch 'PhaseByTransmission' 2011-11-14 11:31:28 +01:00
Laurent Francioli 6881d4800c Added Integration tests for Phasing by Transmission 2011-11-14 10:47:51 +01:00
Mark DePristo fee9b367e4 VariantContext genotypes are now stored as GenotypeMap objects
-- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subset to samples
-- All instances in the GATK that used Map<String, Genotype> now use GenotypeMap type.
-- Amazingly, there were many places where HashMap<String, Genotype> was used, so that the order of the genotypes was technically undefined and could be dangerous.  Now everything uses GenotypeMap with a specific ordering of samples (by name)
-- Integration tests updated and all pass
2011-11-11 15:00:35 -05:00
Mark DePristo 153e52ffed VariantEvalIntegrationTest for IntervalStratification 2011-11-10 14:10:39 -05:00
Ryan Poplin 94dc447a70 Merged bug fix from Stable into Unstable 2011-11-07 15:26:35 -05:00
Ryan Poplin 0b181be61f Bug fix in SelectVariants when using a discordance track but no sample specifications. Added integration test to test this. 2011-11-07 15:25:16 -05:00
Eric Banks 759f4fe6b8 Moving unclaimed walker with bad integration test to archive 2011-11-07 13:16:38 -05:00
Eric Banks 3517489a22 Better --sample selection integration test for VE. The previous one would return true even if --sample was not working at all. 2011-11-06 01:07:49 -04:00
Eric Banks ad57bcd693 Adding integration test to cover using expressions with IDs (-E foo.ID) 2011-11-05 23:53:15 -04:00
Mauricio Carneiro e89ff063fc GATKSAMRecord refactor
The GATK engine will now provide a GATKSAMRecord to all tools which incorporates the functionality used by the GATK to the bam file (ReadGroups, Reduced Reads, ...).

* No tools should create SAMRecord anymore, use GATKSAMRecord instead *
2011-11-03 15:43:26 -04:00
Eric Banks 78a00d2ddc Updating UG integration tests (needed updating only because the -mbq default is different from the old -mmq one). 2011-11-02 21:13:44 -04:00
Eric Banks e1edd6bd12 Removing the min mapping quality argument since it wasn't being used in the normal processing of the pileups in UG - only for indel pileups. Instead, we apply the min base quality to the reads in the pileup for indels and define it to be the min 'confidence' of the base. Docs are updated but I didn't rename the argument as I don't want people to complain. 2011-11-02 20:32:58 -04:00
Eric Banks 54331b44e9 New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths. 2011-11-02 12:47:30 -04:00
Eric Banks 0ca7428e76 Allow processing of empty intervals, but warn user when this case is encountered. 2011-10-28 12:12:14 -04:00
Eric Banks 649dfe98f0 Add VCF header for any expressions that are requested 2011-10-28 10:22:19 -04:00
Eric Banks 19e27d4568 Removing all instances of -BTI (in tests and in GATKdocs) and replacing them with the appropriate alternative. 2011-10-27 23:55:11 -04:00
Eric Banks ccfd853b34 Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed. 2011-10-27 20:43:50 -04:00
Eric Banks 8c4dbce6d8 Don't serialize the GATKArgumentCollection for the GATKRunReports (which would have meant dealing with the new IntervalBindings). Also, forgot to remove a test that's no longer relevant to BED parsing. 2011-10-27 13:58:19 -04:00