Commit Graph

402 Commits (101ffc4dfd5ce83f7d6bf0b5de4a99ebfac447de)

Author SHA1 Message Date
Mark DePristo c642a080d4 Merged bug fix from Stable into Unstable 2011-10-04 14:08:41 -07:00
Mark DePristo 941317167e Updating MD5 for BAMs that I added a read group to 2011-10-04 14:08:00 -07:00
Mark DePristo e1d6c7a50a Updating MD5 that have changed due to sample ordering differences 2011-10-04 09:33:23 -07:00
Mark DePristo 343a7b6b2f Updating UG integration tests for arbitrary impact of sample order changes on downsampling 2011-10-04 08:14:00 -07:00
Mark DePristo a27641e1fc Cleaned up imports 2011-10-04 06:28:36 -07:00
Mark DePristo b20689ff55 No longer supports extraProperties
-- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem
-- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record.  If the two records are inconsistent, an error is thrown
-- addSample() in Sample.class now invokes mergeSample() when appropriate
-- Validation types are now only STRICT or SILENT
-- Validation code implemented in SampleDBBuilder
-- Extensive unit tests for SampleDBBuilder
2011-10-03 19:20:33 -07:00
Mark DePristo 867a7476c1 Systematic unit tests for the sample object 2011-10-03 19:09:02 -07:00
Mauricio Carneiro 3837aa45b4 Fixing conflicts
Conflicts:
	public/java/test/org/broadinstitute/sting/utils/clipreads/ReadClipperUnitTest.java
2011-10-03 19:07:59 -07:00
Mark DePristo 2e3dc52088 Minor function renaming 2011-10-03 14:41:13 -07:00
Mark DePristo dd71884b0c On path to SampleDB engine integration
-- PedReader tag parser
-- Separation of SampleDBBuilder from SampleDB (now immutable)
-- Removed old sample engine arguments
2011-10-03 12:08:07 -07:00
Mark DePristo 89ac50e86e SampleDataSource -> SampleDB 2011-10-03 09:33:30 -07:00
Mark DePristo 93fba06cb5 Support for whitespace only lines 2011-10-03 09:30:10 -07:00
Mark DePristo 0604ce55d1 PedReader support for ; separated lines, not only newline 2011-10-03 09:19:58 -07:00
Mark DePristo 52f670c8b8 100% version of PedReader
-- Passes all unit tests
-- Added unit tests for missing fields
2011-10-03 06:12:58 -07:00
Roger Zurawicki bf6a3a6532 Added framework to do batch CigarClip Testing
*NOTE: This commit has not been compiled!
2011-10-02 22:33:46 -04:00
Mark DePristo dd75ad9f49 95% PedReader
-- Passes significiant unit tests
-- Implicit sample creation for mom / dad when you create single samples
-- Continuing cleanup of Sample and SampleDataSource
2011-09-30 18:03:34 -04:00
Mark DePristo 84160bd83f Reorganization of Sample
-- Moved Gender and Afflication to separate public enums
-- PedReader 90% implemented
-- Improve interface cleanup to XReadLines and UserException
2011-09-30 15:50:54 -04:00
Mark DePristo 56f10b40a8 Fixing test bugs for WindowMaker that required empty sample list 2011-09-30 14:18:27 -04:00
Mark DePristo 30d23942b1 Renamed ReadBackedPileup getXSampleName() functions to getXSample
-- now that we don't have Sample objects floating around we don't have to have all of the Name extensions on our functions
2011-09-30 10:02:57 -04:00
Mark DePristo e055a78f6e LIBS now requires at least one sample be present
-- UnitTest provides a "null" sample for matching the reads without read groups
2011-09-30 09:49:35 -04:00
Mark DePristo b71b51751e Bug fix for UnitTest
-- Provide the null sample to the LIBS, as this seems to be required for correctly passing this unit test
-- Will be fixed in a future update
2011-09-29 17:30:01 -04:00
Mark DePristo 1765fbeb6b Merge branch 'master' into ped 2011-09-29 17:18:51 -04:00
Mark DePristo 98ecaf8aa0 Support for ReducedReads with reduced counts and average quals
-- ReadUtils and UnitTest updated to support new byte[] style
-- Removed unnecessary read transformer in PairHMM
2011-09-29 17:18:39 -04:00
Mark DePristo 9458f01409 Test cleanup of Sample object 2011-09-29 15:13:05 -04:00
Mark DePristo 625ffb6a07 LocusIteratorByState and ReadBackedPileups no long use Sample 2011-09-29 14:52:11 -04:00
Mark DePristo 505416b6c0 Merge branch 'master' into ped 2011-09-29 12:22:39 -04:00
Mauricio Carneiro 4086fa768f Disabling all ReadClipperUnitTests 2011-09-29 12:20:35 -04:00
Mark DePristo 5043d76c3d Removing more bad uses of SampleDataSource creation 2011-09-29 12:16:34 -04:00
Mark DePristo 5c9227cf5e Further cleanup of Sample database
-- Removing more and more unnecessary code
-- Partial removal of type safe Sample usage.  On the road to SampleDB only
2011-09-29 11:50:05 -04:00
Mark DePristo 2a0cd556d3 Further cleanup of Sample
-- Cleaned up interface functions in GAE
-- Added Walker.getSampleDB() function which is an easier option for tools to get the samples db
2011-09-29 10:34:51 -04:00
Mark DePristo e76f381628 Moved sample package from DataSources to gatk, and renamed it samples
-- All associated changes to the codebase are just header updates
2011-09-29 09:57:15 -04:00
Mauricio Carneiro fc86cd6fd8 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/carneiro/gatk/RR into rr 2011-09-29 00:12:15 -04:00
Roger Zurawicki 4fd5630f6a Added ReadClipper Unit Test
* Includes tests that include HardClip to Read and Reference Coords.
* Changed ReadUtils.HardClipByReferenceCoordinates from private to protected to allow for testing
2011-09-28 23:13:50 -04:00
Matt Hanna 9272ed03b5 Merged bug fix from Stable into Unstable 2011-09-28 21:26:43 -04:00
Matt Hanna 0acaf2df65 Fix an embarrassing issue where a specific configuration of minimal coverage
over small intervals could cause reads to be dropped from the pileup.  Nothing
to see here...
2011-09-28 21:23:01 -04:00
Mark DePristo 4f09453470 Refactored reduced read utilities
-- UnitTests for key functions on reduced reads
-- PileupElement calls static functions in ReadUtils
-- Simple routine that takes a reduced read and fills in its quals with its reduced qual
2011-09-26 12:58:31 -04:00
Guillermo del Angel 3eef800889 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-24 21:20:11 -04:00
Guillermo del Angel 4707ab4a7d Added unit tests to test genotype merges with PL's 2011-09-24 21:17:15 -04:00
Guillermo del Angel 203517fbb7 a) Cleanups/bug fixes to previous commit to CombineVariants.
b) Change md5 to reflect records that are now merged correctly.
c) Change unit merge alleles test to reflect the fact that a null non-variant vc object is not valid and not supported because there's no way to codify such object in a vcf. The code correctly converts this to a non-variant single-base event with whatever the reference is at that location.
2011-09-24 19:08:00 -04:00
Guillermo del Angel cd058dd10f a) Fixed md5 for legit change in UG output that now also no-calls genotypes w/0,0,0 in PL's in SNP case.
b) First reimplementation of new vc merger of different types. Previous version did it in two steps, first merging all vc's per type and then trying to see if resulting vc's would be merged if alleles of one type were a subset of another, but this won't work when uniquifying genotypes since sample names would be messed up and GT sample names wouldn't match VC sample names. Now, it's actually simpler: when splitting vc's by type before merging, we check for alleles of one vc being a subset of alleles of vc of another type and if so we put them together in same list.
2011-09-24 13:40:11 -04:00
Mark DePristo 8d9e136bba Merge branch 'stable' 2011-09-24 09:26:28 -04:00
Mark DePristo f792353dcd Framework for genotype unit test 2011-09-24 08:56:45 -04:00
Mark DePristo c0bb0cb465 Make DiploidGenotype enum private to walkers.genotyper 2011-09-24 08:48:33 -04:00
Khalid Shakir 1803bd6ae2 Merged bug fix from Stable into Unstable 2011-09-23 21:05:00 -04:00
Khalid Shakir 8ceb93b8ac Fixed an integration test which crashed on the out of date LSF DRMAA library when run against the obsolete LSF dotkit instead of .combined_LSF_SGE 2011-09-23 21:03:22 -04:00
David Roazen 40202c85e0 Merged bug fix from Stable into Unstable 2011-09-23 16:35:55 -04:00
David Roazen e1cb5f6459 SnpEff annotator now assigns a functional class to each effect and distinguishes between actual effects and mere modifiers.
-We now assign a functional class (nonsense, missense, silent, or none) to each SnpEff effect, and add a
 SNPEFF_FUNCTIONAL_CLASS annotation to the INFO field of the output VCF.
-Effects are now prioritized according to both biological impact and functional class, instead of impact only.
-Many of SnpEff's "low-impact" effects are now classified as "modifiers" with lower priority than every
 other effect. This includes such "effects" as DOWNSTREAM, UPSTREAM, INTRON, GENE, EXON, and others that
 really describe the location of the variant rather than its biological effect.

This code will be short-lived (likely 1.2-only), as the next version of SnpEff will include most of these
features directly.

Checking this change into Stable+Unstable instead of Unstable because the current functional class stratification
in VariantEval is basically broken and urgently needs to be fixed for production purposes.
2011-09-23 16:06:52 -04:00
Mark DePristo 106a26c42d Minor file cleanup 2011-09-23 08:25:20 -04:00
Mark DePristo a9f073fa68 Genotype merging unit tests for simpleMerge
-- Remaining TODOs are all for GdA
2011-09-23 08:24:49 -04:00
Eric Banks a8e0fb26ea Updating md5 because the file changed 2011-09-23 07:33:20 -04:00
Mark DePristo c49cc623de Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-22 17:26:21 -04:00
Mark DePristo dab7232e9a simpleMerge UnitTest for not annotating and annotating to different info key 2011-09-22 17:26:11 -04:00
Mark DePristo 30ab3af0c8 A few more simpleMerge UnitTest tests for filtered vcs 2011-09-22 17:14:59 -04:00
Mark DePristo 5cf82f9236 simpleMerge UnitTest tests filtered VC merging 2011-09-22 17:05:12 -04:00
Mark DePristo 46ca33dc04 TestDataProvider now can be named 2011-09-22 17:04:32 -04:00
Mark DePristo 68da555932 UnitTest for simpleMerge for alleles 2011-09-22 15:16:37 -04:00
Eric Banks 80d7300de4 Unit test was passing in FORMAT as one of the sample names. There used to be a hack in the VCFHeader to check for this and remove it and I couldn't figure out why, but now I know. Hack was removed and now the unit test passes in only the sample names as per the contract. 2011-09-22 13:28:42 -04:00
Eric Banks 9c1728416c Revert "Updating md5 for fixed file" because this was fixed properly in unstable (but will break SnpEff if put into Stable).
This reverts commit 6b4182c6ab3e214da4c73bc6f3687ac6d1c0b72c.
2011-09-22 13:16:42 -04:00
Eric Banks 888d8697b1 Merged bug fix from Stable into Unstable 2011-09-22 13:16:31 -04:00
Eric Banks 15a410b24b Updating md5 for fixed file 2011-09-22 13:15:41 -04:00
Mark DePristo ba5f83fee2 start of VariantContextUtils UnitTest
-- tests rsID merging
2011-09-22 12:10:39 -04:00
Mark DePristo a05c959e5a Empty unit tests for VariantContextUtils
-- will be expanded over the day
2011-09-22 11:20:07 -04:00
Mark DePristo 3fdee2b9ed Merge from stable into unstable 2011-09-22 11:19:43 -04:00
Mark DePristo c514df6d18 Merge of stable into unstable 2011-09-22 10:34:27 -04:00
Mark DePristo f81a41b889 Updating MD5s for CombineVariants
-- Old version had broken RSIDs, new version is fixed.  No longer see rs1234,. as it is now just rs1234
2011-09-22 10:30:25 -04:00
Eric Banks b8ea9ceb68 Adding integration test that uses the -V:dbsnp binding to make sure it won't fail later on if someone messes with Tribble. 2011-09-21 22:43:31 -04:00
Mark DePristo 6bcfce225f Fix for dynamic type determination for bgzip files
-- GZipInputStream handles bgzip files under linux, but not mac
-- Added BlockCompressedInputStream test as well, which works properly on bgzip files
2011-09-21 15:39:19 -04:00
Mark DePristo 74f9ccf6dd Merge 2011-09-21 11:30:11 -04:00
Mark DePristo 6592972f82 Putative fix for BAQ array out of bounds
-- Old code required qual to be <64, which isn't strictly necessary.  Now uses the Picard SAMUtils.MAX_PHRED_SCORE constant
-- Unittest to enforce this behavior
2011-09-21 11:25:08 -04:00
Mark DePristo 7d11f93b82 Final bugfix for CombineVariants
-- Now handles multiple records at a site, so that you don't see records like set=dbsnp-dbsnp-dbsnp when combining something with dbsnp
-- Proper handling of ids.  If you are merging files with multiple ids for the same record, the ids are merged into a comma separated list
2011-09-21 10:58:32 -04:00
Mark DePristo a91ac0c5db Intermediate commit of bugfixes to CombineVariants 2011-09-21 10:15:05 -04:00
David Roazen b04d8eab55 Merged bug fix from Stable into Unstable 2011-09-20 17:24:14 -04:00
David Roazen d9ea764611 SnpEff annotator now adds OriginalSnpEffVersion and OriginalSnpEffCmd lines to the header of the VCF output file.
This change is urgently required for production, which is why it's going into Stable+Unstable
instead of just Unstable.

The keys for the SnpEff version and command header lines in the VCF file output by
VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally
different from the keys for those same lines in the SnpEff output file (SnpEffVersion
and SnpEffCmd), so that output files from VariantAnnotator won't be confused
with output files from SnpEff itself.
2011-09-20 16:30:55 -04:00
Mark DePristo b7511c5ff3 Fixed long-standing bug in tribble index creation
-- Previously, on the fly indices didn't have dictionary set on the fly, so the GATK would read, add dictionary, and rewrite the index.  This is now fixed, so that the on the fly index contains the reference dictionary when first written, avoiding the unnecessary read and write
-- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary.  This can be used conveniently everywhere, and is what's written into the Tribble index
-- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils
-- VCFWriter now requires the master sequence dictionary
-- Updated walkers that create VCFWriters to provide the master sequence dictionary
2011-09-20 10:53:18 -04:00
Mark DePristo aa8afa3899 Merge 2011-09-19 21:16:47 -04:00
Mark DePristo 4ad330008d Final intervals cleanup
-- No functional changes (my algorithm wouldn't work)
-- Major structural cleanup (returning more basic data structures that allow us to development new algorithm)
-- Unit tests for the efficiency of interval partitioning
2011-09-19 10:19:10 -04:00
Mark DePristo 6ea57bf036 Merge branch 'master' into sgintervals 2011-09-19 09:50:19 -04:00
Guillermo del Angel 7fa1e237d9 Forgot to git stash pop new MD5's for CombineVariants integration test 2011-09-16 12:53:54 -04:00
David Roazen d78e00e5b2 Renaming VariantAnnotator SnpEff keys
This is to head off potential confusion with the output from the SnpEff tool itself,
which also uses a key named EFF.
2011-09-15 17:42:15 -04:00
Eric Banks 9dc6354130 Oops didn't mean to touch this test before 2011-09-15 16:55:24 -04:00
Eric Banks 202405b1a1 Updating the FunctionalClass stratification in VariantEval to handle the snpEff annotations; this change really needs to be in before the release so that the pipeline can output semi-meaningful plots. This commit maintains backwards compatibility with the crappy Genomic Annotator output. However, I did clean up the code a bit so that we now use an Enum instead of hard-coded values (so it's now much easier to change things if we choose to do so in the future). I do not see this as the final commit on this topic - I think we need to make some changes to the snpEff annotator to preferentially choose certain annotations within effect classes; Mark, let's chat about this for a bit when you get back next week. Also, for the record, I should be blamed for David's temporary commit the other day because I gave him the green light (since when do you care about backwards compatibility anyways?). In any case, at least now we have something that works for both the old and new annotations. 2011-09-15 13:52:31 -04:00
David Roazen 3db457ed01 Revert "Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames"
After discussing this with Mark, it seems clear that the old version of the
VariantEval FunctionalClass stratification is preferable to this version.
By reverting, we maintain backwards compatibility with legacy output files
from the old GenomicAnnotator, and can add SnpEff support later without
breaking that backwards compatibility.

This reverts commit b44acd1abd9ab6eec37111a19fa797f9e2ca3326.
2011-09-14 10:47:28 -04:00
David Roazen e0c8c0ddcb Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames
This is a temporary and hopefully short-lived solution. I've modified
the FunctionalClass stratification to stratify by effect impact as
defined by SnpEff annotations (high, moderate, and low impact) rather
than by the silent/missense/nonsense categories.

If we want to bring back the silent/missense/nonsense stratification,
we should probably take the approach of asking the SnpEff author
to add it as a feature to SnpEff rather than coding it ourselves,
since the whole point of moving to SnpEff was to outsource genomic
annotation.
2011-09-14 07:09:47 -04:00
David Roazen 1213b2f8c6 SnpEff 2.0.2 support
-Rewrote SnpEff support in VariantAnnotator to support the latest SnpEff release (version 2.0.2)
-Removed support for SnpEff 1.9.6 (and associated tribble codec)
-Will refuse to parse SnpEff output files produced by unsupported versions (or without a version tag)
-Correctly matches ref/alt alleles before annotating a record, unlike the previous version
-Correctly handles indels (again, unlike the previous version
2011-09-14 07:09:47 -04:00
Guillermo del Angel 5b1bf6e244 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-13 17:04:43 -04:00
Guillermo del Angel c6672f2397 Intermediate (but necessary) fix for Beagle walkers: if a marker is absent in the Beagle output files, but present in the input vcf, there's no reason why it should be omitted in the output vcf. Rather, the vc is written as is from the input vcf 2011-09-13 16:57:37 -04:00
Ryan Poplin 981b78ea50 Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts. 2011-09-12 12:17:43 -04:00
Guillermo del Angel 9344938360 Uncomment code to add deleted bases covering an indel to per-sample genotype reporting, update integration tests accordingly 2011-09-10 19:41:01 -04:00
Guillermo del Angel b399424a9c Fix integration test affected by non-calling all-zero PL samples, and add a more complicated multi-sample integration test from a phase 1 case, GBR with mixed technologies and complex input alleles 2011-09-09 20:44:47 -04:00
Mark DePristo 72536e5d6d Done 2011-09-09 15:44:47 -04:00
Ryan Poplin 1953edcd2d updating Validate Variants deletion integration test 2011-09-09 13:39:08 -04:00
Ryan Poplin 9ada9b3ed4 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-09 13:15:36 -04:00
Ryan Poplin 354529bff3 adding Validate Variants integration test with a deletion 2011-09-09 13:15:24 -04:00
Mark DePristo 06cb20f2a5 Intermediate commit cleaning up scatter intervals
-- Adding unit tests to ensure uniformity of intervals
2011-09-09 12:56:45 -04:00
Eric Banks 51eb95d638 Missed these tests before 2011-09-09 11:46:37 -04:00
Eric Banks 6ad8943ca0 CompOverlap no longer keeps track of the number of comp sites since it wasn't (and cannot) keeping track of them correctly. 2011-09-09 09:45:24 -04:00
Eric Banks eaaba6eb51 Confirming that when stratifying by sample in VE the monomorphic sites for a given sample are not counted for the relevant metrics. Adding integration test to cover it. 2011-09-08 13:17:34 -04:00
Ryan Poplin 2636d216de Adding indel vqsr integration test 2011-09-08 10:38:13 -04:00
Ryan Poplin 9cba1019c8 Another fix for genotype given alleles for indels. Expanding the indel integration tests to include multiallelics and indel records that overlap 2011-09-08 09:25:13 -04:00
Ryan Poplin e0020b2b29 Fixing PrintRODs. Now has input and only prints out one copy of each record 2011-09-08 08:58:37 -04:00
Mark DePristo 2ded027762 Removed dysfunctional tranches support from VariantEval 2011-09-07 16:09:24 -04:00
Eric Banks aa9e32f2f1 Reverting Mark's previous commit as per the open discussion. Now the eval modules check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module not to error out if the indel isn't simple (that would have been bad). Only integration test that needed to be updated was the tranches one based on a separate commit from Mark. 2011-09-07 15:48:06 -04:00
Mark DePristo 9127849f5d BugFix for unit test 2011-09-07 14:54:10 -04:00
Eric Banks da9c8ab386 Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly. 2011-09-06 20:39:42 -04:00
Mark DePristo 1aa4b12ff0 Reduced the number of combinations being tested here, which was overkill 2011-09-01 10:42:43 -04:00
Mark DePristo 3af001fff2 Bugfix for file that must not exist on disk 2011-08-29 17:00:10 -04:00
Mark DePristo 1ceb020fae UnitTests for RScript 2011-08-27 10:50:05 -04:00
Mark DePristo c0503283df Spelling fix requires md5 updates 2011-08-26 07:40:44 -04:00
Guillermo del Angel e618cb1e79 a) Renamed/expanded SelectVariants arguments that choose particular kinds of variants and particular allelic types, now instead of -Indels or -SNPs we can specify for example -selectType [MIXED|INDEL|SNP|MNP|SYMBOLIC]. To select biallelic, multiallelic variants, use -restrictAllelesTo [BIALLELIC|MULTIALLELIC]. Corresponding gatkdocs changes.
b) More useful AC,AF logging in VariantsToTable with multiallelic sites: instead of logging comma-separated values, log max value by default. Hidden, experimental argument -logACSum to log sum of ACs instead. This is due to extreme slowness of R in parsing strings to tokens and computing max/sum itself (~100x slower than gatk).
c) Added integrationtest for new SelectVariants commands
2011-08-24 12:25:50 -04:00
Khalid Shakir 1ecbf05aae Avoid segfaults due to out of date and possibly abandonded LSF DRMAA implementation when use'ing LSF instead of .combined_LSF_SGE 2011-08-23 23:49:36 -04:00
Khalid Shakir c4c90c8826 Updates to JobRunners from the Queue developer community and from running the WholeGenomePipeline:
- Ability to pass a different resident memory reservation and limits. Useful for large pileups of low pass genome data that sometimes need high -Xmx6g but usually don't exceed 2-3g in actual heap size.
- Fixed jobPriority to work for all job runners. Now must be a integer between 0 and 100- even for GridEngine- and will be mapped to the correct values.
- Passing parallel environment and job resource requests to LSF and GridEngine. Useful for passing tokens like iodine_io=1 and -pe pe_slots 8
- Refactored GridEngine JobRunner to also provide basic support for other job dispatchers with DRMAA implementations such as Torque/PBS. Should work for basic running but advanced users must pass their own jobNativeArgs from the command line or in customized QScripts until someone maps properties like jobQueue, jobPriority, residentRequest, etc. into a Torque/PBS/etc. dispatcher.
2011-08-22 15:13:27 -04:00
Guillermo del Angel 782453235a Updated VariantEvalIntegrationTest since there's a new column separating nMixed and nComplex in CountVariants
Misc updates to WholeGenomeIndelCalling.scala
Bug fix in VariantEval (may be temporary, need more investigation): if -disc option is used in sites-only vcf's then a null pointer exception is produced, caused by recent introduction of -xl_sf options.
2011-08-20 12:24:22 -04:00
Guillermo del Angel 4939648fd4 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-20 08:50:43 -04:00
Mark DePristo ff018c7964 Swapped argument order but not MD5 order 2011-08-19 16:55:56 -04:00
Mark DePristo b08d63a6b8 Documentation and code cleanup for ClipReads, CallableLoci, and VariantsToTable
-- Swapped -o [summary] and -ob [bam] for more standard -o [bam] and -os [summary] arguments.
-- @Advanced arguments
2011-08-19 15:06:37 -04:00
Guillermo del Angel 269ed1206c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-19 09:32:20 -04:00
Mark DePristo a5e279d697 Dynamic typing of vcf.gz files
-- CombineVariantsIntegrationTests now use dynamic typing of vcf.gz files
-- FeatureManagerUnitTests tests for correctness.
2011-08-19 09:05:11 -04:00
Guillermo del Angel 58560a6d50 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 16:17:52 -04:00
Guillermo del Angel 3dfb60a46e Fixing up and refactoring usage of indel categories. On a variant context, isInsertion() and isDeletion() are now removed because behavior before was wrong in case of multiallelic sites. Now, methods isSimpleInsertion() and isSimpleDeletion() will return true only if sites are biallelic. For multiallelic sites, isComplex() will return true in all cases.
VariantEval module CountVariants is corrected and an additional column is added so that we log mixed events and complex indels separately (before they were being conflated).
VariantEval module IndelStatistics is considerably simplified as the sample stratification was wrong and redundant, now it should work with the VE-generic Sample stratification. Several columns are renamed or removed since they're not really useful
2011-08-18 16:17:38 -04:00
Mark DePristo c2287c93d7 Cleanup of codec locations. No more dbSNPHelper
-- refdata/features now in utils/codecs with the other codecs
-- Deleted dbsnpHelper.  rsID function now in VCFutils.  Remaining code either deleted or put into VariantContextAdaptors
-- Many associated import updates due to code move
2011-08-18 10:02:46 -04:00
Eric Banks b75a1807e3 Adding integration test to cover sample exclusion 2011-08-17 22:40:09 -04:00
David Roazen 53006da9a5 Improved descriptions for the SnpEff annotations in the VCF header
(based on Eric's feedback).
2011-08-17 16:09:10 -04:00
Mark DePristo 6e828260a0 Removed -B support. Now explodes with error if -B provided. 2011-08-16 16:13:47 -04:00
David Roazen 9d2cda3d41 Removed a public -> private dependency in our test suite. 2011-08-12 17:29:10 -04:00
Menachem Fromer 9121b8ed65 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-12 12:24:19 -04:00
Menachem Fromer 7ed120361d Fixed bug that required symbolic alleles to be padded with reference base and added integration test to test parsing and output of symbolic alleles 2011-08-12 12:23:44 -04:00
Eric Banks 27f0748b33 Renaming the HapMap codec and feature to RawHapMap so that we don't get esoteric errors when trying to bind a rod with the name 'hapmap' (since it was also a feature). 2011-08-12 11:11:56 -04:00
Eric Banks 005bd71be3 Working too quickly earlier. Fixing syntax. 2011-08-12 10:29:36 -04:00
Menachem Fromer c7ca33cbff Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-12 10:12:09 -04:00
Eric Banks 639a01f382 Updating integration test now that VE has been updated 2011-08-12 07:15:08 -04:00
Eric Banks 41f3da75d7 Implementation in VE was confusing 'variant' status vs. 'polymorphic' status. This led to issues because we now match types of eval and comp; specifically, subsetting a VC to a monomorphic sample can't change the 'variant' status of the VC (it's still a variant site or otherwise we'll never match the comps, which breaks GenotypeConcordance). CountVariants really got this wrong. Fixed. VE now passes all integration tests. 2011-08-12 02:22:44 -04:00
Eric Banks 45f973ab1f Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-12 00:40:18 -04:00
Eric Banks eba316621d Finish moving VE over to new rod system and fixing up the type inconsistency between eval and comp rods. Now the novel count is always 0 under the known stratification. :) 2011-08-12 00:40:08 -04:00
Menachem Fromer 9de06560df Update to new RodBinding system 2011-08-11 17:54:16 -04:00
Ryan Poplin f1d1252be2 Fixing syntax of BQSR and UG performance tests. 2011-08-11 17:04:09 -04:00
Ryan Poplin 902eb0c61e Adding dbsnp annotation back into the UG integration tests 2011-08-11 13:55:03 -04:00
Ryan Poplin c7b9a9ef0a Updating UnifiedGenotyper to use the new rod binding system. 2011-08-11 11:02:11 -04:00
Ryan Poplin 79c86e211f Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-11 09:59:20 -04:00
Ryan Poplin ea42ee4a95 Updating BQSR for the new rod binding system. 2011-08-11 09:58:42 -04:00
Mark DePristo 8cdc0cbd9c Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-11 08:58:49 -04:00
Mark DePristo 40e06f9afb Fixed broken RodBinding defaults.
-- Verified now to be correct at runtime
-- UnitTest covers this
-- createTypeDefault now takes a Type, not a Class, so that parameterized classes can have their parameter fetched in the defaults.
2011-08-11 08:58:30 -04:00
Eric Banks bdb1da30fd Better interface for getting RodBindings to the VariantAnnotatorEngine and its annotations: pass around an AnnotatorCompatibleWalker (interface) object. Updating VA to use the new rod system. 2011-08-10 22:43:08 -04:00
Eric Banks 07ad8c78a9 More tools moved over. Fixed the VariantContextIntegrationTest which was not useful because the md5s were all removed. In the future, instead of removing md5s (putting it in 'parameterization' mode), you should instead use @Test{enabled=false} since it's easier to track. 2011-08-10 14:24:40 -04:00
Eric Banks 8d14d32a62 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-10 13:42:37 -04:00
Eric Banks 749c8bfbcd Moving more tools over to the new rod system 2011-08-10 13:42:35 -04:00
David Roazen 0497170bc9 SnpEffCodec now implements SelfScopingFeatureCodec so that we no longer have to specify the codec name on the command line for SnpEff files. 2011-08-10 13:12:09 -04:00
Eric Banks a42f90db11 Moving more tools over to use the standard VC arg collection. Also, while I'm in there, I removed all of the empty references to @Requires given that it's no longer relevant. 2011-08-10 12:20:18 -04:00
Ryan Poplin c60cf52f73 Updating VQSR for new RodBinding syntax. Cleaning up indel specific parts of VQSR. 2011-08-10 10:20:37 -04:00
Eric Banks 1ea5ec276b Minor cleanup 2011-08-09 23:28:59 -04:00
Eric Banks bc2d4f554d Bringing Indel Realigner up to speed with the new rod binding syntax; now use -known to specify the known indels track. 2011-08-09 23:21:17 -04:00