Commit Graph

1554 Commits (edb4edc08fb0dea2aeea61afcfe4fd39faa7ada1)

Author SHA1 Message Date
Mauricio Carneiro e5df9e0684 cleaner test output
cleaned up the debug "pass" messages in the unit tests
2011-12-16 18:04:00 -05:00
Mauricio Carneiro fcc21180e8 Added hardClipLeadingInsertions UnitTest for the ReadClipper
fixed issue where a read starting with an insertion followed by a deletion would break, clipper can now safely clip the insertion and the deletion if that's the case.

note: test is turned off until contract changes to allow hanging insertions (left/right).
2011-12-16 18:02:47 -05:00
Mauricio Carneiro 075be52adc Added hardClipByReferenceCoordinates (left and right tails) UnitTest for the ReadClipper 2011-12-16 18:01:33 -05:00
Mauricio Carneiro 5bba44d693 Added hardClipByReferenceCoordinates UnitTest for the ReadClipper
* fixed edge case when requested to hard clip beginning of a read that had hanging soft clipped bases on the left tail.
* fixed edge case when requested to hard clip end of a read that had hanging soft clipped bases on the right tail.
* fixed AlignmentStart of a clipped read that results in only hard clips and soft clips

note: added tests to all these beautiful cases...
2011-12-16 18:01:33 -05:00
Mauricio Carneiro 5838ba529d Added hardClipByReadCoordinates UnitTest for the ReadClipper 2011-12-16 18:01:33 -05:00
Mauricio Carneiro c26295919e Added hardClipBothEndsByReferenceCoordinates UnitTest for the ReadClipper 2011-12-16 18:01:33 -05:00
Mark DePristo 1994c3e3bc Only print warning about allele incompatibility when running there are genotypes in the file in CombineVariants 2011-12-16 16:50:51 -05:00
Mark DePristo b6067be952 Support for selecting only variants with specific IDs from a file in SelectVariants
-- Cleaned up unused variables as well
2011-12-16 16:50:39 -05:00
Mark DePristo d6d2f49c88 Don't print log if there are no BAMs 2011-12-16 16:50:36 -05:00
Mark DePristo 78e0950a77 Minor bug fix for printing in SAMDataSource 2011-12-16 11:45:40 -05:00
Mark DePristo 7bc0d18418 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-16 11:42:42 -05:00
Ryan Poplin 5aa79dacfc Changing hidden optimization argument to advanced. 2011-12-16 10:29:20 -05:00
Matt Hanna 3642a73c07 Performance improvements for dynamically merging BAMs in read walkers.
This change and my previous change have dropped runtime when dynamically merging 2k BAM files from 72.6min/1M reads to 46.8sec/1M reads.
Note that many of these changes are stopgaps -- the real problem is the way ReadWalkers interface with Picard, and I'll have to work with
Tim&Co to produce a more maintainable patch.
2011-12-16 09:37:44 -05:00
Mark DePristo 3414ecfe2e Restored serial version of reader initialization. Serial mode is default, as the performance gains aren't so huge.
-- Serial version can be re-enabled with a static boolean, if we decide to return to the serial version

-- Comparison of serial and parallel reader with cached and uncached files:

Initialization time: serial   with 500 fully cached BAMs: 8.20 seconds
Initialization time: serial   with 500 uncached BAMs    : 197.02 seconds
Initialization time: parallel with 500 fully cached BAMs: 30.12 seconds
Initialization time: parallel with 500 uncached BAMs    : 75.47 seconds
2011-12-16 09:22:10 -05:00
Mark DePristo fb1c9d2abc Restored serial version of reader initialization. Parallel mode is default.
-- Serial version can be re-enabled with a static boolean, if we decide to return to the serial version
2011-12-16 09:05:28 -05:00
Mauricio Carneiro e61e5c7589 Refactor of ReadClipper unit tests
* expanded the systematic cigar string space test framework Roger wrote to all tests
* moved utility functions into Utils and ReadUtils
* cleaned up unused classes
2011-12-15 19:05:43 -05:00
Mauricio Carneiro 4748ae0a14 Bugfix: Softclips before Hardclips weren't being accounted for
caught a bug in the hard clipper where it does not account for hard clipping softclipped bases in the resulting cigar string, if there is already a hard clipped base immediately after it.
* updated unit test for hardClipSoftClippedBases with corresponding test-case.
2011-12-15 12:17:25 -05:00
Mauricio Carneiro 62a2e335bc Changing HardClipper contract to allow UNMAPPED reads
shifted the contract to functions that operate on reference based coordinates. The clipper should do the right thing with unmapped reads, but it needs more testing (Ryan is using it at the moment and says it works). Will write some unit tests.
2011-12-15 11:08:19 -05:00
Matt Hanna 9333b678b5 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-14 18:05:44 -05:00
Matt Hanna 6fb4be1a09 Cache header merger. 2011-12-14 18:05:31 -05:00
Mauricio Carneiro 50dee86d7f Added unit test to catch Ryan's exception
Unit test to catch the special case that broke the clipping op, fixed in the previous commit.
2011-12-14 16:58:14 -05:00
Mauricio Carneiro 128bdf9c09 Create artificial reads with "default" parameters
* added functions to create synthetic reads for unit testing with reasonable default parameters
* added more functions to create synthetic reads based on cigar string + bases and quals.
2011-12-14 16:58:14 -05:00
Mauricio Carneiro c85100ce9c Fix ClippingOp bug when performing multiple hardclip ops
bug: When performing multiple hard clip operations in a read that has indels, if the N+1 hardclip requests to clip inside an indel that has been removed by one of the (1..N) previous hardclips, the hard clipper would go out of bounds.

fix: dynamically adjust the boundaries according to the new hardclipped read length. (this maintains the current contract that hardclipping will never return a read starting or ending in indels).
2011-12-14 16:57:47 -05:00
Eric Banks de5928ac5a Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-14 16:24:56 -05:00
Eric Banks 4fddac9f22 Updating busted integration tests 2011-12-14 16:24:43 -05:00
Mark DePristo 01e547eed3 Parallel SAMDataSource initialization
-- Uses 8 threads to load BAM files and indices in parallel, decreasing costs to read thousands of BAM files by a significant amount
-- Added logger.info message noting progress and cost of reading low-level BAM data.
2011-12-14 16:14:26 -05:00
Mark DePristo 71b4bb12b7 Bug fix for incorrect logic in subsetSamples
-- Now properly handles the case where a sample isn't present (no longer adds a null to the genotypes list)
-- Fix for logic failure where if the number of requested samples equals the number of known genotypes then all of the records were returned, which isn't correct when there are missing samples.
-- Unit tests added to handle these cases
2011-12-14 16:14:26 -05:00
Eric Banks 35fc2e13c3 Using the new PL cache, fix a bug: when only a subset of the genotyped alleles are used for assigning genotypes (because the exact model determined that they weren't all real) the PLs need to be adjusted to reflect this. While fixing this I discovered that the integration tests are busted because ref calls (ALT=.) were getting annotated with PLs, which makes no sense at all. 2011-12-14 15:31:09 -05:00
Eric Banks 1e90d602a4 Optimization: cache up front the PL index to the pair of alleles it represents for all possible numbers of alternate alleles. 2011-12-14 13:38:20 -05:00
Eric Banks 988d60091f Forgot to add in the new result class 2011-12-14 13:37:15 -05:00
Eric Banks 106bf13056 Use a thread local result object to collect the results of the exact calculation instead of passing in multiple pre-allocated arrays. 2011-12-14 12:05:50 -05:00
Eric Banks 7648521718 Add check for mixed genotype so that we don't exception out for a valid record 2011-12-14 11:26:43 -05:00
Eric Banks 9497e9492c Bug fix for complex records: do not ever reverse clip out a complete allele. 2011-12-14 11:21:28 -05:00
Eric Banks 09a5a9eac0 Don't update lineNo for decodeLoc - only for decode (otherwise they get double-counted). Even still, because of the way the GATK currently utilizes Tribble we can parse the same line multiple times, which knocks the line counter out of sync. For now, I've added a TODO in the code to remind us and the error messages note that it's an approximate line number. 2011-12-14 10:43:52 -05:00
Eric Banks d3f4a5a901 Fail gracefully when encountering malformed VCFs without enough data columns 2011-12-14 10:37:38 -05:00
Eric Banks 079932ba2a The log10cache needs to be larger if we want to handle 10K samples in the UG. 2011-12-13 23:36:10 -05:00
Ryan Poplin 7fa1ab1bae Fix to allow haplotype caller to call indels after UG engine entry points were unified. Adding Haplotype Caller integration test 2011-12-13 17:19:40 -05:00
Eric Banks e47a113c9f Enabled multi-allelic SNP discovery in the UG. Needs loads of testing so do not use yet. While working in the UG engine, I removed the extraneous and unnecessary MultiallelicGenotypeLikelihoods class: now a VariantContext with PL-annotated Genotypes is passed around instead. Integration tests pass so it must all work, right? 2011-12-12 23:02:45 -05:00
Mauricio Carneiro 5cc1e72fdb Parallelized SelectVariants
* can now use -nt with SelectVariants for significant speedup in large files
* added parallelization integration tests for SelectVariants
2011-12-12 18:41:14 -05:00
Mauricio Carneiro a70a0f25fb Better debug output for SAMDataSource
output the name and number of the files being loaded by the GATK instead of "coordinate sorted".
2011-12-12 17:57:29 -05:00
Mark DePristo d03425df2f TODO optimization targets 2011-12-12 17:39:51 -05:00
Laurent Francioli 7cf27bb66e Updated md5sum for MendelianViolationEvaluator test to reflect the change in column alignment in VariantEval. 2011-12-12 12:22:43 +01:00
Laurent Francioli 025bdfe2cc Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-12 12:19:44 +01:00
Eric Banks 7b6338c742 Merge branch 'master' into trialleles 2011-12-11 00:28:46 -05:00
Eric Banks 7c4b9338ad The old bi-allelic implementation of the Exact model has been completely deprecated - you can only use the multi-allelic implementation now. 2011-12-11 00:23:33 -05:00
Eric Banks 044f211a30 Don't collapse likelihoods over all alt alleles - that's just not right. For now, the QUAL is calculated for just the most likely of the alt alleles; I need to think about the right way to handle this properly. 2011-12-10 23:57:14 -05:00
Eric Banks 364f1a030b Plumbing added so that the UG engine can handle multiple alleles and they can successfully be genotyped. Alleles that aren't likely are not allowed to be used when assigning genotypes, but otherwise the greedy PL-based approach is what is used. Moved assign genotypes code to UG engine since it has nothing to do with the Exact model. Still have some TODOs in here before I can push this out to everyone. 2011-12-09 14:25:28 -05:00
Mauricio Carneiro 8475328b2c Turning off test that breaks read clipper
until we define what is the desired behavior for clipping this particular case.
2011-12-09 11:53:12 -05:00
Roger Zurawicki 4cbd1f0dec Reorganized the testing code and created ClipReadsTestUtils
Tests are more rigorous and includes many more test cases.
We can tests custom cigars and the generated cigars.
     *Still needs debugging because code is not working.
Created test classes to be used across several tests.

Some cases are still commented out.

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2011-12-09 11:52:34 -05:00
Roger Zurawicki 0e9c2cefa2 testHardClipSoftClippedBases works with Matches and Deletions
Insertions are a problem so cigar cases with "I" are commented out.
The test works with multiple deletions and matches.

This is still not a complete test. A lot of cigar test cases are commented out.

Added insertions to ReadClipperUnitTest

ReadClipper now tests for all indels.

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2011-12-09 11:43:37 -05:00
Eric Banks 64dad13e2d Don't carry around an extra copy of the code for the Haplotype Caller 2011-12-09 11:09:40 -05:00
Eric Banks 442ceb6ad9 The Exact model now computes both the likelihoods and posteriors (in separate arrays); likelihoods are used for assigning genotypes, not the posteriors. 2011-12-09 10:16:44 -05:00
Laurent Francioli a79144f7db Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-09 15:57:24 +01:00
Laurent Francioli 72fbfba97d Added UnitTests for getFamilies() and getChildrenWithParents() 2011-12-09 15:57:07 +01:00
Laurent Francioli 5a06170804 Corrected bug causing getChildrenWithParents() to not take the last family member into consideration. 2011-12-09 14:51:34 +01:00
Eric Banks aa4a8c5303 No dynamic programming solution for assignning genotypes; just done greedily now. Fixed QualByDepth to skip no-call genotypes. No-calls are no longer given annotations (attributes). 2011-12-09 02:25:06 -05:00
Eric Banks 2fe50c64da Updating md5s 2011-12-09 00:47:01 -05:00
Eric Banks 8777288a9f Don't throw a UserException if too many alt alleles are trying to be genotyped. Instead, I've added an argument that allows the user to set the max number of alt alleles to genotype and the UG warns and skips any sites with more than that number. 2011-12-09 00:00:20 -05:00
Eric Banks 3e7714629f Scrapped the whole idea of an int/long as an index into the ACset: with lots of alternate alleles we run into overflow issues. Instead, simply use the ACcounts array as the hash key since it is unique for each AC conformation. To do this, it needed to be wrapped inside an object so hashcode() would work. 2011-12-08 23:50:54 -05:00
Eric Banks 4aebe99445 Need to use longs for the set index (because we can run out of ints when there are too many alternate alleles). Integration tests now use the multiallelic implementation. 2011-12-08 15:31:02 -05:00
Eric Banks 7750bafb12 Fixed bug where last dependent set index wasn't properly being transferred for sites with many alleles. Adding debugging output. 2011-12-08 13:50:50 -05:00
Guillermo del Angel 252e0f3d0a Merged bug fix from Stable into Unstable 2011-12-08 13:11:39 -05:00
Guillermo del Angel 1bfe28067f Don't try to genotype an indel even bigger than the reference window size, or else we'll be out of bounds. Necessary to handle Phase 1 integrated callset with large deletions. Better error indication when validating a GenomeLoc. 2011-12-08 12:54:08 -05:00
Mark DePristo 9def841275 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-07 13:36:16 -05:00
Mark DePristo 4055877708 Prints 0.0 TiTv not NaN when there are no variants
-- Updated md5
2011-12-07 12:07:54 -05:00
Matt Hanna 15533e08df Fixed issue with RODWalker parallelization.
Turns out that someone previously upped the declared size of a ROD shard to 100M bases, making
each ROD shard larger than the size of chr20.  Why didn't we see this in Stable?  Because the
ShardStrategy/ShardStrategyFactory mechanism was dutifully ignoring the shard size specification.
When I rolled the ShardStrategy/ShardStrategyFactory mechanics back into the DataSources as part
of the async I/O project, I inadvertently reenabled this specifier.
2011-12-07 11:55:42 -05:00
Mark DePristo 5d2212bc8e Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-07 09:03:17 -05:00
Mark DePristo 6bf18899df Fix for variant summary -- now treats all 50 bp deletions or insertions as CNVs 2011-12-07 09:02:49 -05:00
Matt Hanna c9b2cd8ba5 Fix for chartl's stale null representation issue. 2011-12-06 18:05:17 -05:00
Eric Banks 79d18dc078 Fixing indexing bug on the ACsets. Added unit tests for the Exact model code. 2011-12-06 16:17:18 -05:00
Matt Hanna f5b977fc88 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-06 10:11:35 -05:00
Matt Hanna 4001c22a11 Better file count / buffering variation in test suite. Parameterized read shard buffering. Misc cleanup. 2011-12-06 10:10:38 -05:00
Khalid Shakir 677bea0abd Right aligning GATKReport numeric columns and updated MD5s in tests.
PreQC parses file with spaces in sample names by using tabs only.
PostQC allows passing the file names for the evals so that flanks can be evaled.
BaseTest's network temp dir now adds the user name to the path so files aren't created in the root.
HybridSelectionPipeline:
- Updated to latest versions of reference data.
- Refactored Picard parsing code replacing YAML.
2011-12-05 23:22:15 -05:00
Eric Banks 7a0f6feda4 Make sure that too many alternate alleles aren't being passed to the genotyper (10 for now) and exit with a UserError if there are. 2011-12-05 16:18:52 -05:00
Eric Banks 7fac4afab3 Fixed priors (now initialized upon engine startup in a multi-dimensional array) and cell coefficients (properly handles the generalized closed form representation for multiple alleles). 2011-12-05 15:57:25 -05:00
Eric Banks a7cb941417 The posteriors vector is now 2 dimensional so that it supports multiple alleles (although the UG is still hard-coded to use only array[0] for now); the exact model now collapses probabilities for all conformations over a given AC into the posteriors array (in the appropriate dimension). Fixed a bug where the priors and posteriors were being passed in swapped. 2011-12-04 13:02:53 -05:00
Eric Banks eab2b76c9b Added loads of comments for future reference 2011-12-03 23:54:42 -05:00
Eric Banks 29662be3d7 Fixed bug where k=2N case wasn't properly being computed. Added optimization for BB genotype case not in old model. At this point, integration tests pass except for 1 case where QUALs differ by 0.01 (this is okay because I occasionally need to compute extra cells in the matrix which affects the approximations) and 2 cases where multi-allelic indels are being genotyped (some work still needs to be done to support them). 2011-12-03 23:12:04 -05:00
Eric Banks 71f793b71b First partially working version of the multi-allelic version of the Exact AF calculation 2011-12-02 14:13:14 -05:00
David Roazen d014c7faf9 Queue now properly escapes all shell arguments in generated shell scripts
This has implications for both Qscript authors and CommandLineFunction authors.

Qscript authors:
You no longer need to (and in fact must not) manually escape String values to
avoid interpretation by the shell when setting up Walker parameters. Queue will
safely escape all of your Strings for you so that they'll be interpreted literally. Eg.,

Old way:
filterSNPs.filterExpression = List("\"QD<2.0\"", "\"MQ<40.0\"", "\"HaplotypeScore>13.0\"")

New way:
filterSNPs.filterExpression = List("QD<2.0", "MQ<40.0", "HaplotypeScore>13.0")

CommandLineFunction authors:
If you're writing a one-off CommandLineFunction in a Qscript and don't really
care about quoting issues, just keep doing things the direct, simple way:

def commandLine = "cat %s | grep -v \"#\" > %s".format(files, out)

If you're writing a CommandLineFunction that will become part of Queue and
will be used by other QScripts, however, it's advisable to do things the
newer, safer way, ie.:

When you construct your commandLine, you should do so ONLY using the API methods
required(), optional(), conditional(), and repeat(). These will manage quoting
and whitespace separation for you, so you shouldn't insert quotes/extraneous
whitespace in your Strings. By default you get both (quoting and whitespace
separation), but you can disable either of these via parameters. Eg.,

override def commandLine = super.commandLine +
                           required("eff") +
                           conditional(verbose, "-v") +
                           optional("-c", config) +
                           required("-i", "vcf") +
                           required("-o", "vcf") +
                           required(genomeVersion) +
                           required(inVcf) +
                           required(">", escape=false) +  // This will be shell-interpreted
                           required(outVcf)

I've ported the Picard/Samtools/SnpEff CommandLineFunction classes to the new
system, so you'll get free shell escaping when you use those in Qscripts just
like with walkers.
2011-12-01 18:13:44 -05:00
Mark DePristo 3060a4a15e Support for list of known CNVs in VariantEval
-- VariantSummary now includes novelty of CNVs by reciprocal overlap detection using the standard variant eval -knownCNVs argument
-- Genericizes loading for intervals into interval tree by chromosome
-- GenomeLoc methods for reciprocal overlap detection, with unit tests
2011-11-30 17:05:16 -05:00
Matt Hanna b65db6a854 First draft of a test script for I/O performance with the new asynchronous I/O processing.
Also includes convenience parameters for specifying the IO/CPU threading balance outside of a tag.  Will be killed when
Queue gets better support for tagged arguments (hopefully soon).
2011-11-30 13:13:16 -05:00
Laurent Francioli 1d5d200790 Cleaned up unused import statements 2011-11-30 15:30:30 +01:00
Mark DePristo 28b286ad39 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-30 09:11:53 -05:00
Laurent Francioli 20bffe0430 Adapted for the new version of MendelianViolation 2011-11-30 14:46:38 +01:00
Laurent Francioli 1cb5e9e149 Removed outdated (and unused) -familyStr commandline argument 2011-11-30 14:45:04 +01:00
Laurent Francioli 9574be0394 Updated MendelianViolationEvaluator integration test 2011-11-30 14:44:15 +01:00
Laurent Francioli f49dc5c067 Added functionality to get all children that have both parents (useful when trios are needed) 2011-11-30 14:43:37 +01:00
Laurent Francioli a4606f9cfe Merge branch 'MendelianViolation'
Conflicts:
	public/java/src/org/broadinstitute/sting/utils/MendelianViolation.java
2011-11-30 11:13:15 +01:00
Laurent Francioli b279ae4ead Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-30 10:10:21 +01:00
Laurent Francioli 7d58db626e Added MendelianViolationEvaluator integration test 2011-11-30 10:09:20 +01:00
Ryan Poplin 91413cf0d9 Merged bug fix from Stable into Unstable 2011-11-29 14:01:23 -05:00
Ryan Poplin cb284eebde Further updating VQSR tutorial wiki docs to reflect the bundle 2011-11-29 14:00:57 -05:00
Ryan Poplin dcb889665d Merged bug fix from Stable into Unstable 2011-11-29 09:58:49 -05:00
Ryan Poplin 447e9bff9e Updating VQSR tutorial wiki docs to reflect the bundle 2011-11-29 09:57:45 -05:00
Ryan Poplin 110298322c Adding Transmission Disequilibrium Test annotation to VariantAnnotator and integration test to test it. 2011-11-29 09:29:18 -05:00
Laurent Francioli ab67011791 Corrected bug introduced in the last update and causing no families to be returned by getFamilies in case the samples were not specified 2011-11-29 11:18:15 +01:00
Eric Banks d7d8b8e380 Tribble v42 changes the Codec.canDecode method to take in a String instead of a File; this is something that Jim was adamant about (because Tribble can handle streams other than files). I didn't want the next person who needed to rev Tribble to deal with this change additionally, so I took care of updating the GATK now. 2011-11-28 14:18:28 -05:00
Laurent Francioli a09c01fcec Removed walker argument FamilyStructure as this is now supported by the engine (ped file) 2011-11-28 17:18:11 +01:00
Laurent Francioli 795c99d693 Adapted MendelianViolation to the new ped family representation. Adapted all classes using MendelianViolation too.
MendelianViolationEvaluator was added a number of useful metrics on allele transmission and MVs
2011-11-28 17:13:14 +01:00
Laurent Francioli e877db8f42 Changed visibility of getSampleDB from protected to public as the sampleDB needs to be accessible from Annotators and Evaluators too. 2011-11-28 17:11:30 +01:00
Laurent Francioli 5c2595701c Added a function to get families only for a given list of samples. 2011-11-28 17:10:33 +01:00
Mark DePristo 3c36428a20 Bug fix for TiTv calculation -- shouldn't be rounding 2011-11-28 10:20:34 -05:00
Eric Banks 436b4dc855 Updated docs 2011-11-28 08:59:48 -05:00
Laurent Francioli b1dd632d5d Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java
2011-11-25 16:16:44 +01:00
Mark DePristo e60272975a Fix for changed MD5 in streaming VCF test 2011-11-23 19:01:33 -05:00
Mark DePristo 12f09d88f9 Removing references to SimpleMetricsByAC 2011-11-23 16:08:18 -05:00
Mark DePristo e319079c32 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-23 13:02:11 -05:00
Mark DePristo 4107636144 VariantEval updates
-- Performance optimizations
-- Tables now are cleanly formatted (floats are %.2f printed)
-- VariantSummary is a standard report now
-- Removed CompEvalGenotypes (it didn't do anything)
-- Deleted unused classes in GenotypeConcordance
-- Updates integration tests as appropriate
2011-11-23 13:02:07 -05:00
David Roazen e5b85f0a78 A toString() method for IntervalBindings
Necessary since we're currently writing things like this to our VCF headers:
intervals=[org.broadinstitute.sting.commandline.IntervalBinding@4ce66f56]
2011-11-23 11:56:12 -05:00
Mark DePristo 5a4856b82e GATKReports now support a format field per column
-- You can tell the table to format your object with "%.2f" for example.
2011-11-23 11:31:04 -05:00
Mark DePristo c8bf7d2099 Check for null comment 2011-11-23 10:47:21 -05:00
Mark DePristo 6c2555885c Caching getSimpleName() in VariantEval is a big performance improvement
-- Removed the SimpleMetricsByAC table, as one should just use the AlleleCount Stratefication and the upcoming VariantSummary table
2011-11-23 08:34:05 -05:00
Guillermo del Angel 32adbd614f Solve merge conflict 2011-11-22 22:48:46 -05:00
Guillermo del Angel 941f3784dc Solve merge conflict 2011-11-22 22:48:03 -05:00
Guillermo del Angel 75d93e6335 Another corner condition fix: skip likelihood computation in case we cut so many bases there's no haplotype or read left 2011-11-22 22:46:12 -05:00
Mark DePristo a3aef8fa53 Final performance optimization for GenotypesContext 2011-11-22 17:19:30 -05:00
Mark DePristo 990c02e4de Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-22 17:19:11 -05:00
Guillermo del Angel 38a90da92c Fixed merge conflict to Unstable 2011-11-22 14:39:45 -05:00
Guillermo del Angel 32a77a8a56 Prevent out of bound error in case read span > reference context + indel length. Can happen in RNAseq reads with long N CIGAR operators in the middle. 2011-11-22 13:57:24 -05:00
Eric Banks 5821c11fad For BAM and Reviewed errors we now check the error message to see if it's actually a 'too many open files' problem and, if so, we generate a User Error instead. 2011-11-22 10:50:22 -05:00
Mark DePristo 7087310373 Embarassing bug fixed 2011-11-22 10:16:36 -05:00
Mark DePristo e484625594 GenotypesContext now updates cached data for add, set, replace operations when possible
-- Involved separately managing the sample -> offset and sample sorted list operations.  This should improve performance throughout the system
2011-11-22 08:40:48 -05:00
Mark DePristo 29ca24694a UG now encoding NO_CALLs as ./. not ./.:.:4:0,0,0
A few updated UGs integration tests
2011-11-22 08:22:32 -05:00
Mark DePristo 2b51c01df4 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-21 19:16:06 -05:00
Mark DePristo 5443d3634a Again, fixing the add call when we really mean replace
-- Updating MD5s for UG to reflect that what was previously called ./.:.:10:0,0,0 is now just ./.  Eric will fix long-standing bug in QD observed from this change
-- VFW MD5s restored to their old correct values.  There was a bug in my implementation to caused the genotypes to not be parsed from the lazy output even through the header was incorrect.
2011-11-21 19:15:56 -05:00
Mauricio Carneiro 5ad3dfcd62 BugFix: byte overflow in SyntheticRead compressed base counts
* fixed and added unit test
2011-11-21 17:11:50 -05:00
Mark DePristo 9ea7b70a02 Added decode method to LazyGenotypesContext
-- AbstractVCFCodec calls this if the samples are not sorted.  Previously called getGenotypes() which didn't actually trigger the decode
2011-11-21 16:21:23 -05:00
Mark DePristo ab2efe3bd3 Reverting bad exact model changes 2011-11-21 16:14:40 -05:00
Eric Banks 44554b2bfd Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-21 15:01:45 -05:00
Eric Banks 022832bd74 Very bad use of the == operator with Strings was ensuring that validating GenomeLocs was very inefficient. This fix resulted in a significant speedup for a simple RodWalker. 2011-11-21 14:49:47 -05:00
Mark DePristo 1561af22af Exact model code cleanup
-- Fixed up code when fixing a bug detected by aggressive contracts in GenotypesContext.
2011-11-21 14:35:15 -05:00
Mark DePristo 2c501364b8 GenotypesContext no longer have immutability in constructor
-- additional bug fixes throughout VariantContext and GenotypesContext objects
2011-11-21 14:34:31 -05:00
David Roazen 1296dd41be Removing the legacy -L "interval1;interval2" syntax
This syntax predates the ability to have multiple -L arguments, is
inconsistent with the syntax of all other GATK arguments, requires
quoting to avoid interpretation by the shell, and was causing
problems in Queue.

A UserException is now thrown if someone tries to use this syntax.
2011-11-21 13:18:53 -05:00
Mark DePristo e467b8e1ae More contracts on LazyGenotypesContext 2011-11-21 09:34:57 -05:00
Mark DePristo 2e9ecf639e Generalized interface to LazyGenotypesContext
-- Now you provide a LazyParsing object
-- LazyGenotypesContext now knows nothing about the VCF parser itself.  The parser holds all of the necessary data to parse the VCF genotypes when necessarily, and the LGC only has a pointer to this object
-- Using new interface added LazyGenotypesContext to unit tests with a simple lazy version
-- Deleted VCFParser interface, as it was no longer necessary
2011-11-21 09:30:40 -05:00
Mark DePristo f0ac588d32 Extensive unit test for GenotypeContextUnitTest
-- Currently only tests base class.  Adding subclass testing in a bit
2011-11-20 18:28:01 -05:00
Mark DePristo bc44f6fd9e Utility function Collection<Genotype> -> Collection<String> 2011-11-20 18:26:56 -05:00
Mark DePristo 9445326c6c Genotype is Comparable via sampleName 2011-11-20 18:26:27 -05:00
Mark DePristo f9e25081ab Completed documented LazyGenotypesContext 2011-11-20 08:35:52 -05:00
Mark DePristo 9cb3fe3a59 Vastly better way of doing on-demand genotyping loading
-- With our GenotypesContext class we can naturally create a LazyGenotypesContext subclass that does the on-demand loading.
-- This new class was replaced all of the old, complex functionality
-- Better still, there were many cases were the genotypes were being loaded unnecessarily, resulting in efficiency.  This was detected because some of the integration tests changed as the genotypes were no longer being parsing unnecessarily
-- Misc. bug fixes throughout the system
-- Bug fixes for PhaseByTransmission with new GenotypesContext
2011-11-20 08:23:09 -05:00
Mark DePristo f392d330c3 Proper use of builder. Previous conversion attempt was flawed 2011-11-19 22:09:56 -05:00
Mark DePristo 7d09c0064b Bug fixes and code cleanup throughout
-- chromosomeCounts now takes builder as well, cleaning up a lot of code throughout the codebase.
2011-11-19 18:40:15 -05:00
Mark DePristo 707bd30b3f Should have been @BeforeMethod 2011-11-19 16:10:09 -05:00
Mark DePristo 8f7eebbaaf Bugfix for pError not being checked correctly in CommonInfo
-- UnitTests to ensure correct behavior
-- UnitTests to ensure correct behavior for pass filters vs. failed filters vs. unfiltered
2011-11-19 15:58:59 -05:00
Mark DePristo b7b57ef39a Updating MD5 to reflect canonical ordering of calculation
-- We should no longer have md5s changing because of hashmaps changing their sort order on us
-- Added GenotypeLikelihoodsUnitTests
-- Refactored ExactAFCaclculation to put the PL -> QUAL calculation in the GenotypeLikelihoods class to avoid the code copy.
2011-11-19 15:57:33 -05:00
Mark DePristo 73119c8e3c Merge with master
-- A few bug fixes
2011-11-19 09:56:06 -05:00
Mark DePristo f685fff79b Killing the final versions of old new VariantContext interface 2011-11-18 21:32:43 -05:00
Mark DePristo 6cf315e17b Change interface to getNegLog10PError to getLog10PError 2011-11-18 21:07:30 -05:00
Mark DePristo c7f2d5c7c7 Final minor fix to contract 2011-11-18 19:40:05 -05:00
Mauricio Carneiro b5de182014 isEmpty now checks if mReadBases is null
Since newly created reads have mReadBases == null. This is an effort to centralize the place to check for empty GATKSAMRecords.
2011-11-18 18:34:05 -05:00
Mauricio Carneiro 8ab3ee9c65 Merge remote-tracking branch 'unstable/master' into rr 2011-11-18 16:50:25 -05:00
Mauricio Carneiro 333e5de812 returning read instead of GATKSAMRecord
Do not create new GATKSAMRecord when read has been fully clipped, because it is essentially the same as returning the currently fully clipped read.
2011-11-18 16:49:59 -05:00
Matt Hanna 8bb4d4dca3 First pass of the asynchronous block loader.
Block loads are only triggered on queue empty at this point.  Disabled by
default (enable with nt:io=?).
2011-11-18 15:02:59 -05:00
Mark DePristo a2e79fbe8a Fixes to contracts 2011-11-18 14:18:53 -05:00
Mark DePristo 660d6009a2 Documentation and contracts for GenotypesContext and VariantContextBuilder 2011-11-18 13:59:30 -05:00
Mark DePristo f54afc19b4 VariantContextBuilder
-- New approach to making VariantContexts modeled on StringBuilder
-- No more modify routines -- use VariantContextBuilder
-- Renamed isPolymorphic to isPolymorphicInSamples.   Same for mono
-- getChromosomeCount -> getCalledChrCount
-- Walkers changed to use new VariantContext.  Some deprecated new VariantContext calls remain
-- VCFCodec now uses optimized cached information to create GenotypesContext.
2011-11-18 12:39:10 -05:00
Eric Banks 6459784351 Merged bug fix from Stable into Unstable 2011-11-18 12:34:57 -05:00
Eric Banks c62082ba1b Making this class public again as per request from Cancer folks 2011-11-18 12:34:27 -05:00
Eric Banks 8710673a97 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-18 12:29:33 -05:00
Eric Banks 768b27322b I figured out why we were getting tons of hom var genotype calls with Mauricio's low quality (synthetic) reduced reads: the RR implementation in the UG was not capping the base quality by the mapping quality, so all the low quality reads were used to generate GLs. Fixed. 2011-11-18 12:29:15 -05:00
Mark DePristo 7490dbb6eb First version of VariantContextBuilder 2011-11-18 11:06:15 -05:00
Roger Zurawicki f48d4cfa79 Bug fix: fully clipping GATKSAMRecords and flushing ops
Reads that are emptied after clipping become new GATKSAMRecords.
When applying ClippingOps, the ops are cleared after the clipping
2011-11-18 00:24:39 -05:00
Mark DePristo fa454c88bb UnitTests for VariantContext for chrCount, getSampleNames, Order function
-- Major change to how chromosomeCounts is computed.  Now NO_CALL alleles are always excluded.  So ChromosomeCounts(A/.) is 1, the previous result would have been 2.
-- Naming changes for getSamplesNameInOrder()
2011-11-17 20:37:22 -05:00
Mark DePristo 02f22cc9f8 No more VC integration tests. All tests are now unit tests 2011-11-17 15:33:09 -05:00
Mark DePristo 23359d1c6c Bugfix for pruneVariantContext, which was dropping the ref base for padding 2011-11-17 15:32:52 -05:00
Mark DePristo 473b860312 Major determinism fix for UG and RankSumTest
-- Now these routines all iterate in sample name order (genotypes.iterateInSampleNameOrder) so that the results of UG and the annotator do not depend on the particular order of samples we see for the exact model and the RankSumTest
2011-11-17 15:31:45 -05:00
Khalid Shakir c50274e02e During flanking interval creation merging overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice.
Moved flanking interval utilies to IntervalUtils with UnitTests.
2011-11-17 13:56:42 -05:00
Eric Banks bad19779b9 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-17 13:29:43 -05:00
Eric Banks 16a021992b Updated header description for the INFO and FORMAT DP fields to be more accurate. 2011-11-17 13:17:53 -05:00
Eric Banks e7d41d8d33 Minor cleanup 2011-11-17 12:00:28 -05:00
Mark DePristo 7e66677769 Expanded UnitTests for VariantContext
Tests for
-- getGenotype and getGenotypes
-- subContextBySample
-- modify routines
2011-11-16 20:45:15 -05:00
Mauricio Carneiro 72f00e2883 Merging Roger's Unit tests for Reduce Reads from RR repository 2011-11-16 17:26:49 -05:00
Mark DePristo aa0610ea92 GenotypeCollection renamed to GenotypesContext 2011-11-16 16:24:05 -05:00
Mark DePristo 974daaca4d V13 version in archive. Can you pulled out wholesale for performance testing 2011-11-16 16:08:46 -05:00
Mark DePristo caf6080402 Better algorithm for merging genotypes in CombineVariants 2011-11-16 15:17:33 -05:00
Mark DePristo 101ffc4dfd Expanded, contrastive VariantContextBenchmark
-- Compares performance across a bunch of common operations with GATK 1.3 version of VariantContext and GATK 1.4
-- 1.3 VC and associated utilities copied wholesale into test directory under v13
2011-11-16 13:35:16 -05:00
Mark DePristo e56d52006a Continuing bugfixes to get new VC working 2011-11-16 10:39:17 -05:00
Matt Hanna eb8e031f75 Merged bug fix from Stable into Unstable 2011-11-16 09:57:37 -05:00
Matt Hanna 6a5d5e7ac9 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable 2011-11-16 09:57:13 -05:00
Matt Hanna 7ac5cf8430 Getting rid of unsupported CountReadPairs walker in stable. Removal of
remainder of pairs processing framework to follow in unstable.
2011-11-16 09:53:59 -05:00
Eric Banks c2ebe58712 Merge remote-tracking branch 'Laurent/master' 2011-11-16 09:34:47 -05:00
Laurent Francioli 0dc3d20d58 Corrected bug causing PhaseByTransmission to crash in case of new Genotype.Type 2011-11-16 09:33:13 +01:00
Laurent Francioli 7d77fc51f5 Corrected bug causing PhaseByTransmission to crash in case of new Genotype.Type 2011-11-16 03:32:43 -05:00
David Roazen 0d163e3f52 SnpEff 2.0.4 support
-Modified the SnpEff parser to work with the SnpEff 2.0.4 VCF output format
-Assigning functional classes and effect impacts now handled directly
 by SnpEff rather than the GATK
-Removed support for SnpEff 2.0.2, as we no longer trust the output of that
 version since it doesn't exclude effects associated with certain nonsensical
 transcripts. These effects are excluded as of 2.0.4.
-Updated unit and integration tests

This support is based on a *release-candidate* of SnpEff 2.0.4, and so is subject
to change between now and the next GATK release.
2011-11-15 18:36:22 -05:00
Mark DePristo df415da4ab More bug fixes on the way to passing all tests 2011-11-15 17:38:12 -05:00
Mark DePristo 0be23aae4e Bugfixes on way to a working refactored VariantContext 2011-11-15 17:20:14 -05:00
Mark DePristo 231c47c039 Bugfixes on way to a working refactored VariantContext 2011-11-15 16:42:50 -05:00
Laurent Francioli fb685f88ec Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-15 16:23:53 -05:00
Mark DePristo 2b2514dad2 Moved many unused phasing walkers and utilities to archive 2011-11-15 16:14:50 -05:00
Mark DePristo 460a51f473 ID field now stored in the VariantContext itself, not the attributes 2011-11-15 14:56:33 -05:00
Eric Banks 7fada320a9 The right fix for this test is just to delete it. 2011-11-15 14:53:27 -05:00
Eric Banks b45d10e6f1 The DP in the FORMAT field (per sample) must also use the representative count or else it's always 1 for reduced reads. 2011-11-15 10:23:59 -05:00
Mark DePristo 233e581828 Merging in Master 2011-11-15 09:28:24 -05:00
Eric Banks b66556f4a0 Update error message so that it's clear ReadPair Walkers are exceptions 2011-11-15 09:22:57 -05:00
Mark DePristo 6e1a86bc3e Bug fixes to VariantContext and GenotypeCollection 2011-11-15 09:21:30 -05:00
Roger Zurawicki 284430d61d Added more basic UnitTests for ReadClipper
hardClipByReadCoordinatesWorks
hardClipLowQualTailsWorks
2011-11-15 00:13:52 -05:00
Roger Zurawicki 8e91e19229 Merge branch 'master' of ssh://nickel/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-15 00:13:37 -05:00
Mauricio Carneiro cde829899d compress Reduce Read counts bytes by offset
compressed the representation of the reduce reads counts by offset results in 17% average compression in final BAM file size.

Example compression -->

from : 10, 10, 11, 11, 12, 12, 12, 11, 10
to:      10, 0, 1, 1,2, 2, 2, 1, 0
2011-11-14 18:30:24 -05:00
Mark DePristo 4ff8225d78 GenotypeMap -> GenotypeCollection part 3
-- Test code actually builds
2011-11-14 17:51:41 -05:00
Mark DePristo f0234ab67f GenotypeMap -> GenotypeCollection part 2
-- Code actually builds
2011-11-14 17:42:55 -05:00
David Roazen ab0ee9b847 Perform only necessary validation in VariantContext modify methods 2011-11-14 16:49:59 -05:00
Mark DePristo 2e9d5363e7 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-14 15:32:06 -05:00
Mark DePristo 1fbdcb4f43 GenotypeMap -> GenotypeCollection 2011-11-14 15:32:03 -05:00
Eric Banks 4dc9dbe890 One quick fix to previous commit 2011-11-14 14:42:12 -05:00
Eric Banks 7b2a7cfbe7 Transfer headers from the resource VCF when possible when using expressions. While there, VA was modified so that it didn't assume that the ID field was present in the VC's info map in preparation for Mark's upcoming changes. 2011-11-14 14:31:27 -05:00
Mark DePristo 9b5c79b49d Renamed InferredGeneticContext to CommonInfo
-- I have no idea why I named this InferredGeneticContext, a totally meaningless term
-- Renamed to CommonInfo.
-- Made package protected, as no one should use this outside of VariantContext and Genotype
-- UGEngine was using IGC constant, but it's now using the public one in VariantContext.
2011-11-14 14:28:52 -05:00
Mark DePristo 077397cb4b Deleted MutableVariantContext
-- All methods that used this capable now use VariantContext directly instead
2011-11-14 14:19:06 -05:00
Mark DePristo b11c535527 Deleted MutableGenotype
-- This class wasn't really used anywhere, and so removed to control code bloat.
2011-11-14 13:16:36 -05:00
Mark DePristo 79987d685c GenotypeMap contains a Map, not extends it
-- On path to replacing it with GenotypeCollection
2011-11-14 12:55:03 -05:00
Eric Banks 7aee80cd3b Fix to deal with reduced reads containing a deletion 2011-11-14 12:23:46 -05:00
Eric Banks 3d2970453b Misc minor cleanup 2011-11-14 09:41:54 -05:00
Laurent Francioli 1347beef40 Merge branch 'PhaseByTransmission' 2011-11-14 11:31:28 +01:00
Laurent Francioli 6881d4800c Added Integration tests for Phasing by Transmission 2011-11-14 10:47:51 +01:00
Laurent Francioli 34acf8b978 Added Unit tests for new methods in GenotypeLikelihoods 2011-11-14 10:47:02 +01:00
Roger Zurawicki 1202a809cb Added Basic Unit Tests for ReadClipper
Tests some but not all functions
Some tests have been disabled because they are not working
2011-11-13 22:27:49 -05:00
Eric Banks b7c33116af Minor docs update 2011-11-12 23:21:07 -05:00
Eric Banks 76d357be40 Updating docs example to use -L since that's best practice 2011-11-12 23:20:05 -05:00
Mark DePristo fee9b367e4 VariantContext genotypes are now stored as GenotypeMap objects
-- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subset to samples
-- All instances in the gatk that used Map<String, Genotype> now use GenotypeMap type.
-- Amazingly, there were many places where HashMap<String, Genotype> is used, so that the order of the genotypes is technically undefined and could be dangerous.  Now everything uses GenotypeMap with a specific ordering of samples (by name)
-- Integrationtests updated and all pass
2011-11-11 15:00:35 -05:00
Guillermo del Angel cd3146f4cf Add hidden option to ValidationAmplicons to output slightly modified format to make file work with downstream SQNM tools more seamlessly at request of GAP: one line per record, keep probe identifier to 20 characters, no * in ref allele. 2011-11-11 14:07:07 -05:00
Ryan Poplin 40fbeafa37 VQSR will now detect if the negative model failed to converge properly because of having too few data points and automatically retry with more appropriate clustering parameters. 2011-11-11 11:52:30 -05:00
Mark DePristo 4938569b3a More general handling of parameters for VariantContextBenchmark 2011-11-11 10:22:19 -05:00
Mark DePristo ef9f8b5d46 Added subContextOfSamples to VariantContext
-- This is a more convenient accesssor than subContextOfGenotypes, represents nearly all of the use cases of the former function, and potentially can be implemented more efficiently.
2011-11-11 10:07:11 -05:00
Mark DePristo e216e85465 First working version of VariantContextBenchmark 2011-11-11 09:56:00 -05:00
Mark DePristo ee40791776 Attributes are now Map<String,Object> not Map<String,?>
-- Allows us to avoid an unnecessary copy when creating InferredGeneticContext (whose name really needs to change).
2011-11-11 09:55:42 -05:00
Mark DePristo dc9b351b5e Meaningful error message when an IntervalArg file fails to parse correctly 2011-11-10 17:10:26 -05:00
Mark DePristo bb7bf74aa8 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-10 16:05:43 -05:00
Mark DePristo 153e52ffed VariantEvalIntegrationTest for IntervalStratification 2011-11-10 14:10:39 -05:00
Mauricio Carneiro 060c7ce8ae It wouldn't harm integrationtests if we had our logic right... :-) 2011-11-10 14:03:22 -05:00
Eric Banks 39678b6a20 Check for reads with missing read groups and throw a UserException when encountered. Mauricio said this wouldn't break integration tests. 2011-11-10 13:34:45 -05:00
Mark DePristo dd1810140f -stratIntervals is optional 2011-11-10 13:27:32 -05:00
Mark DePristo 67b022c34b Cleanup for new SampleUtils function
-- getVCFHeadersFromRods(rods) is now available so that you don't have getVCFHeadersFromRods(rods, null) throughout the codebase
2011-11-10 13:27:13 -05:00
Mark DePristo 35fe9c8a06 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-10 11:11:33 -05:00
Mark DePristo dc4932f93d VariantEval module to stratify the variants by whether they overlap an interval set
The primary use of this stratification is to provide a mechanism to divide asssessment of a call set up by whether a variant overlaps an interval or not.  I use this to differentiate between variants occurring in CCDS exons vs. those in non-coding regions, in the 1000G call set, using a command line that looks like:

-T VariantEval -R human_g1k_v37.fasta -eval 1000G.vcf -stratIntervals:BED ccds.bed -ST IntervalStratification

Note that the overlap algorithm properly handles symbolic alleles with an INFO field END value.  In order to safely use this module you should provide entire contigs worth of variants, and let the interval strat decide overlap, as opposed to using -L which will not properly work with symbolic variants.

Minor improvements to create() interval in GenomeLocParser.
2011-11-10 10:58:40 -05:00
Mauricio Carneiro 0d8983feee outputting the RG information
setReadGroup now sets the read group attribute for the GATKSAMRecord
2011-11-09 23:35:00 -05:00
Eric Banks 315ac68b0b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-09 22:37:36 -05:00
Eric Banks 6313aae2c4 Adding checks for hasBasePileup() before calling getBasePileup() as per GS thread 2011-11-09 22:37:26 -05:00
Ryan Poplin 74a18d3de8 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-09 22:29:40 -05:00
Ryan Poplin 24712c0221 Merged bug fix from Stable into Unstable 2011-11-09 22:28:27 -05:00
Ryan Poplin 8942406aa2 Use MathUtils to compare doubles instead of testing for equality 2011-11-09 22:05:21 -05:00
Ryan Poplin 348f2db7fd Fix for HMM optimization. If the two penalty arrays match exactly the function should return the end of the array instead of 0. 2011-11-09 22:00:52 -05:00
Eric Banks 82bf09edf3 Mark Standard Annotations with an asterisk 2011-11-09 20:42:31 -05:00
Eric Banks 04b122be29 Fix for bug reported on GetSatisfaction 2011-11-09 20:33:36 -05:00
Mauricio Carneiro d00b2c6599 Adding a synthetic read for filtered data
* Generalized the concept of a synthetic read to cread both running consensus and a synthetic reads of filtered data.
* Synthetic reads can now have deletions (but not insertions)
* New reduced read tag for filtered data synthetic reads *(RF)*
* Sliding window header now keeps information of consensus and filtered data
* Synthetic reads are created simultaneously, new functionality is controlled internally by addToSyntheticReads
2011-11-09 20:16:22 -05:00
Eric Banks 21bf43f3bb Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-09 15:34:40 -05:00
Eric Banks 02d5e3025e Added integration test for intervals from bed file 2011-11-09 15:34:19 -05:00
Christopher Hartl 85bffe1dca Merged bug fix from Stable into Unstable 2011-11-09 15:29:14 -05:00
Christopher Hartl d828eba7f4 Allow comments in a table-formatted file to precede the header line. 2011-11-09 15:27:38 -05:00
Eric Banks 8205efbb29 Merge branch 'master' into intervals 2011-11-09 15:27:15 -05:00
Eric Banks d64f8a89a9 Instead of the SelfScopingFeatureCodec interface, pushed this functionality into Tribble itself. Now we can e.g. determine that a file can be parsed by the BedCodec on the fly. 2011-11-09 15:24:29 -05:00