Laurent Francioli
f49dc5c067
Added functionality to get all children that have both parents (useful when trios are needed)
2011-11-30 14:43:37 +01:00
Laurent Francioli
a4606f9cfe
Merge branch 'MendelianViolation'
...
Conflicts:
public/java/src/org/broadinstitute/sting/utils/MendelianViolation.java
2011-11-30 11:13:15 +01:00
Laurent Francioli
b279ae4ead
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-30 10:10:21 +01:00
Laurent Francioli
7d58db626e
Added MendelianViolationEvaluator integration test
2011-11-30 10:09:20 +01:00
Ryan Poplin
91413cf0d9
Merged bug fix from Stable into Unstable
2011-11-29 14:01:23 -05:00
Ryan Poplin
cb284eebde
Further updating VQSR tutorial wiki docs to reflect the bundle
2011-11-29 14:00:57 -05:00
Ryan Poplin
dcb889665d
Merged bug fix from Stable into Unstable
2011-11-29 09:58:49 -05:00
Ryan Poplin
447e9bff9e
Updating VQSR tutorial wiki docs to reflect the bundle
2011-11-29 09:57:45 -05:00
Ryan Poplin
110298322c
Adding Transmission Disequilibrium Test annotation to VariantAnnotator and integration test to test it.
2011-11-29 09:29:18 -05:00
Laurent Francioli
ab67011791
Corrected bug introduced in the last update and causing no families to be returned by getFamilies in case the samples were not specified
2011-11-29 11:18:15 +01:00
Eric Banks
d7d8b8e380
Tribble v42 changes the Codec.canDecode method to take in a String instead of a File; this is something that Jim was adamant about (because Tribble can handle streams other than files). I didn't want the next person who needed to rev Tribble to deal with this change additionally, so I took care of updating the GATK now.
2011-11-28 14:18:28 -05:00
Laurent Francioli
a09c01fcec
Removed walker argument FamilyStructure as this is now supported by the engine (ped file)
2011-11-28 17:18:11 +01:00
Laurent Francioli
795c99d693
Adapted MendelianViolation to the new ped family representation. Adapted all classes using MendelianViolation too.
...
Added a number of useful metrics on allele transmission and MVs to MendelianViolationEvaluator
2011-11-28 17:13:14 +01:00
Laurent Francioli
e877db8f42
Changed visibility of getSampleDB from protected to public as the sampleDB needs to be accessible from Annotators and Evaluators too.
2011-11-28 17:11:30 +01:00
Laurent Francioli
5c2595701c
Added a function to get families only for a given list of samples.
2011-11-28 17:10:33 +01:00
Mark DePristo
3c36428a20
Bug fix for TiTv calculation -- shouldn't be rounding
2011-11-28 10:20:34 -05:00
Eric Banks
436b4dc855
Updated docs
2011-11-28 08:59:48 -05:00
Laurent Francioli
b1dd632d5d
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
...
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java
2011-11-25 16:16:44 +01:00
Mark DePristo
e60272975a
Fix for changed MD5 in streaming VCF test
2011-11-23 19:01:33 -05:00
Mark DePristo
12f09d88f9
Removing references to SimpleMetricsByAC
2011-11-23 16:08:18 -05:00
Mark DePristo
e319079c32
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-23 13:02:11 -05:00
Mark DePristo
4107636144
VariantEval updates
...
-- Performance optimizations
-- Tables now are cleanly formatted (floats are %.2f printed)
-- VariantSummary is a standard report now
-- Removed CompEvalGenotypes (it didn't do anything)
-- Deleted unused classes in GenotypeConcordance
-- Updates integration tests as appropriate
2011-11-23 13:02:07 -05:00
David Roazen
e5b85f0a78
A toString() method for IntervalBindings
...
Necessary since we're currently writing things like this to our VCF headers:
intervals=[org.broadinstitute.sting.commandline.IntervalBinding@4ce66f56]
2011-11-23 11:56:12 -05:00
Mark DePristo
5a4856b82e
GATKReports now support a format field per column
...
-- You can tell the table to format your object with "%.2f" for example.
2011-11-23 11:31:04 -05:00
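The per-column format idea in the commit above can be illustrated with a small sketch. This is Python for brevity, not the GATKReport Java code; the helper name `format_column` is hypothetical, and it assumes only that each column carries a printf-style format string such as "%.2f".

```python
def format_column(values, fmt="%s"):
    """Render every value in a column with one printf-style format string."""
    return [fmt % v for v in values]

# A float column rendered with two decimals, as in the "%.2f" example.
rendered = format_column([2.0, 2.126, 0.5], "%.2f")
print(rendered)
```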
Mark DePristo
c8bf7d2099
Check for null comment
2011-11-23 10:47:21 -05:00
Mark DePristo
6c2555885c
Caching getSimpleName() in VariantEval is a big performance improvement
...
-- Removed the SimpleMetricsByAC table, as one should just use the AlleleCount Stratification and the upcoming VariantSummary table
2011-11-23 08:34:05 -05:00
Guillermo del Angel
32adbd614f
Solve merge conflict
2011-11-22 22:48:46 -05:00
Guillermo del Angel
941f3784dc
Solve merge conflict
2011-11-22 22:48:03 -05:00
Guillermo del Angel
75d93e6335
Another corner condition fix: skip likelihood computation in case we cut so many bases there's no haplotype or read left
2011-11-22 22:46:12 -05:00
Mark DePristo
a3aef8fa53
Final performance optimization for GenotypesContext
2011-11-22 17:19:30 -05:00
Mark DePristo
990c02e4de
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-22 17:19:11 -05:00
Guillermo del Angel
38a90da92c
Fixed merge conflict to Unstable
2011-11-22 14:39:45 -05:00
Guillermo del Angel
32a77a8a56
Prevent out-of-bounds error in case read span > reference context + indel length. Can happen in RNAseq reads with long N CIGAR operators in the middle.
2011-11-22 13:57:24 -05:00
Eric Banks
5821c11fad
For BAM and Reviewed errors we now check the error message to see if it's actually a 'too many open files' problem and, if so, we generate a User Error instead.
2011-11-22 10:50:22 -05:00
Mark DePristo
7087310373
Embarrassing bug fixed
2011-11-22 10:16:36 -05:00
Mark DePristo
e484625594
GenotypesContext now updates cached data for add, set, replace operations when possible
...
-- Involved separately managing the sample -> offset and sample sorted list operations. This should improve performance throughout the system
2011-11-22 08:40:48 -05:00
Mark DePristo
29ca24694a
UG now encoding NO_CALLs as ./. not ./.:.:4:0,0,0
...
A few updated UGs integration tests
2011-11-22 08:22:32 -05:00
Mark DePristo
2b51c01df4
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-21 19:16:06 -05:00
Mark DePristo
5443d3634a
Again, fixing the add call when we really mean replace
...
-- Updating MD5s for UG to reflect that what was previously called ./.:.:10:0,0,0 is now just ./. Eric will fix long-standing bug in QD observed from this change
-- VFW MD5s restored to their old correct values. There was a bug in my implementation that caused the genotypes to not be parsed from the lazy output even though the header was incorrect.
2011-11-21 19:15:56 -05:00
Mauricio Carneiro
5ad3dfcd62
BugFix: byte overflow in SyntheticRead compressed base counts
...
* fixed and added unit test
2011-11-21 17:11:50 -05:00
Mark DePristo
9ea7b70a02
Added decode method to LazyGenotypesContext
...
-- AbstractVCFCodec calls this if the samples are not sorted. Previously called getGenotypes() which didn't actually trigger the decode
2011-11-21 16:21:23 -05:00
Mark DePristo
ab2efe3bd3
Reverting bad exact model changes
2011-11-21 16:14:40 -05:00
Eric Banks
44554b2bfd
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-21 15:01:45 -05:00
Eric Banks
022832bd74
Very bad use of the == operator with Strings was ensuring that validating GenomeLocs was very inefficient. This fix resulted in a significant speedup for a simple RodWalker.
2011-11-21 14:49:47 -05:00
Mark DePristo
1561af22af
Exact model code cleanup
...
-- Fixed up code when fixing a bug detected by aggressive contracts in GenotypesContext.
2011-11-21 14:35:15 -05:00
Mark DePristo
2c501364b8
GenotypesContext no longer has immutability in constructor
...
-- additional bug fixes throughout VariantContext and GenotypesContext objects
2011-11-21 14:34:31 -05:00
David Roazen
1296dd41be
Removing the legacy -L "interval1;interval2" syntax
...
This syntax predates the ability to have multiple -L arguments, is
inconsistent with the syntax of all other GATK arguments, requires
quoting to avoid interpretation by the shell, and was causing
problems in Queue.
A UserException is now thrown if someone tries to use this syntax.
2011-11-21 13:18:53 -05:00
Mark DePristo
e467b8e1ae
More contracts on LazyGenotypesContext
2011-11-21 09:34:57 -05:00
Mark DePristo
2e9ecf639e
Generalized interface to LazyGenotypesContext
...
-- Now you provide a LazyParsing object
-- LazyGenotypesContext now knows nothing about the VCF parser itself. The parser holds all of the necessary data to parse the VCF genotypes when necessary, and the LGC only has a pointer to this object
-- Using new interface added LazyGenotypesContext to unit tests with a simple lazy version
-- Deleted VCFParser interface, as it was no longer necessary
2011-11-21 09:30:40 -05:00
Mark DePristo
f0ac588d32
Extensive unit test for GenotypeContextUnitTest
...
-- Currently only tests base class. Adding subclass testing in a bit
2011-11-20 18:28:01 -05:00
Mark DePristo
bc44f6fd9e
Utility function Collection<Genotype> -> Collection<String>
2011-11-20 18:26:56 -05:00
Mark DePristo
9445326c6c
Genotype is Comparable via sampleName
2011-11-20 18:26:27 -05:00
Mark DePristo
f9e25081ab
Completely documented LazyGenotypesContext
2011-11-20 08:35:52 -05:00
Mark DePristo
9cb3fe3a59
Vastly better way of doing on-demand genotyping loading
...
-- With our GenotypesContext class we can naturally create a LazyGenotypesContext subclass that does the on-demand loading.
-- This new class replaced all of the old, complex functionality
-- Better still, there were many cases where the genotypes were being loaded unnecessarily, resulting in inefficiency. This was detected because some of the integration tests changed as the genotypes were no longer being parsed unnecessarily
-- Misc. bug fixes throughout the system
-- Bug fixes for PhaseByTransmission with new GenotypesContext
2011-11-20 08:23:09 -05:00
Mark DePristo
f392d330c3
Proper use of builder. Previous conversion attempt was flawed
2011-11-19 22:09:56 -05:00
Mark DePristo
7d09c0064b
Bug fixes and code cleanup throughout
...
-- chromosomeCounts now takes builder as well, cleaning up a lot of code throughout the codebase.
2011-11-19 18:40:15 -05:00
Mark DePristo
707bd30b3f
Should have been @BeforeMethod
2011-11-19 16:10:09 -05:00
Mark DePristo
8f7eebbaaf
Bugfix for pError not being checked correctly in CommonInfo
...
-- UnitTests to ensure correct behavior
-- UnitTests to ensure correct behavior for pass filters vs. failed filters vs. unfiltered
2011-11-19 15:58:59 -05:00
Mark DePristo
b7b57ef39a
Updating MD5 to reflect canonical ordering of calculation
...
-- We should no longer have md5s changing because of hashmaps changing their sort order on us
-- Added GenotypeLikelihoodsUnitTests
-- Refactored ExactAFCaclculation to put the PL -> QUAL calculation in the GenotypeLikelihoods class to avoid the code copy.
2011-11-19 15:57:33 -05:00
Mark DePristo
73119c8e3c
Merge with master
...
-- A few bug fixes
2011-11-19 09:56:06 -05:00
Mark DePristo
f685fff79b
Killing the final versions of old new VariantContext interface
2011-11-18 21:32:43 -05:00
Mark DePristo
6cf315e17b
Change interface to getNegLog10PError to getLog10PError
2011-11-18 21:07:30 -05:00
Mark DePristo
c7f2d5c7c7
Final minor fix to contract
2011-11-18 19:40:05 -05:00
Mauricio Carneiro
b5de182014
isEmpty now checks if mReadBases is null
...
Since newly created reads have mReadBases == null. This is an effort to centralize the place to check for empty GATKSAMRecords.
2011-11-18 18:34:05 -05:00
Mauricio Carneiro
8ab3ee9c65
Merge remote-tracking branch 'unstable/master' into rr
2011-11-18 16:50:25 -05:00
Mauricio Carneiro
333e5de812
returning read instead of GATKSAMRecord
...
Do not create new GATKSAMRecord when read has been fully clipped, because it is essentially the same as returning the currently fully clipped read.
2011-11-18 16:49:59 -05:00
Matt Hanna
8bb4d4dca3
First pass of the asynchronous block loader.
...
Block loads are only triggered on queue empty at this point. Disabled by
default (enable with nt:io=?).
2011-11-18 15:02:59 -05:00
Mark DePristo
a2e79fbe8a
Fixes to contracts
2011-11-18 14:18:53 -05:00
Mark DePristo
660d6009a2
Documentation and contracts for GenotypesContext and VariantContextBuilder
2011-11-18 13:59:30 -05:00
Mark DePristo
f54afc19b4
VariantContextBuilder
...
-- New approach to making VariantContexts modeled on StringBuilder
-- No more modify routines -- use VariantContextBuilder
-- Renamed isPolymorphic to isPolymorphicInSamples. Same for mono
-- getChromosomeCount -> getCalledChrCount
-- Walkers changed to use new VariantContext. Some deprecated new VariantContext calls remain
-- VCFCodec now uses optimized cached information to create GenotypesContext.
2011-11-18 12:39:10 -05:00
Eric Banks
6459784351
Merged bug fix from Stable into Unstable
2011-11-18 12:34:57 -05:00
Eric Banks
c62082ba1b
Making this class public again as per request from Cancer folks
2011-11-18 12:34:27 -05:00
Eric Banks
8710673a97
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-18 12:29:33 -05:00
Eric Banks
768b27322b
I figured out why we were getting tons of hom var genotype calls with Mauricio's low quality (synthetic) reduced reads: the RR implementation in the UG was not capping the base quality by the mapping quality, so all the low quality reads were used to generate GLs. Fixed.
2011-11-18 12:29:15 -05:00
Mark DePristo
7490dbb6eb
First version of VariantContextBuilder
2011-11-18 11:06:15 -05:00
Roger Zurawicki
f48d4cfa79
Bug fix: fully clipping GATKSAMRecords and flushing ops
...
Reads that are emptied after clipping become new GATKSAMRecords.
When applying ClippingOps, the ops are cleared after the clipping
2011-11-18 00:24:39 -05:00
Mark DePristo
fa454c88bb
UnitTests for VariantContext for chrCount, getSampleNames, Order function
...
-- Major change to how chromosomeCounts is computed. Now NO_CALL alleles are always excluded, so ChromosomeCounts(A/.) is 1; the previous result would have been 2.
-- Naming changes for getSamplesNameInOrder()
2011-11-17 20:37:22 -05:00
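The chromosome-count rule described in the commit above can be sketched as follows. This is an illustrative Python sketch, not the VariantContext Java code; the function name `called_chromosome_count` and the tuple-of-alleles genotype representation are assumptions made for the example.

```python
NO_CALL = "."  # assumed marker for a no-call allele

def called_chromosome_count(genotypes):
    """Count alleles across all genotypes, skipping NO_CALL alleles."""
    return sum(1 for alleles in genotypes for allele in alleles if allele != NO_CALL)

# A/. contributes one called chromosome, not two.
half_called = called_chromosome_count([("A", ".")])
print(half_called)
```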
Mark DePristo
02f22cc9f8
No more VC integration tests. All tests are now unit tests
2011-11-17 15:33:09 -05:00
Mark DePristo
23359d1c6c
Bugfix for pruneVariantContext, which was dropping the ref base for padding
2011-11-17 15:32:52 -05:00
Mark DePristo
473b860312
Major determinism fix for UG and RankSumTest
...
-- Now these routines all iterate in sample name order (genotypes.iterateInSampleNameOrder) so that the results of UG and the annotator do not depend on the particular order of samples we see for the exact model and the RankSumTest
2011-11-17 15:31:45 -05:00
Khalid Shakir
c50274e02e
During flanking interval creation, merge overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice.
...
Moved flanking interval utilities to IntervalUtils with UnitTests.
2011-11-17 13:56:42 -05:00
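Merging overlapping flanks, as described in the commit above, amounts to a standard sort-and-merge pass over intervals. The sketch below is Python and does not reproduce the IntervalUtils code; the function name `merge_overlapping` and the (contig, start, stop) closed-interval tuples are assumptions for illustration.

```python
def merge_overlapping(intervals):
    """Merge (contig, start, stop) closed intervals that overlap on the same contig."""
    merged = []
    # Tuple sort orders by contig name (lexicographically), then start, then stop.
    for contig, start, stop in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], stop))
        else:
            merged.append((contig, start, stop))
    return merged

flanks = [("1", 100, 120), ("1", 90, 110), ("1", 200, 210), ("2", 100, 120)]
merged_flanks = merge_overlapping(flanks)
print(merged_flanks)
```

After merging, no position is covered by more than one interval, so a scattered job list built from the result would not visit the same site twice.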
Eric Banks
bad19779b9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-17 13:29:43 -05:00
Eric Banks
16a021992b
Updated header description for the INFO and FORMAT DP fields to be more accurate.
2011-11-17 13:17:53 -05:00
Eric Banks
e7d41d8d33
Minor cleanup
2011-11-17 12:00:28 -05:00
Mark DePristo
7e66677769
Expanded UnitTests for VariantContext
...
Tests for
-- getGenotype and getGenotypes
-- subContextBySample
-- modify routines
2011-11-16 20:45:15 -05:00
Mauricio Carneiro
72f00e2883
Merging Roger's Unit tests for Reduce Reads from RR repository
2011-11-16 17:26:49 -05:00
Mark DePristo
aa0610ea92
GenotypeCollection renamed to GenotypesContext
2011-11-16 16:24:05 -05:00
Mark DePristo
974daaca4d
V13 version in archive. Can be pulled out wholesale for performance testing
2011-11-16 16:08:46 -05:00
Mark DePristo
caf6080402
Better algorithm for merging genotypes in CombineVariants
2011-11-16 15:17:33 -05:00
Mark DePristo
101ffc4dfd
Expanded, contrastive VariantContextBenchmark
...
-- Compares performance across a bunch of common operations with GATK 1.3 version of VariantContext and GATK 1.4
-- 1.3 VC and associated utilities copied wholesale into test directory under v13
2011-11-16 13:35:16 -05:00
Mark DePristo
e56d52006a
Continuing bugfixes to get new VC working
2011-11-16 10:39:17 -05:00
Matt Hanna
eb8e031f75
Merged bug fix from Stable into Unstable
2011-11-16 09:57:37 -05:00
Matt Hanna
6a5d5e7ac9
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable
2011-11-16 09:57:13 -05:00
Matt Hanna
7ac5cf8430
Getting rid of unsupported CountReadPairs walker in stable. Removal of
...
remainder of pairs processing framework to follow in unstable.
2011-11-16 09:53:59 -05:00
Eric Banks
c2ebe58712
Merge remote-tracking branch 'Laurent/master'
2011-11-16 09:34:47 -05:00
Laurent Francioli
0dc3d20d58
Corrected bug causing PhaseByTransmission to crash in case of new Genotype.Type
2011-11-16 09:33:13 +01:00
Laurent Francioli
7d77fc51f5
Corrected bug causing PhaseByTransmission to crash in case of new Genotype.Type
2011-11-16 03:32:43 -05:00
David Roazen
0d163e3f52
SnpEff 2.0.4 support
...
-Modified the SnpEff parser to work with the SnpEff 2.0.4 VCF output format
-Assigning functional classes and effect impacts now handled directly
by SnpEff rather than the GATK
-Removed support for SnpEff 2.0.2, as we no longer trust the output of that
version since it doesn't exclude effects associated with certain nonsensical
transcripts. These effects are excluded as of 2.0.4.
-Updated unit and integration tests
This support is based on a *release-candidate* of SnpEff 2.0.4, and so is subject
to change between now and the next GATK release.
2011-11-15 18:36:22 -05:00
Mark DePristo
df415da4ab
More bug fixes on the way to passing all tests
2011-11-15 17:38:12 -05:00
Mark DePristo
0be23aae4e
Bugfixes on way to a working refactored VariantContext
2011-11-15 17:20:14 -05:00
Mark DePristo
231c47c039
Bugfixes on way to a working refactored VariantContext
2011-11-15 16:42:50 -05:00
Laurent Francioli
fb685f88ec
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-15 16:23:53 -05:00
Mark DePristo
2b2514dad2
Moved many unused phasing walkers and utilities to archive
2011-11-15 16:14:50 -05:00
Mark DePristo
460a51f473
ID field now stored in the VariantContext itself, not the attributes
2011-11-15 14:56:33 -05:00
Eric Banks
7fada320a9
The right fix for this test is just to delete it.
2011-11-15 14:53:27 -05:00
Eric Banks
b45d10e6f1
The DP in the FORMAT field (per sample) must also use the representative count or else it's always 1 for reduced reads.
2011-11-15 10:23:59 -05:00
Mark DePristo
233e581828
Merging in Master
2011-11-15 09:28:24 -05:00
Eric Banks
b66556f4a0
Update error message so that it's clear ReadPair Walkers are exceptions
2011-11-15 09:22:57 -05:00
Mark DePristo
6e1a86bc3e
Bug fixes to VariantContext and GenotypeCollection
2011-11-15 09:21:30 -05:00
Roger Zurawicki
284430d61d
Added more basic UnitTests for ReadClipper
...
hardClipByReadCoordinatesWorks
hardClipLowQualTailsWorks
2011-11-15 00:13:52 -05:00
Roger Zurawicki
8e91e19229
Merge branch 'master' of ssh://nickel/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-15 00:13:37 -05:00
Mauricio Carneiro
cde829899d
compress Reduce Read counts bytes by offset
...
Compressing the representation of the reduce reads counts by offset results in 17% average compression in final BAM file size.
Example compression -->
from: 10, 10, 11, 11, 12, 12, 12, 11, 10
to: 10, 0, 1, 1, 2, 2, 2, 1, 0
2011-11-14 18:30:24 -05:00
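The from/to example in the commit above suggests that the first count is stored as-is and each later count is stored as its difference from that first count. Under that assumption, a minimal Python sketch (not the Reduce Reads Java code; both function names are hypothetical) looks like this:

```python
def compress_counts(counts):
    """Keep the first count; store every later count as an offset from it."""
    if not counts:
        return []
    base = counts[0]
    return [base] + [c - base for c in counts[1:]]

def decompress_counts(encoded):
    """Invert compress_counts by adding the first value back to each offset."""
    if not encoded:
        return []
    base = encoded[0]
    return [base] + [base + d for d in encoded[1:]]

original = [10, 10, 11, 11, 12, 12, 12, 11, 10]
encoded = compress_counts(original)
print(encoded)
```

The offsets cluster around zero, which is presumably what makes the resulting bytes compress better in the final BAM file.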
Mark DePristo
4ff8225d78
GenotypeMap -> GenotypeCollection part 3
...
-- Test code actually builds
2011-11-14 17:51:41 -05:00
Mark DePristo
f0234ab67f
GenotypeMap -> GenotypeCollection part 2
...
-- Code actually builds
2011-11-14 17:42:55 -05:00
David Roazen
ab0ee9b847
Perform only necessary validation in VariantContext modify methods
2011-11-14 16:49:59 -05:00
Mark DePristo
2e9d5363e7
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-14 15:32:06 -05:00
Mark DePristo
1fbdcb4f43
GenotypeMap -> GenotypeCollection
2011-11-14 15:32:03 -05:00
Eric Banks
4dc9dbe890
One quick fix to previous commit
2011-11-14 14:42:12 -05:00
Eric Banks
7b2a7cfbe7
Transfer headers from the resource VCF when possible when using expressions. While there, VA was modified so that it didn't assume that the ID field was present in the VC's info map in preparation for Mark's upcoming changes.
2011-11-14 14:31:27 -05:00
Mark DePristo
9b5c79b49d
Renamed InferredGeneticContext to CommonInfo
...
-- I have no idea why I named this InferredGeneticContext, a totally meaningless term
-- Renamed to CommonInfo.
-- Made package protected, as no one should use this outside of VariantContext and Genotype
-- UGEngine was using IGC constant, but it's now using the public one in VariantContext.
2011-11-14 14:28:52 -05:00
Mark DePristo
077397cb4b
Deleted MutableVariantContext
...
-- All methods that used this capability now use VariantContext directly instead
2011-11-14 14:19:06 -05:00
Mark DePristo
b11c535527
Deleted MutableGenotype
...
-- This class wasn't really used anywhere, and so removed to control code bloat.
2011-11-14 13:16:36 -05:00
Mark DePristo
79987d685c
GenotypeMap contains a Map, not extends it
...
-- On path to replacing it with GenotypeCollection
2011-11-14 12:55:03 -05:00
Eric Banks
7aee80cd3b
Fix to deal with reduced reads containing a deletion
2011-11-14 12:23:46 -05:00
Eric Banks
3d2970453b
Misc minor cleanup
2011-11-14 09:41:54 -05:00
Laurent Francioli
1347beef40
Merge branch 'PhaseByTransmission'
2011-11-14 11:31:28 +01:00
Laurent Francioli
6881d4800c
Added Integration tests for Phasing by Transmission
2011-11-14 10:47:51 +01:00
Laurent Francioli
34acf8b978
Added Unit tests for new methods in GenotypeLikelihoods
2011-11-14 10:47:02 +01:00
Roger Zurawicki
1202a809cb
Added Basic Unit Tests for ReadClipper
...
Tests some but not all functions
Some tests have been disabled because they are not working
2011-11-13 22:27:49 -05:00
Eric Banks
b7c33116af
Minor docs update
2011-11-12 23:21:07 -05:00
Eric Banks
76d357be40
Updating docs example to use -L since that's best practice
2011-11-12 23:20:05 -05:00
Mark DePristo
fee9b367e4
VariantContext genotypes are now stored as GenotypeMap objects
...
-- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subset to samples
-- All instances in the gatk that used Map<String, Genotype> now use GenotypeMap type.
-- Amazingly, there were many places where HashMap<String, Genotype> is used, so that the order of the genotypes is technically undefined and could be dangerous. Now everything uses GenotypeMap with a specific ordering of samples (by name)
-- Integrationtests updated and all pass
2011-11-11 15:00:35 -05:00
Guillermo del Angel
cd3146f4cf
Add hidden option to ValidationAmplicons to output slightly modified format to make file work with downstream SQNM tools more seamlessly at request of GAP: one line per record, keep probe identifier to 20 characters, no * in ref allele.
2011-11-11 14:07:07 -05:00
Ryan Poplin
40fbeafa37
VQSR will now detect if the negative model failed to converge properly because of having too few data points and automatically retry with more appropriate clustering parameters.
2011-11-11 11:52:30 -05:00
Mark DePristo
4938569b3a
More general handling of parameters for VariantContextBenchmark
2011-11-11 10:22:19 -05:00
Mark DePristo
ef9f8b5d46
Added subContextOfSamples to VariantContext
...
-- This is a more convenient accessor than subContextOfGenotypes, represents nearly all of the use cases of the former function, and potentially can be implemented more efficiently.
2011-11-11 10:07:11 -05:00
Mark DePristo
e216e85465
First working version of VariantContextBenchmark
2011-11-11 09:56:00 -05:00
Mark DePristo
ee40791776
Attributes are now Map<String,Object> not Map<String,?>
...
-- Allows us to avoid an unnecessary copy when creating InferredGeneticContext (whose name really needs to change).
2011-11-11 09:55:42 -05:00
Mark DePristo
dc9b351b5e
Meaningful error message when an IntervalArg file fails to parse correctly
2011-11-10 17:10:26 -05:00
Mark DePristo
bb7bf74aa8
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-10 16:05:43 -05:00
Mark DePristo
153e52ffed
VariantEvalIntegrationTest for IntervalStratification
2011-11-10 14:10:39 -05:00
Mauricio Carneiro
060c7ce8ae
It wouldn't harm integrationtests if we had our logic right... :-)
2011-11-10 14:03:22 -05:00
Eric Banks
39678b6a20
Check for reads with missing read groups and throw a UserException when encountered. Mauricio said this wouldn't break integration tests.
2011-11-10 13:34:45 -05:00
Mark DePristo
dd1810140f
-stratIntervals is optional
2011-11-10 13:27:32 -05:00
Mark DePristo
67b022c34b
Cleanup for new SampleUtils function
...
-- getVCFHeadersFromRods(rods) is now available so that you don't have getVCFHeadersFromRods(rods, null) throughout the codebase
2011-11-10 13:27:13 -05:00
Mark DePristo
35fe9c8a06
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-10 11:11:33 -05:00
Mark DePristo
dc4932f93d
VariantEval module to stratify the variants by whether they overlap an interval set
...
The primary use of this stratification is to provide a mechanism to divide assessment of a call set up by whether a variant overlaps an interval or not. I use this to differentiate between variants occurring in CCDS exons vs. those in non-coding regions, in the 1000G call set, using a command line that looks like:
-T VariantEval -R human_g1k_v37.fasta -eval 1000G.vcf -stratIntervals:BED ccds.bed -ST IntervalStratification
Note that the overlap algorithm properly handles symbolic alleles with an INFO field END value. In order to safely use this module you should provide entire contigs worth of variants, and let the interval strat decide overlap, as opposed to using -L which will not properly work with symbolic variants.
Minor improvements to create() interval in GenomeLocParser.
2011-11-10 10:58:40 -05:00
Mauricio Carneiro
0d8983feee
outputting the RG information
...
setReadGroup now sets the read group attribute for the GATKSAMRecord
2011-11-09 23:35:00 -05:00
Eric Banks
315ac68b0b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-09 22:37:36 -05:00
Eric Banks
6313aae2c4
Adding checks for hasBasePileup() before calling getBasePileup() as per GS thread
2011-11-09 22:37:26 -05:00
Ryan Poplin
74a18d3de8
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-09 22:29:40 -05:00
Ryan Poplin
24712c0221
Merged bug fix from Stable into Unstable
2011-11-09 22:28:27 -05:00
Ryan Poplin
8942406aa2
Use MathUtils to compare doubles instead of testing for equality
2011-11-09 22:05:21 -05:00
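The reason for the commit above is that exact equality on floating-point values is fragile. The sketch below shows the general idea in Python; it does not reproduce the MathUtils method used in the GATK, and the helper name `nearly_equal` and its default tolerance are assumptions for illustration.

```python
def nearly_equal(a, b, epsilon=1e-6):
    """Treat two doubles as equal when they differ by at most epsilon."""
    return abs(a - b) <= epsilon

total = 0.1 + 0.2
exact_match = (total == 0.3)           # exact comparison is defeated by rounding error
tolerant_match = nearly_equal(total, 0.3)
print(exact_match, tolerant_match)
```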
Ryan Poplin
348f2db7fd
Fix for HMM optimization. If the two penalty arrays match exactly the function should return the end of the array instead of 0.
2011-11-09 22:00:52 -05:00
Eric Banks
82bf09edf3
Mark Standard Annotations with an asterisk
2011-11-09 20:42:31 -05:00
Eric Banks
04b122be29
Fix for bug reported on GetSatisfaction
2011-11-09 20:33:36 -05:00
Mauricio Carneiro
d00b2c6599
Adding a synthetic read for filtered data
...
* Generalized the concept of a synthetic read to create both a running consensus and a synthetic read of filtered data.
* Synthetic reads can now have deletions (but not insertions)
* New reduced read tag for filtered data synthetic reads *(RF)*
* Sliding window header now keeps information of consensus and filtered data
* Synthetic reads are created simultaneously, new functionality is controlled internally by addToSyntheticReads
2011-11-09 20:16:22 -05:00
Eric Banks
21bf43f3bb
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-09 15:34:40 -05:00
Eric Banks
02d5e3025e
Added integration test for intervals from bed file
2011-11-09 15:34:19 -05:00
Christopher Hartl
85bffe1dca
Merged bug fix from Stable into Unstable
2011-11-09 15:29:14 -05:00
Christopher Hartl
d828eba7f4
Allow comments in a table-formatted file to precede the header line.
2011-11-09 15:27:38 -05:00
Eric Banks
8205efbb29
Merge branch 'master' into intervals
2011-11-09 15:27:15 -05:00
Eric Banks
d64f8a89a9
Instead of the SelfScopingFeatureCodec interface, pushed this functionality into Tribble itself. Now we can e.g. determine that a file can be parsed by the BedCodec on the fly.
2011-11-09 15:24:29 -05:00
Mauricio Carneiro
f080f64f99
Preserve RG information on new GATKSAMRecord from SAMRecord
2011-11-09 14:39:20 -05:00
Mauricio Carneiro
f9530e0768
Clean unnecessary attributes from the read
...
this gives on average 40% file size reduction.
2011-11-09 14:39:20 -05:00
Mauricio Carneiro
9427ada498
Fixing no cigar bug
...
empty GATKSAMRecords will have a null cigar. Treat them accordingly.
2011-11-09 14:39:20 -05:00
Mark DePristo
e639f0798e
mergeEvals allows you to treat -eval 1.vcf -eval 2.vcf as a single call set
...
-- A bit of code cleanup in VCFUtils
-- VariantEval table to create 1000G Phase I variant summary table
-- First version of 1000G Phase I summary table Qscript
2011-11-09 14:35:50 -05:00
Christopher Hartl
149b79eaad
Merged bug fix from Stable into Unstable
2011-11-09 11:26:30 -05:00
Christopher Hartl
11abb4f9d1
Better error message.
2011-11-09 11:25:28 -05:00
Christopher Hartl
d3a533b82e
Revert "a"
...
This reverts commit 1175f50ddbf389f5da74d27dc725596582ae15af.
2011-11-09 11:22:26 -05:00
Christopher Hartl
5eaf800281
a
2011-11-09 11:22:20 -05:00
Christopher Hartl
5451fbc2b2
Merged bug fix from Stable into Unstable
2011-11-09 11:06:15 -05:00
Christopher Hartl
091229e4db
MVLikelihoodRatio now checks if the family string is provided before attempting to instantiate. Also check that variant contexts have both genotypes and genotype likelihoods.
...
Table codec now yells at users for not providing a HEADER with the table - parsing tables without a header line was causing the first line of the file to be eaten.
Table feature now has a toString method.
These are minor bug fixes.
2011-11-09 11:03:29 -05:00
Mauricio Carneiro
e1b4c3968f
Fixing GATKSAMRecord bug
...
when constructing a GATKSAMRecord from scratch, we should set "mRestOfBinaryData" to null so the BAMRecord doesn't try to retrieve missing information from the non-existent bam file.
2011-11-08 16:50:36 -05:00
Ryan Poplin
e973ca2010
fixing merge conflict.
2011-11-08 14:55:05 -05:00
Ryan Poplin
b0e6afec48
Bug fix for HMM optimization. Need to also check the gap continuation penalty array for the index with the first discrepancy.
2011-11-08 14:51:25 -05:00
Laurent Francioli
571c724cfd
Added reporting of the number of genotypes updated.
2011-11-08 15:15:51 +01:00
Ryan Poplin
94dc447a70
Merged bug fix from Stable into Unstable
2011-11-07 15:26:35 -05:00
Ryan Poplin
0b181be61f
Bug fix in SelectVariants when using a discordance track but no sample specifications. Added integration test to test this.
2011-11-07 15:25:16 -05:00
Ryan Poplin
0534149708
Merged bug fix from Stable into Unstable
2011-11-07 14:07:08 -05:00
Ryan Poplin
2d1e385ca4
Adding note to VQSR docs about Rscript being needed in the environment PATH.
2011-11-07 14:04:13 -05:00
Eric Banks
759f4fe6b8
Moving unclaimed walker with bad integration test to archive
2011-11-07 13:16:38 -05:00
Eric Banks
c1986b6335
Add notes to the GATKdocs as to when a particular annotation can/cannot be calculated.
2011-11-07 11:06:19 -05:00
Eric Banks
724e3f3b0d
Merged bug fix from Stable into Unstable
2011-11-06 22:23:22 -05:00
Eric Banks
cdd40d1222
Removing contracts for the SimpleTimer
2011-11-06 22:22:49 -05:00
Ryan Poplin
5c565d28b9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-06 10:26:19 -05:00
Eric Banks
3517489a22
Better --sample selection integration test for VE. The previous one would return true even if --sample was not working at all.
2011-11-06 01:07:49 -04:00
Eric Banks
1c4e429a1c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-06 00:05:56 -04:00
Eric Banks
a12bc63e5c
Get rid of support for bams without sample information in the read groups. This hidden option wasn't being used anyways because it wasn't hooked up properly in the AlignmentContext.
2011-11-05 23:54:28 -04:00
Eric Banks
ad57bcd693
Adding integration test to cover using expressions with IDs (-E foo.ID)
2011-11-05 23:53:15 -04:00
Eric Banks
90a053ea93
Don't change the mapping quality of MQ=255 reads in IR
2011-11-05 22:40:45 -04:00
Ryan Poplin
611a395783
Now properly extending candidate haplotypes with bases from the reference context instead of filling with padding bases. Functionality in the private Haplotype class is no longer necessary so removing it. No need to have four different Haplotype classes in the GATK.
2011-11-05 12:18:56 -04:00
Mark DePristo
e99871f587
Bug fix for decode loc
...
-- decodeLoc() wasn't skipping input header lines, so the system blew up when there was an = line being split.
2011-11-04 13:20:54 -04:00
Mark DePristo
a340a1aeac
Bug fix. decodeLoc() should update lineNo so you get meaningful line no when indexing
...
due to malformed VCF files.
2011-11-04 11:44:24 -04:00
Mark DePristo
9f260c0dc1
Zero byte index bug fix for RandomlySplitVariants + cleanup
...
-- vcfWriter2 was never being closed in onTraversalDone(), so the on-the-fly index file was being created but never actually properly written to the file.
-- This bug is ultimately due to the inability of the GATK to allow multiple VCF output writers as @Output arguments, though
-- Removed the unnecessary local variable iFraction (= 1000 * the input fraction argument). Now the system just uses a double random number and compares it to the input fraction. Is there some subtle reason I don't appreciate for this programming construct?
2011-11-04 09:45:20 -04:00
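A minimal sketch of the simplified split described in the commit above, assuming each record goes to the first output when a uniform random double falls below the input fraction. The names `FractionSplitSketch`, `goesToFirstOutput` and `countFirst` are hypothetical and are not taken from RandomlySplitVariants.

```java
import java.util.Random;

// Hypothetical illustration only; not the RandomlySplitVariants source.
class FractionSplitSketch {
    private final double fraction; // share of records sent to the first output, in [0, 1]
    private final Random random;

    FractionSplitSketch(double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        this.fraction = fraction;
        this.random = new Random(seed);
    }

    // nextDouble() is uniform in [0, 1), so it can be compared to the fraction
    // directly; no scaled integer copy of the fraction is needed.
    boolean goesToFirstOutput() {
        return random.nextDouble() < fraction;
    }

    // Counts how many of n simulated records would be routed to the first output.
    static int countFirst(double fraction, long seed, int n) {
        FractionSplitSketch splitter = new FractionSplitSketch(fraction, seed);
        int first = 0;
        for (int i = 0; i < n; i++) {
            if (splitter.goesToFirstOutput()) {
                first++;
            }
        }
        return first;
    }
}
```

Because `nextDouble()` never returns 1.0, a fraction of 1.0 routes every record to the first output and a fraction of 0.0 routes none.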
Mauricio Carneiro
e89ff063fc
GATKSAMRecord refactor
...
The GATK engine will now provide a GATKSAMRecord to all tools which incorporates the functionality used by the GATK to the bam file (ReadGroups, Reduced Reads, ...).
* No tools should create SAMRecord anymore, use GATKSAMRecord instead *
2011-11-03 15:43:26 -04:00
Laurent Francioli
385a6abec1
Fixed a bug that wrongly swapped the mother and father genotypes when the child genotype was missing.
2011-11-03 13:04:53 +01:00
Laurent Francioli
893787de53
Functions getAsMap and getNegLog10GQ now handle missing genotype case.
2011-11-03 13:04:11 +01:00
Eric Banks
e8bceb1eaa
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-02 21:13:54 -04:00
Eric Banks
78a00d2ddc
Updating UG integration tests (needed updating only because the -mbq default is different from the old -mmq one).
2011-11-02 21:13:44 -04:00
Eric Banks
52b16bf739
Must check whether there's a normal vs. extended pileup before asking for it.
2011-11-02 20:45:24 -04:00
Eric Banks
e1edd6bd12
Removing the min mapping quality argument since it wasn't being used in the normal processing of the pileups in UG - only for indel pileups. Instead, we apply the min base quality to the reads in the pileup for indels and define it to be the min 'confidence' of the base. Docs are updated but I didn't rename the argument as I don't want people to complain.
2011-11-02 20:32:58 -04:00
Ryan Poplin
e94fcf537b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-02 16:29:19 -04:00
Ryan Poplin
4d35272916
Bug fixes with Mauricio to functions in ReadUtils used by reduced reads and the haplotype caller.
2011-11-02 16:29:10 -04:00
Mark DePristo
8a2929c1dd
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-02 16:21:00 -04:00
Laurent Francioli
19ad5b635a
- Calculation of parent/child pairs corrected
...
- Separated the reporting of single and double mendelian violations in trios
2011-11-02 18:35:31 +01:00
Eric Banks
967ff647b8
Reduced reads shouldn't contribute to Fisher Strand calculations
2011-11-02 13:07:20 -04:00
Eric Banks
cf0e699226
QualByDepth was inefficiently iterating over the pileup 2 times for some reason. Removed non-useful annotation classes.
2011-11-02 12:58:38 -04:00
Eric Banks
4501dce58d
Fixing merge conflict
2011-11-02 12:50:32 -04:00
Eric Banks
54331b44e9
New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths.
2011-11-02 12:47:30 -04:00
Mark DePristo
392e0aeace
Moved unit tests into master IntervalUtilsUnitTest
2011-11-02 10:52:00 -04:00
Mark DePristo
c2b97030a4
IntervalUtils for completely balanced locus-based scatter/gather
...
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
2011-11-02 10:49:40 -04:00
Laurent Francioli
119ca7d742
Fixed a bug in parent/child pairs reporting causing a crash in case the -mvf option was used and mother was not provided
2011-11-02 08:22:33 +01:00
Laurent Francioli
b91a9c4711
- Fixed parent/child pairs handling (was crashing before)
...
- Added parent/child pair reporting
2011-11-02 08:04:01 +01:00
Mark DePristo
5fc613f972
Better default partition types for walkers
...
-- Added PartitionType.READ, and associated ReadScatterFunction. ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better
-- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.
2011-11-01 19:47:10 -04:00
Mauricio Carneiro
36600fd8e9
added MQ of low MQ/BQ to consensus RMS
...
Bases that were excluded for MQ and BQ filters are now contributing to the MQ RMS (but not to consensus base counts and variant/not variant region triggers).
2011-11-01 17:46:12 -04:00
Mauricio Carneiro
b004489c6d
Moving ReduceRead TAG to GATKSAMRecord
...
ReduceReads are now a feature of a GATKSAMRecord, so the tag and the special methods needed to use it will now be housed by the GATKSAMRecord.
2011-11-01 17:12:09 -04:00
Mauricio Carneiro
17cc484dbd
Revert "ReduceReads ref bases are now output as '='
...
Reducing the reference bases to '=' results in an extra compression of 13% on average. The GATK is not ready to handle files with '=' bases, and the decision was to implement this as engine support, not as a part of ReduceReads.
2011-11-01 16:35:07 -04:00
Eric Banks
0839c75c8d
More minor fixes to docs
2011-10-31 21:49:27 -04:00
Eric Banks
74b018a1f3
Minor fixes to docs
2011-10-31 21:41:43 -04:00
Eric Banks
31ee5432c5
Merged bug fix from Stable into Unstable
2011-10-31 14:56:59 -04:00
David Roazen
cdde32acbd
Merged bug fix from Stable into Unstable
2011-10-31 14:21:15 -04:00
Eric Banks
f62af0291b
Check for invalid VCF records (not enough tokens) instead of assuming they are there.
2011-10-31 14:09:51 -04:00
Andrey Sivachenko
bed0acaed4
nWayOut now adds PG tag to the header as it should. Also, additional hidden option added: keepPGTags. If invoked, IndelRealigner PG tags from previous runs (if any) are kept in the header and the new PG tag is simply added, instead of overriding them
2011-10-31 12:28:28 -04:00
Mauricio Carneiro
389380a590
ReduceReads ref bases are now output as '=' to save space
...
Restructured the sliding window framework to manipulate a wrapped version of the SAMRecord that contains information about the reference.
2011-10-30 12:04:39 -04:00
Eric Banks
0ca7428e76
Allow processing of empty intervals, but warn user when this case is encountered.
2011-10-28 12:12:14 -04:00
Eric Banks
649dfe98f0
Add VCF header for any expressions that are requested
2011-10-28 10:22:19 -04:00
Eric Banks
8b1a62da27
Adding unit test to cover overlapping intervals from the same source with the intersection rule.
2011-10-28 09:59:43 -04:00
Eric Banks
057a79f598
This argument should be annotated as @Input
2011-10-28 09:44:49 -04:00
Eric Banks
4ba7c0cecd
Moving to private
2011-10-28 09:29:28 -04:00
Eric Banks
1bdd76c2f2
These tools now use the IntervalBinding system to handle intervals instead of doing it all manually
2011-10-28 09:28:12 -04:00
Eric Banks
6ba08a103d
Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory.
2011-10-28 09:23:25 -04:00
Eric Banks
3d04bb5608
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-27 23:55:18 -04:00
Eric Banks
19e27d4568
Removing all instances of -BTI (in tests and in GATKdocs) and replacing them with the appropriate alternative.
2011-10-27 23:55:11 -04:00
Eric Banks
cafc245a43
For some reason, a class of Codecs (including TableCodec) require that a GenomeLocParser be passed in to do the position processing. Why can't they just return a Feature with chr, start, stop? Isn't that the right thing?
2011-10-27 23:54:28 -04:00
Guillermo del Angel
cbc43683ee
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-27 20:54:18 -04:00
Guillermo del Angel
8907e42007
First fully functional implementation of ValidationSiteSelectorWalker. User gives a) a set of input variants, b) a desired number of output variants, c) optionally, a set of samples which will restrict sites to be polymorphic in those samples, d) a frequency selection mode: either uniform (no AF matching), or matching AF so that output sites mirror the input AF spectrum as closely as possible.
...
More testing is needed and docs need improving but so far all functionality seems up and running
2011-10-27 20:53:48 -04:00
Eric Banks
ccfd853b34
Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed.
2011-10-27 20:43:50 -04:00
Eric Banks
c2f343773e
Oops, working too quickly last time. This is the proper fix for the potential NPE in the equals() test.
2011-10-27 15:32:08 -04:00
Khalid Shakir
b80d407dc7
No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
...
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Eric Banks
8c4dbce6d8
Don't serialize the GATKArgumentCollection for the GATKRunReports (which would have meant dealing with the new IntervalBindings). Also, forgot to remove a test that's no longer relevant to BED parsing.
2011-10-27 13:58:19 -04:00
Eric Banks
4a7e6fee3f
Remove support for BED file interval parsing in the GATK; it should all go through Tribble now. IndelRealigner no longer supports unordered interval input (which shouldn't have been used anyways). Temporarily commenting out serialization of arguments so that tests pass; this whole piece will be deleted soon anyways.
2011-10-27 13:38:08 -04:00
Matt Hanna
f7df8bdecc
Merged bug fix from Stable into Unstable
2011-10-27 11:31:17 -04:00
Matt Hanna
41ddc7bce7
Make sure we output a full stack trace when we encounter Tribble error messages on VCF header merge.
2011-10-27 11:30:04 -04:00
Eric Banks
44f905b5e5
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-26 23:31:11 -04:00
Eric Banks
68283b1651
Fixing docs and adding GATKdocs for the new interval functionality
2011-10-26 22:14:43 -04:00
Mark DePristo
c9978316a3
Merge branch 'FragmentUtils'
2011-10-26 19:51:49 -04:00
Mauricio Carneiro
add9ad97ec
No scatter gather for VQSR or ApplyVQSR.
...
These walkers should not be scatter gatherable. Annotating them accordingly so that Queue doesn't allow a less than knowledgeable user to try and scatter/gather VQSR.
2011-10-26 16:35:44 -04:00
Ryan Poplin
74aeb22eeb
Merged bug fix from Stable into Unstable
2011-10-26 15:57:30 -04:00
Ryan Poplin
86871bd1e3
Throw a UserException in the BQSR when there is no data instead of creating an empty csv file
2011-10-26 15:56:41 -04:00
Mark DePristo
034a997d07
Generalized Reads -> Fragment calculation
...
-- Supports ReadBackedPileup -> FragmentCollection as before
-- Added support for List<SAMRecord> -> FragmentCollection for Ryan's haplotype caller
-- General cleanup, renaming, move to separate package, more extensive unit tests, etc.
-- Added toFragment() function to ReadBackedPileup interface
2011-10-26 15:54:38 -04:00
Eric Banks
2f21b6ecfb
Removed debugging output
2011-10-26 15:50:20 -04:00
Eric Banks
b39fcb1bea
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-26 15:44:25 -04:00
Eric Banks
b6ce6ed3f8
Go around the ROD system for now so that we can just call decodeLoc() for efficiency. Noted that we should go through the ROD system once it gets cleaned up. This means that currently gzipped files are not supported with -L.
2011-10-26 15:42:53 -04:00
Eric Banks
3273c20c98
Added integration tests for Tribble-based intervals and fixed up some of the other tests based on some method changes.
2011-10-26 15:29:18 -04:00
Eric Banks
9424e8b2ca
Initial working version of new interval system in which the argument for -L (and -XL) is allowed to be a rod file (e.g. VCF). Old samtools-style intervals still behave as before. BTI is no longer supported. The merging (union or intersection) of intervals is now consistently applied to all -L (or -XL) intervals, which is nice. More testing needed.
2011-10-26 14:11:49 -04:00
Mark DePristo
7fa943aef1
Renamed FragmentPileup to FragmentUtils
2011-10-26 14:01:45 -04:00
Laurent Francioli
1f044faedd
- Genotype assignment in case of equally likely combinations is now random
...
- Genotype combinations with 0 confidence are now left unphased
2011-10-26 19:57:09 +02:00
Laurent Francioli
81b163ff4d
Indentation
2011-10-26 14:49:12 +02:00
Laurent Francioli
62cff266d4
GQ calculation corrected for most likely genotype
2011-10-26 14:40:04 +02:00
Mark DePristo
af3613cc5f
GATKSAMRecord commit branch summary
...
First, I'm sure there's a better way to do this, but I wanted to create a single commit summarizing the changes from my branch SamRecordFactory. What's the best way to do this? Rebase?
Now, on to the changes here:
-- Picard added a SamRecordFactory that is used to create instances of the subclass SamRecord or BAMRecord. This factory allows us to have low-level picard readers (SamFileReader) create objects of type GATKSamRecord. The abomination of the extends and contains GATKSamRecord is now gone. GATKSamRecords are now produced by this factory, the GATK provides this factory to our SamFileReaders, and everything works with GATKSamRecord just extending BAMRecord. This results in up to a 2x performance improvement in writing BAMs and a ~10% improvement when reading BAM files.
-- As a consequence of this, we no longer officially support SAM records. Attempting to create SAMRecord objects with the factory will throw a user exception.
-- Created a standard NGSPlatform enum, and GATKSamRecords support efficiently obtaining this value. The real BQSR (not the copy indel version) got the efficient code to use this. Please add all future platforms to this enum.
-- GATKSamRecord no longer supports using the OQ or defaultBaseQuality. This is performed in a wrapper iterator that's only added when these command line options are used.
-- ReducedRead code has been moved from ReadUtils into efficient caching accessors in GATKSamRecord.
-- ArtificialSamUtils creates GATKSamRecords now, not just SAMRecords. Added code here to create artificial pairs, and used that code to create artificial ReadBackedPileups with specific properties
-- New smarter algorithm for FragmentPileup. This new code is up to 3x faster than the previous version, and is lazy so is more efficient when no overlapping pairs are actually in the pileup. Created extensive DataProvider driven UnitTest. Added Caliper-based benchmarking system to characterize the performance differences between the old and new algorithms. TODO still remains to make an efficient version that works for non-pileups for the HaplotypeCaller
2011-10-25 20:52:56 -04:00
Mark DePristo
2822f0dc27
Merge branch 'SamRecordFactory'
2011-10-25 20:34:47 -04:00
Mark DePristo
1b722c21cf
merge master
2011-10-25 16:08:39 -04:00
Ryan Poplin
56fdf0b865
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-25 15:58:56 -04:00
Ryan Poplin
4a34c1862e
misc cleanup. We now filter out haplotypes when it is obvious that the assembly has failed to find a parsimonious event rather than use haplotypes with large numbers of SNPs and small indels on them.
2011-10-25 15:22:28 -04:00
David Roazen
2794e5c1d4
Modified the VCFJarClassLoadingUnitTest to play nice with the packaged-jar test targets.
2011-10-25 14:47:15 -04:00
Guillermo del Angel
b559936b7a
a) New variant eval stratification module for indel size. b) Next iteration on indel caller runtime optimization: when computing likelihood of each haplotype for a given read, many computations will be redundant since pieces of haplotypes will be common to both REF and ALT haplotypes. So, we keep HMM matrices from one haplotype to the next one and recompute starting at the part where either haplotype is different or GOP/GCP are different.
2011-10-25 09:56:43 -04:00
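A rough sketch of the reuse idea in the commit above, assuming the HMM matrices can be kept up to the first position at which the haplotype bases, the gap open penalties (GOP) or the gap continuation penalties (GCP) differ from the previous evaluation. The class and method names are hypothetical; the actual indel caller code is not shown in this log.

```java
// Hypothetical illustration only; not the GATK indel-caller source.
class HmmReuseSketch {
    // Index of the first position at which a and b differ, or the shorter
    // length when one array is a prefix of the other.
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return n;
    }

    // The matrices computed for the previous haplotype stay valid only up to
    // the earliest discrepancy among bases, GOP and GCP, so recomputation
    // has to start at the minimum of the three indices.
    static int recomputeStart(byte[] prevHap, byte[] hap,
                              byte[] prevGop, byte[] gop,
                              byte[] prevGcp, byte[] gcp) {
        int start = firstDifference(prevHap, hap);
        start = Math.min(start, firstDifference(prevGop, gop));
        start = Math.min(start, firstDifference(prevGcp, gcp));
        return start;
    }
}
```

Leaving any one of the three arrays out of the minimum would let stale matrix cells be reused when only that array changed.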
Khalid Shakir
fac9932938
Embedding gsalib source and queueJobReport R scripts in the dist and package jars.
...
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT compile the gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
2011-10-24 15:58:34 -04:00
Khalid Shakir
89a581a66f
Added ability to specify arguments in files via -args/--arg_file
...
Pushing back downsample and read filter args so they show up in getApproximateCommandLineArgs()
2011-10-24 15:58:34 -04:00
Mark DePristo
502592671d
Cleanup FragmentPileup before main repo commit
...
-- removed intermediate functions. Now only original version and best optimized new version remain
-- Moved general artificial read backed pileup creation code into ArtificialSamUtils
2011-10-24 14:40:05 -04:00
Mark DePristo
166174a551
Google caliper example execution script
...
-- FragmentPileup with final performance testing
2011-10-24 14:04:53 -04:00
Laurent Francioli
62477a0810
Added documentation and comments
2011-10-24 13:45:21 +02:00
Laurent Francioli
38ebf3141a
- Now supports parent/child pairs
...
- Sites with missing genotypes in pairs/trios are handled as follows:
-- Missing child -> Homozygous parents are phased, no transmission probability is emitted
-- Two individuals missing -> Phase if homozygous, no transmission probability is emitted
-- One parent missing -> Phased / transmission probability emitted
- Mutation prior set as argument
2011-10-24 12:30:04 +02:00
Laurent Francioli
7312e35c71
Now makes use of standard Allele and Genotype classes. This allowed quite some code cleaning.
2011-10-24 10:25:53 +02:00
Laurent Francioli
01b16abc8d
Genotype quality calculation modified to handle all genotypes the same way. This is inconsistent with GQ output by the UG but is correct even for cases of poor quality genotypes.
2011-10-24 10:24:41 +02:00
Mark DePristo
f6ccac889b
Merged bug fix from Stable into Unstable
2011-10-23 16:37:12 -04:00
Mark DePristo
585a45b7a3
Bug fix for ClipReadsWalker when stats output isn't provided
...
-- See http://getsatisfaction.com/gsa/topics/clipreadswalker?utm_content=topic_link&utm_medium=email&utm_source=reply_notification
2011-10-23 16:36:48 -04:00
Ryan Poplin
f5d910b8a5
Haplotype caller now sends genotype likelihoods to the exact model to genotype the events found in the best haplotypes.
2011-10-23 13:29:08 -04:00
Mark DePristo
42bf9adede
Initial version of "fast" FragmentPileup code
...
-- Uses mayOverlapRoutine in ReadUtils
-- Attempts to be smart when doing overlap calculation, to avoid unnecessary allocations
-- PileupElement now comparable (sorts on offset, then on start)
-- Caliper microbenchmark to assess performance
2011-10-22 21:36:37 -04:00
Mauricio Carneiro
4913f8a60f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-21 17:45:07 -04:00
Mauricio Carneiro
102dafdcbc
Validation of GATKSamRecord in read filters
...
Moved the validation of the GATKSamRecord to the MalformedReadFilter with the intent to make the read filter the ultimate validation location for sam records. This way we can opt to filter out malformed reads if we know what we are doing or blow up otherwise.
2011-10-21 17:40:43 -04:00
Guillermo del Angel
f4b409fa0d
CombineVariants bug fix: when merging records with disparate alleles we were leaving AC,AF fields intact. This had as a consequence that we could end up with a record with 3 alt alleles but only 2 values in AC,AF fields. Now, if alleles in combined vc are different from original, and if AC,AF fields can't be recomputed from genotypes, we remove attributes from vc map since they'll be invalid anyway. Integration test md5 changed since there were several badly merged records in result
2011-10-21 14:07:20 -04:00
Mark DePristo
b863390cb1
Moving reduced read functionality into GATKSAMRecord
...
-- More functions take / produce GATKSAMRecords instead of SAMRecord
2011-10-21 13:28:05 -04:00
Mark DePristo
2403e96062
Renamed GATKSamRecord -> GATKSAMRecord for consistency. Better docs.
2011-10-21 09:59:24 -04:00
Mark DePristo
110e13bc1e
Merge branch 'master' into SamRecordFactory
2011-10-21 09:43:52 -04:00
Mark DePristo
be797a8a1f
Recalibrator now uses the much more efficient NGSPlatform in the cycle covariates system
2011-10-21 09:39:21 -04:00
Mark DePristo
ed74ebcfa1
GATKSamRecords with efficiency NGSPlatform method
2011-10-21 09:38:41 -04:00
Mark DePristo
94e1898d8f
A canonical set of NGS platforms as enums with convenient manipulation methods
2011-10-21 09:37:45 -04:00
Laurent Francioli
edea90786a
Genotype quality is now recalculated for each of the phased Genotypes. Small problem is that we unnecessarily lose a little precision on the genotypes that do not change after assignment.
2011-10-20 17:04:19 +02:00
Laurent Francioli
1c61a57329
Original rewrite of PhaseByTransmission:
...
- Adapted to get the trio information from the SampleDB (i.e. from Pedigree file (ped)) => Multiple trios can be passed as argument
- Mendelian violations and trio phasing possibilities are pre-calculated and stored in Maps. => Runtime is ~3x faster
- Genotype combinations possible only given two MVs are now given a squared MV prior (e.g. 0/0+0/0=>1/1 is given 10^-16 prior if the MV prior is 10^-8)
- Corrected bug: In case the best genotype combination is Het/Het/Het, the genotypes are now set appropriately (before original genotypes were left even if they weren't Het/Het/Het)
- Basic reporting added:
-- mvf argument lets the user specify a file to report remaining MVs
-- When the walker ends, some basic stats about the genotype reconfiguration and phasing are output
Known problems:
- GQ is not recalculated even if the genotype changes
Possible improvements:
- Phase partially typed trios
- Use standard Allele/Genotype Classes for the storage of the pre-calculated phase
2011-10-20 13:06:44 +02:00
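The squared prior mentioned in the commit above can be sketched in log10 space, assuming each Mendelian violation (MV) needed to explain a genotype combination multiplies the prior by the MV prior. The class and method names are hypothetical, and treating zero violations as a factor of 1 is an assumption of this sketch, not something the commit states.

```java
// Hypothetical illustration only; not the PhaseByTransmission source.
class MvPriorSketch {
    // log10 prior for a genotype combination that requires numViolations
    // independent Mendelian violations: log10(mvPrior ^ numViolations).
    static double log10Prior(double mvPrior, int numViolations) {
        if (mvPrior <= 0.0 || mvPrior > 1.0) {
            throw new IllegalArgumentException("mvPrior must be in (0, 1]: " + mvPrior);
        }
        if (numViolations < 0) {
            throw new IllegalArgumentException("numViolations must be >= 0: " + numViolations);
        }
        return numViolations * Math.log10(mvPrior);
    }
}
```

With an MV prior of 10^-8, a combination that needs two violations (such as 0/0+0/0=>1/1) gets a log10 prior of about -16, matching the 10^-16 figure in the commit message.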
Laurent Francioli
ef6a6fdfe4
Added getAsMap -> returns the likelihoods as an EnumMap with Genotypes as keys and likelihoods as values.
2011-10-20 12:49:18 +02:00
Laurent Francioli
76dd816e70
Added getParents() -> returns an arrayList containing the sample's parent(s) if available
2011-10-20 12:47:27 +02:00
Mark DePristo
999a8998ae
Constructor for GATKSamRecord with header only, for unit testing
2011-10-19 17:51:48 -04:00
Mark DePristo
3227143a1c
Systematic test code for FragmentPileup
...
-- Creates all combinations of overlapping and non-overlapping read pair pileups in all orientations and first/second pairings to validate fragment detection.
2011-10-19 17:50:27 -04:00
Mark DePristo
bba69701b5
Now creates GATKSamRecords, not SamRecords
2011-10-19 17:49:17 -04:00
Christopher Hartl
cd8a6d62bb
You know how the wiki has a big section on committing local changes to BRANCHES of the repository you clone it from? Yeah. It sucks if you don't do that.
...
This commit contains:
- IntronLossGenotyper is brought into its current incarnation
- A couple of simple new filters (ReadName is super useful for debugging, MateUnmapped is useful for selecting out reads that may have a relevant unaligned mate)
- RFA now matches my current local repository. It's in flux since I'm transitioning to the new traversal type.
+ the triggering read stash pilot required me to change the scope of some of the variables in the ReadClipping code, private -> protected. Those are all the changes there.
- MendelianViolation restored to its former glory (and an annotator module that uses the likelihood calculation has been added)
+ use this rather than a hard GQ threshold if you're doing MV analyses.
- Some miscellaneous QScripts
2011-10-19 17:42:37 -04:00
Mark DePristo
52345f0aec
Meaningful documentation string
2011-10-19 15:47:36 -04:00
Mark DePristo
1b38aa1a7e
Cleaning up reduced read code accessors
2011-10-19 15:46:44 -04:00
Eric Banks
d8d73fe4f2
Treat ./X genotypes as MIXED so that isHet, isHom, etc. still return the expected and correct values. Added docs to these accessors with contracts explicitly mentioned. Fixed case where NPE could be thrown.
2011-10-19 15:11:13 -04:00
Mark DePristo
7928b287fc
GATKSamRecord now produced by SAMFileReaders by default
...
-- Removed all of the unnecessary caching operations in GATKSAMRecord
-- GATKSAMRecord renamed to GATKSamRecord for consistency
2011-10-19 13:15:27 -04:00
Eric Banks
5a6468c11e
Allowing ./X genotypes and adding a unit test to ensure that this case is covered from now on (especially given that we may want to revert in the future). Reverting this change is really easy and entails uncommenting a few lines of code. But for now, despite Mark's objections, this case is allowed in the VCF spec and we are wrong not to allow it.
2011-10-19 11:52:05 -04:00
Eric Banks
48c4a8cb33
Make error messages clearer (even I was confused)
2011-10-19 11:49:16 -04:00
Eric Banks
6cadaa84c9
Just use validate() from super class since it does the same thing
2011-10-19 11:48:23 -04:00
Mark DePristo
df3e4e1abd
First working code to use SamRecordFactory to produce objects of our own design in SAMFileReader
2011-10-19 11:22:35 -04:00
Mauricio Carneiro
c27e2fb676
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-18 15:23:05 -04:00
Mark DePristo
f77f2eeb7d
Fix for new ID structure
2011-10-18 13:04:43 -04:00
Mark DePristo
1a92ee3593
No longer adds a binding of ID -> . when the ID field is dot in the VCF
...
-- Really we should make ID a primary key in VariantContext. Putting it into the attributes is just annoying now
2011-10-18 10:57:02 -04:00
Ryan Poplin
e45fcb66eb
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-17 15:56:19 -04:00
Ryan Poplin
1e6794c539
fixing typo in VariantsToTable docs
2011-10-17 15:56:02 -04:00
Mark DePristo
0de8550f17
Merged bug fix from Stable into Unstable
2011-10-17 15:29:53 -04:00
Mark DePristo
c1329c4dde
Fixing a binary to logical or
2011-10-17 15:29:45 -04:00
Mark DePristo
9e4963efc8
Merged bug fix from Stable into Unstable
2011-10-17 15:27:38 -04:00
Mark DePristo
ec911ce5bb
Even better error messages
2011-10-17 15:27:22 -04:00
Mark DePristo
d065bf1715
Merged bug fix from Stable into Unstable
2011-10-17 15:25:47 -04:00
Mark DePristo
a7cf9cdc67
Fixing error message typo
2011-10-17 15:25:35 -04:00
Ryan Poplin
589df6b7cf
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-17 14:35:14 -04:00
Ryan Poplin
6b02354d84
Adding a new getter in VariantsToTable to extract the indel event length.
2011-10-17 14:34:52 -04:00
Mark DePristo
3550798c4c
Merged bug fix from Stable into Unstable
2011-10-17 13:58:56 -04:00
Mark DePristo
4108a294f7
Better error message when a RodBinding file doesn't exist
2011-10-17 13:58:46 -04:00
Mark DePristo
cc76826f78
Merged bug fix from Stable into Unstable
2011-10-17 13:38:11 -04:00
Mark DePristo
09a09cacef
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable
2011-10-17 13:38:00 -04:00
Mark DePristo
fd4540cd32
Fixed extraordinarily subtle race condition with contracts invariant
...
-- all of the methods in the class must be synchronized or the internal state can be inconsistent with the contract invariant when entering the class in a non-synchronized method, even when that method doesn't care about the object's internal state
2011-10-17 13:37:55 -04:00
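A minimal sketch of the pattern described in the commit above, assuming a small stateful class whose invariant ties two fields together. The class `SynchronizedTimerSketch` is hypothetical; the point is only that every method, including read-only ones, takes the object's lock, so no thread can enter the class while another thread is halfway through an update.

```java
// Hypothetical illustration only; not the class fixed in the commit.
class SynchronizedTimerSketch {
    // Invariant: startNanos is meaningful only while running is true,
    // and elapsedNanos is never negative.
    private boolean running = false;
    private long startNanos = 0L;
    private long elapsedNanos = 0L;

    synchronized void start() {
        if (running) {
            throw new IllegalStateException("already running");
        }
        running = true;
        startNanos = System.nanoTime();
    }

    synchronized void stop() {
        if (!running) {
            throw new IllegalStateException("not running");
        }
        elapsedNanos += System.nanoTime() - startNanos;
        running = false;
    }

    // Read-only, but still synchronized: an unsynchronized entry point could
    // run while start() or stop() has updated only one of the fields.
    synchronized boolean isRunning() {
        return running;
    }

    synchronized long elapsedNanos() {
        return running ? elapsedNanos + (System.nanoTime() - startNanos) : elapsedNanos;
    }
}
```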
David Roazen
88d6b8bc1f
Merged bug fix from Stable into Unstable
2011-10-14 20:13:38 -04:00
David Roazen
bd8bb93811
Split RScriptExecutorUnitTest into public and private test classes.
...
We can't have a public test that depends on both public and private
code/data -- the new release system needs to do public-only tests,
and will catch this sort of thing.
2011-10-14 20:04:42 -04:00
David Roazen
4f01a742cb
Merged bug fix from Stable into Unstable
2011-10-13 21:39:52 -04:00
David Roazen
edfd6f8a06
Removing a public -> private dependency from the test suite.
...
The public integration test VariantContextIntegrationTest was dependent on the
private walker TestVariantContextWalker. Moved this walker to public/java/test
(NOT public/java/src, since this walker is only used by the test suite) to avoid
errors during public-only tests.
2011-10-13 21:32:52 -04:00
Mark DePristo
404ef741f1
Merged bug fix from Stable into Unstable
2011-10-13 18:02:06 -04:00
Mark DePristo
2ebdff074c
Update MD5s for SOLiD recalibration
...
-- MD5 db had spelling error; fixed
-- Bug in AlignmentUtils resulted in some bases not being color space corrected. The integration test caught the change, and it's clear that the new version is correct, as the prev. version was not considering the last N qualities for reads with a ND operation.
2011-10-13 18:01:51 -04:00
Mark DePristo
5a881360df
Merged bug fix from Stable into Unstable
2011-10-13 15:54:43 -04:00
Mark DePristo
7cab6f6bb0
Bug fixes for thread unsafe simple timer and bad Ns treatment in AlignmentUtils
...
-- SimpleTimer is now threadsafe using synchronized method keywords
-- Bug fix for alignmentToByteArray() where the N case was refPos++ not the now correct refPos += elementLength
2011-10-13 15:53:12 -04:00
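The N-operator fix described in the commit above can be illustrated with a short sketch, assuming a CIGAR given as parallel arrays of operators and lengths. The class and method names are hypothetical and this is not the alignmentToByteArray() code; it only shows that a skipped region (N) advances the reference position by the whole element length rather than by one.

```java
// Hypothetical illustration only; not the AlignmentUtils source.
class RefPosSketch {
    // Returns how many reference bases a CIGAR spans.
    static int referenceSpan(char[] ops, int[] lengths) {
        if (ops.length != lengths.length) {
            throw new IllegalArgumentException("ops and lengths must have the same size");
        }
        int refPos = 0;
        for (int i = 0; i < ops.length; i++) {
            int elementLength = lengths[i];
            switch (ops[i]) {
                case 'M': case '=': case 'X': // aligned bases
                case 'D':                     // deletion from the reference
                case 'N':                     // skipped reference region
                    refPos += elementLength;  // refPos++ here would undercount
                    break;
                case 'I': case 'S': case 'H': case 'P':
                    break;                    // these consume no reference bases
                default:
                    throw new IllegalArgumentException("unknown CIGAR operator: " + ops[i]);
            }
        }
        return refPos;
    }
}
```

For 10M5N10M the span is 25 reference bases; incrementing by one for the N element would give 21 and shift everything after the skipped region.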
Mauricio Carneiro
e12ffb6547
Updating docs for GCContentByInterval
...
This walker does not take any BAMs. It only walks over the reference.
2011-10-13 13:27:00 -04:00
Eric Banks
9aecd50473
Adding ability to exclude annotations from the VA and UG lists. As described in the docs, this argument trumps all others (including -all) so that we can get around the SnpEff issue brought up by Menachem. Added integration test for it.
2011-10-12 15:44:54 -04:00
Mauricio Carneiro
e53a952aeb
Added ION Torrent support to CountCovariates.
2011-10-12 01:57:02 -04:00
Mauricio Carneiro
a2733a451f
Added NotCalled feature to GAV
...
Added "not called" and "no status" to the truth table. Very useful.
2011-10-11 19:31:45 -04:00
David Roazen
ae83420637
Merged bug fix from Stable into Unstable
2011-10-11 12:26:08 -04:00
David Roazen
794f275871
SnpEff is now marked as a RodRequiringAnnotation instead of an ExperimentalAnnotation.
...
Having SnpEff grouped with the Experimental annotations was proving problematic, since it
requires a rod. Placing it in its own group should improve the situation somewhat, making it
easier to request "all annotations except for SnpEff".
2011-10-11 12:08:56 -04:00
David Roazen
cfd0ac8410
Merged bug fix from Stable into Unstable
...
Conflicts:
public/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java
2011-10-11 12:03:51 -04:00
David Roazen
24b72334b3
UnifiedGenotyper now correctly initializes the VariantAnnotator engine.
...
This allows the annotation classes to perform any necessary initialization/validation.
For example, it allows the SnpEff annotator to (among other things) validate its rod binding.
This will prevent a NullPointerException when SnpEff annotation is requested but no rod binding
is present.
Added an integration test to cover this case so that it doesn't break again.
2011-10-11 12:02:05 -04:00
Guillermo del Angel
0429b38021
Merged bug fix from Stable into Unstable
2011-10-11 11:19:38 -04:00
Guillermo del Angel
1c485d8b5e
Forgot that no matter how trivial a change it's a good idea to compile first
2011-10-11 11:18:41 -04:00
Guillermo del Angel
6418f4d69b
Merged bug fix from Stable into Unstable
2011-10-11 11:13:18 -04:00
Guillermo del Angel
1975de1b32
Second try: hide --do_indel_quality in AnalyzeCovariates
2011-10-11 11:11:29 -04:00
Guillermo del Angel
6506ea83e8
Revert "Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users"... a hidden passenger change made it through.
...
This reverts commit 70e10ccb1be90dcff8f4485ae6ee036db2d1ac86.
2011-10-11 11:03:12 -04:00
Guillermo del Angel
4c1d8c8d44
Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users
2011-10-11 11:01:06 -04:00
Eric Banks
77c983c5b5
No one claimed this walker and it doesn't have integration tests or GATKdocs so it doesn't belong in public.
2011-10-10 15:17:54 -04:00
Mark DePristo
fb72bcf732
DiffObjects no longer prints out the file name in the status so MD5 are stable
2011-10-10 15:10:57 -04:00
Mark DePristo
e3ff4f4266
Failing MD5 because output now contains absolute path
2011-10-10 11:05:02 -04:00
Mark DePristo
3e6c16d961
CombineVariants preserves allele order
2011-10-10 11:04:38 -04:00
Mark DePristo
a4bb842958
RankSum tests have slightly different MD5 results based on allele order
...
-- UG GENOTYPE_GIVEN_ALLELES now uses the order of alleles in the VCF, so this changes the MD5
2011-10-10 11:04:07 -04:00
Mark DePristo
46e7370128
this.allele, getAlleles(), and getAltAlleles() now return a List, not a Set
...
-- Changes associated code throughout the codebase
-- Updated necessary (but minimal) UnitTests to reflect new behavior
-- Much better makealleles() function in VC.java that enforces a lot of key constraints in VC
2011-10-09 11:45:55 -07:00
Mark DePristo
822654b119
UnitTests for allele getting functions in VC in prep for move from set to list
2011-10-09 10:36:14 -07:00
Mark DePristo
c67f6c076b
simpleMerge now preserves allele order
...
-- UnitTests for dangerous PL merging cases in the multi-allelic case. The new behavior is correct
2011-10-08 17:39:53 -07:00
Mark DePristo
e94e6ba101
A UnitTest to ensure that the order of alleles is maintained
...
-> A, C, T and A, T, C are different and must be maintained. The constructors were doing this appropriately, so nothing needed to be changed
2011-10-08 08:47:58 -07:00
Mark DePristo
ec14a4a606
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-07 08:38:50 -07:00
Matt Hanna
6fbd41724a
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-07 11:20:00 -04:00
Matt Hanna
4514bc350f
More reliable way of finding the Tribble jar.
2011-10-07 11:19:29 -04:00
Eric Banks
181c76750e
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 22:38:55 -04:00
Eric Banks
ca9cd9b688
Minor fix for merging intervals which hadn't been necessary when only merging from the left to right. Added integration tests to cover the parallelization of RTC.
2011-10-06 22:38:44 -04:00
Khalid Shakir
f91b015e0e
Made the BaseTest.testDir absolute
2011-10-06 22:33:21 -04:00
Mark DePristo
c7864c7256
Filter application order is now deterministic, in the order defined by the walker
...
-- For no apparent reason we were using a HashSet to store the ReadFilters, so the order of operations was really arbitrarily applied. The order now is
(1) the order of the walker intrinsic filters
(2) read group black list (if provided)
(3) command line filters (if provided)
2011-10-06 18:51:40 -07:00
Mark DePristo
0b88af4af9
Counts of records failing filters are displayed sorted
...
-- Stops random ordering of the output, as the counts are returned sorted by string name of the class
-- Deleted now-unused sh*tty accessors in Utils
2011-10-06 18:42:26 -07:00
Mark DePristo
d1e70d6ec2
Removed Nx counting of reads in metrics with -nt > 1
2011-10-06 18:29:26 -07:00
Eric Banks
c61804a450
Rename the long version of the argument name to more accurately reflect its purpose.
2011-10-06 16:14:04 -04:00
Eric Banks
61a3dfae24
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 15:58:04 -04:00
Eric Banks
6eb87bf58a
RTC now caches all intervals as GenomeLocs (which is expected to take < 1Gb whole genome based on back of the envelope calculations with Matt) so that 1) we don't have to worry about emitting outside of the leaves in the hierarchical reductions and 2) we can emit the intervals in sorted order which is a big performance plus for the realigner. Integration tests change only because intervals whose start=stop are now printed as chr:start instead of chr:start-stop.
2011-10-06 15:57:49 -04:00
Mark DePristo
6d9c210460
Updating MD5s for updated BAM with read groups
2011-10-06 12:15:48 -07:00
Mark DePristo
ab357ef900
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 10:50:02 -07:00
Eric Banks
1b0735f0a3
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 13:41:45 -04:00
Eric Banks
c4dfc1fb8b
Temporary commit of parallelization support for RealignerTargetCreator. Tim begged us for this and I got assurances from Khalid/Matt that this would also be extremely helpful for the whole genome calling pipeline, so I spent a while working on this. Needs to be fixed up though because apparently only the leaves in the hierarchical reduce get their output aggregated. Worked out a better solution with Matt.
2011-10-06 13:41:36 -04:00
Matt Hanna
3961733590
Merged bug fix from Stable into Unstable
2011-10-06 12:54:52 -04:00
Matt Hanna
4fa5045e84
Abandoning classfileset/rootfileset approach due to difficulty managing
...
classloading of bcel*.jar/ant-apache-bcel*.jar. Switching instead to manually
specifying a minimal set of packages/classes to include in the vcf.jar via
build.xml, and adding a unit test which creates a limited classloader
only aware of vcf.jar and tribble.jar and tries to use it to load the core
classes in the vcf jar.
Hopefully third time's the charm.
2011-10-06 12:49:51 -04:00
Mark DePristo
73f9d1f217
GATK read group requirement iron hand
...
-- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined
-- LIBS again throws an error if the complete list of samples isn't provided
-- Updating ExampleCountLociPipeline test to use the well-formatted versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data.
-- Convenience constructors for UserExceptions.MalformedBAM
2011-10-06 08:40:35 -07:00
Mark DePristo
23845ac798
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 08:17:08 -07:00
Mark DePristo
4b5b9155a9
Fixed bad expected value in PedReaderUnitTest
2011-10-06 08:16:47 -07:00
Mark DePristo
daa5999489
Fixed typo in argument description
2011-10-06 08:16:25 -07:00
Guillermo del Angel
8a474e38ff
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 10:08:39 -04:00
Guillermo del Angel
93f7e632bd
Minor fix/enhancement for VariantEval: if a VCF has symbolic alleles, the program would crash ungracefully; now we just skip the record without processing it. This is a big issue since we can't process 1000G integration files with the code as is.
2011-10-06 10:07:46 -04:00
Mark DePristo
190be4d0d1
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-05 21:27:11 -07:00
Mark DePristo
8e6845806a
Allowing empty samples list in LIBS
...
-- Right now we cannot process BAM files without read groups because we enforce the samples list to not be empty when there's a SAM record. Now if there are reads and there are no samples we add the "null" sample so that LIBS walks the reads properly
2011-10-05 21:26:21 -07:00
Matt Hanna
180c8f286f
Merged bug fix from Stable into Unstable
2011-10-05 20:37:43 -04:00
Matt Hanna
55b9f06527
Ensure that IndelRealigner n-way out option supports MD5 generation.
2011-10-05 20:36:28 -04:00
Mark DePristo
be2d29ce69
Final PED documentation
2011-10-05 15:17:41 -07:00
Mark DePristo
3226d5dc0d
Merge branch 'master' into ped
2011-10-05 15:03:09 -07:00
Mark DePristo
6a573437af
Detailed documentation of the arguments for -ped
2011-10-05 15:00:58 -07:00
Mark DePristo
e7c80f7c45
Renaming quantitative trait to OtherPhenotype which is now a String not a double
...
-- we can now use PED file to represent population data or other arbitrary phenotype data, not just doubles
2011-10-05 12:26:33 -07:00
Mark DePristo
51ecc20867
getFamily() and associated methods implemented and tested
...
-- Sample no longer serializable
-- Sample now implements Comparable
2011-10-05 09:55:05 -07:00
Mark DePristo
f4bac58f14
Merged bug fix from Stable into Unstable
2011-10-04 21:00:34 -07:00
Mark DePristo
d1d39943d0
Updating MD5 for BAMs that I added a read group to, part 2
2011-10-04 21:00:15 -07:00
Mark DePristo
9bd3ba4c7e
Missed one MD5
2011-10-04 16:04:52 -07:00
Mark DePristo
ffdfdcde3f
Updating MD5s
...
-- Interval test now uses RG containing BAM
-- DoC sample name ordering has changed.
2011-10-04 15:54:45 -07:00
Mark DePristo
a45d985818
TODO method stubs
2011-10-04 15:54:09 -07:00
Mark DePristo
463eab7604
All MD5 mismatches for test are shown
...
-- Now for tests like DoC, with 20 output md5s, you see all of the differences before failing.
2011-10-04 15:53:52 -07:00
Mark DePristo
c642a080d4
Merged bug fix from Stable into Unstable
2011-10-04 14:08:41 -07:00
Mark DePristo
941317167e
Updating MD5 for BAMs that I added a read group to
2011-10-04 14:08:00 -07:00
Mark DePristo
e1d6c7a50a
Updating MD5s that have changed due to sample ordering differences
2011-10-04 09:33:23 -07:00
Mark DePristo
343a7b6b2f
Updating UG integration tests for arbitrary impact of sample order changes on downsampling
2011-10-04 08:14:00 -07:00
Mark DePristo
fee89e47ff
Only throws an error when there are no samples but there are reads
...
-- Handles the case when you are running a ROD traversal and yet the LIBS is still used to return null everywhere.
2011-10-04 06:50:54 -07:00
Mark DePristo
f552aede42
Only provide the sample names in the BAM file for efficiency
2011-10-04 06:50:12 -07:00
Mark DePristo
a27641e1fc
Cleaned up imports
2011-10-04 06:28:36 -07:00
Mark DePristo
b20689ff55
No longer supports extraProperties
...
-- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem
-- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record. If the two records are inconsistent, an error is thrown
-- addSample() in Sample.class now invokes mergeSample() when appropriate
-- Validation types are now only STRICT or SILENT
-- Validation code implemented in SampleDBBuilder
-- Extensive unit tests for SampleDBBuilder
2011-10-03 19:20:33 -07:00