Mark DePristo
e055a78f6e
LIBS now requires at least one sample be present
...
-- UnitTest provides a "null" sample for matching the reads without read groups
2011-09-30 09:49:35 -04:00
Mark DePristo
9860a2c989
Merge branch 'master' into ped
2011-09-30 09:28:18 -04:00
Mark DePristo
d901fed617
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-30 08:41:44 -04:00
Mauricio Carneiro
cabacf028d
Intermediate commit to fix interval skipping
...
may need additional testing.
2011-09-29 18:45:12 -04:00
Mark DePristo
b71b51751e
Bug fix for UnitTest
...
-- Provide the null sample to the LIBS, as this seems to be required for correctly passing this unit test
-- Will be fixed in a future update
2011-09-29 17:30:01 -04:00
Mark DePristo
1765fbeb6b
Merge branch 'master' into ped
2011-09-29 17:18:51 -04:00
Mark DePristo
98ecaf8aa0
Support for ReducedReads with reduced counts and average quals
...
-- ReadUtils and UnitTest updated to support new byte[] style
-- Removed unnecessary read transformer in PairHMM
2011-09-29 17:18:39 -04:00
Mauricio Carneiro
9508220157
fixed hard clipping both ends inside deletion
...
If both ends of the interval fall within a deletion in the read, then hardClipBothEnds would cut the right tail first, including the entire deletion, and then fail to cut the left tail because there would no longer be any bases there. Fixed.
2011-09-29 15:36:49 -04:00
Mark DePristo
9458f01409
Test cleanup of Sample object
2011-09-29 15:13:05 -04:00
Mark DePristo
625ffb6a07
LocusIteratorByState and ReadBackedPileups no longer use Sample
2011-09-29 14:52:11 -04:00
Mark DePristo
b3a2371925
Merge branch 'master' into ped
2011-09-29 14:32:17 -04:00
Mark DePristo
68761a6e28
Removed sample from header
2011-09-29 14:13:05 -04:00
Mauricio Carneiro
a5e75cd14c
Outputting both consensus base qualities and counts
...
The base qualities of a consensus read are now the average quality of the bases forming the consensus base (most common base), and the consensus quality tag now carries an array with the counts of each base in the consensus. This should increase file size but improve calling sensitivity/specificity.
2011-09-29 12:54:41 -04:00
Mark DePristo
505416b6c0
Merge branch 'master' into ped
2011-09-29 12:22:39 -04:00
Mauricio Carneiro
4086fa768f
Disabling all ReadClipperUnitTests
2011-09-29 12:20:35 -04:00
Mark DePristo
9536845e35
Cleaning up unused code in MV
2011-09-29 12:20:07 -04:00
Mark DePristo
5043d76c3d
Removing more bad uses of SampleDataSource creation
2011-09-29 12:16:34 -04:00
Mark DePristo
5c9227cf5e
Further cleanup of Sample database
...
-- Removing more and more unnecessary code
-- Partial removal of type safe Sample usage. On the road to SampleDB only
2011-09-29 11:50:05 -04:00
Mark DePristo
2a0cd556d3
Further cleanup of Sample
...
-- Cleaned up interface functions in GAE
-- Added Walker.getSampleDB() function which is an easier option for tools to get the samples db
2011-09-29 10:34:51 -04:00
Mark DePristo
e76f381628
Moved sample package from DataSources to gatk, and renamed it samples
...
-- All associated changes to the codebase are just header updates
2011-09-29 09:57:15 -04:00
Mark DePristo
e197dcd1f3
Pre-cleanup commit of Sample and SampleDataSource
...
-- SampleDataSource has all reader functionality disabled
2011-09-29 09:44:18 -04:00
Mark DePristo
4d31673cc5
No longer supporting YAML file allows us to delete 75% of the sample's codebase
2011-09-29 09:43:31 -04:00
Ryan Poplin
e366ee18bc
Adding ability to read in and make use of kmer quality tables during HMM evaluation
2011-09-29 07:46:19 -04:00
Mauricio Carneiro
fc86cd6fd8
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/carneiro/gatk/RR into rr
2011-09-29 00:12:15 -04:00
Roger Zurawicki
4fd5630f6a
Added ReadClipper Unit Test
...
* Includes tests that include HardClip to Read and Reference Coords.
* Changed ReadUtils.HardClipByReferenceCoordinates from private to protected to allow for testing
2011-09-28 23:13:50 -04:00
Matt Hanna
9272ed03b5
Merged bug fix from Stable into Unstable
2011-09-28 21:26:43 -04:00
Matt Hanna
0acaf2df65
Fix an embarrassing issue where a specific configuration of minimal coverage
...
over small intervals could cause reads to be dropped from the pileup. Nothing
to see here...
2011-09-28 21:23:01 -04:00
Guillermo del Angel
c8d3a720f9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 18:17:34 -04:00
Guillermo del Angel
7e3cb45093
Further performance optimization in the banded HMM: about a 60% speed improvement over the current implementation
2011-09-28 16:27:28 -04:00
Ryan Poplin
1b1ca80df2
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 16:17:39 -04:00
Ryan Poplin
3b73dc89fe
Making several esoteric arguments in the BQSR @Hidden. Adding basic support for Complete Genomics machine cycle.
2011-09-28 16:17:31 -04:00
Mauricio Carneiro
ff2f4df043
Fixed hardclipping inside indel (right tail)
...
when hard clipping the right tail of a read falls inside a deletion, clipping should fall back to the last base before the deletion to follow the ReadClipper's contract.
2011-09-28 16:07:34 -04:00
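The fallback rule in the commit above (a clip point that lands inside a deletion moves back to the last base before the deletion) can be sketched as follows. This is a minimal illustration in Python, not the GATK code, which is Java. The function name, the 0-based read index, and the 1-based alignment start are all assumptions made for the example.

```python
import re

# Matches one CIGAR element such as "10M" or "3D".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_coordinate_for_reference(cigar, alignment_start, ref_coord):
    """Map a reference coordinate to a 0-based read index (hypothetical helper).

    If ref_coord falls inside a deletion, return the index of the last
    read base before the deletion.
    """
    if ref_coord < alignment_start:
        raise ValueError("reference coordinate is before the alignment start")

    read_pos = 0                # next read base to be consumed
    ref_pos = alignment_start   # reference position of that base

    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in "M=X":         # consumes both read and reference
            if ref_coord < ref_pos + length:
                return read_pos + (ref_coord - ref_pos)
            read_pos += length
            ref_pos += length
        elif op in "DN":        # consumes reference only
            if ref_coord < ref_pos + length:
                if read_pos == 0:
                    raise ValueError("read starts with a deletion")
                return read_pos - 1   # fall back to the base before the deletion
            ref_pos += length
        elif op in "IS":        # consumes read only
            read_pos += length
        # H and P consume neither read nor reference

    raise ValueError("reference coordinate is past the end of the read")
```

For a read with CIGAR `5M3D5M` starting at reference position 100, positions 105 to 107 lie inside the deletion and all map to read index 4, the last aligned base before it.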
Mauricio Carneiro
3c7b7f74ef
Optimized interval iteration
...
Using a TreeSet to manipulate getToolkit.getIntervals() and being smart about which intervals to test makes interval clipping O(1) instead of O(n).
2011-09-28 16:07:34 -04:00
Mauricio Carneiro
5c9b659c02
clipping both ends of the reads was modifying the original read
...
This goes against the ReadClipper contract, and was affecting the second part of the read that spans over multiple intervals. Fixed.
2011-09-28 16:07:34 -04:00
Guillermo del Angel
fe23e4d10c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 15:53:11 -04:00
Guillermo del Angel
e2b9030e93
First mostly fully functional implementation of banded pair HMM likelihood computation for the indel caller. More experimentation to follow, but right now it works on small data sets and at least it doesn't break existing things. Disabled by default at this point.
2011-09-28 15:51:48 -04:00
Eric Banks
1b45f21774
Removing this command-line tool. Purposely not doing this in stable so that users who may still use it have time to find other options. But the docs are no longer on the wiki.
2011-09-28 13:18:32 -04:00
Eric Banks
1f0e354fae
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 13:13:21 -04:00
Eric Banks
bb619a9a3c
Fixing docs
2011-09-28 13:13:03 -04:00
Mark DePristo
5812004e06
Merge branch 'stable'
2011-09-28 11:36:40 -04:00
Mark DePristo
a5006831d7
Shows "" not empty space when default string value is ""
2011-09-28 11:35:52 -04:00
Mark DePristo
1e32281a15
Fix to not show -null when missing short name argument
2011-09-28 11:31:20 -04:00
Mauricio Carneiro
89544c209c
Fixing contracts
...
changed return type to Pair, changing contracts accordingly.
2011-09-28 11:19:17 -04:00
Eric Banks
eacbee3fe5
Merged bug fix from Stable into Unstable
2011-09-27 20:35:18 -04:00
Eric Banks
43b0c98298
Fix docs
2011-09-27 20:34:46 -04:00
Eric Banks
232a6df11c
Add longhand form to the error message.
2011-09-27 20:29:31 -04:00
Eric Banks
1d6fcb6eb1
Revert "Add longhand form to the error message to prevent users from posting borderline dumb posts to GS."
...
This reverts commit 75b2600527cfce05ae683cb394290ff2a80e8552.
2011-09-27 20:27:00 -04:00
Eric Banks
269b9826b6
Add longhand form to the error message to prevent users from posting borderline dumb posts to GS.
2011-09-27 20:26:36 -04:00
Mauricio Carneiro
3b6e43b7c4
Use reads that span multiple intervals
...
* RR will now compress reads that span across multiple intervals correctly and output them in the correct order.
* Fixed bug in getReadCoordinateForReferenceCoordinate where if the requested reference coordinate fell inside a deletion in the read the read would be clipped up to one element past the deletion.
2011-09-27 18:39:06 -04:00
Khalid Shakir
84bd355690
Merged bug fix from Stable into Unstable
2011-09-27 14:34:39 -04:00
Khalid Shakir
b090751f62
Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths.
...
Updates to HybridSelectionPipeline:
- Added annotations back via snpEff
- Minor updates to VQSR paths and lowered memory
2011-09-27 14:33:57 -04:00
Eric Banks
26e71f6688
The Omni files have multiple records (with the same ALT) at a particular location, with one PASSing and the other(s) filtered. Chris, this is why using this file as both eval and comp leads to ref/no-call cells in the GenotypeConcordance table. However, this led to non-determinism in VE because the VCs were placed in a HashSet; we use a LinkedHashMap instead to bring back determinism.
2011-09-27 11:03:17 -04:00
Guillermo del Angel
ceffefa6a6
Intermediate version with banded pair HMM
2011-09-27 10:18:58 -04:00
Mark DePristo
e99ff3caae
Removed lots of old, and not to be used, HMM options
...
-- resulted in massive code cleanup
-- GdA will integrate his new banded algorithm here
-- Removed: DO_CONTEXT_DEPENDENT_PENALTIES, GET_GAP_PENALTIES_FROM_DATA, INDEL_RECAL_FILE, dovit, GSA_PRODUCTION_ONLY
2011-09-27 10:08:40 -04:00
Mark DePristo
fa0efbc4ca
Refactoring of PairHMM to support reduced reads
2011-09-26 13:28:56 -04:00
Mark DePristo
a6b65d6347
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-26 13:26:21 -04:00
Mark DePristo
4f09453470
Refactored reduced read utilities
...
-- UnitTests for key functions on reduced reads
-- PileupElement calls static functions in ReadUtils
-- Simple routine that takes a reduced read and fills in its quals with its reduced qual
2011-09-26 12:58:31 -04:00
Eric Banks
234b74dd05
Merged bug fix from Stable into Unstable
2011-09-26 11:47:23 -05:00
Eric Banks
317b95fa57
Fixing some annotator docs
2011-09-26 11:46:45 -05:00
Mauricio Carneiro
b76dbc72f0
Fixed interval navigation bug.
...
If a read was hard clipped away from the current interval, all subsequent reads within that interval (not hardclipped) would be filtered out. Fixed.
2011-09-26 08:13:44 -04:00
Guillermo del Angel
9afccd11b1
Minor refactoring: add ability to MathUtils.normalizeFromLog10 to not go to linear domain but just subtract max value from log values and return. Use this function in snp and indel GL computation.
2011-09-25 21:18:56 -04:00
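The option described in the commit above amounts to shifting every log10 value by the maximum and stopping there. A minimal sketch in Python (the GATK code is Java): the function name mirrors the commit, but the `keep_in_log_space` flag and the behaviour of the linear branch are assumptions for illustration.

```python
def normalize_from_log10(log10_values, keep_in_log_space=False):
    """Normalize a list of log10-scaled values (hypothetical signature)."""
    max_value = max(log10_values)
    # Shift so that the largest value becomes exactly 0 in log space.
    shifted = [v - max_value for v in log10_values]
    if keep_in_log_space:
        return shifted
    # Assumed default path: convert to linear space and scale to sum to 1.
    linear = [10.0 ** v for v in shifted]
    total = sum(linear)
    return [x / total for x in linear]
```

Staying in log space skips the exponentiation, and the largest value still comes out as exactly zero.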
Guillermo del Angel
3eef800889
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-24 21:20:11 -04:00
Guillermo del Angel
4707ab4a7d
Added unit tests to test genotype merges with PL's
2011-09-24 21:17:15 -04:00
Guillermo del Angel
203517fbb7
a) Cleanups/bug fixes to previous commit to CombineVariants.
...
b) Change md5 to reflect records that are now merged correctly.
c) Change unit merge alleles test to reflect the fact that a null non-variant vc object is not valid and not supported because there's no way to codify such object in a vcf. The code correctly converts this to a non-variant single-base event with whatever the reference is at that location.
2011-09-24 19:08:00 -04:00
Mauricio Carneiro
c31f4cb2f6
Cleaning leading insertions
...
With the current implementation, a read cannot start with a deletion or an insertion. Maybe this will change in the future, but for now, chop the leading insertion off.
2011-09-24 14:33:32 -04:00
Guillermo del Angel
cd058dd10f
a) Fixed md5 for legit change in UG output that now also no-calls genotypes w/0,0,0 in PL's in SNP case.
...
b) First reimplementation of new vc merger of different types. Previous version did it in two steps, first merging all vc's per type and then trying to see if resulting vc's would be merged if alleles of one type were a subset of another, but this won't work when uniquifying genotypes since sample names would be messed up and GT sample names wouldn't match VC sample names. Now, it's actually simpler: when splitting vc's by type before merging, we check for alleles of one vc being a subset of alleles of vc of another type and if so we put them together in same list.
2011-09-24 13:40:11 -04:00
Mark DePristo
bb11951255
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-24 09:26:45 -04:00
Mark DePristo
8d9e136bba
Merge branch 'stable'
2011-09-24 09:26:28 -04:00
Mark DePristo
6804ab6d2f
Bug fix for NPE in very short GATK runs
...
-- Was already in unstable, but not stable...
2011-09-24 09:25:29 -04:00
Mark DePristo
92acff46e5
Moved Haplotype into Utils root
2011-09-24 09:14:05 -04:00
Mark DePristo
f792353dcd
Framework for genotype unit test
2011-09-24 08:56:45 -04:00
Mark DePristo
c0bb0cb465
Make DiploidGenotype enum private to walkers.genotyper
2011-09-24 08:48:33 -04:00
Guillermo del Angel
3a4469a236
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-23 21:58:34 -04:00
Guillermo del Angel
0e74cc3c74
a) Treat SNP genotype likelihoods just as indels, in the sense that they're always normalized as PL's so one of them will always be zero. This creates minor numerical differences in Qual and annotations due to numerical approximations in AF computation.
...
b) Intermediate CombineVariants fixes, not ready yet
2011-09-23 21:58:20 -04:00
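Item (a) of the commit above relies on the usual PL convention, in which likelihoods are Phred-scaled relative to the best genotype so that the best one is always 0. A small Python sketch of that convention (not the UnifiedGenotyper code, which is Java; the function name is hypothetical):

```python
def log10_likelihoods_to_pls(log10_likelihoods):
    """Convert log10 genotype likelihoods to normalized, rounded PLs."""
    best = max(log10_likelihoods)
    # Phred-scale the distance from the best likelihood; the best gets PL 0.
    return [int(round(10.0 * (best - gl))) for gl in log10_likelihoods]
```

Rounding to integers at this step is one plausible source of the small numerical differences the commit mentions, although the commit itself attributes them to approximations in the AF computation.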
Khalid Shakir
1803bd6ae2
Merged bug fix from Stable into Unstable
2011-09-23 21:05:00 -04:00
Khalid Shakir
8ceb93b8ac
Fixed an integration test which crashed on the out of date LSF DRMAA library when run against the obsolete LSF dotkit instead of .combined_LSF_SGE
2011-09-23 21:03:22 -04:00
Mauricio Carneiro
7cac75ae1d
Merged bug fix from Stable into Unstable
2011-09-23 19:00:43 -04:00
Mauricio Carneiro
fbe3c1e0b3
Adding warning on HardClipping
...
Hard Clipping is still under heavy development and should not be used by anyone less prepared than MacGyver.
2011-09-23 19:00:19 -04:00
Mark DePristo
b66841f179
Static cache for binomial probability
...
-- Very low level performance optimization
2011-09-23 17:29:34 -04:00
Mauricio Carneiro
1a45c331b2
bringing the latest bug fixes to Reduce Reads
2011-09-23 16:40:06 -04:00
Mauricio Carneiro
9ea40f2e41
Deletions/Insertions in hard clip and bug fixes
...
* Deletions now count as hard clipped bases in order to recover the original alignment start of a clipped read.
* Insertions do not count as hard clipped bases for the same reason.
* This created a bug in the previous cigar cleaning function. Fixed.
2011-09-23 16:37:08 -04:00
David Roazen
40202c85e0
Merged bug fix from Stable into Unstable
2011-09-23 16:35:55 -04:00
David Roazen
e1cb5f6459
SnpEff annotator now assigns a functional class to each effect and distinguishes between actual effects and mere modifiers.
...
-We now assign a functional class (nonsense, missense, silent, or none) to each SnpEff effect, and add a
SNPEFF_FUNCTIONAL_CLASS annotation to the INFO field of the output VCF.
-Effects are now prioritized according to both biological impact and functional class, instead of impact only.
-Many of SnpEff's "low-impact" effects are now classified as "modifiers" with lower priority than every
other effect. This includes such "effects" as DOWNSTREAM, UPSTREAM, INTRON, GENE, EXON, and others that
really describe the location of the variant rather than its biological effect.
This code will be short-lived (likely 1.2-only), as the next version of SnpEff will include most of these
features directly.
Checking this change into Stable+Unstable instead of Unstable because the current functional class stratification
in VariantEval is basically broken and urgently needs to be fixed for production purposes.
2011-09-23 16:06:52 -04:00
Matt Hanna
e388c357ca
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-23 14:53:28 -04:00
Matt Hanna
cc23b0b8a9
Fix for recent change modelling unmapped shards: don't invoke optimization to combine mapped and unmapped shards.
2011-09-23 14:52:31 -04:00
Mark DePristo
e3d4efb283
Remove N2 EXACT model code, which should never be used
2011-09-23 11:55:21 -04:00
Mark DePristo
27ce3c822e
Merge branch 'stable'
2011-09-23 09:04:52 -04:00
Mark DePristo
2bb77a7978
Docs for all VariantAnnotator annotations
2011-09-23 09:04:16 -04:00
Mark DePristo
dd65ba5bae
@Hidden for DocumentationTest and GATKDocsExample
2011-09-23 09:03:37 -04:00
Mark DePristo
dfce301beb
Looks for @Hidden annotation on all classes and excludes them from the docs
2011-09-23 09:03:04 -04:00
Mark DePristo
106a26c42d
Minor file cleanup
2011-09-23 08:25:20 -04:00
Mark DePristo
a9f073fa68
Genotype merging unit tests for simpleMerge
...
-- Remaining TODOs are all for GdA
2011-09-23 08:24:49 -04:00
Mark DePristo
4397ce8653
Moved removePLs to VariantContextUtils
2011-09-23 08:24:20 -04:00
Eric Banks
a8e0fb26ea
Updating md5 because the file changed
2011-09-23 07:33:20 -04:00
Mark DePristo
c49cc623de
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-22 17:26:21 -04:00
Mark DePristo
dab7232e9a
simpleMerge UnitTest for not annotating and annotating to different info key
2011-09-22 17:26:11 -04:00
Mark DePristo
30ab3af0c8
A few more simpleMerge UnitTest tests for filtered vcs
2011-09-22 17:14:59 -04:00
Mark DePristo
5cf82f9236
simpleMerge UnitTest tests filtered VC merging
2011-09-22 17:05:12 -04:00
Mark DePristo
46ca33dc04
TestDataProvider now can be named
2011-09-22 17:04:32 -04:00
Mauricio Carneiro
96c875399c
Merging many bug fixes to reduce reads
2011-09-22 17:04:11 -04:00
Mauricio Carneiro
39b54211d0
Fixed hard clipping soft clipped bases after hard clips
...
if soft clipped bases were after a hard clipped section of the read, the hard clip was clipping the left soft clip tail as if it were a right tail. Mayhem.
2011-09-22 15:46:55 -04:00
Mark DePristo
68da555932
UnitTest for simpleMerge for alleles
2011-09-22 15:16:37 -04:00
Mauricio Carneiro
1acf7945c5
Fixed hard clipped cigar and alignment start
...
* Hard clipped Cigar now includes all insertions that were hard clipped and not the deletions.
* The alignment start is now recalculated according to the new hard clipped cigar representation
2011-09-22 14:51:14 -04:00
Eric Banks
80d7300de4
Unit test was passing in FORMAT as one of the sample names. There used to be a hack in the VCFHeader to check for this and remove it and I couldn't figure out why, but now I know. Hack was removed and now the unit test passes in only the sample names as per the contract.
2011-09-22 13:28:42 -04:00
Mauricio Carneiro
4e9020c9f7
Fixed alignment start for hard clipping insertions
2011-09-22 13:28:25 -04:00
Eric Banks
9c1728416c
Revert "Updating md5 for fixed file" because this was fixed properly in unstable (but will break SnpEff if put into Stable).
...
This reverts commit 6b4182c6ab3e214da4c73bc6f3687ac6d1c0b72c.
2011-09-22 13:16:42 -04:00
Eric Banks
888d8697b1
Merged bug fix from Stable into Unstable
2011-09-22 13:16:31 -04:00
Eric Banks
15a410b24b
Updating md5 for fixed file
2011-09-22 13:15:41 -04:00
Mark DePristo
ba5f83fee2
start of VariantContextUtils UnitTest
...
-- tests rsID merging
2011-09-22 12:10:39 -04:00
Mark DePristo
93dd1faa5f
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-22 11:20:10 -04:00
Mark DePristo
a05c959e5a
Empty unit tests for VariantContextUtils
...
-- will be expanded over the day
2011-09-22 11:20:07 -04:00
Mark DePristo
3fdee2b9ed
Merge from stable into unstable
2011-09-22 11:19:43 -04:00
Christopher Hartl
4f4a0fc38a
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git
2011-09-22 11:01:58 -04:00
Christopher Hartl
982c47bfa7
Remove duplicate effort in ReadUtils (with apologies to Mauricio)
...
Big (but not major) cleanup of code in ILG - mostly excising the old likelihood model
Activated the early-abort check for ILG. I think it should be better this way.
2011-09-22 10:58:26 -04:00
Mark DePristo
c514df6d18
Merge of stable into unstable
2011-09-22 10:34:27 -04:00
Mark DePristo
f81a41b889
Updating MD5s for CombineVariants
...
-- Old version had broken RSIDs, new version is fixed. No longer see rs1234,. as it is now just rs1234
2011-09-22 10:30:25 -04:00
Eric Banks
b8ea9ceb68
Adding integration test that uses the -V:dbsnp binding to make sure it won't fail later on if someone messes with Tribble.
2011-09-21 22:43:31 -04:00
Eric Banks
8f8b59a932
My interpretation of the VCF spec is that the FORMAT field should only be present if there is genotype/sample data. So the VCFCodec now throws an exception when it encounters such a case. I had to fix one of the integration test VCFs.
2011-09-21 22:23:28 -04:00
Christopher Hartl
dc96f6da79
Merge branch 'master' of ssh://chartl@gsa2/humgen/gsa-scr1/chartl/dev/git
2011-09-21 18:18:41 -04:00
Christopher Hartl
f9cdc119af
Added a method to ReadUtils that converts reads of the form 10S20M10S to 40M (just unclips the soft-clips).
...
Be careful when using this - if you're writing a bam file it will be potentially written out of order (since the previous alignment start was at the M, not the S).
2011-09-21 18:16:42 -04:00
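The conversion described in the commit above (`10S20M10S` becomes `40M`) can be sketched as below. This is an illustrative Python version, not the ReadUtils method itself, which is Java. The function name is hypothetical, and moving the alignment start back by the leading soft-clip length is an assumption drawn from the commit's warning about out-of-order output.

```python
import re

# Matches one CIGAR element such as "10S" or "20M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def unclip_soft_clips(cigar, alignment_start):
    """Turn soft clips into matches; return (new_cigar, new_alignment_start)."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    # Length of the leading soft clip, skipping any leading hard clips.
    leading = 0
    for length, op in ops:
        if op == "H":
            continue
        if op == "S":
            leading = length
        break

    # Rewrite S as M and merge neighbouring elements with the same operator.
    merged = []
    for length, op in ops:
        if op == "S":
            op = "M"
        if merged and merged[-1][1] == op:
            merged[-1] = (merged[-1][0] + length, op)
        else:
            merged.append((length, op))

    new_cigar = "".join(f"{n}{op}" for n, op in merged)
    return new_cigar, alignment_start - leading
```

Because the start moves earlier, a read that was in coordinate order before unclipping may now sort ahead of its predecessors, which is the hazard the commit warns about when writing a BAM file.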
Christopher Hartl
faff6e4019
Failed to commit changes to the GATKReport required for easier access when using the files as data sources (read: histograms) for walkers
2011-09-21 18:15:23 -04:00
Mauricio Carneiro
96768c8a18
Sending latest bug fixes to Reduce Reads to the main repository
2011-09-21 17:43:11 -04:00
Mauricio Carneiro
70335b2b0a
Hard clipping soft clipped reads to fix misalignments.
...
Pre-softclipped reads (with high qual) are a complicated event to deal with in the Reduced Reads environment. I chose to hard clip them out for now and added a todo item to bring them back on in the future, perhaps as a variant region.
2011-09-21 17:12:01 -04:00
Christopher Hartl
ef05827c7b
Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-21 16:40:47 -04:00
Christopher Hartl
3b51d9106a
Adding in likelihood calculations for mendelian violations. Also fixing a minor and rare bug in SelectVariants when specifying family structure on the command line.
2011-09-21 16:40:29 -04:00
Mark DePristo
04968c88b3
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-21 15:43:25 -04:00
Mark DePristo
6bcfce225f
Fix for dynamic type determination for bgzip files
...
-- GZIPInputStream handles bgzip files under Linux, but not Mac
-- Added BlockCompressedInputStream test as well, which works properly on bgzip files
2011-09-21 15:39:19 -04:00
Mark DePristo
9f6f0c443c
Marginally cleaner isVCFStream() function
...
-- cleanup trying to debug minor bug. Failed to fix the bug, but the code is nicer now
2011-09-21 15:25:01 -04:00
Ryan Poplin
5fef6dc5d0
Merged bug fix from Stable into Unstable
2011-09-21 15:23:06 -04:00
Ryan Poplin
2585fc3d6c
Updating Rscript path doc text for Broad users
2011-09-21 15:22:26 -04:00
Mark DePristo
74f9ccf6dd
Merge
2011-09-21 11:30:11 -04:00
Mark DePristo
6592972f82
Putative fix for BAQ array out of bounds
...
-- Old code required qual to be <64, which isn't strictly necessary. Now uses the Picard SAMUtils.MAX_PHRED_SCORE constant
-- Unittest to enforce this behavior
2011-09-21 11:25:08 -04:00
Eric Banks
174859fc68
Don't allow whitespace in the INFO field
2011-09-21 11:14:54 -04:00
Mark DePristo
ecc7f34774
Putative fix for BAQ problem.
2011-09-21 11:09:54 -04:00
Mark DePristo
7d11f93b82
Final bugfix for CombineVariants
...
-- Now handles multiple records at a site, so that you don't see records like set=dbsnp-dbsnp-dbsnp when combining something with dbsnp
-- Proper handling of ids. If you are merging files with multiple ids for the same record, the ids are merged into a comma separated list
2011-09-21 10:58:32 -04:00
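The ID handling described in the commit above can be sketched as follows: distinct IDs are kept in first-seen order and joined with commas. This is a hedged Python illustration, not the CombineVariants code, which is Java. The function name is hypothetical, and treating `.` as the missing-ID marker is an assumption based on VCF convention.

```python
def merge_ids(record_ids):
    """Merge per-record ID strings into one comma-separated ID string."""
    merged = []
    for record_id in record_ids:
        # An input may already hold several comma-separated IDs.
        for single in record_id.split(","):
            single = single.strip()
            # Skip empty and missing IDs, and IDs already collected.
            if single and single != "." and single not in merged:
                merged.append(single)
    return ",".join(merged) if merged else "."
```

Under these assumptions, merging `rs1234` with a missing ID yields `rs1234` rather than `rs1234,.`.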
Mark DePristo
a91ac0c5db
Intermediate commit of bugfixes to CombineVariants
2011-09-21 10:15:05 -04:00
David Roazen
b04d8eab55
Merged bug fix from Stable into Unstable
2011-09-20 17:24:14 -04:00
Mauricio Carneiro
758ecf2d43
Bringing latest updates of ReduceReads to the master repository
2011-09-20 16:35:09 -04:00
David Roazen
d9ea764611
SnpEff annotator now adds OriginalSnpEffVersion and OriginalSnpEffCmd lines to the header of the VCF output file.
...
This change is urgently required for production, which is why it's going into Stable+Unstable
instead of just Unstable.
The keys for the SnpEff version and command header lines in the VCF file output by
VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally
different from the keys for those same lines in the SnpEff output file (SnpEffVersion
and SnpEffCmd), so that output files from VariantAnnotator won't be confused
with output files from SnpEff itself.
2011-09-20 16:30:55 -04:00
Mark DePristo
bffd3cca6f
Bug fix for reduced read; only adds regular bases for calculation
...
-- No longer passes on deletions for genotyping
2011-09-20 15:07:06 -04:00
Mark DePristo
a1b4cafe7a
Bug fix for NPE when timer wasn't initialized
2011-09-20 13:59:59 -04:00
Mark DePristo
b7511c5ff3
Fixed long-standing bug in tribble index creation
...
-- Previously, on the fly indices didn't have dictionary set on the fly, so the GATK would read, add dictionary, and rewrite the index. This is now fixed, so that the on the fly index contains the reference dictionary when first written, avoiding the unnecessary read and write
-- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary. This can be used conveniently everywhere, and is what's written into the Tribble index
-- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils
-- VCFWriter now requires the master sequence dictionary
-- Updated walkers that create VCFWriters to provide the master sequence dictionary
2011-09-20 10:53:18 -04:00
Mark DePristo
230e16d7c0
Merge branch 'master' into rodrewrite
2011-09-20 06:54:18 -04:00
Mark DePristo
aa8afa3899
Merge
2011-09-19 21:16:47 -04:00
Mauricio Carneiro
56106d54ed
Changing ReadUtils behavior to comply with GenomeLocParser
...
Now the functions getRefCoordSoftUnclippedStart and getRefCoordSoftUnclippedEnd will return getUnclippedStart if the read is all contained within an insertion. Updated the contracts accordingly. This should give the same behavior as the GenomeLocParser now.
2011-09-19 14:00:00 -04:00
Mauricio Carneiro
080c957547
Fixing contracts for SoftUnclippedEnd utils
...
Now accepts reads that are entirely contained inside an insertion.
2011-09-19 13:53:53 -04:00
Mauricio Carneiro
5e832254a4
Fixing ReadAndInterval overlap comments.
2011-09-19 13:28:41 -04:00
Christopher Hartl
ecb8466662
Merged bug fix from Stable into Unstable
2011-09-19 12:32:08 -04:00
Christopher Hartl
8143def292
Fix the -T argument in the DepthOfCoverage docs
...
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 12:31:47 -04:00
Christopher Hartl
034b868588
Revert "Fix the -T argument in the DepthOfCoverage docs"
...
This reverts commit 0994efda998cf3a41b1a43696dbc852a441d5316.
2011-09-19 12:16:07 -04:00
Mark DePristo
cfde0e674b
Merge branch 'sgintervals'
2011-09-19 12:02:41 -04:00
Mark DePristo
3e93f246f7
Support for sample sets in AssignSomaticStatus
...
-- Also cleaned up SampleUtils.getSamplesFromCommandLine() to return a set, not a list, and trim the sample names.
2011-09-19 11:40:45 -04:00
Mark DePristo
41ffb25b74
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-19 10:55:18 -04:00
Christopher Hartl
ca1b30e4a4
Fix the -T argument in the DepthOfCoverage docs
...
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 10:29:06 -04:00
Mark DePristo
4ad330008d
Final intervals cleanup
...
-- No functional changes (my algorithm wouldn't work)
-- Major structural cleanup (returning more basic data structures that allow us to develop a new algorithm)
-- Unit tests for the efficiency of interval partitioning
2011-09-19 10:19:10 -04:00
Mark DePristo
6ea57bf036
Merge branch 'master' into sgintervals
2011-09-19 09:50:19 -04:00
Mark DePristo
6bd42c053d
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-18 20:18:39 -04:00
Roger Zurawicki
091c7197cd
Fixed memory leak and bug with deletions in clipping
...
The ClippingOp clip cigar function would run into an endless loop if the parameters were out of the read's range; this is now fixed.
* There is no check to make sure the read coordinates are covered by the read though
When Hard clipping to interval, I added a check for deletions.
NOTE: method works for NA12878 WEx but needs to be more thoroughly tested/optimized
2011-09-18 19:21:51 -04:00
Guillermo del Angel
7fa1e237d9
Forgot to git stash pop new MD5's for CombineVariants integration test
2011-09-16 12:53:54 -04:00
Guillermo del Angel
e7b9a009b7
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-16 12:48:30 -04:00
Menachem Fromer
b2e8e11128
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-16 00:52:27 -04:00
Christopher Hartl
57b3efa2e2
Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-15 21:06:38 -04:00
Christopher Hartl
939babc820
Updating formatting for ValidationAmplicons GATK docs
2011-09-15 21:05:51 -04:00
Christopher Hartl
9fdf1f8eb6
Fix some doc formatting for Depth of Coverage
2011-09-15 21:05:22 -04:00
Menachem Fromer
e6e9b08c9a
Must provide alleles VCF to UGCallVariants
2011-09-15 18:51:09 -04:00
David Roazen
d78e00e5b2
Renaming VariantAnnotator SnpEff keys
...
This is to head off potential confusion with the output from the SnpEff tool itself,
which also uses a key named EFF.
2011-09-15 17:42:15 -04:00
Eric Banks
1971fb35d7
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-15 16:55:33 -04:00
Eric Banks
9dc6354130
Oops didn't mean to touch this test before
2011-09-15 16:55:24 -04:00
Ryan Poplin
2a8b8efd2f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-15 16:26:35 -04:00
Ryan Poplin
2f58fdb369
Adding expected output doc to CountCovariates
2011-09-15 16:26:11 -04:00
Eric Banks
fd1831b4a5
Updating docs to include more details
2011-09-15 16:25:03 -04:00
Eric Banks
6d02a34bfb
Updating docs to include output
2011-09-15 16:17:54 -04:00
Eric Banks
4ef6a4598c
Updating docs to include output
2011-09-15 16:10:34 -04:00
Eric Banks
fe474b77f8
Updating docs so printing looks nicer
2011-09-15 16:05:39 -04:00
Eric Banks
f04e51c6c2
Adding docs from Andrey since his repo was all screwed up.
2011-09-15 15:38:56 -04:00
Guillermo del Angel
86480b2e13
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-15 15:31:07 -04:00
Eric Banks
d369d10593
Adding documentation before the release for GATK wiki page
2011-09-15 13:56:23 -04:00
Eric Banks
202405b1a1
Updating the FunctionalClass stratification in VariantEval to handle the snpEff annotations; this change really needs to be in before the release so that the pipeline can output semi-meaningful plots. This commit maintains backwards compatibility with the crappy Genomic Annotator output. However, I did clean up the code a bit so that we now use an Enum instead of hard-coded values (so it's now much easier to change things if we choose to do so in the future). I do not see this as the final commit on this topic - I think we need to make some changes to the snpEff annotator to preferentially choose certain annotations within effect classes; Mark, let's chat about this for a bit when you get back next week. Also, for the record, I should be blamed for David's temporary commit the other day because I gave him the green light (since when do you care about backwards compatibility anyways?). In any case, at least now we have something that works for both the old and new annotations.
2011-09-15 13:52:31 -04:00
David Roazen
1e682deb26
Minor html-formatting-related documentation fix to the SnpEff class.
2011-09-15 13:07:50 -04:00
Guillermo del Angel
a942fa38ef
Refine the way we merge records of different types in CombineVariants. Previously, two records of different types were never combined and were kept separate. This is still the case, except when the alleles of one record are a strict subset of the alleles of another record. For example, a SNP with alleles {A*,T} and a mixed record with alleles {A*,T,AAT} are now combined when their start positions match.
2011-09-15 10:22:28 -04:00
David Roazen
3db457ed01
Revert "Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames"
...
After discussing this with Mark, it seems clear that the old version of the
VariantEval FunctionalClass stratification is preferable to this version.
By reverting, we maintain backwards compatibility with legacy output files
from the old GenomicAnnotator, and can add SnpEff support later without
breaking that backwards compatibility.
This reverts commit b44acd1abd9ab6eec37111a19fa797f9e2ca3326.
2011-09-14 10:47:28 -04:00
David Roazen
e0c8c0ddcb
Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames
...
This is a temporary and hopefully short-lived solution. I've modified
the FunctionalClass stratification to stratify by effect impact as
defined by SnpEff annotations (high, moderate, and low impact) rather
than by the silent/missense/nonsense categories.
If we want to bring back the silent/missense/nonsense stratification,
we should probably take the approach of asking the SnpEff author
to add it as a feature to SnpEff rather than coding it ourselves,
since the whole point of moving to SnpEff was to outsource genomic
annotation.
2011-09-14 07:09:47 -04:00
David Roazen
1213b2f8c6
SnpEff 2.0.2 support
...
-Rewrote SnpEff support in VariantAnnotator to support the latest SnpEff release (version 2.0.2)
-Removed support for SnpEff 1.9.6 (and associated tribble codec)
-Will refuse to parse SnpEff output files produced by unsupported versions (or without a version tag)
-Correctly matches ref/alt alleles before annotating a record, unlike the previous version
-Correctly handles indels (again, unlike the previous version)
2011-09-14 07:09:47 -04:00
Guillermo del Angel
5b1bf6e244
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-13 17:04:43 -04:00
Guillermo del Angel
c6672f2397
Intermediate (but necessary) fix for Beagle walkers: if a marker is absent in the Beagle output files, but present in the input vcf, there's no reason why it should be omitted in the output vcf. Rather, the vc is written as is from the input vcf
2011-09-13 16:57:37 -04:00
Mark DePristo
edf29d0616
Explicit info message about uploading S3 log
2011-09-12 22:16:52 -04:00
Mark DePristo
2316b6aad3
Trying to fix problems with S3 uploading behind firewalls
...
-- Cannot reproduce the very long waits reported by some users.
-- Fixed a problem where an exception could leave an undeleted file; the file is now cleaned up with deleteOnExit()
2011-09-12 22:02:42 -04:00
Matt Hanna
64707c33bb
Merged bug fix from Stable into Unstable
2011-09-12 21:54:11 -04:00
Matt Hanna
e63d9d8f8e
Mauricio pointed out to me that dynamic merging the unmapped regions of multiple BAMs ('-L unmapped' with a BAM list)
...
was completely broken. Sorry about this! Fixed.
2011-09-12 21:50:59 -04:00
Eric Banks
ec4b30de6d
Patch from Laurent: typo leads to bad error messages.
2011-09-12 14:45:53 -04:00
David Roazen
9d9d438bc4
New VariantAnnotatorEngine capability: an initialize() method for all annotation classes.
...
All VariantAnnotator annotation classes may now have an (optional) initialize() method
that gets called by the VariantAnnotatorEngine ONCE before annotation starts.
As an example of how this can be used, the SnpEff annotation class will use the initialize()
method to check whether the SnpEff version number stored in the vcf header is a supported
version, and also to verify that its required RodBinding is present.
2011-09-12 13:00:53 -04:00
Ryan Poplin
981b78ea50
Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts.
2011-09-12 12:17:43 -04:00
Ryan Poplin
60ebe68aff
Fixing issue in VariantEval in which insertion and deletion events weren't treated symmetrically. Added new option to require strict allele matching.
2011-09-12 09:43:23 -04:00
Guillermo del Angel
9344938360
Uncomment code to add deleted bases covering an indel to per-sample genotype reporting, update integration tests accordingly
2011-09-10 19:41:01 -04:00
Guillermo del Angel
b399424a9c
Fix integration test affected by non-calling all-zero PL samples, and add a more complicated multi-sample integration test from a phase 1 case, GBR with mixed technologies and complex input alleles
2011-09-09 20:44:47 -04:00
Guillermo del Angel
e95d484757
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-09 18:31:14 -04:00
Guillermo del Angel
a807205fc3
a) Minor optimization to softMax() computation to avoid redundant operations, results in about 5-10% increase in speed in indel calling.
...
b) Added (but left commented out since it may affect integration tests and to isolate commits) fix to per-sample DP reporting, so that deletions are included in count.
c) Bug fix to avoid having non-reference genotypes assigned to samples with PL=0,0,0. Correct behavior should be to no-call these samples, and to ignore these samples when computing AC distribution since their likelihoods are not informative.
2011-09-09 18:00:23 -04:00
Mauricio Carneiro
9e650dfc17
Fixing SelectVariants documentation
...
getting rid of messages telling users to go for the YAML file. The idea is to not support these anymore.
2011-09-09 16:25:31 -04:00
Mark DePristo
72536e5d6d
Done
2011-09-09 15:44:47 -04:00
Mark DePristo
3c8445b934
Performance bugfix for GenomeLoc.hashcode
...
-- Old version overflowed, so most GenomeLocs had a hashcode of 0. Now uses or, not plus, to combine
2011-09-09 14:25:37 -04:00
Mark DePristo
c6436ee5f0
Whitespace cleanup
2011-09-09 14:24:29 -04:00
Mark DePristo
87dc5cfb24
Whitespace cleanup
2011-09-09 14:23:42 -04:00
Ryan Poplin
1953edcd2d
updating Validate Variants deletion integration test
2011-09-09 13:39:08 -04:00
Ryan Poplin
9ada9b3ed4
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-09 13:15:36 -04:00
Ryan Poplin
354529bff3
adding Validate Variants integration test with a deletion
2011-09-09 13:15:24 -04:00
Ryan Poplin
91c949db74
Fixing ValidateVariants so that it validates deletion records. Fixing GATKdocs.
2011-09-09 12:57:14 -04:00
Mark DePristo
06cb20f2a5
Intermediate commit cleaning up scatter intervals
...
-- Adding unit tests to ensure uniformity of intervals
2011-09-09 12:56:45 -04:00
Eric Banks
51eb95d638
Missed these tests before
2011-09-09 11:46:37 -04:00
Eric Banks
6ad8943ca0
CompOverlap no longer keeps track of the number of comp sites since it wasn't keeping (and cannot keep) track of them correctly.
2011-09-09 09:45:24 -04:00
Mark DePristo
507574b1c8
Merge branch 'cancer'
2011-09-08 16:10:02 -04:00
Mark DePristo
48461b34af
Added TYPE argument to print out VariantType
2011-09-08 15:01:13 -04:00
Eric Banks
eaaba6eb51
Confirming that when stratifying by sample in VE the monomorphic sites for a given sample are not counted for the relevant metrics. Adding integration test to cover it.
2011-09-08 13:17:34 -04:00
Ryan Poplin
2636d216de
Adding indel vqsr integration test
2011-09-08 10:38:13 -04:00
Ryan Poplin
9cba1019c8
Another fix for genotype given alleles for indels. Expanding the indel integration tests to include multiallelics and indel records that overlap
2011-09-08 09:25:13 -04:00
Ryan Poplin
e0020b2b29
Fixing PrintRODs. Now has input and only prints out one copy of each record
2011-09-08 08:58:37 -04:00
Ryan Poplin
29c968ab60
clean up
2011-09-08 08:42:43 -04:00
Ryan Poplin
59841f8232
Fixing genotype given alleles for indels. Only take the records that start at this locus.
2011-09-08 08:41:16 -04:00
Mark DePristo
cd2c511c4a
GCF improvements
...
-- Support for streaming VCF writing via the VCFWriter interface
-- GCF now has a header and a footer. The header is minimal, and contains a forward pointer to the position of the footer in the file.
-- Readers now read the header, and then jump to the footer to get the rest of the "header" information
-- Version now a field in GCF
2011-09-07 23:28:46 -04:00
Mark DePristo
fe5724b6ea
Refactored indexing part of StandardVCFWriter into superclass
...
-- Now other implementations of the VCFWriter can easily share common functions, such as writing an index on the fly
2011-09-07 23:27:08 -04:00
Mark DePristo
01b6177ce1
Renaming GVCF -> GCF
2011-09-07 17:10:56 -04:00
Mark DePristo
b220ed0d75
Merge branch 'master' into rodrewrite
2011-09-07 17:05:35 -04:00
Guillermo del Angel
45d54f6258
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 16:49:49 -04:00
Guillermo del Angel
9604fb2ba3
Necessary but not sufficient step to fix GenotypeGivenAlleles mode in UG which is now busted
2011-09-07 16:49:16 -04:00
Mark DePristo
2ded027762
Removed dysfunctional tranches support from VariantEval
2011-09-07 16:09:24 -04:00
Eric Banks
aa9e32f2f1
Reverting Mark's previous commit as per the open discussion. Now the eval modules check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module not to error out if the indel isn't simple (that would have been bad). Only integration test that needed to be updated was the tranches one based on a separate commit from Mark.
2011-09-07 15:48:06 -04:00
Mark DePristo
d7e355b4b6
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 14:54:16 -04:00
Mark DePristo
9127849f5d
BugFix for unit test
2011-09-07 14:54:10 -04:00
Eric Banks
3a04955a30
We already had isPolymorphic and isMonomorphic in the VariantContext, but the implementation was incorrect for many edge cases (e.g. sites-only files, sites with samples who were no-called). Fixing. Moving on to VE now.
2011-09-07 14:01:42 -04:00
Guillermo del Angel
743bf7784c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 13:21:26 -04:00
Guillermo del Angel
5f22ef9a8c
Added missing javadoc info to Beagle arguments
2011-09-07 13:21:11 -04:00
Mark DePristo
3bcbfa6e06
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 13:13:17 -04:00
Mark DePristo
430da23446
At least 2 minutes must pass before a status message is printed, further stabilizing time estimates
2011-09-07 13:13:07 -04:00
Mauricio Carneiro
6857d0324e
Merge branch 'master' into rr
2011-09-07 12:59:08 -04:00
Mark DePristo
7e9e20fed0
Forgot to delete previous call
2011-09-07 12:54:52 -04:00
Mark DePristo
d23d620494
Pushing traversal engine timer start to as close to actual start as possible
...
-- Should make initial timings more accurate
2011-09-07 12:52:33 -04:00
Mark DePristo
6ff432e1f2
BugFix for TF argument to VariantEval, actually making it work properly
2011-09-07 12:50:17 -04:00
Mauricio Carneiro
131cb7effd
Bringing Reduce Reads bug fixes to the main repository
2011-09-07 12:25:53 -04:00
Mark DePristo
a1920397e8
Major bugfix for per sample VariantEval
...
-- per sample stratification was not being calculated correctly. The alt allele was always remaining, even if the genotype of the sample was hom-ref. Although conceptually fine, this breaks the assumptions of all of the eval modules, so per sample stratifications actually included all variants for everything. Eric is going to fix the system in general, so this commit may break the build.
2011-09-07 12:18:11 -04:00
Mark DePristo
a02636a1ac
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/ebanks/Sting_rodrefactor into rodrewrite
2011-09-07 10:50:00 -04:00
Mark DePristo
d5641cfac5
Merge branch 'variantEvalST'
2011-09-07 10:44:23 -04:00
Mark DePristo
2f4cf82e3b
VariantEval cleanup. Added VariantType Stratification
...
-- ArrayLists are now declared as List where possible
-- states refactored into VariantStratifier base class (reduces many lines of duplicate code)
-- Added VariantType stratification that partitions report by VariantContext.Type
2011-09-07 10:43:53 -04:00
Christopher Hartl
436f6eb52b
Reverting Eric's change and pushing in some command-line-option documentation.
2011-09-07 08:53:30 -04:00
Eric Banks
1ef8a1750a
I asked nicely and got nothing. Then I threatened and still got nothing. So I am carrying through on my threats. Guillermo, you have a short reprieve because you were away on vacation, but let's get yours done tomorrow afternoon.
2011-09-06 21:07:49 -04:00
Eric Banks
da9c8ab386
Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly.
2011-09-06 20:39:42 -04:00
Mark DePristo
3db7ecb920
ReducedRead flag cached in GATKSAMRecord. 20% performance improvement
2011-09-06 15:11:38 -04:00
Roger Zurawicki
47607a7eff
Fixed bug where deletions messed up interval clipping
...
- Instead of using readLength, the ReadUtil functions are used to get a proper read coordinate
- Added debug info in interval clipping (with -dl)
NOTE: method might not be safe for production and checks need to be added to the ClippingOp code
2011-09-06 14:25:57 -04:00
Khalid Shakir
0adb388dee
Fixed bug in SelectVariants that was annotating sample_file / exclude_sample_file as @Argument instead of @Input meaning they weren't tracked in Queue.
...
Updates for HybridSelectionPipeline:
- Use VQSR on SNPs for projects using bait set whole_exome_agilent_1 and applying cut at 98.5.
- If a whole_exome_agilent_1 project has fewer than 50 samples, also mix in 1000G samples to reach VQSR thresholds.
- Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches.
- Removed GSA_PRODUCTION_ONLY flag from indel caller.
- Updated indel hard filters based on delangel's analysis.
- Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.
2011-09-06 12:41:46 -04:00
Mark DePristo
d471617c65
GATK binary VCF (gvcf) prototype format for efficiency testing
...
-- Very minimal working version that can read / write binary VCFs with genotypes
-- Already 10x faster for sites, 5x for fully parsed genotypes, and 1000x for skipping genotypes when reading
2011-09-02 21:15:19 -04:00
Mark DePristo
048202d18e
Bugfix for cached quals
2011-09-02 21:13:28 -04:00
Mark DePristo
03aa04e37c
Simple refactoring to make formatting functions public
2011-09-02 21:13:08 -04:00