Matt Hanna
9272ed03b5
Merged bug fix from Stable into Unstable
2011-09-28 21:26:43 -04:00
Matt Hanna
0acaf2df65
Fix an embarrassing issue where a specific configuration of minimal coverage over small intervals could cause reads to be dropped from the pileup. Nothing to see here...
2011-09-28 21:23:01 -04:00
Guillermo del Angel
c8d3a720f9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 18:17:34 -04:00
Guillermo del Angel
7e3cb45093
Further performance optimization in the banded HMM: now about a 60% speed improvement over the current implementation
2011-09-28 16:27:28 -04:00
Ryan Poplin
1b1ca80df2
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 16:17:39 -04:00
Ryan Poplin
3b73dc89fe
Making several esoteric arguments in the BQSR @Hidden. Adding basic support for Complete Genomics machine cycle.
2011-09-28 16:17:31 -04:00
Mauricio Carneiro
ff2f4df043
Fixed hardclipping inside indel (right tail)
...
When hard clipping the right tail of a read falls inside a deletion, clipping should fall back to the last base before the deletion, to follow the ReadClipper's contract.
2011-09-28 16:07:34 -04:00
Mauricio Carneiro
3c7b7f74ef
Optimized interval iteration
...
Using a TreeSet to manipulate getToolkit.getIntervals() and being smart about which intervals to test makes interval clipping O(1) instead of O(n).
2011-09-28 16:07:34 -04:00
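The idea of keeping intervals in a sorted set so that only the neighbours of a read need testing can be sketched as follows. This is an illustration only, not the code from the commit: the `Interval` class, the `firstOverlapping` method and the assumption that intervals do not overlap each other are all hypothetical. Note that each `java.util.TreeSet` navigation call is itself logarithmic in the number of intervals.

```java
import java.util.TreeSet;

public class IntervalLookup {
    /** Hypothetical closed interval [start, stop], ordered by start and then by stop. */
    public static final class Interval implements Comparable<Interval> {
        public final int start;
        public final int stop;

        public Interval(int start, int stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public int compareTo(Interval other) {
            int byStart = Integer.compare(start, other.start);
            return byStart != 0 ? byStart : Integer.compare(stop, other.stop);
        }
    }

    /**
     * Returns the first interval overlapping [readStart, readStop], or null if none does.
     * Assumes the intervals in the set do not overlap each other, so only two
     * neighbours of readStart have to be tested instead of the whole collection.
     */
    public static Interval firstOverlapping(TreeSet<Interval> intervals, int readStart, int readStop) {
        Interval probe = new Interval(readStart, Integer.MAX_VALUE);
        // Candidate 1: the last interval that starts at or before readStart.
        Interval before = intervals.floor(probe);
        if (before != null && before.stop >= readStart) {
            return before;
        }
        // Candidate 2: the first interval that starts after readStart.
        Interval after = intervals.higher(probe);
        if (after != null && after.start <= readStop) {
            return after;
        }
        return null;
    }
}
```

With intervals [10, 20] and [30, 40], a read spanning 25 to 35 resolves to [30, 40] after two set lookups, and a read spanning 21 to 29 resolves to no interval.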
Mauricio Carneiro
5c9b659c02
clipping both ends of the reads was modifying the original read
...
This goes against the ReadClipper contract, and was affecting the second part of the read that spans over multiple intervals. Fixed.
2011-09-28 16:07:34 -04:00
Guillermo del Angel
fe23e4d10c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 15:53:11 -04:00
Guillermo del Angel
e2b9030e93
First mostly functional implementation of banded pair HMM likelihood computation for the indel caller. More experimentation to follow, but right now it works on small data sets and at least it doesn't break existing things. Disabled by default at this point.
2011-09-28 15:51:48 -04:00
Eric Banks
1b45f21774
Removing this command-line tool. Purposely not doing this in stable so that users who may still use it have time to find other options. But the docs are no longer on the wiki.
2011-09-28 13:18:32 -04:00
Eric Banks
1f0e354fae
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 13:13:21 -04:00
Eric Banks
bb619a9a3c
Fixing docs
2011-09-28 13:13:03 -04:00
Mark DePristo
5812004e06
Merge branch 'stable'
2011-09-28 11:36:40 -04:00
Mark DePristo
a5006831d7
Shows "" not empty space when default string value is ""
2011-09-28 11:35:52 -04:00
Mark DePristo
1e32281a15
Fix to not show -null when missing short name argument
2011-09-28 11:31:20 -04:00
Mauricio Carneiro
89544c209c
Fixing contracts
...
changed return type to Pair, changing contracts accordingly.
2011-09-28 11:19:17 -04:00
Eric Banks
eacbee3fe5
Merged bug fix from Stable into Unstable
2011-09-27 20:35:18 -04:00
Eric Banks
43b0c98298
Fix docs
2011-09-27 20:34:46 -04:00
Eric Banks
232a6df11c
Add longhand form to the error message.
2011-09-27 20:29:31 -04:00
Eric Banks
1d6fcb6eb1
Revert "Add longhand form to the error message to prevent users from posting borderline dumb posts to GS."
...
This reverts commit 75b2600527cfce05ae683cb394290ff2a80e8552.
2011-09-27 20:27:00 -04:00
Eric Banks
269b9826b6
Add longhand form to the error message to prevent users from posting borderline dumb posts to GS.
2011-09-27 20:26:36 -04:00
Mauricio Carneiro
3b6e43b7c4
Use reads that span multiple intervals
...
* RR will now compress reads that span across multiple intervals correctly and output them in the correct order.
* Fixed bug in getReadCoordinateForReferenceCoordinate where, if the requested reference coordinate fell inside a deletion in the read, the read would be clipped up to one element past the deletion.
2011-09-27 18:39:06 -04:00
Khalid Shakir
84bd355690
Merged bug fix from Stable into Unstable
2011-09-27 14:34:39 -04:00
Khalid Shakir
b090751f62
Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths.
...
Updates to HybridSelectionPipeline:
- Added annotations back via snpEff
- Minor updates to VQSR paths and lowered memory
2011-09-27 14:33:57 -04:00
Eric Banks
26e71f6688
The Omni files have multiple records (with the same ALT) at a particular location, with one PASSing and the other(s) filtered. Chris, this is why using this file as both eval and comp leads to ref/no-call cells in the GenotypeConcordance table. However, this led to non-determinism in VE because the VCs were placed in a HashSet; we use a LinkedHashMap instead to bring back determinism.
2011-09-27 11:03:17 -04:00
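Why swapping a hash-ordered collection for a LinkedHashMap restores determinism can be sketched as below. The `Record` class is a hypothetical stand-in for a variant record, and the premise that the records use identity-based hash codes is an assumption made for the illustration, not something the commit states. What is certain is that `java.util.LinkedHashMap` iterates in insertion order regardless of hash codes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeterministicOrder {
    /** Hypothetical stand-in for a variant record; it deliberately does not override hashCode(). */
    public static final class Record {
        public final String name;

        public Record(String name) {
            this.name = name;
        }
    }

    /**
     * Collects records in a LinkedHashMap and reads them back.
     * Iteration follows insertion order, so the result is the same on every run,
     * whereas a HashSet of the same objects may iterate in a different order
     * from one JVM run to the next when hash codes are identity-based.
     */
    public static List<String> namesInInsertionOrder(List<Record> records) {
        Map<Record, Boolean> seen = new LinkedHashMap<>();
        for (Record record : records) {
            seen.put(record, Boolean.TRUE);
        }
        List<String> names = new ArrayList<>();
        for (Record record : seen.keySet()) {
            names.add(record.name);
        }
        return names;
    }
}
```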
Guillermo del Angel
ceffefa6a6
Intermediate version with banded pair HMM
2011-09-27 10:18:58 -04:00
Mark DePristo
e99ff3caae
Removed lots of old, and not to be used, HMM options
...
-- resulted in massive code cleanup
-- GdA will integrate his new banded algorithm here
-- Removed: DO_CONTEXT_DEPENDENT_PENALTIES, GET_GAP_PENALTIES_FROM_DATA, INDEL_RECAL_FILE, dovit, GSA_PRODUCTION_ONLY
2011-09-27 10:08:40 -04:00
Mark DePristo
fa0efbc4ca
Refactoring of PairHMM to support reduced reads
2011-09-26 13:28:56 -04:00
Mark DePristo
a6b65d6347
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-26 13:26:21 -04:00
Mark DePristo
4f09453470
Refactored reduced read utilities
...
-- UnitTests for key functions on reduced reads
-- PileupElement calls static functions in ReadUtils
-- Simple routine that takes a reduced read and fills in its quals with its reduced qual
2011-09-26 12:58:31 -04:00
Eric Banks
234b74dd05
Merged bug fix from Stable into Unstable
2011-09-26 11:47:23 -05:00
Eric Banks
317b95fa57
Fixing some annotator docs
2011-09-26 11:46:45 -05:00
Mauricio Carneiro
b76dbc72f0
Fixed interval navigation bug.
...
If a read was hard clipped away from the current interval, all subsequent reads within that interval (not hardclipped) would be filtered out. Fixed.
2011-09-26 08:13:44 -04:00
Guillermo del Angel
9afccd11b1
Minor refactoring: add ability to MathUtils.normalizeFromLog10 to not go to linear domain but just subtract max value from log values and return. Use this function in snp and indel GL computation.
2011-09-25 21:18:56 -04:00
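The two behaviours described here, staying in log space versus converting to the linear domain, can be sketched as below. The class name, method name and signature are illustrative assumptions and are not the actual MathUtils API.

```java
public class Log10Normalize {
    /**
     * Illustrative helper (hypothetical signature).
     * Subtracts the maximum from every log10 value. If keepInLogSpace is true the
     * shifted values are returned as-is, so the largest entry becomes 0. Otherwise
     * the shifted values are converted to the linear domain and scaled to sum to 1.
     */
    public static double[] normalize(double[] log10Values, boolean keepInLogSpace) {
        double max = Double.NEGATIVE_INFINITY;
        for (double value : log10Values) {
            max = Math.max(max, value);
        }
        double[] result = new double[log10Values.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = log10Values[i] - max;
        }
        if (keepInLogSpace) {
            return result;
        }
        double sum = 0.0;
        for (int i = 0; i < result.length; i++) {
            result[i] = Math.pow(10.0, result[i]);
            sum += result[i];
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= sum;
        }
        return result;
    }
}
```

Subtracting the maximum first keeps the largest term at 10^0 = 1 when exponentiating, which avoids underflow for very negative log likelihoods.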
Guillermo del Angel
3eef800889
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-24 21:20:11 -04:00
Guillermo del Angel
4707ab4a7d
Added unit tests to test genotype merges with PL's
2011-09-24 21:17:15 -04:00
Guillermo del Angel
203517fbb7
a) Cleanups/bug fixes to previous commit to CombineVariants.
...
b) Change md5 to reflect records that are now merged correctly.
c) Change unit merge alleles test to reflect the fact that a null non-variant vc object is not valid and not supported, because there's no way to encode such an object in a VCF. The code correctly converts this to a non-variant single-base event with whatever the reference is at that location.
2011-09-24 19:08:00 -04:00
Mauricio Carneiro
c31f4cb2f6
Cleaning leading insertions
...
With the current implementation, a read cannot start with a deletion or an insertion. Maybe this will change in the future, but for now, chop the leading insertion off.
2011-09-24 14:33:32 -04:00
Guillermo del Angel
cd058dd10f
a) Fixed md5 for legit change in UG output that now also no-calls genotypes w/0,0,0 in PL's in SNP case.
...
b) First reimplementation of new vc merger of different types. Previous version did it in two steps, first merging all vc's per type and then trying to see if resulting vc's would be merged if alleles of one type were a subset of another, but this won't work when uniquifying genotypes since sample names would be messed up and GT sample names wouldn't match VC sample names. Now, it's actually simpler: when splitting vc's by type before merging, we check for alleles of one vc being a subset of alleles of vc of another type and if so we put them together in same list.
2011-09-24 13:40:11 -04:00
Mark DePristo
bb11951255
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-24 09:26:45 -04:00
Mark DePristo
8d9e136bba
Merge branch 'stable'
2011-09-24 09:26:28 -04:00
Mark DePristo
6804ab6d2f
Bug fix for NPE in very short GATK runs
...
-- Was already in unstable, but not stable...
2011-09-24 09:25:29 -04:00
Mark DePristo
92acff46e5
Moved Haplotype into Utils root
2011-09-24 09:14:05 -04:00
Mark DePristo
f792353dcd
Framework for genotype unit test
2011-09-24 08:56:45 -04:00
Mark DePristo
c0bb0cb465
Make DiploidGenotype enum private to walkers.genotyper
2011-09-24 08:48:33 -04:00
Guillermo del Angel
3a4469a236
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-23 21:58:34 -04:00
Guillermo del Angel
0e74cc3c74
a) Treat SNP genotype likelihoods just as indels, in the sense that they're always normalized as PL's so one of them will always be zero. This creates minor numerical differences in Qual and annotations due to numerical approximations in AF computation.
...
b) Intermediate CombineVariants fixes, not ready yet
2011-09-23 21:58:20 -04:00
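Normalizing likelihoods as PLs so that one of them is always zero can be sketched as follows, using the usual Phred scaling of -10 times the log10 likelihood relative to the best genotype. The class and method names are hypothetical, and the rounding shown is an assumption rather than the exact code in the commit.

```java
public class PhredLikelihoods {
    /**
     * Converts log10 genotype likelihoods to Phred-scaled PLs (hypothetical helper).
     * Each value is taken relative to the maximum, so the most likely genotype gets PL 0.
     */
    public static int[] toPLs(double[] log10Likelihoods) {
        double max = Double.NEGATIVE_INFINITY;
        for (double value : log10Likelihoods) {
            max = Math.max(max, value);
        }
        int[] pls = new int[log10Likelihoods.length];
        for (int i = 0; i < pls.length; i++) {
            // Rounding to an integer here is the kind of step that produces the
            // small numerical differences mentioned in the commit message.
            pls[i] = (int) Math.round(-10.0 * (log10Likelihoods[i] - max));
        }
        return pls;
    }
}
```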
Khalid Shakir
1803bd6ae2
Merged bug fix from Stable into Unstable
2011-09-23 21:05:00 -04:00
Khalid Shakir
8ceb93b8ac
Fixed an integration test which crashed on the out of date LSF DRMAA library when run against the obsolete LSF dotkit instead of .combined_LSF_SGE
2011-09-23 21:03:22 -04:00
Mauricio Carneiro
7cac75ae1d
Merged bug fix from Stable into Unstable
2011-09-23 19:00:43 -04:00
Mauricio Carneiro
fbe3c1e0b3
Adding warning on HardClipping
...
Hard Clipping is still under heavy development and should not be used by anyone less prepared than MacGyver.
2011-09-23 19:00:19 -04:00
Mark DePristo
b66841f179
Static cache for binomial probability
...
-- Very low level performance optimization
2011-09-23 17:29:34 -04:00
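One way such a static cache could look is sketched below. What the commit actually caches is not stated, so this version makes an assumption: it precomputes binomial coefficients once in a static table and reuses them for every probability query. The class name, the `MAX_N` bound and the method are hypothetical.

```java
public class BinomialCache {
    /** Hypothetical bound; every C(n, k) for n up to 60 fits in a long without overflow. */
    private static final int MAX_N = 60;

    /** CHOOSE[n][k] holds the binomial coefficient C(n, k), filled once at class load. */
    private static final long[][] CHOOSE = new long[MAX_N + 1][];

    static {
        for (int n = 0; n <= MAX_N; n++) {
            CHOOSE[n] = new long[n + 1];
            CHOOSE[n][0] = 1;
            CHOOSE[n][n] = 1;
            for (int k = 1; k < n; k++) {
                // Pascal's rule: C(n, k) = C(n-1, k-1) + C(n-1, k).
                CHOOSE[n][k] = CHOOSE[n - 1][k - 1] + CHOOSE[n - 1][k];
            }
        }
    }

    /** Probability of exactly k successes in n trials with success probability p. */
    public static double binomialProbability(int n, int k, double p) {
        if (n < 0 || n > MAX_N || k < 0 || k > n) {
            throw new IllegalArgumentException("n must be in [0, " + MAX_N + "] and k in [0, n]");
        }
        return CHOOSE[n][k] * Math.pow(p, k) * Math.pow(1.0 - p, n - k);
    }
}
```

The table costs a few kilobytes and replaces repeated factorial or log-gamma evaluations with an array lookup, which is where a low-level saving of this kind would come from.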
Mauricio Carneiro
1a45c331b2
bringing the latest bug fixes to Reduce Reads
2011-09-23 16:40:06 -04:00
Mauricio Carneiro
9ea40f2e41
Deletions/Insertions in hard clip and bug fixes
...
* Deletions now count as hard clipped bases in order to recover the original alignment start of a clipped read.
* Insertions do not count as hard clipped bases for the same reason.
* This created a bug in the previous cigar cleaning function. Fixed.
2011-09-23 16:37:08 -04:00
David Roazen
40202c85e0
Merged bug fix from Stable into Unstable
2011-09-23 16:35:55 -04:00
David Roazen
e1cb5f6459
SnpEff annotator now assigns a functional class to each effect and distinguishes between actual effects and mere modifiers.
...
- We now assign a functional class (nonsense, missense, silent, or none) to each SnpEff effect, and add a SNPEFF_FUNCTIONAL_CLASS annotation to the INFO field of the output VCF.
- Effects are now prioritized according to both biological impact and functional class, instead of impact only.
- Many of SnpEff's "low-impact" effects are now classified as "modifiers" with lower priority than every other effect. This includes such "effects" as DOWNSTREAM, UPSTREAM, INTRON, GENE, EXON, and others that really describe the location of the variant rather than its biological effect.
This code will be short-lived (likely 1.2-only), as the next version of SnpEff will include most of these features directly.
Checking this change into Stable+Unstable instead of Unstable because the current functional class stratification in VariantEval is basically broken and urgently needs to be fixed for production purposes.
2011-09-23 16:06:52 -04:00
Matt Hanna
e388c357ca
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-23 14:53:28 -04:00
Matt Hanna
cc23b0b8a9
Fix for recent change modelling unmapped shards: don't invoke optimization to combine mapped and unmapped shards.
2011-09-23 14:52:31 -04:00
Mark DePristo
e3d4efb283
Remove N2 EXACT model code, which should never be used
2011-09-23 11:55:21 -04:00
Mark DePristo
27ce3c822e
Merge branch 'stable'
2011-09-23 09:04:52 -04:00
Mark DePristo
2bb77a7978
Docs for all VariantAnnotator annotations
2011-09-23 09:04:16 -04:00
Mark DePristo
dd65ba5bae
@Hidden for DocumentationTest and GATKDocsExample
2011-09-23 09:03:37 -04:00
Mark DePristo
dfce301beb
Looks for @Hidden annotation on all classes and excludes them from the docs
2011-09-23 09:03:04 -04:00
Mark DePristo
106a26c42d
Minor file cleanup
2011-09-23 08:25:20 -04:00
Mark DePristo
a9f073fa68
Genotype merging unit tests for simpleMerge
...
-- Remaining TODOs are all for GdA
2011-09-23 08:24:49 -04:00
Mark DePristo
4397ce8653
Moved removePLs to VariantContextUtils
2011-09-23 08:24:20 -04:00
Eric Banks
a8e0fb26ea
Updating md5 because the file changed
2011-09-23 07:33:20 -04:00
Mark DePristo
c49cc623de
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-22 17:26:21 -04:00
Mark DePristo
dab7232e9a
simpleMerge UnitTest for not annotating and annotating to different info key
2011-09-22 17:26:11 -04:00
Mark DePristo
30ab3af0c8
A few more simpleMerge UnitTest tests for filtered vcs
2011-09-22 17:14:59 -04:00
Mark DePristo
5cf82f9236
simpleMerge UnitTest tests filtered VC merging
2011-09-22 17:05:12 -04:00
Mark DePristo
46ca33dc04
TestDataProvider now can be named
2011-09-22 17:04:32 -04:00
Mauricio Carneiro
96c875399c
Merging many bug fixes to reduce reads
2011-09-22 17:04:11 -04:00
Mauricio Carneiro
39b54211d0
Fixed hard clipping soft clipped bases after hard clips
...
if soft clipped bases were after a hard clipped section of the read, the hard clip was clipping the left soft clip tail as if it were a right tail. Mayhem.
2011-09-22 15:46:55 -04:00
Mark DePristo
68da555932
UnitTest for simpleMerge for alleles
2011-09-22 15:16:37 -04:00
Mauricio Carneiro
1acf7945c5
Fixed hard clipped cigar and alignment start
...
* Hard clipped Cigar now includes all insertions that were hard clipped and not the deletions.
* The alignment start is now recalculated according to the new hard clipped cigar representation
2011-09-22 14:51:14 -04:00
Eric Banks
80d7300de4
Unit test was passing in FORMAT as one of the sample names. There used to be a hack in the VCFHeader to check for this and remove it and I couldn't figure out why, but now I know. Hack was removed and now the unit test passes in only the sample names as per the contract.
2011-09-22 13:28:42 -04:00
Mauricio Carneiro
4e9020c9f7
Fixed alignment start for hard clipping insertions
2011-09-22 13:28:25 -04:00
Eric Banks
9c1728416c
Revert "Updating md5 for fixed file" because this was fixed properly in unstable (but will break SnpEff if put into Stable).
...
This reverts commit 6b4182c6ab3e214da4c73bc6f3687ac6d1c0b72c.
2011-09-22 13:16:42 -04:00
Eric Banks
888d8697b1
Merged bug fix from Stable into Unstable
2011-09-22 13:16:31 -04:00
Eric Banks
15a410b24b
Updating md5 for fixed file
2011-09-22 13:15:41 -04:00
Mark DePristo
ba5f83fee2
start of VariantContextUtils UnitTest
...
-- tests rsID merging
2011-09-22 12:10:39 -04:00
Mark DePristo
93dd1faa5f
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-22 11:20:10 -04:00
Mark DePristo
a05c959e5a
Empty unit tests for VariantContextUtils
...
-- will be expanded over the day
2011-09-22 11:20:07 -04:00
Mark DePristo
3fdee2b9ed
Merge from stable into unstable
2011-09-22 11:19:43 -04:00
Christopher Hartl
4f4a0fc38a
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git
2011-09-22 11:01:58 -04:00
Christopher Hartl
982c47bfa7
Remove duplicate effort in ReadUtils (with apologies to Mauricio)
...
Big (but not major) cleanup of code in ILG - mostly excising the old likelihood model
Activated the early-abort check for ILG. I think it should be better this way.
2011-09-22 10:58:26 -04:00
Mark DePristo
c514df6d18
Merge of stable into unstable
2011-09-22 10:34:27 -04:00
Mark DePristo
f81a41b889
Updating MD5s for CombineVariants
...
-- Old version had broken RSIDs, new version is fixed. No longer see rs1234,. as it is now just rs1234
2011-09-22 10:30:25 -04:00
Eric Banks
b8ea9ceb68
Adding integration test that uses the -V:dbsnp binding to make sure it won't fail later on if someone messes with Tribble.
2011-09-21 22:43:31 -04:00
Eric Banks
8f8b59a932
My interpretation of the VCF spec is that the FORMAT field should only be present if there is genotype/sample data. So the VCFCodec now throws an exception when it encounters such a case. I had to fix one of the integration test VCFs.
2011-09-21 22:23:28 -04:00
Christopher Hartl
dc96f6da79
Merge branch 'master' of ssh://chartl@gsa2/humgen/gsa-scr1/chartl/dev/git
2011-09-21 18:18:41 -04:00
Christopher Hartl
f9cdc119af
Added a method to ReadUtils that converts reads of the form 10S20M10S to 40M (just unclips the soft-clips).
...
Be careful when using this - if you're writing a bam file it will be potentially written out of order (since the previous alignment start was at the M, not the S).
2011-09-21 18:16:42 -04:00
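The 10S20M10S to 40M conversion, and the reason for the out-of-order warning, can be sketched on plain CIGAR strings as below. This is not the ReadUtils method from the commit: the class, the method names and the string-based representation are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SoftClipUnclipper {
    /** One CIGAR element: a length followed by an operator. */
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Turns every soft clip (S) into aligned bases (M) and merges adjacent M runs. */
    public static String unclipSoftClips(String cigar) {
        StringBuilder out = new StringBuilder();
        int pendingMatch = 0; // length of the M run being accumulated
        Matcher matcher = ELEMENT.matcher(cigar);
        while (matcher.find()) {
            int length = Integer.parseInt(matcher.group(1));
            char op = matcher.group(2).charAt(0);
            if (op == 'S' || op == 'M') {
                pendingMatch += length;
            } else {
                if (pendingMatch > 0) {
                    out.append(pendingMatch).append('M');
                    pendingMatch = 0;
                }
                out.append(length).append(op);
            }
        }
        if (pendingMatch > 0) {
            out.append(pendingMatch).append('M');
        }
        return out.toString();
    }

    /** New alignment start once the leading soft clip is counted as aligned bases. */
    public static int unclippedStart(int alignmentStart, String cigar) {
        Matcher matcher = ELEMENT.matcher(cigar);
        while (matcher.find()) {
            char op = matcher.group(2).charAt(0);
            if (op == 'H') {
                continue; // hard clips are not part of the read, so skip them
            }
            return op == 'S' ? alignmentStart - Integer.parseInt(matcher.group(1)) : alignmentStart;
        }
        return alignmentStart;
    }
}
```

A read at position 100 with CIGAR 10S20M10S becomes 40M starting at position 90. Because the start moves left, reads that were sorted by their original start may no longer be in coordinate order, which is the caveat the commit message raises.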
Christopher Hartl
faff6e4019
Failed to commit changes to the GATKReport required for easier access when using the files as data sources (read: histograms) for walkers
2011-09-21 18:15:23 -04:00
Mauricio Carneiro
96768c8a18
Sending latest bug fixes to Reduce Reads to the main repository
2011-09-21 17:43:11 -04:00
Mauricio Carneiro
70335b2b0a
Hard clipping soft clipped reads to fix misalignments.
...
Pre-softclipped reads (with high qual) are a complicated event to deal with in the Reduced Reads environment. I chose to hard clip them out for now and added a todo item to bring them back in the future, perhaps as a variant region.
2011-09-21 17:12:01 -04:00
Christopher Hartl
ef05827c7b
Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-21 16:40:47 -04:00
Christopher Hartl
3b51d9106a
Adding in likelihood calculations for mendelian violations. Also fixing a minor and rare bug in SelectVariants when specifying family structure on the command line.
2011-09-21 16:40:29 -04:00