hanna
5d4bbf41fb
Behave intelligently in the deepest levels of GATK record filtration when
we find a read flagged as 'mapped' in the unmapped region at the end of the
file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5365 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 04:52:55 +00:00
hanna
7a22f19366
More descriptive error when VerifyingSamIterator hits an inconsistent alignment. Also updated
case UserException.MalformedBAM to match case of UserException.MissortedBAM for consistency and
ease-of-use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5364 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 03:55:24 +00:00
depristo
0181d95fe4
Intermediate optimization checkin. LinearExact model now about 10-20% faster than previous commit, by reorganizing and optimizing the if statements and genotype likelihood calculations. Next commit will include a banded implementation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5362 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 22:01:35 +00:00
ebanks
f0f4bc3363
This was busted because it assumed 1 (and only 1) record at each position. However it's possible to have 0 (which generated a NullPointer) or 2+ records (which dropped records).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5361 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 21:35:50 +00:00
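The failure mode described in the commit above (assuming exactly one record per position) can be illustrated with a small sketch. This is not the GATK code: the class, the Rec type, and the method names are hypothetical. It only shows one way to make both the 0-record and the 2+-record cases safe by grouping records per position.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; these names are not taken from the GATK source.
class PositionIndexSketch {
    // Minimal stand-in for a record that carries a genomic position.
    static final class Rec {
        final int pos;
        final String id;
        Rec(int pos, String id) { this.pos = pos; this.id = id; }
    }

    // Group records by position so that 2+ records at one site are all kept.
    static Map<Integer, List<Rec>> index(List<Rec> records) {
        Map<Integer, List<Rec>> byPos = new TreeMap<>();
        for (Rec r : records) {
            byPos.computeIfAbsent(r.pos, k -> new ArrayList<>()).add(r);
        }
        return byPos;
    }

    // Return an empty list (never null) when no record exists at pos,
    // so callers cannot hit a NullPointerException on an empty site.
    static List<Rec> recordsAt(Map<Integer, List<Rec>> byPos, int pos) {
        return byPos.getOrDefault(pos, Collections.emptyList());
    }
}
```

Callers iterate over recordsAt(...) instead of dereferencing a single record, which covers the empty and the multi-record site with the same loop.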
depristo
c152ef4339
Better error message for unknown reference file extension.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5359 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 17:52:16 +00:00
hanna
bef83b8b09
Bug fix: was tracking state across BAMs that should've been tracked per-BAM.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5358 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 17:32:06 +00:00
depristo
bafa61c1fe
LINEAR_EXACT now the default model. Passes all integration tests. 2-3x faster in low-pass data. Tests on exome data ongoing, but potentially vastly faster there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5357 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 17:14:36 +00:00
rpoplin
8e1aa6059a
New mode for CombineVariants to assume the incoming VCFs have the same samples and disjoint calls. Drastically reduces the runtime for routine combining operations. Very useful with Queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5356 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 15:52:17 +00:00
hanna
5e4b321f86
Add hidden command-line argument for low-memory sharding.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5355 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 15:13:16 +00:00
ebanks
ae42c0c7da
Bug fix based on GATK run report
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5354 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 14:18:12 +00:00
ebanks
660998065b
'Okay, now I'm absolutely certain that there are no more bugs in the constrained writer.'
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5353 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 03:48:40 +00:00
hanna
880c607d79
Disable validation of linear index against original linear index process.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5352 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 01:51:26 +00:00
hanna
dc62685a2f
For Ryan: force creation of BAM index when no reads are present in the BAM
file. Temporary fix until Picard changes the behavior of indexing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5351 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 01:50:42 +00:00
asivache
570186fa42
Added (deep) clone() and merge() to the RunningAverage utility class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5350 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 00:35:23 +00:00
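For readers unfamiliar with mergeable running statistics, the following is a minimal sketch of what a copy and a merge operation on a running average can look like. It is not the GATK RunningAverage class, whose fields and method names may differ. It assumes a Welford-style update and the standard pairwise combination of two accumulators, and every name in it is hypothetical.

```java
// Hypothetical sketch; not the GATK RunningAverage implementation.
class RunningAverageSketch {
    private long n;       // number of observations seen so far
    private double mean;  // running mean
    private double m2;    // sum of squared deviations from the mean

    // Welford-style single-observation update.
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    // Independent copy; all state is primitive, so a field copy is a deep copy.
    RunningAverageSketch copy() {
        RunningAverageSketch c = new RunningAverageSketch();
        c.n = n;
        c.mean = mean;
        c.m2 = m2;
        return c;
    }

    // Fold another accumulator into this one without revisiting the raw data.
    void merge(RunningAverageSketch other) {
        if (other.n == 0) return;
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
    }

    long count() { return n; }
    double mean() { return mean; }
    double sampleVariance() { return n > 1 ? m2 / (n - 1) : 0.0; }
}
```

Merging lets per-shard or per-thread accumulators be combined at the end of a traversal, which is presumably the use case, although the commit message does not say so.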
hanna
43567b7fe3
Load the linear index without forcing the index for the entire contig to be
loaded into memory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5349 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 00:08:39 +00:00
ebanks
a20ce1436d
A temporary @hidden hack to get indel calling done for Phase I: don't try to call if there's too much coverage. Do not use this unless your last name rhymes with Shmoplin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5348 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 19:22:27 +00:00
hanna
3c7ae0d1a6
Special case handling of unmapped region in low memory sharder.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5346 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 17:38:30 +00:00
hanna
dd30ad751a
Fix bug in low memory sharder's interval accumulator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5345 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 17:11:22 +00:00
hanna
d6145de970
More comprehensive tracking of position when bin trees are sparse.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5344 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 15:53:43 +00:00
ebanks
bb969cd3a2
EMIT_ALL_SITES now does exactly that - even when there's no coverage or too many deletions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5343 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 05:05:00 +00:00
chartl
0723b0f44c
Generalized association is now working. Output is in a horrific format. Implementation of T-testing. Improvements are to look for classes dynamically (a la VariantEval/VariantAnnotator), beautify output, and do optimizations where they exist.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5341 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 01:23:37 +00:00
rpoplin
ce34a8a918
New hidden option in VQSR to not parse the genotypes of the incoming training data. Updated VQSR training in methods development pipeline to be more in line with best practices.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5340 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 23:19:51 +00:00
hanna
e7089f9870
Fix for particularly small, isolated intervals: make sure the bounds of the
bin tree are dictated by the lowest bin level, whether it exists or not.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5339 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 22:35:53 +00:00
hanna
c869d1c9cf
Fix misc issues in new protosharder regarding proper iterator termination when
an unexpectedly small amount of data is present.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5338 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 21:14:18 +00:00
hanna
e75366f738
Fixed performance issue in protosharding code -- turns out that the index
optimizer was mutating the data stored in the indices. Protosharding still
disabled by default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5334 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 17:32:12 +00:00
ebanks
8de83725f9
Simple walker to randomly break VCF files into (potentially unequal) subsets. Useful for e.g. cutting hapmap into training and evaluation sets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5333 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 16:51:46 +00:00
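The idea of randomly breaking a set of records into potentially unequal subsets can be sketched as below. This is an illustration only, not the walker added in the commit above: the class name, the fractionToFirst parameter, and the seed argument are all assumptions, and generic list elements stand in for VCF records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; not the GATK walker from the commit above.
class RandomSplitSketch {
    // Send each record to the first subset with probability fractionToFirst,
    // otherwise to the second. Subset sizes are random, hence "potentially unequal".
    static <T> List<List<T>> split(List<T> records, double fractionToFirst, long seed) {
        Random rng = new Random(seed);  // fixed seed keeps a split reproducible
        List<T> first = new ArrayList<>();
        List<T> second = new ArrayList<>();
        for (T r : records) {
            if (rng.nextDouble() < fractionToFirst) {
                first.add(r);
            } else {
                second.add(r);
            }
        }
        List<List<T>> out = new ArrayList<>();
        out.add(first);
        out.add(second);
        return out;
    }
}
```

With fractionToFirst around 0.5 this would give roughly equal training and evaluation sets. Every record lands in exactly one subset.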
delangel
d059d89a9d
Fixes and cleanups for indel eval module. Also outputs AT/CG ratio in dedicated column in IndelStatistics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5332 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 12:07:50 +00:00
ebanks
05fac8583d
Following up Mark's recent commit: hooking up the --maxPositionalMoveAllowed argument into the indel realigner and through to the SAM writer. We now ensure that no read is realigned more than N bases (200 by default, which is nowhere close to realistically possible). If anyone ever sees a warning message about this with the default value then please let me know because I need to see it for myself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5331 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 04:40:54 +00:00
depristo
874406352c
Accidentally committed the N2 comparing test as well...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5330 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 04:15:30 +00:00
depristo
1dedfdb11b
Fixes for constrained movement Indel Realigner. Now sorts all of the reads in the interval before handing them to ConstrainedMateFixingSAMFileWriter to maintain correct contract between the two pieces of software
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5329 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 03:52:18 +00:00
depristo
d216830b92
Experimental linear version of the exact model. In testing, but gives identical results to N2 gold standard version, and passes integration tests. Performance optimizations still ongoing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5328 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 03:48:11 +00:00
ebanks
54facb2c51
Small change for Mauricio so that the correct metrics get output when running in GENOTYPE_GIVEN_ALLELES mode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5327 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-27 06:08:32 +00:00
depristo
7ff8d23c64
Don't do genotype concordance on comp tracks without genotypes, even if they have an AC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5321 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 21:11:50 +00:00
hanna
600f73cbd6
A checkpoint commit of two BAM reading projects going on simultaneously. These two projects
are works in progress, and this checkin will provide a baseline against which to gauge
improvements to both projects.
Low-memory BAM protoshards (disabled by default):
- Currently passing ValidatingPileupIntegrationTest.
- Gets progressively slower throughout the traversal, but should run at least as fast as original implementation.
- Uses 10+ file handles per BAM, but should use 3.
BAM performance microbenchmark test system:
- Currently tests performance of BAM reading using SAM-JDK vs. GATK
- Tests do not hit all GATK performance hotspots.
- New tests that require input data in a slightly different form are hard to implement.
- Output of test results is not easily parseable (investigating Google Caliper for possible improvements).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5317 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 17:50:32 +00:00
ebanks
5d28cbda27
When crossing contigs it's crucial that the queue get flushed or else it will continue to accumulate reads without emitting. This is the last time I trust someone when they tell me that they are 'confident there are no bugs' in a tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5315 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 05:18:30 +00:00
rpoplin
1129f1535d
Fix for the HaplotypeScore optimization in AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5310 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:40:18 +00:00
chartl
0f1c1fa26f
First general association module. Let the bug fixing begin!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5307 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 19:55:33 +00:00
chartl
292b421113
Framework for generalized association testing. Heavy lifting done in implementation of the AssociationContext(s) and AssociationContextAtom(s). Nothing really implemented.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5306 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 18:12:39 +00:00
asivache
2f2aa339d9
Now makes all pairs, not only the good ones. The logic of selecting the "best" pair when the data are messy (e.g. multiple alignments available for an end) is still very naive
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5303 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:21:26 +00:00
asivache
abf3fcbb72
Little changes in recognized annotation terms; columns in annotated maf are now prioritized and multiple alternatives do not cause 'i don't know what to do' crash: e.g. if Chromosome and chr columns are both present, then Chromosome is taken (has a priority).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5302 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:19:06 +00:00
rpoplin
255cc246a2
Change in Methods development pipeline: dbsnp130 can't be used for anything, changed it to dbsnp129. Optimization for HaplotypeScore and the to-be-committed ReadPosRankSumTest in AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5301 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:09:03 +00:00
chartl
97e1a5262e
-ct x no longer includes coverage in the previous bin
BatchMerge - additional support for indels (can't just test the alternate allele when it's an extended event, must also specify that you want to use the dindel model when you actually test the allele)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5300 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 15:52:04 +00:00
ebanks
ee6f112556
Phase 3: constrained movement is now the only option available in the realigner (so I guess technically it's not really an option). Several command-line options are deprecated. Code cleaned up. Wiki updated. Release coming. One phase left...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5299 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 14:59:48 +00:00
ebanks
93888e570b
Phase 2: after hours of testing, confirming that constrained mode looks good so moving the integration tests over to use it. Some cleanup. More cleanup coming in Phase 3.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5298 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 06:23:41 +00:00
carneiro
75bd0129e7
quick bug fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5296 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 19:16:20 +00:00
ebanks
9357bee921
Don't skip tri-allelic alleles passed in - just choose the first one.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5293 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:25:50 +00:00
carneiro
a2301383bb
quick walker to find out where the reads mapped to huref were mapped in the consensus reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5292 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:00:17 +00:00
ebanks
318035c147
Fixing up the output system of the Unified Genotyper. Deprecating the -all_bases and -genotype arguments. Adding instead the --output_mode (EMIT_VARIANTS_ONLY, EMIT_ALL_CONFIDENT_SITES, EMIT_ALL_SITES) and --genotyping_mode (DISCOVERY, GENOTYPE_GIVEN_ALLELES) arguments. UG now does the correct thing when passed alleles (bound to the 'alleles' rod) to use for genotyping; added several integration tests to cover this case. This commit will break the batched calls merging script, but Chris knows this and is ready for it...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5288 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 06:07:18 +00:00
ebanks
d7f98ccd9c
Adding --doNotWriteOriginalQuals argument to BQ recalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5286 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 04:00:00 +00:00
depristo
1a5d296737
ReplaceReadGroups. Fixes BAM files without read group info. MissingReadGroup points people to this tool now. Please point users on the forum to this tool now. Will migrate to Picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5284 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-21 14:02:41 +00:00