fromer
0b45de14ed
Some minor updates to fully utilize the functionality of reduceByInterval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5411 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-09 20:38:08 +00:00
rpoplin
509daac9f7
Minor bug fix in k-means implementation. Updating VQSR integration tests in preparation for VQSRv2 by removing some unused features such as VariantDatum.weight and ti/tv cutting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5410 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-09 00:26:28 +00:00
carneiro
fa7284b7a1
Genotype And Validate walker is now ready to be used by anyone.
...
given an annotated VCF and a BAM file, it genotypes (using the reads in the BAM) each variant in the VCF (for snp or indel) and validates (or not) the 'known' annotation. Outputs a truth table with the PPV and NPV values, and optionally a vcf file with the variants that had enough coverage to be validated. You can optionally provide a minimum depth of coverage and only do the analysis conditional on that. (will write a wiki for this walker, as it might be useful for future validation essays).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5409 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 22:10:38 +00:00
chartl
da88c29b6e
Added a module to test for reference mismatch associations, and a self-normalized/self-normalizing version.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5408 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 20:01:28 +00:00
chartl
31a2575c7b
Fixes:
...
- Don't know how I got the wiggle header so utterly wrong. Fixed.
- Q-values now have a static maximum of 2000 so IGV averaging won't make everything look spikey and ugly.
- Changing windows to size 100 for (hopefully) better resolution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5406 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 17:16:21 +00:00
chartl
1b310401fe
Due to the approximation not being well-founded in this case, (and the non-existence of a pre-computed table at this time), pushing up the cutoff
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5405 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 16:24:42 +00:00
delangel
00ac51acc8
Added several integration tests for UG indel caller:
...
- Basic
- Multiple technology
- Test minIndelCnt parameter
Added also 2 disabled tests:
- Parallelization: issue w/code right now is that if -nt > 1, filter field shows "PASS" instead of ".", cause TBD
- Genotype given alleles mode: code not working yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5404 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 16:21:21 +00:00
chartl
77fe902dbd
Testing modules now use wider windows and heftier shift, hopefully this will remove some of the noisiness of the results. Some UStatistics were changed to TStatistics to try and limit noisiness as well. Walker will also additionally write out wiggle files directly (which can be converted into "proper" tdf files via igvtools tile [args] [in].wig [out].tdf [ref]) subject to some restrictions. MWU could get stuck in a long-running recursive regime, it'd be nice to have a table lookup or a good small-n large-m approximation, for now the uniform should work just fine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5403 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 15:26:13 +00:00
carneiro
b733cba7c7
re-fixing for a different approach suggested by eric!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5402 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 04:54:49 +00:00
kiran
d0598c7a04
Somehow missed this test when I was updating the md5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5400 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 23:53:42 +00:00
kiran
b6339967f8
Updated GenomicAnnotator integration tests to include the -NO_HEADER argument so that the tests stop yelling about trivial differences
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5398 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 23:07:01 +00:00
hanna
85ff983a59
Failed to include some required GenomeLoc utilities in my last commit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5397 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 23:00:17 +00:00
carneiro
02006954bc
UG: small bug fix when creating empty variant contexts in UG for the -EMIT_ALL_SITES to allow indels.
...
GAV: First version of the walker that validates reads from a BAM file based on an annotated VCF with TP/FP annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5396 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 22:51:04 +00:00
hanna
9384b2ff65
A few quick fixes to temporarily make the LowMemorySharder return exactly the
...
same shards as the previous sharder, so that I can directly compare filespans
to see where some performance bugs lie.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5395 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 22:43:14 +00:00
depristo
0b4e51317b
Now includes project consensus high sensitivity data set
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5394 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 20:52:11 +00:00
kiran
43056d0188
Fixed integration test to reflect changes regarding when comp tracks got subset to fewer samples and whether no-call sites would get pulled in for comp tracks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5393 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 20:25:57 +00:00
carneiro
73e43d8d2c
Added functionality:
...
-disc (--discordance) parameter together with a ROD track will output a VCF with the variants in the ROD track that are not present in the 'variants' VCF. Useful tool to list the variants from hapmap (for example) that weren't called in a dataset.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5392 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 19:18:15 +00:00
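The discordance output described in r5392 above amounts to a set difference between the ROD track and the called variants. A minimal sketch, assuming variants are compared as (chrom, pos, ref, alt) tuples; the function name and the tuple layout are hypothetical and are not taken from the walker itself:

```python
def discordant_variants(rod_track, called):
    """Return variants present in rod_track but absent from called.

    Both inputs are iterables of (chrom, pos, ref, alt) tuples.
    Matching on all four fields is an assumption of this sketch.
    """
    called_keys = set(called)
    # Preserve the ROD track's original order in the output.
    return [variant for variant in rod_track if variant not in called_keys]


hapmap = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")]
calls = [("1", 100, "A", "G"), ("2", 75, "T", "C")]
missed = discordant_variants(hapmap, calls)
```

Here `missed` holds the hapmap sites that were not called in the dataset.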
kshakir
dc33fbed7c
Switched the CVUnitTest broken info from an Integer to a String since as of r5383 Integers are no longer broken when converted to Floats.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5390 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 16:33:14 +00:00
delangel
8c262eb605
Initial commit of new likelihood model to evaluate indel quality. The principle is simple: a plain Pair HMM with affine gap penalties (in log space) that does quasi-local alignment between reads and candidate haplotypes, and which in theory should be more solid and more reliable than the older Dindel-based model. It should also be easy to extend in the future if we decide to introduce context-dependent and/or read-dependent gap penalties.
...
Model is disabled by default and we're still using the old Dindel model until I'm more confident that new model is a definitive improvement, so right now this is enabled by hidden command line arguments, and it's not to be used yet.
In detail:
a) Several refactorings to make softMax() available to other modules, so it's now part of MathUtils.
b) Refactored a couple of read utilities and moved from BAQ to ReadUtils.
c) New PairHMMIndelErrorModel class implementing new likelihood model
d) Several new hidden debug arguments in UAC.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5389 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 15:31:58 +00:00
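For readers unfamiliar with the technique named in r5389 above, here is a minimal sketch of a pair HMM forward pass with affine gap penalties in log space. It is not the PairHMMIndelErrorModel code: the function names, the parameter values, the use of natural logs, and the choice of a global (rather than quasi-local) alignment are all assumptions made to keep the example short.

```python
import math

NEG_INF = float("-inf")


def log_sum_exp(*values):
    """Stable log(sum(exp(v))) over the given log-space values."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def pair_hmm_log_likelihood(read, hap, gap_open=1e-3, gap_extend=0.1,
                            base_error=1e-2):
    """Log P(read | hap) summed over all global alignments (illustrative)."""
    log_stay_match = math.log(1.0 - 2.0 * gap_open)
    log_open = math.log(gap_open)        # affine penalty: opening a gap
    log_extend = math.log(gap_extend)    # affine penalty: extending a gap
    log_close = math.log(1.0 - gap_extend)
    log_match = math.log(1.0 - base_error)
    log_mismatch = math.log(base_error / 3.0)
    log_insert_emit = math.log(0.25)     # inserted read base, uniform

    n, m = len(read), len(hap)
    # match[i][j], ins[i][j], dele[i][j]: log prob of read[:i] vs hap[:j]
    # ending in a match, an insertion in the read, or a deletion.
    match = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    ins = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    dele = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    match[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                emit = log_match if read[i - 1] == hap[j - 1] else log_mismatch
                match[i][j] = emit + log_sum_exp(
                    log_stay_match + match[i - 1][j - 1],
                    log_close + ins[i - 1][j - 1],
                    log_close + dele[i - 1][j - 1])
            if i > 0:
                ins[i][j] = log_insert_emit + log_sum_exp(
                    log_open + match[i - 1][j],
                    log_extend + ins[i - 1][j])
            if j > 0:
                dele[i][j] = log_sum_exp(
                    log_open + match[i][j - 1],
                    log_extend + dele[i][j - 1])
    return log_sum_exp(match[n][m], ins[n][m], dele[n][m])
```

A read that matches a candidate haplotype exactly should score higher than one that needs a mismatch or a gap, which is the property a genotyper would rely on when comparing haplotypes.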
kshakir
96fe540d66
Removing .tmp~ file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5388 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 14:52:38 +00:00
kshakir
92045ecaa6
Finally figured out what data in the LSF C API call lsb_readjobinfo is causing JNA to SIGSEGV with a strlen error:
...
- LSF recycles memory for C arrays, but sets a separate variable setting the size of the array to zero.
- JNA only sees the non-NULL pointer and starts to auto-access it, sometimes causing a SIGSEGV.
- In the short term neutered the jobInfoEnt structure so that this bad array is not autoRead().
QGraph updates:
- Job status is now checked in bulk every 30 seconds instead of one job at a time, even in the middle of dispatching jobs.
- If there is a hiccup (unexpected but not fatal error) during status check then the error is ignored and status is checked again 30 seconds later.
- Jobs prefer to dispatch depth vs. breadth first.
More refactoring of SG framework separating the reusable code from the implementations.
The DistributedScatterFunction is still a work in progress and is not enabled yet. Still need to think through how Queue handles when a job dies.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5387 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 04:29:06 +00:00
chartl
60ddc08cdf
Added a boatload of new case-control association modules. Switched the U-test to use longs rather than ints (it just so happened that I overflowed and started getting negative U statistics. Not good.) Added the ALL association type for ease of specifying that we want to throw the book at something. Added an svn-commit.tmp~ because i can't get rid of it even with --force. Hopefully I can remove it after.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5386 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 21:58:12 +00:00
depristo
5c979633f0
Due to a problem in the way that dynamic type selection works, I've added an explicit (temporary) ability to restrict VE to specific variant types (SNPs, INDELs, etc), so that calculations will work when a site has a SNP in dbSNP but is called as an indel, causing the SNP site to mysteriously disappear from the comp track, a huge problem for validation report. VEU updated to allow both dynamic type (old) and just returning everything in the track.
...
Also, created a standard Queue script that calculates a suite of standard indel and SNP assessment results. Will be the basis for a general evaluation Queue script with standardized data files for SNPs and Indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5385 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 19:31:12 +00:00
depristo
2f1e249aed
A proper validation report, calculating TP, FP, FN, sensitivity, FDR, PPV. Treats comp as a set of sites that have been either filtered (failed in assay), validated (polymorphic among samples), or invalidated (AC=0 or all genotypes = hom-ref). Very useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5384 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 19:27:40 +00:00
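The metrics listed in r5384 above follow from the three counts. A small sketch using the standard textbook definitions; the function name is hypothetical, and whether the report handles empty denominators this way is an assumption:

```python
def validation_metrics(tp, fp, fn):
    """Compute sensitivity, PPV and FDR from TP, FP and FN counts.

    Ratios with an empty denominator are returned as None
    (an assumption of this sketch).
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # TP / (TP + FN)
        "PPV": ratio(tp, tp + fp),          # TP / (TP + FP)
        "FDR": ratio(fp, tp + fp),          # FP / (TP + FP) = 1 - PPV
    }


report = validation_metrics(tp=90, fp=10, fn=30)
```

With these example counts the sensitivity is 90/120 and the PPV is 90/100.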
depristo
af71576a07
CalculateChromosomeCounts() now only calculates AC, AF, and AN when there are genotypes. Can now combine variants with headers that differ in only whether a field is an integer or a float. Updated CombineVariants integrationtest, as incorrect AC values were being calculated in the previous GS outputs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5383 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 19:25:52 +00:00
depristo
5b8fdc5b1f
Slightly optimized calculation for ~linear exact model, as well as totally incorrect banded calculation, for future development, if this proves useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5382 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 18:47:08 +00:00
chartl
fc5a071de2
Output format is 10^6 times better - now uses the multiplexer to write tdf tracks that (after conversion to binary with igvtools) can be loaded directly into igv.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5381 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 18:23:52 +00:00
chartl
a40a8006b5
Added in unit tests for the statistics calculated by the test runner; and bug-fixes to the calculations; so we have some assurance that the statistics coming out the back-end are correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5380 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 16:54:02 +00:00
hanna
c40efe1dea
Fixed exception for BAMs without filenames (unit tests, BAM input streaming,
...
etc.).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5379 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 13:43:49 +00:00
depristo
ad51f30244
A trivial, but useful, sum of a list of integers
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5378 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-05 06:09:05 +00:00
depristo
9a8356892a
Cleaner error (really now just warnings) if you can't reach the S3 for logging
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5377 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-05 06:08:35 +00:00
hanna
10516f5de4
Fixed one low-memory sharder performance culprit: regions with no BAM data
...
whatsoever were misusing the Picard MergingIterator, triggering a re-traversal
through the entire contig.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5376 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-04 21:26:22 +00:00
ebanks
337b54136f
2 fixes. For Mark: when insertions can be partially left-aligned, we were reading off the wrong bases. For GS post: the stored VariantContext.REFERENCE_BASE_FOR_INDEL_KEY needs to be updated when left-aligning because it can change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5375 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-04 21:00:08 +00:00
kiran
b42005e7d7
Fixed issue where comp tracks with genotypes that didn't exactly overlap the eval track were getting dropped. Fixed issue where the 'row' column wasn't being output for things implementing TableType. This is an urgent patch for Mark - it'll break tests until I go back and update the md5s.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5374 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-04 16:51:12 +00:00
kiran
1861ca90fc
A change to the definition of CpG sites (is now, from 5' to 3' a CG dinucleotide in the reference, and the CpG site is at the C, rather than either at the C or a G).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5373 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-04 15:36:07 +00:00
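The revised CpG definition in r5373 above can be stated in a few lines. A sketch assuming a 0-based position into a forward-strand reference string; the function name is hypothetical:

```python
def is_cpg_site(reference, pos):
    """True if reference[pos] is the C of a 5'-to-3' CG dinucleotide.

    Under this definition the G of the pair is not itself a CpG site.
    """
    if pos < 0 or pos + 1 >= len(reference):
        return False
    return reference[pos].upper() == "C" and reference[pos + 1].upper() == "G"


cpg_positions = [i for i in range(len("ACGTCG")) if is_cpg_site("ACGTCG", i)]
```

Only the two C positions are reported for "ACGTCG"; under the older either-C-or-G definition the neighbouring G positions would have counted too.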
chartl
9ca1dd5d62
Miscellaneous changes:
...
- RefMetaDataTracker: grabbing variant contexts given a prefix (not sure where else this was implemented, if someone can show me I'll remove it)
- VCFUtils: grabbing VCF headers given a prefix
- MathUtils: Useful functions for calculating statistics on collections of Numbers
- VariantAnnotator: Made isUniqueHeaderLine a public static method -- maybe this should go into a different class. Not sure.
- Associations: PluginManager now used to propagate classes, implementations for Z,T,U tests, slight alteration to format to make the objects stored
in the window optionally different from those returned by whatever statistic is run across the window
Added:
- MannWhitneyU. Started to fix up WilcoxonRankSum but there are comments in there questioning the validity of some of the code, and I'm sure that
it's actually doing a U test. This implementation includes the direct calculation of p-values for small sample sizes, and a uniform approximation
for when one of the sample sets is small, and the other large. Unit tests to follow.
- BootstrapCallsMerger: takes n VCFs which have been called on the same samples; merges them together while averaging the annotations
- BootstrapCalls.q: qscript for testing the effectiveness of bootstrap low-pass calling on the exome
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5372 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 22:43:36 +00:00
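The direct p-value calculation for small samples mentioned in r5372 above is commonly done with a counting recursion over the largest observation. The sketch below shows that textbook approach, not the MannWhitneyU class itself; the function names are hypothetical, ties are assumed absent in the exact tail, and memoization stands in for any table lookup:

```python
from functools import lru_cache
from math import comb


def u_statistic(x, y):
    """Number of pairs (xi, yj) with xi > yj; ties count one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


@lru_cache(maxsize=None)
def count_arrangements(u, n, m):
    """Number of orderings of n x-values and m y-values giving U == u."""
    if u < 0:
        return 0
    if n == 0 or m == 0:
        return 1 if u == 0 else 0
    # The largest value is either an x (it beats all m y-values)
    # or a y (it adds nothing to U).
    return (count_arrangements(u - m, n - 1, m)
            + count_arrangements(u, n, m - 1))


def exact_lower_tail(u, n, m):
    """P(U <= u) under the null hypothesis, assuming no ties."""
    favourable = sum(count_arrangements(k, n, m) for k in range(int(u) + 1))
    return favourable / comb(n + m, n)
```

Without the cache the recursion revisits the same (u, n, m) states many times, which is one way such a calculation can become very slow for larger samples.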
carneiro
8ae42b70ac
give it an annotated VCF and a BAM and it creates a truth table on the validation of the VCF calls. This is just the first version, not ready for primetime.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5371 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 21:45:11 +00:00
rpoplin
f7ef35b8f5
Removing untrue comments in the GaussianMixtureModel
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5369 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 18:18:47 +00:00
chartl
9e12cd1312
Gotta include the changes i made to get an init function into the contexts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5368 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 15:45:51 +00:00
chartl
835a26d145
A pass at a sample-normalized test. I think maybe all of them will simply do their own normalizing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5367 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 15:42:25 +00:00
chartl
ef38fd1e0e
Major refactoring of association testing framework. New modules are now beyond trivial to implement. One hurdle remains which is how to deal with statistics that ought to be sample-normalized (e.g. depth, insert-size [when multiple libraries are used], and possibly others).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5366 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 14:27:16 +00:00
hanna
5d4bbf41fb
Behave intelligently in the deepest levels of GATK record filtration when
...
we find a read flagged as 'mapped' in the unmapped region at the end of the
file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5365 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 04:52:55 +00:00
hanna
7a22f19366
More descriptive error when VerifyingSamIterator hits an inconsistent alignment. Also updated
...
case UserException.MalformedBAM to match case of UserException.MissortedBAM for consistency and
ease-of-use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5364 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-03 03:55:24 +00:00
depristo
0181d95fe4
Intermediate optimization checkin. LinearExact model now about 10-20% faster than previous commit, by reorganizing and optimizing the if statements and genotype likelihood calculations. Next commit will include a banded implementation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5362 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 22:01:35 +00:00
ebanks
f0f4bc3363
This was busted because it assumed 1 (and only 1) record at each position. However it's possible to have 0 (which generated a NullPointer) or 2+ records (which dropped records).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5361 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 21:35:50 +00:00
depristo
c152ef4339
Better error message for unknown reference file extension.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5359 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 17:52:16 +00:00
hanna
bef83b8b09
Bug fix: was tracking state across BAMs that should've been tracked per-BAM.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5358 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 17:32:06 +00:00
depristo
bafa61c1fe
LINEAR_EXACT now the default model. Passes all integration tests. 2-3x faster in low-pass data. Tests on exome data ongoing, but potentially vastly faster there.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5357 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 17:14:36 +00:00
rpoplin
8e1aa6059a
New mode for CombineVariants to assume the incoming VCFs have the same samples and disjoint calls. Drastically reduces the runtime for routine combining operations. Very useful with Queue.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5356 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 15:52:17 +00:00
hanna
5e4b321f86
Add hidden command-line argument for low-memory sharding.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5355 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 15:13:16 +00:00
ebanks
ae42c0c7da
Bug fix based on GATK run report
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5354 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 14:18:12 +00:00
ebanks
660998065b
'Okay, now I'm absolutely certain that there are no more bugs in the constrained writer.'
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5353 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 03:48:40 +00:00
hanna
880c607d79
Disable validation of linear index against original linear index process.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5352 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 01:51:26 +00:00
hanna
dc62685a2f
For Ryan: force creation of BAM index when no reads are present in the BAM
...
file. Temporary fix until Picard changes the behavior of indexing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5351 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 01:50:42 +00:00
asivache
570186fa42
Added (deep) clone() and merge() to the RunningAverage utility class
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5350 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 00:35:23 +00:00
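As an illustration of what clone() and merge() mean for a running average (r5350 above), here is a small sketch. The class below is a hypothetical stand-in, not the GATK RunningAverage; it tracks only a count and a mean:

```python
import copy


class RunningAverage:
    """Incrementally maintained mean of the values seen so far."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def add(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count

    def clone(self):
        # Deep copy, so later changes to the clone leave the original intact.
        return copy.deepcopy(self)

    def merge(self, other):
        """Fold another running average into this one."""
        total = self.count + other.count
        if total == 0:
            return
        self.mean = (self.mean * self.count + other.mean * other.count) / total
        self.count = total


left, right = RunningAverage(), RunningAverage()
for value in (1, 2, 3):
    left.add(value)
for value in (4, 5):
    right.add(value)
combined = left.clone()
combined.merge(right)
```

Merging weights each mean by its count, so the result equals the average over all five values while `left` itself is unchanged.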
hanna
43567b7fe3
Load the linear index without forcing the index for the entire contig to be
...
loaded into memory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5349 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 00:08:39 +00:00
ebanks
a20ce1436d
A temporary @hidden hack to get indel calling done for Phase I: don't try to call if there's too much coverage. Do not use this unless your last name rhymes with Shmoplin.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5348 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 19:22:27 +00:00
hanna
3c7ae0d1a6
Special case handling of unmapped region in low memory sharder.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5346 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 17:38:30 +00:00
hanna
dd30ad751a
Fix bug in low memory sharder's interval accumulator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5345 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 17:11:22 +00:00
hanna
d6145de970
More comprehensive tracking of position when bin trees are sparse.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5344 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 15:53:43 +00:00
ebanks
bb969cd3a2
EMIT_ALL_SITES now does exactly that - even when there's no coverage or too many deletions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5343 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 05:05:00 +00:00
chartl
0723b0f44c
Generalized association is now working. Output is in a horrific format. Implementation of T-testing. Improvements are to look for classes dynamically (a la VariantEval/VariantAnnotator), beautify output, and do optimizations where they exist.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5341 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 01:23:37 +00:00
rpoplin
ce34a8a918
New hidden option in VQSR to not parse the genotypes of the incoming training data. Updated VQSR training in methods development pipeline to be more in line with best practices.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5340 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 23:19:51 +00:00
hanna
e7089f9870
Fix for particularly small, isolated intervals: make sure the bounds of the
...
bin tree are dictated by the lowest bin level, whether it exists or not.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5339 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 22:35:53 +00:00
hanna
c869d1c9cf
Fix misc issues in new protosharder regarding proper iterator termination when
...
an unexpectedly small amount of data is present.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5338 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 21:14:18 +00:00
ebanks
5ac9af472c
Adding performance test for case with very high coverage (> 600,000x) over an interval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5336 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 19:48:56 +00:00
hanna
e75366f738
Fixed performance issue in protosharding code -- turns out that the index
...
optimizer was mutating the data stored in the indices. Protosharding still
disabled by default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5334 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 17:32:12 +00:00
ebanks
8de83725f9
Simple walker to randomly break VCF files into (potentially unequal) subsets. Useful for e.g. cutting hapmap into training and evaluation sets.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5333 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 16:51:46 +00:00
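The random split described in r5333 above can be sketched as an independent coin flip per record. The function name, the fraction parameter and the optional seed are assumptions of this example, not the walker's arguments:

```python
import random


def split_records(records, fraction_first, seed=None):
    """Randomly assign each record to one of two subsets.

    Each record goes to the first subset with probability fraction_first,
    so the subset sizes are only approximately in that ratio.
    """
    rng = random.Random(seed)
    first, second = [], []
    for record in records:
        if rng.random() < fraction_first:
            first.append(record)
        else:
            second.append(record)
    return first, second


sites = list(range(100))
training, evaluation = split_records(sites, 0.7, seed=1)
```

Every input record lands in exactly one of the two outputs, and both outputs keep the input order, which matters for coordinate-sorted files.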
delangel
d059d89a9d
Fixes and cleanups for indel eval module. Also outputs AT/CG ratio in dedicated column in IndelStatistics.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5332 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 12:07:50 +00:00
ebanks
05fac8583d
Following up Mark's recent commit: hooking up the --maxPositionalMoveAllowed argument into the indel realigner and through to the SAM writer. We now ensure that no read is realigned more than N bases (200 by default, which is nowhere close to realistically possible). If anyone ever sees a warning message about this with the default value then please let me know because I need to see it for myself.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5331 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 04:40:54 +00:00
depristo
874406352c
Accidentally committed the N2 comparing test as well...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5330 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 04:15:30 +00:00
depristo
1dedfdb11b
Fixes for constrained movement Indel Realigner. Now sorts all of the reads in the interval before handing them to ConstrainedMateFixingSAMFileWriter to maintain correct contract between the two pieces of software
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5329 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 03:52:18 +00:00
depristo
d216830b92
Experimental linear version of the exact model. In testing, but gives identical results to N2 gold standard version, and passes integration tests. Performance optimizations still ongoing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5328 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 03:48:11 +00:00
ebanks
54facb2c51
Small change for Mauricio so that the correct metrics get output when running in GENOTYPE_GIVEN_ALLELES mode.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5327 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-27 06:08:32 +00:00
depristo
7ff8d23c64
Don't do genotype concordance on comp tracks without genotypes, even if they have an AC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5321 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 21:11:50 +00:00
hanna
600f73cbd6
A checkpoint commit of two BAM reading projects going on simultaneously. These two projects
...
are works in progress, and this checkin will provide a baseline against which to gauge
improvements to both projects.
Low-memory BAM protoshards (disabled by default):
- Currently passing ValidatingPileupIntegrationTest.
- Gets progressively slower throughout the traversal, but should run at least as fast as original implementation.
- Uses 10+ file handles per BAM, but should use 3.
BAM performance microbenchmark test system:
- Currently tests performance of BAM reading using SAM-JDK vs. GATK
- Tests do not hit all GATK performance hotspots.
- New tests that require input data in a slightly different form are hard to implement.
- Output of test results is not easily parseable (investigating Google Caliper for possible improvements).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5317 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 17:50:32 +00:00
ebanks
5d28cbda27
When crossing contigs it's crucial that the queue get flushed or else it will continue to accumulate reads without emitting. This is the last time I trust someone when they tell me that they are 'confident there are no bugs' in a tool.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5315 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 05:18:30 +00:00
ebanks
cba88a8861
Elegant solution to the determinism problem: force testNG to run tests in the order that I want it to.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5312 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 21:32:35 +00:00
rpoplin
1129f1535d
Fix for the HaplotypeScore optimization in AlignmentUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5310 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:40:18 +00:00
ebanks
15dfac6bf7
Updating integration test to be in sync with previous commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5309 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:21:58 +00:00
ebanks
06e3c34e7f
Updating performance test to be in sync with previous commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5308 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:13:35 +00:00
chartl
0f1c1fa26f
First general association module. Let the bug fixing begin!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5307 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 19:55:33 +00:00
chartl
292b421113
Framework for generalized association testing. Heavy lifting done in implementation of the AssociationContext(s) and AssociationContextAtom(s). Nothing really implemented.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5306 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 18:12:39 +00:00
asivache
2f2aa339d9
Now makes all pairs, not only the good ones. The logic of selecting the "best" pair when the data are messy (e.g. multiple alignments available for an end) is still very naive
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5303 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:21:26 +00:00
asivache
abf3fcbb72
Little changes in recognized annotation terms; columns in annotated maf are now prioritized and multiple alternatives do not cause 'i don't know what to do' crash: e.g. if Chromosome and chr columns are both present, then Chromosome is taken (has a priority).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5302 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:19:06 +00:00
rpoplin
255cc246a2
Change in Methods development pipeline: dbsnp130 can't be used for anything, changed it to dbsnp129. Optimization for HaplotypeScore and the to-be-committed ReadPosRankSumTest in AlignmentUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5301 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:09:03 +00:00
chartl
97e1a5262e
-ct x no longer includes coverage in the previous bin
...
BatchMerge - additional support for indels (can't just test the alternate allele when it's an extended event, must also specify that you want to use the dindel model when you actually test the allele)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5300 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 15:52:04 +00:00
ebanks
ee6f112556
Phase 3: constrained movement is now the only option available in the realigner (so I guess technically it's not really an option). Several command-line options are deprecated. Code cleaned up. Wiki updated. Release coming. One phase left...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5299 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 14:59:48 +00:00
ebanks
93888e570b
Phase 2: after hours of testing, confirming that constrained mode looks good so moving the integration tests over to use it. Some cleanup. More cleanup coming in Phase 3.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5298 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 06:23:41 +00:00
ebanks
c59c8b9872
Phase I of my promise to Mark: fleshed out integration tests for Indel Realigner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5297 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 21:05:20 +00:00
carneiro
75bd0129e7
quick bug fix.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5296 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 19:16:20 +00:00
ebanks
9357bee921
Don't skip tri-allelic alleles passed in - just choose the first one.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5293 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:25:50 +00:00
carneiro
a2301383bb
quick walker to find out where the reads mapped to huref were mapped in the consensus reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5292 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:00:17 +00:00
ebanks
318035c147
Fixing up the output system of the Unified Genotyper. Deprecating the -all_bases and -genotype arguments. Adding instead the --output_mode (EMIT_VARIANTS_ONLY, EMIT_ALL_CONFIDENT_SITES, EMIT_ALL_SITES) and --genotyping_mode (DISCOVERY, GENOTYPE_GIVEN_ALLELES) arguments. UG now does the correct thing when passed alleles (bound to the 'alleles' rod) to use for genotyping; added several integration tests to cover this case. This commit will break the batched calls merging script, but Chris knows this and is ready for it...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5288 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 06:07:18 +00:00
ebanks
d7f98ccd9c
Adding --doNotWriteOriginalQuals argument to BQ recalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5286 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 04:00:00 +00:00
depristo
1a5d296737
ReplaceReadGroups. Fixes BAM files without read group info. MissingReadGroup points people to this tool now. Please point users on the forum to this tool now. Will migrate to Picard.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5284 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-21 14:02:41 +00:00
depristo
aa4a4e515d
Safer interface for ReorderSam. Better error checking. Documentation. Moving into Picard now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5283 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-20 14:35:44 +00:00
depristo
cd7a7091ba
Lexicographic error points users to the ReorderSam wiki entry
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5281 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:45:37 +00:00
depristo
444bf83acf
A simple utility for reordering a BAM file based on a new reference sequence. This tool can be used to efficiently correct a lexicographically sorted BAM file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5279 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:24:32 +00:00
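To illustrate why a lexicographically sorted BAM needs reordering (444bf83acf), here is a small Python sketch. It only shows the ordering problem; the contig names, read tuples, and `rank` lookup are hypothetical and say nothing about how ReorderSam itself is implemented.

```python
# Hypothetical contig order taken from the new reference's sequence dictionary.
reference_order = ["chr1", "chr2", "chr10", "chrX"]
rank = {name: i for i, name in enumerate(reference_order)}

# Reads as (contig, start) pairs.
reads = [("chr10", 500), ("chr2", 100), ("chr1", 900), ("chr2", 50)]

# Plain string sorting puts chr10 before chr2.
lexicographic = sorted(reads)

# Sorting by the contig's rank in the reference restores the expected order.
reordered = sorted(reads, key=lambda r: (rank[r[0]], r[1]))
```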
kshakir
290afae047
GSA-423 Better reporting for errors in QScript.script().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5276 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 22:21:15 +00:00
kiran
52f860c9b2
Modified MD5s to account for Andrey's new MNP column in CountVariants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5274 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 13:13:58 +00:00
kiran
cb95e68fc0
CpG is no longer a standard stratification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5273 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 07:17:35 +00:00
kiran
9ddee96f93
When subsetting by sample, need to take extra care that hom-ref sites don't accidentally get treated as variant sites in CompOverlap. Renamed convenience method for creating command-lines in integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5272 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 06:26:38 +00:00
delangel
1bc5c7e99b
boneheaded mistake, mixed up my min and max
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5271 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 04:02:14 +00:00
kiran
92c82200c9
Fixed an issue where an eval module with TableType objects would get an extra, empty table in the output, screwing up the parse in R.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5267 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:03:46 +00:00
asivache
7ffcade3c3
Added MNP to recognized and counted event types
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5266 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:37:38 +00:00
depristo
57c66b5602
Supports GQ now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5265 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:30:25 +00:00
kshakir
a189454343
FCP only adds the expand intervals QFunction once per script instead of once per QFunction using the ExpandTargets scala trait.
...
Eval dbSNP's type now based on eval dbSNP instead of genotype dbSNP.
Using an external treemap instead of the JGraphT internal node set to speed up larger graph generation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5261 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 19:09:03 +00:00
delangel
f1d708f4d4
Fixes for HRun annotation in case of indels:
...
a) In case of a deletion, the value was completely broken: we'd report 0 or -1.
b) For indels, we report maximum of forward and backward values - I've seen empirically many sites which are not strand biased but which seem to be artifacts and the homopolymer run is always to the right only (because we left align by convention).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5260 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 18:57:21 +00:00
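Point (b) of f1d708f4d4, taking the maximum of the forward and backward homopolymer runs for indels, might look like the following Python sketch. The coordinate convention (0-based `pos` is the base just before the indel; the forward run starts at `pos + 1`, the backward run at `pos`) and both function names are assumptions made for illustration, not the annotation's actual code.

```python
def homopolymer_run(ref, start, step):
    """Length of the run of identical bases beginning at `start`, walking by `step` (+1 or -1)."""
    if not 0 <= start < len(ref):
        return 0
    base = ref[start]
    n = 0
    i = start
    while 0 <= i < len(ref) and ref[i] == base:
        n += 1
        i += step
    return n


def indel_hrun(ref, pos):
    """Report the larger of the runs to the right and to the left of the indel."""
    forward = homopolymer_run(ref, pos + 1, +1)   # run to the right
    backward = homopolymer_run(ref, pos, -1)      # run to the left
    return max(forward, backward)


ref = "ACGTTTTTGA"
right_side = indel_hrun(ref, 2)  # run of T lies to the right of the event
left_side = indel_hrun(ref, 7)   # run of T lies to the left of the event
```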
asivache
0e04e95245
Bug fix: when extracting reference sequence for the event from the reference genome, the tool was treating Deletions and MNPs of length N in exactly the same way: ref_bases[current_pos+1,...,current_pos+N]. This is correct for Deletions but not for MNPs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5258 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 16:15:42 +00:00
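The distinction in 0e04e95245 can be sketched in Python. The deletion range `[current_pos+1, ..., current_pos+N]` is taken from the commit message; the MNP range used here (starting at `current_pos` itself) is an assumption, since the message only says the deletion range is wrong for MNPs. The function name and 0-based indexing are likewise hypothetical.

```python
def event_ref_bases(ref, current_pos, length, kind):
    """Extract the reference bases covered by an event of the given length."""
    if kind == "DEL":
        # From the commit message: deleted bases follow the current position.
        start = current_pos + 1
    elif kind == "MNP":
        # Assumption: substituted bases begin at the current position itself.
        start = current_pos
    else:
        raise ValueError(f"unsupported event type: {kind}")
    return ref[start:start + length]


ref = "ACGTACGT"
deleted = event_ref_bases(ref, 2, 3, "DEL")
substituted = event_ref_bases(ref, 2, 3, "MNP")
```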
asivache
52eedaf22d
Subtle but very annoying bug due to incorrect exit condition on backward traversal. Example of incorrect old behavior (found by Martha Borkan, this normally would NOT happen with the combination of match/mismatch/open/extend parameters we have been using; use match=10.0, mismatch= -9.0, open= -15.0, extend= -6.66 in older builds in order to reproduce):
...
let's align two sequences (shown below, good alignment)
AAATTTGGTAAAA-GT
AAATTTGGTAAAAGGT
now let's reverse the very same sequences and align again
TGAAAATGGTTTAAA
TGGAAAATGGTTTAAA
Note how we lost the deletion and got a mismatch instead at the very first letter of the upper sequence. The overall score of any particular alignment does not depend on the direction of the traversal, so the best alignment (with the highest score) should stay the same too.
New version fixes this issue and produces correct alignment of reverse sequences (up to the different choice of redundant position for the deletion):
T-GAAAATGGTTTAAA
TGGAAAATGGTTTAAA
This version also has the main() method reinstated, so the aligner can be run on its own as a little app.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5255 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 00:02:32 +00:00
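The claim in 52eedaf22d that an alignment's score does not depend on traversal direction can be checked with a small scoring sketch in Python. The parameter values are the ones quoted in the commit message; the gap convention (first gapped column costs `open`, each further consecutive gapped column costs `extend`) and the function itself are assumptions for illustration, not the aligner's code.

```python
# Parameters quoted in the commit message for reproducing the bug.
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND = 10.0, -9.0, -15.0, -6.66


def alignment_score(top, bottom):
    """Score two equal-length aligned strings, where '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    score = 0.0
    in_gap = False
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            # Assumed convention: open penalty for the first gapped column,
            # extend penalty for each consecutive one after it.
            score += GAP_EXTEND if in_gap else GAP_OPEN
            in_gap = True
        else:
            score += MATCH if a == b else MISMATCH
            in_gap = False
    return score


forward = alignment_score("AAATTTGGTAAAA-GT", "AAATTTGGTAAAAGGT")
backward = alignment_score("AAATTTGGTAAAA-GT"[::-1], "AAATTTGGTAAAAGGT"[::-1])
fixed = alignment_score("T-GAAAATGGTTTAAA", "TGGAAAATGGTTTAAA")
```

Under this convention the forward alignment, its mirror image, and the alignment produced by the fixed version all receive the same score, which is consistent with the deletion merely sitting at a different redundant position.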
fromer
6e291820d3
GeneNamesIntervalWalker outputs all genes in each interval; walkers now require a ROD named "intervals"
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5254 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 19:58:09 +00:00
fromer
b304ced801
Updated haplotype calculator to correctly terminate haplotypes RIGHT BEFORE an unphased het
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5252 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 17:10:01 +00:00
depristo
5a51c9a815
AWS_S3 logging is now enabled by default. It first tries to log internally at the Broad and, if it can't, goes to AWS_S3. DEV option is removed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5249 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 20:20:14 +00:00
kshakir
d185c2961f
Added pipeline for calling FCP in batches called MultiFullCallingPipeline.
...
Bug smashes for the MFCP:
Synchronized access to LSF library and modifications to the QGraph.
If values are missing from the graph with -run make sure to exit with a non-zero.
Refactored QGraph to pre-generate a unique Int for each QNode speeding up getHashCode/equals inside the graph.
Added jobPriority and removed jobLimitSeconds from QFunction.
All scatter gather is by default in a single sub directory queueScatterGather.
Moved some FCPTest into BaseTest/PipelineTest for use by MFCPTest.
Rev'ed the 1000G bams used for validation from v1 to v2 and added code to look for the bams before running other tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5247 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 18:26:14 +00:00
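The QGraph change in d185c2961f, pre-generating a unique Int per QNode so that hashing and equality are cheap, can be approximated with this Python sketch. The real code is Scala; the class layout, field names, and file names below are hypothetical and only show the idea of comparing a precomputed id instead of the node's contents.

```python
import itertools


class QNode:
    """Graph node whose identity is a pre-generated integer, not its contents."""

    _next_id = itertools.count()

    def __init__(self, files):
        self.id = next(QNode._next_id)  # assigned once, at construction
        self.files = frozenset(files)   # potentially large; never hashed or compared

    def __hash__(self):
        return self.id

    def __eq__(self, other):
        return isinstance(other, QNode) and self.id == other.id


a = QNode(["sample.bam"])
b = QNode(["sample.bam"])  # same contents, but a distinct node
nodes = {a, b, a}
```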
fromer
d6e3f2eba6
Added GC content calculator for CNV data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5240 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 22:29:55 +00:00
asivache
7a11b4f35d
Another change in variant classification values
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5237 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 17:47:58 +00:00
asivache
7f7d7eb2d1
Inconsequential changes, more 'variant classification' values are recognized
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5236 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 17:36:39 +00:00
kiran
d3660aa00e
Very basic functionality for annotating indels (specifies whether the indel is frameshift, inframe, or non-coding). Does not attempt to recalculate the variant codon, variant amino acid, or whether the site falls within a splice region. Added a convenience method to WalkerTest for building command-line arguments with the proper spacing (so that I stop getting annoyed when I've gotten it wrong and the test system yells at me).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5235 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-13 17:58:20 +00:00
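The three-way indel classification mentioned in d3660aa00e can be sketched in Python as below. The rule that a length change divisible by three is in-frame is standard; the function name, its arguments, and the way the coding status is passed in are assumptions for illustration rather than the annotator's actual interface.

```python
def classify_indel(ref_allele, alt_allele, in_coding_region):
    """Label an indel as 'non-coding', 'inframe', or 'frameshift'."""
    delta = abs(len(alt_allele) - len(ref_allele))
    if delta == 0:
        raise ValueError("alleles have equal length; not an indel")
    if not in_coding_region:
        return "non-coding"
    # A length change that is a multiple of 3 keeps the reading frame intact.
    return "inframe" if delta % 3 == 0 else "frameshift"


one_base_deletion = classify_indel("AT", "A", True)
codon_insertion = classify_indel("A", "AGGC", True)
intergenic = classify_indel("A", "AG", False)
```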
hanna
8d6db5d188
Additional logging of the temp file creation, management, and merging process
...
for VCF files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5234 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 22:07:25 +00:00
asivache
03482bf7c4
Number of MQ0 reads in each sample (format field)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5229 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 17:16:26 +00:00
asivache
8560bb290b
Allelic fractions are now computed on MQ>0 reads only; total depth in each sample still includes MQ0 as per usual convention. Also renamed for clarity.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5228 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 17:13:15 +00:00
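The convention in 8560bb290b (and the MQ0 count added in 03482bf7c4) can be illustrated with a short Python sketch: depth counts every read, while the allelic fraction uses only reads with MQ > 0. The read representation as `(mapq, supports_alt)` pairs, the function name, and the choice to report 0.0 when no informative reads exist are all assumptions.

```python
def depth_mq0_and_alt_fraction(reads):
    """reads: iterable of (mapq, supports_alt) pairs for one sample."""
    reads = list(reads)
    depth = len(reads)                                  # total depth includes MQ0 reads
    informative = [alt for mapq, alt in reads if mapq > 0]
    mq0 = depth - len(informative)                      # number of MQ0 reads
    if informative:
        fraction = sum(informative) / len(informative)  # alt fraction over MQ>0 reads only
    else:
        fraction = 0.0                                  # assumption: no informative reads
    return depth, mq0, fraction


sample_reads = [(60, True), (60, False), (0, True), (30, True), (0, False)]
depth, mq0, fraction = depth_mq0_and_alt_fraction(sample_reads)
```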
ebanks
9554df1a7c
Adding integration test for indels in VF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5227 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 16:58:57 +00:00
hanna
b992abb6eb
A few more unit tests plus some extra
...
functionality for BAM index visualization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5222 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-09 01:51:34 +00:00
kshakir
4d1cca95bb
Removed deprecated getDbsnpFile.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:12:15 +00:00
kshakir
a8ab5a5fb9
After code review with APSG, trying a patch for SIGSEGV errors which checks the LSF result codes from lsb_openjobinfo instead of checking for a null return value from lsb_readjobinfo.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5220 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:08:22 +00:00
delangel
f3de9ee3e0
Refactoring of indel evaluation code to make it easier for external functions to get access to indel classification, in preparation for IndelMetricsByAC to stratify indel classes by AC (not done yet).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5219 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 17:35:16 +00:00
delangel
3635606cd8
Temp checkin just for experimentation: exposed probabilistic alignment parameters to command line interface to make it easier to experiment on their effects, although a full scrap/rewrite of this should be coming soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5218 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 17:33:29 +00:00
ebanks
196eb77699
CG var format is screwed up and doesn't quite fit into the VariantsToVCF mold (we need to see multiple records before we can assign genotypes to a given position), so it's safer to keep this separate from the other well-behaved formats. Hopefully, it's temporary anyways.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5216 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 03:18:38 +00:00
ebanks
4fe0fcd707
Updates to handle CG data, headers, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5215 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 03:16:05 +00:00
kshakir
8040998c15
Renamed the pipeline yaml dbsnpFile to genotypeDbsnp, and added an evalDbsnp.
...
Added a genotypeDbsnpType and evalDbsnpType to check the extensions for .vcf or .rod.
Moved renaming of "recalibrated" bams to "cleaned" from sed to yaml generation template (see diff for more info).
Renamed fCP.q to FCP.q.
Though it's still disabled until VariantEval is updated, added changes above to the FCPTest.
Removed refseq table from the queue.sh wrapper script. Only specified in the yaml.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5213 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 22:01:09 +00:00
fromer
bceb2a9460
Now that Mauricio has updated the PacBio BAM to properly have RG, can use sample name in the walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5212 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 20:26:57 +00:00
kiran
ecbc38aff0
If no comp rod is specified, specify the dummy name none so that we still get counts.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5211 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 19:24:52 +00:00
carneiro
1fbfd4082e
Cycle covariate now works with pacbio reads. No need to override the platform anymore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5210 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 17:14:55 +00:00
asivache
2a04e0d378
Explicitly set logger's level to info - otherwise samtools is too chatty
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5209 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 17:08:50 +00:00
ebanks
698096dc5a
Moving VariantsToVCF to the proper directory; removing the oneoffs CG indel converter in preparation for a legitimate CG variant Feature class in the works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5207 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 05:21:01 +00:00
kiran
35c688ac67
Updated md5 for testVCFStreamingChain to reflect latest changes to VariantEval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5206 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 21:22:05 +00:00
kiran
1f820d5026
Added two files from some refactoring changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5205 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 19:20:12 +00:00
kiran
1085bbf303
Fixed issue where all comp tracks were being treated as known tracks. Fixed issue where multiple JEXL expressions were causing an exception because the underlying object did not implement the Comparable interface. Fixed issue where variants being compared to the known track were not being checked for equality of variation type. Fixed issue where functional annotations were not being iterated over properly. Refactored a lot of helper methods into a separate VariantEvalUtils utility class. Significantly expanded the test suite using a small VCF with SNPs, indels, and non-variant loci which makes it much easier to see what the proper answer should be, and included the appropriate grep and awk commands in the comments to confirm the values.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5204 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 19:19:20 +00:00
kshakir
cc5d695bcf
Renamed the IPFL Test to IPFL PipelineTest so that it'll be picked up by the PipelineTests.
...
HACK: Turned off JNA autoRead() in the jobInfoEnt LSF structure to try and dodge the SIGSEGV during strlen calls during bmods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5201 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-05 00:06:12 +00:00
depristo
ce51ffb56e
Oops, old local paths committed by accident.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5200 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 23:35:56 +00:00
depristo
29f3ad72f3
SAMFileWriter that allows the user to move reads, but only a bit, in an incoming coordinate-sorted BAM file. Does some local reordering and local mate fixing, under specified constraints. These constraints allow us to make a special model of the realigner (under testing for Eric, who promised to try this out a bit and expand test cases and integration tests, but soon to be the default and only model) that only moves reads with ISIZE < 3000 and directly emits a coordinate-sorted, mate-fixed, validating BAM file without needing FixMates externally. Preliminary testing shows this runs in a totally fine amount of memory and produces equivalent results to the previous version.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5199 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:27:05 +00:00
depristo
11ea321b39
Trivial header cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5198 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:23:15 +00:00
depristo
fe4aa58d35
Removing unused class
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5197 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:22:28 +00:00
depristo
0ad1ea4aa1
Fixed Umapped misspelling
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5196 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:21:41 +00:00
asivache
03f265d8bd
Change DP format field description in the header line (expected count=1)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5195 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 21:28:25 +00:00
asivache
c0e998621c
Computes two format (genotype) level annotations: total read depth in the given sample (DP format field) and fraction of reads supporting alt allele(s) in the given sample (FA format field)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5193 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 21:23:55 +00:00
asivache
8700b74640
Now annotates indels as well. Probably can also annotate mixed vcf with indels +snps, but not tested in that mode...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5192 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 20:28:03 +00:00
hanna
5c3198520c
A few minor modifications masquerading as significant changes according to
...
svn's logs:
- Copied BAM indexing engine from Picard back into the GATK anticipating
shard merging algorithm. Tried to leave most of the building blocks in
Picard. If this turns into a logistical nightmare, I'll merge the building
blocks into the GATK as well.
- Reorganized the org.broadinstitute.sting.gatk.datasources package, giving
better separation of query and management functionality for reads, ref, rmd,
and samples.
- Merged Shard building blocks into the
org.broadinstitute.sting.gatk.datasources.reads package, indicating its current
strong relationship with the reads, rather than the general unifying element I
wish this would be.
- Collapsed BAMFormatAwareShard into Shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5184 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:59:19 +00:00
kiran
9ddc95c833
NewEvaluationContext needs to be generated in the inner loop. Otherwise, multiple comp tracks end up getting routed to the same row of the output table. Added test to cover multiple comp tracks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5181 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 07:04:53 +00:00