rpoplin
2a2538136d
A version of VQSRv2 that does contrastive clustering in two passes. The walkers will be renamed when they are moved to core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5443 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 21:03:56 +00:00
carneiro
fcc347bb05
making sure the output is as pretty as I said it would be on the wiki.
...
wikipage for this walker is up, at : http://www.broadinstitute.org/gsa/wiki/index.php/Genotype_and_Validate#Examples
use it ;)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5442 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 20:32:09 +00:00
ebanks
239dae0985
Absolutely nothing to get excited about. This is just the skeleton for the local assembler. It doesn't do anything at all now except collect reads over each -L interval and pass them to an assembly engine (which isn't implemented yet). The interface for the AssemblyEngine will change later, but for now this one is the most conducive to debugging.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5441 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 20:31:54 +00:00
corin
6d09cdd4bc
This is a walker that lets the user generate the bed file for declaring variants true positives or false positives. For use with the IGV crowd sourcing project.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5440 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 19:56:16 +00:00
depristo
f75ad0dee3
Now in Picard, and released to the public
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5439 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 19:36:56 +00:00
carneiro
9dfe4c9cb7
moving GenotypeAndValidate to the playground. It's ready to be used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5438 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 19:19:18 +00:00
kshakir
c18f1aa828
Added an optional tag argument RodBind, similar to the Tag argument on TaggedFile.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5437 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 17:49:03 +00:00
carneiro
33c7593218
YAML integrated mendelian violation utility class, integrated and tested through select variants. Wiki is updated.
...
ps: I moved it out of tribble. If you think it should reside in a different place, just yell at me.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5436 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 16:43:37 +00:00
carneiro
42f70d9e07
join all per-lane Bams before doing target realigning and indel cleaning.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5435 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 16:11:03 +00:00
hanna
5406e779d2
Ryan noticed that I accidentally killed a public interface method for getting tag information.
...
Reinstated. Proper unit test to follow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5434 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-14 15:51:19 +00:00
depristo
3e3ec85807
Checked for consistency with the previous integration tests, and updated the walker and test to use the new I/O system (always prints 4 digits on floats).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5433 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-13 15:24:22 +00:00
depristo
b99e27bf9b
In the process of optimizing ProduceBeagleInputWalker, discovered that the GenotypeLikelihoods, the UG, and Genotype objects were using old-style GL tags internally, and then converting from Likelihoods -> GL String -> Likelihoods -> PL String throughout the GATK. It was both painful and led to convoluted code throughout the system. Removed everything but GL conversion -> PL in the GenotypeLikelihoods objects, and now all of the code in UG immediately provides GenotypeLikelihoods to the Genotype objects, which are converted straight to PL. Resulted in a 30% speed up in ProduceBeagleLikelihoods, passes integration tests without any modifications, and likely speeds up writing any VCFs with likelihoods.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5432 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-13 00:07:51 +00:00
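Note on b99e27bf9b: the direct likelihoods-to-PL step described in that commit can be illustrated roughly as below. This is a minimal sketch, not the GATK code. The class and method names are hypothetical, and it assumes the input is an array of log10 genotype likelihoods and that PL values are phred-scaled, rounded to integers, and shifted so the most likely genotype gets 0.

```java
// Hypothetical illustration only; names and normalization are assumptions.
class LikelihoodsToPL {
    // Convert log10 likelihoods straight to integer PL values,
    // without building an intermediate GL string.
    static int[] toPL(double[] log10Likelihoods) {
        double best = Double.NEGATIVE_INFINITY;
        for (double l : log10Likelihoods) {
            best = Math.max(best, l);
        }
        int[] pl = new int[log10Likelihoods.length];
        for (int i = 0; i < pl.length; i++) {
            // Phred-scale the gap to the best genotype, so the best one is 0.
            pl[i] = (int) Math.round(-10.0 * (log10Likelihoods[i] - best));
        }
        return pl;
    }

    // Render the PL values as a comma-separated field value.
    static String toPLString(double[] log10Likelihoods) {
        StringBuilder sb = new StringBuilder();
        for (int v : toPL(log10Likelihoods)) {
            if (sb.length() > 0) sb.append(',');
            sb.append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPLString(new double[] {-0.5, -2.0, -4.5}));
    }
}
```

The point of the sketch is only that a single numeric conversion replaces the string round trip the commit message complains about.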
rpoplin
ceb08f9ee6
Moving some math around in VQSRv2.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5431 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 15:15:05 +00:00
depristo
d01d4fdeb5
Optimized version of produce beagle tool, along with experimental (hidden) support for combining likelihoods depending on estimated false positive rate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5430 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 02:06:28 +00:00
depristo
ee8f2871f7
A better output for Genotype Concordance summary. Now does only % comp hom-ref called hom-ref, het called het, and hom-var called hom-var, which are the quantities we typically show in slides. Updated integration tests to reflect this change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5429 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 02:03:48 +00:00
kshakir
93de326066
Added a new @PartitionBy for walkers to specify how to cut up their inputs.
...
Now building all javadoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5428 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 01:33:08 +00:00
delangel
8ca3390ee0
Low level plumbing work required to have a context dependent error model with the new indel probabilistic alignment model. This just adds an extra input argument and does some refactoring so that when an actual model is ready it will be easy to plug in.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5427 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 00:00:55 +00:00
carneiro
e35a67b3cc
changed the name of the parameter to make the wiki more uniform.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5426 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 17:54:53 +00:00
carneiro
4a84a81d17
SelectVariants: added parameters for mendelian violation. Given a trio vcf, it will generate a VCF with the sites that are mendelian violations.
...
GenotypeAndValidate: now annotates the validations with callStatus.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5425 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 17:47:53 +00:00
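Note on 4a84a81d17: for readers unfamiliar with the term, a mendelian violation in a trio is a child genotype that cannot be formed by taking one allele from each parent. The sketch below shows that test under simplifying assumptions: diploid autosomal sites, all three genotypes called, and no genotype-quality threshold. The class and method names are hypothetical and are not taken from SelectVariants.

```java
// Hypothetical illustration only; not the SelectVariants implementation.
class MendelianCheck {
    // A genotype is a pair of allele strings, e.g. {"A", "G"}.
    static boolean hasAllele(String[] genotype, String allele) {
        return genotype[0].equals(allele) || genotype[1].equals(allele);
    }

    // True when no assignment of the child's two alleles to (mom, dad)
    // is compatible with the parental genotypes.
    static boolean isViolation(String[] mom, String[] dad, String[] child) {
        boolean consistent =
            (hasAllele(mom, child[0]) && hasAllele(dad, child[1]))
         || (hasAllele(mom, child[1]) && hasAllele(dad, child[0]));
        return !consistent;
    }

    public static void main(String[] args) {
        String[] mom = {"A", "A"};
        String[] dad = {"A", "A"};
        String[] child = {"A", "G"};
        System.out.println(isViolation(mom, dad, child));
    }
}
```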
delangel
b03055099a
a) Changed the way we classify and log indel events (e.g. in IndelClasses table inside IndelStatistics VE module). Made names clearer, and split logging of event length with number of repetitions of event.
...
b) Add an experimental annotation to log indel type string inside the INFO field, just for debugging/temp analysis purposes (will consider making it standard if it proves useful).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5424 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 17:37:41 +00:00
chartl
4a09d25a90
One last little thing, I swear
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5423 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 17:37:40 +00:00
chartl
be1f6af815
Let's go the other way.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5422 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 17:36:41 +00:00
chartl
572b2707f2
oops
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5421 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 17:34:27 +00:00
chartl
aea0c733a4
A pilot for empirical recalibration of association scores.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5420 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 17:32:45 +00:00
rpoplin
b3464a6031
Initial commit of VQSRv2 that passes the old integration tests. Not ready to be used yet unless your name rhymes with ... oh wait, that's me.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5419 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 15:18:34 +00:00
depristo
ccc773d175
Refactoring, cleanup, and performance improvements to ProduceBeagleInput. It's really a shame that there's no integration tests...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5418 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-11 13:55:30 +00:00
kshakir
097a9a59e8
Updated LSF libraries to use Pointer instead of Structure.ByReference for struct arrays since the latter is autoRead() and LSF doesn't always return null for empty arrays.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5417 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 22:58:54 +00:00
kshakir
2058fc12bc
Small bug fixes:
...
Added a property to allow generating Queue extensions around an external version of the GATK.
Updated argument order in -help.
Restored the ability to use QScript trait imported annotations in constructors.
Removing line feeds from email password files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5416 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 19:54:14 +00:00
ebanks
4baeb5979f
It turns out that Math.log10() can return 0, which leads to QUALs being set to -0, which is off-spec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5415 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 03:08:56 +00:00
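Note on 4baeb5979f: the negative-zero effect is easy to reproduce. The sketch below assumes QUAL is computed as -10 * log10(p), which the commit message implies but does not show. The method names are hypothetical, and adding 0.0 is only one possible way to normalize the sign, not necessarily the fix that was committed.

```java
// Hypothetical illustration only; the actual fix in the commit is not shown in this log.
class QualNegativeZero {
    // Naive phred scaling: when log10 returns 0.0, the product is -0.0.
    static double naiveQual(double errorProb) {
        return -10.0 * Math.log10(errorProb);
    }

    // Under IEEE 754 round-to-nearest, (-0.0) + 0.0 is +0.0,
    // and adding 0.0 leaves every other value unchanged.
    static double safeQual(double errorProb) {
        return naiveQual(errorProb) + 0.0;
    }

    public static void main(String[] args) {
        System.out.println(naiveQual(1.0)); // prints -0.0
        System.out.println(safeQual(1.0));  // prints 0.0
    }
}
```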
ebanks
3596c56602
New attempt at the constrained movement version of the indel realigner (I've kept around the old writer for now). The new contract is that the realigner must ask permission before trying to clean an area; permission will be denied by the CM-Manager if it was required to flush its cache of reads because of too much depth within a distance of maxInsertSizeForMovingReadPairs. Added integration tests to cover different max cache sizes, including an expected exception when too small a value is chosen. The actual logic changes were fairly minor - much of this commit is really just some cleanup. I'd like to throw 1000G Phase I at it, but will respectfully wait for Ryan to hit his deadline before doing so.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5414 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 02:48:29 +00:00
kshakir
204582bcd5
Printing out counts of functions as they are dispatched.
...
Deleting files from intermediate jobs as soon as all the dependent jobs are done.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5413 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 01:45:20 +00:00
rpoplin
ff7edc4493
Minor bug fix in empiricalMu prior calculation in VQSR.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5412 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 00:42:38 +00:00
fromer
0b45de14ed
Some minor updates to fully utilize the functionality of reduceByInterval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5411 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-09 20:38:08 +00:00
rpoplin
509daac9f7
Minor bug fix in k-means implementation. Updating VQSR integration tests in preparation for VQSRv2 by removing some unused features such as VariantDatum.weight and ti/tv cutting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5410 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-09 00:26:28 +00:00
carneiro
fa7284b7a1
Genotype And Validate walker is now ready to be used by anyone.
...
given an annotated VCF and a BAM file, it genotypes (using the reads in the BAM) each variant in the VCF (for snp or indel) and validates (or not) the 'known' annotation. Outputs a truth table with the PPV and NPV values, and optionally a vcf file with the variants that had enough coverage to be validated. You can optionally provide a minimum depth of coverage and only do the analysis conditional on that. (will write a wiki for this walker, as it might be useful for future validation assays).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5409 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 22:10:38 +00:00
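Note on fa7284b7a1: the PPV and NPV mentioned in that commit follow the standard definitions, PPV = TP / (TP + FP) and NPV = TN / (TN + FN). The sketch below computes them from the four cells of a truth table. The class and method names are hypothetical, and which side of the comparison counts as "truth" in the walker is not stated in the log, so that mapping is left to the caller.

```java
// Hypothetical illustration only; uses the textbook definitions, not the walker's code.
class TruthTable {
    // Positive predictive value: fraction of positive calls that are correct.
    static double ppv(long truePositives, long falsePositives) {
        long total = truePositives + falsePositives;
        return total == 0 ? Double.NaN : (double) truePositives / total;
    }

    // Negative predictive value: fraction of negative calls that are correct.
    static double npv(long trueNegatives, long falseNegatives) {
        long total = trueNegatives + falseNegatives;
        return total == 0 ? Double.NaN : (double) trueNegatives / total;
    }

    public static void main(String[] args) {
        System.out.printf("PPV=%.4f NPV=%.4f%n", ppv(90, 10), npv(45, 5));
    }
}
```

Returning NaN for an empty row is a choice made for this sketch, so that "no calls" is not confused with a value of 0.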
chartl
da88c29b6e
Added a module to test for reference mismatch associations, and a self-normalized/self-normalizing version.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5408 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 20:01:28 +00:00
kshakir
63e1625cc5
Instead of just looking for $SGE_ROOT, specifically hunting for $SGE_ROOT/lib/drmaa.jar to enable early grid engine builds.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5407 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 18:27:08 +00:00
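Note on 63e1625cc5: the log does not show where this check lives (it may well be in the build script rather than in Java), so the following is only a Java rendering of the same idea: treat Grid Engine support as available when the file $SGE_ROOT/lib/drmaa.jar exists, not merely when SGE_ROOT is set. The class and method names are hypothetical.

```java
import java.io.File;

// Hypothetical illustration only; the real check may be implemented in the build file.
class GridEngineDetect {
    // True only if sgeRoot is set and sgeRoot/lib/drmaa.jar is a regular file.
    static boolean drmaaJarPresent(String sgeRoot) {
        if (sgeRoot == null || sgeRoot.isEmpty()) {
            return false;
        }
        return new File(new File(sgeRoot, "lib"), "drmaa.jar").isFile();
    }

    public static void main(String[] args) {
        String sgeRoot = System.getenv("SGE_ROOT");
        System.out.println("Grid Engine support: " + drmaaJarPresent(sgeRoot));
    }
}
```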
chartl
31a2575c7b
Fixes:
...
- Don't know how I got the wiggle header so utterly wrong. Fixed.
- Q-values now have a static maximum of 2000 so IGV averaging won't make everything look spikey and ugly.
- Changing windows to size 100 for (hopefully) better resolution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5406 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 17:16:21 +00:00
chartl
1b310401fe
Due to the approximation not being well-founded in this case, (and the non-existence of a pre-computed table at this time), pushing up the cutoff
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5405 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 16:24:42 +00:00
delangel
00ac51acc8
Added several integration tests for UG indel caller:
...
- Basic
- Multiple technology
- Test minIndelCnt parameter
Added also 2 disabled tests:
- Parallelization: issue w/code right now is that if -nt > 1, filter field shows "PASS" instead of ".", cause TBD
- Genotype given alleles mode: code not working yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5404 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 16:21:21 +00:00
chartl
77fe902dbd
Testing modules now use wider windows and a heftier shift; hopefully this will remove some of the noisiness of the results. Some UStatistics were changed to TStatistics to try and limit noisiness as well. Walker will also write out wiggle files directly (which can be converted into "proper" tdf files via igvtools tile [args] [in].wig [out].tdf [ref]) subject to some restrictions. MWU could get stuck in a long-running recursive regime; it'd be nice to have a table lookup or a good small-n large-m approximation, but for now the uniform should work just fine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5403 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 15:26:13 +00:00
carneiro
b733cba7c7
re-fixing for a different approach suggested by eric!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5402 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 04:54:49 +00:00
kshakir
5564838087
Email password should be specified on the command line OR the password file, not both.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5401 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-08 03:42:09 +00:00
kiran
d0598c7a04
Somehow missed this test when I was updating the md5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5400 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 23:53:42 +00:00
kshakir
c03341aec1
JobRunner can be specified on the command line. -bsub is currently short form of -jobRunner Lsf706.
...
Added an empty wrapper for a GridEngine job runner which is only activated when SGE_ROOT is detected.
Refactored a bit more common code into CommandLineJobRunner / JobRunner / FunctionEdge.
Status for analyisNames now includes the number of functions in state.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5399 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 23:52:48 +00:00
kiran
b6339967f8
Updated GenomicAnnotator integration tests to include the -NO_HEADER argument so that the tests stop yelling about trivial differences
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5398 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 23:07:01 +00:00
hanna
85ff983a59
Failed to include some required GenomeLoc utilities in my last commit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5397 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 23:00:17 +00:00
carneiro
02006954bc
UG: small bug fix when creating empty variant contexts in UG for the -EMIT_ALL_SITES to allow indels.
...
GAV: First version of the walker that validates reads from a BAM file based on an annotated VCF with TP/FP annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5396 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 22:51:04 +00:00
hanna
9384b2ff65
A few quick fixes to temporarily make the LowMemorySharder return exactly the
...
same shards as the previous sharder, so that I can directly compare filespans
to see where some performance bugs lie.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5395 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 22:43:14 +00:00
depristo
0b4e51317b
Now includes project consensus high sensitivity data set
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5394 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-07 20:52:11 +00:00