hanna
78343be52c
At some time in the recent past, we lost our ability to process the '-L all'
argument. Brought it back, and added an integrationtest to make sure it
stays around.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4390 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 15:58:43 +00:00
delangel
e80742e72f
Use -o as argument for output file in ProduceBeagleInputWalker, to be consistent with other walkers (you're welcome, chartl :)).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4386 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 22:46:39 +00:00
hanna
732aa32758
Every Sting app from now on will be forced into the US English locale.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4385 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 21:55:21 +00:00
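The commit above does not say how the locale is forced. As a minimal sketch, assuming it amounts to setting the JVM default locale at startup, the following shows why the default locale matters for numeric output. The class name `LocaleDemo` and the helper `format` are hypothetical and not part of the Sting code base.

```java
import java.util.Locale;

public class LocaleDemo {
    /** Formats a value with two decimals using the JVM default locale. */
    static String format(double value) {
        return String.format("%.2f", value);
    }

    public static void main(String[] args) {
        // Under a German default locale the decimal separator is a comma.
        Locale.setDefault(Locale.GERMANY);
        System.out.println(format(1234.5));

        // Assumption: forcing US English is done with a call like this one.
        Locale.setDefault(Locale.US);
        System.out.println(format(1234.5));
    }
}
```

Text output that downstream parsers read back (for example numeric fields in reports) stays stable across machines once the default locale is pinned.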
fromer
20ffe484bc
Added detection and INFO field marking of phasing inconsistencies (and optional filtration using --filterInconsistentSites)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4384 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 19:28:56 +00:00
rpoplin
a6c7de95c8
By using the AC info field instead of parsing the genotypes we cut 78% off the runtime of VariantRecalibrator. There is a new argument to force the parsing of genotypes if necessary. Various other optimizations throughout.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4383 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 18:56:50 +00:00
ebanks
2d1265771f
Fix for G: make sure to generate the genotype conformations in the grid for the target frequency when not using grid search for anything except the conformations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4382 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 16:44:53 +00:00
delangel
4556e3b273
First iteration in filling up exact AF calculation with new refactored UG. Code computes EM iterations of exact AF spectrum and returns to caller.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4381 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 16:21:54 +00:00
ebanks
0d71dff928
Small bug fix to the new UG (need to initialize the entire posteriors array) means that we also get identical results as old UG when calling with 60 samples in the pilot1 data. Now that I'm happier with UGv2, I've transitioned it to use the correct AF priors instead of the busted ones still in the old UG.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4379 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 14:24:50 +00:00
hanna
eee134baf2
Chris found a bug in the downsampler where, if the number of reads entering
the pileup at the next alignment start is large, we don't add as many of those
incoming reads as we should. No integration tests were affected.
Thanks, Chris!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4378 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 11:18:12 +00:00
ebanks
0ec07ad99a
Initial version of refactored Unified Genotyper. Using SNP genotype likelihoods and GRID_SEARCH AF estimation models, achieves the exact same results as original UG on 1-2 samples with the exception of strand bias (not implemented yet); other than that I have no idea. Needs tons more testing. Do not use. For Guillermo only.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4377 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 08:42:25 +00:00
kshakir
6df7f9318f
For enums generate the full path to the Enum type to avoid collisions such as enum Model and enum Model used in the same class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4376 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 05:28:59 +00:00
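The collision described in the commit above can be illustrated with a small sketch. It assumes two unrelated classes that each declare a nested enum called `Model`; all class names here are hypothetical, and the actual generator code is not shown in the commit. Emitting the full path to the enum type keeps the two apart.

```java
public class EnumPathDemo {
    // Two unrelated holders that both declare an enum named Model (hypothetical names).
    static class GenotypeCalculation {
        enum Model { EXACT, GRID_SEARCH }
    }

    static class BaseMismatch {
        enum Model { ONE_STATE, THREE_STATE }
    }

    /** Returns the full path to the enum type followed by the constant name. */
    static String qualifiedName(Enum<?> value) {
        return value.getDeclaringClass().getCanonicalName() + "." + value.name();
    }

    public static void main(String[] args) {
        // Both simple type names are "Model"; the qualified names differ.
        System.out.println(qualifiedName(GenotypeCalculation.Model.EXACT));
        System.out.println(qualifiedName(BaseMismatch.Model.ONE_STATE));
    }
}
```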
fromer
e322e71c2f
Restored SVN history for phasing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4373 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 00:02:02 +00:00
fromer
720aaca8a0
Trying to restore SVN history for phasing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4372 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:50:28 +00:00
fromer
bf88117ead
Trying to restore SVN history for phasing directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4371 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:48:24 +00:00
fromer
dfb5143a41
Restore folder
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4370 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:46:07 +00:00
fromer
7c909bef82
Moved phasing classes out of playground! The code is still under production, though...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4369 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:21:28 +00:00
fromer
8d8980e8eb
Fixed phasing algorithm to: 1. More correctly weed out irrelevant reads and sites; 2. Crudely flag sites with large phase discrepancies between reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4368 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:02:53 +00:00
chartl
5a5c72c80d
Accidentally committed some debug output to PackageUtils, reverting change.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4367 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 21:58:42 +00:00
chartl
862c94c8ce
Small change for Matt -- output partition types in lexicographic order.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4365 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 20:08:03 +00:00
ebanks
7ad87d328d
Make sure to uppercase ref bases since they aren't coming from the engine
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4364 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 19:05:46 +00:00
bthomas
96cccafb0d
Adding a few helper methods for accessing sample metadata, and associated unit tests. These are motivated by discussion with Ryan about how he'll use sample metadata in VariantEvalwalker - hopefully will make it easier for him. Methods are:
-- getToolkit().subContextFromSampleProperty(): filters a VariantContext to genotypes that come from samples that have a given property value
-- getToolkit().getSamplesWithProperty(): gets all samples with a given property
-- getToolkit().getSamplesFromVariantContext(): sample objects that are referenced by name in a VariantContext
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4361 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 02:16:25 +00:00
ebanks
1034853a84
Adding 'solexa' to list of known/supported platforms
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4357 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-27 02:38:38 +00:00
aaron
70f03a7113
first pass of well-formatted tribble exceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4352 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-25 03:29:33 +00:00
kshakir
edaa278edd
Removed cases where various toolkit functions were accessing GenomeAnalysisEngine.instance.
This will allow other programs like Queue to reuse the functionality.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4351 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-25 02:49:30 +00:00
hanna
497bcbcbb7
Recent changes to the build system make the build system complain loudly about
pieces of core that depend on playground. Most of these have been eliminated by
(temporarily) promoting Aaron's report system to core in this checkin. I'll
follow up with other changes separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4350 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 22:09:12 +00:00
hanna
6ebca5d219
Enhancements to build external projects for walker sharing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4348 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 21:17:16 +00:00
corin
eb1fa4bff3
changes an argument to an output so I can use it to track dependencies in queue
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4347 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 21:07:09 +00:00
depristo
745b8cc6d3
GATK now detects lexicographically sorted human data and throws a UserException when it is provided
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4343 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 15:19:48 +00:00
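The commit above does not describe how the detection works. One plausible heuristic, sketched here purely as an assumption, is to check whether contig 10 is listed before contig 2, which happens when human contigs are sorted as strings rather than in karyotypic order. `ContigOrderCheck` and its methods are hypothetical names, not GATK API.

```java
import java.util.Arrays;
import java.util.List;

public class ContigOrderCheck {
    /** True if both names are present and `first` is listed before `second`. */
    static boolean appearsBefore(List<String> contigs, String first, String second) {
        int i = contigs.indexOf(first);
        int j = contigs.indexOf(second);
        return i >= 0 && j >= 0 && i < j;
    }

    /** Heuristic: string sorting puts contig 10 ahead of contig 2. */
    static boolean looksLexicographicallySorted(List<String> contigs) {
        return appearsBefore(contigs, "chr10", "chr2")
            || appearsBefore(contigs, "10", "2");
    }

    public static void main(String[] args) {
        List<String> karyotypic = Arrays.asList("chr1", "chr2", "chr10");
        List<String> lexicographic = Arrays.asList("chr1", "chr10", "chr2");
        System.out.println(looksLexicographicallySorted(karyotypic));
        System.out.println(looksLexicographicallySorted(lexicographic));
    }
}
```

A caller could turn a positive result into a user-facing error instead of letting a later traversal fail with a less informative message.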
rpoplin
1931b2e1bd
Three fixes for VariantFiltrationWalker: Trying to filter an empty VCF file will produce a well-formed VCF file with zero records instead of a blank file, needed for pipelines. The first record's genotype info fields are now in the same order as all the others. The VCF header lines are pulled from just the input variant rod instead of from all rods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4341 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 13:52:56 +00:00
kshakir
4ed9f437e9
Sliced the GAE in half like a gordian knot to avoid the constant merge conflicts.
The GAE half has all the walker specific code. The new "Abstract" GAE has the rest of the logic.
More refactoring to come, with the end goal of having a tool that other java analysis programs (Queue, etc.) can use to read in genomic data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4339 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-23 23:28:55 +00:00
rpoplin
0c9fabb06f
Fix in AnalyzeAnnotations, somebody changed it to look for ID in the vc's info field. This dinosaur desperately needs integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4338 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-23 19:48:44 +00:00
hanna
0c781968fb
Tried to do a bit of pre-commit refactoring and screwed it up. Fixed.
Thanks to Ryan for identifying the problem.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4336 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-23 18:17:29 +00:00
depristo
d081b9b352
Improvements to error messages about @Requires and @Allows
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4334 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-23 12:08:27 +00:00
hanna
7841b301c4
Added more diagnostics so that I have some idea of what a 'general' exception
is. Required to fix bug ZjhCJAdwhtFq1x54ZlmlN8pFNcbrRpdJ and similar. We
might want to change this particular case to a ReviewedStingException after
we gain a bit more experience with it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4333 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 21:32:01 +00:00
fromer
44ccfc3531
Updated Phasing algorithm + evaluation module to properly implement haplotypes [including homozygous genotypes]; Implemented dynamic window phasing model for LARGE increase in efficiency
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4332 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 21:29:58 +00:00
hanna
8f75d88519
Fix for GATK run report ids:
mOVsxGfDiiSMxVs2PPTVjzYTVbizlD6e
f9kUHUADFsZ0LiTGxRL5zPmq9kZcA4cQ
8eGHWJFAlBVmgxwPi3sMd1RmiN2PwHOf
iLhvHWveypKb2F8vKS5irHylc3pYvlOb
HDttXKUMEVoPrvVeWrH7E0htxYyNydMx
plus a bit of cleanup of custom exceptions in the sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4330 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 19:49:25 +00:00
kshakir
20b38b38f3
Updated from SnakeYAML 1.6 to 1.7.
Added a pipeline java bean and YAML utility to serialize java beans.
Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format.
Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference.
More changes to come as this code gets tested out in the fullCallingPipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 19:47:49 +00:00
hanna
fb5d595ef0
Disable VCF header output in the Beagle integrationtest.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4327 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 16:50:03 +00:00
hanna
0c99c97685
The engine now automatically adds the command-line arguments to the header of every VCF, unless -NO_HEADER is specified.
Changed integration tests, adding the -NO_HEADER argument, for walkers that previously did not include the command-line
arg headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4326 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 15:27:58 +00:00
aaron
1af9ca6d45
enabling tests that now pass with the contig length validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4325 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 22:20:50 +00:00
depristo
522830fb01
Support for --assume-single-sample in UG, better exceptions for malformed BAM files, and ignoring out-of-order contigs in seqdictutils. All for the CG BAM file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4323 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 20:33:34 +00:00
aaron
3938d53738
one broken build short of the hat trick. Fixing the unix test which expects the sequence dictionary of the Tribble track to equal the reference; we actually return the sequence dictionary of the track itself, with each contig set to the length of the sequence dictionary contig entry.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4322 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 18:47:20 +00:00
aaron
b968af5db5
The tribble indexes are now updated with correct sequence lengths for each contig they have in their sequence dictionary. Also clean-up in the RMD track builder.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4321 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 18:21:22 +00:00
aaron
2586f0a1ca
fix for the build I broke - the original file got corrupted, which I replaced with a version that didn't have the header stripped off. Other integration tests passed, but this test relied on the header being stripped off.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4320 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 15:35:25 +00:00
rpoplin
547763b230
Better error message for Petr's null pointer exception. Also added an exception integration test because I'm certain this used to work.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4319 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 13:44:40 +00:00
depristo
8719dde59d
Now prints out PASS when a variant is unfiltered
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4318 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 13:16:41 +00:00
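The commit above does not name the tool or code path involved. As a minimal sketch of the VCF convention it refers to, the FILTER column holds `PASS` when a record failed no filters and a semicolon-separated list of filter names otherwise. `FilterFieldDemo` and `filterField` are hypothetical names, and the filter names in `main` are example values only.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

public class FilterFieldDemo {
    /** Renders the FILTER column for a record given the filters it failed. */
    static String filterField(Set<String> failedFilters) {
        if (failedFilters.isEmpty()) {
            return "PASS"; // unfiltered variant
        }
        // Sort the names so the output order is deterministic.
        return String.join(";", new TreeSet<>(failedFilters));
    }

    public static void main(String[] args) {
        System.out.println(filterField(Collections.<String>emptySet()));

        Set<String> failed = new TreeSet<>();
        failed.add("SnpCluster");
        failed.add("LowQual");
        System.out.println(filterField(failed));
    }
}
```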
delangel
205fc0b636
Cleanup: Use Tribble's version of createVariantContextWithPaddedAlleles (no real functional difference) to avoid duplicated code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4315 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 19:53:30 +00:00
delangel
a10cfe213b
Small bug fix in simple indel genotyper: Likelihood of case where best haplotype pair was (REF,REF) was not computed correctly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4314 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 17:04:39 +00:00
ebanks
f5a30d0248
I just spoke to Andrey & Kiran (the original authors of these tools), and they voted to kill these in favor of Picard
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4313 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 13:27:35 +00:00
delangel
f64b6fddc1
Major changes/improvements to indel genotyper:
a) Redid the way path metrics are computed in the indel error model. The paper's formulation, where we have an anchor point in the alignment between read and haplotype, won't work in practice except in nice data sets that are perfectly indel-realigned and well mapped by the aligner. The new formulation doesn't assume this, and it's actually simpler and uses less code. It now more closely resembles a classic SW dynamic programming formulation, but it still preserves the HMM probabilistic formulation.
b) Added a programmable call threshold, set by command line.
c) Now use the sample name from the BAM file; removed the -sampleName argument.
d) Simplified the loop that computes read-haplotype likelihoods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4311 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-19 23:47:31 +00:00