ebanks
2d1265771f
Fix for G: make sure to generate the genotype conformations in the grid for the target frequency when not using grid search for anything except the conformations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4382 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 16:44:53 +00:00
delangel
4556e3b273
First iteration in filling up exact AF calculation with new refactored UG. Code computes EM iterations of exact AF spectrum and returns to caller.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4381 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 16:21:54 +00:00
ebanks
0d71dff928
Small bug fix to the new UG (need to initialize the entire posteriors array) means that we also get identical results as old UG when calling with 60 samples in the pilot1 data. Now that I'm happier with UGv2, I've transitioned it to use the correct AF priors instead of the busted ones still in the old UG.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4379 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 14:24:50 +00:00
ebanks
0ec07ad99a
Initial version of refactored Unified Genotyper. Using SNP genotype likelihoods and GRID_SEARCH AF estimation models, achieves the exact same results as original UG on 1-2 samples with the exception of strand bias (not implemented yet); other than that I have no idea. Needs tons more testing. Do not use. For Guillermo only.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4377 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 08:42:25 +00:00
fromer
720aaca8a0
Trying to restore SVN history for phasing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4372 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:50:28 +00:00
fromer
dfb5143a41
Restore folder
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4370 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:46:07 +00:00
fromer
7c909bef82
Moved phasing classes out of playground! The code is still under production, though...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4369 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:21:28 +00:00
fromer
8d8980e8eb
Fixed phasing algorithm to: 1. More correctly weed out irrelevant reads and sites; 2. Crudely flag sites with large phase discrepancies betweens reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4368 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:02:53 +00:00
kshakir
edaa278edd
Removed cases where various toolkit functions were accessing GenomeAnalysisEngine.instance.
...
This will allow other programs like Queue to reuse the functionality.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4351 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-25 02:49:30 +00:00
hanna
497bcbcbb7
Recent changes to the build system make the build system complain loudly about
...
pieces of core that depend on playground. Most of these have been eliminated by
(temporarily) promoting Aaron's report system to core in this checkin. I'll
follow up with other changes in separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4350 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 22:09:12 +00:00
fromer
44ccfc3531
Updated Phasing algorithm + evaluation module to properly implement haplotypes [including homozygous genotypes]; Implemented dynamic window phasing model for LARGE increase in efficiency
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4332 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 21:29:58 +00:00
ebanks
f5a30d0248
I just spoke to Andrey & Kiran (the original authors of these tools), and they voted to kill these in favor of Picard
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4313 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 13:27:35 +00:00
bthomas
bc12055fcf
Quick patch to fix the sample code. It wasn't actually initializing the sample data source, so I added a call to initializeSampleDataSource() in GenomeAnalysisEngine. I think there was just an error resolving the versions of GenomeAnalysisEngine
...
Also added a new error message that I thought would be helpful...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4301 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 14:05:26 +00:00
aaron
782e0018e4
removal of most of the old GATK ROD system; also a fix for -Dsingle so we can again run just a single unit or integration test (single tests in tribble can be run with the -DsingleTest option now). More to come.
...
*** Three integration tests had to change: ***
RecalibarationWalkersIntegrationTest:
One of the tests was using the interval as the snp track, and wasn't supplying a DbSNP track (for CountCovariates)
SequenomValidationConverterIntegrationTest:
relies on Plink ROD which we've removed.
PileupWalkerIntegrationTest:
we no longer have implicit interval tracks, so there isn't a rod name over the specified region. Otherwise the same result.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4292 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 22:54:49 +00:00
bthomas
e5f81d25d4
Adding the --sample-metadata (-SM) command line argument and associated functionality. This is something Matt and I have been working on for a while. Basically, it allows you to integrate sample metadata into an analysis, by including a sample file. More detailed documentation is on the wiki: http://www.broadinstitute.org/gsa/wiki/index.php/Adding_Sample_data_to_an_analysis
...
This commit adds two important classes: Sample, which contains data about one sample; and SampleDataSource, which manages sample data a la ReferenceDataSource and ReadsDataSource.
This code should be stable, but it has not been integrated with existing walkers yet. That's the next commit.
In the meantime, feel free to experiment with the code - there are two basic example walkers in the playground.sample package. And PLEASE let me know if you see any errors/inconsistencies.
Note that this also adds a new dependency on SnakeYaml, a YAML parser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4285 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 11:50:22 +00:00
fromer
248cc308b2
ReadBackedPhasing silently ignores sites with ploidy != 2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4272 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 21:14:17 +00:00
fromer
528f6344af
Moved ReadBackedPhasingWalker to phasing sub-directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4271 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 19:36:41 +00:00
depristo
7880863eb7
Final step in error refactoring. GATK exception is now ReviewedStingException, indicating that this exception is really what one wants. Only use this exception when you have thought about StingException vs. UserException and made a real decision.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4267 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:07:38 +00:00
depristo
7ad8fbdd5a
Moved GATKException to exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4266 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:47:19 +00:00
depristo
595907e98e
Moving StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4262 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:34:15 +00:00
depristo
40e6179911
Penultimate step in exception system overhaul. UserError is now UserException. This class should be used for all communication with the USER for problems with their inputs. Engine now validates sequence dictionaries for compatibility, detecting not only lack of overlap but now inconsistent headers (b36 ref with v37 BAM, for example) as well as ref / bam order inconsistency. New -U option to allow users to tolerate dangerous seq dict issues. WalkerTest system now supports testing for exceptions (see email and wiki for docs). Tests for vcf and bam vs. ref incompatibility. Waiting on Tribble seq dict improvements to detect b36 VCF with b37 ref (currently cannot tell this is wrong.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4258 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:02:43 +00:00
depristo
8f1a32acae
All exceptions thrown by the GATK have been reviewed and UserErrors replaced where appropriate. Shazam. Another check-in will remove the GATKException and restore the StingException.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4252 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 15:25:30 +00:00
depristo
ca9c7389ee
Not useful
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4238 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 02:33:03 +00:00
depristo
8708753a6a
checkin for removal
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4237 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 02:32:46 +00:00
fromer
1b1ec7e52d
Changed default phasing window size to 10
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4235 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 21:28:36 +00:00
depristo
7eeabe534a
QSample walker for 1KG -- measures aggregate quality of sequencing. Includes misc. improvements throughtout the code, including using the new Tribble GenotypeLikelihoods class for working with VCF GLs from the UG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4211 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 18:21:43 +00:00
fromer
529eecd4dc
Added phasing sub-directory to keep walkers directory clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4208 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:38:46 +00:00
fromer
c0ce9ca8cc
Added phasing sub-directory to keep walkers directory clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4207 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:32:30 +00:00
fromer
c119f64514
Added phasing sub-directory to keep walkers directory clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4205 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:24:18 +00:00
fromer
a1cf3398a5
Added basic version of phasing evaluation: GenotypePhasingEvaluator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4196 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 22:09:50 +00:00
fromer
50f7f18cbd
Changed ReadBackedPhasing default PQ threshold to 10
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4166 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:26:15 +00:00
kiran
16b75e3b9a
A new version of the ErrorRateByReadPosition walker, using the GATKReport functionality to store and emit its output. This version of the walker is roughly half the number of lines as the previous version, owing simply to the removal of all of the output formatting that's now handled by GATKReport.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4160 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:41:13 +00:00
kiran
fd19c63aaf
A data structure that allows data to be collected over the course of a walker's computation, then have that data written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (given that you load it using the GATKReport loader module).
...
This object designed to be both the structure that holds data during the execution of the walker, as well as the object that properly formats and emits the data so that it can be easily loaded into R. In the end, you get a table that looks like this:
##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle errorrate.61PA8.7 qualavg.61PA8.7
0 0.007451835696110506 25.474613284804366
1 0.002362777171937477 29.844949954504095
2 9.087604507451836E-4 32.87590975254731
3 5.452562704471102E-4 34.498999090081895
4 9.087604507451836E-4 35.14831665150137
5 5.452562704471102E-4 36.07223435225619
6 5.452562704471102E-4 36.1217248908297
7 5.452562704471102E-4 36.1910480349345
8 5.452562704471102E-4 36.00345705967977
...
A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession. Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone. This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect.
The display property of individual columns can be turned off. This is useful when a column is used to store intermediate results, necesary for the computation of some later value, but the contents of the intermediate column itself are not required in the final output file.
Finally, the GATKReportTable allows for some simple, mathematical, element-wise and column-wise operations. For instance, two whole columns can be divided, the results of the operation being stored in a third column. This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied or divided without requiring the developer to explicitly write a loop.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:39:24 +00:00
hanna
de5ccfb0b1
Moved hasPileupBeenDownsampled() based on Eric's request. Also eliminated
...
@Deprecated constructors from AlignmentContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4142 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:12:05 +00:00
ebanks
bfcac33e80
Cleaning up playground utils and tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:25:47 +00:00
ebanks
4979dcc9a7
Finishing up the playground cleanup (for now)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4135 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:19:37 +00:00
ebanks
0452b1ab68
archiving, removing, or promoting to core from playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4134 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:07:42 +00:00
ebanks
dfae48cee0
Moving supported tools to core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4127 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 13:56:19 +00:00
ebanks
e06b2c90ef
Cap the default size of join tables; this can be modified with the --maxJoinTableSize argument. Also, misc cleanup of the comments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4125 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 05:21:26 +00:00
ebanks
79cd716671
More cleanup of the Genomic Annotator. Also, we now require join tables to have unique entries for the column keyed on the join.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4124 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 04:43:52 +00:00
fromer
39da567d48
Changed ReadBackedPhasing to be a RodWalker (corrected to By(READS))
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4120 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:53:04 +00:00
ebanks
4678613893
Significant fixes for the Genomic Annotator.
...
1. Rip out all of Ben's code intended to circumvent the stable VCF Writer output system in multi-threaded mode (I threw up a little when
I saw this code). This will improve memory consumption when running with -nt.
2. Don't annotate indels or > bi-allelic sites.
3. Fix bug where not all records were making it into the output VCF.
4. General code clean up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4118 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:16:50 +00:00
fromer
41e53d37e1
Changed ReadBackedPhasing to be a RodWalker (more efficient, since it is ROD-focused)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4117 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 19:43:57 +00:00
fromer
aa8cf25d08
Implemented fully symmetric sliding window read-backed phaser
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4104 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 21:12:32 +00:00
ebanks
90aef66ec5
Minor fixes for my last commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:25:29 +00:00
ebanks
ef795825fd
Yet more argument consistency updates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4089 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 20:52:30 +00:00
ebanks
ccda4f6ec1
More output consistency changes (updating wiki docs as I go along).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:46:08 +00:00
ebanks
55a8306a0d
Update the @RMD tags to look for VariantContext.class instead of ReferenceOrderedDatum.class. Since the test for rod type is broken this won't affect anything right now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4084 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:49:37 +00:00
aaron
35b9883dd6
vcfwriter is in tribble now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4083 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:01:04 +00:00
kiran
295472bf69
Simple change to handle a no-call (must avoid asking for the second allele, which will be be null in this case). Also, added a hack to deal with input VCFs where there are no genotype likelihoods (needed in order to process Hapmap and 1KG VCFs). In this mode, called genotypes are assigned a likelihood of 0.96, and alternative genotypes are given 0.02 each. I know Beagle actually takes genotype data without likelihoods, so this might not be the right way to do this.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4081 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:13:09 +00:00