delangel
8a7f5aba4b
First more or less sort of functional framework for statistical Indel error caller. Current implementation computes Pr(read|haplotype) based on Dindel's error model. A simple walker that takes an existing vcf, generates haplotypes around calls and computes genotype likelihoods is used to test this as first example. No attempt yet to use prior information on indel AF, nor to use multi-sample caller abilities.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4197 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 00:25:34 +00:00
fromer
a1cf3398a5
Added basic version of phasing evaluation: GenotypePhasingEvaluator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4196 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 22:09:50 +00:00
kshakir
fd5970fdd4
At chartl's superb suggestion, command line files are now all Files instead of old method of sometimes "has a File". Should be easier when reassigning them.
...
No longer generating deprecated GATK arguments on the Queue extensions.
Emitting deprecation warnings to Queue compile to help debugging issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:30:48 +00:00
rpoplin
0bb05fb472
Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4194 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:12:09 +00:00
chartl
3a4844ebde
Additional partition types into DepthOfCoverage:
...
- Sequencing Center
- Platform
- Sample by Center
- Sample by Platform
- Sample by Platform by Center <---- needed for analysis I'm doing
The fact that the latter three needed their own partition types, rather than being dictatable from the command line, combined with the new hierarchical traversal types, and new output formatting engine, suggest that DepthOfCoverageV3 is about ready to be retired in favor of a newer, sleeker version.
For now, this will do.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4193 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 19:30:03 +00:00
chartl
590bb50d16
Test for missing read group
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4192 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 14:22:13 +00:00
kiran
acd6bd2430
Experimental tool to annotates indels that are provided in a VCF file based on RefGene. Specifies gene, transcript, strand, type (Non-frameshift, frameshift, 5'-UTR, 3'-UTR, SpliceSiteDisruption, Intron, or Unknown).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4191 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 23:30:28 +00:00
hanna
dc5f858d29
Replaced placeholder support for splitting by read group with read support (sorry everyone), and added relatively comprehensive unit tests to ensure that splitting by read group works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4190 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:24:50 +00:00
rpoplin
b28f63a948
Base recalibrator now uses -o and deprecates -outputBam
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4189 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:13:50 +00:00
kshakir
33400074fa
Updated tribble BED parsing code to use the official UCSC spec, and updated tests to match expected results.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4188 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 21:49:06 +00:00
depristo
995cfe34fe
You can have an error so early that some engine fields are uninitialized. Commit protects RunReport from these errors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4185 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 19:00:25 +00:00
rpoplin
a975db2c2e
Bug fix for the case of reads with no read bases!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4184 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 16:58:54 +00:00
rpoplin
469bbaa240
Added more integration tests for the variant quality score recalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4181 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:31:24 +00:00
depristo
8c4009ee18
Oops, don't enable reporting in integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4179 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 22:56:18 +00:00
rpoplin
5b94c926c8
More precise language.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4178 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:44:22 +00:00
rpoplin
96040726ac
Better exception text for the common error of providing only dbsnp but giving dbsnp sites zero clustering weight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4177 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:36:43 +00:00
depristo
32c6b48106
Proper memory metrics in the file. Please use -et if at all possible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4175 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:30:09 +00:00
chartl
63c7cbd89b
Forgot to commit this long ago, change so the tables are correctly propagated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4174 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 19:06:52 +00:00
aaron
db4ff7317f
allowing empty RMD files (we need to not validate their sequence dictionaries against the reference in this case)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4173 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 17:45:33 +00:00
ebanks
3d6c4fc55f
Removing the obsolete --hapmap and --hapmap_chip options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4172 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:57:05 +00:00
depristo
b33873206a
GATKRunReport now has an ID (random 32 char string) that uniquely identifies the JOB run and can be used to find a run in the run repository
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4171 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:18:57 +00:00
ebanks
3c956110f3
Fixing up the VCFWriter storage code: instead of assuming all samples are coming from the input bam file (they're not), just use the original VCF header for writing the temporary thread files. Now parallelization in e.g. the Genomic Annotator works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4168 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 02:16:07 +00:00
aaron
69d92fab4f
adding the ability to get iterators from Tribble without having an index, and updating the Tabix code to the latest Samtools SVN version (this still doesn't fix the outstanding tabix bugs, waiting for Heng on that).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4167 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:49:23 +00:00
fromer
50f7f18cbd
Changed ReadBackedPhasing default PQ threshold to 10
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4166 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:26:15 +00:00
chartl
e64d1be475
Check if VC is null before trying to subset it (can happen with indels)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4165 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 20:43:37 +00:00
depristo
1ddb5d17c9
hostname now fully qualified and working
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4163 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 17:04:37 +00:00
depristo
4c28fc3a39
Clear documentation for GATKRunReport
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4161 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 15:59:25 +00:00
kiran
16b75e3b9a
A new version of the ErrorRateByReadPosition walker, using the GATKReport functionality to store and emit its output. This version of the walker is roughly half the number of lines as the previous version, owing simply to the removal of all of the output formatting that's now handled by GATKReport.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4160 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:41:13 +00:00
kiran
fd19c63aaf
A data structure that allows data to be collected over the course of a walker's computation, then have that data written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (given that you load it using the GATKReport loader module).
...
This object designed to be both the structure that holds data during the execution of the walker, as well as the object that properly formats and emits the data so that it can be easily loaded into R. In the end, you get a table that looks like this:
##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle errorrate.61PA8.7 qualavg.61PA8.7
0 0.007451835696110506 25.474613284804366
1 0.002362777171937477 29.844949954504095
2 9.087604507451836E-4 32.87590975254731
3 5.452562704471102E-4 34.498999090081895
4 9.087604507451836E-4 35.14831665150137
5 5.452562704471102E-4 36.07223435225619
6 5.452562704471102E-4 36.1217248908297
7 5.452562704471102E-4 36.1910480349345
8 5.452562704471102E-4 36.00345705967977
...
A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession. Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone. This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect.
The display property of individual columns can be turned off. This is useful when a column is used to store intermediate results, necesary for the computation of some later value, but the contents of the intermediate column itself are not required in the final output file.
Finally, the GATKReportTable allows for some simple, mathematical, element-wise and column-wise operations. For instance, two whole columns can be divided, the results of the operation being stored in a third column. This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied or divided without requiring the developer to explicitly write a loop.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:39:24 +00:00
ebanks
df76474b34
Proper filtering when indels are being lifted over
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4158 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 04:48:31 +00:00
depristo
3fd2392090
Improved interface to getting command line options. Now fully traverses all objects to get all internal argument collections. Preliminary (but disabled version) of phoning home (see -et argument for more information). Captures correct and erroring out runs and writes out gzipped, xml report with lots of useful information. Needs a bit more information but is approximately working. Reports going to /humgen/gsa-hpprojects/GATK/reports/ in submitted directory that will be collated by some external tool. Only operating if -et STANDARD or -et STDOUT are provided currently and REPORT_DIR contains a file called ENABLE. WalkerTest now adds -et NO_ET to tests to avoid populating the reports with tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4155 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-28 22:53:32 +00:00
rpoplin
9c3f403307
Add the calculated lod value to the info field of each recalibrated VCF record.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4153 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 21:33:58 +00:00
delangel
fe19539188
Small bug fix: if a read falls at the edge of an indel event (but is not part of it), don't count it towards consistency computation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4152 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 20:37:27 +00:00
rpoplin
54355b1864
In variant quality score recalibrator Preserve the definition of known and novel to be presence in dbSNP or not even when training with 1KG project calls.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4151 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 19:07:59 +00:00
ebanks
7a5f297083
actually modify the vcf when a sample has been down-sampled
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4150 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 19:03:21 +00:00
ebanks
9860db64a3
Fix up liftover to enable lifting over indels
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4148 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 17:55:27 +00:00
hanna
fb177c4fee
If only dcov is specified, assume that selected downsample type is BY_SAMPLE.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4147 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 17:35:41 +00:00
ebanks
9584cbc05e
UG now downsamples to 250x by default
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4146 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:53:15 +00:00
ebanks
431392330e
Re-enable the max records in ram argument, which I accidentally removed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4145 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:42:49 +00:00
hanna
de5ccfb0b1
Moved hasPileupBeenDownsampled() based on Eric's request. Also eliminated
...
@Deprecated constructors from AlignmentContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4142 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:12:05 +00:00
ebanks
427a2f85e9
The Indel Realigner now lets the engine do all of the setup for args affecting the SAM writer. Thanks, Matt!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4141 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:19:47 +00:00
asivache
a3d9d23b0f
Now prints het genotype with GQ=0 for each indel; in two-sample (normal-tumor) mode, prints both genotypes (N and T) as hets for germline events or hom ref for N and het for T for somatic events (all genotypes still have GQ=0)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4140 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:06:42 +00:00
ebanks
dda84a0e54
Re-enabling indels for the Genomic Annotator as per Steve's patch. Steve assures me that he will test this out really well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4139 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:01:25 +00:00
hanna
6f4af47aac
setMaxRecordsInRam now a member of StingSAMFileWriter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4138 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 14:50:41 +00:00
ebanks
bfcac33e80
Cleaning up playground utils and tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:25:47 +00:00
ebanks
4979dcc9a7
Finishing up the playground cleanup (for now)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4135 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:19:37 +00:00
ebanks
0452b1ab68
archiving, removing, or promoting to core from playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4134 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:07:42 +00:00
hanna
d773b3264b
Eliminated -mrl option.
...
Eliminated -fmq0 option.
Eliminated read group hallucination.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4133 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 21:38:03 +00:00
depristo
f384d4a5d6
A java reimplementation of vcf2table in python; supports getting more useful information about genotypes (HET, e.g.) than was possible in python.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4130 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 17:50:33 +00:00
asivache
1e193e4c20
prinring '\n' at the end of line leads to some aesthetical advantages
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4129 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 16:29:42 +00:00
asivache
9b3ffa5f64
Now outputs VCF (as standard output associated with -o)! Can also outptut, in parallel, a lightweight bed and fully annotated .txt (old verbose format) with --bed and --verbose, respectively
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4128 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 16:26:03 +00:00
ebanks
dfae48cee0
Moving supported tools to core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4127 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 13:56:19 +00:00
ebanks
45d895dcf4
Remove the check in the Unified Genotyper for hitting the max reads at locus value. Instead, simply add a flag to the INFO field if any of the samples has been downsampled. 95% hooked up.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4126 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 05:50:47 +00:00
ebanks
e06b2c90ef
Cap the default size of join tables; this can be modified with the --maxJoinTableSize argument. Also, misc cleanup of the comments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4125 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 05:21:26 +00:00
ebanks
79cd716671
More cleanup of the Genomic Annotator. Also, we now require join tables to have unique entries for the column keyed on the join.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4124 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 04:43:52 +00:00
ebanks
dd7f136298
Office-mate courtesy: fixing Andrey's busted integration test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4123 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 02:00:06 +00:00
kshakir
0105e8d063
Updated Queue GATK generation to reflect -B and -I changes.
...
To add support for "-I:tumor tumor.bam", the GATK argument
import_file (-I) is now generated as a List of NamedFile objects.
Could not get sugar working 100%. To activate sugar import the
gatk package. This effectively adds a new method to java.io.File
called toNamedFile. When adding a file to the list call
countReads.import_file :+= myJavaFile.toNamedFile
See scala/qscript/examples for actual examples.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4122 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 22:17:36 +00:00
hanna
bdb3a7ebe6
The tagger was automatically combining identical tags, but this is a problem
...
for the ROD system. Eliminate tag combine operation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4121 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 22:01:32 +00:00
fromer
39da567d48
Changed ReadBackedPhasing to be a RodWalker (corrected to By(READS))
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4120 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:53:04 +00:00
ebanks
4678613893
Significant fixes for the Genomic Annotator.
...
1. Rip out all of Ben's code intended to circumvent the stable VCF Writer output system in multi-threaded mode (I threw up a little when
I saw this code). This will improve memory consumption when running with -nt.
2. Don't annotate indels or > bi-allelic sites.
3. Fix bug where not all records were making it into the output VCF.
4. General code clean up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4118 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:16:50 +00:00
fromer
41e53d37e1
Changed ReadBackedPhasing to be a RodWalker (more efficient, since it is ROD-focused)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4117 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 19:43:57 +00:00
rpoplin
ac58eb3cbb
Slightly better error message for the common error of only providing a dbsnp track but giving it zero clustering weight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4114 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:41:21 +00:00
rpoplin
5623e01602
GenerateVariantClusters and VariantRecalibrator now uses hapmap and 1kg ROD bindings (in addition to dbsnp) to distinguish between knowns and novels. It no longer looks at by-hapmap validation status so providing hapmap is highly recommended. Example on the wiki. Input variants tracks now must start with input.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4113 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:33:40 +00:00
hanna
bf0b6bd486
Update integration tests to use the new ROD syntax.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4112 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:13:30 +00:00
asivache
14198b74d5
Can now compute av. qualities and stddevs per cycle for both original (when present in bam) and recalibrated quals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4111 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 17:14:58 +00:00
asivache
23dbaa68e6
Can design assays when multiple (distinct) events occur at the same locus (one assay per event)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4110 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 16:52:47 +00:00
ebanks
b4baa3eb8f
Cleanup. INDELS model is now disconnected (and renamed 'DINDEL' in preparation for adding plumbing for Guillermo soon)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4106 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 14:52:51 +00:00
hanna
3dc78855fd
Command-line argument tagging is in, and the ROD system is hacked slightly to support the new syntax
...
(-B:name,type file) as well as the old syntax. Also, a bonus feature: BAMs can now be tagged at the
command-line, which should allow us to get rid of some of the hackier calls in GenomeAnalysisEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4105 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 03:47:57 +00:00
fromer
aa8cf25d08
Implemented fully symmetric sliding window read-backed phaser
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4104 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 21:12:32 +00:00
ebanks
cba5f05538
Small fixes for consistency in the numbers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4103 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:48:25 +00:00
rpoplin
7bbd67f3c4
Fixing stray comments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4102 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:19:39 +00:00
rpoplin
85007ffa87
Some clean up for the variant recalibrator. Now uses @Input and @Output so that it can join the Queue party. Users now specify a -o, -clusterFile, -tranchesFile, and -reportDatFile. Example on the wiki. ApplyVariantCuts now has an integration test. Base quality recalibrator now requires a dbsnp rod or vcf file. Now that the base quality recalibrator is using @Output the PrintStream shouldn't be closed in OnTraversalDone.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4101 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:14:58 +00:00
delangel
f2b138d975
Small refactoring: make Haplotype a public class since it will be soon extended and shared with other callers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4100 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 17:52:36 +00:00
ebanks
43f1fb2380
Okay, finally done with VCF compression. Now:
...
1. Uses blocked gzip compression.
2. No more -bzip option available (since we can't compress to sdout).
3. Only file extensions that are compressed are .gz and .gzip.
4. No more need for CompressedVCFWriter.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4099 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 16:36:54 +00:00
ebanks
25fb53e7a2
Oops, forgot to call toLowerCase().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4097 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 14:43:24 +00:00
ebanks
7957b60768
We now automatically compress the output VCF if the file suffix is one of the supported types (.gz, .bz, .bz2). You can still specify -bzip if you want to use another file suffix (or pipe it to sdout for some reason).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4096 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 14:39:59 +00:00
rpoplin
7a8b6b87da
Committing Michael Yourshaw's patch for AnalyzeCovariates. We spawn each RScript process and wait for it to finish in series. Thanks Michael!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4095 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 13:06:25 +00:00
ebanks
9fb151f417
Minor update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4094 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 05:17:10 +00:00
ebanks
44f3c5639a
I have finally figured out that when you volunteer to do something in group meeting, you keep getting pestered about it on Mark's Omniplan doc until it gets done (except for contig aliasing, of course). As such...
...
We can now emit bzipped VCFs from the GATK.
Details: any walker that defines a VCFWriter for its @Output (i.e. pretty much every core walker from UG and on), also has associated with it the -bzip (--bzip_compression) boolean argument. When set, it will emit a VCF that is compressed with bzip2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4093 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 04:14:50 +00:00
hanna
691333f75c
Force isRequired() to be false for @Deprecated args.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4092 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:50:30 +00:00
hanna
5d6a6420a9
New behavior for filling it output streams: if required==true for a field and the field
...
is an output stream, we'll automatically create it and point it to stdout. Otherwise,
we'll leave it empty.
I think about it like this: marking a field 'required' indicates to the GATK that the
walker author requires a value for this field, and if the GATK can provide one without
end user intervention, it will. Maybe this is hackish. We'll try it and see.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4091 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:39:13 +00:00
ebanks
90aef66ec5
Minor fixes for my last commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:25:29 +00:00
ebanks
ef795825fd
Yet more argument consistency updates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4089 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 20:52:30 +00:00
aaron
7474afa7a3
allow other objects access to the static method that resolves bam lists, and some renaming and improved documentation for the function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4087 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:52:00 +00:00
ebanks
ccda4f6ec1
More output consistency changes (updating wiki docs as I go along).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:46:08 +00:00
ebanks
c9c6ff49c2
Deprecated 'O' in favor of 'o' in the cleaner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4085 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:09:24 +00:00
ebanks
55a8306a0d
Update the @RMD tags to look for VariantContext.class instead of ReferenceOrderedDatum.class. Since the test for rod type is broken this won't affect anything right now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4084 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:49:37 +00:00
aaron
35b9883dd6
vcfwriter is in tribble now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4083 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:01:04 +00:00
aaron
2d3b6d89dc
adding the ability in Tribble to create indexes from a stream of features, so that we can create multiple indexes from one pass of the file. In the GATK we now create multiple indexes, and choose the
...
most appropriate based on feature density, and the longest feature in the file. Also:
- Converted Tribble to TestNG; it has better features and is about 6x faster.
- As much code clean-up as I could get done. More to do, especially in the example code.
- Moved asserts in the code to throw exceptions.
- Added getBinSize to the index interface; both indexes already implemented this.
- Removed the abstract parts of the indexCreator interface; this is now more simple.
- Added an IndexType enumeration; might be overkill but it is at least a single point of entry for index information.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4082 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 06:54:59 +00:00
kiran
295472bf69
Simple change to handle a no-call (must avoid asking for the second allele, which will be be null in this case). Also, added a hack to deal with input VCFs where there are no genotype likelihoods (needed in order to process Hapmap and 1KG VCFs). In this mode, called genotypes are assigned a likelihood of 0.96, and alternative genotypes are given 0.02 each. I know Beagle actually takes genotype data without likelihoods, so this might not be the right way to do this.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4081 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:13:09 +00:00
kiran
dec713a184
Simple test code from Steve Schaffner to compute R^2 and D'. This is just for educational purposes. Don't use this code for anything, ever!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4080 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:06:16 +00:00
hanna
8252494fa9
Forgot to update UG performance test to reflect the new -o argument.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4079 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 00:57:16 +00:00
hanna
c177801d81
Add deprecated command-line arguments, and switched over UG to output to
...
-o/--out instead of -varout. Let's watch as our intrepid support engineer
gracefully responds to all the incoming questions of the form: "the GATK told
me to use -o instead of -varout. What do I do?"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4078 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 21:01:44 +00:00
hanna
b80cf7d1d9
Modifications to the output system for better interaction with @Output. Multiplexed arguments. More details in the Monday meeting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4077 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 14:27:05 +00:00
ebanks
30a104228a
Don't require entropy reduction when cleaning only at known sites; instead we need to trust the known indels. This will improve consistency between lane-level and aggregated cleaning.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4076 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 02:44:38 +00:00
depristo
b6989289fc
Potential bug fix for bad references where some codons may have Ns
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4075 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 12:09:33 +00:00
kiran
121b4f23b6
Simple change to allow a list of samples or regular expressions to be provided in a text file (one line per sample).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4074 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 00:01:48 +00:00
ebanks
165dc6d3b0
Ryan, what did you decide about supporting this tool? Is it still useful?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4073 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 19:16:14 +00:00
ebanks
2ef2f1b24a
Fix UG's simple indel calculation model so that deletions are created correctly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4072 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 15:35:47 +00:00
aaron
fa36731faf
fixes for VariantEval integration tests affected by the spaces to underscores change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4070 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 22:43:20 +00:00