depristo
995cfe34fe
You can have an error so early that some engine fields are uninitialized. Commit protects RunReport from these errors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4185 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 19:00:25 +00:00
rpoplin
a975db2c2e
Bug fix for the case of reads with no read bases!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4184 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 16:58:54 +00:00
rpoplin
469bbaa240
Added more integration tests for the variant quality score recalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4181 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:31:24 +00:00
rpoplin
5b94c926c8
More precise language.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4178 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:44:22 +00:00
rpoplin
96040726ac
Better exception text for the common error of providing only dbsnp but giving dbsnp sites zero clustering weight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4177 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:36:43 +00:00
depristo
32c6b48106
Proper memory metrics in the file. Please use -et if at all possible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4175 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:30:09 +00:00
chartl
63c7cbd89b
Forgot to commit this long ago, change so the tables are correctly propagated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4174 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 19:06:52 +00:00
aaron
db4ff7317f
allowing empty RMD files (we need to not validate their sequence dictionaries against the reference in this case)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4173 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 17:45:33 +00:00
ebanks
3d6c4fc55f
Removing the obsolete --hapmap and --hapmap_chip options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4172 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:57:05 +00:00
depristo
b33873206a
GATKRunReport now has an ID (random 32 char string) that uniquely identifies the JOB run and can be used to find a run in the run repository
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4171 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:18:57 +00:00
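The run ID described in the commit above (b33873206a) can be sketched as follows. This is a minimal illustration, not the GATKRunReport code: the class name, the method name, and the alphanumeric alphabet are assumptions, since the message only says "random 32 char string".

```java
import java.security.SecureRandom;

// Hypothetical helper for illustration; not the GATKRunReport implementation.
class RunIdSketch {
    // Assumed alphabet: the commit message does not say which characters are used.
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a random alphanumeric string of the requested length. */
    static String randomId(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints one 32-character ID; the value differs on every run.
        System.out.println(randomId(32));
    }
}
```

With 62 symbols and 32 positions, a collision between two runs is vanishingly unlikely, which is what makes such a string usable as a lookup key in a run repository.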
ebanks
3c956110f3
Fixing up the VCFWriter storage code: instead of assuming all samples are coming from the input bam file (they're not), just use the original VCF header for writing the temporary thread files. Now parallelization in e.g. the Genomic Annotator works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4168 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 02:16:07 +00:00
aaron
69d92fab4f
adding the ability to get iterators from Tribble without having an index, and updating the Tabix code to the latest Samtools SVN version (this still doesn't fix the outstanding tabix bugs, waiting for Heng on that).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4167 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:49:23 +00:00
fromer
50f7f18cbd
Changed ReadBackedPhasing default PQ threshold to 10
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4166 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:26:15 +00:00
chartl
e64d1be475
Check if VC is null before trying to subset it (can happen with indels)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4165 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 20:43:37 +00:00
depristo
1ddb5d17c9
hostname now fully qualified and working
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4163 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 17:04:37 +00:00
depristo
4c28fc3a39
Clear documentation for GATKRunReport
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4161 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 15:59:25 +00:00
kiran
16b75e3b9a
A new version of the ErrorRateByReadPosition walker, using the GATKReport functionality to store and emit its output. This version of the walker has roughly half as many lines as the previous version, owing simply to the removal of all of the output formatting that's now handled by GATKReport.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4160 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:41:13 +00:00
kiran
fd19c63aaf
A data structure that allows data to be collected over the course of a walker's computation, then have that data written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (given that you load it using the GATKReport loader module).
...
This object is designed to be both the structure that holds data during the execution of the walker and the object that properly formats and emits the data so that it can be easily loaded into R. In the end, you get a table that looks like this:
##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle  errorrate.61PA8.7      qualavg.61PA8.7
0      0.007451835696110506   25.474613284804366
1      0.002362777171937477   29.844949954504095
2      9.087604507451836E-4   32.87590975254731
3      5.452562704471102E-4   34.498999090081895
4      9.087604507451836E-4   35.14831665150137
5      5.452562704471102E-4   36.07223435225619
6      5.452562704471102E-4   36.1217248908297
7      5.452562704471102E-4   36.1910480349345
8      5.452562704471102E-4   36.00345705967977
...
A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession. Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone. This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect.
The display property of individual columns can be turned off. This is useful when a column is used to store intermediate results, necessary for the computation of some later value, but the contents of the intermediate column itself are not required in the final output file.
Finally, the GATKReportTable allows for some simple, mathematical, element-wise and column-wise operations. For instance, two whole columns can be divided, the results of the operation being stored in a third column. This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied or divided without requiring the developer to explicitly write a loop.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:39:24 +00:00
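The table behaviour described in the commit above (fd19c63aaf) can be sketched roughly as below. This is a simplified stand-in, not the GATKReport or GATKReportTable API: the class ReportTableSketch and its methods addColumn, set, divideColumns and write are hypothetical names chosen for illustration. It shows the three ideas from the message: a header line per table, columns whose display can be turned off, and a column-wise division stored in a third column.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the GATKReport API.
class ReportTableSketch {
    private final String name;
    private final String description;
    private final String keyName;
    private final List<Object> rowKeys = new ArrayList<Object>();
    private final Map<String, Map<Object, Double>> columns =
            new LinkedHashMap<String, Map<Object, Double>>();
    private final Map<String, Boolean> displayed = new LinkedHashMap<String, Boolean>();

    ReportTableSketch(String name, String description, String keyName) {
        this.name = name;
        this.description = description;
        this.keyName = keyName;
    }

    /** Registers a column; hidden columns hold intermediate values only. */
    void addColumn(String column, boolean display) {
        columns.put(column, new LinkedHashMap<Object, Double>());
        displayed.put(column, display);
    }

    /** Stores a value, creating the row on first use. The column must exist. */
    void set(Object rowKey, String column, double value) {
        if (!rowKeys.contains(rowKey)) {
            rowKeys.add(rowKey);
        }
        columns.get(column).put(rowKey, value);
    }

    /** Element-wise division of two columns into a third, already registered, column. */
    void divideColumns(String target, String numerator, String denominator) {
        for (Object key : rowKeys) {
            Double n = columns.get(numerator).get(key);
            Double d = columns.get(denominator).get(key);
            if (n != null && d != null) {
                columns.get(target).put(key, n / d);
            }
        }
    }

    /** Writes the table header line, the column names, then one line per row. */
    void write(PrintStream out) {
        out.println("##:GATKReport.v0.1 " + name + " : " + description);
        StringBuilder header = new StringBuilder(keyName);
        for (String column : columns.keySet()) {
            if (displayed.get(column)) {
                header.append(' ').append(column);
            }
        }
        out.println(header);
        for (Object key : rowKeys) {
            StringBuilder row = new StringBuilder(String.valueOf(key));
            for (String column : columns.keySet()) {
                if (displayed.get(column)) {
                    Double value = columns.get(column).get(key);
                    row.append(' ').append(value == null ? "NA" : value.toString());
                }
            }
            out.println(row);
        }
    }

    public static void main(String[] args) {
        ReportTableSketch table = new ReportTableSketch(
                "ErrorRatePerCycle", "The error rate per sequenced position in the reads", "cycle");
        table.addColumn("mismatches", false); // intermediate, not displayed
        table.addColumn("bases", false);      // intermediate, not displayed
        table.addColumn("errorrate", true);
        table.set(0, "mismatches", 3.0);
        table.set(0, "bases", 400.0);
        table.divideColumns("errorrate", "mismatches", "bases");
        table.write(System.out);
    }
}
```

Keeping the intermediate counts in hidden columns means the walker only accumulates raw numbers while it runs and derives the displayed column once at the end, without a hand-written loop at the call site.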
ebanks
df76474b34
Proper filtering when indels are being lifted over
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4158 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 04:48:31 +00:00
depristo
3fd2392090
Improved interface to getting command line options. Now fully traverses all objects to get all internal argument collections. Preliminary (but disabled version) of phoning home (see -et argument for more information). Captures correct and erroring out runs and writes out gzipped, xml report with lots of useful information. Needs a bit more information but is approximately working. Reports going to /humgen/gsa-hpprojects/GATK/reports/ in submitted directory that will be collated by some external tool. Only operating if -et STANDARD or -et STDOUT are provided currently and REPORT_DIR contains a file called ENABLE. WalkerTest now adds -et NO_ET to tests to avoid populating the reports with tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4155 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-28 22:53:32 +00:00
rpoplin
9c3f403307
Add the calculated lod value to the info field of each recalibrated VCF record.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4153 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 21:33:58 +00:00
delangel
fe19539188
Small bug fix: if a read falls at the edge of an indel event (but is not part of it), don't count it towards consistency computation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4152 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 20:37:27 +00:00
rpoplin
54355b1864
In the variant quality score recalibrator, preserve the definition of known and novel as presence or absence in dbSNP, even when training with 1KG project calls.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4151 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 19:07:59 +00:00
ebanks
7a5f297083
actually modify the vcf when a sample has been down-sampled
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4150 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 19:03:21 +00:00
ebanks
9860db64a3
Fix up liftover to enable lifting over indels
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4148 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 17:55:27 +00:00
hanna
fb177c4fee
If only dcov is specified, assume that selected downsample type is BY_SAMPLE.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4147 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 17:35:41 +00:00
ebanks
9584cbc05e
UG now downsamples to 250x by default
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4146 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:53:15 +00:00
ebanks
431392330e
Re-enable the max records in ram argument, which I accidentally removed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4145 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:42:49 +00:00
hanna
de5ccfb0b1
Moved hasPileupBeenDownsampled() based on Eric's request. Also eliminated @Deprecated constructors from AlignmentContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4142 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:12:05 +00:00
ebanks
427a2f85e9
The Indel Realigner now lets the engine do all of the setup for args affecting the SAM writer. Thanks, Matt!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4141 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:19:47 +00:00
asivache
a3d9d23b0f
Now prints het genotype with GQ=0 for each indel; in two-sample (normal-tumor) mode, prints both genotypes (N and T) as hets for germline events or hom ref for N and het for T for somatic events (all genotypes still have GQ=0)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4140 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:06:42 +00:00
ebanks
dda84a0e54
Re-enabling indels for the Genomic Annotator as per Steve's patch. Steve assures me that he will test this out really well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4139 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:01:25 +00:00
hanna
6f4af47aac
setMaxRecordsInRam now a member of StingSAMFileWriter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4138 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 14:50:41 +00:00
ebanks
bfcac33e80
Cleaning up playground utils and tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:25:47 +00:00
ebanks
4979dcc9a7
Finishing up the playground cleanup (for now)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4135 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:19:37 +00:00
ebanks
0452b1ab68
archiving, removing, or promoting to core from playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4134 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:07:42 +00:00
hanna
d773b3264b
Eliminated -mrl option.
...
Eliminated -fmq0 option.
Eliminated read group hallucination.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4133 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 21:38:03 +00:00
depristo
f384d4a5d6
A java reimplementation of vcf2table in python; supports getting more useful information about genotypes (HET, e.g.) than was possible in python.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4130 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 17:50:33 +00:00
asivache
1e193e4c20
printing '\n' at the end of line leads to some aesthetic advantages
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4129 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 16:29:42 +00:00
asivache
9b3ffa5f64
Now outputs VCF (as standard output associated with -o)! Can also output, in parallel, a lightweight bed and fully annotated .txt (old verbose format) with --bed and --verbose, respectively
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4128 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 16:26:03 +00:00
ebanks
dfae48cee0
Moving supported tools to core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4127 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 13:56:19 +00:00
ebanks
45d895dcf4
Remove the check in the Unified Genotyper for hitting the max reads at locus value. Instead, simply add a flag to the INFO field if any of the samples has been downsampled. 95% hooked up.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4126 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 05:50:47 +00:00
ebanks
e06b2c90ef
Cap the default size of join tables; this can be modified with the --maxJoinTableSize argument. Also, misc cleanup of the comments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4125 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 05:21:26 +00:00
ebanks
79cd716671
More cleanup of the Genomic Annotator. Also, we now require join tables to have unique entries for the column keyed on the join.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4124 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 04:43:52 +00:00
kshakir
0105e8d063
Updated Queue GATK generation to reflect -B and -I changes.
...
To add support for "-I:tumor tumor.bam", the GATK argument
import_file (-I) is now generated as a List of NamedFile objects.
Could not get sugar working 100%. To activate sugar import the
gatk package. This effectively adds a new method to java.io.File
called toNamedFile. When adding a file to the list call
countReads.import_file :+= myJavaFile.toNamedFile
See scala/qscript/examples for actual examples.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4122 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 22:17:36 +00:00
hanna
bdb3a7ebe6
The tagger was automatically combining identical tags, but this is a problem for the ROD system. Eliminate tag combine operation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4121 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 22:01:32 +00:00
fromer
39da567d48
Changed ReadBackedPhasing to be a RodWalker (corrected to By(READS))
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4120 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:53:04 +00:00
ebanks
4678613893
Significant fixes for the Genomic Annotator.
...
1. Rip out all of Ben's code intended to circumvent the stable VCF Writer output system in multi-threaded mode (I threw up a little when I saw this code). This will improve memory consumption when running with -nt.
2. Don't annotate indels or > bi-allelic sites.
3. Fix bug where not all records were making it into the output VCF.
4. General code clean up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4118 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:16:50 +00:00
fromer
41e53d37e1
Changed ReadBackedPhasing to be a RodWalker (more efficient, since it is ROD-focused)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4117 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 19:43:57 +00:00
rpoplin
ac58eb3cbb
Slightly better error message for the common error of only providing a dbsnp track but giving it zero clustering weight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4114 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:41:21 +00:00
rpoplin
5623e01602
GenerateVariantClusters and VariantRecalibrator now use hapmap and 1kg ROD bindings (in addition to dbsnp) to distinguish between knowns and novels. They no longer look at by-hapmap validation status, so providing hapmap is highly recommended. Example on the wiki. Input variant tracks must now start with input.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4113 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:33:40 +00:00
hanna
bf0b6bd486
Update integration tests to use the new ROD syntax.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4112 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:13:30 +00:00
asivache
14198b74d5
Can now compute av. qualities and stddevs per cycle for both original (when present in bam) and recalibrated quals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4111 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 17:14:58 +00:00
asivache
23dbaa68e6
Can design assays when multiple (distinct) events occur at the same locus (one assay per event)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4110 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 16:52:47 +00:00
ebanks
b4baa3eb8f
Cleanup. INDELS model is now disconnected (and renamed 'DINDEL' in preparation for adding plumbing for Guillermo soon)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4106 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 14:52:51 +00:00
hanna
3dc78855fd
Command-line argument tagging is in, and the ROD system is hacked slightly to support the new syntax (-B:name,type file) as well as the old syntax.
...
Also, a bonus feature: BAMs can now be tagged at the command-line, which should allow us to get rid of some of the hackier calls in GenomeAnalysisEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4105 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 03:47:57 +00:00
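The tagged-argument syntax from the commit above (3dc78855fd), such as -B:name,type or -I:tumor, can be sketched as a small parser. This is an illustration only: ArgumentTagSketch and parse are hypothetical names, and the real GATK command-line parser is not shown in this log. Tags are kept in a list, in order and with duplicates, which is consistent with the later commit that removed the tag combine operation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical parser for illustration; not the GATK argument parser.
class ArgumentTagSketch {
    final String flag;       // e.g. "-B"
    final List<String> tags; // e.g. ["dbsnp", "VCF"]; empty when no tags are given

    private ArgumentTagSketch(String flag, List<String> tags) {
        this.flag = flag;
        this.tags = Collections.unmodifiableList(tags);
    }

    /** Splits a switch token of the form flag[:tag1,tag2,...] into its parts. */
    static ArgumentTagSketch parse(String token) {
        int colon = token.indexOf(':');
        if (colon < 0) {
            // Old syntax: a bare switch with no tags attached.
            return new ArgumentTagSketch(token, new ArrayList<String>());
        }
        String flag = token.substring(0, colon);
        String tagText = token.substring(colon + 1);
        List<String> tags = new ArrayList<String>();
        if (!tagText.isEmpty()) {
            // Order and duplicates are preserved on purpose.
            tags.addAll(Arrays.asList(tagText.split(",")));
        }
        return new ArgumentTagSketch(flag, tags);
    }
}
```

For example, parse("-B:dbsnp,VCF") yields the flag -B with the tags dbsnp and VCF, while parse("-I") yields the flag -I with no tags; the file argument that follows the switch is handled separately.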
fromer
aa8cf25d08
Implemented fully symmetric sliding window read-backed phaser
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4104 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 21:12:32 +00:00
ebanks
cba5f05538
Small fixes for consistency in the numbers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4103 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:48:25 +00:00
rpoplin
7bbd67f3c4
Fixing stray comments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4102 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:19:39 +00:00
rpoplin
85007ffa87
Some clean up for the variant recalibrator. Now uses @Input and @Output so that it can join the Queue party. Users now specify a -o, -clusterFile, -tranchesFile, and -reportDatFile. Example on the wiki. ApplyVariantCuts now has an integration test. Base quality recalibrator now requires a dbsnp rod or vcf file. Now that the base quality recalibrator is using @Output the PrintStream shouldn't be closed in OnTraversalDone.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4101 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:14:58 +00:00
delangel
f2b138d975
Small refactoring: make Haplotype a public class since it will be soon extended and shared with other callers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4100 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 17:52:36 +00:00
ebanks
43f1fb2380
Okay, finally done with VCF compression. Now:
...
1. Uses blocked gzip compression.
2. No more -bzip option available (since we can't compress to stdout).
3. Only file extensions that are compressed are .gz and .gzip.
4. No more need for CompressedVCFWriter.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4099 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 16:36:54 +00:00
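The extension rule from the commit above (43f1fb2380) can be sketched as follows. The class and method names are hypothetical, and java.util.zip.GZIPOutputStream is used here as a plain-gzip stand-in: the commit says the GATK uses blocked gzip compression, which this sketch does not implement.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not the GATK VCF writer code.
class VcfOutputSketch {
    /** True when the file name ends in .gz or .gzip, ignoring case. */
    static boolean isCompressedName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".gz") || lower.endsWith(".gzip");
    }

    /** Opens the file, wrapping it in gzip compression when the name asks for it. */
    static OutputStream open(File file) throws IOException {
        OutputStream raw = new BufferedOutputStream(new FileOutputStream(file));
        // Assumption: plain gzip stands in for the blocked gzip named in the commit.
        return isCompressedName(file.getName()) ? new GZIPOutputStream(raw) : raw;
    }
}
```

Lower-casing the name before the comparison is what lets calls.vcf.GZ be treated the same as calls.vcf.gz.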
ebanks
25fb53e7a2
Oops, forgot to call toLowerCase().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4097 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 14:43:24 +00:00
ebanks
7957b60768
We now automatically compress the output VCF if the file suffix is one of the supported types (.gz, .bz, .bz2). You can still specify -bzip if you want to use another file suffix (or pipe it to stdout for some reason).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4096 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 14:39:59 +00:00
rpoplin
7a8b6b87da
Committing Michael Yourshaw's patch for AnalyzeCovariates. We spawn each RScript process and wait for it to finish in series. Thanks Michael!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4095 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 13:06:25 +00:00
ebanks
9fb151f417
Minor update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4094 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 05:17:10 +00:00
ebanks
44f3c5639a
I have finally figured out that when you volunteer to do something in group meeting, you keep getting pestered about it on Mark's Omniplan doc until it gets done (except for contig aliasing, of course). As such...
...
We can now emit bzipped VCFs from the GATK.
Details: any walker that defines a VCFWriter for its @Output (i.e. pretty much every core walker from UG and on), also has associated with it the -bzip (--bzip_compression) boolean argument. When set, it will emit a VCF that is compressed with bzip2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4093 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 04:14:50 +00:00
hanna
691333f75c
Force isRequired() to be false for @Deprecated args.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4092 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:50:30 +00:00
hanna
5d6a6420a9
New behavior for filling in output streams: if required==true for a field and the field is an output stream, we'll automatically create it and point it to stdout. Otherwise, we'll leave it empty.
...
I think about it like this: marking a field 'required' indicates to the GATK that the walker author requires a value for this field, and if the GATK can provide one without end user intervention, it will. Maybe this is hackish. We'll try it and see.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4091 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:39:13 +00:00
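The defaulting rule from the commit above (5d6a6420a9) reduces to a small decision, sketched here. OutputDefaultSketch and resolve are hypothetical names, and the real logic lives in the GATK argument system, which is not shown in this log.

```java
import java.io.PrintStream;

// Hypothetical helper for illustration; not the GATK argument system.
class OutputDefaultSketch {
    /**
     * Picks the stream for an output field: the user's stream when one was given,
     * stdout when the field is required but unset, and null (left empty) otherwise.
     */
    static PrintStream resolve(boolean required, PrintStream userSupplied) {
        if (userSupplied != null) {
            return userSupplied;
        }
        return required ? System.out : null;
    }
}
```

A walker that marks its output field as required therefore always receives a usable stream, while an optional field stays empty unless the user asks for it.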
ebanks
90aef66ec5
Minor fixes for my last commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:25:29 +00:00
ebanks
ef795825fd
Yet more argument consistency updates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4089 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 20:52:30 +00:00
aaron
7474afa7a3
allow other objects access to the static method that resolves bam lists, and some renaming and improved documentation for the function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4087 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:52:00 +00:00
ebanks
ccda4f6ec1
More output consistency changes (updating wiki docs as I go along).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:46:08 +00:00
ebanks
c9c6ff49c2
Deprecated 'O' in favor of 'o' in the cleaner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4085 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:09:24 +00:00
ebanks
55a8306a0d
Update the @RMD tags to look for VariantContext.class instead of ReferenceOrderedDatum.class. Since the test for rod type is broken this won't affect anything right now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4084 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:49:37 +00:00
aaron
35b9883dd6
vcfwriter is in tribble now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4083 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:01:04 +00:00
aaron
2d3b6d89dc
adding the ability in Tribble to create indexes from a stream of features, so that we can create multiple indexes from one pass of the file.
...
In the GATK we now create multiple indexes, and choose the most appropriate based on feature density and the longest feature in the file. Also:
- Converted Tribble to TestNG; it has better features and is about 6x faster.
- As much code clean-up as I could get done. More to do, especially in the example code.
- Moved asserts in the code to throw exceptions.
- Added getBinSize to the index interface; both indexes already implemented this.
- Removed the abstract parts of the indexCreator interface; this is now more simple.
- Added an IndexType enumeration; might be overkill but it is at least a single point of entry for index information.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4082 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 06:54:59 +00:00
kiran
295472bf69
Simple change to handle a no-call (must avoid asking for the second allele, which will be null in this case). Also, added a hack to deal with input VCFs where there are no genotype likelihoods (needed in order to process Hapmap and 1KG VCFs). In this mode, called genotypes are assigned a likelihood of 0.96, and alternative genotypes are given 0.02 each. I know Beagle actually takes genotype data without likelihoods, so this might not be the right way to do this.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4081 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:13:09 +00:00
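The fallback in the commit above (295472bf69) can be sketched as below. The 0.96 and 0.02 values come from the message; the class name, the method name, and the three-genotype ordering (hom-ref, het, hom-var at a biallelic site) are assumptions made for illustration.

```java
// Hypothetical helper for illustration; not the walker's actual code.
class DefaultLikelihoodSketch {
    static final double CALLED = 0.96; // from the commit message
    static final double OTHER = 0.02;  // from the commit message

    /**
     * Returns likelihoods for the three genotypes of a biallelic site.
     * Assumed index order: 0 = hom-ref, 1 = het, 2 = hom-var.
     */
    static double[] likelihoodsFor(int calledGenotypeIndex) {
        if (calledGenotypeIndex < 0 || calledGenotypeIndex > 2) {
            throw new IllegalArgumentException("genotype index must be 0, 1 or 2");
        }
        double[] likelihoods = {OTHER, OTHER, OTHER};
        likelihoods[calledGenotypeIndex] = CALLED;
        return likelihoods;
    }
}
```

With three genotypes the values sum to 0.96 + 0.02 + 0.02 = 1.0, which is why the numbers in the message fit the biallelic case.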
kiran
dec713a184
Simple test code from Steve Schaffner to compute R^2 and D'. This is just for educational purposes. Don't use this code for anything, ever!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4080 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:06:16 +00:00
hanna
c177801d81
Add deprecated command-line arguments, and switched over UG to output to -o/--out instead of -varout.
...
Let's watch as our intrepid support engineer gracefully responds to all the incoming questions of the form: "the GATK told me to use -o instead of -varout. What do I do?"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4078 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 21:01:44 +00:00
hanna
b80cf7d1d9
Modifications to the output system for better interaction with @Output. Multiplexed arguments. More details in the Monday meeting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4077 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 14:27:05 +00:00
ebanks
30a104228a
Don't require entropy reduction when cleaning only at known sites; instead we need to trust the known indels. This will improve consistency between lane-level and aggregated cleaning.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4076 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 02:44:38 +00:00
depristo
b6989289fc
Potential bug fix for bad references where some codons may have Ns
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4075 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 12:09:33 +00:00
kiran
121b4f23b6
Simple change to allow a list of samples or regular expressions to be provided in a text file (one line per sample).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4074 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 00:01:48 +00:00
ebanks
165dc6d3b0
Ryan, what did you decide about supporting this tool? Is it still useful?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4073 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 19:16:14 +00:00
ebanks
2ef2f1b24a
Fix UG's simple indel calculation model so that deletions are created correctly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4072 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 15:35:47 +00:00
fromer
1c4784999a
Updated to work exclusively in log10 space
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4069 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 21:31:07 +00:00
fromer
3af4e618cc
Fixed precision issues with PQ (phasing quality)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4068 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:34:47 +00:00
kshakir
88ca1fb22c
Lazy loading reflections so Queue can hack the classpath before the PluginManager looks for classes.
...
Removed extra quotes from 'cd' pre-exec command.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4067 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:29:52 +00:00
aaron
63ada20da5
allow RefSeq files to optionally contain the header line, which is the default output from the UCSC table browser
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4065 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:25:37 +00:00
fromer
effeedf1a3
Updated Bayesian phasing method to output per-site phasing statistics (and to not cap PQ at 40)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4064 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:55:47 +00:00
aaron
04e5b28f6d
updates for VCF; we can no longer cache genotypes or alleles in a static array, this is bad for shared-memory parallel runs. One instance per codec was better for performance than using ThreadLocal code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4063 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:34:44 +00:00
corin
8054b6b295
Changing a name of a column for variantevals output for easier reading by R--let me know if this needs to be updated elsewhere; it's just a space to an underscore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4062 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:18:16 +00:00
ebanks
4b94f8c21b
Silly me, I forgot to check for the contig boundaries. Thank goodness for performance tests!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4061 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 18:40:26 +00:00
aaron
f16bb1e830
fix for a bug in package utils.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4060 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 15:01:50 +00:00
fromer
15c5aa6e48
Efficient iteration over all possible combinations of variable assignments, for variables of arbitrary cardinalities
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4059 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 14:14:37 +00:00
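One common way to do what the commit above (15c5aa6e48) describes is an odometer-style iterator, sketched here. This is not the committed class: AssignmentIteratorSketch is a hypothetical name, and the log does not say which technique the author used. Each variable i takes values 0 to cardinalities[i] - 1, and the rightmost variable changes fastest.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical odometer-style iterator for illustration; not the committed class.
class AssignmentIteratorSketch implements Iterator<int[]> {
    private final int[] cardinalities;
    private final int[] current;
    private boolean hasNext;

    AssignmentIteratorSketch(int[] cardinalities) {
        this.cardinalities = cardinalities.clone();
        this.current = new int[cardinalities.length];
        // A variable with no possible values means there are no assignments at all.
        boolean nonEmpty = true;
        for (int c : cardinalities) {
            if (c <= 0) {
                nonEmpty = false;
            }
        }
        this.hasNext = nonEmpty;
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public int[] next() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        int[] result = current.clone();
        // Advance like an odometer: bump the last digit, carrying leftwards on overflow.
        int i = current.length - 1;
        while (i >= 0) {
            current[i]++;
            if (current[i] < cardinalities[i]) {
                break;
            }
            current[i] = 0;
            i--;
        }
        if (i < 0) {
            hasNext = false; // the carry ran off the left end: every assignment was visited
        }
        return result;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        Iterator<int[]> it = new AssignmentIteratorSketch(new int[] {2, 3});
        while (it.hasNext()) {
            System.out.println(java.util.Arrays.toString(it.next()));
        }
    }
}
```

The iterator holds only one assignment at a time, so memory use stays proportional to the number of variables even though the number of combinations is the product of the cardinalities.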
ebanks
1ec305cd15
Fix for running the cleaner at the lane-level for known indels only: instead of relying on the reads to get the reference sequence, we now use an IndexedFastaSequenceFile in all cases and pad the reference with bases on either end. This allows us to deal with cases in which we are trying to clean just a single deletion-containing read with tiny LOD (so the read needs to be pushed off the seen reference; @Reference doesn't yet work for Read Walkers) and has the added benefit of allowing us now to get much larger known indels that aren't completely covered with reads.
...
Thanks to Matt for the advice.
Also, for Guillermo: while I was at it, I changed the .stats debug output to emit the original interval instead of the cleaned region.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4058 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 11:31:13 +00:00
ebanks
98f7679619
Fixed the bug reported on GS regarding a clipped read that got moved several hundred bases away. The code that got triggered here was written back in the original version of the cleaner and it never actually did the right thing.
...
While I was fixing it, I noticed that we weren't allowing the cleaner to un-clean reads with indels when they're wrong even though we should. Hypothetically, that should rarely happen: only when we can left-align out an indel or when the original mapper really went haywire. This situation is rare enough that I'm calling logger.info to let the user know it's happening and suggesting that they double-check that everything looks right with their reads. Better to be extra-cautious now that the cleaner is moving into the 1kg and Broad production pipelines soon.
Mark, have no fear: this was truly a rare edge case - one that won't affect the cleaning stats. There is no need to re-clean the data processing paper bams!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4057 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 01:42:48 +00:00
aaron
3dc4d3c3a9
removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also
...
removed the auto-deletion of the reflections jar, and removed the very old OmniPlan document we had checked-in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4056 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 00:42:37 +00:00
fromer
1336ea17a3
Quality-score-based Bayesian phasing algorithm implemented
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4055 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-18 21:17:46 +00:00
fromer
553bda4e0e
PreciseNonNegativeDouble permits precise arithmetic operations on NON-NEGATIVE double values
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4054 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-18 21:10:58 +00:00
rpoplin
8f15b2ba72
Memory optimization for the VariantRecalibrator. Only add variants to the list if they pass the novelty and qual filters.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4051 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 21:57:28 +00:00
kshakir
b7c60b9729
Queue now uses its own version instead of the gatk version.
...
Added a Queue release directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4050 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 19:34:23 +00:00
aaron
c1df293feb
remove testing code from tribble track builder, set the command line program in walker test to null to reclaim memory in integration tests, and removed some orphaned integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4046 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 23:52:01 +00:00
rpoplin
578e7fa36d
Don't output -0 as qual value in VariantRecalibrator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4044 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 16:47:58 +00:00
kiran
3d63302b70
Deprecated. Use SelectVariants instead.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4043 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 15:07:50 +00:00
depristo
20db00a3e8
Lazy reference loading; the engine doesn't fetch the reference bases until you actually call ref.getBases(). With the new hidden --dontUpdateUG to table recalibrator this is 2-3x faster than before. Enabled for locus, read, and rod walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4042 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:46:22 +00:00
aaron
9ab647b730
adding checks to the RefSeq rod for lines that contain fewer than the required number of columns (we expect there to be 16 columns)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4041 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:34:32 +00:00
aaron
b23545fafa
re-enable the check for up-to-date versions in the Tribble index.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4039 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 12:47:58 +00:00
ebanks
37586d3a43
Don't exception out when bad aligners emit wonky alignments; instead, just don't clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4038 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 02:36:04 +00:00
depristo
a36951f11a
@output and @input arguments for table recalibration for use with Q
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4037 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 18:36:28 +00:00
depristo
61064d7075
GenotypeConcordance log file -- if provided, GC module will write FN/FP information to this file by context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4036 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 18:35:57 +00:00
depristo
0d209d5442
Nicer printing out of clustering
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4035 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 16:02:13 +00:00
kshakir
307c8ca027
Created a new playground script for cleaning bams in Firehose.
...
Some refactoring of Queue extensions for reusability in scripts.
Putting the extensions into the Queue.jar after building them.
More updates to GATK walker arguments specifying @Input and @Output for Queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4032 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 23:52:24 +00:00
fromer
dfe2922b5e
First working version of statistical haplotype phaser
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4031 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 21:29:45 +00:00
ebanks
f36c0ed613
Stop building obsolete VCFTools and CGUtilities
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4030 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 19:28:36 +00:00
rpoplin
222f61df87
Bug fix for damoskow in TableRecalibration. Shouldn't try to update the reference mismatch rate tag for an unmapped read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4028 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 18:57:07 +00:00
kshakir
80a70ccf03
Repopulating rodsToSamples. Code reviewed by Eric.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4027 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 17:07:18 +00:00
hanna
cb144734c0
Getting rid of GenotypeWriter interface. Of note:
...
- GATKVCFWriter deleted, to be replaced if absolutely necessary when VCF writing goes into Tribble.
- VCFWriter is now an interface, for easier redirection.
- VCFWriterImpl fleshes out the VCFWriter interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4026 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 16:33:22 +00:00
kshakir
542d394e09
Cleaning up Queue debugging output.
...
-l DEBUG with local programs now prints out the stdout/stderr of the programs as they are run.
More documentation in the examples with a new even simpler CountReads example.
Took out unused option to build Queue GATK extensions separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4025 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:54:08 +00:00
chartl
49a3db9dfe
A brief implementation of a QD calculation that is not quite so bimodal for known variants (multiplicatively penalizes QD by (n variant samples)/(n variant alleles) ). Not sure how helpful this will be (which is why it is in oneoffs). Seems nice on MCKD1, but I'm still playing with the optimization.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4024 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:42:37 +00:00
chartl
c6a8fba922
Occasionally if a JEXL expression results in no variants being captured (like "QD > 20.0" on filtered variants) the per-sample mapping from samples to eval objects can be empty. This semi-hacky fix prevents null pointer exceptions in setting up the resulting empty table (by jumping straight to it in this case)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4023 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:37:45 +00:00
ebanks
f874e548aa
Shame on us. FlagStat used ints instead of longs, so we ended up getting negative read counts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4022 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 03:00:57 +00:00
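The failure mode named in the f874e548aa message can be illustrated as follows. This is a sketch only: ReadCounter and its methods are hypothetical names, not the FlagStat code. A Java int wraps to a negative value once it passes Integer.MAX_VALUE (2^31 - 1), while a long keeps counting.

```java
// Hypothetical sketch of why an int read counter can go negative.
final class ReadCounter {
    /** Increments an int counter; silently wraps past Integer.MAX_VALUE. */
    static int countWithInt(int start, int increments) {
        int n = start;
        for (int i = 0; i < increments; i++) {
            n++; // overflow is not an error in Java; the value wraps around
        }
        return n;
    }

    /** Increments a long counter; safe for any realistic number of reads. */
    static long countWithLong(long start, int increments) {
        long n = start;
        for (int i = 0; i < increments; i++) {
            n++;
        }
        return n;
    }
}
```

Starting two below the limit and adding three reads, the int version ends up at Integer.MIN_VALUE + 1, whereas the long version reports 2147483649.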
kshakir
f39dce1082
Exposed CommandLineFunction defaults to the Queue.jar command line (see -help).
...
Added ability to skip up-to-date jobs where the outputs are older than the inputs.
Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names.
Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile
Moved Hidden from the GATK to StingUtils.
Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7
Added Queue to javadoc and testing build targets.
Added first Queue unit test.
Another pass at avoiding cycles in the DAG thanks to all function I/O being files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 21:58:26 +00:00
chartl
8c08f47923
1) Make sure that the table size is set correctly in finalize()
...
2) Make sure variants are biallelic before asking for isTransversion()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4016 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:32:22 +00:00
hanna
41d57b7139
Massive cleanup of read filtering.
...
- Eliminate redundancy of filter application.
- Track filter metrics per-shard to facilitate merging.
- Flatten counting iterator hierarchy for easier debugging.
- Rename Reads class to ReadProperties and track it outside of the Sting iterators.
Note: because shards are currently tied so closely to reads and not the merged triplet of <reads,ref,RODs>, the metrics
classes are managed by the SAMDataSource when they should be managed by something more general. For now, we're hacking
the reads data source to manage the metrics; in the future, something more general should manage the metrics classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4015 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:17:11 +00:00
ebanks
7385cce494
Useful tool for calculating the percentage of misaligned reads at homozygous non-ref indel sites
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4013 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:57:44 +00:00
ebanks
cc9e6b4ad9
Moved into Tribble to be with VC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4012 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:14:32 +00:00
aaron
14e492fa80
fix for a problem in readNextRecord() of BFS, where we'd go looking for the next record far into the next contig because (f.getEnd() >= start) was never true once we cycled to a new contig. Added a check for contig identity. Also, removed duplicate HW calculation classes in the GATK and Tribble.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4011 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:01:38 +00:00
flannick
cd4cd6db81
Added option to print out discordant sites in GenotypeConcordance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4006 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:55:19 +00:00
flannick
18fc5c8c3e
Initial implementation of annotator to compute allele balance for each sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4005 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:40:17 +00:00
flannick
1dc373b9d0
Initial implementation of evaluator to compute popgen theta statistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4004 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:36:34 +00:00
aaron
0a8ebcb4f9
moving tests over from the GATK to Tribble, and added a speed-up to the readNextRecord() that Mark suggested. Also removed the contained flag from the queries to Tribble in the GATK.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4003 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 17:54:59 +00:00
ebanks
3ff6e3404e
Alleles are now returned in a consistent order, so we can deal with tri-allelic sites
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4002 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 15:21:10 +00:00
ebanks
ca5b274f16
Unit, integration, and performance tests are all busted, so this is a good time to make a big commit...
...
Major cleanup of the genotype writer code from the calling end. UG no longer supports making calls in anything but VCF, and that allows us to use the VCFWriter more generically now. Putting the ball in Matt's court to finish collapsing everything.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3996 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 04:18:29 +00:00
ebanks
419a36f74c
Starting the clean up of the sting.utils.genotype code which is all either moving to Tribble, moving to sting.utils.vcf, or being removed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3994 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 02:16:05 +00:00
depristo
2a4a4b0aab
VariantRecalibrator now calls plot_Tranches directly so it works on the farm
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3993 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 23:17:16 +00:00
depristo
c2c0c1f57c
Removing unused --enable_overlap_filters argument; Eric assures me this won't break the currently broken tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3992 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 22:27:13 +00:00
aaron
0f29f2ae3f
fixes for the Tree index, and some small clean-up in the GATK.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3991 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:50 +00:00
rpoplin
3eee3183fd
Checking in the tiger team changes. LOD calculation modified. -qScale is back in case people need it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3990 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:03 +00:00
ebanks
0eeb659aa3
Useful utility function to print out the Allele as a String since toString prints out * for refs. It was annoying to keep seeing new String(Allele.getBases()).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3989 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:35:56 +00:00
chartl
d0ecb8875a
Added - a class to count functional annotations by sample (currently for the MAF annotation strings, soon to be migrated to genomic annotator once it is up and running)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3988 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:09:13 +00:00
aaron
5b0b9e79ba
protect against nulls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3987 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 19:21:39 +00:00
depristo
8944800f60
Minor refactoring for Ryan
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3986 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 18:05:23 +00:00
kshakir
4f51a02dea
Changed logging level to default at INFO instead of WARN.
...
Changes to StingUtils command line for use in Queue, replacing Queue's use of property files.
Updates to walkers used in existing QScripts to add @Input/@Output.
RMD used in @Required/@Allows now has a new default equal to "any" type.
New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions.
Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.)
Removed dependency on BroadCore by porting LSF job submitter to scala.
Ivy now pulls down module dependencies from maven.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 16:42:48 +00:00
aaron
30178c05c5
providing a way to specify how you'd like -BTI combined with your -L options; set BTIMR to either UNION (default) or INTERSECTION.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3983 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 14:00:52 +00:00
hanna
6b4a1e3b9f
Reenabling code that was commented out after it was confirmed to work by many participating in this thread:
...
http://getsatisfaction.com/gsa/topics/error_thrown_when_reading_reference_file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3981 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 00:12:09 +00:00
kiran
48e311a5ea
Added copyright notice.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3980 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:11:51 +00:00
kiran
9aa70d9c7c
Replaced by SelectVariants
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3979 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:07:42 +00:00
kiran
758ab428f5
Better logging info for the samples being selected and the sample expressions being ignored.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3978 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:03:37 +00:00
ebanks
637a1e5055
Updating to use the new VA interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3975 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:31:01 +00:00
ebanks
bd6d5a8d51
Adding command-line header to VA and VF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3974 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:21:15 +00:00
kiran
64446f0ddf
Avoid NaNs in the final output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3973 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:16:52 +00:00
ebanks
3f6e44dc71
Updated recalibrator and cleaner to output full command-lines in the bam header
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3972 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:39:18 +00:00
kiran
0da0dfa1da
Cosmetic change - lower-case for all command-line arguments' short names.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3971 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:12:01 +00:00
kiran
eb1bb94d1c
Moved the evaluation of the JEXL expressions to a point *after* the samples are subset and the INFO-field annotations are updated. I think this makes more sense than having the evaluations happen beforehand, since it seems jarring to have the JEXL expressions operate on the annotations before they're updated, and have the file contain the annotations after they're updated. Now, selecting on something like allele frequency will actually apply to the annotations that actually end up in the file, while selection on other annotations (which are carried over without modification) will act exactly the same regardless.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3970 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:09:02 +00:00
ebanks
594b7912f1
Added a generic method for returning the complete command-line used when calling a walker, to be used in the bam/vcf headers. As requested, every possible engine/walker argument is included. I've added it to the Unified Genotyper output, so people can try it out and let me know what they think. Something that needs to be discussed in group meeting: what happens when we merge VCFs? Do we keep all of the command-lines?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3969 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 03:53:07 +00:00
kiran
6e389059cf
An improved version of VariantSubset and VariantSelect, meant to replace those walkers. Takes in a VCF and creates a subsetted VCF by sample(s), JEXL expressions, or both.
...
When subsetting by sample, the -SN argument is treated as a literal sample name and, if no match is found, as a regular expression. This allows for a large number of samples to be selected at once (useful when, for instance, cases are given one sample name prefix and controls are given another).
After the subsetting procedure, the INFO-field annotations AC, AN, AF, and DP are all recalculated to properly reflect the new contents of the VCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3968 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 02:57:06 +00:00
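The literal-then-regex rule described for -SN in the 6e389059cf message could be sketched like this. SampleMatcher and select are hypothetical names, and requiring the regular expression to match the whole sample name is an assumption; the commit message does not say whether partial matches count.

```java
// Hypothetical sketch: treat the expression as a literal sample name first,
// and fall back to a regular expression only when no literal match exists.
final class SampleMatcher {
    static java.util.List<String> select(java.util.List<String> samples, String expr) {
        java.util.List<String> selected = new java.util.ArrayList<String>();
        if (samples.contains(expr)) {
            selected.add(expr); // exact name wins; no regex evaluation
            return selected;
        }
        // Assumption: the pattern must match the full sample name.
        java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(expr);
        for (String sample : samples) {
            if (pattern.matcher(sample).matches()) {
                selected.add(sample);
            }
        }
        return selected;
    }
}
```

With samples case_1, case_2, and ctrl_1, the expression case_1 selects only that sample, while case_.* selects both cases, which is the prefix-style selection the message mentions.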
ebanks
ac4699a650
Re-enabling this test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3962 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:20:37 +00:00
depristo
f275041b1c
-minimalVCF for CombineVariants. Work around for broken locking code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3960 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 16:10:59 +00:00
ebanks
341e752c6c
1) AlleleBalance is no longer a standard annotation, but the Allelic Depth (AD) is for each sample.
...
2) Small fixes in the VCFWriter:
a) Trailing missing values weren't being removed if their count was > 1 (e.g. ".,.")
b) We were handling key values that were Lists, but not Arrays. We now handle both.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3956 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 12:05:14 +00:00
aaron
c68625f055
Fixes from Mark for the MutableContexts; this fixes the clearGenotypes() and the clearFilters() methods, and adds a method to clear the attributes. Also added is a method for creating a variant context where the attribute list is pruned to a specific subset, which can be null.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3955 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 22:39:51 +00:00
aaron
72ae81c6de
VariantContext has now moved over to Tribble, and the VCF4 parser is now the only VCF parser in town. Other changes include:
...
- Tribble is included directly in the GATK repo; those who have access to commit to Tribble can now directly commit from the GATK directory from Intellij; command line users can commit from
inside the tribble directory.
- Hapmap ROD now in Tribble; all mentions have been switched over.
- VariantContext does not know about GenomeLoc; use VariantContextUtils.getLocation(VariantContext vc) to get a genome loc.
- VariantContext.getSNPSubstitutionType is now in VariantContextUtils.
- This does not include the checked-in project files for Intellij; still running into issues with changes to the iml files being marked as changes by SVN
I'll send out an email to GSAMembers with some more details.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3954 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 18:47:53 +00:00
fromer
b21f90aee0
Added preliminary framework for performing short-range phasing (ReadBackedPhasingWalker.java)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3953 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 14:56:34 +00:00
rpoplin
a8d37da10b
Checking in everyone's changes to the variant recalibrator. We now calculate the variant quality score as a LOD score between the true and false hypotheses. Allele Count prior is changed to be (1 - 0.5^ac). Known prior breaks out HapMap sites
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3952 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 14:12:19 +00:00
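As a worked illustration of the allele count prior quoted in the a8d37da10b message, (1 - 0.5^ac), here is a small sketch. AlleleCountPrior is a hypothetical name, and rejecting negative counts is an assumption not stated in the message.

```java
// Hypothetical sketch of the prior 1 - 0.5^ac from the commit message.
final class AlleleCountPrior {
    static double prior(int alleleCount) {
        if (alleleCount < 0) {
            throw new IllegalArgumentException("allele count must be non-negative");
        }
        // The prior rises toward 1 as the allele is seen in more chromosomes.
        return 1.0 - Math.pow(0.5, alleleCount);
    }
}
```

Under this formula a singleton (ac = 1) gets a prior of 0.5 and a doubleton (ac = 2) gets 0.75.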
ebanks
07addf1187
Fix for Kiran: since the Variant Annotator will re-annotate on top of existing annotations it makes sense to remove old headers if they conflict with the definitions being added by VA.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3951 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 06:44:39 +00:00
ebanks
1539791a04
Fix for Kiran: when using VCFs for the comp tracks in the Annotator(s), don't put the headers from them into the output VCF.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3950 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 04:45:47 +00:00
ebanks
227c4b10f0
Bug fix for Chris: convert comp tracks to VC so that we can respect the filter field. Added an integration test to cover this.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3949 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 04:13:16 +00:00
ebanks
84ca2f27bb
Bug fix for Chris: added method createPotentiallyInvalidGenomeLoc() to the GenomeLocParser that doesn't check that the contig exists in the sequence dictionary. This is crucial for lifting over from one reference to another, as sometimes contigs names change in the liftover (e.g. chrM to MT).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3948 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 03:19:02 +00:00
ebanks
f247cbf68e
I want to be the first to use the new super-cool Hidden annotation! No more telling people not to use the cleaner debugging options.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3947 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 02:44:37 +00:00
hanna
78bfe6ac48
Added @Hidden annotation, a way to deliberately exclude experimental fields and
...
walkers from the help system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3946 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 02:26:46 +00:00
chartl
82d6c5073b
A simple read strand filter for potluri on get satisfaction
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3945 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 23:23:50 +00:00
asivache
d53d5ffbf6
A utility class that computes the running average and standard deviation for a stream of numbers fed to it. Updates mean/stddev on the fly and does not cache the observations, so it uses constant memory and also should be stable against overflow/loss of precision. Simple unit test is also provided (does *not* stress-test the engine with millions of numbers though).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3944 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 21:39:02 +00:00
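One common way to get the behaviour the d53d5ffbf6 message describes is Welford's online algorithm. The sketch below assumes that technique; the commit message does not say which update rule the utility class uses, and RunningStats is a hypothetical name. Returning the sample (n - 1) standard deviation is also an assumption.

```java
// Hypothetical sketch: running mean and standard deviation via Welford's
// online update, keeping only three numbers regardless of stream length.
final class RunningStats {
    private long n;      // number of observations seen so far
    private double mean; // running mean
    private double m2;   // running sum of squared deviations from the mean

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // uses the updated mean, as Welford's rule requires
    }

    double mean() {
        return mean;
    }

    /** Sample standard deviation; 0.0 when fewer than two observations were added. */
    double stddev() {
        return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
    }
}
```

Because no sum of squares of the raw values is accumulated, the update avoids the large intermediate values that make the naive formula lose precision.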
ebanks
8d8acc9fae
Moving G's MyHapScore to replace the old HapScore
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3943 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 21:00:54 +00:00
ebanks
7858ffec32
Spit out the error in the warning message so that Sendu can tell me what his problem is
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3942 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 20:40:28 +00:00
delangel
86211b74e8
Bug fix: when padding alleles in creating a Variant context from an indel, leave no-call alleles as no-call alleles.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3940 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:51:10 +00:00
chartl
38e65f6e1b
Added: A VariantEval module that gives simple metrics by sample, and an abstract class that makes per-sample modules easy to write (but a little bit clunky since a class needs to be defined for each data point -- see SimpleMetricsBySample as an example). AnalysisModuleScanner needed a slight update to pull in data points from parent classes for this to work (thanks Khalid for showing me how to do this). After a code review with Aaron (thanks) and ensuring integration tests pass, I am committing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3939 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:37:39 +00:00
hanna
f13d52e427
Attempt to determine whether underlying filesystem supports file locking and
...
disable on-the-fly dict and fai generation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3938 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:28:27 +00:00
asivache
a47824d680
A couple of type-specific implementations of a single extend() method: takes an array (byte[] or short[] currently) and "extends" it to the left or to the right by the specified number of elements. Returns newly allocated array, with the content of original array copied in (if we extend by n elements to the left, then the returned array will have n default-filled elements *followed* by the content of the old array).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3932 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:30:48 +00:00
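The extend() behaviour described in the a47824d680 message can be sketched for byte[] as below. ArrayExtender, extendLeft, and extendRight are hypothetical names; the real method is a single extend() whose signature is not shown in the commit message.

```java
// Hypothetical sketch: return a newly allocated array that is n elements
// longer, with the original content copied in and the rest default-filled.
final class ArrayExtender {
    /** n default (zero) elements followed by the content of the old array. */
    static byte[] extendLeft(byte[] original, int n) {
        byte[] extended = new byte[original.length + n]; // new slots start at 0
        System.arraycopy(original, 0, extended, n, original.length);
        return extended;
    }

    /** The content of the old array followed by n default (zero) elements. */
    static byte[] extendRight(byte[] original, int n) {
        byte[] extended = new byte[original.length + n];
        System.arraycopy(original, 0, extended, 0, original.length);
        return extended;
    }
}
```

The input array is never modified, which matches the "returns newly allocated array" wording in the message.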
asivache
012a7cf0a5
mismatchCount now has a version that counts mismatches only along a part of the read (takes additional args start_on_read and length_on_read to specify the read's subsequence to be interrogated);
...
isMateUnmapped() convenience shortcut method added.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3931 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:27:35 +00:00
delangel
e6e8a20a1e
1) Fix MyHaplotypeScore to ignore 454 reads, since all those pathological non-existing indels make some sites' score blow up. If a site is only covered by 454 reads, we (hopefully) detect this gracefully and just emit a score of 0.0 for the site.
...
2) New annotation SByDepth = log10(-StrandBias/Depth) (non-standard annotation, key name = "SBD"). If StrandBias/Depth happens to be positive (very rare but can happen), annotation gets value=-1000.
3) Abstracted out new class AnnotationByDepth so that QD and SBD can share code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3930 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:23:08 +00:00
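The SByDepth rule in the e6e8a20a1e message, log10(-StrandBias/Depth) with a value of -1000 when the ratio is positive, could be written as the sketch below. SByDepthSketch and its parameter types are hypothetical; the message does not say how a ratio of exactly zero or a depth of zero is handled, so this sketch rejects non-positive depth and leaves the zero-ratio case as log10(0), which is negative infinity in Java.

```java
// Hypothetical sketch of the SBD annotation value described in the commit.
final class SByDepthSketch {
    static final double POSITIVE_RATIO_VALUE = -1000.0; // value quoted in the message

    static double sbd(double strandBias, int depth) {
        if (depth <= 0) {
            // Assumption: the commit message does not cover zero depth.
            throw new IllegalArgumentException("depth must be positive");
        }
        double ratio = strandBias / depth;
        if (ratio > 0) {
            return POSITIVE_RATIO_VALUE; // the log of a negative number is undefined
        }
        return Math.log10(-ratio);
    }
}
```

For example, a strand bias of -10 at depth 100 gives log10(0.1), which is approximately -1.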
ebanks
bf60ed0b25
Needed it here too: warn user instead of dying if the R script cannot be executed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3929 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 13:11:27 +00:00
ebanks
40ffe34686
Warn user instead of dying if the R script cannot be executed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3928 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 13:08:15 +00:00
ebanks
17d5e89734
Now --list annotates which modules are Standard
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3927 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 21:00:37 +00:00
ebanks
72875cf717
Removing annoying printouts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3926 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 19:55:00 +00:00
ebanks
2307bed742
VariantEval now uses the "standard" modules only by default. You can add other modules with the -E argument and not use all of the standard ones with -noStandard (they can be added back individually with -E).
...
Generalized some of the packaging code from VariantAnnotator. Matt might want to take a look to make this nicer...?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3925 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 16:51:10 +00:00
ebanks
a7ff9caf54
Added sanity check against bad people and/or crazy big indels at edges of ref context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3918 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 05:37:17 +00:00
hanna
5f1b67c1de
Copping out and forcing the entire GATK (and associated JVM) to use US English
...
locale. Method to force JVM into proper locale exists in CommandLineProgram
and is disabled by default, but implementers of CommandLineProgram can opt in
to the forced US locale by calling a static method.
Question for the VCF developers: I removed the code to explicitly output doubles
in US locale. Do you / how do you want to handle this in applications that use
Tribble outside the GATK?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3917 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 03:48:26 +00:00
chartl
2bc69572cb
Make transcript2info capable of handling b37/hg19 contigs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3915 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-02 17:32:08 +00:00
depristo
c203e0fb02
Added JEXL support for hetCount, homRefCount, and homVarCount in VCs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3914 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-02 12:24:11 +00:00
depristo
7fab5c0a8f
support for -singleton_fp_rate arguments to variant recalibrator instead of the pop.gen. AF prior. Worth experimenting with Ryan.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3913 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-31 21:17:47 +00:00
ebanks
6d91cd587e
Be explicitly clear about which options are for debugging purposes only and shouldn't be used if your username is not ebanks@broad. If only we had a @hidden annotation option for args...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3909 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:18:31 +00:00
depristo
ac8048f17b
Support for automated selects for tranches in variant eval -- use -tf to make tranche-specific ve outputs. ApplyVariantCuts with tranche reading functions for general use, along with todo for ryan. CombineVariants now has --filteredAreUncalled and will treat filtered snps in input VCFs as uncalled, and so won't emit -filteredInOther set features
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3908 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:16:43 +00:00
chartl
9231d13252
Minor modification: adding an argument to make slightly more general.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3907 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 05:20:20 +00:00
chartl
db54d63fc7
Hahaha yes, ownage. This now works.
...
BTW, Eric, thanks for forwarding the DepthOfCoverage thread to gsamembers. I'd forgotten about reduce by interval. Mighty helpful in this case!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3906 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 04:23:02 +00:00
chartl
3e3f8c7692
Simple count intervals walker, as per my recent email to GSAMembers. Never use this. It doesn't behave the way you think it does.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3905 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 03:39:23 +00:00
delangel
ba1a330293
Corrected location and made more explicit the error message thrown if someone tries to read a VCF 3.3 file with indels, which is not supported.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3901 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 20:02:47 +00:00
delangel
e1a34685fd
Add back MyHaplotypeScore as a new implementation for HaplotypeScore, this time as a non-standard annotation. Implementation is also better: it computes better consensus haplotypes and ranks them by sum of quality score.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3890 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 21:23:19 +00:00
hanna
6c93b13428
A Java sizeof, implemented using the Java instrumentation API. Can get the memory consumed either only by a single
...
object or by a single object and all the references it contains. Requires a command-line change to add a Java agent to
the command-line; see the Sizeof.java javadoc for details.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3889 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 18:44:15 +00:00
rpoplin
f5566a6593
Knocking out some quick findBugs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3887 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 14:10:59 +00:00