depristo
8683087756
Suppl. tools for working with and displaying GATK run reports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4176 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:32:22 +00:00
depristo
32c6b48106
Proper memory metrics in the file. Please use -et if at all possible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4175 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:30:09 +00:00
chartl
63c7cbd89b
Forgot to commit this long ago, change so the tables are correctly propagated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4174 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 19:06:52 +00:00
aaron
db4ff7317f
allowing empty RMD files (we need to not validate their sequence dictionaries against the reference in this case)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4173 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 17:45:33 +00:00
ebanks
3d6c4fc55f
Removing the obsolete --hapmap and --hapmap_chip options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4172 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:57:05 +00:00
depristo
b33873206a
GATKRunReport now has an ID (random 32 char string) that uniquely identifies the JOB run and can be used to find a run in the run repository
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4171 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:18:57 +00:00
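A minimal sketch of the kind of run ID described in the GATKRunReport commit above (b33873206a). The commit says only "random 32 char string"; the alphanumeric alphabet and the names `ID_ALPHABET` and `new_run_id` are assumptions made for illustration, not the GATK implementation.

```python
import secrets
import string

# Assumed alphabet: the commit does not say which characters are used.
ID_ALPHABET = string.ascii_letters + string.digits


def new_run_id(length=32):
    """Return a random identifier of the given length (hypothetical helper)."""
    return "".join(secrets.choice(ID_ALPHABET) for _ in range(length))
```

With 62 possible characters in each of 32 positions, collisions between job runs are vanishingly unlikely, which is what lets the ID act as a lookup key in a run repository.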
chartl
5e710050d6
minor change, bamFiles comes from the input list, not the script
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4170 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:03:35 +00:00
chartl
1a14dbee1e
Adding in .bam indexing; commit for Khalid
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4169 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 15:21:41 +00:00
ebanks
3c956110f3
Fixing up the VCFWriter storage code: instead of assuming all samples are coming from the input bam file (they're not), just use the original VCF header for writing the temporary thread files. Now parallelization in e.g. the Genomic Annotator works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4168 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 02:16:07 +00:00
aaron
69d92fab4f
adding the ability to get iterators from Tribble without having an index, and updating the Tabix code to the latest Samtools SVN version (this still doesn't fix the outstanding tabix bugs, waiting for Heng on that).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4167 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:49:23 +00:00
fromer
50f7f18cbd
Changed ReadBackedPhasing default PQ threshold to 10
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4166 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:26:15 +00:00
chartl
e64d1be475
Check if VC is null before trying to subset it (can happen with indels)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4165 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 20:43:37 +00:00
kiran
e14a347e2e
Now prints cluster report to a single PDF, rather than a dozen different PDFs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4164 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 18:58:39 +00:00
depristo
1ddb5d17c9
hostname now fully qualified and working
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4163 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 17:04:37 +00:00
depristo
9556004dbb
now supports -o option as well as verbose output mode
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4162 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 16:00:00 +00:00
depristo
4c28fc3a39
Clear documentation for GATKRunReport
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4161 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 15:59:25 +00:00
kiran
16b75e3b9a
A new version of the ErrorRateByReadPosition walker, using the GATKReport functionality to store and emit its output. This version of the walker has roughly half as many lines as the previous version, owing simply to the removal of all of the output formatting that's now handled by GATKReport.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4160 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:41:13 +00:00
kiran
fd19c63aaf
A data structure that allows data to be collected over the course of a walker's computation, then have that data written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (given that you load it using the GATKReport loader module).
...
This object is designed to be both the structure that holds data during the execution of the walker and the object that properly formats and emits the data so that it can be easily loaded into R. In the end, you get a table that looks like this:
##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle errorrate.61PA8.7 qualavg.61PA8.7
0 0.007451835696110506 25.474613284804366
1 0.002362777171937477 29.844949954504095
2 9.087604507451836E-4 32.87590975254731
3 5.452562704471102E-4 34.498999090081895
4 9.087604507451836E-4 35.14831665150137
5 5.452562704471102E-4 36.07223435225619
6 5.452562704471102E-4 36.1217248908297
7 5.452562704471102E-4 36.1910480349345
8 5.452562704471102E-4 36.00345705967977
...
A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession. Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone. This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect.
The display property of individual columns can be turned off. This is useful when a column is used to store intermediate results, necessary for the computation of some later value, but the contents of the intermediate column itself are not required in the final output file.
Finally, the GATKReportTable allows for some simple, mathematical, element-wise and column-wise operations. For instance, two whole columns can be divided, the results of the operation being stored in a third column. This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied or divided without requiring the developer to explicitly write a loop.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:39:24 +00:00
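The ideas in the GATKReport commit above (fd19c63aaf) can be sketched as a toy table in Python: data is collected per row, columns can be hidden, one column can be computed as the element-wise quotient of two others, and the table is written with its own `##:GATKReport.v0.1` header. This is an illustration only, not the GATK Java API: the class `ReportTable`, its method names, the single-space separator, and the sample counts in `demo()` are all assumptions.

```python
import io


class ReportTable:
    """Toy table modeled on the commit's description (hypothetical API)."""

    def __init__(self, name, description, key_name):
        self.name = name
        self.description = description
        self.key_name = key_name
        self.columns = {}   # column name -> {row key -> value}
        self.display = {}   # column name -> whether write() emits it
        self.row_keys = []  # row keys in first-seen order

    def add_column(self, name, display=True):
        self.columns[name] = {}
        self.display[name] = display

    def set(self, key, column, value):
        if key not in self.row_keys:
            self.row_keys.append(key)
        self.columns[column][key] = value

    def divide_columns(self, target, numerator, denominator):
        """Store numerator / denominator, row by row, in the target column."""
        if target not in self.columns:
            self.add_column(target)
        for key in self.row_keys:
            num = self.columns[numerator][key]
            den = self.columns[denominator][key]
            self.columns[target][key] = num / den if den else float("nan")

    def write(self, out):
        """Emit the table header, the column names, then one line per row."""
        out.write(f"##:GATKReport.v0.1 {self.name} : {self.description}\n")
        shown = [c for c in self.columns if self.display[c]]
        out.write(" ".join([self.key_name] + shown) + "\n")
        for key in self.row_keys:
            cells = [str(self.columns[c][key]) for c in shown]
            out.write(" ".join([str(key)] + cells) + "\n")


def demo():
    """Build a two-row table from made-up counts and return its text."""
    table = ReportTable(
        "ErrorRatePerCycle",
        "The error rate per sequenced position in the reads",
        "cycle",
    )
    table.add_column("mismatches", display=False)  # intermediate, hidden
    table.add_column("bases", display=False)       # intermediate, hidden
    table.add_column("errorrate")
    for cycle, (mismatches, bases) in enumerate([(2, 100), (1, 200)]):
        table.set(cycle, "mismatches", mismatches)
        table.set(cycle, "bases", bases)
    table.divide_columns("errorrate", "mismatches", "bases")
    buffer = io.StringIO()
    table.write(buffer)
    return buffer.getvalue()
```

Calling `demo()` returns a header line, a `cycle errorrate` column line, and two data rows; the hidden `mismatches` and `bases` columns take part in the division but never reach the output.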
ebanks
df76474b34
Proper filtering when indels are being lifted over
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4158 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 04:48:31 +00:00
chartl
2ffa98aea5
Ugh! varout --> out
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4157 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 02:34:41 +00:00
chartl
d7edce31a2
Commit of fCP for Khalid
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4156 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 02:24:25 +00:00
depristo
3fd2392090
Improved interface to getting command line options. Now fully traverses all objects to get all internal argument collections. Preliminary (but disabled) version of phoning home (see -et argument for more information). Captures both successful and failing runs and writes out a gzipped XML report with lots of useful information. Needs a bit more information but is approximately working. Reports going to /humgen/gsa-hpprojects/GATK/reports/ in submitted directory that will be collated by some external tool. Currently only operates if -et STANDARD or -et STDOUT is provided and REPORT_DIR contains a file called ENABLE. WalkerTest now adds -et NO_ET to tests to avoid populating the reports with tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4155 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-28 22:53:32 +00:00
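The reporting behavior described in the commit above (3fd2392090) can be sketched roughly as follows: a report is written as gzipped XML, and only when the report directory contains a file called ENABLE. Everything else here is an assumption for illustration: the function name `write_run_report`, the `run-report` root element, the file naming, and the flat field layout are hypothetical and do not reflect the actual GATK report schema.

```python
import gzip
import os
import xml.etree.ElementTree as ET


def write_run_report(report_dir, run_id, fields):
    """Write fields as gzipped XML into report_dir.

    Returns the path written, or None when reporting is disabled because
    report_dir has no ENABLE file. Element and file names are hypothetical.
    """
    if not os.path.exists(os.path.join(report_dir, "ENABLE")):
        return None
    root = ET.Element("run-report")
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    path = os.path.join(report_dir, run_id + ".report.xml.gz")
    with gzip.open(path, "wb") as handle:
        handle.write(ET.tostring(root, encoding="utf-8"))
    return path
```

Gating on a marker file lets reporting be switched off for every job at once by removing ENABLE, without changing any command lines.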
chartl
576ae30df1
A version of the full calling pipeline queue script that fully compiles without String/File/NamedFile type exceptions (e.g. expected String but got NamedFile/Expected NamedFile but got File). Pipeline itself is under testing with 5 bam files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4154 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-28 22:51:11 +00:00
rpoplin
9c3f403307
Add the calculated lod value to the info field of each recalibrated VCF record.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4153 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 21:33:58 +00:00
delangel
fe19539188
Small bug fix: if a read falls at the edge of an indel event (but is not part of it), don't count it towards consistency computation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4152 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 20:37:27 +00:00
rpoplin
54355b1864
In the variant quality score recalibrator, preserve the definition of known and novel as presence or absence in dbSNP, even when training with 1KG project calls.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4151 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 19:07:59 +00:00
ebanks
7a5f297083
actually modify the vcf when a sample has been down-sampled
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4150 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 19:03:21 +00:00
chartl
c6441b585a
Actually hook up the new indel genotyper and merge analyses into DAG (aka "i forgot to add()")
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4149 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 18:00:50 +00:00
ebanks
9860db64a3
Fix up liftover to enable lifting over indels
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4148 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 17:55:27 +00:00
hanna
fb177c4fee
If only dcov is specified, assume that selected downsample type is BY_SAMPLE.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4147 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 17:35:41 +00:00
ebanks
9584cbc05e
UG now downsamples to 250x by default
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4146 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:53:15 +00:00
ebanks
431392330e
Re-enable the max records in ram argument, which I accidentally removed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4145 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:42:49 +00:00
chartl
7908237b90
Full calling pipeline now calls indels through the indel genotyper, merges with combine variants, and filters on them. Since new genomic annotator is fast, it is no longer scatter-gathered.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4144 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:28:24 +00:00
kshakir
78946c4ffd
Allowing the Queue to run the GATK via -cp instead of only from -jar.
...
Added an example of using a walker with Queue and a custom -classpath.
Removed an unused import statement in NamedFileWrapper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4143 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:25:59 +00:00
hanna
de5ccfb0b1
Moved hasPileupBeenDownsampled() based on Eric's request. Also eliminated
...
@Deprecated constructors from AlignmentContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4142 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:12:05 +00:00
ebanks
427a2f85e9
The Indel Realigner now lets the engine do all of the setup for args affecting the SAM writer. Thanks, Matt!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4141 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:19:47 +00:00
asivache
a3d9d23b0f
Now prints het genotype with GQ=0 for each indel; in two-sample (normal-tumor) mode, prints both genotypes (N and T) as hets for germline events or hom ref for N and het for T for somatic events (all genotypes still have GQ=0)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4140 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:06:42 +00:00
ebanks
dda84a0e54
Re-enabling indels for the Genomic Annotator as per Steve's patch. Steve assures me that he will test this out really well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4139 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:01:25 +00:00
hanna
6f4af47aac
setMaxRecordsInRam now a member of StingSAMFileWriter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4138 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 14:50:41 +00:00
aaron
467405094a
up the test mem. from 2g to 4g; we're currently hitting the 2g in aggregate across some of the larger tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4137 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:39:05 +00:00
ebanks
bfcac33e80
Cleaning up playground utils and tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:25:47 +00:00
ebanks
4979dcc9a7
Finishing up the playground cleanup (for now)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4135 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:19:37 +00:00
ebanks
0452b1ab68
archiving, removing, or promoting to core from playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4134 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:07:42 +00:00
hanna
d773b3264b
Eliminated -mrl option.
...
Eliminated -fmq0 option.
Eliminated read group hallucination.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4133 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 21:38:03 +00:00
kiran
7671502e1b
Changes from James Pirruccello: now can handle differences between UCSC and NCBI tables, properly sorting despite the contig prefix differences (presence or absence of 'chr'), and converts NCBI format to UCSC format for use by the GenomicAnnotator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4132 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 19:02:29 +00:00
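A small sketch of the contig handling described in the commit above (7671502e1b): names are compared without regard to the 'chr' prefix, and NCBI-style names are converted to UCSC style. The function names and the chosen order (numeric contigs, then X, Y, M, then anything else alphabetically) are assumptions; the commit does not state the exact ordering used.

```python
def to_ucsc(contig):
    """Convert an NCBI-style name ('1', 'MT') to UCSC style ('chr1', 'chrM')."""
    if contig.startswith("chr"):
        return contig
    return "chrM" if contig == "MT" else "chr" + contig


def contig_sort_key(contig):
    """Sort key that ignores the 'chr' prefix (assumed order, see lead-in)."""
    name = to_ucsc(contig)[3:]
    if name.isdigit():
        return (0, int(name), "")
    special = {"X": 1, "Y": 2, "M": 3}
    return (1, special.get(name, 4), name)


def sort_contigs(contigs):
    """Return the contigs in a prefix-independent order, names unchanged."""
    return sorted(contigs, key=contig_sort_key)
```

For example, `sort_contigs(["chr10", "2", "MT", "chrX", "1", "Y"])` yields `["1", "2", "chr10", "chrX", "Y", "MT"]`, so mixed UCSC and NCBI tables line up in the same order.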
corin
8931a63588
updated a whole bunch of column names to work like I want them to and added more informative figures for DOC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4131 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 18:19:09 +00:00
depristo
f384d4a5d6
A Java reimplementation of the Python vcf2table; supports getting more useful information about genotypes (HET, e.g.) than was possible in Python.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4130 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 17:50:33 +00:00
asivache
1e193e4c20
printing '\n' at the end of the line leads to some aesthetic advantages
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4129 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 16:29:42 +00:00
asivache
9b3ffa5f64
Now outputs VCF (as standard output associated with -o)! Can also output, in parallel, a lightweight bed and fully annotated .txt (old verbose format) with --bed and --verbose, respectively
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4128 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 16:26:03 +00:00
ebanks
dfae48cee0
Moving supported tools to core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4127 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 13:56:19 +00:00