Matt Hanna
41d70abe4e
At chartl's request, add the bwa aln -N and bwa aln -m parameters to the bindings.
2012-01-17 14:47:53 -05:00
Matt Hanna
32ccde374b
Merged bug fix from Stable into Unstable
2012-01-17 11:08:35 -05:00
Matt Hanna
3ba918aff1
Error message cleanup in BAM indexing code.
2012-01-17 11:05:42 -05:00
Mark DePristo
aa8a885a5b
Generalizing forest.R analysis script
...
-- Support for N tree analyses
-- Testing of NA omit and roughfix options
-- Misc. analyses and refactoring
2012-01-16 09:33:41 -05:00
Mark DePristo
8ddac9a06f
Don't show individual jobs in queueStatus for gsaadm; just show the count
2012-01-16 09:33:05 -05:00
Mark DePristo
61f82f138f
Extract a high-level GATK version from the SVN / GIT full version numbers in analyzeRunReports
...
-- Maps SVN versions such as 1.0.5988 to 0.5, 1.0.6134 to 0.6, etc.
-- Maps GIT versions 1.x-XXX to 1.x
Used in Tableau analyses
2012-01-16 09:30:48 -05:00
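A minimal sketch of the version mapping described above. The "revision // 1000" rule for SVN versions is extrapolated from the two examples in the commit message, the GIT suffix shown in the usage note is made up, and the function name is hypothetical; analyzeRunReports may implement this differently.

```python
import re


def high_level_gatk_version(full_version: str) -> str:
    """Collapse a full GATK version string to a coarse release label.

    Assumed rules (inferred from the commit message examples only):
      - SVN style "1.0.<revision>" -> "0.<revision // 1000>", e.g. 1.0.5988 -> 0.5
      - GIT style "<major>.<minor>-<suffix>" -> "<major>.<minor>"
    """
    svn = re.fullmatch(r"1\.0\.(\d+)", full_version)
    if svn:
        return "0.%d" % (int(svn.group(1)) // 1000)
    git = re.fullmatch(r"(\d+\.\d+)-.+", full_version)
    if git:
        return git.group(1)
    raise ValueError("unrecognized version string: " + full_version)
```

For example, `high_level_gatk_version("1.4-30-gf2ef8d1")` yields `"1.4"`.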
Mauricio Carneiro
8272c8bd26
Added exceptions to CGL walker
...
* Assert that the user provided a VCF, not some other type of ROD
* Assert that the VCF has samples
* Assert that the samples in the BAM exist in the VCF
* Warn the user if not all samples in the BAM are present in the VCF
2012-01-14 14:10:19 -05:00
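The checks listed above can be sketched as a single validation step. This is a hypothetical illustration, not the CGL walker's code: the function name and the string ROD type are assumptions, and the "samples in the BAM exist in the VCF" assertion is read as "at least one BAM sample is in the VCF", since the following bullet only warns about partial overlap.

```python
import warnings


def validate_cgl_inputs(rod_type, vcf_samples, bam_samples):
    """Hypothetical input checks mirroring the commit's bullet list.

    Returns the set of samples shared by the BAM and the VCF.
    """
    # The user must have supplied a VCF, not some other type of ROD.
    if rod_type != "VCF":
        raise ValueError("expected a VCF, got a %s ROD" % rod_type)
    vcf = set(vcf_samples)
    # The VCF must contain samples.
    if not vcf:
        raise ValueError("the VCF contains no samples")
    bam = set(bam_samples)
    shared = bam & vcf
    # Assumed reading: fail only if no BAM sample is found in the VCF.
    if not shared:
        raise ValueError("none of the BAM samples are present in the VCF")
    # Partial overlap is tolerated but reported.
    missing = bam - vcf
    if missing:
        warnings.warn("BAM samples missing from the VCF: " + ", ".join(sorted(missing)))
    return shared
```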
Mauricio Carneiro
cec7107762
Better location for the downsampling of reads in PrintReads
...
* using filter() instead of map() makes for a cleaner walker.
* renamed the unit tests to be consistent with the other unit and integration tests
2012-01-14 14:06:09 -05:00
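A minimal sketch of downsampling inside a walker's filter step rather than in map(). The commit does not say how reads are chosen; this assumes each read is kept independently with a fixed probability, and the class and parameter names are hypothetical.

```python
import random


class DownsamplingReadFilter:
    """Hypothetical filter: keep each read independently with probability
    `fraction`, so map() only ever sees the retained reads."""

    def __init__(self, fraction: float, seed=None):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0, 1]")
        self.fraction = fraction
        self.rng = random.Random(seed)  # seeded for reproducible runs

    def filter(self, read) -> bool:
        # random() is in [0, 1): fraction 1.0 keeps everything, 0.0 keeps nothing.
        return self.rng.random() < self.fraction
```

Keeping the decision in filter() means map() needs no downsampling branch at all, which is the cleanup the commit describes.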
Mauricio Carneiro
3a9d9789ae
Removing old scripts for genotype accuracy
2012-01-13 16:57:05 -05:00
Mauricio Carneiro
3110a8b69d
Genotype likelihoods calibration tool refactored
...
* automatically generates pdf with all the plots
* new and updated documentation
* R script now lives in the classpath (under private)
2012-01-13 16:34:36 -05:00
Khalid Shakir
ca48f04fb8
Better handling in pre-QC R scripts for older projects (whole_exome_agilent_designed_120) that came out before some metrics were added to Picard.
...
PCT_PF_READS was plotted under the plot title for PCT_PF_ALIGNED_READS. Both metrics are now plotted separately.
2012-01-13 16:31:56 -05:00
Mark DePristo
b06074d6e7
Updated SortingVCFWriterBase to use PriorityBlockingQueue so that the class is thread-safe
...
-- Uses PriorityBlockingQueue instead of PriorityQueue
-- synchronized keywords added to all key functions that modify internal state
Note that this hasn't been tested extensively. Based on the report:
http://getsatisfaction.com/gsa/topics/missing_loci_output_in_multi_thread_mode_when_implement_sortingvcfwriterbase?utm_content=topic_link&utm_medium=email&utm_source=new_topic
2012-01-13 09:33:16 -05:00
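A Python analogue of the change above, assuming the usual sorting-writer design: records arrive roughly in position order and are emitted sorted once they fall `max_distance` behind the furthest position seen. The class and its parameters are hypothetical. It also shows why the commit adds `synchronized` on top of a thread-safe queue: the peek-then-pop in `_emit_up_to` is a compound operation, so a lock must cover the whole step, not just each individual queue call.

```python
import heapq
import itertools
import threading


class SortingWriter:
    """Hypothetical thread-safe sorting buffer in the spirit of SortingVCFWriterBase."""

    def __init__(self, max_distance, sink):
        self.max_distance = max_distance
        self.sink = sink                      # callable receiving (position, record)
        self._heap = []                       # min-heap of (position, tiebreak, record)
        self._tiebreak = itertools.count()    # keeps equal positions in arrival order
        self._max_seen = None
        self._lock = threading.Lock()         # plays the role of `synchronized`

    def add(self, position, record):
        with self._lock:
            heapq.heappush(self._heap, (position, next(self._tiebreak), record))
            if self._max_seen is None or position > self._max_seen:
                self._max_seen = position
            self._emit_up_to(self._max_seen - self.max_distance)

    def close(self):
        with self._lock:
            self._emit_up_to(float("inf"))

    def _emit_up_to(self, limit):
        # Caller must hold self._lock: peeking and popping must not interleave.
        while self._heap and self._heap[0][0] <= limit:
            position, _, record = heapq.heappop(self._heap)
            self.sink(position, record)
```

Usage: with `max_distance=2`, adding positions 5, 3, 4, 10 and then calling `close()` emits 3, 4, 5, 10.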
Mauricio Carneiro
28aa353501
Added "unbiased" downsampling parameter to PrintReads
...
* also cleaned up and updated part of the unit tests for PrintReads. They still need a more thorough cleanup.
2012-01-12 16:33:55 -05:00
Matt Hanna
2c3176eb80
Merged bug fix from Stable into Unstable
2012-01-12 13:31:10 -05:00
Matt Hanna
cd43f016ce
Fixed NPE in getNextOverlappingBAMScheduleEntry() when mixed mapped/unmapped interval lists are used. Added an integration test to verify the behavior.
2012-01-12 13:29:11 -05:00
Eric Banks
ed34b4f088
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-12 10:27:26 -05:00
Eric Banks
e7fe9910f7
Create the temp storage for calculating cell values just once as per Mark's TODO
2012-01-12 10:27:10 -05:00
Eric Banks
f5f5ed5dcd
Don't initialize the cell conformation values (use an else in the loop instead) as per Mark's TODO
2012-01-12 08:50:03 -05:00
Eric Banks
410a340ef5
Swapping the iteration order to loop over AF conformations first and then samples, instead of the reverse, minimizes calls to HashMap.get: the lookup used to be made once per sample (O(n) calls) and is now made once per conformation, independent of the number of samples (O(1)). Runtime on the T2D GENES test set is reduced by 5-10%. More optimizations to follow.
2012-01-12 02:04:03 -05:00
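A toy illustration of the loop swap described above. The dict stands in for the HashMap, the integer per-sample weights and conformation keys are made up, and the real code computes genotype likelihoods rather than a simple sum. Both orders compute the same result, but hoisting the lookup out of the sample loop reduces it from samples × conformations calls to one call per conformation.

```python
class CountingDict(dict):
    """dict that counts get() calls, standing in for java.util.HashMap.get."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.get_calls = 0

    def get(self, key, default=None):
        self.get_calls += 1
        return super().get(key, default)


def samples_outer(conformations, sample_weights, values):
    total = 0
    for weight in sample_weights:
        for conf in conformations:
            total += values.get(conf) * weight   # one lookup per (sample, conformation)
    return total


def conformations_outer(conformations, sample_weights, values):
    total = 0
    for conf in conformations:
        value = values.get(conf)                 # one lookup per conformation
        for weight in sample_weights:
            total += value * weight
    return total
```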
Mauricio Carneiro
423d4ac2d3
Quick fix to CalibrateGenotypeLikelihoods
...
We were using an outdated check for no-calls that no longer works.
2012-01-11 17:47:44 -05:00
Mauricio Carneiro
77a03c9709
Patching special case in the adaptor clipping
...
* if the adaptor boundary is more than MAXIMUM_ADAPTOR_SIZE bases away from the read, don't clip anything and consider the fragment undetermined for this read pair.
* updated MD5s accordingly
2012-01-11 17:47:44 -05:00
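A sketch of only the guard described in this commit, not the full clipping logic. The value of MAXIMUM_ADAPTOR_SIZE is a hypothetical placeholder (the commit does not state it), coordinates are assumed inclusive, and "away from the read" is assumed to mean the distance to the nearest read end.

```python
# Hypothetical value; the commit message does not give the real constant.
MAXIMUM_ADAPTOR_SIZE = 8


def distance_to_read(read_start, read_end, position):
    """Bases between `position` and the inclusive span [read_start, read_end]."""
    if position < read_start:
        return read_start - position
    if position > read_end:
        return position - read_end
    return 0


def usable_adaptor_boundary(read_start, read_end, boundary):
    """Return the boundary if it may be used for clipping, else None.

    None means the fragment is treated as undetermined and nothing is clipped.
    """
    if boundary is None:
        return None
    if distance_to_read(read_start, read_end, boundary) > MAXIMUM_ADAPTOR_SIZE:
        return None
    return boundary
```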
Mark DePristo
34cf2fe43b
Merged bug fix from Stable into Unstable
2012-01-11 08:55:20 -05:00
Mark DePristo
2e47336a81
Only print out error report for most recent release in runGATKReport.py
2012-01-11 08:54:46 -05:00
Khalid Shakir
aae61767c6
queueJobReport now compresses PDF when running R 2.13+.
...
Updated PostCallingQC.scala's VE and R to include the missense-to-silent ratio and its plot.
2012-01-10 17:32:30 -05:00
Khalid Shakir
a9a6516527
Merged bug fix from Stable into Unstable
2012-01-10 16:16:10 -05:00
Khalid Shakir
ef50e77ee2
When running Queue jobs locally, merge the stderr to the stdout log if the error file is NOT specified.
...
Updated VE strats in the HSP for plotting Ka/Ks by AC.
2012-01-10 16:10:25 -05:00
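The stderr-merging behavior can be sketched with Python's standard `subprocess` module, where `stderr=subprocess.STDOUT` sends the error stream to the same destination as stdout. Queue itself is Scala and launches jobs its own way; the helper name is hypothetical.

```python
import subprocess


def run_locally(command, out_path, err_path=None):
    """Run `command` with stdout written to out_path.

    If err_path is None, stderr is merged into the stdout log;
    otherwise it goes to its own file. Returns the exit code.
    """
    with open(out_path, "wb") as out:
        if err_path is None:
            return subprocess.run(command, stdout=out, stderr=subprocess.STDOUT).returncode
        with open(err_path, "wb") as err:
            return subprocess.run(command, stdout=out, stderr=err).returncode
```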
Eric Banks
3475bfafd3
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-10 12:39:15 -05:00
Mauricio Carneiro
5bf960deb8
adding dbSNP to indel VQSR
2012-01-10 12:38:49 -05:00
Eric Banks
25d0d53d88
Moving the approximate summing of log10 vals to MathUtils; keeping the more efficient implementation of fast rounding.
2012-01-10 12:38:47 -05:00
Eric Banks
589397d611
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-10 12:36:48 -05:00
Eric Banks
c5320ef1af
Resolving changes in integration test during merge
2012-01-10 12:14:16 -05:00
Matt Hanna
e923a2e512
Revving Picard to incorporate final version of ReadWalker performance improvements.
2012-01-10 12:12:33 -05:00
Eric Banks
0f36f6947e
Resolving merge conflicts
2012-01-10 11:44:16 -05:00
Eric Banks
f2cecce10f
Much better implementation of the approximate summing of an array of log10 values (including more efficient rounding). Now effectively takes 0% of UG runtime on T2D GENES (as opposed to 11% previously).
2012-01-10 11:34:23 -05:00
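The commit does not spell out the algorithm. A common way to approximate log10(Σ 10^v) is the Jacobian logarithm, log10(10^a + 10^b) = max(a, b) + log10(1 + 10^-|a-b|), with the correction term read from a precomputed table and the table index obtained by fast rounding. The sketch below uses that technique; the table resolution, cutoff, and names are assumptions and not necessarily what MathUtils does.

```python
import math

# Hypothetical table parameters, not taken from GATK's MathUtils.
STEPS_PER_UNIT = 1000   # table resolution: 0.001 in log10 space
MAX_DIFF = 8            # beyond this gap the smaller term adds < 1e-8

# JACOBIAN_TABLE[i] = log10(1 + 10^(-i / STEPS_PER_UNIT))
JACOBIAN_TABLE = [math.log10(1.0 + 10.0 ** (-i / STEPS_PER_UNIT))
                  for i in range(MAX_DIFF * STEPS_PER_UNIT + 1)]


def fast_round(x):
    """Round a non-negative float to the nearest int without calling round()."""
    return int(x + 0.5)


def approximate_log10_sum_log10(values):
    """Approximate log10(sum(10**v for v in values)) one pair at a time."""
    total = values[0]
    for v in values[1:]:
        big, small = (total, v) if total >= v else (v, total)
        diff = big - small
        if diff < MAX_DIFF:
            # Correction term looked up instead of computed with pow/log.
            big += JACOBIAN_TABLE[fast_round(diff * STEPS_PER_UNIT)]
        total = big
    return total
```

The approximation error per step is bounded by the table resolution, so a finer table trades memory for accuracy.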
Matt Hanna
509c3d87b0
Merged bug fix from Stable into Unstable
2012-01-09 23:08:46 -05:00
Matt Hanna
dc60757b68
Eliminate unnecessary strong references (and therefore the memory they hold) from tree-reduce entries that have already been processed.
...
Thanks to Tim Fennell for the bug report.
2012-01-09 23:04:53 -05:00
Menachem Fromer
4c3c93fc92
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-09 18:18:09 -05:00
Menachem Fromer
133739a76e
Add option to run the longer components on a different queue
2012-01-09 18:17:39 -05:00
Mauricio Carneiro
6b9dcaf979
added a -o option to AssessLikelihoodsAtTruth
2012-01-09 16:53:48 -05:00
Mauricio Carneiro
6f2abd76df
Updating the MDCP with the new indel gold standard from Ryan.
2012-01-09 15:31:18 -05:00
Mark DePristo
f01f7d1db8
Merged bug fix from Stable into Unstable
...
Merge branch 'master' into unstable
2012-01-09 08:41:14 -05:00
Mark DePristo
845c0b1c66
Merge branch 'master' of ssh://depristo@gsa1/humgen/gsa-scr1/gsa-engineering/git/stable
2012-01-09 08:40:59 -05:00
Mark DePristo
f5add25c72
Improved formatting of queueStatus
2012-01-09 08:40:53 -05:00
Matt Hanna
fda1795791
Merged bug fix from Stable into Unstable
2012-01-08 22:04:44 -05:00
Matt Hanna
1f1233b669
Fix for a rare but insidious bug in position tracking during async BAM file reading.
...
Thanks to Khalid for spotting and reporting the issue.
2012-01-08 22:03:35 -05:00
Menachem Fromer
f741ec6c6a
Replaced dotFile with shortDescription, as per Khalid's latest update
2012-01-08 12:51:50 -05:00
Menachem Fromer
87a690e6df
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-08 12:23:21 -05:00
Menachem Fromer
1ddac59a49
Added an alpha version of the exome CNV calling pipeline script. To run it, you need to check out and compile our C++ code with 'git clone /psych/genetics_data/projects/seq/exome/CNV/git_master/xhmm', though this is not yet recommended since the whole process is still preliminary.
2012-01-08 12:22:18 -05:00
Khalid Shakir
5793625592
No more "Q-<pid>@<host>". Generated log file names now use the first output + ".out" (ex. my.vcf.out) or the name of the first QScript plus the order the function was added (ex. MyScript-1.out). The same function added twice with the same outputs will now have the same default logs, meaning the 2nd instance of the function won't be added to the graph twice.
...
Added a QScript accessor to QSettings to specify a default runName and other default function settings.
Because log file names are no longer pseudo-random, their presence can be used to tell if a job without other file outputs is "done". For now, the log's .done file is still used in addition to the original outputs.
Gathered log files concatenate all log files together into the stdout.
InProcessFunctions now have PrintStreams for stdout and stderr.
Updated ivy to use commons-io 2.1 for copying logs to the stdout PrintStream. Removed snakeyaml.
During graph tracking of outputs the Index files, and now BAM MD5s, are tracked with the gathering of the original file.
In Queue-generated wrappers for the GATK, the Index and MD5s used for tracking are switched to private scope.
Added more detailed output when running with -l DEBUG.
Simplified graphviz visualization for additional debugging.
Switched usage of the scala class 'List' to the trait 'Seq' (think java.util.ArrayList vs. using the interface java.util.List)
Minor cleanup to build including sending ant gsalib to R's default libloc.
2012-01-08 12:11:55 -05:00
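A sketch of the default log naming rule from this commit, plus a toy graph showing why deterministic names deduplicate identical functions. Queue is written in Scala; these helpers are hypothetical, and keying the graph on the log name alone is a simplification of how Queue compares functions.

```python
def default_log_name(outputs, qscript_name, order_added):
    """First output + ".out" if the function has outputs,
    otherwise "<QScript name>-<order added>.out"."""
    if outputs:
        return outputs[0] + ".out"
    return "%s-%d.out" % (qscript_name, order_added)


class FunctionGraph:
    """Toy graph keyed by default log name (a simplification)."""

    def __init__(self):
        self.functions = {}

    def add(self, name, outputs, qscript_name, order_added):
        log = default_log_name(outputs, qscript_name, order_added)
        # Same function with the same outputs -> same log -> added only once.
        self.functions.setdefault(log, name)
        return log
```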
Mark DePristo
90cc17ee2a
Merged bug fix from Stable into Unstable
...
Conflicts:
private/shell/runGATKReport.csh
2012-01-06 18:14:51 -05:00