Commit Graph

8546 Commits (0f36f6947ebbc7c781fa2f94038339fce4e01833)

Author SHA1 Message Date
Eric Banks 0f36f6947e Resolving merge conflicts 2012-01-10 11:44:16 -05:00
Eric Banks f2cecce10f Much better implementation of the approximate summing of an array of log10 values (including more efficient rounding). Now effectively takes 0% of UG runtime on T2D GENES (as opposed to 11% previously). 2012-01-10 11:34:23 -05:00
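The commit above does not show its implementation. As a rough illustration of the general idea only, a sum of log10 values is commonly computed by factoring out the maximum and skipping terms too small to matter. The function name and the cutoff below are assumptions made for this sketch, not taken from the GATK code.

```python
import math


def approximate_log10_sum(log10_values, cutoff=-8.0):
    """Approximate log10(sum(10**v)) without overflow or underflow.

    `cutoff` is a hypothetical tuning knob: terms more than |cutoff|
    orders of magnitude below the maximum are skipped.
    """
    if not log10_values:
        return float("-inf")
    max_value = max(log10_values)
    if max_value == float("-inf"):
        return max_value
    total = 0.0
    for value in log10_values:
        diff = value - max_value
        if diff >= cutoff:
            # Each kept term is in (0, 1], so the running sum cannot overflow.
            total += 10.0 ** diff
    return max_value + math.log10(total)
```

Factoring out the maximum keeps every exponent at or below zero, which is what makes the sum safe for very negative log10 likelihoods.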
Matt Hanna 509c3d87b0 Merged bug fix from Stable into Unstable 2012-01-09 23:08:46 -05:00
Matt Hanna dc60757b68 Eliminate unnecessary strong references (and therefore memory held) by tree reduce entries that have already been processed.
Thanks to Tim Fennell for the bug report.
2012-01-09 23:04:53 -05:00
Menachem Fromer 4c3c93fc92 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-09 18:18:09 -05:00
Menachem Fromer 133739a76e Add option to run the longer components on a different queue 2012-01-09 18:17:39 -05:00
Mauricio Carneiro 6b9dcaf979 added a -o option to AssessLikelihoodsAtTruth 2012-01-09 16:53:48 -05:00
Mauricio Carneiro 6f2abd76df Updating the MDCP with the new indel gold standard from Ryan. 2012-01-09 15:31:18 -05:00
Mark DePristo f01f7d1db8 Merged bug fix from Stable into Unstable
Merge branch 'master' into unstable
2012-01-09 08:41:14 -05:00
Mark DePristo 845c0b1c66 Merge branch 'master' of ssh://depristo@gsa1/humgen/gsa-scr1/gsa-engineering/git/stable 2012-01-09 08:40:59 -05:00
Mark DePristo f5add25c72 Improved formatting of queueStatus 2012-01-09 08:40:53 -05:00
Matt Hanna fda1795791 Merged bug fix from Stable into Unstable 2012-01-08 22:04:44 -05:00
Matt Hanna 1f1233b669 Fix for a rare but insidious bug in position tracking during async BAM file reading.
Thanks to Khalid for spotting and reporting the issue.
2012-01-08 22:03:35 -05:00
Menachem Fromer f741ec6c6a Replaced dotFile with shortDescription, as per Khalid's latest update 2012-01-08 12:51:50 -05:00
Menachem Fromer 87a690e6df Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-08 12:23:21 -05:00
Menachem Fromer 1ddac59a49 Added alpha version of Exome CNV calling pipeline script. To run it, you would need to check out and compile our C++ code with 'git clone /psych/genetics_data/projects/seq/exome/CNV/git_master/xhmm', though this is not yet recommended since the process is still preliminary 2012-01-08 12:22:18 -05:00
Khalid Shakir 5793625592 No more "Q-<pid>@<host>". Generated log file names now use the first output + ".out" (ex. my.vcf.out) or the name of the first QScript plus the order the function was added (ex. MyScript-1.out). The same function added twice with the same outputs will now have the same default logs, meaning the 2nd instance of the function won't be added to the graph twice.
QScript accessor to QSettings to specify a default runName and other default function settings.
Because log files are no longer pseudo-random, their presence can be used to tell if a job without other file outputs is "done". For now still using the log's .done file in addition to original outputs.
Gathered log files concatenate all log files together into the stdout.
InProcessFunctions now have PrintStreams for stdout and stderr.
Updated ivy to use commons-io 2.1 for copying logs to the stdout PrintStream. Removed snakeyaml.
During graph tracking of outputs the Index files, and now BAM MD5s, are tracked with the gathering of the original file.
In Queue generated wrappers for the GATK the Index and MD5s used for tracking are switched to private scope.
Added more detailed output when running with -l DEBUG.
Simplified graphviz visualization for additional debugging.
Switched usage of the scala class 'List' to the trait 'Seq' (think java.util.ArrayList vs. using the interface java.util.List)
Minor cleanup to build including sending ant gsalib to R's default libloc.
2012-01-08 12:11:55 -05:00
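The default log naming rule described in the commit above can be sketched as follows. This is only an illustration of the stated rule (first output plus ".out", otherwise script name plus the order the function was added); the function and parameter names are hypothetical and do not come from Queue.

```python
def default_log_name(outputs, script_name, order):
    """Return a deterministic log file name for a job.

    outputs:     list of output file names for the function (may be empty)
    script_name: name of the first QScript (hypothetical parameter)
    order:       1-based order in which the function was added
    """
    if outputs:
        # Use the first output, e.g. my.vcf -> my.vcf.out
        return f"{outputs[0]}.out"
    # No outputs: fall back to the script name plus the add order.
    return f"{script_name}-{order}.out"
```

Because the name depends only on the outputs, the same function added twice with the same outputs maps to the same log, which matches the deduplication behavior the commit describes.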
Mark DePristo 90cc17ee2a Merged bug fix from Stable into Unstable
Conflicts:
	private/shell/runGATKReport.csh
2012-01-06 18:14:51 -05:00
Mark DePristo 63b7a70c44 Removing very costly analyses of all GATK versions. Will be replaced by Tableau website 2012-01-06 18:13:19 -05:00
Mauricio Carneiro 1f88a1bfe2 Small fix to RRead script
* fixing the downsample strategy variable
2012-01-06 17:25:04 -05:00
Mauricio Carneiro f6a18aea63 Updated MDCP with INDEL best practices
* chose 90.0 indel cut target for most datasets (this is arbitrary).
2012-01-06 17:21:59 -05:00
Mark DePristo 65c614fb4b Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-06 16:38:26 -05:00
Mark DePristo d9da37f9b4 Added SQL table creation and log loading to analyzeRunReports
-- You can create (and drop the old) GATK_LOG table with the setupDB command
-- You can load data into the database with the loadToDB command

Currently I'm pushing up all of the GATK logs into the new MySQL server setup for the gsa group.  Details of the server are in the code, for those interested.  All of this is part of my experimentation with Tableau for visualizing GATK run logs.
2012-01-06 16:35:53 -05:00
Guillermo del Angel d4e7655d14 Added ability to call multiallelic indels, if -multiallelic is included in UG arguments. Simple idea: we genotype all alleles with count >= minIndelCnt.
To support this, refactored code that computes consensus alleles. To ease merging of multiple alt alleles, we create a single vc for each alt allele and then use VariantContextUtils.simpleMerge to carry out merging, which takes care of handling all corner conditions already. In order to use this, interface to GenotypeLikelihoodsCalculationModel changed to pass in a GenomeLocParser object (why are these objects so hard to handle??).
More testing is required and the feature is turned off by default.
2012-01-06 11:24:38 -05:00
Mauricio Carneiro 43224ef364 Turning the Adaptive Downsampler on with 100 by default 2012-01-05 23:47:27 -05:00
Mark DePristo dd80ffbbbe Merged bug fix from Stable into Unstable 2012-01-05 21:51:48 -05:00
Mark DePristo c96fee477c Bug fix for VariantSummary
-- Call sets with indels > 50 bp in length are tagged as CNVs in the tag (following the 1000 Genomes convention) and were unconditionally checking whether the CNV is already known, by looking at the known cnvs file, which is optional.  Fixed.  Has the annoying side effect that indels > 50bp in size are not counted as indels, and so are subtracted from both the novel and known counts for indels.  C'est la vie
-- Added integration test to check for this case, using Mauricio's most recent VCF file for NA12878 which has many large indels.  Using this more recent and representative file is probably a good idea for future tests in VE and other tools.  File is NA12878.HiSeq.WGS.b37_decoy.indel.recalibrated.vcf in Validation_Data
2012-01-05 21:51:06 -05:00
Eric Banks f5e10e9879 Merged bug fix from Stable into Unstable 2012-01-05 15:35:09 -05:00
Eric Banks 18ed954741 Compute Ti/Tv only if bi-allelic 2012-01-05 15:33:26 -05:00
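A minimal sketch of what restricting Ti/Tv to bi-allelic sites could look like, assuming single-base SNP records given as (ref, list of alts) pairs with alt different from ref. The record layout and function names are assumptions for illustration, not the GATK implementation.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def titv_ratio(sites):
    """Compute Ti/Tv over bi-allelic SNP sites only.

    sites: iterable of (ref, [alt, ...]) tuples (hypothetical layout).
    """
    transitions = transversions = 0
    for ref, alts in sites:
        if len(alts) != 1:
            continue  # skip multi-allelic sites entirely
        if is_transition(ref, alts[0]):
            transitions += 1
        else:
            transversions += 1
    return transitions / transversions if transversions else float("nan")
```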
Ryan Poplin a6886a4cc0 Initial commit of the Active Region Traversal. Not ready to be used by anyone yet. 2012-01-04 17:03:21 -05:00
Guillermo del Angel 58d4539304 Enabled banded indel computation by default. Reversed logic in input UG argument so that we can still disable it if required. Minor changes to integration tests due to minor differences in GL's and in annotations 2012-01-04 15:28:26 -05:00
Christopher Hartl 5cdde168af Switch from using BWA to direct edit distance inspection. Seems to work quite well. 2012-01-04 14:25:43 -05:00
Christopher Hartl 310c05bd09 Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-04 01:12:23 -05:00
Christopher Hartl 31ecc38db8 Initial implementation of a walker for redesigning low or high GC baits using a fairly textbook genetic algorithm. 2012-01-04 01:10:28 -05:00
David Roazen fe67276e1e Merged bug fix from Stable into Unstable 2012-01-04 00:54:02 -05:00
Khalid Shakir 253a07fdb1 Implicit conversion issue/bug: QScript String<==>File shortcuts at compile time do not make String.equals(File) at runtime. 2012-01-03 18:43:45 -05:00
Mauricio Carneiro 9ff8a01da2 Merged bug fix from Stable into Unstable 2012-01-03 18:10:39 -05:00
Mauricio Carneiro 9b55505c03 Fixing PairHMMIndelErrorModel array out of bounds
This error was due to the ReadClipper change of contract. Previously, the read utils would return null if a read was entirely clipped; now they return an empty (safe) GATKSAMRecord.
2012-01-03 18:08:46 -05:00
Christopher Hartl 2c3a9ce02f Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable 2012-01-03 17:25:56 -05:00
David Roazen 621ee2b613 Merged bug fix from Stable into Unstable 2012-01-03 16:56:49 -05:00
Christopher Hartl 9093de1132 Cleanup: remove code to calculate the MLE AC in the UGE. 2012-01-03 15:58:51 -05:00
Christopher Hartl 2d093828a4 Final changes to Junky (been frozen for a while, but uncommitted) and the qscript for it. A first cursory implementation of the trellis-based Exact AC-constrained genotyping algorithm in UGE. Nothing calls into it, so this should be entirely safe (and, no surprise, it passes UG integration tests). 2012-01-03 15:33:04 -05:00
David Roazen ea6e718cb8 SnpEff 2.0.5 support. Re-enabled SnpEff in the HybridSelectionPipeline.
For now, we recommend only running with the GRCh37.64 database.
2012-01-03 15:18:36 -05:00
Christopher Hartl 93e1417b6e Update to the VSS GATK documentation. 2012-01-03 13:39:31 -05:00
David Roazen 4984ca5e31 Merged bug fix from Stable into Unstable 2012-01-03 11:03:30 -05:00
David Roazen f3f01da1af Enforce serial dependencies in RecalibrationWalkersIntegrationTest
Some tests in this class were intermittently not being executed due
to being randomly scheduled before tests whose results they depend on.
Now the serial dependencies are enforced to avoid problematic orderings.
2012-01-03 10:42:41 -05:00
David Roazen 055364d786 Always use full, three-part version numbers.
Previously, the initial release of a new GATK version had a version
number with only one part (e.g., "1.4"). This could potentially mislead
people into thinking it's the most recent revision of a release, instead
of the least recent.

Now, initial releases will have full, three-part version numbers
(e.g., "1.4-0-g472fc94") like everything else.
2012-01-03 10:25:19 -05:00
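The example string "1.4-0-g472fc94" looks like the tag / commit count / abbreviated hash layout that `git describe --long` produces. Assuming that layout (the commit itself does not say how the string is generated), the three parts can be split as sketched below; `parse_version` is a hypothetical helper.

```python
import re

# <tag>-<commits since tag>-g<abbreviated hash>; this layout is an assumption.
VERSION_RE = re.compile(r"^(?P<tag>.+)-(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_version(version):
    """Split a three-part version string into (tag, commits, sha)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a three-part version: {version!r}")
    return match.group("tag"), int(match.group("commits")), match.group("sha")
```

Under this reading, a commit count of 0 marks the initial release of a version, and a bare "1.4" is rejected as incomplete.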
Eric Banks ab8d47d9a5 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-03 09:38:49 -05:00
Mauricio Carneiro ca669ae744 Optimizations to the CoverageByRG walker
* outputs only the groups of read groups necessary, avoiding multiple pileup creations on every call to map
* now also counts the number of variants associated with a given ROD (dbSNP) that exist in the interval
* new column: interval size
2012-01-03 09:36:01 -05:00
Mauricio Carneiro 3d4bf273de Added getPileupForReadGroups to ReadBackedPileup
* returns a pileup for all the read groups provided.
   * saves us from multiple calls to getPileup (which is very inefficient)
2012-01-03 09:35:11 -05:00
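The benefit described for getPileupForReadGroups, one pass over the pileup instead of one call per read group, can be illustrated with a small sketch. The dictionary-based pileup elements and the function name are assumptions for this example and do not mirror the GATK classes.

```python
def pileup_for_read_groups(pileup, read_groups):
    """Return the elements belonging to any of the given read groups.

    pileup:      list of dicts with a "read_group" key (hypothetical layout)
    read_groups: iterable of read group IDs to keep
    """
    wanted = set(read_groups)  # set lookup keeps the filter to a single pass
    return [element for element in pileup if element["read_group"] in wanted]
```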