Commit Graph

8597 Commits (ca11f6830389c8bbd527d7fcdb1ccd39a2793a2a)

Author SHA1 Message Date
Mark DePristo ca11f68303 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-18 08:29:03 -05:00
Mark DePristo 9e77facda5 More analyses for random forest test script forest.R 2012-01-18 08:28:47 -05:00
Mark DePristo 5bd1a45879 Usability improvements to analyzeRunReports
-- Print out the name / db of the SQL server, not a Python connection object
-- Print out the ID, not the Python object, of an XML record that fails to convert
2012-01-18 08:27:15 -05:00
Mark DePristo b52db51599 Don't try to write log to a non-existent file 2012-01-18 08:26:49 -05:00
Mark DePristo 763c81d520 No longer enforce MAX_ALLELE_SIZE in VCF codec
-- Instead issue a warning when a large (>1MB) record is encountered
-- Optimized ref.getBytes()[i] => (byte)ref.charAt(i), which avoids an implicit O(n) allocation each iteration through computeReverseClipping()
2012-01-18 07:35:11 -05:00
Mark DePristo 0c7865fdb5 UnitTest for reverseAlleleClipping
-- No code modified yet, just implementing a unit test to ensure correctness of the existing code
2012-01-18 07:35:11 -05:00
Christopher Hartl 9770250b72 Fix for Amy W - evidently binding defaults are not null but an unbound object, which caused the wrong branch to be taken. 2012-01-17 17:28:58 -05:00
Mark DePristo b0560f9440 Rev. tribble to fix BED codec bug in tribble 51 2012-01-17 16:40:26 -05:00
Mark DePristo 62801e430a Bugfix for unnecessary optimization
-- don't cache the ref bytes
2012-01-17 16:40:26 -05:00
Mark DePristo f2b0575dee Detect unreasonably large allele strings (>2^16) and throw an error
-- samtools can emit alleles where the ref is 42M Ns and this caused the GATK (via tribble) to hang in several places.
-- Tribble was updated so we actually could read the line properly (rev. to 51 here).
-- Still the parsing algorithms in the GATK aren't happy with such a long allele.  Instead of optimizing the code around an improper use case I put in a limit of 2^16 bp for any allele, and throw a meaningful exception when encountered.
2012-01-17 16:40:26 -05:00
Menachem Fromer 816dcf9616 Finally got around to adding support for Eric's fix to permit annotation exclusion by VariantAnnotator 2012-01-17 16:35:16 -05:00
Mauricio Carneiro ff2fc514ae Updated plots to CGL walker
A few updates to the CalibrateGenotypeLikelihoods walker output

   * Fixed ggplot2 issue with dataset with poor coverage
   * Added jitter as default geometry
   * Dropped the cut by technology from the graphs
2012-01-17 15:14:47 -05:00
Ryan Poplin 56761297dd Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-17 15:03:32 -05:00
Ryan Poplin 75f87db468 Replacing Mills file with new gold standard indel set in the resource bundle for release with v1.5 2012-01-17 15:02:45 -05:00
Matt Hanna 40ebc17437 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-17 14:49:17 -05:00
Matt Hanna 41d70abe4e At chartl's request, add the bwa aln -N and bwa aln -m parameters to the bindings. 2012-01-17 14:47:53 -05:00
Mark DePristo 2390449f0f Local and S3 archiving scripts now push data to MySQL as well 2012-01-17 14:42:48 -05:00
Menachem Fromer 80a1ae254b Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-17 14:25:40 -05:00
Menachem Fromer 284a8e9ddc Fixed to match recent minor updates by Khalid and Eric 2012-01-17 14:24:41 -05:00
Christopher Hartl cde224746f Bait Redesign supports baits that overlap, by picking only the start of intervals.
CalibrateGenotypeLikelihoods supports using an external VCF as input for genotype likelihoods. Currently this can be a per-sample VCF; the methods for using a read-group VCF are not yet implemented.

Removed the old constrained genotyping code from UGE -- the trellis it calculated is exactly the same as the one computed in the MLE AC estimate, so we should just reuse that one.
2012-01-17 13:51:05 -05:00
Matt Hanna 32ccde374b Merged bug fix from Stable into Unstable 2012-01-17 11:08:35 -05:00
Matt Hanna 3ba918aff1 Error message cleanup in BAM indexing code. 2012-01-17 11:05:42 -05:00
Mark DePristo aa8a885a5b Generalizing forest.R analysis script
-- Support for N tree analyses
-- Testing of NA omit and roughfix options
-- Misc. analyses and refactoring
2012-01-16 09:33:41 -05:00
Mark DePristo 8ddac9a06f Don't show individual jobs in queueStatus for gsaadm, just the count 2012-01-16 09:33:05 -05:00
Mark DePristo 61f82f138f Extract a high-level GATK version from the SVN / GIT full version numbers in analyzeRunReports
-- Maps SVN versions 1.0.5988 for example to 0.5, 1.0.6134 to 0.6, etc
-- Maps GIT versions 1.x-XXX to 1.x

Used in tableau analyses
2012-01-16 09:30:48 -05:00
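The version mapping described in the commit above can be sketched in Python. This is a minimal illustration, not the analyzeRunReports code: the function name `high_level_version` is hypothetical, and the SVN rule (take the thousands digit of the revision number) is an assumption inferred only from the two examples in the message.

```python
import re


def high_level_version(full_version: str) -> str:
    """Collapse a full GATK version string into a coarse release label.

    Hypothetical sketch; the rules below are assumptions, not the real script:
      * SVN-era versions look like "1.0.<revision>"; the thousands digit of
        the revision becomes the minor number (1.0.5988 -> "0.5").
      * GIT-era versions look like "1.<x>-<suffix>"; keep "1.<x>".
    """
    svn = re.fullmatch(r"1\.0\.(\d+)", full_version)
    if svn:
        revision = int(svn.group(1))
        return f"0.{revision // 1000}"
    git = re.match(r"(1\.\d+)-", full_version)
    if git:
        return git.group(1)
    return "unknown"
```

For example, `high_level_version("1.0.6134")` gives `"0.6"`, and a GIT-style string such as `"1.4-30-gf2ef8d1"` gives `"1.4"`.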
Mauricio Carneiro 8272c8bd26 Added exceptions to CGL walker
   * Assert that a user provided a VCF not some other type of ROD
   * Assert that the VCF has samples
   * Assert that the samples in the BAM exist in the VCF
   * Warn the user if not all samples in the BAM are present in the VCF
2012-01-14 14:10:19 -05:00
Mauricio Carneiro cec7107762 Better location for the downsampling of reads in PrintReads
   * using the filter() instead of map() makes for a cleaner walker.
   * renaming the unit tests to make more sense with the other unit and integration tests
2012-01-14 14:06:09 -05:00
Mauricio Carneiro 3a9d9789ae Removing old scripts for genotype accuracy 2012-01-13 16:57:05 -05:00
Mauricio Carneiro 3110a8b69d Genotype likelihoods calibration tool refactored
   * automatically generates pdf with all the plots
   * new and updated documentation
   * R script now lives in the classpath (under private)
2012-01-13 16:34:36 -05:00
Khalid Shakir ca48f04fb8 Better handling in pre QC R scripts for older projects (whole_exome_agilent_designed_120) that came out before some metrics were added to Picard.
PCT_PF_READS was plotted with a plot title for PCT_PF_ALIGNED_READS. Now plotting both metrics separately.
2012-01-13 16:31:56 -05:00
Mark DePristo b06074d6e7 Updated SortingVCFWriterBase to use PriorityBlockingQueue so that the class is thread-safe
-- Uses PriorityBlockingQueue instead of PriorityQueue
-- synchronized keywords added to all key functions that modify internal state

Note that this hasn't been tested extensively.  Based on report:

http://getsatisfaction.com/gsa/topics/missing_loci_output_in_multi_thread_mode_when_implement_sortingvcfwriterbase?utm_content=topic_link&utm_medium=email&utm_source=new_topic
2012-01-13 09:33:16 -05:00
Mauricio Carneiro 28aa353501 Added "unbiased" downsampling parameter to PrintReads
   * also cleaned up and updated part of the unit tests for print reads. Needs a more thorough cleaning.
2012-01-12 16:33:55 -05:00
Matt Hanna 2c3176eb80 Merged bug fix from Stable into Unstable 2012-01-12 13:31:10 -05:00
Matt Hanna cd43f016ce Fixed NPE in getNextOverlappingBAMScheduleEntry() when mixed mapped/unmapped interval lists are used. Added integration test to verify behavior. 2012-01-12 13:29:11 -05:00
Eric Banks ed34b4f088 Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-12 10:27:26 -05:00
Eric Banks e7fe9910f7 Create the temp storage for calculating cell values just once as per Mark's TODO 2012-01-12 10:27:10 -05:00
Eric Banks f5f5ed5dcd Don't initialize the cell conformation values (use an else in the loop instead) as per Mark's TODO 2012-01-12 08:50:03 -05:00
Eric Banks 410a340ef5 Swapping the iteration order to run over AF conformations and then samples, instead of the reverse, minimizes calls to HashMap.get: instead of O(n) calls (one per sample), there are now O(1). Runtime on the T2D GENES test set is reduced by 5-10%. More optimizations to follow. 2012-01-12 02:04:03 -05:00
Mauricio Carneiro 423d4ac2d3 Quick fix to CalibrateGenotypeLikelihoods
We were using an old check for no-calls that no longer works.
2012-01-11 17:47:44 -05:00
Mauricio Carneiro 77a03c9709 Patching special case in the adaptor clipping
   * if the adaptor boundary is more than MAXIMUM_ADAPTOR_SIZE bases away from the read, then let's not clip anything and consider the fragment to be undetermined for this read pair.
   * updated MD5s accordingly
2012-01-11 17:47:44 -05:00
Mark DePristo 34cf2fe43b Merged bug fix from Stable into Unstable 2012-01-11 08:55:20 -05:00
Mark DePristo 2e47336a81 Only print out error report for most recent release in runGATKReport.py 2012-01-11 08:54:46 -05:00
Khalid Shakir aae61767c6 queueJobReport now compresses PDF when running R 2.13+.
Updated PostCallingQC.scala's VE and R to include missense to silent ratio and plot.
2012-01-10 17:32:30 -05:00
Khalid Shakir a9a6516527 Merged bug fix from Stable into Unstable 2012-01-10 16:16:10 -05:00
Khalid Shakir ef50e77ee2 When running Queue jobs locally, merge the stderr to the stdout log if the error file is NOT specified.
Updated VE strats in the HSP for plotting Ka/Ks by AC.
2012-01-10 16:10:25 -05:00
Eric Banks 3475bfafd3 Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-10 12:39:15 -05:00
Mauricio Carneiro 5bf960deb8 Adding dbSNP to indel VQSR 2012-01-10 12:38:49 -05:00
Eric Banks 25d0d53d88 Moving the approximate summing of log10 vals to MathUtils; keeping the more efficient implementation of fast rounding. 2012-01-10 12:38:47 -05:00
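For reference, a common way to approximate a sum of log10 values uses the identity log10(10^a + 10^b) = max(a, b) + log10(1 + 10^-|a - b|), reading the correction term from a precomputed table so no exponentiation is needed per call. The Python sketch below assumes that approach; the function names, table spacing (`STEPS_PER_UNIT`) and cutoff (`MAX_DIFF`) are hypothetical and not taken from the GATK's MathUtils.

```python
import math
from functools import reduce

# Hypothetical resolution and cutoff, not taken from the GATK source.
STEPS_PER_UNIT = 1000  # table spacing of 0.001 in log10 units
MAX_DIFF = 8           # beyond this gap the smaller term is ignored

# _TABLE[i] = log10(1 + 10^(-i / STEPS_PER_UNIT)), precomputed once.
_TABLE = [math.log10(1.0 + 10.0 ** (-i / STEPS_PER_UNIT))
          for i in range(MAX_DIFF * STEPS_PER_UNIT + 1)]


def approx_log10_sum_log10(a: float, b: float) -> float:
    """Approximate log10(10**a + 10**b) with a single table lookup."""
    big, small = (a, b) if a >= b else (b, a)
    diff = big - small
    if diff >= MAX_DIFF:
        return big  # the smaller term contributes almost nothing
    # Fast rounding: diff is non-negative, so int(x + 0.5) rounds to nearest.
    return big + _TABLE[int(diff * STEPS_PER_UNIT + 0.5)]


def approx_log10_sum_log10_all(values):
    """Fold the pairwise approximation over a non-empty list of log10 values."""
    return reduce(approx_log10_sum_log10, values)
```

With a table step of 0.001, the rounding error of each pairwise step stays below about 0.00025 in log10 units, which is the trade-off such an approximation makes for speed.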
Eric Banks 589397d611 Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-10 12:36:48 -05:00
Eric Banks c5320ef1af Resolving changes in integration test during merge 2012-01-10 12:14:16 -05:00