Mark DePristo
ca11f68303
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-18 08:29:03 -05:00
Mark DePristo
9e77facda5
More analyses for random forest test script forest.R
2012-01-18 08:28:47 -05:00
Mark DePristo
5bd1a45879
Usability improvements to analyzeRunReports
-- Print the name / database of the SQL server, not a Python connection object
-- Print the ID of an XML record that fails to convert, not the Python object
2012-01-18 08:27:15 -05:00
Mark DePristo
b52db51599
Don't try to write the log to a non-existent file
2012-01-18 08:26:49 -05:00
Mark DePristo
763c81d520
No longer enforce MAX_ALLELE_SIZE in VCF codec
-- Instead issue a warning when a large (>1MB) record is encountered
-- Optimized ref.getBytes()[i] => (byte)ref.charAt(i), which avoids an implicit O(n) byte-array allocation on every iteration of computeReverseClipping()
2012-01-18 07:35:11 -05:00
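A minimal Python sketch of reverse allele clipping with index-based access, assuming a simplified rule: count the trailing bases shared by the reference and every alternate allele, and never clip an allele to zero length. The function name and the warning constant are hypothetical; this is not the GATK's computeReverseClipping(). In Python, `ref[i]` is O(1), whereas `ref.encode()[i]` would copy the whole string on every call, which is the cost the Java change removes.

```python
import warnings
from typing import Sequence

# Hypothetical threshold mirroring the ">1MB" warning described above.
LARGE_ALLELE_WARNING_BP = 1_000_000


def compute_reverse_clipping(ref: str, alts: Sequence[str]) -> int:
    """Count trailing bases shared by ref and every alt allele.

    Simplified rule (an assumption): stop at the first mismatch, and never
    clip any allele down to zero length.
    """
    if len(ref) > LARGE_ALLELE_WARNING_BP:
        # Warn instead of rejecting the record outright.
        warnings.warn(f"very large reference allele ({len(ref)} bp)")
    clip = 0
    while clip < len(ref) - 1:
        # Index the string directly; ref.encode()[i] would copy the whole
        # string on every iteration.
        ref_base = ref[len(ref) - 1 - clip]
        for alt in alts:
            if clip >= len(alt) - 1 or alt[len(alt) - 1 - clip] != ref_base:
                return clip
        clip += 1
    return clip
```

For example, ref `AT` with alt `CAT` clips one shared trailing `T`, leaving `A` and `CA`.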
Mark DePristo
0c7865fdb5
UnitTest for reverseAlleleClipping
-- No code modified yet, just implementing a unit test to ensure correctness of the existing code
2012-01-18 07:35:11 -05:00
Christopher Hartl
9770250b72
Fix for Amy W: evidently binding defaults are not null but an unbound object, which caused the wrong branch to be taken.
2012-01-17 17:28:58 -05:00
Mark DePristo
b0560f9440
Rev. tribble to fix BED codec bug in tribble 51
2012-01-17 16:40:26 -05:00
Mark DePristo
62801e430a
Bugfix for unnecessary optimization
-- don't cache the ref bytes
2012-01-17 16:40:26 -05:00
Mark DePristo
f2b0575dee
Detect unreasonably large allele strings (>2^16) and throw an error
-- samtools can emit alleles where the ref is 42M Ns, and this caused the GATK (via tribble) to hang in several places.
-- Tribble was updated so that we can actually read the line properly (revved to 51 here).
-- Still, the parsing algorithms in the GATK can't cope with such a long allele. Rather than optimizing the code for an improper use case, I put in a limit of 2^16 bp for any allele and throw a meaningful exception when it is exceeded.
2012-01-17 16:40:26 -05:00
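A minimal sketch of the length check, assuming alleles arrive as plain strings. `check_allele_lengths` and `AlleleTooLongError` are hypothetical names, not the GATK's API; only the 2^16 bp limit comes from the commit above.

```python
from typing import List

# Limit described in the commit message: 2^16 bp per allele.
MAX_ALLELE_SIZE = 2 ** 16


class AlleleTooLongError(ValueError):
    """Hypothetical exception type for an over-long allele."""


def check_allele_lengths(alleles: List[str], max_size: int = MAX_ALLELE_SIZE) -> List[str]:
    for allele in alleles:
        if len(allele) > max_size:
            # Fail fast with a clear message instead of hanging in later parsing.
            raise AlleleTooLongError(
                f"allele of {len(allele)} bp exceeds the {max_size} bp limit; "
                "the record is probably malformed"
            )
    return alleles
```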
Menachem Fromer
816dcf9616
Finally got around to adding support for Eric's fix to permit annotation exclusion by VariantAnnotator
2012-01-17 16:35:16 -05:00
Mauricio Carneiro
ff2fc514ae
Updated plots to CGL walker
A few updates to the CalibrateGenotypeLikelihoods walker output:
* Fixed ggplot2 issue with poorly covered datasets
* Added jitter as default geometry
* Dropped the cut by technology from the graphs
2012-01-17 15:14:47 -05:00
Ryan Poplin
56761297dd
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-17 15:03:32 -05:00
Ryan Poplin
75f87db468
Replacing Mills file with new gold standard indel set in the resource bundle for release with v1.5
2012-01-17 15:02:45 -05:00
Matt Hanna
40ebc17437
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-17 14:49:17 -05:00
Matt Hanna
41d70abe4e
At chartl's request, add the bwa aln -N and bwa aln -m parameters to the bindings.
2012-01-17 14:47:53 -05:00
Mark DePristo
2390449f0f
Local and S3 archiving scripts now push data to MySQL as well
2012-01-17 14:42:48 -05:00
Menachem Fromer
80a1ae254b
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-17 14:25:40 -05:00
Menachem Fromer
284a8e9ddc
Fixed to match recent minor updates by Khalid and Eric
2012-01-17 14:24:41 -05:00
Christopher Hartl
cde224746f
Bait Redesign now supports overlapping baits by picking only the start of each interval.
CalibrateGenotypeLikelihoods can use an external VCF as the source of genotype likelihoods. Currently this must be a per-sample VCF; methods for using a read-group VCF exist but are not yet implemented.
Removed the old constrained genotyping code from UGE: the trellis it calculated is exactly the same as the one computed for the MLE AC estimate, so we should just reuse that one.
2012-01-17 13:51:05 -05:00
Matt Hanna
32ccde374b
Merged bug fix from Stable into Unstable
2012-01-17 11:08:35 -05:00
Matt Hanna
3ba918aff1
Error message cleanup in BAM indexing code.
2012-01-17 11:05:42 -05:00
Mark DePristo
aa8a885a5b
Generalizing forest.R analysis script
-- Support for N tree analyses
-- Testing of NA omit and roughfix options
-- Misc. analyses and refactoring
2012-01-16 09:33:41 -05:00
Mark DePristo
8ddac9a06f
Don't show individual jobs in queueStatus for gsaadm, just the count
2012-01-16 09:33:05 -05:00
Mark DePristo
61f82f138f
Extract a high-level GATK version from the SVN / GIT full version numbers in analyzeRunReports
-- Maps SVN versions to 0.x, e.g. 1.0.5988 to 0.5, 1.0.6134 to 0.6, etc.
-- Maps GIT versions 1.x-XXX to 1.x
Used in Tableau analyses
2012-01-16 09:30:48 -05:00
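A sketch of the version mapping in Python. The SVN rule (minor version = SVN revision // 1000) is inferred from the two examples above, and the Git format `1.x-<commits>-g<hash>` is an assumption; `high_level_version` is a hypothetical name, not the function in analyzeRunReports.

```python
import re


def high_level_version(full_version: str) -> str:
    """Collapse a full GATK version string to a coarse release label (sketch)."""
    # SVN era, e.g. "1.0.5988": assumed rule is minor = revision // 1000.
    svn = re.fullmatch(r"1\.0\.(\d+)", full_version)
    if svn:
        return "0.%d" % (int(svn.group(1)) // 1000)
    # Git era, e.g. "1.4-11-g1a2b3c": keep the leading "major.minor".
    git = re.match(r"(\d+\.\d+)-", full_version)
    if git:
        return git.group(1)
    return "unknown"
```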
Mauricio Carneiro
8272c8bd26
Added exceptions to CGL walker
* Assert that the user provided a VCF, not some other type of ROD
* Assert that the VCF has samples
* Assert that the samples in the BAM exist in the VCF
* Warn the user if not all samples in the BAM are present in the VCF
2012-01-14 14:10:19 -05:00
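A sketch of these checks, assuming sample names are available as plain lists. It reads the two sample rules as: fail if no BAM sample appears in the VCF, warn if only some do. That interpretation and the name `validate_vcf_samples` are assumptions, not the walker's code.

```python
import warnings
from typing import Iterable, Set


def validate_vcf_samples(track_type: str, vcf_samples: Iterable[str],
                         bam_samples: Iterable[str]) -> Set[str]:
    """Return the BAM samples that are also in the VCF, or raise (sketch)."""
    if track_type != "VCF":
        raise ValueError(f"expected a VCF track, got {track_type}")
    vcf, bam = set(vcf_samples), set(bam_samples)
    if not vcf:
        raise ValueError("the VCF has no samples")
    shared = bam & vcf
    if not shared:
        raise ValueError("none of the BAM samples are present in the VCF")
    missing = bam - vcf
    if missing:
        # Partial overlap is allowed, but the user is told what is missing.
        warnings.warn(f"BAM samples missing from the VCF: {sorted(missing)}")
    return shared
```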
Mauricio Carneiro
cec7107762
Better location for the downsampling of reads in PrintReads
* using filter() instead of map() makes for a cleaner walker
* renamed the unit tests to be consistent with the other unit and integration tests
2012-01-14 14:06:09 -05:00
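A sketch of per-read downsampling expressed as a filter rather than inside map(), assuming a keep-with-probability-`fraction` rule and that filter() returns True for reads that should be processed. `DownsamplingFilter` is hypothetical, and this is not necessarily the scheme PrintReads implements.

```python
import random
from typing import Any, Optional


class DownsamplingFilter:
    """Keep each read independently with probability `fraction` (hypothetical)."""

    def __init__(self, fraction: float, seed: Optional[int] = None):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0, 1]")
        self.fraction = fraction
        self.rng = random.Random(seed)

    def filter(self, read: Any) -> bool:
        # True means "process this read"; map() then only sees kept reads,
        # so it no longer needs any downsampling logic of its own.
        return self.rng.random() < self.fraction
```

Keeping the sampling decision in filter() leaves map() responsible only for writing reads, which is the separation the commit describes.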
Mauricio Carneiro
3a9d9789ae
Removing old scripts for genotype accuracy
2012-01-13 16:57:05 -05:00
Mauricio Carneiro
3110a8b69d
Genotype likelihoods calibration tool refactored
* automatically generates pdf with all the plots
* new and updated documentation
* R script now lives in the classpath (under private)
2012-01-13 16:34:36 -05:00
Khalid Shakir
ca48f04fb8
Better handling in the pre-QC R scripts for older projects (whole_exome_agilent_designed_120) that predate the addition of some metrics to Picard.
PCT_PF_READS was plotted under the title for PCT_PF_ALIGNED_READS; both metrics are now plotted separately.
2012-01-13 16:31:56 -05:00
Mark DePristo
b06074d6e7
Updated SortingVCFWriterBase to use PriorityBlockingQueue so that the class is thread-safe
-- Uses PriorityBlockingQueue instead of PriorityQueue
-- synchronized keyword added to all key methods that modify internal state
Note that this hasn't been tested extensively. Based on this report:
http://getsatisfaction.com/gsa/topics/missing_loci_output_in_multi_thread_mode_when_implement_sortingvcfwriterbase?utm_content=topic_link&utm_medium=email&utm_source=new_topic
2012-01-13 09:33:16 -05:00
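A Python analogue of a thread-safe sorting writer: a heap guarded by one lock stands in for the PriorityBlockingQueue plus `synchronized` methods (Python's heapq is not thread-safe on its own). It assumes records arrive no more than `max_caching_distance` positions out of order; `SortingWriter` and its parameters are hypothetical, not SortingVCFWriterBase's API.

```python
import heapq
import threading
from typing import Any, Callable, List, Tuple


class SortingWriter:
    """Buffers (position, record) pairs and emits records in position order."""

    def __init__(self, sink: Callable[[Any], None], max_caching_distance: int):
        self._sink = sink  # called once per record, in sorted order
        self._max_dist = max_caching_distance
        self._heap: List[Tuple[int, int, Any]] = []
        self._counter = 0  # tiebreaker: equal positions keep insertion order
        self._lock = threading.Lock()  # guards all state below

    def add(self, pos: int, record: Any) -> None:
        with self._lock:
            heapq.heappush(self._heap, (pos, self._counter, record))
            self._counter += 1
            # Flush records that are far enough behind the newest position.
            while self._heap and self._heap[0][0] < pos - self._max_dist:
                self._sink(heapq.heappop(self._heap)[2])

    def close(self) -> None:
        with self._lock:
            while self._heap:
                self._sink(heapq.heappop(self._heap)[2])
```

The lock matters as much as the queue: without it, two threads could interleave the push and the flush loop and emit records out of order.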
Mauricio Carneiro
28aa353501
Added "unbiased" downsampling parameter to PrintReads
* also cleaned up and updated part of the PrintReads unit tests; they still need a more thorough cleanup.
2012-01-12 16:33:55 -05:00
Matt Hanna
2c3176eb80
Merged bug fix from Stable into Unstable
2012-01-12 13:31:10 -05:00
Matt Hanna
cd43f016ce
Fixed NPE in getNextOverlappingBAMScheduleEntry() when mixed mapped/unmapped interval lists are used. Added an integration test to verify the behavior.
2012-01-12 13:29:11 -05:00
Eric Banks
ed34b4f088
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-12 10:27:26 -05:00
Eric Banks
e7fe9910f7
Create the temporary storage for calculating cell values only once, as per Mark's TODO
2012-01-12 10:27:10 -05:00
Eric Banks
f5f5ed5dcd
Don't initialize the cell conformation values (use an else in the loop instead) as per Mark's TODO
2012-01-12 08:50:03 -05:00
Eric Banks
410a340ef5
Swapping the iteration order to loop over AF conformations first and then samples, instead of the reverse, minimizes calls to HashMap.get: the number of lookups per conformation drops from O(n), one per sample, to O(1). Runtime on the T2D GENES test set is reduced by 5-10%. More optimizations to follow.
2012-01-12 02:04:03 -05:00
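A sketch of why the loop order matters, assuming a dict keyed by AF conformation whose values are per-sample lists; summing the values stands in for the real per-cell computation, and the function names are hypothetical. Both orders give the same total, but the conformation-outer loop performs one dict lookup per conformation instead of one per (sample, conformation) pair.

```python
from typing import Dict, List, Tuple


def samples_outer(table: Dict[str, List[float]], n_samples: int) -> Tuple[float, int]:
    total, lookups = 0.0, 0
    for i in range(n_samples):
        for conf in table:
            row = table[conf]  # dict lookup repeated for every sample
            lookups += 1
            total += row[i]
    return total, lookups


def conformations_outer(table: Dict[str, List[float]], n_samples: int) -> Tuple[float, int]:
    total, lookups = 0.0, 0
    for conf in table:
        row = table[conf]  # one dict lookup per conformation
        lookups += 1
        for i in range(n_samples):
            total += row[i]
    return total, lookups
```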
Mauricio Carneiro
423d4ac2d3
Quick fix to CalibrateGenotypeLikelihoods
We were using an old check for no-calls that no longer works.
2012-01-11 17:47:44 -05:00
Mauricio Carneiro
77a03c9709
Patching special case in the adaptor clipping
* if the adaptor boundary is more than MAXIMUM_ADAPTOR_SIZE bases away from the read, don't clip anything and consider the fragment undetermined for this read pair.
* updated MD5s accordingly
2012-01-11 17:47:44 -05:00
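A sketch of the special case, assuming the adaptor boundary has already been computed as a reference coordinate. `usable_adaptor_boundary` is hypothetical, and the value 8 for MAXIMUM_ADAPTOR_SIZE is illustrative only; returning None stands for "undetermined, don't clip".

```python
from typing import Optional

# Illustrative value only; the real constant may differ.
MAXIMUM_ADAPTOR_SIZE = 8


def usable_adaptor_boundary(read_start: int, read_end: int, boundary: Optional[int],
                            max_adaptor_size: int = MAXIMUM_ADAPTOR_SIZE) -> Optional[int]:
    """Return the boundary if it can be used for clipping, else None (undetermined)."""
    if boundary is None:
        return None
    if boundary < read_start - max_adaptor_size or boundary > read_end + max_adaptor_size:
        # Too far from the read: clip nothing and treat the fragment as undetermined.
        return None
    return boundary
```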
Mark DePristo
34cf2fe43b
Merged bug fix from Stable into Unstable
2012-01-11 08:55:20 -05:00
Mark DePristo
2e47336a81
Only print the error report for the most recent release in runGATKReport.py
2012-01-11 08:54:46 -05:00
Khalid Shakir
aae61767c6
queueJobReport now compresses PDFs when running R 2.13+.
Updated PostCallingQC.scala's VE and R script to include the missense-to-silent ratio and its plot.
2012-01-10 17:32:30 -05:00
Khalid Shakir
a9a6516527
Merged bug fix from Stable into Unstable
2012-01-10 16:16:10 -05:00
Khalid Shakir
ef50e77ee2
When running Queue jobs locally, merge stderr into the stdout log if no error file is specified.
Updated VE strats in the HSP for plotting Ka/Ks by AC.
2012-01-10 16:10:25 -05:00
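The same behavior expressed with Python's subprocess module: passing `stderr=subprocess.STDOUT` merges stderr into the stdout log. `run_local_job` is a hypothetical helper for illustration and is not how Queue launches jobs.

```python
import subprocess
from typing import Optional, Sequence


def run_local_job(cmd: Sequence[str], out_path: str, err_path: Optional[str] = None) -> int:
    with open(out_path, "w") as out:
        if err_path is None:
            # No error file given: send stderr into the same log as stdout.
            return subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT).returncode
        with open(err_path, "w") as err:
            # Error file given: keep the two streams separate.
            return subprocess.run(cmd, stdout=out, stderr=err).returncode
```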
Eric Banks
3475bfafd3
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-10 12:39:15 -05:00
Mauricio Carneiro
5bf960deb8
Adding dbSNP to indel VQSR
2012-01-10 12:38:49 -05:00
Eric Banks
25d0d53d88
Moving the approximate summing of log10 values to MathUtils; keeping the more efficient fast-rounding implementation.
2012-01-10 12:38:47 -05:00
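The identity behind approximate log10 summing is log10(10^a + 10^b) = max(a, b) + log10(1 + 10^-|a-b|), where the correction term can be read from a precomputed table. A sketch under assumed parameters (table resolution 0.001, cutoff at a difference of 8); the names are hypothetical and this is not MathUtils's code. `fast_round` adds 0.5 and truncates, which is only valid for non-negative inputs.

```python
import math

# Assumed table parameters: resolution 1/1000 in log10 units, cutoff at 8.
INV_STEP = 1000
MAX_DIFF = 8  # beyond this, log10(1 + 10**-d) < 5e-9 and is treated as 0

# _CORRECTION[k] = log10(1 + 10**(-k / INV_STEP)) for k = 0 .. MAX_DIFF * INV_STEP
_CORRECTION = [math.log10(1.0 + 10.0 ** (-k / INV_STEP))
               for k in range(MAX_DIFF * INV_STEP + 1)]


def fast_round(x: float) -> int:
    # Add 0.5 and truncate; correct only for x >= 0.
    return int(x + 0.5)


def approximate_log10_sum(a: float, b: float) -> float:
    """Approximate log10(10**a + 10**b) via max + table lookup."""
    hi, lo = (a, b) if a >= b else (b, a)
    if lo == -math.inf:
        return hi
    diff = hi - lo
    if diff > MAX_DIFF:
        return hi
    return hi + _CORRECTION[fast_round(diff * INV_STEP)]
```

With a table step of 0.001, the rounding error in the difference is at most 0.0005, and the correction term changes by at most half that, so the result is within about 2.5e-4 of the exact value.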
Eric Banks
589397d611
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-10 12:36:48 -05:00
Eric Banks
c5320ef1af
Resolving changes in integration test during merge
2012-01-10 12:14:16 -05:00