9406 Commits (d8f6bc232b7d2dfe7b797b2b3e6cd7ff58ae271c)

Author SHA1 Message Date
Ryan Poplin 0268da7560 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-18 09:53:00 -05:00
Ryan Poplin 60024e0d7b updating TDT integration test 2012-01-18 09:52:50 -05:00
David Roazen b7c65cb089 Merged bug fix from Stable into Unstable 2012-01-18 09:52:47 -05:00
Ryan Poplin 11982b5a34 We no longer calculate the population-level TDT statistic if there are fewer than 5 trios with full genotype likelihood information. When there is a high degree of missingness the results are skewed or in the worst case come out as NaN. 2012-01-18 09:42:41 -05:00
Mark DePristo ca11f68303 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-18 08:29:03 -05:00
Mark DePristo 9e77facda5 More analyses for random forest test script forest.R 2012-01-18 08:28:47 -05:00
Mark DePristo 5bd1a45879 Usability improvements to analyzeRunReports
-- Print out the name / db of the SQL server, not a Python connection object
-- Print out the ID, not a Python object, of the XML record that fails to convert
2012-01-18 08:27:15 -05:00
Mark DePristo b52db51599 Don't try to write log to a nonexistent file 2012-01-18 08:26:49 -05:00
Mark DePristo 763c81d520 No longer enforce MAX_ALLELE_SIZE in VCF codec
-- Instead issue a warning when a large (>1MB) record is encountered
-- Optimized ref.getBytes()[i] => (byte)ref.charAt(i), which avoids an implicit O(n) allocation each iteration through computeReverseClipping()
2012-01-18 07:35:11 -05:00
Mark DePristo 0c7865fdb5 UnitTest for reverseAlleleClipping
-- No code modified yet, just implementing a unit test to ensure correctness of the existing code
2012-01-18 07:35:11 -05:00
David Roazen d5199db8ec Be explicit about setting the snpEff -onlyCoding option in the pipeline
When run without an explicit -onlyCoding option, as we've been doing up to
now, snpEff automatically sets -onlyCoding to "true" provided that there is
at least one transcript marked as "protein_coding", which will always be the
case for us in practice (and indeed, all pipeline runs so far with snpEff
2.0.5 have run with -onlyCoding auto-set to "true").

However, given the disastrous effect on annotation quality setting
"-onlyCoding false" has, we wish to be explicit with this option
rather than relying on snpEff's auto-detection logic.
2012-01-17 20:04:27 -05:00
Christopher Hartl 9770250b72 Fix for Amy W - evidently binding defaults are not null but an unbound object, which caused the wrong branch to be taken. 2012-01-17 17:28:58 -05:00
Mark DePristo b0560f9440 Rev. tribble to fix BED codec bug in tribble 51 2012-01-17 16:40:26 -05:00
Mark DePristo 62801e430a Bugfix for unnecessary optimization
-- don't cache the ref bytes
2012-01-17 16:40:26 -05:00
Mark DePristo f2b0575dee Detect unreasonably large allele strings (>2^16) and throw an error
-- samtools can emit alleles where the ref is 42M Ns and this caused the GATK (via tribble) to hang in several places.
-- Tribble was updated so we actually could read the line properly (rev. to 51 here).
-- Still, the parsing algorithms in the GATK aren't happy with such a long allele. Instead of optimizing the code around an improper use case, I put in a limit of 2^16 bp for any allele and throw a meaningful exception when a longer one is encountered.
2012-01-17 16:40:26 -05:00
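A minimal Python sketch of the length guard described in the commit above. It is not the GATK code: the exception class and function name are hypothetical, and only the 2^16 bp limit comes from the commit message.

```python
# Limit stated in the commit message: alleles longer than 2^16 bp are rejected.
MAX_ALLELE_SIZE = 2 ** 16


class AlleleTooLongError(ValueError):
    """Hypothetical exception type, used only in this sketch."""


def check_allele_length(allele, position):
    """Return the allele unchanged, or raise if it exceeds the limit.

    `position` is only used to make the error message informative.
    """
    if len(allele) > MAX_ALLELE_SIZE:
        raise AlleleTooLongError(
            "Allele of length %d at %s exceeds the limit of %d bp"
            % (len(allele), position, MAX_ALLELE_SIZE)
        )
    return allele
```

Failing fast with a descriptive message keeps a pathological record (such as a reference allele of millions of Ns) from stalling later parsing steps.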
Menachem Fromer 816dcf9616 Finally got around to adding support for Eric's fix to permit annotation exclusion by VariantAnnotator 2012-01-17 16:35:16 -05:00
Ryan Poplin 8b0ddf0aaf Adding notes to CountCovariates docs about using interval lists as database of known variation 2012-01-17 16:13:13 -05:00
Mauricio Carneiro ff2fc514ae Updated plots to CGL walker
a few updates on the CalibrateGenotypeLikelihoods walker output

   * Fixed ggplot2 issue with dataset with poor coverage
   * Added jitter as default geometry
   * Dropped the cut by technology from the graphs
2012-01-17 15:14:47 -05:00
Ryan Poplin 56761297dd Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-17 15:03:32 -05:00
Ryan Poplin 75f87db468 Replacing Mills file with new gold standard indel set in the resource bundle for release with v1.5 2012-01-17 15:02:45 -05:00
Matt Hanna 40ebc17437 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-17 14:49:17 -05:00
Matt Hanna 41d70abe4e At chartl's request, add the bwa aln -N and bwa aln -m parameters to the bindings. 2012-01-17 14:47:53 -05:00
Mark DePristo 2390449f0f Local and S3 archiving scripts now push data to MySQL as well 2012-01-17 14:42:48 -05:00
Ryan Poplin ae259f81cc Bug fixing for merging of read fragments when one fragment contained an indel 2012-01-17 14:39:27 -05:00
Menachem Fromer 80a1ae254b Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-17 14:25:40 -05:00
Menachem Fromer 284a8e9ddc Fixed to match recent minor updates by Khalid and Eric 2012-01-17 14:24:41 -05:00
Christopher Hartl cde224746f Bait Redesign supports baits that overlap, by picking only the start of intervals.
CalibrateGenotypeLikelihoods supports using an external VCF as input for genotype likelihoods. Currently can be a per-sample VCF, but has un-implemented methods for allowing a read-group VCF to be used.

Removed the old constrained genotyping code from UGE -- the trellis calculated is exactly the same as that done in the MLE AC estimate; so we should just re-use that one.
2012-01-17 13:51:05 -05:00
Ryan Poplin 8e23c98dd9 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-17 13:46:28 -05:00
Matt Hanna 32ccde374b Merged bug fix from Stable into Unstable 2012-01-17 11:08:35 -05:00
Matt Hanna 3ba918aff1 Error message cleanup in BAM indexing code. 2012-01-17 11:05:42 -05:00
Mark DePristo aa8a885a5b Generalizing forest.R analysis script
-- Support for N tree analyses
-- Testing of NA omit and roughfix options
-- Misc. analyses and refactoring
2012-01-16 09:33:41 -05:00
Mark DePristo 8ddac9a06f Don't show individual jobs in queueStatus for gsaadm, just count 2012-01-16 09:33:05 -05:00
Mark DePristo 61f82f138f Extract a high-level GATK version from the SVN / GIT full version numbers in analyzeRunReports
-- Maps SVN versions 1.0.5988 for example to 0.5, 1.0.6134 to 0.6, etc
-- Maps GIT versions 1.x-XXX to 1.x

Used in tableau analyses
2012-01-16 09:30:48 -05:00
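A rough sketch of the mapping described in the commit above. The rule for SVN versions (thousands digit of the revision) is an assumption inferred only from the two examples given; the function name is hypothetical and this is not the analyzeRunReports code.

```python
import re


def high_level_version(full_version):
    """Map a full GATK version string to a coarse release label.

    Assumed rules, inferred from the commit message examples only:
      - SVN-style "1.0.<revision>" -> "0.<revision // 1000>"
      - Git-style "1.x-XXX"        -> "1.x"
    Returns None for strings that match neither pattern.
    """
    svn = re.fullmatch(r"1\.0\.(\d+)", full_version)
    if svn:
        return "0.%d" % (int(svn.group(1)) // 1000)
    git = re.fullmatch(r"(\d+\.\d+)-.+", full_version)
    if git:
        return git.group(1)
    return None
```

Under these assumptions, `high_level_version("1.0.5988")` gives `"0.5"` and `high_level_version("1.0.6134")` gives `"0.6"`, matching the examples in the message.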
Mauricio Carneiro 8272c8bd26 Added exceptions to CGL walker
   * Assert that a user provided a VCF not some other type of ROD
   * Assert that the VCF has samples
   * Assert that the samples in the BAM exist in the VCF
   * Warn the user if not all samples in the BAM are present in the VCF
2012-01-14 14:10:19 -05:00
Mauricio Carneiro cec7107762 Better location for the downsampling of reads in PrintReads
   * using the filter() instead of map() makes for a cleaner walker.
   * renaming the unit tests to make more sense with the other unit and integration tests
2012-01-14 14:06:09 -05:00
Mauricio Carneiro 3a9d9789ae Removing old scripts for genotype accuracy 2012-01-13 16:57:05 -05:00
Mauricio Carneiro 3110a8b69d Genotype likelihoods calibration tool refactored
   * automatically generates pdf with all the plots
   * new and updated documentation
   * R script now lives in the classpath (under private)
2012-01-13 16:34:36 -05:00
Khalid Shakir ca48f04fb8 Better handling in pre QC R scripts for older projects (whole_exome_agilent_designed_120) that came out before some metrics were added to Picard.
PCT_PF_READS was plotted with a plot title for PCT_PF_ALIGNED_READS. Now plotting both metrics separately.
2012-01-13 16:31:56 -05:00
Mark DePristo b06074d6e7 Updated SortingVCFWriterBase to use PriorityBlockingQueue so that the class is thread-safe
-- Uses PriorityBlockingQueue instead of PriorityQueue
-- synchronized keywords added to all key functions that modify internal state

Note that this hasn't been tested extensively. Based on report:

http://getsatisfaction.com/gsa/topics/missing_loci_output_in_multi_thread_mode_when_implement_sortingvcfwriterbase?utm_content=topic_link&utm_medium=email&utm_source=new_topic
2012-01-13 09:33:16 -05:00
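The commit above concerns Java's PriorityBlockingQueue. As a loose analogue only, the sketch below uses Python's thread-safe `queue.PriorityQueue` to show the idea of several threads feeding one position-ordered buffer. The function name and record layout are hypothetical, and this does not reproduce SortingVCFWriterBase.

```python
import queue
import threading


def collect_sorted(chunks):
    """Let one thread per chunk push (position, record) pairs into a
    shared thread-safe priority queue, then drain it in position order."""
    pq = queue.PriorityQueue()  # put() and get() are safe across threads

    def worker(chunk):
        for position, record in chunk:
            pq.put((position, record))

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # all producers are done before we drain

    ordered = []
    while not pq.empty():
        ordered.append(pq.get())
    return ordered
```

With an unsynchronized queue, concurrent inserts could corrupt the internal heap and drop or misorder records, which is the kind of symptom the linked report describes.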
Mauricio Carneiro 28aa353501 Added "unbiased" downsampling parameter to PrintReads
* also cleaned up and updated part of the unit tests for print reads. Needs a more thorough cleaning.
2012-01-12 16:33:55 -05:00
Matt Hanna 2c3176eb80 Merged bug fix from Stable into Unstable 2012-01-12 13:31:10 -05:00
Matt Hanna cd43f016ce Fixed NPE in getNextOverlappingBAMScheduleEntry() when mixed mapped/unmapped interval lists are used. Added integrationtest to verify behavior. 2012-01-12 13:29:11 -05:00
Eric Banks ed34b4f088 Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-12 10:27:26 -05:00
Eric Banks e7fe9910f7 Create the temp storage for calculating cell values just once as per Mark's TODO 2012-01-12 10:27:10 -05:00
Eric Banks f5f5ed5dcd Don't initialize the cell conformation values (use an else in the loop instead) as per Mark's TODO 2012-01-12 08:50:03 -05:00
Eric Banks 410a340ef5 Swapping the iteration order to run over AF conformations and then samples, instead of the reverse, minimizes calls to HashMap.get: the number of calls per conformation was O(n) because we called it for each sample, and is now O(1). Runtime on T2D GENES test set is reduced by 5-10%. More optimizations to follow. 2012-01-12 02:04:03 -05:00
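The loop swap described above can be illustrated with a generic Python sketch, using a dict in place of the Java HashMap. All names here are hypothetical and the data layout is an assumption; it only shows why iterating over keys in the outer loop cuts the lookups from one per (key, sample) pair to one per key.

```python
class CountingDict(dict):
    """A dict that counts item lookups, to make the difference visible."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = 0

    def __getitem__(self, key):
        self.lookups += 1
        return super().__getitem__(key)


def total_samples_outer(table, keys, n_samples):
    """Samples in the outer loop: one lookup per (sample, key) pair."""
    total = 0
    for s in range(n_samples):
        for k in keys:
            total += table[k][s]
    return total


def total_keys_outer(table, keys, n_samples):
    """Keys in the outer loop: one lookup per key, reused for all samples."""
    total = 0
    for k in keys:
        row = table[k]  # hoisted lookup
        for s in range(n_samples):
            total += row[s]
    return total
```

Both functions compute the same sum; only the number of dictionary lookups differs.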
Mauricio Carneiro 423d4ac2d3 Quick fix to CalibrateGenotypeLikelihoods
we were using an old check for no calls that doesn't work anymore.
2012-01-11 17:47:44 -05:00
Mauricio Carneiro 77a03c9709 Patching special case in the adaptor clipping
   * if the adaptor boundary is more than MAXIMUM_ADAPTOR_SIZE bases away from the read, then let's not clip anything and consider the fragment to be undetermined for this read pair.
   * updated md5's accordingly
2012-01-11 17:47:44 -05:00
Mark DePristo 34cf2fe43b Merged bug fix from Stable into Unstable 2012-01-11 08:55:20 -05:00
Mark DePristo 2e47336a81 Only print out error report for most recent release in runGATKReport.py 2012-01-11 08:54:46 -05:00