11422 Commits (ffbd4d85f2e0112b32df0bbba00330b00a0806cf)

Author SHA1 Message Date
Ryan Poplin 00c23bf704 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-05 15:53:05 -05:00
Ryan Poplin 234ff64556 Changes to AssessNA12878 to allow for 100s of input callsets to assess against the database. 2012-12-05 15:52:57 -05:00
Ami Levy-Moonshine 5d78a61f7a Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-05 15:07:12 -05:00
Mark DePristo d0cab795b7 Got caught in the middle of a bad integration test that was fixed in an independent push. Moved test bam into testdata. 2012-12-05 14:49:22 -05:00
Mark DePristo 465694078e Major performance improvement to the GATK engine
-- The NanoScheduler timing code (in NSRuntimeProfile) was crazy expensive, but never showed up in the profilers.  Removed all of the timing code from the NanoScheduler, the NSRuntimeProfile itself, and updated the unit tests.
-- For tools that largely pass through data quickly, this change reduces runtimes by as much as 10x.  For the RealignerTargetCreator example, the runtime before this commit was 3 hours, and after is 30 minutes (6x improvement).
-- Took this opportunity to improve the GATK ProgressMeter.  NotifyOfProgress now just keeps track of the maximum position seen, and a separate daemon thread ProgressMeterDaemon periodically wakes up and prints the current progress.  This removes all inner loop calls to the GATK timers.
-- The history of the bug started here: http://gatkforums.broadinstitute.org/discussion/comment/2402#Comment_2402
2012-12-05 14:49:22 -05:00
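The ProgressMeter change described above (inner loops only record the maximum position seen, and a separate daemon thread wakes up periodically to print it) can be sketched roughly as follows. This is a minimal illustration, not the GATK code: the class names `MaxPositionTracker` and `ProgressPrinter`, the output format, and the interval parameter are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the progress-tracking side; not the GATK ProgressMeter.
class MaxPositionTracker {
    private final AtomicLong maxPosition = new AtomicLong(-1);

    // Cheap enough for an inner loop: one atomic update, no timers, no I/O.
    void notifyOfProgress(long position) {
        maxPosition.accumulateAndGet(position, Math::max);
    }

    long current() {
        return maxPosition.get();
    }
}

// Hypothetical stand-in for the reporting side.
class ProgressPrinter {
    // Starts a daemon thread that wakes every intervalMillis and prints the
    // maximum position recorded so far. Interrupting the thread stops it.
    static Thread start(MaxPositionTracker tracker, long intervalMillis) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(intervalMillis);
                    System.out.println("progress: max position seen = " + tracker.current());
                }
            } catch (InterruptedException e) {
                // Interrupted: stop reporting.
            }
        }, "progress-printer");
        t.setDaemon(true); // a daemon thread does not keep the JVM alive
        t.start();
        return t;
    }
}
```

The point of the split is that the hot path never touches a clock; all timing lives in the one thread that sleeps between reports.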
Mark DePristo 2b601571e7 Better error handling in NanoScheduler
-- The previous NanoScheduler would deadlock in the case where an Error, not an Exception, was thrown.  Errors, like out of memory, would cause the whole system to die.  This bugfix resolves that issue.
2012-12-05 14:49:22 -05:00
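The commit message does not say how the deadlock was fixed. One common pattern for this class of bug, shown here only as an assumption about the general shape of such a fix, is to catch `Throwable` rather than `Exception` in the worker and to release any waiting thread in a `finally` block. The class name `FailFastWorker` is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; not the NanoScheduler implementation.
class FailFastWorker {
    // Runs the task on a worker thread and blocks until it finishes.
    // Returns whatever the task threw (Error or Exception), or null on success.
    static Throwable runAndWait(Runnable task) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                // Throwable covers Error (e.g. OutOfMemoryError) as well as Exception.
                failure.set(t);
            } finally {
                // Always release the waiter; skipping this on an Error is what
                // would leave the caller blocked forever.
                done.countDown();
            }
        });
        worker.start();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return failure.get();
    }
}
```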
Mark DePristo 51dbb562c9 Reduce amount of debugging information from NA12878KnowledgeBaseServer 2012-12-05 14:49:22 -05:00
Mauricio Carneiro efe256ec09 binary search implementation to find the minimum coverage
speeds up the walker from 7 days to 12 minutes on chr20.
2012-12-05 14:45:57 -05:00
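The commit gives no detail on what is being searched, so the following is only a generic sketch of the technique it names: a binary search for the smallest coverage value that satisfies some acceptance test. It assumes the test is monotone (once a coverage passes, every higher coverage passes too); the class name, the range bounds, and the predicate are hypothetical.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch; not the walker's actual code.
class MinCoverageSearch {
    // Returns the smallest coverage in [lo, hi] for which isSufficient holds,
    // or -1 if none does. Assumes isSufficient is monotone over the range.
    static int findMin(int lo, int hi, IntPredicate isSufficient) {
        int answer = -1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids int overflow for large bounds
            if (isSufficient.test(mid)) {
                answer = mid;   // mid works; look for something smaller
                hi = mid - 1;
            } else {
                lo = mid + 1;   // mid fails; the answer must be larger
            }
        }
        return answer;
    }
}
```

Compared with testing every coverage value in turn, this evaluates the predicate about log2(hi - lo) times, which is where a speedup of the reported magnitude could plausibly come from when each evaluation is expensive.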
Chris Hartl 430d6a07f2 Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable 2012-12-05 11:20:28 -05:00
Eric Banks 0c925856cb Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-05 02:00:39 -05:00
Eric Banks ef87b18e09 In retrospect, it wasn't a good idea to have FisherStrand handle reduced reads since they are always on the forward strand. For now, FS ignores reduced reads but I've added a note (and JIRA) to make this work once the RR het compression is enabled (since we will have directionality in reads then). 2012-12-05 02:00:35 -05:00
Mauricio Carneiro 13896356ad Added bootstrapping and fixed the GLM model of the FMCC 2012-12-05 01:32:19 -05:00
Mauricio Carneiro 30f013aeb0 Added a copy() method for ReadBackedPileups
necessary to create new alignment contexts with hard-copies of the pileup.
2012-12-05 01:32:18 -05:00
Mauricio Carneiro 6feda540a4 Better error message for SimpleGATKReports 2012-12-05 01:32:18 -05:00
Eric Banks 726332db79 Disabling the testNoCmdLineHeaderStdout test in UG because it keeps crashing when I run it locally 2012-12-05 00:54:00 -05:00
kshakir 61bde6210b Restored RemoteFile push and pull in base QScript. 2012-12-04 12:34:07 -05:00
Randal Moore 8d2d0253a2 introduce a level of indirection for the forum URLs - this new function will allow me a place to morph the URL into something that is supported by Confluence
Signed-off-by: Eric Banks <ebanks@broadinstitute.org>
2012-12-03 22:33:02 -05:00
Eric Banks 1af41754e3 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-03 22:01:11 -05:00
Eric Banks bca860723a Updating tests to handle bad validation data files (that used the wrong qual score encoding); overrides push from stable. 2012-12-03 22:01:07 -05:00
Eric Banks 387c0defed don't change md5 here because I am handling it separately from unstable with a better command-line in the test 2012-12-03 21:49:45 -05:00
Eric Banks ef95757311 Fix MD5 because of a need to fix a busted bam file in our validation directory (it used the wrong quality score encoding...) 2012-12-03 21:46:46 -05:00
Guillermo del Angel 4ced2e4ffc Merge branch 'develop' of github.com:broadinstitute/cmi-gatk into develop 2012-12-03 20:14:43 -05:00
Guillermo del Angel c2c6b858e3 Better checks/more flexibility in fastq2bam parsing. Immediate benefit: we can now process normal-only samples, and metadata should be able to specify tumor/normal pairs in any order. Hard-coded hacks removed. DEV-134 #resolve #time 3m 2012-12-03 20:14:37 -05:00
Menachem Fromer 472381245a Allow for more refined control of memory and queues to run with 2012-12-03 17:07:03 -05:00
Eric Banks 67932b357d Bug fix for RR: don't let the softclip start position be less than 1 2012-12-03 15:59:14 -05:00
Ryan Poplin d5ed184691 Updating the HC integration test md5s. According to the NA12878 knowledge base this commit cuts down the FP rate by more than 50 percent with no loss in sensitivity. 2012-12-03 15:38:59 -05:00
Ryan Poplin a47da9bb2f Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-03 14:30:14 -05:00
Ryan Poplin 156d6a5e0b misc minor bug fixes to GenotypingEngine. 2012-12-03 12:47:35 -05:00
Eric Banks 5fed9df295 Quick fix: base qual array in the GATKSAMRecord stores the actual phred values (-33) and not the original bytes (duh). 2012-12-03 12:18:20 -05:00
Eric Banks b6839b3049 Added checking in the GATK for mis-encoded quality scores.
The check is performed by a Read Transformer that samples (currently set to once
every 1000 reads so that we don't hurt overall GATK performance) from the input
reads and checks to make sure that none of the base quals is too high (> Q60). If
we encounter such a base then we fail with a User Error.

* Can be over-ridden with --allow_potentially_misencoded_quality_scores.
* Also, the user can choose to fix his quals on the fly (presumably using PrintReads
  to write out a fixed bam) with the --fix_misencoded_quality_scores argument.

Added unit tests.
2012-12-03 11:18:41 -05:00
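A rough sketch of the sampled check described above follows. The sampling frequency (1000) and the Q60 threshold come from the commit message; the class name, the exception type, and the fix offset of 31 (the difference between the Phred+64 and Phred+33 encodings) are assumptions made for illustration, not taken from the GATK source.

```java
// Hypothetical sketch; not the GATK Read Transformer.
class MisencodedQualityChecker {
    static final int SAMPLING_FREQUENCY = 1000; // from the commit message
    static final int MAX_ALLOWED_QUAL = 60;     // from the commit message (> Q60 fails)
    static final int ENCODING_FIX = 31;         // assumption: 64 - 33

    private long readsSeen = 0;

    // quals holds Phred values with the ASCII offset already removed.
    // Only the first read and every 1000th read after it are inspected.
    void check(byte[] quals) {
        if (readsSeen++ % SAMPLING_FREQUENCY != 0) {
            return;
        }
        for (byte q : quals) {
            if (q > MAX_ALLOWED_QUAL) {
                throw new IllegalArgumentException(
                        "base quality " + q + " exceeds Q" + MAX_ALLOWED_QUAL
                        + "; quality scores may be mis-encoded");
            }
        }
    }

    // Returns a corrected copy, assuming the input was decoded with the wrong offset.
    static byte[] fix(byte[] quals) {
        byte[] fixed = new byte[quals.length];
        for (int i = 0; i < quals.length; i++) {
            int value = quals[i] - ENCODING_FIX;
            if (value < 0) {
                throw new IllegalArgumentException("quality " + quals[i] + " is too low to be mis-encoded");
            }
            fixed[i] = (byte) value;
        }
        return fixed;
    }
}
```

Sampling trades completeness for speed: a file with only occasional high qualities can slip past the check, which is the cost of keeping it off the per-read hot path.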
Ryan Poplin 18b002c99c Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-03 10:08:56 -05:00
Eric Banks 6f523a1ea0 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-03 08:41:21 -05:00
Eric Banks 59fc7456cf Updated expectations for novel TiTv in HSP after Mark's fixes to the exact model 2012-12-03 08:41:13 -05:00
Mark DePristo f0a4710247 Callset summary now includes a table for the consensus itself 2012-12-02 16:40:12 -05:00
Mark DePristo ce9a323c04 NA12878 knowledge base automatically filters duplicate records out in the SiteIterator
-- Now it doesn't matter if there are duplicate records (all fields equal up to the date) in the knowledge base
2012-12-02 14:21:29 -05:00
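A minimal sketch of this kind of duplicate filtering, reading "all fields equal up to the date" as "identical in every field other than the date". The record fields and the names `SiteRecord` and `DuplicateFilter` are hypothetical and do not reflect the knowledge base schema or the SiteIterator code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical record type; the real knowledge base fields are not shown in the commit.
class SiteRecord {
    final String chrom;
    final int pos;
    final String ref;
    final String alt;
    final long date;

    SiteRecord(String chrom, int pos, String ref, String alt, long date) {
        this.chrom = chrom;
        this.pos = pos;
        this.ref = ref;
        this.alt = alt;
        this.date = date;
    }

    // Identity for de-duplication: every field except the date.
    String keyIgnoringDate() {
        return chrom + ":" + pos + ":" + ref + ":" + alt;
    }
}

class DuplicateFilter {
    // Keeps the first record seen for each key and drops later duplicates,
    // preserving input order.
    static List<SiteRecord> dedupe(List<SiteRecord> records) {
        Set<String> seen = new HashSet<>();
        List<SiteRecord> unique = new ArrayList<>();
        for (SiteRecord r : records) {
            if (seen.add(r.keyIgnoringDate())) {
                unique.add(r);
            }
        }
        return unique;
    }
}
```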
Ryan Poplin 1bdf17ef53 Reworking of how the likelihood calculation is organized in the HaplotypeCaller to facilitate the inclusion of per allele downsampling. We now use the downsampling for both the GL calculations and the annotation calculations. 2012-12-02 11:58:32 -05:00
Mark DePristo 1828d33a5a Bugfix to AssessNA12878
-- SiteIterator.getSitesBefore wasn't handling indel overlaps correctly. It considered both start and stop (not the correct behavior), so it only returned sites up to the first record whose stop overlapped the requested start, and incorrectly skipped variants underlying indels.
2012-12-02 11:09:15 -05:00
Eric Banks d7b951b6f3 Finished up my reviews for megabase chr20:10M-11M. Fixed out of order record from earlier. 2012-12-01 23:35:21 -05:00
Mark DePristo 2849889af5 Updating md5 for UG 2012-12-01 14:24:19 -05:00
Ami Levy-Moonshine d0b8cc7773 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-01 00:08:25 -05:00
Ami Levy-Moonshine 969c995298 work under development - catVariants. Changes to AssessRRQuals based on Eric todo comments. bug fix in CombineVariants 2012-12-01 00:08:19 -05:00
depristo 3105f13df3 Merge pull request #4 from jsilter/master
Remove validate, add note to put it back in when public gatk catches up
2012-11-30 13:24:44 -08:00
Mark DePristo 1100f0733b Reviews for all unique omni poly sites on chr20
Updated setup script to include these and ebanks reviews as well.  Eric -- your file is currently not sorted, fyi
2012-11-30 16:23:27 -05:00
Jacob Silterra 02e98fa516 Remove validate, add note to put it back in when public gatk catches up 2012-11-30 16:08:00 -05:00
Mark DePristo 8020ba14db Minor cleanup of SAMDataSource as part of my system review
-- Changed a few functions from public to protected, as they are only used by the package contents, to simplify the SAMDataSource interface
2012-11-30 15:04:41 -05:00
Mark DePristo 66bbe46e5b MongoDBManager prints out meaningful information with toString 2012-11-30 15:04:41 -05:00
Mark DePristo 3248ca3f91 Validate MongoVariantContext on creation 2012-11-30 15:04:40 -05:00
Mark DePristo 79dbcc205c Minor cleanup for working version of igv 2012-11-30 15:04:40 -05:00
Mark DePristo 6b6a14cc6d Moving ConsensusSummarizer to its appropriate home in core of NA12878KB 2012-11-30 15:04:40 -05:00
Douglas Voet e1b5b562eb fix TrivalTask compile issues 2012-11-30 10:48:38 -05:00