Ryan Poplin
00c23bf704
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2012-12-05 15:53:05 -05:00
Ryan Poplin
234ff64556
Changes to AssessNA12878 to allow for 100s of input callsets to assess against the database.
2012-12-05 15:52:57 -05:00
Ami Levy-Moonshine
5d78a61f7a
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2012-12-05 15:07:12 -05:00
Mark DePristo
d0cab795b7
Got caught in the middle of a bad integration test, that was fixed in independent push. Moved test bam into testdata.
2012-12-05 14:49:22 -05:00
Mark DePristo
465694078e
Major performance improvement to the GATK engine
...
-- The NanoScheduler timing code (in NSRuntimeProfile) was crazy expensive, but never showed up in the profilers. Removed all of the timing code from the NanoScheduler, removed the NSRuntimeProfile itself, and updated the unit tests.
-- For tools that largely pass through data quickly, this change reduces runtimes by as much as 10x. For the RealignerTargetCreator example, the runtime before this commit was 3 hours, and after is 30 minutes (6x improvement).
-- Took this opportunity to improve the GATK ProgressMeter. NotifyOfProgress now just keeps track of the maximum position seen, and a separate daemon thread ProgressMeterDaemon periodically wakes up and prints the current progress. This removes all inner loop calls to the GATK timers.
-- The history of the bug started here: http://gatkforums.broadinstitute.org/discussion/comment/2402#Comment_2402
2012-12-05 14:49:22 -05:00
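The ProgressMeter change in commit 465694078e (the hot path only records the maximum position seen, and a daemon thread wakes up periodically to print it) can be sketched roughly as follows. This is a minimal illustration, not the GATK code: the class name `ProgressSketch`, the polling interval, and the output format are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; names are illustrative and are not the GATK classes. */
final class ProgressSketch {
    private final AtomicLong maxPosition = new AtomicLong(-1);
    private final long pollMillis;
    private volatile boolean done = false;
    private Thread daemon;

    ProgressSketch(final long pollMillis) {
        this.pollMillis = pollMillis;
    }

    /** Starts the reporting thread; it is a daemon so it never blocks JVM exit. */
    void start() {
        daemon = new Thread(() -> {
            while (!done) {
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    return; // stop() interrupts the sleep
                }
                System.out.println("progress: max position seen = " + maxPosition.get());
            }
        }, "progress-daemon");
        daemon.setDaemon(true);
        daemon.start();
    }

    /** Hot path: one atomic max update, with no timer calls and no I/O. */
    void notifyOfProgress(final long position) {
        maxPosition.accumulateAndGet(position, Math::max);
    }

    long maxPositionSeen() {
        return maxPosition.get();
    }

    /** Stops the reporting thread and waits for it to exit. */
    void stop() {
        done = true;
        if (daemon == null) return;
        daemon.interrupt();
        try {
            daemon.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The point of the design is that the inner loop pays only for an atomic compare-and-set, while all clock reads and printing happen on the separate thread at a fixed cadence.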
Mark DePristo
2b601571e7
Better error handling in NanoScheduler
...
-- The previous nanoscheduler would deadlock in the case where an Error, not an Exception, was thrown. Errors, like out of memory, would cause the whole system to die. This bugfix resolves that issue.
2012-12-05 14:49:22 -05:00
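Commit 2b601571e7 does not say how the deadlock was fixed. One common way to avoid this class of bug is to catch `Throwable` rather than `Exception` in the worker and to always release the waiting thread in a `finally` block. The sketch below shows that pattern under those assumptions; `ErrorAwareWorker` and `runAndCapture` are hypothetical names, not NanoScheduler APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of Error-safe worker handling; not the NanoScheduler code. */
final class ErrorAwareWorker {
    /** Runs the task on a worker thread and returns whatever it threw, or null. */
    static Throwable runAndCapture(final Runnable task) {
        final AtomicReference<Throwable> failure = new AtomicReference<>();
        final CountDownLatch finished = new CountDownLatch(1);

        final Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                // Throwable, not Exception, so Errors such as OutOfMemoryError are recorded too.
                failure.set(t);
            } finally {
                // Always release the waiter; skipping this on an Error is what would hang it.
                finished.countDown();
            }
        });
        worker.start();

        try {
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        }
        return failure.get();
    }
}
```

A caller can then inspect the returned `Throwable` and shut down cleanly instead of blocking forever on a worker that died.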
Mark DePristo
51dbb562c9
Reduce amount of debugging information from NA12878KnowledgeBaseServer
2012-12-05 14:49:22 -05:00
Mauricio Carneiro
efe256ec09
binary search implementation to find the minimum coverage
...
speeds up the walker from 7 days to 12 minutes on chr20.
2012-12-05 14:45:57 -05:00
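Commit efe256ec09 does not describe what is being searched, so the following is only a generic sketch of the technique it names: a binary search for the smallest coverage value that satisfies some test. It assumes the test is monotone (once a coverage passes, every higher coverage passes too); the class name, method name, and predicate are all hypothetical.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch: binary search for the smallest passing coverage. */
final class MinCoverageSearch {
    /**
     * Returns the smallest value in [lo, hi] for which passes is true, or -1 if none does.
     * Assumes passes is monotone: false below some threshold and true from it upward.
     */
    static int minimumPassing(int lo, int hi, final IntPredicate passes) {
        int best = -1;
        while (lo <= hi) {
            final int mid = lo + (hi - lo) / 2; // written this way to avoid int overflow
            if (passes.test(mid)) {
                best = mid;      // mid works; look for something smaller
                hi = mid - 1;
            } else {
                lo = mid + 1;    // mid fails; the answer must be larger
            }
        }
        return best;
    }
}
```

With an expensive predicate, this needs about log2(hi - lo) evaluations instead of one per candidate coverage, which is the kind of saving that turns days into minutes.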
Chris Hartl
430d6a07f2
Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable
2012-12-05 11:20:28 -05:00
Eric Banks
0c925856cb
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2012-12-05 02:00:39 -05:00
Eric Banks
ef87b18e09
In retrospect, it wasn't a good idea to have FisherStrand handle reduced reads since they are always on the forward strand. For now, FS ignores reduced reads but I've added a note (and JIRA) to make this work once the RR het compression is enabled (since we will have directionality in reads then).
2012-12-05 02:00:35 -05:00
Mauricio Carneiro
13896356ad
Added bootstrapping and fixed the GLM model of the FMCC
2012-12-05 01:32:19 -05:00
Mauricio Carneiro
30f013aeb0
Added a copy() method for ReadBackedPileups
...
necessary to create new alignment contexts with hard-copies of the pileup.
2012-12-05 01:32:18 -05:00
Mauricio Carneiro
6feda540a4
Better error message for SimpleGATKReports
2012-12-05 01:32:18 -05:00
Eric Banks
726332db79
Disabling the testNoCmdLineHeaderStdout test in UG because it keeps crashing when I run it locally
2012-12-05 00:54:00 -05:00
kshakir
61bde6210b
Restored RemoteFile push and pull in base QScript.
2012-12-04 12:34:07 -05:00
Randal Moore
8d2d0253a2
Introduce a level of indirection for the forum URLs: this new function gives me a place to morph the URL into something that is supported by Confluence
...
Signed-off-by: Eric Banks <ebanks@broadinstitute.org>
2012-12-03 22:33:02 -05:00
Eric Banks
1af41754e3
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2012-12-03 22:01:11 -05:00
Eric Banks
bca860723a
Updating tests to handle bad validation data files (that used the wrong qual score encoding); overrides push from stable.
2012-12-03 22:01:07 -05:00
Eric Banks
387c0defed
don't change md5 here because I am handling it separately from unstable with a better command-line in the test
2012-12-03 21:49:45 -05:00
Eric Banks
ef95757311
Fix MD5 because of a need to fix a busted bam file in our validation directory (it used the wrong quality score encoding...)
2012-12-03 21:46:46 -05:00
Guillermo del Angel
4ced2e4ffc
Merge branch 'develop' of github.com:broadinstitute/cmi-gatk into develop
2012-12-03 20:14:43 -05:00
Guillermo del Angel
c2c6b858e3
Better checks/more flexibility in fastq2bam parsing. Immediate benefit: we can now process normal-only samples, and metadata should be able to specify tumor/normal pairs in any order. Hard-coded hacks removed. DEV-134 #resolve #time 3m
2012-12-03 20:14:37 -05:00
Menachem Fromer
472381245a
Allow for more refined control of memory and queues to run with
2012-12-03 17:07:03 -05:00
Eric Banks
67932b357d
Bug fix for RR: don't let the softclip start position be less than 1
2012-12-03 15:59:14 -05:00
Ryan Poplin
d5ed184691
Updating the HC integration test md5s. According to the NA12878 knowledge base this commit cuts down the FP rate by more than 50 percent with no loss in sensitivity.
2012-12-03 15:38:59 -05:00
Ryan Poplin
a47da9bb2f
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2012-12-03 14:30:14 -05:00
Ryan Poplin
156d6a5e0b
misc minor bug fixes to GenotypingEngine.
2012-12-03 12:47:35 -05:00
Eric Banks
5fed9df295
Quick fix: base qual array in the GATKSAMRecord stores the actual phred values (-33) and not the original bytes (duh).
2012-12-03 12:18:20 -05:00
Eric Banks
b6839b3049
Added checking in the GATK for mis-encoded quality scores.
...
The check is performed by a Read Transformer that samples (currently set to once
every 1000 reads so that we don't hurt overall GATK performance) from the input
reads and checks to make sure that none of the base quals is too high (> Q60). If
we encounter such a base then we fail with a User Error.
* Can be over-ridden with --allow_potentially_misencoded_quality_scores.
* Also, the user can choose to fix his quals on the fly (presumably using PrintReads
to write out a fixed bam) with the --fix_misencoded_quality_scores argument.
Added unit tests.
2012-12-03 11:18:41 -05:00
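The check described in commit b6839b3049 (sample one read in every 1000, fail if any base quality exceeds Q60, optionally fix the qualities on the fly) can be sketched as below. The sampling interval and the Q60 limit come from the commit message. The class and method names are hypothetical, and the fix offset of 31 is an assumption: it is the difference between a Phred+64 and a Phred+33 encoding, which the commit message does not state.

```java
/** Hypothetical sketch of a sampled quality-encoding check; not the GATK Read Transformer. */
final class QualEncodingSketch {
    static final int MAX_PLAUSIBLE_QUAL = 60;  // from the commit message: fail on > Q60
    static final int SAMPLING_INTERVAL = 1000; // from the commit message: once every 1000 reads
    static final int ENCODING_OFFSET_FIX = 31; // assumption: Phred+64 minus Phred+33

    private long readsSeen = 0;

    /** Checks every SAMPLING_INTERVAL-th read; throws if any base quality exceeds Q60. */
    void check(final byte[] quals) {
        if (readsSeen++ % SAMPLING_INTERVAL != 0) {
            return; // not a sampled read, so skip it to keep overhead low
        }
        for (final byte q : quals) {
            if (q > MAX_PLAUSIBLE_QUAL) {
                throw new IllegalArgumentException(
                        "base quality " + q + " suggests mis-encoded quality scores");
            }
        }
    }

    /** Returns a copy of the qualities shifted down by the assumed encoding offset. */
    static byte[] fix(final byte[] quals) {
        final byte[] fixed = new byte[quals.length];
        for (int i = 0; i < quals.length; i++) {
            fixed[i] = (byte) (quals[i] - ENCODING_OFFSET_FIX);
        }
        return fixed;
    }
}
```

The values compared here are the decoded Phred scores rather than the raw ASCII bytes, in line with the follow-up note in commit 5fed9df295.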
Ryan Poplin
18b002c99c
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2012-12-03 10:08:56 -05:00
Eric Banks
6f523a1ea0
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2012-12-03 08:41:21 -05:00
Eric Banks
59fc7456cf
Updated expectations for novel TiTv in HSP after Mark's fixes to the exact model
2012-12-03 08:41:13 -05:00
Mark DePristo
f0a4710247
Callset summary now includes a table for the consensus itself
2012-12-02 16:40:12 -05:00
Mark DePristo
ce9a323c04
NA12878 knowledge base automatically filters duplicate records out in the SiteIterator
...
-- Now it doesn't matter if there are duplicate records (all fields equal up to the date) in the knowledge base
2012-12-02 14:21:29 -05:00
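The duplicate filtering described in commit ce9a323c04 treats two records as duplicates when every field except the date is equal. A minimal sketch of that idea follows; the `Record` fields shown are hypothetical, since the commit does not list the knowledge-base schema, and this is not the SiteIterator implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: drop records that are equal in every field except the date. */
final class DedupSketch {
    /** Illustrative record; the real knowledge-base fields are not given in the commit. */
    static final class Record {
        final String contig;
        final int start;
        final String call;
        final String date;

        Record(final String contig, final int start, final String call, final String date) {
            this.contig = contig;
            this.start = start;
            this.call = call;
            this.date = date;
        }

        /** Identity key built from every field except the date. */
        String keyIgnoringDate() {
            return contig + ":" + start + ":" + call;
        }
    }

    /** Keeps the first record for each key, preserving the input order. */
    static List<Record> dropDuplicates(final List<Record> records) {
        final Set<String> seen = new HashSet<>();
        final List<Record> unique = new ArrayList<>();
        for (final Record r : records) {
            if (seen.add(r.keyIgnoringDate())) {
                unique.add(r);
            }
        }
        return unique;
    }
}
```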
Ryan Poplin
1bdf17ef53
Reworking of how the likelihood calculation is organized in the HaplotypeCaller to facilitate the inclusion of per allele downsampling. We now use the downsampling for both the GL calculations and the annotation calculations.
2012-12-02 11:58:32 -05:00
Mark DePristo
1828d33a5a
Bugfix to AssessNA12878
...
-- SiteIterator.getSitesBefore wasn't handling indel overlaps correctly. It was considering both start and stop (not the correct behavior), so it only returned sites up to the first record whose stop overlapped the requested start, and it incorrectly skipped variants underlying indels.
2012-12-02 11:09:15 -05:00
Eric Banks
d7b951b6f3
Finished up my reviews for megabase chr20:10M-11M. Fixed out of order record from earlier.
2012-12-01 23:35:21 -05:00
Mark DePristo
2849889af5
Updating md5 for UG
2012-12-01 14:24:19 -05:00
Ami Levy-Moonshine
d0b8cc7773
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2012-12-01 00:08:25 -05:00
Ami Levy-Moonshine
969c995298
Work under development: catVariants. Changes to AssessRRQuals based on Eric's todo comments. Bug fix in CombineVariants.
2012-12-01 00:08:19 -05:00
depristo
3105f13df3
Merge pull request #4 from jsilter/master
...
Remove validate, add note to put it back in when public gatk catches up
2012-11-30 13:24:44 -08:00
Mark DePristo
1100f0733b
Reviews for all unique omni poly sites on chr20
...
Updated setup script to include these and ebanks reviews as well. Eric -- your file is currently not sorted, fyi
2012-11-30 16:23:27 -05:00
Jacob Silterra
02e98fa516
Remove validate, add note to put it back in when public gatk catches up
2012-11-30 16:08:00 -05:00
Mark DePristo
8020ba14db
Minor cleanup of SAMDataSource as part of my system review
...
-- Changed a few functions from public to protected, as they are only used by the package contents, to simplify the SAMDataSource interface
2012-11-30 15:04:41 -05:00
Mark DePristo
66bbe46e5b
MongoDBManager prints out meaningful information with toString
2012-11-30 15:04:41 -05:00
Mark DePristo
3248ca3f91
Validate MongoVariantContext on creation
2012-11-30 15:04:40 -05:00
Mark DePristo
79dbcc205c
Minor cleanup for working version of igv
2012-11-30 15:04:40 -05:00
Mark DePristo
6b6a14cc6d
Moving ConsensusSummarizer to its appropriate home in core of NA12878KB
2012-11-30 15:04:40 -05:00
Douglas Voet
e1b5b562eb
fix TrivalTask compile issues
2012-11-30 10:48:38 -05:00