Commit Graph

10472 Commits (6a5a70cdf1a80751d1fe54594c0d0d2ee6a3fa87)

Author SHA1 Message Date
Mark DePristo 6a5a70cdf1 Done GSA-539: SimpleTimer should use System.nanoTime for nanoSecond resolution 2012-09-05 15:45:23 -04:00
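
The idea behind GSA-539 can be illustrated with a small stopwatch built on System.nanoTime. This is only a sketch: the class NanoTimerSketch and its method names are hypothetical and are not taken from the GATK SimpleTimer source.

```java
/** Hypothetical stopwatch for illustration; not the GATK SimpleTimer class. */
final class NanoTimerSketch {
    private long startNanos;     // value of System.nanoTime() at the last start()
    private long elapsedNanos;   // time accumulated over completed start/stop cycles
    private boolean running;

    /** Starts (or restarts) the timer and returns it for chaining. */
    NanoTimerSketch start() {
        startNanos = System.nanoTime();
        running = true;
        return this;
    }

    /** Stops the timer if it is running and returns the accumulated nanoseconds. */
    long stop() {
        if (running) {
            elapsedNanos += System.nanoTime() - startNanos;
            running = false;
        }
        return elapsedNanos;
    }

    /** Accumulated nanoseconds, including the current cycle if still running. */
    long getElapsedNanos() {
        return running ? elapsedNanos + (System.nanoTime() - startNanos) : elapsedNanos;
    }

    /** Accumulated time converted to seconds. */
    double getElapsedSeconds() {
        return getElapsedNanos() / 1e9;
    }
}
```

System.nanoTime is intended for measuring elapsed intervals rather than wall-clock time, which is why the sketch only ever subtracts two of its readings.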
Mark DePristo 59109d5eeb NanoScheduler tracks time outside of its execute call 2012-09-05 15:45:23 -04:00
Mark DePristo 800a27c3a7 NanoScheduler tracks time within input, map, and reduce
-- Helpful for understanding where the time goes to each bit of the code.
-- Controlled by a local static boolean, to avoid the potential overhead in general
2012-09-05 15:45:23 -04:00
Mark DePristo 7087b22ea3 No debugging output (even conditional) for ReadTransformers in PrintReads 2012-09-05 15:45:23 -04:00
Mark DePristo e01258b261 NanoScheduler now supports printProgress. Bugfixes to printProgress
-- TraverseReadsNano prints progress at the end of each traversal unit
-- Fix bugs in TraversalEngine printProgress
    -- Synchronize the method so we don't get multiple logged outputs when two or more HMSs call printProgress before initialization at the start!
    -- Fix the logic for mustPrint, which actually had the logic of mustNotPrint.  Now we see the done log line that was always supposed to be there
    -- Fix output formatting, as the done() line was incorrectly shifting over the % complete by 1 char as 100.0% didn't fit in %4.1f
-- Add clearer doc on -PF argument so that people know that the performance log can be generated to standard out if one wants
2012-09-05 15:45:23 -04:00
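
The formatting problem described in this commit (100.0% not fitting in %4.1f) can be reproduced in isolation. The sketch below is an assumption about the general shape of the issue, not the TraversalEngine code: PercentFormatSketch and its two methods are hypothetical, and widening the field to %5.1f is just one possible fix.

```java
import java.util.Locale;

/** Hypothetical helper showing why a width of 4 is too narrow for "100.0". */
final class PercentFormatSketch {
    /** Minimum width 4: "99.9" fits, but "100.0" needs 5 characters and overflows. */
    static String narrow(double percent) {
        return String.format(Locale.ROOT, "%4.1f%%", percent);
    }

    /** Minimum width 5: every value from 0.0 to 100.0 occupies the same number of columns. */
    static String wide(double percent) {
        return String.format(Locale.ROOT, "%5.1f%%", percent);
    }
}
```

A format width is a minimum, not a maximum, so the narrow variant silently grows by one character at 100.0 and shifts everything after it; the wide variant pads smaller values instead.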
Mark DePristo 6055101df8 NanoScheduler no longer groups inputs, each map() call is interlaced now
-- Maximizes the efficiency of the threads
-- Simplifies interface (yea!)
-- Reduces number of combinatorial tests that need to be performed
2012-09-05 15:45:22 -04:00
Mark DePristo 397a5551ef More memory for gatkdocs and extracthelp targets 2012-09-05 15:45:22 -04:00
Mark DePristo e3b4cc02aa Done GSA-282: Unindexed traversals crash if a read goes off the end of a contig
-- Already fixed in the codebase.  Added unindexed bam and integration tests to ensure this is fine going forward.
2012-09-05 15:45:22 -04:00
Yossi Farjoun d6884e705a Revert "fixed a typo in StringText.properties"
This reverts commit b74c1c17e748f75e59d23545084b983e2a8d2fa6.
2012-09-05 15:21:00 -04:00
Yossi Farjoun ad5fa449e7 fixed a typo in the string comment 2012-09-05 14:46:10 -04:00
Yossi Farjoun f4b39a7545 Merge branch 'master' of ssh://gsa4/humgen/gsa-scr1/gsa-engineering/git/unstable
merging trivially after a commit
2012-09-05 14:33:39 -04:00
Yossi Farjoun 6e517df5d9 fixed a typo in StringText.properties 2012-09-05 14:33:08 -04:00
Eric Banks fc06f39411 Fixed docs for Pileup walker 2012-09-05 09:55:34 -04:00
Christopher Hartl d795437202 - New UserExceptions added for when ReadFilters or Walkers specified on the command line are not found. When -rf xxxx cannot find the class corresponding to xxxx, all read filters are printed in a better formatted way, with links to their gatk docs.
- VariantAnnotatorEngine changed to call genotype annotations even if pileups and allele -> likelihood mappings are not present. Current genotype annotations altered to check for null pileups and null mappings.
2012-09-04 16:41:44 -04:00
Ryan Poplin 9cc1a9931b Resolving merge conflicts. 2012-09-04 10:47:38 -04:00
Ryan Poplin c9944d81ef Skip array needs to also be used in the updateDataForRead function of the delocalized BQSR. 2012-09-04 10:33:37 -04:00
Mark DePristo 0892f2b8b2 Closing GSA-287:LocusReferenceView doesn't do very well in the case where contigs land off the end of the reference
-- Confirmed that reads spanning off the end of the chromosome don't cause an exception by adding integration test for a single read that starts 7 bases from the end of chromosome 1 and spans 90 bases or so off.  Added pileup integration test to ensure this behavior continues to work
2012-09-03 20:18:56 -04:00
Mark DePristo 52d6bea804 a few more useful git ignores 2012-09-01 11:08:36 -04:00
Mark DePristo 1b0ce511a6 Updating BQSR tests due to my change to reset BQSR calibration data 2012-08-31 19:51:09 -04:00
Eric Banks 277ba94c7b Update from dbsnp135 to dbsnp137. 2012-08-31 14:06:29 -04:00
Eric Banks 5ea7cd6dcc Updating resource bundle: no reason to include both genotype and sites files for Omni and HM3, sites are enough. Also, don't include duplicate entry for the Mills indels. 2012-08-31 14:01:54 -04:00
Mark DePristo f066a02f3e Merge branch 'applyRecalibration' 2012-08-31 13:43:52 -04:00
Mark DePristo c9ea213c9b Make BaseRecalibration thread-safe
-- In the process uncovered two strange things
    1 -- qualityScoreByFullCovariateKey was created but never used.  Seems like a cache?
    2 -- Discovered nasty bug in BaseRecalibrator: https://jira.broadinstitute.org/browse/GSA-534
2012-08-31 13:42:42 -04:00
Mark DePristo 27ddebee53 Protect PrintReads from strange state from TraverseReadsUnitTests 2012-08-31 13:42:41 -04:00
Mark DePristo e028901d54 Fixed bad contract in ReadTransformer 2012-08-31 13:42:41 -04:00
Mark DePristo cf91d894e4 Fix build problems with tests 2012-08-31 13:42:41 -04:00
Mark DePristo 817ece37a2 General infrastructure for ReadTransformers
-- These are like read filters but can be applied either on input, on output, or handled by the walker
-- Previous example of BAQ now uses the general framework
    -- Resulted in massive conceptual cleanup of SAMDataSource and ReadProperties!  Yeah!
-- BQSR now uses this framework.  We can now do BQSR on input, on output, or within a walker
-- PrintReads now handles all read transformers in the walker in map, enabling us to parallelize PrintReads with BAQ and BQSR
-- Currently BQSR throws an exception when run in parallel, which a subsequent commit will fix
-- Removed global variable setting in GenomeAnalysisEngine for BAQ, as command line parameters are cleanly handled by ReadTransformer infrastructure
-- In principle ReadFilters are just a special kind of ReadTransformer, but this refactoring is larger than I can do. It's a JIRA entry
-- Many files touched simply due to the refactoring and renaming of classes
2012-08-31 13:42:41 -04:00
Ryan Poplin ff6ebbf3fd Resolving merge conflicts. 2012-08-31 11:25:55 -04:00
Ryan Poplin e22bd09477 Initial fix for delocalized BQSR to make it work with new RefMetaDataTracker 2012-08-31 11:23:08 -04:00
Christopher Hartl 143fbead03 Adding an experimental format field annotation that calculates the per-sample residual dosage after accounting for LD. It's meant to be run in a single pass over a chromosome, for instance.
Currently it does not work due to a bug in the variant annotator engine, see GSA-532. When that's fixed it'll likely reveal broken code.
2012-08-31 04:04:00 -04:00
Eric Banks ac0c44720b I started to put together a set of unit tests for the PileupElement creation functionality of LocusIteratorByState and found pretty quickly that it's definitely still busted for indels. The data provider is nowhere near comprehensive yet, but I need to sit back and think about how to really test some of the functionality of LIBS. Committing what I have for now because at the very least it'll be helpful going forward (failing tests are commented out with TODO). 2012-08-30 22:49:13 -04:00
Mark DePristo 39400c56a9 Update md5s for VQSR, as VQSLOD is now a double and gets the standard double precision treatment in VCF 2012-08-30 19:41:49 -04:00
Mark DePristo 2f749b5e52 Added ThreadSafeMapReduce interface, super of TreeReducible
-- A higher level interface to declare parallelism capability of a walker.  This interface means that the walker can be multi-threaded, but doesn't necessarily support TreeReducible interface, which forces you to have a combine ReduceType operation that isn't appropriate for parallel read walkers
-- Updated ReadWalkers to implement ThreadSafeMapReduce not TreeReducible
2012-08-30 19:41:49 -04:00
Mark DePristo 544740d45d tasking for n threads should give you n threads in NanoScheduler, not n - 1 2012-08-30 19:41:49 -04:00
Mark DePristo 1212dfd2ef Reduce the number of test combinations in ReadBasedReferenceOrderedView 2012-08-30 19:41:49 -04:00
Mark DePristo 7a462399ce Fix GSA-529: Fix RODs for parallel read walkers
-- TraverseReadsNano modified to read in all input data before invoking maps, so the input to TraverseReadsNano is a MapData object holding the sam record, the ref context, and the refmetadatatracker.
-- Update ValidateRODForReads to be tree reducible, using synchronized map and explicitly sort the output map from locations -> counts in onTraversalDone
-- Expanded integration tests to test nt 1, 2, 4.
2012-08-30 19:41:49 -04:00
Mark DePristo 7d95176539 Bugfix to compareTo and equals in GenomeLoc
-- Yes, GenomeLoc.compareTo was broken.  The compareTo function only considered the contig and start position, but not the stop, when comparing genome locs.
-- Updated GenomeLoc.compareTo function to account for stop.  Updated GATK code where necessary to fix resulting problems that depended on this.
-- Added unit tests to ensure that hashcode, equals, and compareTo are all correct for GenomeLocs
2012-08-30 19:41:49 -04:00
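
The contract this commit restores (compareTo, equals, and hashCode all agreeing, with the stop position taken into account) can be sketched as follows. IntervalSketch is a hypothetical stand-in, not the GATK GenomeLoc class, and representing the contig as an integer index is an assumption made for brevity.

```java
import java.util.Objects;

/** Hypothetical genomic interval; illustrates a compareTo that is consistent with equals. */
final class IntervalSketch implements Comparable<IntervalSketch> {
    private final int contigIndex;
    private final int start;
    private final int stop;

    IntervalSketch(int contigIndex, int start, int stop) {
        this.contigIndex = contigIndex;
        this.start = start;
        this.stop = stop;
    }

    /** Orders by contig, then start, then stop, so differing stops never compare as equal. */
    @Override
    public int compareTo(IntervalSketch other) {
        int c = Integer.compare(contigIndex, other.contigIndex);
        if (c != 0) return c;
        c = Integer.compare(start, other.start);
        if (c != 0) return c;
        return Integer.compare(stop, other.stop);
    }

    /** Equal only when all three fields match, mirroring compareTo() == 0. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IntervalSketch)) return false;
        IntervalSketch that = (IntervalSketch) o;
        return contigIndex == that.contigIndex && start == that.start && stop == that.stop;
    }

    /** Hashes the same three fields used by equals. */
    @Override
    public int hashCode() {
        return Objects.hash(contigIndex, start, stop);
    }
}
```

If compareTo ignored the stop field, two intervals with the same start but different stops would compare as 0 while equals returned false, and sorted collections such as TreeSet or TreeMap would treat them as the same element.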
Mark DePristo 5a9610d875 ReadShards now default to 10K (up from 1K) reads per samFile up to 250K
-- This should help make the inputs for parallel read walkers a little meatier, and avoid spinning the shard creation infrastructure so often
2012-08-30 19:41:49 -04:00
Christopher Hartl 5a142fe265 After discussion with Ryan/Eric, the Structural_Indel variant type is now gone, and has been entirely replaced with the access pattern .isStructuralIndel(). This makes it a strict subtype of indel. I agree that this method is a bit more sensible.
In addition, fix for GSA-310. If supplied -rf argument does not match a known read filter, the list of read filters will be printed, and users directed to the documentation for more information.
2012-08-30 17:57:31 -04:00
Mark DePristo 82b2845b9f Fix: GSA-531 ApplyRecalibration writing to BCF: java.lang.String cannot be cast to java.lang.Double
-- LOD must be added as a double to attributes, not as string, so that it can be written out as BCF
2012-08-30 16:59:57 -04:00
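
The exception named in GSA-531 is what Java raises when a value stored as a String is later cast to Double. The sketch below assumes an untyped attribute map purely for illustration; AttributeTypeSketch and readAsDouble are hypothetical and are not the GATK or BCF writer API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reader that expects numeric attributes to be stored as Double. */
final class AttributeTypeSketch {
    /**
     * Casts the stored value to Double. If the value was put into the map as a
     * String (for example "4.25"), the cast fails with a ClassCastException.
     */
    static double readAsDouble(Map<String, Object> attributes, String key) {
        return (Double) attributes.get(key);
    }

    /** Builds an attribute map holding the value with its numeric type intact. */
    static Map<String, Object> withDouble(String key, double value) {
        Map<String, Object> attributes = new HashMap<>();
        attributes.put(key, value); // autoboxed to Double, not converted to String
        return attributes;
    }
}
```

Storing the number with its real type at the point of creation keeps any downstream consumer that relies on the declared type from failing at write time.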
Mark DePristo 7b4caec8cb Fix: GSA-531 ApplyRecalibration writing to BCF: java.lang.String cannot be cast to java.lang.Double
-- LOD must be added as a double to attributes, not as string, so that it can be written out as BCF
2012-08-30 16:56:36 -04:00
Mark DePristo 863a3d73b8 Added ThreadSafeMapReduce interface, super of TreeReducible
-- A higher level interface to declare parallelism capability of a walker.  This interface means that the walker can be multi-threaded, but doesn't necessarily support TreeReducible interface, which forces you to have a combine ReduceType operation that isn't appropriate for parallel read walkers
-- Updated ReadWalkers to implement ThreadSafeMapReduce not TreeReducible
2012-08-30 16:21:17 -04:00
Mark DePristo 59508f8266 tasking for n threads should give you n threads in NanoScheduler, not n - 1 2012-08-30 15:57:29 -04:00
Mark DePristo 27d1c63448 Reduce the number of test combinations in ReadBasedReferenceOrderedView 2012-08-30 15:56:58 -04:00
Mark DePristo 72cf6bdd9f Fix GSA-529: Fix RODs for parallel read walkers
-- TraverseReadsNano modified to read in all input data before invoking maps, so the input to TraverseReadsNano is a MapData object holding the sam record, the ref context, and the refmetadatatracker.
-- Update ValidateRODForReads to be tree reducible, using synchronized map and explicitly sort the output map from locations -> counts in onTraversalDone
-- Expanded integration tests to test nt 1, 2, 4.
2012-08-30 15:10:58 -04:00
Mark DePristo 7f166c3198 Bugfix to compareTo and equals in GenomeLoc
-- Yes, GenomeLoc.compareTo was broken.  The compareTo function only considered the contig and start position, but not the stop, when comparing genome locs.
-- Updated GenomeLoc.compareTo function to account for stop.  Updated GATK code where necessary to fix resulting problems that depended on this.
-- Added unit tests to ensure that hashcode, equals, and compareTo are all correct for GenomeLocs
2012-08-30 15:07:02 -04:00
Ryan Poplin 7b366d4049 misc cleanup in active region traversal. 2012-08-30 11:01:01 -04:00
Mark DePristo 792092b891 ReadShards now default to 10K (up from 1K) reads per samFile up to 250K
-- This should help make the inputs for parallel read walkers a little meatier, and avoid spinning the shard creation infrastructure so often
2012-08-30 10:39:16 -04:00
Mark DePristo 76853806b0 Print out the time when downloads finished from S3 2012-08-30 10:15:11 -04:00
Mark DePristo 21dd70ed36 Test to ensure that ReadBasedReferenceOrderedView produces stateless objects
-- Stateless objects are required for nano-scheduling.  This means you can take the RefMetaDataTracker provided by ReadBasedReferenceOrderedView, store it away, get another from the same view, and the original one behaves the same.
2012-08-30 10:15:11 -04:00