-- The function getReducedCounts() was returning the undecoded reduced read tag, which looks like [10, 5, -1, -5] when the depths were [10, 15, 9, 5]. The only function that actually gave the real counts was getReducedCount(int i) which did the proper decoding. Now GATKSAMRecord decodes the tag into the proper depths vector so that getReduceCounts() returns what one reasonably expects it to, and getReduceCount(i) merely looks up the value at i. Added unit test to ensure this behavior going forward.
-- Changed the name of setReducedCounts() to setReducedCountsTag as this function assumes that counts have already been encoded in the tag way.
-- Trims down active regions and associated reads and haplotypes to a smaller interval based on the events actually in the haplotypes within the original active region (without extension). Radically speeds up calculations when using large active region extensions. The ActiveRegion.trim algorithm does the best job it can of trimming an active region down to a requested interval while ensuring the resulting active region has a region (and extension) no bigger than the original while spanning as much of the requested extend as possible. The trimming results in an active region that is a subset of the previous active region based on the position and types of variants found among the haplotypes
-- Retire error corrector, archive old code and repurpose subsystem into a general kmer counter. The previous error corrector was just broken (conceptually) and was disabled by default in the engine. Now turning on error correction throws a UserException. Old part of the error corrector that counts kmers was extracted and put into KMerCounter.java
-- Add final simplify graph call after we prune away the non-reference paths in DeBruijnAssembler
-- Moved R^2 LD haplotype merging system to the utils.haplotype package
-- New LD merging only enabled with HC argument.
-- EventExtractor and EventExtractorUnitTest refactors so we can test the block substitution code without having to enabled it via a static variable
-- A few misc. bug fixes in LDMerger itself
-- Refactoring of Haplotype event splitting and merging code
-- Renamed EventExtractor to EventMap
-- EventMap has a static method that computes the event maps among n haplotypes
-- Refactor Haplotype score and base comparators into their own classes and unit tested them
-- Refactored R^2 based LD merging code into its own class HaplotypeR2Calculator and unit tested much of it.
-- LDMerger now uses the HaplotypeR2Calculator, which cleans up the code a bunch and allowed me to easily test that code with a MockHaplotypeR2Calculator. For those who haven't seen this testing idiom, have a look, and very useful
-- New algorithm uses a likelihood-ratio test to compute the probability that only the phased haplotypes exist in the population.
-- Fixed fundamental bug in the way the previous R^2 implementation worked
-- Optimizations for HaplotypeLDCalculator: only compute the per sample per haplotype summed likelihoods once, regardless of how many calls there are
-- Previous version would enter infinite loop if it merged two events but the second event had other low likelihood events in other haplotypes that didn't get removed. Now when events are removed they are removed from all event maps, regardless of whether the haplotypes carry both events
-- Bugfixes for EventMap in the HaplotypeCaller as well. Previous version was overly restrictive, requiring that the first event to make into a block substitution was a snp. In some cases we need to merge an insertion with a deletion, such as when the cigar is 10M2I3D4M. The new code supports this. UnitTested and documented as well. LDMerger handles case where merging two alleles results in a no-op event. Merging CA/C + A/AA -> CAA/CAA -> no op. Handles this case by removing the two events. UnitTested
-- Turn off debugging output for the LDMerger in the HaplotypeCaller unless -debug was enabled
-- This new version does a much more specific test (that's actually right). Here's the new algorithm:
* Compute probability that two variants are in phase with each other and that no
* compound hets exist in the population.
*
* Implemented as a likelihood ratio test of the hypothesis:
*
* x11 and x22 are the only haplotypes in the populations
*
* vs.
*
* all four haplotype combinations (x11, x12, x21, and x22) all exist in the population.
*
* Now, since we have to have both variants in the population, we exclude the x11 & x11 state. So the
* p of having just x11 and x22 is P(x11 & x22) + p(x22 & x22).
*
* Alternatively, we might have any configuration that gives us both 1 and 2 alts, which are:
*
* - P(x11 & x12 & x21) -- we have hom-ref and both hets
* - P(x22 & x12 & x21) -- we have hom-alt and both hets
* - P(x22 & x12) -- one haplotype is 22 and the other is het 12
* - P(x22 & x21) -- one haplotype is 22 and the other is het 21
-A UserException is now thrown if either the fai or dict file for the
reference does not exist, with pointers to instructions for creating
these files.
-Gets rid of problematic file locking that was causing intermittent
errors on our farm.
-Integration tests to verify that correct exceptions are thrown in
the case of a missing fai / dict file.
GSA-866 #resolve
-The algorithm for finding the intersection of two sets of intervals
relies on the sortedness of the intervals within each set, but the engine
was not sorting the intervals before attempting to find the intersection.
-The result was that if one or both interval lists was unsorted / lexicographically
sorted, we would often fail to find the intersection correctly.
-Now the IntervalBinding sorts all sets of intervals before returning them,
solving the problem.
-Added an integration test for this case.
GSA-909 #resolve
Currently, the multi-allelic test is covering the following case:
Eval A T,C
Comp A C
reciprocate this so that the reverse can be covered.
Eval A C
Comp A T,C
And furthermore, modify ConcordanceMetrics to more properly handle the situation where multiple alternate alleles are available in the comp. It was possible for an eval C/C sample to match a comp T/T sample, so long as the C allele were also present in at least one other comp sample.
This comes from the fact that "truth" reference alleles can be paired with *any* allele also present in the truth VCF, while truth het/hom var sites are restricted to having to match only the alleles present in the genotype. The reason that truth ref alleles are special case is as follows, imagine:
Eval: A G,T 0/0 2/0 2/2 1/1
Comp: A C,T 0/0 1/0 0/0 0/0
Even though the alt allele of the comp is a C, the assessment of genotypes should be as follows:
Sample1: ref called ref
Sample2: alleles don't match (the alt allele of the comp was not assessed in eval)
Sample3: ref called hom-var
Sample4: alleles don't match (the alt allele of the eval was not assessed in comp)
Before this change, Sample2 was evaluated as "het called het" (as the T allele in eval happens to also be in the comp record, just not in the comp sample). Thus: apply current
logic to comp hom-refs, and the more restrictive logic ("you have to match an allele in the comp genotype") when the comp is not reference.
Also in this commit,major refactoring and testing for MathUtils. A large number of methods were not used at all in the codebase, these methods were removed:
- dotProduct(several types). logDotProduct is used extensively, but not the real-space version.
- vectorSum
- array shuffle, random subset
- countOccurances (general forms, the char form is used in the codebase)
- getNMaxElements
- array permutation
- sorted array permutation
- compare floats
- sum() (for integer arrays and lists).
Final keyword was extensively added to MathUtils.
The ratio() and percentage() methods were revised to error out with non-positive denominators, except in the case of 0/0 (which returns 0.0 (ratio), or 0.0% (percentage)). Random sampling code was updated to make use of the cleaner implementations of generating permutations in MathUtils (allowing the array permutation code to be retired).
The PaperGenotyper still made use of one of these array methods, since it was the only walker it was migrated into the genotyper itself.
In addition, more extensive tests were added for
- logBinomialCoefficient (Newton's identity should always hold)
- logFactorial
- log10sumlog10 and its approximation
All unit tests pass
* It is now cleaner and easier to test; added tests for newly implemented methods.
* Many fixes to the logic to make it work
* The most important change was that after triggering het compression we actually need to back it out if it
creates reads that incorporated too many softclips at any one position (because they get unclipped).
* There was also an off-by-one error in the general code that only manifested itself with het compression.
* Removed support for creating a het consensus around deletions (which was broken anyways).
* Mauricio gave his blessing for this.
* Het compression now works only against known sites (with -known argument).
* The user can pass in one or more VCFs with known SNPs (other variants are ignored).
* If no known SNPs are provided het compression will automatically be disabled.
* Added SAM tag to stranded (i.e. het compressed) reduced reads to distinguish their
strandedness from normal reduced reads.
* GATKSAMRecord now checks for this tag when determining whether or not the read is stranded.
* This allows us to update the FisherStrand annotation to count het compressed reduced reads
towards the FS calculation.
* [It would have been nice to mark the normal reads as unstranded but then we wouldn't be
backwards compatible.]
* Updated integration tests accordingly with new het compressed bams (both for RR and UG).
* In the process of fixing the FS annotation I noticed that SpanningDeletions wasn't handling
RR properly, so I fixed it too.
* Also, the test in the UG engine for determining whether there are too many overlapping
deletions is updated to handle RR.
* I added a special hook in the RR integration tests to additionally run the systematic
coverage checking tool I wrote earlier.
* AssessReducedCoverage is now run against all RR integration tests to ensure coverage is
not lost from original to reduced bam.
* This helped uncover a huge bug in the MultiSampleCompressor where it would drop reads
from all but 1 sample (now fixed).
* AssessReducedCoverage moved from private to protected for packaging reasons.
* #resolve GSA-639
At this point, this commit encompasses most of what is needed for het compression to go live.
There are still a few TODO items that I want to get in before the 2.5 release, but I will save
those for a separate branch because as it is I feel bad for the person who needs to review all
these changes (sorry, Mauricio).
-- added calls to representativeCount() of the pileup instead of using ++
-- renamed CallableLoci integration test
-- added integration test for reduce read support on callable loci
-- DeBruijnAssemblerUnitTest and AlignmentUtilsUnitTest were both in DEBUG = true mode (bad!)
-- Remove the maxHaplotypesToConsider feature of HC as it's not useful
-- @Output isn't required for AssessNA12878
-- Previous version would could non-variant sites in NA12878 that resulted from subsetting a multi-sample VC to NA12878 as CALLED_BUT_NOT_IN_DB sites. Now they are properly skipped
-- Bugfix for subsetting samples to NA12878. Previous version wouldn't trim the alleles when subsetting down a multi-sample VCF, so we'd have false FN/FP sites at indels when the multi-sample VCF has alleles that result in the subset for NA12878 having non-trimmed alleles. Fixed and unit tested now.
Increase one timeout, restore others that were only timing out due to the
Java crypto lib bug to their original values.
-DOUBLE timeout for NanoSchedulerUnitTest.testNanoSchedulerInLoop()
-REDUCE timeout for EngineFeaturesIntegrationTest to its original value
-REDUCE timeout for MaxRuntimeIntegrationTest to its original value
-REDUCE timeout for GATKRunReportUnitTest to its original value
ALL GATK DEVELOPERS PLEASE READ NOTES BELOW:
I have updated the @Output annotation to behave differently and to include a 'defaultToStdout' tag.
* The 'defaultToStdout' tags lets walkers specify whether to default to stdout if -o is not provided.
* The logic for @Output is now:
* if required==true then -o MUST be provided or a User Error is generated.
* if required==false and defaultToStdout==true then the output is assigned to stdout if no -o is provided.
* this is the default behavior (i.e. @Output with no modifiers).
* if required==false and defaultToStdout==false then the output object is null.
* use this combination for truly optional outputs (e.g. the -badSites option in AssessNA12878).
* I have updated walkers so that previous behavior has been maintained (as best I could).
* In general, all @Outputs with default long/short names have required=false.
* Walkers with nWayOut options must have required==false and defaultToStdout==false (I added checks for this)
* I added unit tests for @Output changes with David's help (thanks!).
* #resolve GSA-837
* ClippingOp updated to incorporate Ns in the hard clips.
* ReadUtils.getReadCoordinateForReferenceCoordinate() updated to account for Ns.
* Added test that covers the BQSR case we saw.
* Created GSA-856 (for Mauricio) to add lots of tests to ReadUtils.
* It will require refactoring code and not in the scope of what I was willing to do to fix this.
-- Strandless GATK reads are ones where they don't really have a meaningful strand value, such as Reduced Reads or fragment merged reads. Added GATKSAMRecord support for such reads, along with unit tests
-- The merge overlapping fragments code in FragmentUtils now produces strandless merged fragments
-- FisherStrand annotation generalized to treat strandless as providing 1/2 the representative count for both strands. This means that that merged fragments are properly handled from the HC, so we don't hallucinate fake strand-bias just because we managed to merge a lot of reads together.
-- The previous getReducedCount() wouldn't work if a read was made into a reduced read after getReducedCount() had been called. Added new GATKSAMRecord method setReducedCounts() that does the right thing. Updated SlidingWindow and SyntheticRead to explicitly call this function, and so the readTag parameter is now gone.
-- Update MD5s for change to FS calculation. Differences are just minor updates to the FS
-- Code was undocumented, big, and not well tested. All three things fixed.
-- Currently not passing, but the framework works well for testing
-- Added concat(byte[] ... arrays) to utils
-Allow the default S3 put timeout of 30 seconds for GATKRunReports
to be overridden via a constructor argument, and use a timeout
of 300 seconds for tests. The timeout remains 30 seconds in all
other cases.
-Change integration tests that themselves dispatch farm jobs
into pipeline tests. Necessary because some farm nodes are
not set up as submit hosts. Pipeline tests are still run
directly on gsa4.
-Bump up the timeout for the MaxRuntimeIntegrationTest even more
(was still occasionally failing on the farm!)
- This was needed since samples with spaces in their names are regularly found in the picard pipeline.
- Modified the tests to account for this (removed spaces from the good tests, and changed the failing tests accordingly)
- Cleaned up the unit tests using a @DataProvider (I'm in love...).
- Moved AlleleBiasedDownsamplingUtilsUnitTest to public to match location of class it is testing (due to the way bamboo operates)
-Make MaxRuntimeIntegrationTest more lenient by assuming that startup overhead
might be as long as 120 seconds on a very slow node, rather than the original
assumption of 20 seconds
-In TraverseActiveRegionsUnitTest, write temp bam file to the temp directory, not
to the current working directory
-SimpleTimerUnitTest: This test was internally inconsistent. It asserted that
a particular operation should take no more than 10 milliseconds, and then asserted
again that this same operation should take no more than 100 microseconds (= 0.1 millisecond).
On a slow node it could take slightly longer than 100 microseconds, however.
Changed the test to assert that the operation should require no more than 10000 microseconds
(= 10 milliseconds)
-change global default test timeout from 20 to 40 minutes (things just take longer
on the farm!)
-build.xml: allow runtestonly target to work with scala test classes
* ReadTransformers can say they must be first, must be last, or don't care.
* By default, none of the existing ones care about ordering except BQSR (must be first).
* This addresses a bug reported on the forum where BAQ is incorrectly applied before BQSR.
* The engine now orders the read transformers up front before applying iterators.
* The engine checks for enabled RTs that are not compatible (e.g. both must be first) and blows up (gracefully).
* Added unit tests.
-- The new code includes a new mode to write out a BAM containing reads realigned to the called haplotypes from the HC, which can be easily visualized in IGV.
-- Previous functionality maintained, with bug fixes
-- Haplotype BAM writing code now lives in utils
-- Created a base class that includes most of the functionality of writing reads realigned to haplotypes onto haplotypes.
-- Created two subclasses, one that writes all haplotypes (previous functionality) and a CalledHaplotypeBAMWriter that will only write reads aligned to the actually called haplotypes
-- Extended PerReadAlleleLikelihoodMap.getMostLikelyAllele to optionally restrict set of alleles to consider best
-- Massive increase in unit tests in AlignmentUtils, along with several new powerful functions for manipulating cigars
-- Fix bug in SWPairwiseAlignment that produces cigar elements with 0 size, and are now fixed with consolidateCigar in AlignmentUtils
-- HaplotypeCaller now tracks the called haplotypes in the GenotypingEngine, and returns this information to the HC for use in visualization.
-- Added extensive docs to HaplotypeCaller on how to use this capability
-- BUGFIX -- don't modify the read bases in GATKSAMRecord in LikelihoodCalculationEngine in the HC
-- Cleaned up SWPairwiseAlignment. Refactored out the big main and supplementary static methods. Added a unit test with a bug TODO to fix what seems to be an edge case bug in SW
-- Integration test to make sure we can actually write a BAM for each mode. This test only ensures that the code runs and doesn't exception out. It doesn't actually enforce any MD5s
-- HaplotypeBAMWriter also left aligns indels in the reads, as SW can return a random placement of a read against the haplotype. Calls leftAlign to make the alignments more clear, with unit test of real read to cover this case
-- Writes out haplotypes for both all haplotype and called haplotype mode
-- Haplotype writers now get the active region call, regardless of whether an actual call was made. Only emitting called haplotypes is moved down to CalledHaplotypeBAMWriter
* Fixed GenomeLocSortedSet.add() to ensure that overlapping intervals are detected and an exception is thrown.
* Fixed GenomeLocSortedSet.addRegion() by merging it with the add() method; it now produces sorted inputs in all cases.
* Cleaned up duplicated code throughout the engine to create a list of intervals over all contigs.
* Added more unit tests for add functionality of GLSS.
* Resolves GSA-775.
* Split the cases into reads that don't have a RG at all vs. those with a RG that's not defined in the header.
* Added integration tests to make sure that the correct error is thrown.
* Resolved GSA-407.
-Some QScripts used by public pipeline tests unnecessarily used the (now protected) UnifiedGenotyper.
Changed them to use PrintReads instead.
-Moved ExampleUnifiedGenotyperPipelineTest to protected
-Attempt to fix the flawed and sporadically failing MisencodedBaseQualityUnitTest:
After looking at this class a bit, I think the problem was the use of global arrays for the quals
shared across all reads in all tests (BAMRecord class definitely does not make a separate copy for
each read!). One test (testFixBadQuals) modifies the bad quals array, and if this happens to run
before the testBadQualsThrowsError test the bad quals array will have been "fixed" and no exception
will be thrown.
-replace unnecessary uses of the UnifiedGenotyper by public integration tests
with PrintReads
-move NanoSchedulerIntegrationTest to protected, since it's completely dependent
on the UnifiedGenotyper
-- This is done to take advantage of longer reads which can produce less ambiguous haplotypes
-- Integration tests change for HC and BiasedDownsampling
The GATK engine does not behave correctly when contigs are indexed
differently in the reads sequence dictionaries vs. the reference
sequence dictionary, and the inconsistently-indexed contigs are included
in the user's intervals. For example, given the dictionaries:
Reference dictionary = { chrM, chr1, chr2, ... }
BAM dictionary = { chr1, chr2, ... }
and the interval "-L chr1", the engine would fail to correctly retrieve
the reads from chr1, since chr1 has a different index in the two dictionaries.
With this patch, we throw an exception if there are contig index differences
between the dictionaries for reads and reference, AND the user's intervals
include at least one of the mismatching contigs.
The user can disable this exception via -U ALLOW_SEQ_DICT_INCOMPATIBILITY
In all other cases, dictionary validation behaves as before.
I also added comprehensive unit tests for the (previously-untested)
SequenceDictionaryUtils class.
GSA-768 #resolve
-- Instead of doing a full SW alignment against the reference we read off bubbles from the assembly graph.
-- Smith-Waterman is run only on the base composition of the bubbles which drastically reduces runtime.
-- Refactoring graph functions into a new DeBruijnAssemblyGraph class.
-- Bug fix in path.getBases().
-- Adding validation code to the assembly engine.
-- Renaming SimpleDeBruijnAssembler to match the naming of the new Assembly graph class.
-- Adding bug fixes, docs and unit tests for DeBruijnAssemblyGraph and KBestPaths classes.
-- Added ability to ignore bubbles that are too divergent from the reference
-- Max kmer can't be bigger than the extension size.
-- Reverse the order that we create the assembly graphs so that the bigger kmers are used first.
-- New algorithm for determining unassembled insertions based on the bubble traversal instead of the full SW alignment.
-- Don't need the full read span reference loc for anything any more now that we clip down to the extended loc for both assembly and likelihood evaluation.
-- Updating HaplotypeCaller and BiasedDownsampling integration tests.
-- Rebased everything into one commit as requested by Eric
-- improvements to the bubble traversal are coming as a separate push
-- changed SkipException constructors that are now private in TestNG
-- Updated build.xml to use the latest testng
-- Added guice dependency to ivy
-- Fixed broken SampleDBUnitTest
The SampleDBUnitTest was only passing before because the map comparison in the old TestNG was broken. It was comparing two DIFFERENT samples and testing for "equals"
GSA-695 #resolve
-- AssessNA12878 now breaks out multi-allelics into bi-allelic components. This means that we can properly assess multi-allelic calls against the bi-allelic KB
-- Refactor AssessNA12878, moving into assess package in KB. Split out previously private classes in the walker itself into separate classes. Added real docs for all of the classes.
-- Vastly expand (from 0) unit tests for NA12878 assessments
-- Allow sites only VCs to be evaluated by Assessor
-- Move utility for creating simple VCs from a list of string alleles from GATKVariantContextUtilsUnitTest to GATKVariantContextUtils
-- Assessor bugfix for discordant records at a site. Previous version didn't handle properly the case where one had a non-matching call in the callset w.r.t. the KB, so that the KB element was eaten during the analysis. Fixed. UnitTested
-- See GSA-781 -- Handle multi-allelic variants in KB for more information
-- Bugfix for missing site counting in AssessNA12878. Previous version would count N misses for every missed value at a site. Not that this has much impact but it's worth fixing
-- UnitTests for BadSitesWriter
-- UnitTests for filtered and filtering sites in the Assessor
-- Cleanup end report generation code (simply the code). Note that instead of "indel" the new code will print out "INDELS"
-- Assessor DoC calculations now us LIBS and RBPs for the depth calculation. The previous version was broken for reduced reads. Added unit test that reads a complex reduced read example and matches the DoC of this BAM with the output of the GATK DoC tool here.
-- Added convenience constructor for LIBS using just SAMFileReader and an iterator. It's now easy to create a LIBS from a BAM at a locus. Added advanceToLocus function that moves the LIBS to a specific position. UnitTested via the assessor (which isn't ideal, but is a proper test)
-- Now supports trimming the alleles from both the reverse and forward direction.
-- Added lots of unit tests for forwrad allele trimming, as well as creating VC from forward and reverse trimming.
-- Added docs and tests for the code, to bring it up to GATK spec
-- modified ReadBin GenomeLoc to keep track of softStart() and softEnd() of the reads coming in, to make sure the reference will always be sufficient even if we want to use the soft-clipped bases
-- changed the verification from readLength to aligned bases to allow reads with soft-clipped bases
-- switched TreeSet -> PriorityQueue in the ConstrainedMateFixer as some different reads can be considered equal by picard's SAMRecordCoordinateComparator (the Set was replacing them)
-- pulled out ReadBin class so it can be testable
-- added unit tests for ReadBin with soft-clips
-- added tests for getMismatchCount (AlignmentUtils) to make sure it works with soft-clipped reads
GSA-774 #resolve
-- Active regions are created as normal, but they are split and trimmed to the engine intervals when added to the traversal, if there are intervals present.
-- UnitTests for ActiveRegion.splitAndTrimToIntervals
-- GenomeLocSortedSet.getOverlapping uses binary search to efficiently in ~ log N time find overlapping intervals
-- UnitTesting overlap function in GenomeLocSortedSet
-- Discovered fundamental implementation bug in that adding genome locs out of order (elements on 20 then on 19) produces an invalid GenomeLocSortedSet. Created a JIRA to address this: https://jira.broadinstitute.org/browse/GSA-775
-- Constructor that takes a collection of genome locs now sorts its input and merges overlapping intervals
-- Added docs for the constructors in GLSS
-- Update HaplotypeCaller MD5s, which change because ActiveRegions are now restricted to the engine intervals, which changes slightly the regions in the tests and so the reads in the regions, and thus the md5s
-- GenomeAnalysisEngineUnitTest needs to provide non-null genome loc parser
-- log10 functions in QualityUtils allow -Infinity to allow log10(0.0) values
-- Fix edge condition of log10OneMinusX failing with Double.MIN_VALUE
-- Fix another edge condition of log10OneMinusX failing with a small but not min_value double
-- Fixed a few conversion bugs with edge case quals (ones that were very high)
-- Fixed a critical bug in the conversion of quals that was causing near capped quals to fall below their actual value. Will undoubtedly need to fix md5s
-- More precise prob -> qual calculations for very high confidence events in phredScaleCorrectRate, trueProbToQual, and errorProbToQual. Very likely to improve accuracy of many calculations in the GATK
-- Added errorProbToQual and trueProbToQual calculations that accept an integer cap, and perform the (tricky) conversion from int to byte correctly.
-- Full docs and unit tests for phredScaleCorrectRate and phredScaleErrorRate.
-- Renamed probToQual to trueProbToQual
-- Added goodProbability and log10OneMinusX to MathUtils
-- Went through the GATK and cleaned up many uses of QualityUtils
-- Cleanup constants in QualityUtils
-- Added full docs for all of the constants
-- Rename MAX_QUAL_SCORE to MAX_SAM_QUAL_SCORE for clarity
-- Moved MAX_GATK_USABLE_Q_SCORE to RecalDatum, as it's s BQSR specific feature
-- Convert uses of QualityUtils.errorProbToQual(1-x) to QualityUtils.trueProbToQual(x)
-- Cleanup duplicate quality score routines in MathUtils. Moved and renamed MathUtils.log10ProbabilityToPhredScale => QualityUtils.phredScaleLog10ErrorRate. Removed 3 routines from MathUtils, and remapped their usages into the better routines in QualityUtils