2009-07-06 23:41:30 +08:00
/*
 * Copyright (c) 2012 The Broad Institute
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
 * THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

package org.broadinstitute.sting.utils;

import org.apache.commons.io.FileUtils;
import org.broadinstitute.sting.utils.io.IOUtils;
import org.testng.Assert;
import org.broadinstitute.sting.BaseTest;
Replace DeBruijnAssembler with ReadThreadingAssembler
Problem
-------
The DeBruijn assembler was too slow. The cause of the slowness was the need to construct many kmer graphs (from the max read length in the interval down to an 11 bp kmer, in steps of 6 bp). Building many kmer graphs was necessary because the assembler (1) needed long kmers to assemble through regions where a shorter kmer was non-unique in the reference, as we couldn't split cycles in the reference, and (2) needed shorter kmers to be sensitive to differences from the reference near the edges of reads, which were often lost when a chain of longer kmers started before and ended after the variant.
Solution
--------
The read threading assembler uses fixed kmer sizes; this implementation builds two graphs by default, with kmer sizes 10 and 25. The algorithm operates as follows:
identify all non-unique kmers of size K among all reads and the reference
for each sequence (ref and read):
    find a unique starting position for the sequence in the graph by matching it to a unique kmer, or start a new source node if none exists
    for each base in the sequence after the starting vertex kmer:
        look at the existing outgoing edges of the current vertex V; if the base matches the suffix of an outgoing vertex N, thread the sequence to N and continue
        if no matching next vertex exists, look for a unique vertex whose kmer matches the next kmer in the sequence; if one exists, merge the sequence into that vertex and continue
        if no merge vertex can be found, create a new vertex (note this vertex may have a kmer identical to another in the graph, if that kmer is non-unique), thread the sequence to it, and continue
This algorithm has a key property: it can robustly use a very short kmer without introducing cycles, because we create separate paths through regions of the graph that aren't unique w.r.t. the sequence at the given kmer size. This allows us to assemble well with even very short kmers.
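The threading loop above can be sketched compactly. This is an illustrative toy, not the GATK implementation; all class and method names are invented. The key move is the last branch of vertexFor(): a non-unique kmer always gets a fresh vertex, which is what lets a very short kmer avoid cycles.

```java
import java.util.*;

/** Toy read-threading graph (hypothetical names, simplified from the description above). */
class ReadThreadingSketch {
    static class Vertex {
        final String kmer;
        final Map<Character, Vertex> outgoing = new HashMap<Character, Vertex>();
        Vertex(final String kmer) { this.kmer = kmer; }
    }

    final int K;
    final Set<String> nonUnique = new HashSet<String>();        // kmers seen more than once
    final Map<String, Vertex> uniqueKmerIndex = new HashMap<String, Vertex>();
    final List<Vertex> vertices = new ArrayList<Vertex>();

    ReadThreadingSketch(final int K) { this.K = K; }

    /** Pass 1: identify all non-unique kmers of size K among all sequences. */
    void findNonUniqueKmers(final List<String> sequences) {
        final Set<String> seen = new HashSet<String>();
        for ( final String seq : sequences )
            for ( int i = 0; i + K <= seq.length(); i++ ) {
                final String kmer = seq.substring(i, i + K);
                if ( !seen.add(kmer) ) nonUnique.add(kmer);
            }
    }

    /** Pass 2: thread one sequence (ref or read) through the graph. */
    void thread(final String seq) {
        if ( seq.length() < K ) return;
        // find a unique starting kmer, or fall back to a new source at position 0
        int start = 0;
        while ( start + K <= seq.length() && nonUnique.contains(seq.substring(start, start + K)) )
            start++;
        if ( start + K > seq.length() ) start = 0;
        Vertex v = vertexFor(seq.substring(start, start + K));
        for ( int i = start + K; i < seq.length(); i++ ) {
            final char base = seq.charAt(i);
            Vertex next = v.outgoing.get(base);               // existing outgoing edge?
            if ( next == null )                               // else merge into a unique vertex or create one
                next = vertexFor(seq.substring(i - K + 1, i + 1));
            v.outgoing.put(base, next);
            v = next;
        }
    }

    private Vertex vertexFor(final String kmer) {
        if ( !nonUnique.contains(kmer) ) {                    // unique kmers are shared merge points
            Vertex v = uniqueKmerIndex.get(kmer);
            if ( v == null ) { v = new Vertex(kmer); uniqueKmerIndex.put(kmer, v); vertices.add(v); }
            return v;
        }
        final Vertex v = new Vertex(kmer);                    // non-unique kmers get a fresh vertex: no cycles
        vertices.add(v);
        return v;
    }
}
```

Threading "AAAAA" with K = 3 produces a linear chain of three AAA vertices rather than a self-loop, which is the cycle-avoidance property described above.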
This commit includes many critical changes to the haplotype caller to make it fast, sensitive, and accurate on deep and shallow WGS and exomes; the key changes are highlighted below:
-- The ReadThreading assembler keeps track of the maximum edge multiplicity per sample in the graph, so that we prune per sample, not across all samples. This change is essential for operating effectively when there are many deep samples (e.g., 100 exomes)
-- A new pruning algorithm that only prunes linear paths where the maximum edge weight among all edges in the path is < pruningFactor. This makes pruning more robust when you have a long chain of bases that has high multiplicity at the start but only barely makes it back into the main path of the graph.
-- We now do a global SmithWaterman to compute the cigar of a Path, instead of the previous bubble-based SmithWaterman optimization. This change is essential for getting good variants from our paths when the kmer size is small. It also ensures that the cigar we produce from a path depends only on the sequence of bases in the path, unlike the previous approach, which depended on both the bases and the way the path was decomposed into vertices, which in turn depended on the kmer size we used.
-- Removed MergeHeadlessIncomingSources, which was introducing problems in the graphs in some cases, and just isn't the safest operation. Since we build a kmer graph of size 10, this operation is no longer necessary, as it required a perfect match of 10 bp to merge anyway.
-- The old DeBruijnAssembler is still available via a command line option
-- The number of paths we take forward from each assembly graph is now capped at a per-sample factor, so that we allow 128 paths for a single sample, scaling up to 10 x nSamples as necessary. This is an essential change to make the system work well for large numbers of samples.
-- Added a global mismapping parameter to the HC likelihood calculation: phredScaledGlobalReadMismappingRate reflects the average global mismapping rate of all reads, regardless of their mapping quality. This term affects the probability that a read originated from the reference haplotype, regardless of its edit distance from the reference, in that the read could have originated from the reference haplotype but from another location in the genome. Suppose a read has many mismatches from the reference, say 5, but has a very high mapping quality of 60. Without this parameter, the read would contribute 5 * Q30 of evidence in favor of its 5-mismatch haplotype over the reference, potentially enough to make a call off that single read for all of these events. With this parameter set to Q30, though, the maximum evidence against the reference that this (and any) read can contribute is Q30. Controllable via a command line argument, defaulting to a Q60 rate. Results from 20:10-11Mb for this branch are consistent with the previous behavior, but this does help in cases where you have rare, very divergent haplotypes
-- Reduced ActiveRegionExtension from 200 bp to 100 bp, which is a performance win; the large extension is largely unnecessary given the short kmers used by the read threading assembler
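The path-level pruning rule above reduces to a simple predicate over the edge multiplicities of a linear chain. A minimal sketch, with invented names standing in for the real graph types:

```java
import java.util.Collections;
import java.util.List;

/** Sketch of the pruning rule: keep a linear path unless every edge in it is weak. */
class PruningSketch {
    static boolean keepLinearPath(final List<Integer> edgeMultiplicities, final int pruningFactor) {
        // prune only when the maximum multiplicity along the whole chain is < pruningFactor;
        // a chain that is strong anywhere survives, even if it rejoins the main path weakly
        return Collections.max(edgeMultiplicities) >= pruningFactor;
    }
}
```

A chain like [10, 1, 1, 1] (high multiplicity at the start, barely rejoining the main path) is kept with a pruning factor of 2, which is exactly the robustness case described above.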
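The Q30 arithmetic in the mismapping bullet can be made concrete. This is a hedged sketch with hypothetical names, not the real HC likelihood engine: a phred-scaled rate Q corresponds to a probability of 10^(-Q/10), i.e. -Q/10 in log10 space, so the read's log10 advantage over the reference haplotype is capped at Q/10.

```java
/** Hypothetical sketch of capping a read's evidence against the reference haplotype. */
class MismapCapSketch {
    static double capLog10Advantage(final double log10LikGivenHap,
                                    final double log10LikGivenRef,
                                    final double phredScaledGlobalReadMismappingRate) {
        // Q30 -> a maximum advantage of 3.0 log10 units, no matter how divergent the read is
        final double maxAdvantage = phredScaledGlobalReadMismappingRate / 10.0;
        return Math.min(log10LikGivenHap - log10LikGivenRef, maxAdvantage);
    }
}
```

A read with 5 Q30 mismatches vs. the reference but a perfect match to its haplotype has a raw advantage of 15.0 log10 units (5 * Q30); with the parameter at Q30 it contributes only 3.0.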
Infrastructure changes / improvements
-------------------------------------
-- Refactored BaseGraph to take a subclass of BaseEdge, so that we can use a MultiSampleEdge in the ReadThreadingAssembler
-- Refactored DeBruijnAssembler, moving common functionality into LocalAssemblyEngine, which now more directly manages the subclasses, requiring them to implement only an assemble() method that takes the ref and reads and provides a List<SeqGraph>, which the LocalAssemblyEngine takes forward to compute haplotypes and other downstream operations. This leaves only a limited amount of code that differentiates the DeBruijn and ReadThreading assemblers
-- Refactored the active region trimming code into an ActiveRegionTrimmer class
-- Cleaned up the arguments in HaplotypeCaller, reorganizing them and marking arguments @Hidden and @Advanced as appropriate. Renamed several arguments now that the read threading assembler is the default
-- LocalAssemblyEngineUnitTest reads in the reference sequence from b37 and assembles intervals from 10-11Mb with synthetic reads, using only the reference sequence as well as artificial SNPs, deletions, and insertions
-- Misc. updates to the Smith-Waterman code. Added a generic interface called, not surprisingly, SmithWaterman, making it easier to have alternative implementations
-- Many more unit tests throughout the entire assembler and in random utilities
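The LocalAssemblyEngine contract described above can be illustrated with stubs. Everything here is a stand-in (SeqGraph reduced to a kmer size, reads to strings), not the real GATK classes: subclasses provide only assemble(), and the engine owns the shared downstream pipeline.

```java
import java.util.Arrays;
import java.util.List;

/** Stub standing in for the real SeqGraph class. */
class SeqGraph {
    final int kmerSize;
    SeqGraph(final int kmerSize) { this.kmerSize = kmerSize; }
}

/** Sketch of the engine/subclass split: assemble() is the only required method. */
abstract class AssemblyEngineSketch {
    protected abstract List<SeqGraph> assemble(List<String> reads, String ref);

    /** Shared downstream pipeline (haplotype discovery etc. elided to a graph count). */
    public final int runLocalAssembly(final List<String> reads, final String ref) {
        return assemble(reads, ref).size();
    }
}

/** The read threading assembler builds a fixed set of kmer graphs, by default 10 and 25. */
class ReadThreadingAssemblerSketch extends AssemblyEngineSketch {
    protected List<SeqGraph> assemble(final List<String> reads, final String ref) {
        return Arrays.asList(new SeqGraph(10), new SeqGraph(25));
    }
}
```

The payoff of this split is that a new assembler only needs to override assemble(); everything downstream of the graphs is shared.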
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

import java.io.File;
import java.util.*;

/**
 * Testing framework for the general purpose utilities class.
 *
 * @author hanna
 * @version 0.1
 */

public class UtilsUnitTest extends BaseTest {

    @Test
    public void testAppend() {
        for ( int leftSize : Arrays.asList(0, 1, 2, 3) ) {
            for ( final int rightSize : Arrays.asList(0, 1, 2) ) {
                final List<Integer> left = new LinkedList<Integer>();
                for ( int i = 0; i < leftSize; i++ ) left.add(i);
                final List<Integer> total = new LinkedList<Integer>();
                for ( int i = 0; i < leftSize + rightSize; i++ ) total.add(i);

                if ( rightSize == 0 )
                    Assert.assertEquals(Utils.append(left), total);
                if ( rightSize == 1 )
                    Assert.assertEquals(Utils.append(left, leftSize), total);
                if ( rightSize == 2 )
                    Assert.assertEquals(Utils.append(left, leftSize, leftSize + 1), total);
            }
        }
    }

    @Test
    public void testDupStringNoChars() {
        String duped = Utils.dupString('a', 0);
        Assert.assertEquals(duped.length(), 0, "dupString did not produce zero-length string");
    }

    @Test
    public void testDupStringOneChar() {
        String duped = Utils.dupString('b', 1);
        Assert.assertEquals(duped.length(), 1, "dupString did not produce single character string");
        Assert.assertEquals(duped.charAt(0), 'b', "dupString character was incorrect");
    }

    @Test
    public void testXor() {
        Assert.assertEquals(Utils.xor(false, false), false, "xor F F failed");
        Assert.assertEquals(Utils.xor(false, true), true, "xor F T failed");
        Assert.assertEquals(Utils.xor(true, false), true, "xor T F failed");
        Assert.assertEquals(Utils.xor(true, true), false, "xor T T failed");
    }

    @Test
    public void testDupStringMultiChar() {
        String duped = Utils.dupString('c', 5);
        Assert.assertEquals(duped.length(), 5, "dupString did not produce five character string");
        Assert.assertEquals(duped, "ccccc", "dupString string was incorrect");
    }

    @Test
    public void testJoinMap() {
        Map<String, Integer> map = new LinkedHashMap<String, Integer>();
        map.put("one", 1);
        map.put("two", 2);
        String joined = Utils.joinMap("-", ";", map);
        Assert.assertTrue("one-1;two-2".equals(joined));
    }

    @Test
    public void testJoinMapLargerSet() {
        Map<String, Integer> map = new LinkedHashMap<String, Integer>();
        map.put("one", 1);
        map.put("two", 2);
        map.put("three", 1);
        map.put("four", 2);
        map.put("five", 1);
        map.put("six", 2);
        String joined = Utils.joinMap("-", ";", map);
        Assert.assertTrue("one-1;two-2;three-1;four-2;five-1;six-2".equals(joined));
    }

    @Test
    public void testConcat() {
        final String s1 = "A";
        final String s2 = "CC";
        final String s3 = "TTT";
        final String s4 = "GGGG";
        Assert.assertEquals(new String(Utils.concat()), "");
        Assert.assertEquals(new String(Utils.concat(s1.getBytes())), s1);
        Assert.assertEquals(new String(Utils.concat(s1.getBytes(), s2.getBytes())), s1 + s2);
        Assert.assertEquals(new String(Utils.concat(s1.getBytes(), s2.getBytes(), s3.getBytes())), s1 + s2 + s3);
        Assert.assertEquals(new String(Utils.concat(s1.getBytes(), s2.getBytes(), s3.getBytes(), s4.getBytes())), s1 + s2 + s3 + s4);
    }

    @Test
    public void testEscapeExpressions() {
        String[] expected, actual;

        expected = new String[] {"one", "two", "three"};
        actual = Utils.escapeExpressions("one two three");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions(" one two three");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions("one two three ");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions(" one two three ");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions("  one  two  three  ");
        Assert.assertEquals(actual, expected);

        expected = new String[] {"one", "two", "three four", "five", "six"};
        actual = Utils.escapeExpressions("one two 'three four' five six");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions(" one two 'three four' five six");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions("one two 'three four' five six ");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions(" one two 'three four' five six ");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions("  one  two  'three four'  five  six  ");
        Assert.assertEquals(actual, expected);

        expected = new String[] {"one two", "three", "four"};
        actual = Utils.escapeExpressions("'one two' three four");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions(" 'one two' three four");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions("'one two' three four ");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions(" 'one two' three four ");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions("  'one two'  three  four  ");
        Assert.assertEquals(actual, expected);

        expected = new String[] {"one", "two", "three four"};
        actual = Utils.escapeExpressions("one two 'three four'");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions(" one two 'three four'");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions("one two 'three four' ");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions(" one two 'three four' ");
        Assert.assertEquals(actual, expected);
        actual = Utils.escapeExpressions("  one  two  'three four'  ");
        Assert.assertEquals(actual, expected);
    }

    @Test
    public void testCalcMD5() throws Exception {
        final File source = new File(publicTestDir + "exampleFASTA.fasta");
        final String sourceMD5 = "36880691cf9e4178216f7b52e8d85fbe";

        final byte[] sourceBytes = IOUtils.readFileIntoByteArray(source);
        Assert.assertEquals(Utils.calcMD5(sourceBytes), sourceMD5);

        final String sourceString = FileUtils.readFileToString(source);
        Assert.assertEquals(Utils.calcMD5(sourceString), sourceMD5);
    }

    @Test
    public void testLongestCommonOps() {
        for ( int prefixLen = 0; prefixLen < 20; prefixLen++ ) {
            for ( int extraSeq1Len = 0; extraSeq1Len < 10; extraSeq1Len++ ) {
                for ( int extraSeq2Len = 0; extraSeq2Len < 10; extraSeq2Len++ ) {
                    for ( int max = 0; max < 50; max++ ) {
                        final String prefix = Utils.dupString("A", prefixLen);
                        final int expected = Math.min(prefixLen, max);

                        {
                            final String seq1 = prefix + Utils.dupString("C", extraSeq1Len);
                            final String seq2 = prefix + Utils.dupString("G", extraSeq2Len);
                            Assert.assertEquals(Utils.longestCommonPrefix(seq1.getBytes(), seq2.getBytes(), max), expected, "longestCommonPrefix failed: seq1 " + seq1 + " seq2 " + seq2 + " max " + max);
                        }

                        {
                            final String seq1 = Utils.dupString("C", extraSeq1Len) + prefix;
                            final String seq2 = Utils.dupString("G", extraSeq2Len) + prefix;
                            Assert.assertEquals(Utils.longestCommonSuffix(seq1.getBytes(), seq2.getBytes(), max), expected, "longestCommonSuffix failed: seq1 " + seq1 + " seq2 " + seq2 + " max " + max);
                        }
                    }
                }
            }
        }
    }
}