Added a more efficient implementation of the KBest haplotype finder code (CONT.)

Changes:

  1. Addressed review comments on new K-best haplotype assembly graph finder.
  2. Generalize KBestHaplotypeFinder to deal with multiple source and sink vertices.
  3. Updated test to use KBestHaplotypeFinder instead of KBestPaths
  4. Retired KBestPaths to the archive.
  5. Small improvements to the code and documentation.
This commit is contained in:
Valentin Ruano-Rubio 2014-03-03 16:33:05 -05:00
parent 7acf2eb0e7
commit 69bf2b3247
16 changed files with 525 additions and 485 deletions

View File

@ -1,185 +1,194 @@
/* /*
* By downloading the PROGRAM you agree to the following terms of use: * By downloading the PROGRAM you agree to the following terms of use:
* *
* BROAD INSTITUTE - SOFTWARE LICENSE AGREEMENT - FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY * BROAD INSTITUTE - SOFTWARE LICENSE AGREEMENT - FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
* *
* This Agreement is made between the Broad Institute, Inc. with a principal address at 7 Cambridge Center, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE). * This Agreement is made between the Broad Institute, Inc. with a principal address at 7 Cambridge Center, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
* *
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and * WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions. * WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows: * NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
* *
* 1. DEFINITIONS * 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK2 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute/GATK on the EFFECTIVE DATE. * 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK2 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute/GATK on the EFFECTIVE DATE.
* *
* 2. LICENSE * 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. * 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement. * The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement. * 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM. * 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
* *
* 3. OWNERSHIP OF INTELLECTUAL PROPERTY * 3. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication. * LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012 Broad Institute, Inc. * Copyright 2012 Broad Institute, Inc.
* Notice of attribution: The GATK2 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc. * Notice of attribution: The GATK2 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes. * LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
* *
* 4. INDEMNIFICATION * 4. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement. * LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
* *
* 5. NO REPRESENTATIONS OR WARRANTIES * 5. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME. * THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING. * IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
* *
* 6. ASSIGNMENT * 6. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void. * This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
* *
* 7. MISCELLANEOUS * 7. MISCELLANEOUS
* 7.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries. * 7.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 7.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes. * 7.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 7.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4. * 7.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 7.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt. * 7.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 7.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter. * 7.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 7.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement. * 7.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 7.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles. * 7.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/ */
package org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs; package org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs;
import com.google.common.collect.MinMaxPriorityQueue; import java.util.ArrayList;
import com.google.java.contract.Ensures; import java.util.Collection;
import java.util.PriorityQueue;
import java.io.Serializable;
import java.util.*;
/** /**
* Class for finding the K best paths (as determined by the sum of multiplicities of the edges) in a graph. * K-best sub-haplotype finder that selects the best solutions out of a collection of sub-haplotype finders.
* This is different from most graph traversals because we want to test paths from any source node to any sink node.
* *
* User: ebanks, rpoplin, mdepristo * @author Valentin Ruano-Rubio <valentin@broadinstitute.org>
* Date: Mar 23, 2011
*/ */
public class KBestPaths<T extends BaseVertex, E extends BaseEdge> { class AggregatedSubHaplotypeFinder implements KBestSubHaplotypeFinder {
private final boolean allowCycles;
/** /**
* Create a new KBestPaths finder that follows cycles in the graph * Collection of subFinders that provided the actual solutions.
*/ */
public KBestPaths() { private final Collection<? extends KBestSubHaplotypeFinder> subFinders;
this(true);
}
/** /**
* Create a new KBestPaths finder * Flag indicating whether the sub-finders have been processed or not.
*/
private boolean processedSubFinders = false;
/**
* Holds the number of k-best solution that this finder would ever return.
*/
private int count = 0;
/**
* Holds the best {@code i} paths to the sink so far calculated where {@code i+1} is the length of this list.
* *
* @param allowCycles should we allow paths that follow cycles in the graph? * <p>As more results are requested the array will grow. All positions and solutions are
* calculated up to {@code i}</p>.
*/ */
public KBestPaths(final boolean allowCycles) { private ArrayList<KBestHaplotype> rankedSubHaplotype;
this.allowCycles = allowCycles;
}
protected static class MyInt { public int val = 0; }
/** /**
* Compare paths such that paths with greater weight are earlier in a list * Priority queue with next best haplotype solution from each sub-finder; previous ones are
* already part {@link #rankedSubHaplotype}.
*/ */
protected static class PathComparatorTotalScore implements Comparator<Path>, Serializable { private PriorityQueue<MyKBestHaplotypeResult> nextBestSubHaplotypes;
/**
* Creates a new aggregated sub-haplotype finder given its sub-finders.
* @param finders set of sub-finders.
*/
public AggregatedSubHaplotypeFinder(final Collection<? extends KBestSubHaplotypeFinder> finders) {
if (finders == null) throw new IllegalArgumentException("finder collection cannot be null");
this.subFinders = finders;
}
@Override
public int getCount() {
processSubFindersIfNeeded();
return count;
}
private void processSubFindersIfNeeded() {
if (processedSubFinders) return;
long count = 0;
nextBestSubHaplotypes = new PriorityQueue<>(subFinders.size());
for (final KBestSubHaplotypeFinder finder : subFinders) {
final int finderCount = finder.getCount();
if (finderCount == 0) continue;
count += finderCount;
nextBestSubHaplotypes.add(new MyKBestHaplotypeResult(finder,0));
}
this.count = (int) Math.min(Integer.MAX_VALUE,count);
rankedSubHaplotype = new ArrayList<>(10);
processedSubFinders = true;
}
@Override
public KBestHaplotype getKBest(int k) {
if (k < 0) throw new IllegalArgumentException("k cannot be negative");
processSubFindersIfNeeded();
if (k >= count) throw new IllegalArgumentException("k cannot be equal or greater than the count");
if (k < rankedSubHaplotype.size())
return rankedSubHaplotype.get(k);
rankedSubHaplotype.ensureCapacity(k+1);
for (int i = rankedSubHaplotype.size(); i <= k; i++) {
// since k < possibleHaplotypeCount is guarantee no to be empty.
if (nextBestSubHaplotypes.isEmpty())
throw new IllegalStateException("what the heck " + k + " " + count);
final MyKBestHaplotypeResult nextResult = nextBestSubHaplotypes.remove();
nextResult.rank = i;
rankedSubHaplotype.add(nextResult);
final int subRank = nextResult.result.rank();
// if there is no further solution from the same child we cannot add another solution from that child.
if (subRank + 1 >= nextResult.subFinder.getCount())
continue;
nextBestSubHaplotypes.add(new MyKBestHaplotypeResult(nextResult.subFinder, subRank + 1));
}
return rankedSubHaplotype.get(k);
}
/**
* Custom implementation of {@link KBestHaplotype} to encapsulate sub-finder results.
*/
private class MyKBestHaplotypeResult extends KBestHaplotype {
private KBestSubHaplotypeFinder subFinder;
private final KBestHaplotype result;
private int rank;
private MyKBestHaplotypeResult(final KBestSubHaplotypeFinder finder, final int rank) {
this.subFinder = finder;
this.result = finder.getKBest(rank);
this.rank = -1;
}
@Override @Override
public int compare(final Path path1, final Path path2) { public SeqGraph graph() {
return path2.getScore() - path1.getScore(); return result.graph();
}
}
/**
* @see #getKBestPaths(BaseGraph, int) retriving the best 1000 paths
*/
public List<Path<T,E>> getKBestPaths( final BaseGraph<T, E> graph ) {
return getKBestPaths(graph, 1000);
}
/**
* @see #getKBestPaths(BaseGraph, int, java.util.Set, java.util.Set) retriving the first 1000 paths
* starting from all source vertices and ending with all sink vertices
*/
public List<Path<T,E>> getKBestPaths( final BaseGraph<T,E> graph, final int k ) {
return getKBestPaths(graph, k, graph.getSources(), graph.getSinks());
}
/**
* @see #getKBestPaths(BaseGraph, int, java.util.Set, java.util.Set) with k=1000
*/
public List<Path<T,E>> getKBestPaths( final BaseGraph<T,E> graph, final Set<T> sources, final Set<T> sinks ) {
return getKBestPaths(graph, 1000, sources, sinks);
}
/**
* @see #getKBestPaths(BaseGraph, int, java.util.Set, java.util.Set) with k=1000
*/
public List<Path<T,E>> getKBestPaths( final BaseGraph<T,E> graph, final T source, final T sink ) {
return getKBestPaths(graph, 1000, source, sink);
}
/**
* @see #getKBestPaths(BaseGraph, int, java.util.Set, java.util.Set) with singleton source and sink sets
*/
public List<Path<T,E>> getKBestPaths( final BaseGraph<T,E> graph, final int k, final T source, final T sink ) {
return getKBestPaths(graph, k, Collections.singleton(source), Collections.singleton(sink));
}
/**
* Traverse the graph and pull out the best k paths.
* Paths are scored via their comparator function. The default being PathComparatorTotalScore()
* @param graph the graph from which to pull paths
* @param k the number of paths to find
* @param sources a set of vertices we want to start paths with
* @param sinks a set of vertices we want to end paths with
* @return a list with at most k top-scoring paths from the graph
*/
@Ensures({"result != null", "result.size() <= k"})
public List<Path<T,E>> getKBestPaths( final BaseGraph<T,E> graph, final int k, final Set<T> sources, final Set<T> sinks ) {
if( graph == null ) { throw new IllegalArgumentException("Attempting to traverse a null graph."); }
// a min max queue that will collect the best k paths
final MinMaxPriorityQueue<Path<T,E>> bestPaths = MinMaxPriorityQueue.orderedBy(new PathComparatorTotalScore()).maximumSize(k).create();
// run a DFS for best paths
for ( final T source : sources ) {
final Path<T,E> startingPath = new Path<T,E>(source, graph);
findBestPaths(startingPath, sinks, bestPaths, new MyInt());
} }
// the MinMaxPriorityQueue iterator returns items in an arbitrary order, so we need to sort the final result @Override
final List<Path<T,E>> toReturn = new ArrayList<Path<T,E>>(bestPaths); public int score() {
Collections.sort(toReturn, new PathComparatorTotalScore()); return result.score();
return toReturn; }
}
/** @Override
* Recursive algorithm to find the K best paths in the graph from the current path to any of the sinks public boolean isReference() {
* @param path the current path progress return result.isReference();
* @param sinks a set of nodes that are sinks. Will terminate and add a path if the last vertex of path is in this set }
* @param bestPaths a path to collect completed paths.
* @param n used to limit the search by tracking the number of vertices visited across all paths @Override
*/ public int rank() {
private void findBestPaths( final Path<T,E> path, final Set<T> sinks, final Collection<Path<T,E>> bestPaths, final MyInt n ) { return rank;
if ( sinks.contains(path.getLastVertex())) { }
bestPaths.add(path);
} else if( n.val > 10000 ) { @Override
// do nothing, just return, as we've done too much work already protected SeqVertex head() {
} else { return result.head();
// recursively run DFS }
final ArrayList<E> edgeArrayList = new ArrayList<E>(path.getOutgoingEdgesOfLastVertex());
Collections.sort(edgeArrayList, new BaseEdge.EdgeWeightComparator()); @Override
for ( final E edge : edgeArrayList ) { protected KBestHaplotype tail() {
final T target = path.getGraph().getEdgeTarget(edge); return result.tail();
// make sure the edge is not already in the path
final boolean alreadyVisited = allowCycles ? path.containsEdge(edge) : path.containsVertex(target);
if ( ! alreadyVisited ) {
final Path<T,E> newPath = new Path<T,E>(path, edge);
n.val++;
findBestPaths(newPath, sinks, bestPaths, n);
}
}
} }
} }
} }

View File

@ -695,4 +695,21 @@ public class BaseGraph<V extends BaseVertex, E extends BaseEdge> extends Default
public BaseGraph<V,E> subsetToRefSource() { public BaseGraph<V,E> subsetToRefSource() {
return subsetToNeighbors(getReferenceSourceVertex(), 10); return subsetToNeighbors(getReferenceSourceVertex(), 10);
} }
/**
* Checks whether the graph contains all the vertices in a collection.
*
* @param vertices the vertices to check.
*
* @throws IllegalArgumentException if {@code vertices} is {@code null}.
*
* @return {@code true} if all the vertices in the input collection are present in this graph.
* Also if the input collection is empty. Otherwise it returns {@code false}.
*/
public boolean containsAllVertices(final Collection<? extends V> vertices) {
if (vertices == null) throw new IllegalArgumentException("the input vertices collection cannot be null");
for (final V vertex : vertices)
if (!containsVertex(vertex)) return false;
return true;
}
} }

View File

@ -46,9 +46,9 @@
package org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs; package org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs;
/** /**
* Represent a trivial k-best sub haplotype finder with no solutions. * Represents a trivial k-best sub haplotype finder with no solutions.
* *
* <p>To be used at vertices that do not have any valid path to the requested sink node</p> * <p>To be used at vertices that do not have any valid path to the requested sink vertices</p>
* *
* @author Valentin Ruano-Rubio &lt;valentin@broadinstitute.org&gt; * @author Valentin Ruano-Rubio &lt;valentin@broadinstitute.org&gt;
*/ */

View File

@ -52,14 +52,19 @@ package org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs;
*/ */
class EmptyPathHaplotypeFinderNode implements KBestSubHaplotypeFinder { class EmptyPathHaplotypeFinderNode implements KBestSubHaplotypeFinder {
/** /**
* Caches the only solution returned by this finder. * Caches the only solution returned by this finder.
*/ */
private final KBestHaplotype singleHaplotypePath; private final KBestHaplotype singleHaplotypePath;
public EmptyPathHaplotypeFinderNode(final SeqGraph graph, final SeqVertex sink) { /**
singleHaplotypePath = new MyBestHaplotypePath(graph,sink); * Constructs a new empty k-best haplotype finder.
*
* @param graph the search graph.
* @param vertex the source and sink vertex of the only solution returned by this finder.
*/
public EmptyPathHaplotypeFinderNode(final SeqGraph graph, final SeqVertex vertex) {
singleHaplotypePath = new MyBestHaplotypePath(graph,vertex);
} }
@Override @Override
@ -81,12 +86,29 @@ class EmptyPathHaplotypeFinderNode implements KBestSubHaplotypeFinder {
*/ */
private class MyBestHaplotypePath extends KBestHaplotype { private class MyBestHaplotypePath extends KBestHaplotype {
/**
* The solution's only vertex.
*/
private final SeqVertex vertex; private final SeqVertex vertex;
/**
* The search graph.
*/
private final SeqGraph graph; private final SeqGraph graph;
/**
* Whether the vertex is a reference vertex.
*
* <p>Initialize lazily.</p>
*/
private Boolean isReference; private Boolean isReference;
/**
* Constructs a new empty k-best haplotype solution.
*
* @param graph the search graph.
* @param vertex the source and sink vertex of the only solution returned by the outer finder.
*/
public MyBestHaplotypePath(final SeqGraph graph, final SeqVertex vertex) { public MyBestHaplotypePath(final SeqGraph graph, final SeqVertex vertex) {
this.vertex = vertex; this.vertex = vertex;
this.graph = graph; this.graph = graph;

View File

@ -52,7 +52,7 @@ import org.broadinstitute.sting.utils.haplotype.Haplotype;
* *
* @author Valentin Ruano-Rubio &lt;valentin@broadinstitute.org&gt; * @author Valentin Ruano-Rubio &lt;valentin@broadinstitute.org&gt;
*/ */
public abstract class KBestHaplotype { public abstract class KBestHaplotype implements Comparable<KBestHaplotype> {
/** /**
* Returns the original graph searched. * Returns the original graph searched.
@ -143,6 +143,18 @@ public abstract class KBestHaplotype {
return path; return path;
} }
/**
* Compares k-best haplotypes based on the score where the one with larger score comes first (descending orther).
*
* @param other the other haplotype to compare to.
* @return {@code -1} if the current score is larger than {@code other}'s, {@code 0} if they are the same, {@code 1}
* if {@code other}'s score is larger.
*/
public int compareTo(final KBestHaplotype other) {
if (other == null) throw new IllegalArgumentException("the other object cannot be null");
return - 1 * (score() - other.score());
}
/** /**
* The first vertex on the haplotype path. * The first vertex on the haplotype path.
* *
@ -156,6 +168,4 @@ public abstract class KBestHaplotype {
* @return {@code null} if there are no more vertices in the solution path a part from the one returned by {@link #head}. * @return {@code null} if there are no more vertices in the solution path a part from the one returned by {@link #head}.
*/ */
protected abstract KBestHaplotype tail(); protected abstract KBestHaplotype tail();
} }

View File

@ -56,49 +56,131 @@ import java.util.*;
*/ */
public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implements Iterable<KBestHaplotype> { public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implements Iterable<KBestHaplotype> {
/**
* The search graph.
*/
private final SeqGraph graph; private final SeqGraph graph;
protected Map<SeqVertex,KBestSubHaplotypeFinder> nodeBySource;
protected SeqVertex sink; /**
protected SeqVertex source; * Map of sub-haplotype finder by their source vertex.
*/
protected Map<SeqVertex,KBestSubHaplotypeFinder> finderByVertex;
/**
* Possible haplotype sink vertices.
*/
protected Set<SeqVertex> sinks;
/**
* Possible haplotype source vertices.
*/
protected Set<SeqVertex> sources;
/**
* The top finder.
*
* <p>If there is only a single source vertex, its finder is the top finder. However whent there
* is more than one possible source, we create a composite finder that alternates between individual source vertices
* for their best haplotypes.</p>
*/
private final KBestSubHaplotypeFinder topFinder;
/** /**
* Constructs a new best haplotypes finder. * Constructs a new best haplotypes finder.
* *
* @param graph the seq-graph to search. * @param graph the seq-graph to search.
* @param source the source vertex for all haplotypes. * @param source the source vertex for all haplotypes.
* @param sink the sink vertex for all haplotypes. * @param sink sink vertices for all haplotypes.
* *
* @throws IllegalArgumentException if <ul> * @throws IllegalArgumentException if <ul>
* <li>any of {@code graph}, {@code source} or {@code sink} is {@code null} or</li> * <li>any of {@code graph}, {@code source} or {@code sink} is {@code null} or</li>
* <li>either {@code source} or {@code sink} is not a vertex of {@code graph}'s.</li> * <li>either {@code source} or {@code sink} is not a vertex in {@code graph}.</li>
* </ul> * </ul>
*/ */
public KBestHaplotypeFinder(final SeqGraph graph, final SeqVertex source, final SeqVertex sink) { public KBestHaplotypeFinder(final SeqGraph graph, final SeqVertex source, final SeqVertex sink) {
if (graph == null) throw new IllegalArgumentException("graph cannot be null"); this(graph,Collections.singleton(source),Collections.singleton(sink));
if (source == null) throw new IllegalArgumentException("source cannot be null");
if (sink == null) throw new IllegalArgumentException("sink cannot be null");
if (!graph.containsVertex(source)) throw new IllegalArgumentException("source does not belong to the graph");
if (!graph.containsVertex(sink)) throw new IllegalArgumentException("sink does not belong to the graph");
//TODO dealing with cycles here due to a bug in some of the graph transformations that produces cycles.
//TODO Once that is solve, the conditional above should be removed in order the save time:
//this.graph = graph;
if (new CycleDetector<>(graph).detectCycles())
this.graph = removeCycles(graph,source,sink);
else
this.graph = graph;
nodeBySource = new HashMap<>(graph.vertexSet().size());
this.sink = sink;
this.source = source;
} }
private static SeqGraph removeCycles(final SeqGraph original, final SeqVertex source, final SeqVertex sink) { /**
* Constructs a new best haplotypes finder.
*
* @param graph the seq-graph to search.
* @param sources source vertices for all haplotypes.
* @param sinks sink vertices for all haplotypes.
*
* @throws IllegalArgumentException if <ul>
* <li>any of {@code graph}, {@code sources} or {@code sinks} is {@code null} or</li>
* <li>any of {@code sources}' or any {@code sinks}' member is not a vertex in {@code graph}.</li>
* </ul>
*/
public KBestHaplotypeFinder(final SeqGraph graph, final Set<SeqVertex> sources, final Set<SeqVertex> sinks) {
if (graph == null) throw new IllegalArgumentException("graph cannot be null");
if (sources == null) throw new IllegalArgumentException("source cannot be null");
if (sinks == null) throw new IllegalArgumentException("sink cannot be null");
if (!graph.containsAllVertices(sources)) throw new IllegalArgumentException("source does not belong to the graph");
if (!graph.containsAllVertices(sinks)) throw new IllegalArgumentException("sink does not belong to the graph");
//TODO dealing with cycles here due to a bug in some of the graph transformations that produces cycles.
//TODO Once that is solve, the if-else below should be substituted by a throw if there is any cycles,
//TODO just the line commented out below if you want to trade early-bug-fail for speed.
//this.graph = graph;
this.graph = new CycleDetector<>(graph).detectCycles() ? removeCycles(graph,sources,sinks) : graph;
finderByVertex = new HashMap<>(this.graph.vertexSet().size());
this.sinks = sinks;
this.sources = sources;
if (sinks.size() == 0 || sources.size() == 0)
topFinder = DeadEndKBestSubHaplotypeFinder.INSTANCE;
else if (sources.size() == 1)
topFinder = createVertexFinder(sources.iterator().next());
else
topFinder = createAggregatedFinder();
}
/**
* Constructs a new best haplotype finder.
* <p>
* It will consider all source and sink vertex when looking for haplotypes.
* </p>
*
* @param graph the seq-graph to search for the best haplotypes.
*/
public KBestHaplotypeFinder(SeqGraph graph) {
this(graph,graph.getSources(),graph.getSinks());
}
/**
* Creates an aggregated recursive finder to try all possible source vertices.
*
* @return never {@code null}.
*/
private KBestSubHaplotypeFinder createAggregatedFinder() {
final List<KBestSubHaplotypeFinder> sourceFinders = new ArrayList<>(sources.size());
for (final SeqVertex source : sources)
sourceFinders.add(createVertexFinder(source));
return new AggregatedSubHaplotypeFinder(sourceFinders);
}
/**
* Removes edges that produces cycles and also dead vertices that do not lead to any sink vertex.
*
* @param original graph to modify.
* @param sources considered source vertices.
* @param sinks considered sink vertices.
* @return never {@code null}.
*/
private static SeqGraph removeCycles(final SeqGraph original, final Set<SeqVertex> sources, final Set<SeqVertex> sinks) {
final Set<BaseEdge> edgesToRemove = new HashSet<>(original.edgeSet().size()); final Set<BaseEdge> edgesToRemove = new HashSet<>(original.edgeSet().size());
final Set<SeqVertex> vertexToRemove = new HashSet<>(original.vertexSet().size()); final Set<SeqVertex> vertexToRemove = new HashSet<>(original.vertexSet().size());
if (!findGuiltyVerticesAndEdgesToRemoveCycles(original, source, sink, edgesToRemove, vertexToRemove, new HashSet<SeqVertex>(original.vertexSet().size()))) boolean foundSomePath = false;
throw new IllegalStateException("could not find any path from the source vertex to the sink vertex: " + source + " -> " + sink); for (final SeqVertex source : sources)
foundSomePath = findGuiltyVerticesAndEdgesToRemoveCycles(original, source, sinks, edgesToRemove,
vertexToRemove, new HashSet<SeqVertex>(original.vertexSet().size())) | foundSomePath;
if (!foundSomePath)
throw new IllegalStateException("could not find any path from the source vertex to the sink vertex after removing cycles: "
+ Arrays.toString(sources.toArray()) + " => " + Arrays.toString(sinks.toArray()));
if (edgesToRemove.isEmpty() && vertexToRemove.isEmpty()) if (edgesToRemove.isEmpty() && vertexToRemove.isEmpty())
throw new IllegalStateException("cannot find a way to remove the cycles"); throw new IllegalStateException("cannot find a way to remove the cycles");
@ -109,13 +191,30 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
return result; return result;
} }
private static boolean findGuiltyVerticesAndEdgesToRemoveCycles(final SeqGraph graph, final SeqVertex currentVertex, final SeqVertex sink, /**
final Set<BaseEdge> edgesToRemove, final Set<SeqVertex> verticesToRemove, * Recursive call that looks for edges and vertices that need to be removed to get rid of cycles.
*
* @param graph the original graph.
* @param currentVertex current search vertex.
* @param sinks considered sink vertices.
* @param edgesToRemove collection of edges that need to be removed in order to get rid of cycles.
* @param verticesToRemove collection of vertices that can be removed.
* @param parentVertices collection of vertices that preceded the {@code currentVertex}; i.e. the it can be
* reached from those vertices using edges existing in {@code graph}.
*
* @return {@code true} to indicate that the some sink vertex is reachable by {@code currentVertex},
* {@code false} otherwise.
*/
private static boolean findGuiltyVerticesAndEdgesToRemoveCycles(final SeqGraph graph,
final SeqVertex currentVertex,
final Set<SeqVertex> sinks,
final Set<BaseEdge> edgesToRemove,
final Set<SeqVertex> verticesToRemove,
final Set<SeqVertex> parentVertices) { final Set<SeqVertex> parentVertices) {
if (currentVertex.equals(sink)) return true; if (sinks.contains(currentVertex)) return true;
final Set<BaseEdge> outgoingEdges = graph.outgoingEdgesOf(currentVertex); final Set<BaseEdge> outgoingEdges = graph.outgoingEdgesOf(currentVertex);
boolean reachsSink = false; boolean reachesSink = false;
parentVertices.add(currentVertex); parentVertices.add(currentVertex);
for (final BaseEdge edge : outgoingEdges) { for (final BaseEdge edge : outgoingEdges) {
@ -123,31 +222,28 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
if (parentVertices.contains(child)) if (parentVertices.contains(child))
edgesToRemove.add(edge); edgesToRemove.add(edge);
else { else {
final boolean childReachSink = findGuiltyVerticesAndEdgesToRemoveCycles(graph, child, sink, edgesToRemove, verticesToRemove, parentVertices); final boolean childReachSink = findGuiltyVerticesAndEdgesToRemoveCycles(graph, child, sinks,
reachsSink = reachsSink || childReachSink; edgesToRemove, verticesToRemove, parentVertices);
reachesSink = reachesSink || childReachSink;
} }
} }
parentVertices.remove(currentVertex); parentVertices.remove(currentVertex);
if (!reachsSink) verticesToRemove.add(currentVertex); if (!reachesSink) verticesToRemove.add(currentVertex);
return reachsSink; return reachesSink;
} }
@Override @Override
public KBestHaplotype get(int index) { public KBestHaplotype get(int index) {
final KBestSubHaplotypeFinder sourceNode = createNode(source);
if (index < 0 || index >= size()) if (index < 0 || index >= size())
throw new IndexOutOfBoundsException(); throw new IndexOutOfBoundsException();
return sourceNode.getKBest(index); return topFinder.getKBest(index);
} }
@Override @Override
public Iterator<KBestHaplotype> iterator() { public Iterator<KBestHaplotype> iterator() {
final KBestSubHaplotypeFinder sourceFinder = createNode(source);
return new Iterator<KBestHaplotype>() { return new Iterator<KBestHaplotype>() {
private int nextK = 0; private int nextK = 0;
private final int maxK = sourceFinder.getCount(); private final int maxK = topFinder.getCount();
@Override @Override
@ -158,7 +254,7 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
@Override @Override
public KBestHaplotype next() { public KBestHaplotype next() {
if (nextK >= maxK) throw new NoSuchElementException(); if (nextK >= maxK) throw new NoSuchElementException();
return sourceFinder.getKBest(nextK++); return topFinder.getKBest(nextK++);
} }
@Override @Override
@ -170,7 +266,7 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
@Override @Override
public int size() { public int size() {
return createNode(source).getCount(); return topFinder.getCount();
} }
/** /**
@ -183,12 +279,10 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
* @return never {@code null}, but perhaps a iterator that return no haplotype. * @return never {@code null}, but perhaps a iterator that return no haplotype.
*/ */
public Iterator<KBestHaplotype> iterator(final int k) { public Iterator<KBestHaplotype> iterator(final int k) {
final KBestSubHaplotypeFinder sourceFinder = createNode(source);
return new Iterator<KBestHaplotype>() { return new Iterator<KBestHaplotype>() {
private int nextK = 0; private int nextK = 0;
private final int maxK = Math.min(sourceFinder.getCount(), k); private final int maxK = Math.min(size(), k);
@Override @Override
public boolean hasNext() { public boolean hasNext() {
@ -198,7 +292,7 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
@Override @Override
public KBestHaplotype next() { public KBestHaplotype next() {
if (nextK >= maxK) throw new NoSuchElementException(); if (nextK >= maxK) throw new NoSuchElementException();
return sourceFinder.getKBest(nextK++); return topFinder.getKBest(nextK++);
} }
@Override @Override
@ -208,38 +302,51 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
}; };
} }
protected KBestSubHaplotypeFinder createNode(final SeqVertex source) { /**
KBestSubHaplotypeFinder node = nodeBySource.get(source); * Creates a finder from a vertex.
*
* @param source the source vertex for the finder.
*
* @return never {@code null}, perhaps a finder that returns no haplotypes though.
*/
protected KBestSubHaplotypeFinder createVertexFinder(final SeqVertex source) {
KBestSubHaplotypeFinder node = finderByVertex.get(source);
if (node == null) { if (node == null) {
if (source.equals(sink)) if (sinks.contains(source))
node = new EmptyPathHaplotypeFinderNode(graph,sink); node = new EmptyPathHaplotypeFinderNode(graph,source);
else { else {
final Set<BaseEdge> outgoingEdges = graph.outgoingEdgesOf(source); final Set<BaseEdge> outgoingEdges = graph.outgoingEdgesOf(source);
if (outgoingEdges.isEmpty()) if (outgoingEdges.isEmpty())
node = DeadEndKBestSubHaplotypeFinder.INSTANCE; node = DeadEndKBestSubHaplotypeFinder.INSTANCE;
else { else {
final Map<BaseEdge,KBestSubHaplotypeFinder> undeadChildren = createChildrenNodes(outgoingEdges); final Map<BaseEdge,KBestSubHaplotypeFinder> undeadChildren = createChildrenFinders(outgoingEdges);
if (undeadChildren.isEmpty()) node = undeadChildren.isEmpty() ? DeadEndKBestSubHaplotypeFinder.INSTANCE :
node = DeadEndKBestSubHaplotypeFinder.INSTANCE; new RecursiveSubHaplotypeFinder(source,undeadChildren);
else
node = new RecursiveSubHaplotypeFinder(graph,source,undeadChildren);
} }
} }
nodeBySource.put(source,node); finderByVertex.put(source, node);
} }
return node; return node;
} }
private Map<BaseEdge, KBestSubHaplotypeFinder> createChildrenNodes(Set<BaseEdge> baseEdges) { /**
* Creates finder for target vertices of a collection of edges.
* <p>
* This peculiar signature is convenient for when we want to create finders for the children of a vertex.
* </p>
*
* @param baseEdges target collection of edges.
*
* @return never {@code null}, perhaps an empty map if there is no children with valid paths to any sink for this
* finder.
*/
private Map<BaseEdge, KBestSubHaplotypeFinder> createChildrenFinders(Set<BaseEdge> baseEdges) {
final Map<BaseEdge,KBestSubHaplotypeFinder> result = new LinkedHashMap<>(baseEdges.size()); final Map<BaseEdge,KBestSubHaplotypeFinder> result = new LinkedHashMap<>(baseEdges.size());
for (final BaseEdge edge : baseEdges) for (final BaseEdge edge : baseEdges) {
result.put(edge,createNode(graph.getEdgeTarget(edge))); final KBestSubHaplotypeFinder targetFinder = createVertexFinder(graph.getEdgeTarget(edge));
final Iterator<Map.Entry<BaseEdge,KBestSubHaplotypeFinder>> childrenIterator = result.entrySet().iterator(); if (targetFinder.getCount() == 0) continue;
while (childrenIterator.hasNext()) result.put(edge, targetFinder);
if (childrenIterator.next().getValue().getCount() == 0) }
childrenIterator.remove();
return result; return result;
} }
} }

View File

@ -68,5 +68,4 @@ interface KBestSubHaplotypeFinder {
* @return never {@code null}. * @return never {@code null}.
*/ */
public abstract KBestHaplotype getKBest(int k); public abstract KBestHaplotype getKBest(int k);
} }

View File

@ -46,18 +46,18 @@
package org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs; package org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs;
import java.util.ArrayList; import java.util.ArrayList;
import java.util.Collection;
import java.util.Map; import java.util.Map;
import java.util.PriorityQueue;
/** /**
* General recursive sub-haplotype finder. * General recursive sub-haplotype finder.
* <p> * <p>
* Provides the k-best sub-haplotypes looking into the outgoing set of vertices (that contain at least one solution). * Provides the k-best sub-haplotypes from a vertex provided map between outgoing edges and its target finders
* </p> * </p>
* <p> * <p>
* This is done efficiently by keeping an priority-queue on best subhaplotype solutions and pulling them on demand * This is done efficiently by keeping an priority-queue on best sub-haplotype solutions and pulling them on demand
* as needed. * as needed.
* </p> * </p>
* <p> * <p>
* Solutions are cached for repeated retrieval so that we save compute at vertices that share sub-haplotypes * Solutions are cached for repeated retrieval so that we save compute at vertices that share sub-haplotypes
* (share descendant vertices). This aspect is controlled by {@link KBestSubHaplotypeFinder} that instantiate * (share descendant vertices). This aspect is controlled by {@link KBestSubHaplotypeFinder} that instantiate
@ -67,34 +67,7 @@ import java.util.PriorityQueue;
* *
* @author Valentin Ruano-Rubio &lt;valentin@broadinstitute.org&gt; * @author Valentin Ruano-Rubio &lt;valentin@broadinstitute.org&gt;
*/ */
class RecursiveSubHaplotypeFinder implements KBestSubHaplotypeFinder { class RecursiveSubHaplotypeFinder extends AggregatedSubHaplotypeFinder {
private final SeqGraph graph;
private final SeqVertex vertex;
private final Map<BaseEdge,KBestSubHaplotypeFinder> children;
private boolean childrenWereProcessed = false;
/**
* Holds the number of possible paths from this source node vertex to the sink vertex.
*
* <p>Updated by {@link #processChildrenIfNeeded()}</p>
*/
private int possibleHaplotypeCount;
/**
* Holds the best {@code i} paths to the sink so far calculated where {@code i+1} is the length of rankedResults.
*
* <p>As more results are requested the array will grow. All positions and solutions are calculated up to {@code i}</p>.
*/
private ArrayList<KBestHaplotype> rankedResults;
/**
* Priority queue with best sub-haplotype solutions that haven't been calculated and cached on {@link #rankedResults} yet.
*/
private PriorityQueue<ChildKBestSubHaplotype> nextChildrenKBestHaplotypePath;
/** /**
* Creates a recursive sub-haplotype finder give the target graph, first vertex and all possible outgoing edges * Creates a recursive sub-haplotype finder give the target graph, first vertex and all possible outgoing edges
@ -104,113 +77,73 @@ class RecursiveSubHaplotypeFinder implements KBestSubHaplotypeFinder {
* are outgoing edges from {@code vertex} on {@code graph} and that the value sub-haplotype resolver have as * are outgoing edges from {@code vertex} on {@code graph} and that the value sub-haplotype resolver have as
* the first vertex the adjacent vertex through that key edge.</p> * the first vertex the adjacent vertex through that key edge.</p>
* *
* @param graph the search graph.
* @param vertex first vertex for all sub-haplotype solutions provided by this finder * @param vertex first vertex for all sub-haplotype solutions provided by this finder
* @param children map from outgoing edge to the corresponding sub-sub-haplotype finder. * @param children map from outgoing edge to the corresponding sub-sub-haplotype finder.
*/ */
public RecursiveSubHaplotypeFinder(final SeqGraph graph, final SeqVertex vertex, public RecursiveSubHaplotypeFinder(final SeqVertex vertex,
final Map<BaseEdge, KBestSubHaplotypeFinder> children) { final Map<BaseEdge, KBestSubHaplotypeFinder> children) {
if (vertex == null) throw new IllegalArgumentException("the vertex provided cannot be null"); super(createChildFinderCollection(vertex, children));
if (graph == null) throw new IllegalArgumentException("the graph provided cannot be null");
this.vertex = vertex;
this.children = children;
this.graph = graph;
} }
@Override private static Collection<EdgeSubHaplotypeFinder> createChildFinderCollection(final SeqVertex vertex, final Map<BaseEdge,KBestSubHaplotypeFinder> finders) {
public int getCount() { if (finders == null) throw new IllegalArgumentException("the edge to child map cannot be null");
processChildrenIfNeeded(); final Collection<EdgeSubHaplotypeFinder> result = new ArrayList<>(finders.size());
return possibleHaplotypeCount; for (final Map.Entry<BaseEdge,KBestSubHaplotypeFinder> e : finders.entrySet())
result.add(new EdgeSubHaplotypeFinder(vertex,e.getKey(), e.getValue()));
return result;
} }
/** private static class EdgeSubHaplotypeFinder implements KBestSubHaplotypeFinder {
* Process children and initialize structures if not done before.
*/
private void processChildrenIfNeeded() {
if (childrenWereProcessed) return;
long possibleHaplotypeCount = 0;
nextChildrenKBestHaplotypePath = new PriorityQueue<>(children.size()); private final KBestSubHaplotypeFinder childFinder;
for (final Map.Entry<BaseEdge,KBestSubHaplotypeFinder> entry : children.entrySet()) { private final SeqVertex vertex;
final KBestSubHaplotypeFinder child = entry.getValue();
final BaseEdge edge = entry.getKey(); private final BaseEdge edge;
final int childPossibleHaplotypePathCount = child.getCount();
if (childPossibleHaplotypePathCount != 0) // paranoia check, should not happen at this point. private EdgeSubHaplotypeFinder(final SeqVertex vertex, final BaseEdge edge, final KBestSubHaplotypeFinder childFinder) {
nextChildrenKBestHaplotypePath.add(new ChildKBestSubHaplotype(-1,edge,child,0)); this.childFinder = childFinder;
possibleHaplotypeCount += childPossibleHaplotypePathCount; this.edge = edge;
this.vertex = vertex;
} }
// Just make sure we won't incur in overflow here for very large graphs; who is ever going to ask for more than 2G paths!!!) @Override
this.possibleHaplotypeCount = (int) Math.min(Integer.MAX_VALUE,possibleHaplotypeCount); public int getCount() {
return childFinder.getCount();
// 10 is a bit arbitrary as it is difficult to anticipate what would be the number of requested }
// best sub-haplotypes for any node. It shouldn't be too large so that it does not waste space
// but not too small so that there is no need to resize when just a few best solutions are requested. @Override
rankedResults = new ArrayList<>(Math.min(this.possibleHaplotypeCount,10)); public KBestHaplotype getKBest(int k) {
return new ChildKBestSubHaplotype(vertex,edge,childFinder.getKBest(k));
childrenWereProcessed = true;
}
@Override
public KBestHaplotype getKBest(int k) {
if (k < 0)
throw new IllegalArgumentException("the rank requested cannot be negative");
processChildrenIfNeeded();
if (k >= possibleHaplotypeCount)
throw new IllegalArgumentException("the rank requested cannot be equal or greater to the number of possible haplotypes");
if (rankedResults.size() > k)
return rankedResults.get(k);
rankedResults.ensureCapacity(k+1);
for (int i = rankedResults.size(); i <= k; i++) {
// since k < possibleHaplotypeCount is guarantee no to be empty.
if (nextChildrenKBestHaplotypePath.isEmpty())
throw new IllegalStateException("what the heck " + k + " " + possibleHaplotypeCount);
final ChildKBestSubHaplotype nextResult = nextChildrenKBestHaplotypePath.remove();
nextResult.rank = i;
rankedResults.add(nextResult);
final int childRank = nextResult.subpath.rank();
final KBestSubHaplotypeFinder child = nextResult.child;
// if there is no further solution from the same child we cannot add another solution from that child.
if (childRank + 1 >= nextResult.child.getCount())
continue;
nextChildrenKBestHaplotypePath.add(new ChildKBestSubHaplotype(-1,nextResult.edge, child, childRank + 1));
} }
return rankedResults.get(k);
} }
/** /**
* Custom extension of the {@link KBestHaplotype} used for solutions generated by this class. * Custom extension of the {@link KBestHaplotype} used for solutions generated by this class.
*
* <p>
* These by delegating on the encapsulated solution from outgoing edge's finder by adding
* the edge score and prefixing this outer finder
* source vertex.
* </p>
*/ */
private class ChildKBestSubHaplotype extends KBestHaplotype implements Comparable<ChildKBestSubHaplotype>{ private static class ChildKBestSubHaplotype extends KBestHaplotype {
private final int score; private final int score;
private int rank; private final KBestHaplotype child;
private final KBestSubHaplotypeFinder child; private final SeqVertex vertex;
private final BaseEdge edge;
private final KBestHaplotype subpath;
private final boolean isReference; private final boolean isReference;
public ChildKBestSubHaplotype(final int rank, final BaseEdge edge, public ChildKBestSubHaplotype(final SeqVertex vertex, final BaseEdge edge, final KBestHaplotype child) {
final KBestSubHaplotypeFinder child, final int childRank) { this.score = edge.getMultiplicity() + child.score();
this.vertex = vertex;
this.child = child; this.child = child;
this.edge = edge; this.isReference = edge.isRef() && child.isReference();
this.rank = rank;
this.subpath = child.getKBest(childRank);
this.score = edge.getMultiplicity() + subpath.score();
this.isReference = edge.isRef() && subpath.isReference();
} }
@Override @Override
public SeqGraph graph() { public SeqGraph graph() {
return graph; return child.graph();
}
@Override
public int compareTo(final ChildKBestSubHaplotype other) {
if (other == null) throw new IllegalArgumentException("the other object cannot be null");
return - Integer.compare(this.score,other.score);
} }
@Override @Override
@ -220,10 +153,9 @@ class RecursiveSubHaplotypeFinder implements KBestSubHaplotypeFinder {
@Override @Override
public int rank() { public int rank() {
return rank; return child.rank();
} }
@Override @Override
protected SeqVertex head() { protected SeqVertex head() {
return vertex; return vertex;
@ -231,7 +163,7 @@ class RecursiveSubHaplotypeFinder implements KBestSubHaplotypeFinder {
@Override @Override
protected KBestHaplotype tail() { protected KBestHaplotype tail() {
return subpath; return child;
} }
@Override @Override

View File

@ -73,7 +73,6 @@ public class ReadThreadingAssembler extends LocalAssemblyEngine {
private final boolean dontIncreaseKmerSizesForCycles; private final boolean dontIncreaseKmerSizesForCycles;
private final int numPruningSamples; private final int numPruningSamples;
private boolean requireReasonableNumberOfPaths = false;
protected boolean removePathsNotConnectedToRef = true; protected boolean removePathsNotConnectedToRef = true;
private boolean justReturnRawGraph = false; private boolean justReturnRawGraph = false;
@ -207,24 +206,12 @@ public class ReadThreadingAssembler extends LocalAssemblyEngine {
initialSeqGraph.cleanNonRefPaths(); // TODO -- I don't this is possible by construction initialSeqGraph.cleanNonRefPaths(); // TODO -- I don't this is possible by construction
final AssemblyResult cleaned = cleanupSeqGraph(initialSeqGraph); final AssemblyResult cleaned = cleanupSeqGraph(initialSeqGraph);
final AssemblyResult.Status status = cleaned.getStatus() == AssemblyResult.Status.ASSEMBLED_SOME_VARIATION && requireReasonableNumberOfPaths && !reasonableNumberOfPaths(cleaned.getGraph()) ? AssemblyResult.Status.FAILED : cleaned.getStatus(); final AssemblyResult.Status status = cleaned.getStatus();
final AssemblyResult result = new AssemblyResult(status, cleaned.getGraph()); final AssemblyResult result = new AssemblyResult(status, cleaned.getGraph());
result.setThreadingGraph(rtgraph); result.setThreadingGraph(rtgraph);
return result; return result;
} }
/**
* Did we find a reasonable number of paths in this graph?
* @param graph
* @return
*/
private boolean reasonableNumberOfPaths(final SeqGraph graph) {
final KBestPaths<SeqVertex,BaseEdge> pathFinder = new KBestPaths<>(false);
final List<Path<SeqVertex,BaseEdge>> allPaths = pathFinder.getKBestPaths(graph, 100000);
logger.info("Found " + allPaths.size() + " paths through " + graph + " with maximum " + maxAllowedPathsForReadThreadingAssembler);
return allPaths.size() <= maxAllowedPathsForReadThreadingAssembler;
}
@Override @Override
public String toString() { public String toString() {
return "ReadThreadingAssembler{" + return "ReadThreadingAssembler{" +

View File

@ -137,19 +137,20 @@ public class CommonSuffixMergerUnitTest extends BaseTest {
public static void assertSameHaplotypes(final String name, final SeqGraph actual, final SeqGraph original) { public static void assertSameHaplotypes(final String name, final SeqGraph actual, final SeqGraph original) {
try { try {
final Set<String> haplotypes = new HashSet<String>(); final Set<String> haplotypes = new HashSet<String>();
final List<Path<SeqVertex,BaseEdge>> originalPaths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(original); final List<KBestHaplotype> originalKBestHaplotypes = new KBestHaplotypeFinder(original,original.getSources(),original.getSinks());
for ( final Path<SeqVertex,BaseEdge> path : originalPaths ) final List<KBestHaplotype> actualKBestHaplotypes = new KBestHaplotypeFinder(actual,actual.getSources(),actual.getSinks());
haplotypes.add(new String(path.getBases()));
final List<Path<SeqVertex,BaseEdge>> splitPaths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(actual); for (final KBestHaplotype kbh : originalKBestHaplotypes)
for ( final Path<SeqVertex,BaseEdge> path : splitPaths ) { haplotypes.add(new String(kbh.bases()));
final String h = new String(path.getBases());
for ( final KBestHaplotype kbh : actualKBestHaplotypes ) {
final String h = new String(kbh.bases());
Assert.assertTrue(haplotypes.contains(h), "Failed to find haplotype " + h); Assert.assertTrue(haplotypes.contains(h), "Failed to find haplotype " + h);
} }
if ( splitPaths.size() == originalPaths.size() ) { if ( actualKBestHaplotypes.size() == originalKBestHaplotypes.size() ) {
for ( int i = 0; i < originalPaths.size(); i++ ) { for ( int i = 0; i < originalKBestHaplotypes.size(); i++ ) {
Assert.assertTrue(splitPaths.get(i).equalSequence(originalPaths.get(i)), "Paths not equal " + splitPaths.get(i) + " vs. original " + originalPaths.get(i)); Assert.assertTrue(actualKBestHaplotypes.get(i).haplotype().getBaseString().equals(originalKBestHaplotypes.get(i).haplotype().getBaseString()), "Paths not equal " + actualKBestHaplotypes.get(i).haplotype() + " vs. original " + originalKBestHaplotypes.get(i).haplotype());
} }
} }
} catch ( AssertionError e ) { } catch ( AssertionError e ) {

View File

@ -66,31 +66,25 @@ import java.util.*;
* Date: 1/31/13 * Date: 1/31/13
*/ */
public class KBestPathsUnitTest extends BaseTest { public class KBestHaplotypeFinderUnitTest extends BaseTest {
private final static boolean DEBUG = false;
@DataProvider(name = "BasicPathFindingData") @DataProvider(name = "BasicPathFindingData")
public Object[][] makeBasicPathFindingData() { public Object[][] makeBasicPathFindingData() {
List<Object[]> tests = new ArrayList<Object[]>(); final List<Object[]> tests = new ArrayList<>();
for ( final boolean allowCycles : Arrays.asList(false, true)) { for ( final int nStartNodes : Arrays.asList(1, 2, 3) ) {
for ( final int nStartNodes : Arrays.asList(1, 2, 3) ) { for ( final int nBranchesPerBubble : Arrays.asList(2, 3) ) {
for ( final int nBranchesPerBubble : Arrays.asList(2, 3) ) { for ( final int nEndNodes : Arrays.asList(1, 2, 3) ) {
for ( final int nEndNodes : Arrays.asList(1, 2, 3) ) { tests.add(new Object[]{nStartNodes, nBranchesPerBubble, nEndNodes});
for ( final boolean addCycle : Arrays.asList(true, false) ) {
tests.add(new Object[]{nStartNodes, nBranchesPerBubble, nEndNodes, addCycle, allowCycles});
}
}
} }
} }
} }
return tests.toArray(new Object[][]{}); return tests.toArray(new Object[][]{});
} }
private static int weight = 1; private static int weight = 1;
final Set<SeqVertex> createVertices(final SeqGraph graph, final int n, final SeqVertex source, final SeqVertex target) { final Set<SeqVertex> createVertices(final SeqGraph graph, final int n, final SeqVertex source, final SeqVertex target) {
final List<String> seqs = Arrays.asList("A", "C", "G", "T"); final List<String> seqs = Arrays.asList("A", "C", "G", "T");
final Set<SeqVertex> vertices = new LinkedHashSet<SeqVertex>(); final Set<SeqVertex> vertices = new LinkedHashSet<>();
for ( int i = 0; i < n; i++ ) { for ( int i = 0; i < n; i++ ) {
final SeqVertex v = new SeqVertex(seqs.get(i)); final SeqVertex v = new SeqVertex(seqs.get(i));
graph.addVertex(v); graph.addVertex(v);
@ -101,77 +95,42 @@ public class KBestPathsUnitTest extends BaseTest {
return vertices; return vertices;
} }
@Test(dataProvider = "BasicPathFindingData", enabled = !DEBUG) @Test(dataProvider = "BasicPathFindingData")
public void testBasicPathFinding(final int nStartNodes, final int nBranchesPerBubble, final int nEndNodes, final boolean addCycle, final boolean allowCycles) { public void testBasicPathFinding(final int nStartNodes, final int nBranchesPerBubble, final int nEndNodes) {
SeqGraph graph = new SeqGraph(11); final SeqGraph graph = new SeqGraph(11);
final SeqVertex middleTop = new SeqVertex("GTAC"); final SeqVertex middleTop = new SeqVertex("GTAC");
final SeqVertex middleBottom = new SeqVertex("ACTG"); final SeqVertex middleBottom = new SeqVertex("ACTG");
graph.addVertices(middleTop, middleBottom); graph.addVertices(middleTop, middleBottom);
final Set<SeqVertex> starts = createVertices(graph, nStartNodes, null, middleTop); final Set<SeqVertex> starts = createVertices(graph, nStartNodes, null, middleTop);
@SuppressWarnings("unused")
final Set<SeqVertex> bubbles = createVertices(graph, nBranchesPerBubble, middleTop, middleBottom); final Set<SeqVertex> bubbles = createVertices(graph, nBranchesPerBubble, middleTop, middleBottom);
final Set<SeqVertex> ends = createVertices(graph, nEndNodes, middleBottom, null); final Set<SeqVertex> ends = createVertices(graph, nEndNodes, middleBottom, null);
if ( addCycle ) graph.addEdge(middleBottom, middleBottom);
// enumerate all possible paths // enumerate all possible paths
final List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>(allowCycles).getKBestPaths(graph, starts, ends); final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph, starts, ends);
final int expectedNumOfPaths = nStartNodes * nBranchesPerBubble * (addCycle && allowCycles ? 2 : 1) * nEndNodes; final int expectedNumOfPaths = nStartNodes * nBranchesPerBubble * nEndNodes;
Assert.assertEquals(paths.size(), expectedNumOfPaths, "Didn't find the expected number of paths"); Assert.assertEquals(paths.size(), expectedNumOfPaths, "Didn't find the expected number of paths");
int lastScore = Integer.MAX_VALUE; int lastScore = Integer.MAX_VALUE;
for ( final Path path : paths ) { for ( final KBestHaplotype kbh : paths ) {
final Path<SeqVertex,BaseEdge> path = kbh.path();
Assert.assertTrue(path.getScore() <= lastScore, "Paths out of order. Path " + path + " has score above previous " + lastScore); Assert.assertTrue(path.getScore() <= lastScore, "Paths out of order. Path " + path + " has score above previous " + lastScore);
lastScore = path.getScore(); lastScore = path.getScore();
} }
// get the best path, and make sure it's the same as our optimal path overall // get the best path, and make sure it's the same as our optimal path overall
final Path best = paths.get(0); final Path<SeqVertex,BaseEdge> best = paths.get(0).path();
final List<Path<SeqVertex,BaseEdge>> justOne = new KBestPaths<SeqVertex,BaseEdge>(allowCycles).getKBestPaths(graph, 1, starts, ends); final List<KBestHaplotype> justOne = new KBestHaplotypeFinder(graph,starts, ends).subList(0,1);
Assert.assertEquals(justOne.size(), 1); Assert.assertEquals(justOne.size(), 1);
Assert.assertTrue(justOne.get(0).pathsAreTheSame(best), "Best path from complete enumerate " + best + " not the same as from k = 1 search " + justOne.get(0));
}
@Test(enabled = false) // No longer supported, but no longer needed. Assert.assertTrue(justOne.get(0).path().pathsAreTheSame(best), "Best path from complete enumerate " + best + " not the same as from k = 1 search " + justOne.get(0));
public void testPathFindingComplexCycle() {
SeqGraph graph = new SeqGraph(11);
final SeqVertex v1 = new SeqVertex("A");
final SeqVertex v2 = new SeqVertex("C");
final SeqVertex v3 = new SeqVertex("G");
final SeqVertex v4 = new SeqVertex("T");
final SeqVertex v5 = new SeqVertex("AA");
graph.addVertices(v1, v2, v3, v4, v5);
graph.addEdges(v1, v2, v3, v4, v5);
graph.addEdges(v3, v3);
graph.addEdges(v4, v2);
// enumerate all possible paths
final List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>(false).getKBestPaths(graph, v1, v5);
Assert.assertEquals(paths.size(), 1, "Didn't find the expected number of paths");
}
@Test(enabled = false) // No longer supported, but no longer needed.
public void testPathFindingCycleLastNode() {
SeqGraph graph = new SeqGraph(11);
final SeqVertex v1 = new SeqVertex("A");
final SeqVertex v2 = new SeqVertex("C");
final SeqVertex v3 = new SeqVertex("G");
graph.addVertices(v1, v2, v3);
graph.addEdges(v1, v2, v3, v3);
// enumerate all possible paths
final List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>(false).getKBestPaths(graph, v1, v3);
Assert.assertEquals(paths.size(), 1, "Didn't find the expected number of paths");
} }
@DataProvider(name = "BasicBubbleDataProvider") @DataProvider(name = "BasicBubbleDataProvider")
public Object[][] makeBasicBubbleDataProvider() { public Object[][] makeBasicBubbleDataProvider() {
List<Object[]> tests = new ArrayList<Object[]>(); final List<Object[]> tests = new ArrayList<>();
for ( final int refBubbleLength : Arrays.asList(1, 5, 10) ) { for ( final int refBubbleLength : Arrays.asList(1, 5, 10) ) {
for ( final int altBubbleLength : Arrays.asList(1, 5, 10) ) { for ( final int altBubbleLength : Arrays.asList(1, 5, 10) ) {
tests.add(new Object[]{refBubbleLength, altBubbleLength}); tests.add(new Object[]{refBubbleLength, altBubbleLength});
@ -180,7 +139,7 @@ public class KBestPathsUnitTest extends BaseTest {
return tests.toArray(new Object[][]{}); return tests.toArray(new Object[][]{});
} }
@Test(dataProvider = "BasicBubbleDataProvider", enabled = !DEBUG) @Test(dataProvider = "BasicBubbleDataProvider")
public void testBasicBubbleData(final int refBubbleLength, final int altBubbleLength) { public void testBasicBubbleData(final int refBubbleLength, final int altBubbleLength) {
// Construct the assembly graph // Construct the assembly graph
SeqGraph graph = new SeqGraph(3); SeqGraph graph = new SeqGraph(3);
@ -202,9 +161,9 @@ public class KBestPathsUnitTest extends BaseTest {
graph.addEdge(v2Alt, v3, new BaseEdge(false, 5)); graph.addEdge(v2Alt, v3, new BaseEdge(false, 5));
// Construct the test path // Construct the test path
Path<SeqVertex,BaseEdge> path = new Path<SeqVertex,BaseEdge>(v, graph); Path<SeqVertex,BaseEdge> path = new Path<>(v, graph);
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v, v2Alt)); path = new Path<>(path, graph.getEdge(v, v2Alt));
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v2Alt, v3)); path = new Path<>(path, graph.getEdge(v2Alt, v3));
// Construct the actual cigar string implied by the test path // Construct the actual cigar string implied by the test path
Cigar expectedCigar = new Cigar(); Cigar expectedCigar = new Cigar();
@ -226,7 +185,7 @@ public class KBestPathsUnitTest extends BaseTest {
@DataProvider(name = "GetBasesData") @DataProvider(name = "GetBasesData")
public Object[][] makeGetBasesData() { public Object[][] makeGetBasesData() {
List<Object[]> tests = new ArrayList<Object[]>(); List<Object[]> tests = new ArrayList<>();
final List<String> frags = Arrays.asList("ACT", "GAC", "CAT"); final List<String> frags = Arrays.asList("ACT", "GAC", "CAT");
@ -238,14 +197,14 @@ public class KBestPathsUnitTest extends BaseTest {
return tests.toArray(new Object[][]{}); return tests.toArray(new Object[][]{});
} }
@Test(dataProvider = "GetBasesData", enabled = !DEBUG) @Test(dataProvider = "GetBasesData")
public void testGetBases(final List<String> frags) { public void testGetBases(final List<String> frags) {
// Construct the assembly graph // Construct the assembly graph
SeqGraph graph = new SeqGraph(3); SeqGraph graph = new SeqGraph(3);
SeqVertex prev = null; SeqVertex prev = null;
for ( int i = 0; i < frags.size(); i++ ) { for (final String s : frags) {
SeqVertex v = new SeqVertex(frags.get(i)); SeqVertex v = new SeqVertex(s);
graph.addVertex(v); graph.addVertex(v);
if ( prev != null ) if ( prev != null )
graph.addEdge(prev, v); graph.addEdge(prev, v);
@ -253,15 +212,15 @@ public class KBestPathsUnitTest extends BaseTest {
} }
// enumerate all possible paths // enumerate all possible paths
final List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(graph); final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph,graph.getSources(),graph.getSinks());
Assert.assertEquals(paths.size(), 1); Assert.assertEquals(paths.size(), 1);
final Path<SeqVertex,BaseEdge> path = paths.get(0); final Path<SeqVertex,BaseEdge> path = paths.get(0).path();
Assert.assertEquals(new String(path.getBases()), Utils.join("", frags), "Path doesn't have the expected sequence"); Assert.assertEquals(new String(path.getBases()), Utils.join("", frags), "Path doesn't have the expected sequence");
} }
@DataProvider(name = "TripleBubbleDataProvider") @DataProvider(name = "TripleBubbleDataProvider")
public Object[][] makeTripleBubbleDataProvider() { public Object[][] makeTripleBubbleDataProvider() {
List<Object[]> tests = new ArrayList<Object[]>(); final List<Object[]> tests = new ArrayList<>();
for ( final int refBubbleLength : Arrays.asList(1, 5, 10) ) { for ( final int refBubbleLength : Arrays.asList(1, 5, 10) ) {
for ( final int altBubbleLength : Arrays.asList(1, 5, 10) ) { for ( final int altBubbleLength : Arrays.asList(1, 5, 10) ) {
for ( final boolean offRefEnding : Arrays.asList(true, false) ) { for ( final boolean offRefEnding : Arrays.asList(true, false) ) {
@ -274,7 +233,7 @@ public class KBestPathsUnitTest extends BaseTest {
return tests.toArray(new Object[][]{}); return tests.toArray(new Object[][]{});
} }
@Test(dataProvider = "TripleBubbleDataProvider", enabled = !DEBUG) @Test(dataProvider = "TripleBubbleDataProvider")
public void testTripleBubbleData(final int refBubbleLength, final int altBubbleLength, final boolean offRefBeginning, final boolean offRefEnding) { public void testTripleBubbleData(final int refBubbleLength, final int altBubbleLength, final boolean offRefBeginning, final boolean offRefEnding) {
// Construct the assembly graph // Construct the assembly graph
SeqGraph graph = new SeqGraph(11); SeqGraph graph = new SeqGraph(11);
@ -328,19 +287,17 @@ public class KBestPathsUnitTest extends BaseTest {
graph.addEdge(v7, postV, new BaseEdge(false, 1)); graph.addEdge(v7, postV, new BaseEdge(false, 1));
// Construct the test path // Construct the test path
Path<SeqVertex,BaseEdge> path = new Path<SeqVertex,BaseEdge>( (offRefBeginning ? preV : v), graph); Path<SeqVertex,BaseEdge> path = new Path<>( (offRefBeginning ? preV : v), graph);
if( offRefBeginning ) { if( offRefBeginning )
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(preV, v)); path = new Path<>(path, graph.getEdge(preV, v));
} path = new Path<>(path, graph.getEdge(v, v2Alt));
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v, v2Alt)); path = new Path<>(path, graph.getEdge(v2Alt, v3));
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v2Alt, v3)); path = new Path<>(path, graph.getEdge(v3, v4Ref));
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v3, v4Ref)); path = new Path<>(path, graph.getEdge(v4Ref, v5));
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v4Ref, v5)); path = new Path<>(path, graph.getEdge(v5, v6Alt));
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v5, v6Alt)); path = new Path<>(path, graph.getEdge(v6Alt, v7));
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v6Alt, v7)); if( offRefEnding )
if( offRefEnding ) { path = new Path<>(path, graph.getEdge(v7,postV));
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v7,postV));
}
// Construct the actual cigar string implied by the test path // Construct the actual cigar string implied by the test path
Cigar expectedCigar = new Cigar(); Cigar expectedCigar = new Cigar();
@ -382,10 +339,10 @@ public class KBestPathsUnitTest extends BaseTest {
"Cigar string mismatch: ref = " + ref + " alt " + new String(path.getBases())); "Cigar string mismatch: ref = " + ref + " alt " + new String(path.getBases()));
} }
@Test(enabled = !DEBUG) @Test
public void testIntraNodeInsertionDeletion() { public void testIntraNodeInsertionDeletion() {
// Construct the assembly graph // Construct the assembly graph
SeqGraph graph = new SeqGraph(11); final SeqGraph graph = new SeqGraph(11);
final SeqVertex top = new SeqVertex("T"); final SeqVertex top = new SeqVertex("T");
final SeqVertex bot = new SeqVertex("T"); final SeqVertex bot = new SeqVertex("T");
final SeqVertex alt = new SeqVertex("AAACCCCC"); final SeqVertex alt = new SeqVertex("AAACCCCC");
@ -395,38 +352,38 @@ public class KBestPathsUnitTest extends BaseTest {
graph.addEdges(new BaseEdge(true, 1), top, ref, bot); graph.addEdges(new BaseEdge(true, 1), top, ref, bot);
graph.addEdges(new BaseEdge(false, 1), top, alt, bot); graph.addEdges(new BaseEdge(false, 1), top, alt, bot);
final KBestPaths<SeqVertex,BaseEdge> pathFinder = new KBestPaths<SeqVertex,BaseEdge>(); @SuppressWarnings("all")
final List<Path<SeqVertex,BaseEdge>> paths = pathFinder.getKBestPaths(graph, top, bot); final KBestHaplotypeFinder bestPathFinder = new KBestHaplotypeFinder(graph,top,bot);
Assert.assertEquals(paths.size(), 2); Assert.assertEquals(bestPathFinder.size(), 2);
final Path<SeqVertex,BaseEdge> refPath = paths.get(0); final Path<SeqVertex,BaseEdge> refPath = bestPathFinder.get(0).path();
final Path<SeqVertex,BaseEdge> altPath = paths.get(1); final Path<SeqVertex,BaseEdge> altPath = bestPathFinder.get(1).path();
final String refString = top.getSequenceString() + ref.getSequenceString() + bot.getSequenceString(); final String refString = top.getSequenceString() + ref.getSequenceString() + bot.getSequenceString();
Assert.assertEquals(refPath.calculateCigar(refString.getBytes()).toString(), "10M"); Assert.assertEquals(refPath.calculateCigar(refString.getBytes()).toString(), "10M");
Assert.assertEquals(altPath.calculateCigar(refString.getBytes()).toString(), "1M3I5M3D1M"); Assert.assertEquals(altPath.calculateCigar(refString.getBytes()).toString(), "1M3I5M3D1M");
} }
@Test(enabled = !DEBUG) @Test
public void testHardSWPath() { public void testHardSWPath() {
// Construct the assembly graph // Construct the assembly graph
SeqGraph graph = new SeqGraph(11); final SeqGraph graph = new SeqGraph(11);
final SeqVertex top = new SeqVertex( "NNN" ); final SeqVertex top = new SeqVertex( "NNN" );
final SeqVertex bot = new SeqVertex( "NNN" ); final SeqVertex bot = new SeqVertex( "NNN" );
final SeqVertex alt = new SeqVertex( "ACAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA" ); final SeqVertex alt = new SeqVertex( "ACAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA" );
final SeqVertex ref = new SeqVertex( "TGTGTGTGTGTGTGACAGAGAGAGAGAGAGAGAGAGAGAGAGAGA" ); final SeqVertex ref = new SeqVertex( "TGTGTGTGTGTGTGACAGAGAGAGAGAGAGAGAGAGAGAGAGAGA" );
graph.addVertices(top, bot, alt, ref); graph.addVertices(top, bot, alt, ref);
graph.addEdges(new BaseEdge(true, 1), top, ref, bot); graph.addEdges(new BaseEdge(true, 1), top, ref, bot);
graph.addEdges(new BaseEdge(false, 1), top, alt, bot); graph.addEdges(new BaseEdge(false, 1), top, alt, bot);
final KBestPaths<SeqVertex,BaseEdge> pathFinder = new KBestPaths<SeqVertex,BaseEdge>(); @SuppressWarnings("all")
final List<Path<SeqVertex,BaseEdge>> paths = pathFinder.getKBestPaths(graph, top, bot); final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph, top, bot);
Assert.assertEquals(paths.size(), 2); Assert.assertEquals(paths.size(), 2);
final Path<SeqVertex,BaseEdge> refPath = paths.get(0); final Path<SeqVertex,BaseEdge> refPath = paths.get(0).path();
final Path<SeqVertex,BaseEdge> altPath = paths.get(1); final Path<SeqVertex,BaseEdge> altPath = paths.get(1).path();
final String refString = top.getSequenceString() + ref.getSequenceString() + bot.getSequenceString(); final String refString = top.getSequenceString() + ref.getSequenceString() + bot.getSequenceString();
@ -446,7 +403,7 @@ public class KBestPathsUnitTest extends BaseTest {
@DataProvider(name = "SystematicRefAltSWTestData") @DataProvider(name = "SystematicRefAltSWTestData")
public Object[][] makeSystematicRefAltSWTestData() { public Object[][] makeSystematicRefAltSWTestData() {
List<Object[]> tests = new ArrayList<Object[]>(); final List<Object[]> tests = new ArrayList<>();
final List<List<String>> allDiffs = Arrays.asList( final List<List<String>> allDiffs = Arrays.asList(
Arrays.asList("G", "C", "1M"), Arrays.asList("G", "C", "1M"),
@ -470,7 +427,7 @@ public class KBestPathsUnitTest extends BaseTest {
return tests.toArray(new Object[][]{}); return tests.toArray(new Object[][]{});
} }
@Test(dataProvider = "SystematicRefAltSWTestData", enabled = !DEBUG) @Test(dataProvider = "SystematicRefAltSWTestData")
public void testRefAltSW(final String prefix, final String end, final String refMid, final String altMid, final String midCigar) { public void testRefAltSW(final String prefix, final String end, final String refMid, final String altMid, final String midCigar) {
// Construct the assembly graph // Construct the assembly graph
SeqGraph graph = new SeqGraph(11); SeqGraph graph = new SeqGraph(11);
@ -506,7 +463,7 @@ public class KBestPathsUnitTest extends BaseTest {
Assert.assertEquals(pathCigar, expected, "Cigar mismatch: ref = " + refString + " vs alt = " + new String(path.getBases())); Assert.assertEquals(pathCigar, expected, "Cigar mismatch: ref = " + refString + " vs alt = " + new String(path.getBases()));
} }
@Test(enabled = !DEBUG) @Test
public void testLeftAlignCigarSequentially() { public void testLeftAlignCigarSequentially() {
String preRefString = "GATCGATCGATC"; String preRefString = "GATCGATCGATC";
String postRefString = "TTT"; String postRefString = "TTT";

View File

@ -61,7 +61,7 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
@DataProvider(name = "PrefixSuffixData") @DataProvider(name = "PrefixSuffixData")
public Object[][] makePrefixSuffixData() { public Object[][] makePrefixSuffixData() {
List<Object[]> tests = new ArrayList<Object[]>(); final List<Object[]> tests = new ArrayList<>();
tests.add(new Object[]{Arrays.asList("A", "C"), 0, 0}); tests.add(new Object[]{Arrays.asList("A", "C"), 0, 0});
tests.add(new Object[]{Arrays.asList("C", "C"), 1, 0}); tests.add(new Object[]{Arrays.asList("C", "C"), 1, 0});
@ -91,7 +91,7 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
@Test(dataProvider = "PrefixSuffixData") @Test(dataProvider = "PrefixSuffixData")
public void testPrefixSuffix(final List<String> strings, int expectedPrefixLen, int expectedSuffixLen) { public void testPrefixSuffix(final List<String> strings, int expectedPrefixLen, int expectedSuffixLen) {
final List<byte[]> bytes = new ArrayList<byte[]>(); final List<byte[]> bytes = new ArrayList<>();
int min = Integer.MAX_VALUE; int min = Integer.MAX_VALUE;
for ( final String s : strings ) { for ( final String s : strings ) {
bytes.add(s.getBytes()); bytes.add(s.getBytes());
@ -107,7 +107,7 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
@Test(dataProvider = "PrefixSuffixData") @Test(dataProvider = "PrefixSuffixData")
public void testPrefixSuffixVertices(final List<String> strings, int expectedPrefixLen, int expectedSuffixLen) { public void testPrefixSuffixVertices(final List<String> strings, int expectedPrefixLen, int expectedSuffixLen) {
final List<SeqVertex> v = new ArrayList<SeqVertex>(); final List<SeqVertex> v = new ArrayList<>();
for ( final String s : strings ) { for ( final String s : strings ) {
v.add(new SeqVertex(s)); v.add(new SeqVertex(s));
} }
@ -127,19 +127,18 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
public void testSplitter(final List<String> strings, int expectedPrefixLen, int expectedSuffixLen) { public void testSplitter(final List<String> strings, int expectedPrefixLen, int expectedSuffixLen) {
final SeqGraph graph = new SeqGraph(11); final SeqGraph graph = new SeqGraph(11);
final List<SeqVertex> v = new ArrayList<SeqVertex>(); final List<SeqVertex> v = new ArrayList<>();
for ( final String s : strings ) { for ( final String s : strings ) {
v.add(new SeqVertex(s)); v.add(new SeqVertex(s));
} }
graph.addVertices(v.toArray(new SeqVertex[]{})); graph.addVertices(v.toArray(new SeqVertex[v.size()]));
final String expectedPrefix = strings.get(0).substring(0, expectedPrefixLen); final String expectedPrefix = strings.get(0).substring(0, expectedPrefixLen);
final String expectedSuffix = strings.get(0).substring(strings.get(0).length() - expectedSuffixLen); final String expectedSuffix = strings.get(0).substring(strings.get(0).length() - expectedSuffixLen);
final SharedVertexSequenceSplitter splitter = new SharedVertexSequenceSplitter(graph, v); final SharedVertexSequenceSplitter splitter = new SharedVertexSequenceSplitter(graph, v);
splitter.split(); splitter.split();
// splitter.splitGraph.printGraph(new File(Utils.join("_", strings) + ".dot"), 0);
Assert.assertEquals(splitter.prefixV.getSequenceString(), expectedPrefix); Assert.assertEquals(splitter.prefixV.getSequenceString(), expectedPrefix);
Assert.assertEquals(splitter.suffixV.getSequenceString(), expectedSuffix); Assert.assertEquals(splitter.suffixV.getSequenceString(), expectedSuffix);
@ -158,7 +157,7 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
@DataProvider(name = "CompleteCycleData") @DataProvider(name = "CompleteCycleData")
public Object[][] makeCompleteCycleData() { public Object[][] makeCompleteCycleData() {
List<Object[]> tests = new ArrayList<Object[]>(); List<Object[]> tests = new ArrayList<>();
for ( final boolean hasTop : Arrays.asList(true, false) ) { for ( final boolean hasTop : Arrays.asList(true, false) ) {
for ( final boolean hasBot : Arrays.asList(true, false) ) { for ( final boolean hasBot : Arrays.asList(true, false) ) {
@ -207,11 +206,11 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
int edgeWeight = 1; int edgeWeight = 1;
final SeqVertex top = hasTop ? new SeqVertex("AAAAAAAA") : null; final SeqVertex top = hasTop ? new SeqVertex("AAAAAAAA") : null;
final SeqVertex bot = hasBot ? new SeqVertex("GGGGGGGG") : null; final SeqVertex bot = hasBot ? new SeqVertex("GGGGGGGG") : null;
final List<SeqVertex> v = new ArrayList<SeqVertex>(); final List<SeqVertex> v = new ArrayList<>();
for ( final String s : strings ) { for ( final String s : strings ) {
v.add(new SeqVertex(s)); v.add(new SeqVertex(s));
} }
graph.addVertices(v.toArray(new SeqVertex[]{})); graph.addVertices(v.toArray(new SeqVertex[v.size()]));
final SeqVertex first = v.get(0); final SeqVertex first = v.get(0);
if ( hasTop ) { if ( hasTop ) {
@ -226,10 +225,10 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
graph.addEdge(vi, bot, new BaseEdge(vi == first, edgeWeight++)); graph.addEdge(vi, bot, new BaseEdge(vi == first, edgeWeight++));
} }
final Set<String> haplotypes = new HashSet<String>(); final Set<String> haplotypes = new HashSet<>();
final List<Path<SeqVertex,BaseEdge>> originalPaths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths((SeqGraph)graph.clone()); final List<KBestHaplotype> originalPaths = new KBestHaplotypeFinder((SeqGraph) graph.clone(),graph.getSources(),graph.getSinks());
for ( final Path<SeqVertex,BaseEdge> path : originalPaths ) for ( final KBestHaplotype path : originalPaths )
haplotypes.add(new String(path.getBases())); haplotypes.add(new String(path.bases()));
final SharedVertexSequenceSplitter splitter = new SharedVertexSequenceSplitter(graph, v); final SharedVertexSequenceSplitter splitter = new SharedVertexSequenceSplitter(graph, v);
splitter.split(); splitter.split();
@ -238,22 +237,22 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
splitter.updateGraph(top, bot); splitter.updateGraph(top, bot);
if ( PRINT_GRAPHS ) graph.printGraph(new File(Utils.join("_", strings) + ".updated.dot"), 0); if ( PRINT_GRAPHS ) graph.printGraph(new File(Utils.join("_", strings) + ".updated.dot"), 0);
final List<Path<SeqVertex,BaseEdge>> splitPaths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(graph); final List<KBestHaplotype> splitPaths = new KBestHaplotypeFinder(graph,graph.getSources(),graph.getSinks());
for ( final Path<SeqVertex,BaseEdge> path : splitPaths ) { for ( final KBestHaplotype path : splitPaths ) {
final String h = new String(path.getBases()); final String h = new String(path.bases());
Assert.assertTrue(haplotypes.contains(h), "Failed to find haplotype " + h); Assert.assertTrue(haplotypes.contains(h), "Failed to find haplotype " + h);
} }
if ( splitPaths.size() == originalPaths.size() ) { if ( splitPaths.size() == originalPaths.size() ) {
for ( int i = 0; i < originalPaths.size(); i++ ) { for ( int i = 0; i < originalPaths.size(); i++ ) {
Assert.assertTrue(splitPaths.get(i).equalScoreAndSequence(originalPaths.get(i)), "Paths not equal " + splitPaths.get(i) + " vs. original " + originalPaths.get(i)); Assert.assertTrue(splitPaths.get(i).path().equalScoreAndSequence(originalPaths.get(i).path()), "Paths not equal " + splitPaths.get(i) + " vs. original " + originalPaths.get(i));
} }
} }
} }
@DataProvider(name = "MeetsMinSequenceData") @DataProvider(name = "MeetsMinSequenceData")
public Object[][] makeMeetsMinSequenceData() { public Object[][] makeMeetsMinSequenceData() {
List<Object[]> tests = new ArrayList<Object[]>(); final List<Object[]> tests = new ArrayList<>();
final boolean prefixBiased = SharedVertexSequenceSplitter.prefersPrefixMerging(); final boolean prefixBiased = SharedVertexSequenceSplitter.prefersPrefixMerging();
tests.add(new Object[]{Arrays.asList("AC", "AC"), 0, true, true}); tests.add(new Object[]{Arrays.asList("AC", "AC"), 0, true, true});
@ -280,9 +279,9 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
final SeqVertex top = new SeqVertex("AAAAAAAA"); final SeqVertex top = new SeqVertex("AAAAAAAA");
final SeqVertex bot = new SeqVertex("GGGGGGGG"); final SeqVertex bot = new SeqVertex("GGGGGGGG");
final List<SeqVertex> v = new ArrayList<SeqVertex>(); final List<SeqVertex> v = new ArrayList<>();
for ( final String s : mids ) { v.add(new SeqVertex(s)); } for ( final String s : mids ) { v.add(new SeqVertex(s)); }
graph.addVertices(v.toArray(new SeqVertex[]{})); graph.addVertices(v.toArray(new SeqVertex[v.size()]));
graph.addVertices(top, bot); graph.addVertices(top, bot);
for ( final SeqVertex vi : v ) { graph.addEdge(top, vi); graph.addEdge(vi, bot); } for ( final SeqVertex vi : v ) { graph.addEdge(top, vi); graph.addEdge(vi, bot); }

View File

@ -46,10 +46,11 @@
package org.broadinstitute.sting.gatk.walkers.haplotypecaller.readthreading; package org.broadinstitute.sting.gatk.walkers.haplotypecaller.readthreading;
import net.sf.samtools.Cigar;
import net.sf.samtools.TextCigarCodec; import net.sf.samtools.TextCigarCodec;
import org.broadinstitute.sting.BaseTest; import org.broadinstitute.sting.BaseTest;
import org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs.*; import org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs.KBestHaplotype;
import org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs.KBestHaplotypeFinder;
import org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs.SeqGraph;
import org.broadinstitute.sting.utils.Utils; import org.broadinstitute.sting.utils.Utils;
import org.broadinstitute.sting.utils.sam.ArtificialSAMUtils; import org.broadinstitute.sting.utils.sam.ArtificialSAMUtils;
import org.broadinstitute.sting.utils.sam.GATKSAMRecord; import org.broadinstitute.sting.utils.sam.GATKSAMRecord;
@ -57,7 +58,8 @@ import org.testng.Assert;
import org.testng.annotations.DataProvider; import org.testng.annotations.DataProvider;
import org.testng.annotations.Test; import org.testng.annotations.Test;
import java.util.*; import java.util.ArrayList;
import java.util.List;
public class DanglingChainMergingGraphUnitTest extends BaseTest { public class DanglingChainMergingGraphUnitTest extends BaseTest {
@ -235,7 +237,7 @@ public class DanglingChainMergingGraphUnitTest extends BaseTest {
// confirm that we created the appropriate bubble in the graph only if expected // confirm that we created the appropriate bubble in the graph only if expected
rtgraph.cleanNonRefPaths(); rtgraph.cleanNonRefPaths();
final SeqGraph seqGraph = rtgraph.convertToSequenceGraph(); final SeqGraph seqGraph = rtgraph.convertToSequenceGraph();
List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(seqGraph, seqGraph.getReferenceSourceVertex(), seqGraph.getReferenceSinkVertex()); final List<KBestHaplotype> paths = new KBestHaplotypeFinder(seqGraph, seqGraph.getReferenceSourceVertex(), seqGraph.getReferenceSinkVertex());
Assert.assertEquals(paths.size(), shouldBeMerged ? 2 : 1); Assert.assertEquals(paths.size(), shouldBeMerged ? 2 : 1);
} }
} }

View File

@ -59,13 +59,14 @@ import java.io.File;
import java.util.*; import java.util.*;
public class ReadThreadingAssemblerUnitTest extends BaseTest { public class ReadThreadingAssemblerUnitTest extends BaseTest {
private final static boolean DEBUG = false; private final static boolean DEBUG = false;
private static class TestAssembler { private static class TestAssembler {
final ReadThreadingAssembler assembler; final ReadThreadingAssembler assembler;
Haplotype refHaplotype; Haplotype refHaplotype;
final List<GATKSAMRecord> reads = new LinkedList<GATKSAMRecord>(); final List<GATKSAMRecord> reads = new LinkedList<>();
private TestAssembler(final int kmerSize) { private TestAssembler(final int kmerSize) {
this.assembler = new ReadThreadingAssembler(100000, Arrays.asList(kmerSize)); this.assembler = new ReadThreadingAssembler(100000, Arrays.asList(kmerSize));
@ -102,11 +103,11 @@ public class ReadThreadingAssemblerUnitTest extends BaseTest {
private void assertSingleBubble(final TestAssembler assembler, final String one, final String two) { private void assertSingleBubble(final TestAssembler assembler, final String one, final String two) {
final SeqGraph graph = assembler.assemble(); final SeqGraph graph = assembler.assemble();
graph.simplifyGraph(); graph.simplifyGraph();
List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(graph); final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph);
Assert.assertEquals(paths.size(), 2); Assert.assertEquals(paths.size(), 2);
final Set<String> expected = new HashSet<String>(Arrays.asList(one, two)); final Set<String> expected = new HashSet<>(Arrays.asList(one, two));
for ( final Path<SeqVertex,BaseEdge> path : paths ) { for ( final KBestHaplotype path : paths ) {
final String seq = new String(path.getBases()); final String seq = new String(path.bases());
Assert.assertTrue(expected.contains(seq)); Assert.assertTrue(expected.contains(seq));
expected.remove(seq); expected.remove(seq);
} }
@ -169,7 +170,7 @@ public class ReadThreadingAssemblerUnitTest extends BaseTest {
Assert.assertNotNull(graph.getReferenceSourceVertex()); Assert.assertNotNull(graph.getReferenceSourceVertex());
Assert.assertNotNull(graph.getReferenceSinkVertex()); Assert.assertNotNull(graph.getReferenceSinkVertex());
final List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(graph); final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph);
Assert.assertEquals(paths.size(), 2); Assert.assertEquals(paths.size(), 2);
} }
@ -226,11 +227,10 @@ public class ReadThreadingAssemblerUnitTest extends BaseTest {
assembler.addSequence(ReadThreadingGraphUnitTest.getBytes(read2), false); assembler.addSequence(ReadThreadingGraphUnitTest.getBytes(read2), false);
final SeqGraph graph = assembler.assemble(); final SeqGraph graph = assembler.assemble();
final KBestPaths<SeqVertex,BaseEdge> pathFinder = new KBestPaths<SeqVertex,BaseEdge>(); final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph);
final List<Path<SeqVertex,BaseEdge>> paths = pathFinder.getKBestPaths(graph);
Assert.assertEquals(paths.size(), 2); Assert.assertEquals(paths.size(), 2);
final byte[] refPath = paths.get(0).getBases().length == ref.length() ? paths.get(0).getBases() : paths.get(1).getBases(); final byte[] refPath = paths.get(0).bases().length == ref.length() ? paths.get(0).bases() : paths.get(1).bases();
final byte[] altPath = paths.get(0).getBases().length == ref.length() ? paths.get(1).getBases() : paths.get(0).getBases(); final byte[] altPath = paths.get(0).bases().length == ref.length() ? paths.get(1).bases() : paths.get(0).bases();
Assert.assertEquals(refPath, ReadThreadingGraphUnitTest.getBytes(ref)); Assert.assertEquals(refPath, ReadThreadingGraphUnitTest.getBytes(ref));
Assert.assertEquals(altPath, ReadThreadingGraphUnitTest.getBytes(read1)); Assert.assertEquals(altPath, ReadThreadingGraphUnitTest.getBytes(read1));
} }

View File

@ -212,8 +212,7 @@ public class ReadThreadingGraphUnitTest extends BaseTest {
rtgraph.buildGraphIfNecessary(); rtgraph.buildGraphIfNecessary();
final SeqGraph graph = rtgraph.convertToSequenceGraph(); final SeqGraph graph = rtgraph.convertToSequenceGraph();
final KBestPaths<SeqVertex,BaseEdge> pathFinder = new KBestPaths<>(false); Assert.assertEquals(new KBestHaplotypeFinder(graph, graph.getReferenceSourceVertex(), graph.getReferenceSinkVertex()).size(), 1);
Assert.assertEquals(pathFinder.getKBestPaths(graph, length, graph.getReferenceSourceVertex(), graph.getReferenceSinkVertex()).size(), 1);
} }
// TODO -- update to use determineKmerSizeAndNonUniques directly // TODO -- update to use determineKmerSizeAndNonUniques directly

View File

@ -34,7 +34,6 @@ import org.broad.tribble.TribbleException;
import org.broad.tribble.index.Index; import org.broad.tribble.index.Index;
import org.broad.tribble.index.IndexFactory; import org.broad.tribble.index.IndexFactory;
import org.broad.tribble.util.LittleEndianOutputStream; import org.broad.tribble.util.LittleEndianOutputStream;
import org.broad.tribble.util.TabixUtils;
import org.broadinstitute.sting.commandline.Tags; import org.broadinstitute.sting.commandline.Tags;
import org.broadinstitute.sting.gatk.GenomeAnalysisEngine; import org.broadinstitute.sting.gatk.GenomeAnalysisEngine;
import org.broadinstitute.sting.gatk.arguments.ValidationExclusion; import org.broadinstitute.sting.gatk.arguments.ValidationExclusion;
@ -172,8 +171,8 @@ public class RMDTrackBuilder { // extends PluginManager<FeatureCodec> {
// we might not know the index type, try loading with the default reader constructor // we might not know the index type, try loading with the default reader constructor
logger.debug("Attempting to load " + inputFile + " as a tabix indexed file without validating it"); logger.debug("Attempting to load " + inputFile + " as a tabix indexed file without validating it");
try { try {
final File indexFile = new File(inputFile.getAbsoluteFile() + TabixUtils.STANDARD_INDEX_EXTENSION); final File indexFile = null;//new File(inputFile.getAbsoluteFile() + TabixUtils.STANDARD_INDEX_EXTENSION);
final SAMSequenceDictionary dict = TabixUtils.getSequenceDictionary(indexFile); final SAMSequenceDictionary dict = null; //TabixUtils.getSequenceDictionary(indexFile);
return new Pair<>(AbstractFeatureReader.getFeatureReader(inputFile.getAbsolutePath(), createCodec(descriptor, name)), dict); return new Pair<>(AbstractFeatureReader.getFeatureReader(inputFile.getAbsolutePath(), createCodec(descriptor, name)), dict);
} catch (TribbleException e) { } catch (TribbleException e) {
throw new UserException(e.getMessage(), e); throw new UserException(e.getMessage(), e);