Added a more efficient implementation of the KBest haplotype finder code (CONT.)
Changes: 1. Addressed review comments on new K-best haplotype assembly graph finder. 2. Generalize KBestHaplotypeFinder to deal with multiple source and sink vertices. 3. Updated test to use KBestHaplotypeFinder instead of KBestPaths 4. Retired KBestPaths to the archive. 5. Small improvements to the code and documentation.
This commit is contained in:
parent
7acf2eb0e7
commit
69bf2b3247
|
|
@ -1,185 +1,194 @@
|
|||
/*
|
||||
* By downloading the PROGRAM you agree to the following terms of use:
|
||||
*
|
||||
*
|
||||
* BROAD INSTITUTE - SOFTWARE LICENSE AGREEMENT - FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
|
||||
*
|
||||
*
|
||||
* This Agreement is made between the Broad Institute, Inc. with a principal address at 7 Cambridge Center, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
|
||||
*
|
||||
*
|
||||
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
|
||||
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
|
||||
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
|
||||
*
|
||||
*
|
||||
* 1. DEFINITIONS
|
||||
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK2 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute/GATK on the EFFECTIVE DATE.
|
||||
*
|
||||
*
|
||||
* 2. LICENSE
|
||||
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM.
|
||||
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM.
|
||||
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
|
||||
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
|
||||
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
|
||||
*
|
||||
* 3. OWNERSHIP OF INTELLECTUAL PROPERTY
|
||||
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
|
||||
*
|
||||
* 3. OWNERSHIP OF INTELLECTUAL PROPERTY
|
||||
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
|
||||
* Copyright 2012 Broad Institute, Inc.
|
||||
* Notice of attribution: The GATK2 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
|
||||
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
|
||||
*
|
||||
*
|
||||
* 4. INDEMNIFICATION
|
||||
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
|
||||
*
|
||||
*
|
||||
* 5. NO REPRESENTATIONS OR WARRANTIES
|
||||
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
|
||||
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
|
||||
*
|
||||
*
|
||||
* 6. ASSIGNMENT
|
||||
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
|
||||
*
|
||||
*
|
||||
* 7. MISCELLANEOUS
|
||||
* 7.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
|
||||
* 7.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
|
||||
* 7.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
|
||||
* 7.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
|
||||
* 7.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
|
||||
* 7.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
|
||||
* 7.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
|
||||
* 7.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
|
||||
* 7.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
|
||||
*/
|
||||
|
||||
package org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs;
|
||||
|
||||
import com.google.common.collect.MinMaxPriorityQueue;
|
||||
import com.google.java.contract.Ensures;
|
||||
|
||||
import java.io.Serializable;
|
||||
import java.util.*;
|
||||
import java.util.ArrayList;
|
||||
import java.util.Collection;
|
||||
import java.util.PriorityQueue;
|
||||
|
||||
/**
|
||||
* Class for finding the K best paths (as determined by the sum of multiplicities of the edges) in a graph.
|
||||
* This is different from most graph traversals because we want to test paths from any source node to any sink node.
|
||||
* K-best sub-haplotype finder that selects the best solutions out of a collection of sub-haplotype finders.
|
||||
*
|
||||
* User: ebanks, rpoplin, mdepristo
|
||||
* Date: Mar 23, 2011
|
||||
* @author Valentin Ruano-Rubio <valentin@broadinstitute.org>
|
||||
*/
|
||||
public class KBestPaths<T extends BaseVertex, E extends BaseEdge> {
|
||||
private final boolean allowCycles;
|
||||
class AggregatedSubHaplotypeFinder implements KBestSubHaplotypeFinder {
|
||||
|
||||
/**
|
||||
* Create a new KBestPaths finder that follows cycles in the graph
|
||||
* Collection of subFinders that provided the actual solutions.
|
||||
*/
|
||||
public KBestPaths() {
|
||||
this(true);
|
||||
}
|
||||
private final Collection<? extends KBestSubHaplotypeFinder> subFinders;
|
||||
|
||||
/**
|
||||
* Create a new KBestPaths finder
|
||||
* Flag indicating whether the sub-finders have been processed or not.
|
||||
*/
|
||||
private boolean processedSubFinders = false;
|
||||
|
||||
/**
|
||||
* Holds the number of k-best solution that this finder would ever return.
|
||||
*/
|
||||
private int count = 0;
|
||||
|
||||
/**
|
||||
* Holds the best {@code i} paths to the sink so far calculated where {@code i+1} is the length of this list.
|
||||
*
|
||||
* @param allowCycles should we allow paths that follow cycles in the graph?
|
||||
* <p>As more results are requested the array will grow. All positions and solutions are
|
||||
* calculated up to {@code i}</p>.
|
||||
*/
|
||||
public KBestPaths(final boolean allowCycles) {
|
||||
this.allowCycles = allowCycles;
|
||||
}
|
||||
|
||||
protected static class MyInt { public int val = 0; }
|
||||
private ArrayList<KBestHaplotype> rankedSubHaplotype;
|
||||
|
||||
/**
|
||||
* Compare paths such that paths with greater weight are earlier in a list
|
||||
* Priority queue with next best haplotype solution from each sub-finder; previous ones are
|
||||
* already part {@link #rankedSubHaplotype}.
|
||||
*/
|
||||
protected static class PathComparatorTotalScore implements Comparator<Path>, Serializable {
|
||||
private PriorityQueue<MyKBestHaplotypeResult> nextBestSubHaplotypes;
|
||||
|
||||
/**
|
||||
* Creates a new aggregated sub-haplotype finder given its sub-finders.
|
||||
* @param finders set of sub-finders.
|
||||
*/
|
||||
public AggregatedSubHaplotypeFinder(final Collection<? extends KBestSubHaplotypeFinder> finders) {
|
||||
if (finders == null) throw new IllegalArgumentException("finder collection cannot be null");
|
||||
this.subFinders = finders;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int getCount() {
|
||||
processSubFindersIfNeeded();
|
||||
return count;
|
||||
}
|
||||
|
||||
private void processSubFindersIfNeeded() {
|
||||
if (processedSubFinders) return;
|
||||
|
||||
long count = 0;
|
||||
nextBestSubHaplotypes = new PriorityQueue<>(subFinders.size());
|
||||
for (final KBestSubHaplotypeFinder finder : subFinders) {
|
||||
final int finderCount = finder.getCount();
|
||||
if (finderCount == 0) continue;
|
||||
count += finderCount;
|
||||
nextBestSubHaplotypes.add(new MyKBestHaplotypeResult(finder,0));
|
||||
}
|
||||
|
||||
this.count = (int) Math.min(Integer.MAX_VALUE,count);
|
||||
|
||||
rankedSubHaplotype = new ArrayList<>(10);
|
||||
processedSubFinders = true;
|
||||
}
|
||||
|
||||
@Override
|
||||
public KBestHaplotype getKBest(int k) {
|
||||
if (k < 0) throw new IllegalArgumentException("k cannot be negative");
|
||||
processSubFindersIfNeeded();
|
||||
if (k >= count) throw new IllegalArgumentException("k cannot be equal or greater than the count");
|
||||
if (k < rankedSubHaplotype.size())
|
||||
return rankedSubHaplotype.get(k);
|
||||
|
||||
rankedSubHaplotype.ensureCapacity(k+1);
|
||||
for (int i = rankedSubHaplotype.size(); i <= k; i++) {
|
||||
// since k < possibleHaplotypeCount is guarantee no to be empty.
|
||||
if (nextBestSubHaplotypes.isEmpty())
|
||||
throw new IllegalStateException("what the heck " + k + " " + count);
|
||||
final MyKBestHaplotypeResult nextResult = nextBestSubHaplotypes.remove();
|
||||
nextResult.rank = i;
|
||||
rankedSubHaplotype.add(nextResult);
|
||||
final int subRank = nextResult.result.rank();
|
||||
|
||||
// if there is no further solution from the same child we cannot add another solution from that child.
|
||||
if (subRank + 1 >= nextResult.subFinder.getCount())
|
||||
continue;
|
||||
nextBestSubHaplotypes.add(new MyKBestHaplotypeResult(nextResult.subFinder, subRank + 1));
|
||||
}
|
||||
return rankedSubHaplotype.get(k);
|
||||
}
|
||||
|
||||
/**
|
||||
* Custom implementation of {@link KBestHaplotype} to encapsulate sub-finder results.
|
||||
*/
|
||||
private class MyKBestHaplotypeResult extends KBestHaplotype {
|
||||
|
||||
private KBestSubHaplotypeFinder subFinder;
|
||||
|
||||
private final KBestHaplotype result;
|
||||
|
||||
private int rank;
|
||||
|
||||
private MyKBestHaplotypeResult(final KBestSubHaplotypeFinder finder, final int rank) {
|
||||
this.subFinder = finder;
|
||||
this.result = finder.getKBest(rank);
|
||||
this.rank = -1;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int compare(final Path path1, final Path path2) {
|
||||
return path2.getScore() - path1.getScore();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* @see #getKBestPaths(BaseGraph, int) retriving the best 1000 paths
|
||||
*/
|
||||
public List<Path<T,E>> getKBestPaths( final BaseGraph<T, E> graph ) {
|
||||
return getKBestPaths(graph, 1000);
|
||||
}
|
||||
|
||||
/**
|
||||
* @see #getKBestPaths(BaseGraph, int, java.util.Set, java.util.Set) retriving the first 1000 paths
|
||||
* starting from all source vertices and ending with all sink vertices
|
||||
*/
|
||||
public List<Path<T,E>> getKBestPaths( final BaseGraph<T,E> graph, final int k ) {
|
||||
return getKBestPaths(graph, k, graph.getSources(), graph.getSinks());
|
||||
}
|
||||
|
||||
/**
|
||||
* @see #getKBestPaths(BaseGraph, int, java.util.Set, java.util.Set) with k=1000
|
||||
*/
|
||||
public List<Path<T,E>> getKBestPaths( final BaseGraph<T,E> graph, final Set<T> sources, final Set<T> sinks ) {
|
||||
return getKBestPaths(graph, 1000, sources, sinks);
|
||||
}
|
||||
|
||||
/**
|
||||
* @see #getKBestPaths(BaseGraph, int, java.util.Set, java.util.Set) with k=1000
|
||||
*/
|
||||
public List<Path<T,E>> getKBestPaths( final BaseGraph<T,E> graph, final T source, final T sink ) {
|
||||
return getKBestPaths(graph, 1000, source, sink);
|
||||
}
|
||||
|
||||
/**
|
||||
* @see #getKBestPaths(BaseGraph, int, java.util.Set, java.util.Set) with singleton source and sink sets
|
||||
*/
|
||||
public List<Path<T,E>> getKBestPaths( final BaseGraph<T,E> graph, final int k, final T source, final T sink ) {
|
||||
return getKBestPaths(graph, k, Collections.singleton(source), Collections.singleton(sink));
|
||||
}
|
||||
|
||||
/**
|
||||
* Traverse the graph and pull out the best k paths.
|
||||
* Paths are scored via their comparator function. The default being PathComparatorTotalScore()
|
||||
* @param graph the graph from which to pull paths
|
||||
* @param k the number of paths to find
|
||||
* @param sources a set of vertices we want to start paths with
|
||||
* @param sinks a set of vertices we want to end paths with
|
||||
* @return a list with at most k top-scoring paths from the graph
|
||||
*/
|
||||
@Ensures({"result != null", "result.size() <= k"})
|
||||
public List<Path<T,E>> getKBestPaths( final BaseGraph<T,E> graph, final int k, final Set<T> sources, final Set<T> sinks ) {
|
||||
if( graph == null ) { throw new IllegalArgumentException("Attempting to traverse a null graph."); }
|
||||
|
||||
// a min max queue that will collect the best k paths
|
||||
final MinMaxPriorityQueue<Path<T,E>> bestPaths = MinMaxPriorityQueue.orderedBy(new PathComparatorTotalScore()).maximumSize(k).create();
|
||||
|
||||
// run a DFS for best paths
|
||||
for ( final T source : sources ) {
|
||||
final Path<T,E> startingPath = new Path<T,E>(source, graph);
|
||||
findBestPaths(startingPath, sinks, bestPaths, new MyInt());
|
||||
public SeqGraph graph() {
|
||||
return result.graph();
|
||||
}
|
||||
|
||||
// the MinMaxPriorityQueue iterator returns items in an arbitrary order, so we need to sort the final result
|
||||
final List<Path<T,E>> toReturn = new ArrayList<Path<T,E>>(bestPaths);
|
||||
Collections.sort(toReturn, new PathComparatorTotalScore());
|
||||
return toReturn;
|
||||
}
|
||||
@Override
|
||||
public int score() {
|
||||
return result.score();
|
||||
}
|
||||
|
||||
/**
|
||||
* Recursive algorithm to find the K best paths in the graph from the current path to any of the sinks
|
||||
* @param path the current path progress
|
||||
* @param sinks a set of nodes that are sinks. Will terminate and add a path if the last vertex of path is in this set
|
||||
* @param bestPaths a path to collect completed paths.
|
||||
* @param n used to limit the search by tracking the number of vertices visited across all paths
|
||||
*/
|
||||
private void findBestPaths( final Path<T,E> path, final Set<T> sinks, final Collection<Path<T,E>> bestPaths, final MyInt n ) {
|
||||
if ( sinks.contains(path.getLastVertex())) {
|
||||
bestPaths.add(path);
|
||||
} else if( n.val > 10000 ) {
|
||||
// do nothing, just return, as we've done too much work already
|
||||
} else {
|
||||
// recursively run DFS
|
||||
final ArrayList<E> edgeArrayList = new ArrayList<E>(path.getOutgoingEdgesOfLastVertex());
|
||||
Collections.sort(edgeArrayList, new BaseEdge.EdgeWeightComparator());
|
||||
for ( final E edge : edgeArrayList ) {
|
||||
final T target = path.getGraph().getEdgeTarget(edge);
|
||||
// make sure the edge is not already in the path
|
||||
final boolean alreadyVisited = allowCycles ? path.containsEdge(edge) : path.containsVertex(target);
|
||||
if ( ! alreadyVisited ) {
|
||||
final Path<T,E> newPath = new Path<T,E>(path, edge);
|
||||
n.val++;
|
||||
findBestPaths(newPath, sinks, bestPaths, n);
|
||||
}
|
||||
}
|
||||
@Override
|
||||
public boolean isReference() {
|
||||
return result.isReference();
|
||||
}
|
||||
|
||||
@Override
|
||||
public int rank() {
|
||||
return rank;
|
||||
}
|
||||
|
||||
@Override
|
||||
protected SeqVertex head() {
|
||||
return result.head();
|
||||
}
|
||||
|
||||
@Override
|
||||
protected KBestHaplotype tail() {
|
||||
return result.tail();
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -695,4 +695,21 @@ public class BaseGraph<V extends BaseVertex, E extends BaseEdge> extends Default
|
|||
public BaseGraph<V,E> subsetToRefSource() {
|
||||
return subsetToNeighbors(getReferenceSourceVertex(), 10);
|
||||
}
|
||||
|
||||
/**
|
||||
* Checks whether the graph contains all the vertices in a collection.
|
||||
*
|
||||
* @param vertices the vertices to check.
|
||||
*
|
||||
* @throws IllegalArgumentException if {@code vertices} is {@code null}.
|
||||
*
|
||||
* @return {@code true} if all the vertices in the input collection are present in this graph.
|
||||
* Also if the input collection is empty. Otherwise it returns {@code false}.
|
||||
*/
|
||||
public boolean containsAllVertices(final Collection<? extends V> vertices) {
|
||||
if (vertices == null) throw new IllegalArgumentException("the input vertices collection cannot be null");
|
||||
for (final V vertex : vertices)
|
||||
if (!containsVertex(vertex)) return false;
|
||||
return true;
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -46,9 +46,9 @@
|
|||
package org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs;
|
||||
|
||||
/**
|
||||
* Represent a trivial k-best sub haplotype finder with no solutions.
|
||||
* Represents a trivial k-best sub haplotype finder with no solutions.
|
||||
*
|
||||
* <p>To be used at vertices that do not have any valid path to the requested sink node</p>
|
||||
* <p>To be used at vertices that do not have any valid path to the requested sink vertices</p>
|
||||
*
|
||||
* @author Valentin Ruano-Rubio <valentin@broadinstitute.org>
|
||||
*/
|
||||
|
|
|
|||
|
|
@ -52,14 +52,19 @@ package org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs;
|
|||
*/
|
||||
class EmptyPathHaplotypeFinderNode implements KBestSubHaplotypeFinder {
|
||||
|
||||
|
||||
/**
|
||||
* Caches the only solution returned by this finder.
|
||||
*/
|
||||
private final KBestHaplotype singleHaplotypePath;
|
||||
|
||||
public EmptyPathHaplotypeFinderNode(final SeqGraph graph, final SeqVertex sink) {
|
||||
singleHaplotypePath = new MyBestHaplotypePath(graph,sink);
|
||||
/**
|
||||
* Constructs a new empty k-best haplotype finder.
|
||||
*
|
||||
* @param graph the search graph.
|
||||
* @param vertex the source and sink vertex of the only solution returned by this finder.
|
||||
*/
|
||||
public EmptyPathHaplotypeFinderNode(final SeqGraph graph, final SeqVertex vertex) {
|
||||
singleHaplotypePath = new MyBestHaplotypePath(graph,vertex);
|
||||
}
|
||||
|
||||
@Override
|
||||
|
|
@ -81,12 +86,29 @@ class EmptyPathHaplotypeFinderNode implements KBestSubHaplotypeFinder {
|
|||
*/
|
||||
private class MyBestHaplotypePath extends KBestHaplotype {
|
||||
|
||||
/**
|
||||
* The solution's only vertex.
|
||||
*/
|
||||
private final SeqVertex vertex;
|
||||
|
||||
/**
|
||||
* The search graph.
|
||||
*/
|
||||
private final SeqGraph graph;
|
||||
|
||||
/**
|
||||
* Whether the vertex is a reference vertex.
|
||||
*
|
||||
* <p>Initialize lazily.</p>
|
||||
*/
|
||||
private Boolean isReference;
|
||||
|
||||
/**
|
||||
* Constructs a new empty k-best haplotype solution.
|
||||
*
|
||||
* @param graph the search graph.
|
||||
* @param vertex the source and sink vertex of the only solution returned by the outer finder.
|
||||
*/
|
||||
public MyBestHaplotypePath(final SeqGraph graph, final SeqVertex vertex) {
|
||||
this.vertex = vertex;
|
||||
this.graph = graph;
|
||||
|
|
|
|||
|
|
@ -52,7 +52,7 @@ import org.broadinstitute.sting.utils.haplotype.Haplotype;
|
|||
*
|
||||
* @author Valentin Ruano-Rubio <valentin@broadinstitute.org>
|
||||
*/
|
||||
public abstract class KBestHaplotype {
|
||||
public abstract class KBestHaplotype implements Comparable<KBestHaplotype> {
|
||||
|
||||
/**
|
||||
* Returns the original graph searched.
|
||||
|
|
@ -143,6 +143,18 @@ public abstract class KBestHaplotype {
|
|||
return path;
|
||||
}
|
||||
|
||||
/**
|
||||
* Compares k-best haplotypes based on the score where the one with larger score comes first (descending orther).
|
||||
*
|
||||
* @param other the other haplotype to compare to.
|
||||
* @return {@code -1} if the current score is larger than {@code other}'s, {@code 0} if they are the same, {@code 1}
|
||||
* if {@code other}'s score is larger.
|
||||
*/
|
||||
public int compareTo(final KBestHaplotype other) {
|
||||
if (other == null) throw new IllegalArgumentException("the other object cannot be null");
|
||||
return - 1 * (score() - other.score());
|
||||
}
|
||||
|
||||
/**
|
||||
* The first vertex on the haplotype path.
|
||||
*
|
||||
|
|
@ -156,6 +168,4 @@ public abstract class KBestHaplotype {
|
|||
* @return {@code null} if there are no more vertices in the solution path a part from the one returned by {@link #head}.
|
||||
*/
|
||||
protected abstract KBestHaplotype tail();
|
||||
|
||||
|
||||
}
|
||||
|
|
|
|||
|
|
@ -56,49 +56,131 @@ import java.util.*;
|
|||
*/
|
||||
public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implements Iterable<KBestHaplotype> {
|
||||
|
||||
|
||||
/**
|
||||
* The search graph.
|
||||
*/
|
||||
private final SeqGraph graph;
|
||||
protected Map<SeqVertex,KBestSubHaplotypeFinder> nodeBySource;
|
||||
|
||||
protected SeqVertex sink;
|
||||
protected SeqVertex source;
|
||||
/**
|
||||
* Map of sub-haplotype finder by their source vertex.
|
||||
*/
|
||||
protected Map<SeqVertex,KBestSubHaplotypeFinder> finderByVertex;
|
||||
|
||||
/**
|
||||
* Possible haplotype sink vertices.
|
||||
*/
|
||||
protected Set<SeqVertex> sinks;
|
||||
|
||||
/**
|
||||
* Possible haplotype source vertices.
|
||||
*/
|
||||
protected Set<SeqVertex> sources;
|
||||
|
||||
/**
|
||||
* The top finder.
|
||||
*
|
||||
* <p>If there is only a single source vertex, its finder is the top finder. However whent there
|
||||
* is more than one possible source, we create a composite finder that alternates between individual source vertices
|
||||
* for their best haplotypes.</p>
|
||||
*/
|
||||
private final KBestSubHaplotypeFinder topFinder;
|
||||
|
||||
/**
|
||||
* Constructs a new best haplotypes finder.
|
||||
*
|
||||
* @param graph the seq-graph to search.
|
||||
* @param source the source vertex for all haplotypes.
|
||||
* @param sink the sink vertex for all haplotypes.
|
||||
* @param sink sink vertices for all haplotypes.
|
||||
*
|
||||
* @throws IllegalArgumentException if <ul>
|
||||
* <li>any of {@code graph}, {@code source} or {@code sink} is {@code null} or</li>
|
||||
* <li>either {@code source} or {@code sink} is not a vertex of {@code graph}'s.</li>
|
||||
* <li>either {@code source} or {@code sink} is not a vertex in {@code graph}.</li>
|
||||
* </ul>
|
||||
*/
|
||||
public KBestHaplotypeFinder(final SeqGraph graph, final SeqVertex source, final SeqVertex sink) {
|
||||
if (graph == null) throw new IllegalArgumentException("graph cannot be null");
|
||||
if (source == null) throw new IllegalArgumentException("source cannot be null");
|
||||
if (sink == null) throw new IllegalArgumentException("sink cannot be null");
|
||||
if (!graph.containsVertex(source)) throw new IllegalArgumentException("source does not belong to the graph");
|
||||
if (!graph.containsVertex(sink)) throw new IllegalArgumentException("sink does not belong to the graph");
|
||||
//TODO dealing with cycles here due to a bug in some of the graph transformations that produces cycles.
|
||||
//TODO Once that is solve, the conditional above should be removed in order the save time:
|
||||
//this.graph = graph;
|
||||
if (new CycleDetector<>(graph).detectCycles())
|
||||
this.graph = removeCycles(graph,source,sink);
|
||||
else
|
||||
this.graph = graph;
|
||||
nodeBySource = new HashMap<>(graph.vertexSet().size());
|
||||
this.sink = sink;
|
||||
this.source = source;
|
||||
this(graph,Collections.singleton(source),Collections.singleton(sink));
|
||||
}
|
||||
|
||||
private static SeqGraph removeCycles(final SeqGraph original, final SeqVertex source, final SeqVertex sink) {
|
||||
/**
|
||||
* Constructs a new best haplotypes finder.
|
||||
*
|
||||
* @param graph the seq-graph to search.
|
||||
* @param sources source vertices for all haplotypes.
|
||||
* @param sinks sink vertices for all haplotypes.
|
||||
*
|
||||
* @throws IllegalArgumentException if <ul>
|
||||
* <li>any of {@code graph}, {@code sources} or {@code sinks} is {@code null} or</li>
|
||||
* <li>any of {@code sources}' or any {@code sinks}' member is not a vertex in {@code graph}.</li>
|
||||
* </ul>
|
||||
*/
|
||||
public KBestHaplotypeFinder(final SeqGraph graph, final Set<SeqVertex> sources, final Set<SeqVertex> sinks) {
|
||||
if (graph == null) throw new IllegalArgumentException("graph cannot be null");
|
||||
if (sources == null) throw new IllegalArgumentException("source cannot be null");
|
||||
if (sinks == null) throw new IllegalArgumentException("sink cannot be null");
|
||||
if (!graph.containsAllVertices(sources)) throw new IllegalArgumentException("source does not belong to the graph");
|
||||
if (!graph.containsAllVertices(sinks)) throw new IllegalArgumentException("sink does not belong to the graph");
|
||||
|
||||
//TODO dealing with cycles here due to a bug in some of the graph transformations that produces cycles.
|
||||
//TODO Once that is solve, the if-else below should be substituted by a throw if there is any cycles,
|
||||
//TODO just the line commented out below if you want to trade early-bug-fail for speed.
|
||||
//this.graph = graph;
|
||||
this.graph = new CycleDetector<>(graph).detectCycles() ? removeCycles(graph,sources,sinks) : graph;
|
||||
|
||||
finderByVertex = new HashMap<>(this.graph.vertexSet().size());
|
||||
this.sinks = sinks;
|
||||
this.sources = sources;
|
||||
if (sinks.size() == 0 || sources.size() == 0)
|
||||
topFinder = DeadEndKBestSubHaplotypeFinder.INSTANCE;
|
||||
else if (sources.size() == 1)
|
||||
topFinder = createVertexFinder(sources.iterator().next());
|
||||
else
|
||||
topFinder = createAggregatedFinder();
|
||||
}
|
||||
|
||||
/**
|
||||
* Constructs a new best haplotype finder.
|
||||
* <p>
|
||||
* It will consider all source and sink vertex when looking for haplotypes.
|
||||
* </p>
|
||||
*
|
||||
* @param graph the seq-graph to search for the best haplotypes.
|
||||
*/
|
||||
public KBestHaplotypeFinder(SeqGraph graph) {
|
||||
this(graph,graph.getSources(),graph.getSinks());
|
||||
}
|
||||
|
||||
/**
|
||||
* Creates an aggregated recursive finder to try all possible source vertices.
|
||||
*
|
||||
* @return never {@code null}.
|
||||
*/
|
||||
private KBestSubHaplotypeFinder createAggregatedFinder() {
|
||||
final List<KBestSubHaplotypeFinder> sourceFinders = new ArrayList<>(sources.size());
|
||||
for (final SeqVertex source : sources)
|
||||
sourceFinders.add(createVertexFinder(source));
|
||||
return new AggregatedSubHaplotypeFinder(sourceFinders);
|
||||
}
|
||||
|
||||
/**
|
||||
* Removes edges that produces cycles and also dead vertices that do not lead to any sink vertex.
|
||||
*
|
||||
* @param original graph to modify.
|
||||
* @param sources considered source vertices.
|
||||
* @param sinks considered sink vertices.
|
||||
* @return never {@code null}.
|
||||
*/
|
||||
private static SeqGraph removeCycles(final SeqGraph original, final Set<SeqVertex> sources, final Set<SeqVertex> sinks) {
|
||||
final Set<BaseEdge> edgesToRemove = new HashSet<>(original.edgeSet().size());
|
||||
final Set<SeqVertex> vertexToRemove = new HashSet<>(original.vertexSet().size());
|
||||
|
||||
if (!findGuiltyVerticesAndEdgesToRemoveCycles(original, source, sink, edgesToRemove, vertexToRemove, new HashSet<SeqVertex>(original.vertexSet().size())))
|
||||
throw new IllegalStateException("could not find any path from the source vertex to the sink vertex: " + source + " -> " + sink);
|
||||
boolean foundSomePath = false;
|
||||
for (final SeqVertex source : sources)
|
||||
foundSomePath = findGuiltyVerticesAndEdgesToRemoveCycles(original, source, sinks, edgesToRemove,
|
||||
vertexToRemove, new HashSet<SeqVertex>(original.vertexSet().size())) | foundSomePath;
|
||||
|
||||
if (!foundSomePath)
|
||||
throw new IllegalStateException("could not find any path from the source vertex to the sink vertex after removing cycles: "
|
||||
+ Arrays.toString(sources.toArray()) + " => " + Arrays.toString(sinks.toArray()));
|
||||
|
||||
if (edgesToRemove.isEmpty() && vertexToRemove.isEmpty())
|
||||
throw new IllegalStateException("cannot find a way to remove the cycles");
|
||||
|
|
@ -109,13 +191,30 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
|
|||
return result;
|
||||
}
|
||||
|
||||
private static boolean findGuiltyVerticesAndEdgesToRemoveCycles(final SeqGraph graph, final SeqVertex currentVertex, final SeqVertex sink,
|
||||
final Set<BaseEdge> edgesToRemove, final Set<SeqVertex> verticesToRemove,
|
||||
/**
|
||||
* Recursive call that looks for edges and vertices that need to be removed to get rid of cycles.
|
||||
*
|
||||
* @param graph the original graph.
|
||||
* @param currentVertex current search vertex.
|
||||
* @param sinks considered sink vertices.
|
||||
* @param edgesToRemove collection of edges that need to be removed in order to get rid of cycles.
|
||||
* @param verticesToRemove collection of vertices that can be removed.
|
||||
* @param parentVertices collection of vertices that preceded the {@code currentVertex}; i.e. the it can be
|
||||
* reached from those vertices using edges existing in {@code graph}.
|
||||
*
|
||||
* @return {@code true} to indicate that the some sink vertex is reachable by {@code currentVertex},
|
||||
* {@code false} otherwise.
|
||||
*/
|
||||
private static boolean findGuiltyVerticesAndEdgesToRemoveCycles(final SeqGraph graph,
|
||||
final SeqVertex currentVertex,
|
||||
final Set<SeqVertex> sinks,
|
||||
final Set<BaseEdge> edgesToRemove,
|
||||
final Set<SeqVertex> verticesToRemove,
|
||||
final Set<SeqVertex> parentVertices) {
|
||||
if (currentVertex.equals(sink)) return true;
|
||||
if (sinks.contains(currentVertex)) return true;
|
||||
|
||||
final Set<BaseEdge> outgoingEdges = graph.outgoingEdgesOf(currentVertex);
|
||||
boolean reachsSink = false;
|
||||
boolean reachesSink = false;
|
||||
parentVertices.add(currentVertex);
|
||||
|
||||
for (final BaseEdge edge : outgoingEdges) {
|
||||
|
|
@ -123,31 +222,28 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
|
|||
if (parentVertices.contains(child))
|
||||
edgesToRemove.add(edge);
|
||||
else {
|
||||
final boolean childReachSink = findGuiltyVerticesAndEdgesToRemoveCycles(graph, child, sink, edgesToRemove, verticesToRemove, parentVertices);
|
||||
reachsSink = reachsSink || childReachSink;
|
||||
final boolean childReachSink = findGuiltyVerticesAndEdgesToRemoveCycles(graph, child, sinks,
|
||||
edgesToRemove, verticesToRemove, parentVertices);
|
||||
reachesSink = reachesSink || childReachSink;
|
||||
}
|
||||
}
|
||||
parentVertices.remove(currentVertex);
|
||||
if (!reachsSink) verticesToRemove.add(currentVertex);
|
||||
return reachsSink;
|
||||
if (!reachesSink) verticesToRemove.add(currentVertex);
|
||||
return reachesSink;
|
||||
}
|
||||
|
||||
|
||||
@Override
|
||||
public KBestHaplotype get(int index) {
|
||||
final KBestSubHaplotypeFinder sourceNode = createNode(source);
|
||||
if (index < 0 || index >= size())
|
||||
throw new IndexOutOfBoundsException();
|
||||
return sourceNode.getKBest(index);
|
||||
return topFinder.getKBest(index);
|
||||
}
|
||||
|
||||
@Override
|
||||
public Iterator<KBestHaplotype> iterator() {
|
||||
final KBestSubHaplotypeFinder sourceFinder = createNode(source);
|
||||
|
||||
return new Iterator<KBestHaplotype>() {
|
||||
private int nextK = 0;
|
||||
private final int maxK = sourceFinder.getCount();
|
||||
private final int maxK = topFinder.getCount();
|
||||
|
||||
|
||||
@Override
|
||||
|
|
@ -158,7 +254,7 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
|
|||
@Override
|
||||
public KBestHaplotype next() {
|
||||
if (nextK >= maxK) throw new NoSuchElementException();
|
||||
return sourceFinder.getKBest(nextK++);
|
||||
return topFinder.getKBest(nextK++);
|
||||
}
|
||||
|
||||
@Override
|
||||
|
|
@ -170,7 +266,7 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
|
|||
|
||||
@Override
|
||||
public int size() {
|
||||
return createNode(source).getCount();
|
||||
return topFinder.getCount();
|
||||
}
|
||||
|
||||
/**
|
||||
|
|
@ -183,12 +279,10 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
|
|||
* @return never {@code null}, but perhaps a iterator that return no haplotype.
|
||||
*/
|
||||
public Iterator<KBestHaplotype> iterator(final int k) {
|
||||
final KBestSubHaplotypeFinder sourceFinder = createNode(source);
|
||||
|
||||
return new Iterator<KBestHaplotype>() {
|
||||
private int nextK = 0;
|
||||
private final int maxK = Math.min(sourceFinder.getCount(), k);
|
||||
|
||||
private final int maxK = Math.min(size(), k);
|
||||
|
||||
@Override
|
||||
public boolean hasNext() {
|
||||
|
|
@ -198,7 +292,7 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
|
|||
@Override
|
||||
public KBestHaplotype next() {
|
||||
if (nextK >= maxK) throw new NoSuchElementException();
|
||||
return sourceFinder.getKBest(nextK++);
|
||||
return topFinder.getKBest(nextK++);
|
||||
}
|
||||
|
||||
@Override
|
||||
|
|
@ -208,38 +302,51 @@ public class KBestHaplotypeFinder extends AbstractList<KBestHaplotype> implement
|
|||
};
|
||||
}
|
||||
|
||||
protected KBestSubHaplotypeFinder createNode(final SeqVertex source) {
|
||||
KBestSubHaplotypeFinder node = nodeBySource.get(source);
|
||||
/**
|
||||
* Creates a finder from a vertex.
|
||||
*
|
||||
* @param source the source vertex for the finder.
|
||||
*
|
||||
* @return never {@code null}, perhaps a finder that returns no haplotypes though.
|
||||
*/
|
||||
protected KBestSubHaplotypeFinder createVertexFinder(final SeqVertex source) {
|
||||
KBestSubHaplotypeFinder node = finderByVertex.get(source);
|
||||
if (node == null) {
|
||||
if (source.equals(sink))
|
||||
node = new EmptyPathHaplotypeFinderNode(graph,sink);
|
||||
if (sinks.contains(source))
|
||||
node = new EmptyPathHaplotypeFinderNode(graph,source);
|
||||
else {
|
||||
final Set<BaseEdge> outgoingEdges = graph.outgoingEdgesOf(source);
|
||||
if (outgoingEdges.isEmpty())
|
||||
node = DeadEndKBestSubHaplotypeFinder.INSTANCE;
|
||||
else {
|
||||
final Map<BaseEdge,KBestSubHaplotypeFinder> undeadChildren = createChildrenNodes(outgoingEdges);
|
||||
if (undeadChildren.isEmpty())
|
||||
node = DeadEndKBestSubHaplotypeFinder.INSTANCE;
|
||||
else
|
||||
node = new RecursiveSubHaplotypeFinder(graph,source,undeadChildren);
|
||||
|
||||
final Map<BaseEdge,KBestSubHaplotypeFinder> undeadChildren = createChildrenFinders(outgoingEdges);
|
||||
node = undeadChildren.isEmpty() ? DeadEndKBestSubHaplotypeFinder.INSTANCE :
|
||||
new RecursiveSubHaplotypeFinder(source,undeadChildren);
|
||||
}
|
||||
}
|
||||
nodeBySource.put(source,node);
|
||||
finderByVertex.put(source, node);
|
||||
}
|
||||
return node;
|
||||
}
|
||||
|
||||
private Map<BaseEdge, KBestSubHaplotypeFinder> createChildrenNodes(Set<BaseEdge> baseEdges) {
|
||||
/**
|
||||
* Creates finder for target vertices of a collection of edges.
|
||||
* <p>
|
||||
* This peculiar signature is convenient for when we want to create finders for the children of a vertex.
|
||||
* </p>
|
||||
*
|
||||
* @param baseEdges target collection of edges.
|
||||
*
|
||||
* @return never {@code null}, perhaps an empty map if there is no children with valid paths to any sink for this
|
||||
* finder.
|
||||
*/
|
||||
private Map<BaseEdge, KBestSubHaplotypeFinder> createChildrenFinders(Set<BaseEdge> baseEdges) {
|
||||
final Map<BaseEdge,KBestSubHaplotypeFinder> result = new LinkedHashMap<>(baseEdges.size());
|
||||
for (final BaseEdge edge : baseEdges)
|
||||
result.put(edge,createNode(graph.getEdgeTarget(edge)));
|
||||
final Iterator<Map.Entry<BaseEdge,KBestSubHaplotypeFinder>> childrenIterator = result.entrySet().iterator();
|
||||
while (childrenIterator.hasNext())
|
||||
if (childrenIterator.next().getValue().getCount() == 0)
|
||||
childrenIterator.remove();
|
||||
for (final BaseEdge edge : baseEdges) {
|
||||
final KBestSubHaplotypeFinder targetFinder = createVertexFinder(graph.getEdgeTarget(edge));
|
||||
if (targetFinder.getCount() == 0) continue;
|
||||
result.put(edge, targetFinder);
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
}
|
||||
|
|
|
|||
|
|
@ -68,5 +68,4 @@ interface KBestSubHaplotypeFinder {
|
|||
* @return never {@code null}.
|
||||
*/
|
||||
public abstract KBestHaplotype getKBest(int k);
|
||||
|
||||
}
|
||||
|
|
|
|||
|
|
@ -46,18 +46,18 @@
|
|||
package org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs;
|
||||
|
||||
import java.util.ArrayList;
|
||||
import java.util.Collection;
|
||||
import java.util.Map;
|
||||
import java.util.PriorityQueue;
|
||||
|
||||
/**
|
||||
* General recursive sub-haplotype finder.
|
||||
* <p>
|
||||
* Provides the k-best sub-haplotypes looking into the outgoing set of vertices (that contain at least one solution).
|
||||
* Provides the k-best sub-haplotypes from a vertex provided map between outgoing edges and its target finders
|
||||
* </p>
|
||||
* <p>
|
||||
* This is done efficiently by keeping an priority-queue on best subhaplotype solutions and pulling them on demand
|
||||
* This is done efficiently by keeping an priority-queue on best sub-haplotype solutions and pulling them on demand
|
||||
* as needed.
|
||||
* </p>
|
||||
* </p>
|
||||
* <p>
|
||||
* Solutions are cached for repeated retrieval so that we save compute at vertices that share sub-haplotypes
|
||||
* (share descendant vertices). This aspect is controlled by {@link KBestSubHaplotypeFinder} that instantiate
|
||||
|
|
@ -67,34 +67,7 @@ import java.util.PriorityQueue;
|
|||
*
|
||||
* @author Valentin Ruano-Rubio <valentin@broadinstitute.org>
|
||||
*/
|
||||
class RecursiveSubHaplotypeFinder implements KBestSubHaplotypeFinder {
|
||||
|
||||
private final SeqGraph graph;
|
||||
|
||||
private final SeqVertex vertex;
|
||||
|
||||
private final Map<BaseEdge,KBestSubHaplotypeFinder> children;
|
||||
|
||||
private boolean childrenWereProcessed = false;
|
||||
|
||||
/**
|
||||
* Holds the number of possible paths from this source node vertex to the sink vertex.
|
||||
*
|
||||
* <p>Updated by {@link #processChildrenIfNeeded()}</p>
|
||||
*/
|
||||
private int possibleHaplotypeCount;
|
||||
|
||||
/**
|
||||
* Holds the best {@code i} paths to the sink so far calculated where {@code i+1} is the length of rankedResults.
|
||||
*
|
||||
* <p>As more results are requested the array will grow. All positions and solutions are calculated up to {@code i}</p>.
|
||||
*/
|
||||
private ArrayList<KBestHaplotype> rankedResults;
|
||||
|
||||
/**
|
||||
* Priority queue with best sub-haplotype solutions that haven't been calculated and cached on {@link #rankedResults} yet.
|
||||
*/
|
||||
private PriorityQueue<ChildKBestSubHaplotype> nextChildrenKBestHaplotypePath;
|
||||
class RecursiveSubHaplotypeFinder extends AggregatedSubHaplotypeFinder {
|
||||
|
||||
/**
|
||||
* Creates a recursive sub-haplotype finder give the target graph, first vertex and all possible outgoing edges
|
||||
|
|
@ -104,113 +77,73 @@ class RecursiveSubHaplotypeFinder implements KBestSubHaplotypeFinder {
|
|||
* are outgoing edges from {@code vertex} on {@code graph} and that the value sub-haplotype resolver have as
|
||||
* the first vertex the adjacent vertex through that key edge.</p>
|
||||
*
|
||||
* @param graph the search graph.
|
||||
* @param vertex first vertex for all sub-haplotype solutions provided by this finder
|
||||
* @param children map from outgoing edge to the corresponding sub-sub-haplotype finder.
|
||||
*/
|
||||
public RecursiveSubHaplotypeFinder(final SeqGraph graph, final SeqVertex vertex,
|
||||
public RecursiveSubHaplotypeFinder(final SeqVertex vertex,
|
||||
final Map<BaseEdge, KBestSubHaplotypeFinder> children) {
|
||||
if (vertex == null) throw new IllegalArgumentException("the vertex provided cannot be null");
|
||||
if (graph == null) throw new IllegalArgumentException("the graph provided cannot be null");
|
||||
this.vertex = vertex;
|
||||
this.children = children;
|
||||
this.graph = graph;
|
||||
super(createChildFinderCollection(vertex, children));
|
||||
}
|
||||
|
||||
@Override
|
||||
public int getCount() {
|
||||
processChildrenIfNeeded();
|
||||
return possibleHaplotypeCount;
|
||||
private static Collection<EdgeSubHaplotypeFinder> createChildFinderCollection(final SeqVertex vertex, final Map<BaseEdge,KBestSubHaplotypeFinder> finders) {
|
||||
if (finders == null) throw new IllegalArgumentException("the edge to child map cannot be null");
|
||||
final Collection<EdgeSubHaplotypeFinder> result = new ArrayList<>(finders.size());
|
||||
for (final Map.Entry<BaseEdge,KBestSubHaplotypeFinder> e : finders.entrySet())
|
||||
result.add(new EdgeSubHaplotypeFinder(vertex,e.getKey(), e.getValue()));
|
||||
return result;
|
||||
}
|
||||
|
||||
/**
|
||||
* Process children and initialize structures if not done before.
|
||||
*/
|
||||
private void processChildrenIfNeeded() {
|
||||
if (childrenWereProcessed) return;
|
||||
long possibleHaplotypeCount = 0;
|
||||
private static class EdgeSubHaplotypeFinder implements KBestSubHaplotypeFinder {
|
||||
|
||||
nextChildrenKBestHaplotypePath = new PriorityQueue<>(children.size());
|
||||
private final KBestSubHaplotypeFinder childFinder;
|
||||
|
||||
for (final Map.Entry<BaseEdge,KBestSubHaplotypeFinder> entry : children.entrySet()) {
|
||||
final KBestSubHaplotypeFinder child = entry.getValue();
|
||||
final BaseEdge edge = entry.getKey();
|
||||
final int childPossibleHaplotypePathCount = child.getCount();
|
||||
if (childPossibleHaplotypePathCount != 0) // paranoia check, should not happen at this point.
|
||||
nextChildrenKBestHaplotypePath.add(new ChildKBestSubHaplotype(-1,edge,child,0));
|
||||
possibleHaplotypeCount += childPossibleHaplotypePathCount;
|
||||
private final SeqVertex vertex;
|
||||
|
||||
private final BaseEdge edge;
|
||||
|
||||
private EdgeSubHaplotypeFinder(final SeqVertex vertex, final BaseEdge edge, final KBestSubHaplotypeFinder childFinder) {
|
||||
this.childFinder = childFinder;
|
||||
this.edge = edge;
|
||||
this.vertex = vertex;
|
||||
}
|
||||
|
||||
// Just make sure we won't incur in overflow here for very large graphs; who is ever going to ask for more than 2G paths!!!)
|
||||
this.possibleHaplotypeCount = (int) Math.min(Integer.MAX_VALUE,possibleHaplotypeCount);
|
||||
|
||||
// 10 is a bit arbitrary as it is difficult to anticipate what would be the number of requested
|
||||
// best sub-haplotypes for any node. It shouldn't be too large so that it does not waste space
|
||||
// but not too small so that there is no need to resize when just a few best solutions are requested.
|
||||
rankedResults = new ArrayList<>(Math.min(this.possibleHaplotypeCount,10));
|
||||
|
||||
childrenWereProcessed = true;
|
||||
}
|
||||
|
||||
@Override
|
||||
public KBestHaplotype getKBest(int k) {
|
||||
if (k < 0)
|
||||
throw new IllegalArgumentException("the rank requested cannot be negative");
|
||||
processChildrenIfNeeded();
|
||||
if (k >= possibleHaplotypeCount)
|
||||
throw new IllegalArgumentException("the rank requested cannot be equal or greater to the number of possible haplotypes");
|
||||
if (rankedResults.size() > k)
|
||||
return rankedResults.get(k);
|
||||
|
||||
rankedResults.ensureCapacity(k+1);
|
||||
for (int i = rankedResults.size(); i <= k; i++) {
|
||||
// since k < possibleHaplotypeCount is guarantee no to be empty.
|
||||
if (nextChildrenKBestHaplotypePath.isEmpty())
|
||||
throw new IllegalStateException("what the heck " + k + " " + possibleHaplotypeCount);
|
||||
final ChildKBestSubHaplotype nextResult = nextChildrenKBestHaplotypePath.remove();
|
||||
nextResult.rank = i;
|
||||
rankedResults.add(nextResult);
|
||||
final int childRank = nextResult.subpath.rank();
|
||||
final KBestSubHaplotypeFinder child = nextResult.child;
|
||||
|
||||
// if there is no further solution from the same child we cannot add another solution from that child.
|
||||
if (childRank + 1 >= nextResult.child.getCount())
|
||||
continue;
|
||||
nextChildrenKBestHaplotypePath.add(new ChildKBestSubHaplotype(-1,nextResult.edge, child, childRank + 1));
|
||||
@Override
|
||||
public int getCount() {
|
||||
return childFinder.getCount();
|
||||
}
|
||||
|
||||
@Override
|
||||
public KBestHaplotype getKBest(int k) {
|
||||
return new ChildKBestSubHaplotype(vertex,edge,childFinder.getKBest(k));
|
||||
}
|
||||
return rankedResults.get(k);
|
||||
}
|
||||
|
||||
/**
|
||||
* Custom extension of the {@link KBestHaplotype} used for solutions generated by this class.
|
||||
*
|
||||
* <p>
|
||||
* These by delegating on the encapsulated solution from outgoing edge's finder by adding
|
||||
* the edge score and prefixing this outer finder
|
||||
* source vertex.
|
||||
* </p>
|
||||
*/
|
||||
private class ChildKBestSubHaplotype extends KBestHaplotype implements Comparable<ChildKBestSubHaplotype>{
|
||||
private static class ChildKBestSubHaplotype extends KBestHaplotype {
|
||||
|
||||
private final int score;
|
||||
private int rank;
|
||||
private final KBestSubHaplotypeFinder child;
|
||||
private final BaseEdge edge;
|
||||
private final KBestHaplotype subpath;
|
||||
private final KBestHaplotype child;
|
||||
private final SeqVertex vertex;
|
||||
private final boolean isReference;
|
||||
|
||||
public ChildKBestSubHaplotype(final int rank, final BaseEdge edge,
|
||||
final KBestSubHaplotypeFinder child, final int childRank) {
|
||||
public ChildKBestSubHaplotype(final SeqVertex vertex, final BaseEdge edge, final KBestHaplotype child) {
|
||||
this.score = edge.getMultiplicity() + child.score();
|
||||
this.vertex = vertex;
|
||||
this.child = child;
|
||||
this.edge = edge;
|
||||
this.rank = rank;
|
||||
this.subpath = child.getKBest(childRank);
|
||||
this.score = edge.getMultiplicity() + subpath.score();
|
||||
this.isReference = edge.isRef() && subpath.isReference();
|
||||
this.isReference = edge.isRef() && child.isReference();
|
||||
}
|
||||
|
||||
@Override
|
||||
public SeqGraph graph() {
|
||||
return graph;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int compareTo(final ChildKBestSubHaplotype other) {
|
||||
if (other == null) throw new IllegalArgumentException("the other object cannot be null");
|
||||
return - Integer.compare(this.score,other.score);
|
||||
return child.graph();
|
||||
}
|
||||
|
||||
@Override
|
||||
|
|
@ -220,10 +153,9 @@ class RecursiveSubHaplotypeFinder implements KBestSubHaplotypeFinder {
|
|||
|
||||
@Override
|
||||
public int rank() {
|
||||
return rank;
|
||||
return child.rank();
|
||||
}
|
||||
|
||||
|
||||
@Override
|
||||
protected SeqVertex head() {
|
||||
return vertex;
|
||||
|
|
@ -231,7 +163,7 @@ class RecursiveSubHaplotypeFinder implements KBestSubHaplotypeFinder {
|
|||
|
||||
@Override
|
||||
protected KBestHaplotype tail() {
|
||||
return subpath;
|
||||
return child;
|
||||
}
|
||||
|
||||
@Override
|
||||
|
|
|
|||
|
|
@ -73,7 +73,6 @@ public class ReadThreadingAssembler extends LocalAssemblyEngine {
|
|||
|
||||
private final boolean dontIncreaseKmerSizesForCycles;
|
||||
private final int numPruningSamples;
|
||||
private boolean requireReasonableNumberOfPaths = false;
|
||||
protected boolean removePathsNotConnectedToRef = true;
|
||||
private boolean justReturnRawGraph = false;
|
||||
|
||||
|
|
@ -207,24 +206,12 @@ public class ReadThreadingAssembler extends LocalAssemblyEngine {
|
|||
initialSeqGraph.cleanNonRefPaths(); // TODO -- I don't this is possible by construction
|
||||
|
||||
final AssemblyResult cleaned = cleanupSeqGraph(initialSeqGraph);
|
||||
final AssemblyResult.Status status = cleaned.getStatus() == AssemblyResult.Status.ASSEMBLED_SOME_VARIATION && requireReasonableNumberOfPaths && !reasonableNumberOfPaths(cleaned.getGraph()) ? AssemblyResult.Status.FAILED : cleaned.getStatus();
|
||||
final AssemblyResult.Status status = cleaned.getStatus();
|
||||
final AssemblyResult result = new AssemblyResult(status, cleaned.getGraph());
|
||||
result.setThreadingGraph(rtgraph);
|
||||
return result;
|
||||
}
|
||||
|
||||
/**
|
||||
* Did we find a reasonable number of paths in this graph?
|
||||
* @param graph
|
||||
* @return
|
||||
*/
|
||||
private boolean reasonableNumberOfPaths(final SeqGraph graph) {
|
||||
final KBestPaths<SeqVertex,BaseEdge> pathFinder = new KBestPaths<>(false);
|
||||
final List<Path<SeqVertex,BaseEdge>> allPaths = pathFinder.getKBestPaths(graph, 100000);
|
||||
logger.info("Found " + allPaths.size() + " paths through " + graph + " with maximum " + maxAllowedPathsForReadThreadingAssembler);
|
||||
return allPaths.size() <= maxAllowedPathsForReadThreadingAssembler;
|
||||
}
|
||||
|
||||
@Override
|
||||
public String toString() {
|
||||
return "ReadThreadingAssembler{" +
|
||||
|
|
|
|||
|
|
@ -137,19 +137,20 @@ public class CommonSuffixMergerUnitTest extends BaseTest {
|
|||
public static void assertSameHaplotypes(final String name, final SeqGraph actual, final SeqGraph original) {
|
||||
try {
|
||||
final Set<String> haplotypes = new HashSet<String>();
|
||||
final List<Path<SeqVertex,BaseEdge>> originalPaths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(original);
|
||||
for ( final Path<SeqVertex,BaseEdge> path : originalPaths )
|
||||
haplotypes.add(new String(path.getBases()));
|
||||
final List<KBestHaplotype> originalKBestHaplotypes = new KBestHaplotypeFinder(original,original.getSources(),original.getSinks());
|
||||
final List<KBestHaplotype> actualKBestHaplotypes = new KBestHaplotypeFinder(actual,actual.getSources(),actual.getSinks());
|
||||
|
||||
final List<Path<SeqVertex,BaseEdge>> splitPaths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(actual);
|
||||
for ( final Path<SeqVertex,BaseEdge> path : splitPaths ) {
|
||||
final String h = new String(path.getBases());
|
||||
for (final KBestHaplotype kbh : originalKBestHaplotypes)
|
||||
haplotypes.add(new String(kbh.bases()));
|
||||
|
||||
for ( final KBestHaplotype kbh : actualKBestHaplotypes ) {
|
||||
final String h = new String(kbh.bases());
|
||||
Assert.assertTrue(haplotypes.contains(h), "Failed to find haplotype " + h);
|
||||
}
|
||||
|
||||
if ( splitPaths.size() == originalPaths.size() ) {
|
||||
for ( int i = 0; i < originalPaths.size(); i++ ) {
|
||||
Assert.assertTrue(splitPaths.get(i).equalSequence(originalPaths.get(i)), "Paths not equal " + splitPaths.get(i) + " vs. original " + originalPaths.get(i));
|
||||
if ( actualKBestHaplotypes.size() == originalKBestHaplotypes.size() ) {
|
||||
for ( int i = 0; i < originalKBestHaplotypes.size(); i++ ) {
|
||||
Assert.assertTrue(actualKBestHaplotypes.get(i).haplotype().getBaseString().equals(originalKBestHaplotypes.get(i).haplotype().getBaseString()), "Paths not equal " + actualKBestHaplotypes.get(i).haplotype() + " vs. original " + originalKBestHaplotypes.get(i).haplotype());
|
||||
}
|
||||
}
|
||||
} catch ( AssertionError e ) {
|
||||
|
|
|
|||
|
|
@ -66,31 +66,25 @@ import java.util.*;
|
|||
* Date: 1/31/13
|
||||
*/
|
||||
|
||||
public class KBestPathsUnitTest extends BaseTest {
|
||||
private final static boolean DEBUG = false;
|
||||
public class KBestHaplotypeFinderUnitTest extends BaseTest {
|
||||
|
||||
@DataProvider(name = "BasicPathFindingData")
|
||||
public Object[][] makeBasicPathFindingData() {
|
||||
List<Object[]> tests = new ArrayList<Object[]>();
|
||||
for ( final boolean allowCycles : Arrays.asList(false, true)) {
|
||||
for ( final int nStartNodes : Arrays.asList(1, 2, 3) ) {
|
||||
for ( final int nBranchesPerBubble : Arrays.asList(2, 3) ) {
|
||||
for ( final int nEndNodes : Arrays.asList(1, 2, 3) ) {
|
||||
for ( final boolean addCycle : Arrays.asList(true, false) ) {
|
||||
tests.add(new Object[]{nStartNodes, nBranchesPerBubble, nEndNodes, addCycle, allowCycles});
|
||||
}
|
||||
}
|
||||
final List<Object[]> tests = new ArrayList<>();
|
||||
for ( final int nStartNodes : Arrays.asList(1, 2, 3) ) {
|
||||
for ( final int nBranchesPerBubble : Arrays.asList(2, 3) ) {
|
||||
for ( final int nEndNodes : Arrays.asList(1, 2, 3) ) {
|
||||
tests.add(new Object[]{nStartNodes, nBranchesPerBubble, nEndNodes});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return tests.toArray(new Object[][]{});
|
||||
}
|
||||
|
||||
private static int weight = 1;
|
||||
final Set<SeqVertex> createVertices(final SeqGraph graph, final int n, final SeqVertex source, final SeqVertex target) {
|
||||
final List<String> seqs = Arrays.asList("A", "C", "G", "T");
|
||||
final Set<SeqVertex> vertices = new LinkedHashSet<SeqVertex>();
|
||||
final Set<SeqVertex> vertices = new LinkedHashSet<>();
|
||||
for ( int i = 0; i < n; i++ ) {
|
||||
final SeqVertex v = new SeqVertex(seqs.get(i));
|
||||
graph.addVertex(v);
|
||||
|
|
@ -101,77 +95,42 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
return vertices;
|
||||
}
|
||||
|
||||
@Test(dataProvider = "BasicPathFindingData", enabled = !DEBUG)
|
||||
public void testBasicPathFinding(final int nStartNodes, final int nBranchesPerBubble, final int nEndNodes, final boolean addCycle, final boolean allowCycles) {
|
||||
SeqGraph graph = new SeqGraph(11);
|
||||
@Test(dataProvider = "BasicPathFindingData")
|
||||
public void testBasicPathFinding(final int nStartNodes, final int nBranchesPerBubble, final int nEndNodes) {
|
||||
final SeqGraph graph = new SeqGraph(11);
|
||||
|
||||
final SeqVertex middleTop = new SeqVertex("GTAC");
|
||||
final SeqVertex middleBottom = new SeqVertex("ACTG");
|
||||
graph.addVertices(middleTop, middleBottom);
|
||||
final Set<SeqVertex> starts = createVertices(graph, nStartNodes, null, middleTop);
|
||||
@SuppressWarnings("unused")
|
||||
final Set<SeqVertex> bubbles = createVertices(graph, nBranchesPerBubble, middleTop, middleBottom);
|
||||
final Set<SeqVertex> ends = createVertices(graph, nEndNodes, middleBottom, null);
|
||||
|
||||
if ( addCycle ) graph.addEdge(middleBottom, middleBottom);
|
||||
|
||||
// enumerate all possible paths
|
||||
final List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>(allowCycles).getKBestPaths(graph, starts, ends);
|
||||
final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph, starts, ends);
|
||||
|
||||
final int expectedNumOfPaths = nStartNodes * nBranchesPerBubble * (addCycle && allowCycles ? 2 : 1) * nEndNodes;
|
||||
final int expectedNumOfPaths = nStartNodes * nBranchesPerBubble * nEndNodes;
|
||||
Assert.assertEquals(paths.size(), expectedNumOfPaths, "Didn't find the expected number of paths");
|
||||
|
||||
int lastScore = Integer.MAX_VALUE;
|
||||
for ( final Path path : paths ) {
|
||||
for ( final KBestHaplotype kbh : paths ) {
|
||||
final Path<SeqVertex,BaseEdge> path = kbh.path();
|
||||
Assert.assertTrue(path.getScore() <= lastScore, "Paths out of order. Path " + path + " has score above previous " + lastScore);
|
||||
lastScore = path.getScore();
|
||||
}
|
||||
|
||||
// get the best path, and make sure it's the same as our optimal path overall
|
||||
final Path best = paths.get(0);
|
||||
final List<Path<SeqVertex,BaseEdge>> justOne = new KBestPaths<SeqVertex,BaseEdge>(allowCycles).getKBestPaths(graph, 1, starts, ends);
|
||||
final Path<SeqVertex,BaseEdge> best = paths.get(0).path();
|
||||
final List<KBestHaplotype> justOne = new KBestHaplotypeFinder(graph,starts, ends).subList(0,1);
|
||||
Assert.assertEquals(justOne.size(), 1);
|
||||
Assert.assertTrue(justOne.get(0).pathsAreTheSame(best), "Best path from complete enumerate " + best + " not the same as from k = 1 search " + justOne.get(0));
|
||||
}
|
||||
|
||||
@Test(enabled = false) // No longer supported, but no longer needed.
|
||||
public void testPathFindingComplexCycle() {
|
||||
SeqGraph graph = new SeqGraph(11);
|
||||
|
||||
final SeqVertex v1 = new SeqVertex("A");
|
||||
final SeqVertex v2 = new SeqVertex("C");
|
||||
final SeqVertex v3 = new SeqVertex("G");
|
||||
final SeqVertex v4 = new SeqVertex("T");
|
||||
final SeqVertex v5 = new SeqVertex("AA");
|
||||
graph.addVertices(v1, v2, v3, v4, v5);
|
||||
graph.addEdges(v1, v2, v3, v4, v5);
|
||||
graph.addEdges(v3, v3);
|
||||
graph.addEdges(v4, v2);
|
||||
|
||||
// enumerate all possible paths
|
||||
final List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>(false).getKBestPaths(graph, v1, v5);
|
||||
|
||||
Assert.assertEquals(paths.size(), 1, "Didn't find the expected number of paths");
|
||||
}
|
||||
|
||||
@Test(enabled = false) // No longer supported, but no longer needed.
|
||||
public void testPathFindingCycleLastNode() {
|
||||
SeqGraph graph = new SeqGraph(11);
|
||||
|
||||
final SeqVertex v1 = new SeqVertex("A");
|
||||
final SeqVertex v2 = new SeqVertex("C");
|
||||
final SeqVertex v3 = new SeqVertex("G");
|
||||
graph.addVertices(v1, v2, v3);
|
||||
graph.addEdges(v1, v2, v3, v3);
|
||||
|
||||
// enumerate all possible paths
|
||||
final List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>(false).getKBestPaths(graph, v1, v3);
|
||||
|
||||
Assert.assertEquals(paths.size(), 1, "Didn't find the expected number of paths");
|
||||
Assert.assertTrue(justOne.get(0).path().pathsAreTheSame(best), "Best path from complete enumerate " + best + " not the same as from k = 1 search " + justOne.get(0));
|
||||
}
|
||||
|
||||
@DataProvider(name = "BasicBubbleDataProvider")
|
||||
public Object[][] makeBasicBubbleDataProvider() {
|
||||
List<Object[]> tests = new ArrayList<Object[]>();
|
||||
final List<Object[]> tests = new ArrayList<>();
|
||||
for ( final int refBubbleLength : Arrays.asList(1, 5, 10) ) {
|
||||
for ( final int altBubbleLength : Arrays.asList(1, 5, 10) ) {
|
||||
tests.add(new Object[]{refBubbleLength, altBubbleLength});
|
||||
|
|
@ -180,7 +139,7 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
return tests.toArray(new Object[][]{});
|
||||
}
|
||||
|
||||
@Test(dataProvider = "BasicBubbleDataProvider", enabled = !DEBUG)
|
||||
@Test(dataProvider = "BasicBubbleDataProvider")
|
||||
public void testBasicBubbleData(final int refBubbleLength, final int altBubbleLength) {
|
||||
// Construct the assembly graph
|
||||
SeqGraph graph = new SeqGraph(3);
|
||||
|
|
@ -202,9 +161,9 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
graph.addEdge(v2Alt, v3, new BaseEdge(false, 5));
|
||||
|
||||
// Construct the test path
|
||||
Path<SeqVertex,BaseEdge> path = new Path<SeqVertex,BaseEdge>(v, graph);
|
||||
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v, v2Alt));
|
||||
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v2Alt, v3));
|
||||
Path<SeqVertex,BaseEdge> path = new Path<>(v, graph);
|
||||
path = new Path<>(path, graph.getEdge(v, v2Alt));
|
||||
path = new Path<>(path, graph.getEdge(v2Alt, v3));
|
||||
|
||||
// Construct the actual cigar string implied by the test path
|
||||
Cigar expectedCigar = new Cigar();
|
||||
|
|
@ -226,7 +185,7 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
|
||||
@DataProvider(name = "GetBasesData")
|
||||
public Object[][] makeGetBasesData() {
|
||||
List<Object[]> tests = new ArrayList<Object[]>();
|
||||
List<Object[]> tests = new ArrayList<>();
|
||||
|
||||
final List<String> frags = Arrays.asList("ACT", "GAC", "CAT");
|
||||
|
||||
|
|
@ -238,14 +197,14 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
return tests.toArray(new Object[][]{});
|
||||
}
|
||||
|
||||
@Test(dataProvider = "GetBasesData", enabled = !DEBUG)
|
||||
@Test(dataProvider = "GetBasesData")
|
||||
public void testGetBases(final List<String> frags) {
|
||||
// Construct the assembly graph
|
||||
SeqGraph graph = new SeqGraph(3);
|
||||
|
||||
SeqVertex prev = null;
|
||||
for ( int i = 0; i < frags.size(); i++ ) {
|
||||
SeqVertex v = new SeqVertex(frags.get(i));
|
||||
for (final String s : frags) {
|
||||
SeqVertex v = new SeqVertex(s);
|
||||
graph.addVertex(v);
|
||||
if ( prev != null )
|
||||
graph.addEdge(prev, v);
|
||||
|
|
@ -253,15 +212,15 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
}
|
||||
|
||||
// enumerate all possible paths
|
||||
final List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(graph);
|
||||
final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph,graph.getSources(),graph.getSinks());
|
||||
Assert.assertEquals(paths.size(), 1);
|
||||
final Path<SeqVertex,BaseEdge> path = paths.get(0);
|
||||
final Path<SeqVertex,BaseEdge> path = paths.get(0).path();
|
||||
Assert.assertEquals(new String(path.getBases()), Utils.join("", frags), "Path doesn't have the expected sequence");
|
||||
}
|
||||
|
||||
@DataProvider(name = "TripleBubbleDataProvider")
|
||||
public Object[][] makeTripleBubbleDataProvider() {
|
||||
List<Object[]> tests = new ArrayList<Object[]>();
|
||||
final List<Object[]> tests = new ArrayList<>();
|
||||
for ( final int refBubbleLength : Arrays.asList(1, 5, 10) ) {
|
||||
for ( final int altBubbleLength : Arrays.asList(1, 5, 10) ) {
|
||||
for ( final boolean offRefEnding : Arrays.asList(true, false) ) {
|
||||
|
|
@ -274,7 +233,7 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
return tests.toArray(new Object[][]{});
|
||||
}
|
||||
|
||||
@Test(dataProvider = "TripleBubbleDataProvider", enabled = !DEBUG)
|
||||
@Test(dataProvider = "TripleBubbleDataProvider")
|
||||
public void testTripleBubbleData(final int refBubbleLength, final int altBubbleLength, final boolean offRefBeginning, final boolean offRefEnding) {
|
||||
// Construct the assembly graph
|
||||
SeqGraph graph = new SeqGraph(11);
|
||||
|
|
@ -328,19 +287,17 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
graph.addEdge(v7, postV, new BaseEdge(false, 1));
|
||||
|
||||
// Construct the test path
|
||||
Path<SeqVertex,BaseEdge> path = new Path<SeqVertex,BaseEdge>( (offRefBeginning ? preV : v), graph);
|
||||
if( offRefBeginning ) {
|
||||
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(preV, v));
|
||||
}
|
||||
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v, v2Alt));
|
||||
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v2Alt, v3));
|
||||
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v3, v4Ref));
|
||||
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v4Ref, v5));
|
||||
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v5, v6Alt));
|
||||
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v6Alt, v7));
|
||||
if( offRefEnding ) {
|
||||
path = new Path<SeqVertex,BaseEdge>(path, graph.getEdge(v7,postV));
|
||||
}
|
||||
Path<SeqVertex,BaseEdge> path = new Path<>( (offRefBeginning ? preV : v), graph);
|
||||
if( offRefBeginning )
|
||||
path = new Path<>(path, graph.getEdge(preV, v));
|
||||
path = new Path<>(path, graph.getEdge(v, v2Alt));
|
||||
path = new Path<>(path, graph.getEdge(v2Alt, v3));
|
||||
path = new Path<>(path, graph.getEdge(v3, v4Ref));
|
||||
path = new Path<>(path, graph.getEdge(v4Ref, v5));
|
||||
path = new Path<>(path, graph.getEdge(v5, v6Alt));
|
||||
path = new Path<>(path, graph.getEdge(v6Alt, v7));
|
||||
if( offRefEnding )
|
||||
path = new Path<>(path, graph.getEdge(v7,postV));
|
||||
|
||||
// Construct the actual cigar string implied by the test path
|
||||
Cigar expectedCigar = new Cigar();
|
||||
|
|
@ -382,10 +339,10 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
"Cigar string mismatch: ref = " + ref + " alt " + new String(path.getBases()));
|
||||
}
|
||||
|
||||
@Test(enabled = !DEBUG)
|
||||
@Test
|
||||
public void testIntraNodeInsertionDeletion() {
|
||||
// Construct the assembly graph
|
||||
SeqGraph graph = new SeqGraph(11);
|
||||
final SeqGraph graph = new SeqGraph(11);
|
||||
final SeqVertex top = new SeqVertex("T");
|
||||
final SeqVertex bot = new SeqVertex("T");
|
||||
final SeqVertex alt = new SeqVertex("AAACCCCC");
|
||||
|
|
@ -395,38 +352,38 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
graph.addEdges(new BaseEdge(true, 1), top, ref, bot);
|
||||
graph.addEdges(new BaseEdge(false, 1), top, alt, bot);
|
||||
|
||||
final KBestPaths<SeqVertex,BaseEdge> pathFinder = new KBestPaths<SeqVertex,BaseEdge>();
|
||||
final List<Path<SeqVertex,BaseEdge>> paths = pathFinder.getKBestPaths(graph, top, bot);
|
||||
@SuppressWarnings("all")
|
||||
final KBestHaplotypeFinder bestPathFinder = new KBestHaplotypeFinder(graph,top,bot);
|
||||
|
||||
Assert.assertEquals(paths.size(), 2);
|
||||
Assert.assertEquals(bestPathFinder.size(), 2);
|
||||
|
||||
final Path<SeqVertex,BaseEdge> refPath = paths.get(0);
|
||||
final Path<SeqVertex,BaseEdge> altPath = paths.get(1);
|
||||
final Path<SeqVertex,BaseEdge> refPath = bestPathFinder.get(0).path();
|
||||
final Path<SeqVertex,BaseEdge> altPath = bestPathFinder.get(1).path();
|
||||
|
||||
final String refString = top.getSequenceString() + ref.getSequenceString() + bot.getSequenceString();
|
||||
Assert.assertEquals(refPath.calculateCigar(refString.getBytes()).toString(), "10M");
|
||||
Assert.assertEquals(altPath.calculateCigar(refString.getBytes()).toString(), "1M3I5M3D1M");
|
||||
}
|
||||
|
||||
@Test(enabled = !DEBUG)
|
||||
@Test
|
||||
public void testHardSWPath() {
|
||||
// Construct the assembly graph
|
||||
SeqGraph graph = new SeqGraph(11);
|
||||
final SeqGraph graph = new SeqGraph(11);
|
||||
final SeqVertex top = new SeqVertex( "NNN" );
|
||||
final SeqVertex bot = new SeqVertex( "NNN" );
|
||||
final SeqVertex alt = new SeqVertex( "ACAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA" );
|
||||
final SeqVertex alt = new SeqVertex( "ACAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA" );
|
||||
final SeqVertex ref = new SeqVertex( "TGTGTGTGTGTGTGACAGAGAGAGAGAGAGAGAGAGAGAGAGAGA" );
|
||||
graph.addVertices(top, bot, alt, ref);
|
||||
graph.addEdges(new BaseEdge(true, 1), top, ref, bot);
|
||||
graph.addEdges(new BaseEdge(false, 1), top, alt, bot);
|
||||
|
||||
final KBestPaths<SeqVertex,BaseEdge> pathFinder = new KBestPaths<SeqVertex,BaseEdge>();
|
||||
final List<Path<SeqVertex,BaseEdge>> paths = pathFinder.getKBestPaths(graph, top, bot);
|
||||
@SuppressWarnings("all")
|
||||
final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph, top, bot);
|
||||
|
||||
Assert.assertEquals(paths.size(), 2);
|
||||
|
||||
final Path<SeqVertex,BaseEdge> refPath = paths.get(0);
|
||||
final Path<SeqVertex,BaseEdge> altPath = paths.get(1);
|
||||
final Path<SeqVertex,BaseEdge> refPath = paths.get(0).path();
|
||||
final Path<SeqVertex,BaseEdge> altPath = paths.get(1).path();
|
||||
|
||||
final String refString = top.getSequenceString() + ref.getSequenceString() + bot.getSequenceString();
|
||||
|
||||
|
|
@ -446,7 +403,7 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
|
||||
@DataProvider(name = "SystematicRefAltSWTestData")
|
||||
public Object[][] makeSystematicRefAltSWTestData() {
|
||||
List<Object[]> tests = new ArrayList<Object[]>();
|
||||
final List<Object[]> tests = new ArrayList<>();
|
||||
|
||||
final List<List<String>> allDiffs = Arrays.asList(
|
||||
Arrays.asList("G", "C", "1M"),
|
||||
|
|
@ -470,7 +427,7 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
return tests.toArray(new Object[][]{});
|
||||
}
|
||||
|
||||
@Test(dataProvider = "SystematicRefAltSWTestData", enabled = !DEBUG)
|
||||
@Test(dataProvider = "SystematicRefAltSWTestData")
|
||||
public void testRefAltSW(final String prefix, final String end, final String refMid, final String altMid, final String midCigar) {
|
||||
// Construct the assembly graph
|
||||
SeqGraph graph = new SeqGraph(11);
|
||||
|
|
@ -506,7 +463,7 @@ public class KBestPathsUnitTest extends BaseTest {
|
|||
Assert.assertEquals(pathCigar, expected, "Cigar mismatch: ref = " + refString + " vs alt = " + new String(path.getBases()));
|
||||
}
|
||||
|
||||
@Test(enabled = !DEBUG)
|
||||
@Test
|
||||
public void testLeftAlignCigarSequentially() {
|
||||
String preRefString = "GATCGATCGATC";
|
||||
String postRefString = "TTT";
|
||||
|
|
@ -61,7 +61,7 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
|
|||
|
||||
@DataProvider(name = "PrefixSuffixData")
|
||||
public Object[][] makePrefixSuffixData() {
|
||||
List<Object[]> tests = new ArrayList<Object[]>();
|
||||
final List<Object[]> tests = new ArrayList<>();
|
||||
|
||||
tests.add(new Object[]{Arrays.asList("A", "C"), 0, 0});
|
||||
tests.add(new Object[]{Arrays.asList("C", "C"), 1, 0});
|
||||
|
|
@ -91,7 +91,7 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
|
|||
|
||||
@Test(dataProvider = "PrefixSuffixData")
|
||||
public void testPrefixSuffix(final List<String> strings, int expectedPrefixLen, int expectedSuffixLen) {
|
||||
final List<byte[]> bytes = new ArrayList<byte[]>();
|
||||
final List<byte[]> bytes = new ArrayList<>();
|
||||
int min = Integer.MAX_VALUE;
|
||||
for ( final String s : strings ) {
|
||||
bytes.add(s.getBytes());
|
||||
|
|
@ -107,7 +107,7 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
|
|||
|
||||
@Test(dataProvider = "PrefixSuffixData")
|
||||
public void testPrefixSuffixVertices(final List<String> strings, int expectedPrefixLen, int expectedSuffixLen) {
|
||||
final List<SeqVertex> v = new ArrayList<SeqVertex>();
|
||||
final List<SeqVertex> v = new ArrayList<>();
|
||||
for ( final String s : strings ) {
|
||||
v.add(new SeqVertex(s));
|
||||
}
|
||||
|
|
@ -127,19 +127,18 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
|
|||
public void testSplitter(final List<String> strings, int expectedPrefixLen, int expectedSuffixLen) {
|
||||
final SeqGraph graph = new SeqGraph(11);
|
||||
|
||||
final List<SeqVertex> v = new ArrayList<SeqVertex>();
|
||||
final List<SeqVertex> v = new ArrayList<>();
|
||||
for ( final String s : strings ) {
|
||||
v.add(new SeqVertex(s));
|
||||
}
|
||||
|
||||
graph.addVertices(v.toArray(new SeqVertex[]{}));
|
||||
graph.addVertices(v.toArray(new SeqVertex[v.size()]));
|
||||
|
||||
final String expectedPrefix = strings.get(0).substring(0, expectedPrefixLen);
|
||||
final String expectedSuffix = strings.get(0).substring(strings.get(0).length() - expectedSuffixLen);
|
||||
|
||||
final SharedVertexSequenceSplitter splitter = new SharedVertexSequenceSplitter(graph, v);
|
||||
splitter.split();
|
||||
// splitter.splitGraph.printGraph(new File(Utils.join("_", strings) + ".dot"), 0);
|
||||
|
||||
Assert.assertEquals(splitter.prefixV.getSequenceString(), expectedPrefix);
|
||||
Assert.assertEquals(splitter.suffixV.getSequenceString(), expectedSuffix);
|
||||
|
|
@ -158,7 +157,7 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
|
|||
|
||||
@DataProvider(name = "CompleteCycleData")
|
||||
public Object[][] makeCompleteCycleData() {
|
||||
List<Object[]> tests = new ArrayList<Object[]>();
|
||||
List<Object[]> tests = new ArrayList<>();
|
||||
|
||||
for ( final boolean hasTop : Arrays.asList(true, false) ) {
|
||||
for ( final boolean hasBot : Arrays.asList(true, false) ) {
|
||||
|
|
@ -207,11 +206,11 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
|
|||
int edgeWeight = 1;
|
||||
final SeqVertex top = hasTop ? new SeqVertex("AAAAAAAA") : null;
|
||||
final SeqVertex bot = hasBot ? new SeqVertex("GGGGGGGG") : null;
|
||||
final List<SeqVertex> v = new ArrayList<SeqVertex>();
|
||||
final List<SeqVertex> v = new ArrayList<>();
|
||||
for ( final String s : strings ) {
|
||||
v.add(new SeqVertex(s));
|
||||
}
|
||||
graph.addVertices(v.toArray(new SeqVertex[]{}));
|
||||
graph.addVertices(v.toArray(new SeqVertex[v.size()]));
|
||||
final SeqVertex first = v.get(0);
|
||||
|
||||
if ( hasTop ) {
|
||||
|
|
@ -226,10 +225,10 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
|
|||
graph.addEdge(vi, bot, new BaseEdge(vi == first, edgeWeight++));
|
||||
}
|
||||
|
||||
final Set<String> haplotypes = new HashSet<String>();
|
||||
final List<Path<SeqVertex,BaseEdge>> originalPaths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths((SeqGraph)graph.clone());
|
||||
for ( final Path<SeqVertex,BaseEdge> path : originalPaths )
|
||||
haplotypes.add(new String(path.getBases()));
|
||||
final Set<String> haplotypes = new HashSet<>();
|
||||
final List<KBestHaplotype> originalPaths = new KBestHaplotypeFinder((SeqGraph) graph.clone(),graph.getSources(),graph.getSinks());
|
||||
for ( final KBestHaplotype path : originalPaths )
|
||||
haplotypes.add(new String(path.bases()));
|
||||
|
||||
final SharedVertexSequenceSplitter splitter = new SharedVertexSequenceSplitter(graph, v);
|
||||
splitter.split();
|
||||
|
|
@ -238,22 +237,22 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
|
|||
splitter.updateGraph(top, bot);
|
||||
if ( PRINT_GRAPHS ) graph.printGraph(new File(Utils.join("_", strings) + ".updated.dot"), 0);
|
||||
|
||||
final List<Path<SeqVertex,BaseEdge>> splitPaths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(graph);
|
||||
for ( final Path<SeqVertex,BaseEdge> path : splitPaths ) {
|
||||
final String h = new String(path.getBases());
|
||||
final List<KBestHaplotype> splitPaths = new KBestHaplotypeFinder(graph,graph.getSources(),graph.getSinks());
|
||||
for ( final KBestHaplotype path : splitPaths ) {
|
||||
final String h = new String(path.bases());
|
||||
Assert.assertTrue(haplotypes.contains(h), "Failed to find haplotype " + h);
|
||||
}
|
||||
|
||||
if ( splitPaths.size() == originalPaths.size() ) {
|
||||
for ( int i = 0; i < originalPaths.size(); i++ ) {
|
||||
Assert.assertTrue(splitPaths.get(i).equalScoreAndSequence(originalPaths.get(i)), "Paths not equal " + splitPaths.get(i) + " vs. original " + originalPaths.get(i));
|
||||
Assert.assertTrue(splitPaths.get(i).path().equalScoreAndSequence(originalPaths.get(i).path()), "Paths not equal " + splitPaths.get(i) + " vs. original " + originalPaths.get(i));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@DataProvider(name = "MeetsMinSequenceData")
|
||||
public Object[][] makeMeetsMinSequenceData() {
|
||||
List<Object[]> tests = new ArrayList<Object[]>();
|
||||
final List<Object[]> tests = new ArrayList<>();
|
||||
|
||||
final boolean prefixBiased = SharedVertexSequenceSplitter.prefersPrefixMerging();
|
||||
tests.add(new Object[]{Arrays.asList("AC", "AC"), 0, true, true});
|
||||
|
|
@ -280,9 +279,9 @@ public class SharedVertexSequenceSplitterUnitTest extends BaseTest {
|
|||
|
||||
final SeqVertex top = new SeqVertex("AAAAAAAA");
|
||||
final SeqVertex bot = new SeqVertex("GGGGGGGG");
|
||||
final List<SeqVertex> v = new ArrayList<SeqVertex>();
|
||||
final List<SeqVertex> v = new ArrayList<>();
|
||||
for ( final String s : mids ) { v.add(new SeqVertex(s)); }
|
||||
graph.addVertices(v.toArray(new SeqVertex[]{}));
|
||||
graph.addVertices(v.toArray(new SeqVertex[v.size()]));
|
||||
graph.addVertices(top, bot);
|
||||
for ( final SeqVertex vi : v ) { graph.addEdge(top, vi); graph.addEdge(vi, bot); }
|
||||
|
||||
|
|
|
|||
|
|
@ -46,10 +46,11 @@
|
|||
|
||||
package org.broadinstitute.sting.gatk.walkers.haplotypecaller.readthreading;
|
||||
|
||||
import net.sf.samtools.Cigar;
|
||||
import net.sf.samtools.TextCigarCodec;
|
||||
import org.broadinstitute.sting.BaseTest;
|
||||
import org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs.*;
|
||||
import org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs.KBestHaplotype;
|
||||
import org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs.KBestHaplotypeFinder;
|
||||
import org.broadinstitute.sting.gatk.walkers.haplotypecaller.graphs.SeqGraph;
|
||||
import org.broadinstitute.sting.utils.Utils;
|
||||
import org.broadinstitute.sting.utils.sam.ArtificialSAMUtils;
|
||||
import org.broadinstitute.sting.utils.sam.GATKSAMRecord;
|
||||
|
|
@ -57,7 +58,8 @@ import org.testng.Assert;
|
|||
import org.testng.annotations.DataProvider;
|
||||
import org.testng.annotations.Test;
|
||||
|
||||
import java.util.*;
|
||||
import java.util.ArrayList;
|
||||
import java.util.List;
|
||||
|
||||
public class DanglingChainMergingGraphUnitTest extends BaseTest {
|
||||
|
||||
|
|
@ -235,7 +237,7 @@ public class DanglingChainMergingGraphUnitTest extends BaseTest {
|
|||
// confirm that we created the appropriate bubble in the graph only if expected
|
||||
rtgraph.cleanNonRefPaths();
|
||||
final SeqGraph seqGraph = rtgraph.convertToSequenceGraph();
|
||||
List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(seqGraph, seqGraph.getReferenceSourceVertex(), seqGraph.getReferenceSinkVertex());
|
||||
final List<KBestHaplotype> paths = new KBestHaplotypeFinder(seqGraph, seqGraph.getReferenceSourceVertex(), seqGraph.getReferenceSinkVertex());
|
||||
Assert.assertEquals(paths.size(), shouldBeMerged ? 2 : 1);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -59,13 +59,14 @@ import java.io.File;
|
|||
import java.util.*;
|
||||
|
||||
public class ReadThreadingAssemblerUnitTest extends BaseTest {
|
||||
|
||||
private final static boolean DEBUG = false;
|
||||
|
||||
private static class TestAssembler {
|
||||
final ReadThreadingAssembler assembler;
|
||||
|
||||
Haplotype refHaplotype;
|
||||
final List<GATKSAMRecord> reads = new LinkedList<GATKSAMRecord>();
|
||||
final List<GATKSAMRecord> reads = new LinkedList<>();
|
||||
|
||||
private TestAssembler(final int kmerSize) {
|
||||
this.assembler = new ReadThreadingAssembler(100000, Arrays.asList(kmerSize));
|
||||
|
|
@ -102,11 +103,11 @@ public class ReadThreadingAssemblerUnitTest extends BaseTest {
|
|||
private void assertSingleBubble(final TestAssembler assembler, final String one, final String two) {
|
||||
final SeqGraph graph = assembler.assemble();
|
||||
graph.simplifyGraph();
|
||||
List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(graph);
|
||||
final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph);
|
||||
Assert.assertEquals(paths.size(), 2);
|
||||
final Set<String> expected = new HashSet<String>(Arrays.asList(one, two));
|
||||
for ( final Path<SeqVertex,BaseEdge> path : paths ) {
|
||||
final String seq = new String(path.getBases());
|
||||
final Set<String> expected = new HashSet<>(Arrays.asList(one, two));
|
||||
for ( final KBestHaplotype path : paths ) {
|
||||
final String seq = new String(path.bases());
|
||||
Assert.assertTrue(expected.contains(seq));
|
||||
expected.remove(seq);
|
||||
}
|
||||
|
|
@ -169,7 +170,7 @@ public class ReadThreadingAssemblerUnitTest extends BaseTest {
|
|||
Assert.assertNotNull(graph.getReferenceSourceVertex());
|
||||
Assert.assertNotNull(graph.getReferenceSinkVertex());
|
||||
|
||||
final List<Path<SeqVertex,BaseEdge>> paths = new KBestPaths<SeqVertex,BaseEdge>().getKBestPaths(graph);
|
||||
final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph);
|
||||
Assert.assertEquals(paths.size(), 2);
|
||||
}
|
||||
|
||||
|
|
@ -226,11 +227,10 @@ public class ReadThreadingAssemblerUnitTest extends BaseTest {
|
|||
assembler.addSequence(ReadThreadingGraphUnitTest.getBytes(read2), false);
|
||||
|
||||
final SeqGraph graph = assembler.assemble();
|
||||
final KBestPaths<SeqVertex,BaseEdge> pathFinder = new KBestPaths<SeqVertex,BaseEdge>();
|
||||
final List<Path<SeqVertex,BaseEdge>> paths = pathFinder.getKBestPaths(graph);
|
||||
final List<KBestHaplotype> paths = new KBestHaplotypeFinder(graph);
|
||||
Assert.assertEquals(paths.size(), 2);
|
||||
final byte[] refPath = paths.get(0).getBases().length == ref.length() ? paths.get(0).getBases() : paths.get(1).getBases();
|
||||
final byte[] altPath = paths.get(0).getBases().length == ref.length() ? paths.get(1).getBases() : paths.get(0).getBases();
|
||||
final byte[] refPath = paths.get(0).bases().length == ref.length() ? paths.get(0).bases() : paths.get(1).bases();
|
||||
final byte[] altPath = paths.get(0).bases().length == ref.length() ? paths.get(1).bases() : paths.get(0).bases();
|
||||
Assert.assertEquals(refPath, ReadThreadingGraphUnitTest.getBytes(ref));
|
||||
Assert.assertEquals(altPath, ReadThreadingGraphUnitTest.getBytes(read1));
|
||||
}
|
||||
|
|
|
|||
|
|
@ -212,8 +212,7 @@ public class ReadThreadingGraphUnitTest extends BaseTest {
|
|||
rtgraph.buildGraphIfNecessary();
|
||||
|
||||
final SeqGraph graph = rtgraph.convertToSequenceGraph();
|
||||
final KBestPaths<SeqVertex,BaseEdge> pathFinder = new KBestPaths<>(false);
|
||||
Assert.assertEquals(pathFinder.getKBestPaths(graph, length, graph.getReferenceSourceVertex(), graph.getReferenceSinkVertex()).size(), 1);
|
||||
Assert.assertEquals(new KBestHaplotypeFinder(graph, graph.getReferenceSourceVertex(), graph.getReferenceSinkVertex()).size(), 1);
|
||||
}
|
||||
|
||||
// TODO -- update to use determineKmerSizeAndNonUniques directly
|
||||
|
|
|
|||
|
|
@ -34,7 +34,6 @@ import org.broad.tribble.TribbleException;
|
|||
import org.broad.tribble.index.Index;
|
||||
import org.broad.tribble.index.IndexFactory;
|
||||
import org.broad.tribble.util.LittleEndianOutputStream;
|
||||
import org.broad.tribble.util.TabixUtils;
|
||||
import org.broadinstitute.sting.commandline.Tags;
|
||||
import org.broadinstitute.sting.gatk.GenomeAnalysisEngine;
|
||||
import org.broadinstitute.sting.gatk.arguments.ValidationExclusion;
|
||||
|
|
@ -172,8 +171,8 @@ public class RMDTrackBuilder { // extends PluginManager<FeatureCodec> {
|
|||
// we might not know the index type, try loading with the default reader constructor
|
||||
logger.debug("Attempting to load " + inputFile + " as a tabix indexed file without validating it");
|
||||
try {
|
||||
final File indexFile = new File(inputFile.getAbsoluteFile() + TabixUtils.STANDARD_INDEX_EXTENSION);
|
||||
final SAMSequenceDictionary dict = TabixUtils.getSequenceDictionary(indexFile);
|
||||
final File indexFile = null;//new File(inputFile.getAbsoluteFile() + TabixUtils.STANDARD_INDEX_EXTENSION);
|
||||
final SAMSequenceDictionary dict = null; //TabixUtils.getSequenceDictionary(indexFile);
|
||||
return new Pair<>(AbstractFeatureReader.getFeatureReader(inputFile.getAbsolutePath(), createCodec(descriptor, name)), dict);
|
||||
} catch (TribbleException e) {
|
||||
throw new UserException(e.getMessage(), e);
|
||||
|
|
|
|||
Loading…
Reference in New Issue