Skip to content

Minimum Flow Decomposition

1. Definition

The Minimum Flow Decomposition problem on a directed acyclic graph (DAG) is defined as follows:

  • INPUT: A directed graph \(G = (V,E)\), and a flow on \(G\), namely weights \(f(u,v)\) for every edge \((u,v)\) of \(G\), such that for every node \(v\) that is not a source or sink of \(G\), it holds that the sum of the flow values entering \(v\) equals the sum of the flow values exiting \(v\). This property is called flow conservation.

  • OUTPUT: A minimum number \(k\) of source-to-sink paths, \(P_1,\dots,P_k\), with a weight \(w_i\) associated to each \(P_i\), such that for every edge it holds that its flow value equals the sum of the weights of the paths going through the edge. Formally, $$ f(u,v) = \sum_{i \in \{1,\dots,k\} : (u,v) \in P_i }w_i, ~~\forall (u,v) \in E. $$

Note

The graph may have more than one source or sink nodes. The solution paths are required to start in some source node, and end in some sink node.

For example, the directed graph below satisfies the flow conservation property:

flowchart LR
    s((s))
    a((a))
    b((b))
    c((c))
    d((d))
    t((t))
    s -->|6| a
    a -->|2| b
    s -->|7| b
    a -->|4| c
    b -->|9| c
    c -->|6| d
    d -->|6| t
    c -->|7| t

A decomposition into 3 paths, in red, orange and blue, of weights 4, 2 and 7, respectively is shown below. There is no decomposition into a smaller number of paths, and thus this decomposition is also a minimum flow decomposition.

flowchart LR
    s((s))
    a((a))
    b((b))
    c((c))
    d((d))
    t((t))
    s -->|4| a
    a -->|4| c
    c -->|4| d
    d -->|4| t
    linkStyle 0,1,2,3 stroke:red,stroke-width:3;
    s -->|2| a
    a -->|2| b
    b -->|2| c
    c -->|2| d
    d -->|2| t
    linkStyle 4,5,6,7,8 stroke:orange,stroke-width:3;
    s -->|7| b
    b -->|7| c
    c -->|7| t
    linkStyle 9,10,11 stroke:blue,stroke-width:3;

2. Solving the problem

We create the graph as a networkx DiGraph. In real project, you will likely have a method that transforms your graph to a DiGraph. We also give an attribute flow for every edge storing its flow value.

import flowpaths as fp
import networkx as nx

graph = nx.DiGraph()
graph.add_edge("s", "a", flow=6)
graph.add_edge("s", "b", flow=7)
graph.add_edge("a", "b", flow=2)
graph.add_edge("a", "c", flow=4)
graph.add_edge("b", "c", flow=9)
graph.add_edge("c", "d", flow=6)
graph.add_edge("c", "t", flow=7)
graph.add_edge("d", "t", flow=6)
We now create a Minimum Flow Decomposition solver with default settings, by specifying that the flow value of each edge is in the attribute flow of the edges. Note that MinFlowDecomp just creates the model. You need to call solve() to solve it.

mfd_model = fp.MinFlowDecomp(graph, flow_attr="flow")
mfd_model.solve()

The model might not be solved because the MILP solver couldn’t do it in the time it had allocated, or other problems. Thus, you need to check if it was solved, and then get its solution. The solutoion of MinFlowDecomp is a tuple, where the first component is the list of paths (as lists of nodes), and the second component is a list of their corresponding weights.

if mfd_model.is_solved():
    solution = mfd_model.get_solution()
    print("Paths, weights", solution["paths"], solution["weights"])
    # [['s', 'b', 'c', 't'], ['s', 'a', 'c', 'd', 't'], ['s', 'a', 'b', 'c', 'd', 't']] [7, 4, 2]

3. References

There are several works on this problem, for example.

  1. Vatinlen, Benedicte, et al. Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths. European Journal of Operational Research 185.3 (2008): 1390-1401.

  2. Shao, Mingfu, and Carl Kingsford. Theory and a heuristic for the minimum path flow decomposition problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics 16.2 (2017): 658-670.

  3. Kloster, Kyle, et al. A practical fpt algorithm for flow decomposition and transcript assembly 2018 Proceedings of the Twentieth Workshop on Algorithm Engineering and Experiments (ALENEX). Society for Industrial and Applied Mathematics, 2018.

  4. See also flowpaths References, and the other papers cited by these works.

MinFlowDecomp

MinFlowDecomp(G: DiGraph, flow_attr: str, weight_type: type = float, subpath_constraints: list = [], subpath_constraints_coverage: float = 1.0, subpath_constraints_coverage_length: float = None, edge_length_attr: str = None, optimization_options: dict = None, solver_options: dict = None)

Bases: AbstractPathModelDAG

Class to decompose a flow into a minimum number of weighted paths.

Initialize the Minimum Flow Decomposition model, minimizing the number of paths.

Parameters

  • G : nx.DiGraph

    The input directed acyclic graph, as networkx DiGraph.

  • flow_attr : str

    The attribute name from where to get the flow values on the edges.

  • weight_type : type, optional

    The type of weights (int or float). Default is float.

  • subpath_constraints : list, optional

    List of subpath constraints. Default is an empty list. Each subpath constraint is a list of edges that must be covered by some solution path, according to the subpath_constraints_coverage or subpath_constraints_coverage_length parameters (see below).

  • subpath_constraints_coverage : float, optional

    Coverage fraction of the subpath constraints that must be covered by some solution paths.

    Defaults to 1.0 (meaning that 100% of the edges of the constraint need to be covered by some solution path). See subpath constraints documentation

  • subpath_constraints_coverage_length : float, optional

    Coverage length of the subpath constraints. Default is None. If set, this overrides subpath_constraints_coverage, and the coverage constraint is expressed in terms of the subpath constraint length. subpath_constraints_coverage_length is then the fraction of the total length of the constraint (specified via edge_length_attr) needs to appear in some solution path. See subpath constraints documentation

  • edge_length_attr : str, optional

    Attribute name for edge lengths. Default is None.

  • optimization_options : dict, optional

    Dictionary with the optimization options. Default is None. See optimization options documentation. This class also supports the optimization "optimize_with_greedy": True (this is the default value). This will use a greedy algorithm to solve the problem, and if the number of paths returned by it equals a lowerbound on the solution size, then we know the greedy solution is optimum, and it will use that. The lowerbound used currently is the edge-width of the graph, meaning the minimum number of paths needed to cover all edges. This is a correct lowerbound because any flow decomposition must cover all edges, as they have non-zero flow.

  • solver_options : dict, optional

    Dictionary with the solver options. Default is None. See solver options documentation.

Raises

ValueError

  • If weight_type is not int or float.
  • If some edge does not have the flow attribute specified as flow_attr.
  • If the graph does not satisfy flow conservation on nodes different from source or sink.
  • If the graph contains edges with negative (<0) flow values.
  • If the graph is not acyclic.
Source code in flowpaths/minflowdecomp.py
def __init__(
    self,
    G: nx.DiGraph,
    flow_attr: str,
    weight_type: type = float,
    subpath_constraints: list = [],
    subpath_constraints_coverage: float = 1.0,
    subpath_constraints_coverage_length: float = None,
    edge_length_attr: str = None,
    optimization_options: dict = None,
    solver_options: dict = None,
):
    """
    Initialize the Minimum Flow Decomposition model, minimizing the number of paths.

    Parameters
    ----------
    - `G : nx.DiGraph`

        The input directed acyclic graph, as networkx DiGraph.

    - `flow_attr : str`

        The attribute name from where to get the flow values on the edges.

    - `weight_type : type`, optional

        The type of weights (`int` or `float`). Default is `float`.

    - `subpath_constraints : list`, optional

        List of subpath constraints. Default is an empty list. 
        Each subpath constraint is a list of edges that must be covered by some solution path, according 
        to the `subpath_constraints_coverage` or `subpath_constraints_coverage_length` parameters (see below).

    - `subpath_constraints_coverage : float`, optional

        Coverage fraction of the subpath constraints that must be covered by some solution paths. 

        Defaults to `1.0` (meaning that 100% of the edges of the constraint need to be covered by some solution path). See [subpath constraints documentation](subpath-constraints.md#3-relaxing-the-constraint-coverage)

    - `subpath_constraints_coverage_length : float`, optional

        Coverage length of the subpath constraints. Default is `None`. If set, this overrides `subpath_constraints_coverage`, 
        and the coverage constraint is expressed in terms of the subpath constraint length. 
        `subpath_constraints_coverage_length` is then the fraction of the total length of the constraint (specified via `edge_length_attr`) needs to appear in some solution path.
        See [subpath constraints documentation](subpath-constraints.md#3-relaxing-the-constraint-coverage)

    - `edge_length_attr : str`, optional

        Attribute name for edge lengths. Default is `None`.

    - `optimization_options : dict`, optional

        Dictionary with the optimization options. Default is `None`. See [optimization options documentation](solver-options-optimizations.md).
        This class also supports the optimization `"optimize_with_greedy": True` (this is the default value). This
        will use a greedy algorithm to solve the problem, and if the number of paths returned by it equals a lowerbound on the solution size,
        then we know the greedy solution is optimum, and it will use that. The lowerbound used currently is the edge-width of the graph,
        meaning the minimum number of paths needed to cover all edges. This is a correct lowerbound because any flow decomposition must cover all edges, 
        as they have non-zero flow.

    - `solver_options : dict`, optional

        Dictionary with the solver options. Default is `None`. See [solver options documentation](solver-options-optimizations.md).

    Raises
    ------
    `ValueError`

    - If `weight_type` is not `int` or `float`.
    - If some edge does not have the flow attribute specified as `flow_attr`.
    - If the graph does not satisfy flow conservation on nodes different from source or sink.
    - If the graph contains edges with negative (<0) flow values.
    - If the graph is not acyclic.
    """

    self.G = G
    self.flow_attr = flow_attr
    self.weight_type = weight_type
    self.subpath_constraints = subpath_constraints
    self.subpath_constraints_coverage = subpath_constraints_coverage
    self.subpath_constraints_coverage_length = subpath_constraints_coverage_length
    self.edge_length_attr = edge_length_attr
    self.optimization_options = optimization_options
    self.solver_options = solver_options

    self.solve_statistics = {}
    self.__solution = None
    self.__lowerbound = None

get_solution

get_solution()

Retrieves the solution for the flow decomposition problem.

Returns
  • solution: dict

    A dictionary containing the solution paths (key "paths") and their corresponding weights (key "weights").

Raises
  • exception If model is not solved.
Source code in flowpaths/minflowdecomp.py
def get_solution(self):
    """
    Retrieves the solution for the flow decomposition problem.

    Returns
    -------
    - `solution: dict`

        A dictionary containing the solution paths (key `"paths"`) and their corresponding weights (key `"weights"`).

    Raises
    -------
    - `exception` If model is not solved.
    """
    self.check_is_solved()
    return self.__solution

solve

solve() -> bool

Attempts to solve the flow distribution problem using a model with varying number of paths.

This method iterates over a range of possible path counts, creating and solving a flow decompostion model for each count. If a solution is found, it stores the solution and relevant statistics, and returns True. If no solution is found after iterating through all possible path counts, it returns False.

Returns:

Name Type Description
bool bool

True if a solution is found, False otherwise.

Note

This overloads the solve() method from AbstractPathModelDAG class.

Source code in flowpaths/minflowdecomp.py
def solve(self) -> bool:
    """
    Attempts to solve the flow distribution problem using a model with varying number of paths.

    This method iterates over a range of possible path counts, creating and solving a flow decompostion model for each count.
    If a solution is found, it stores the solution and relevant statistics, and returns True. If no solution is found after
    iterating through all possible path counts, it returns False.

    Returns:
        bool: True if a solution is found, False otherwise.

    Note:
        This overloads the `solve()` method from `AbstractPathModelDAG` class.
    """
    start_time = time.time()
    for i in range(self.get_lowerbound_k(), self.G.number_of_edges()):
        fd_model = kflowdecomp.kFlowDecomp(
            G=self.G,
            flow_attr=self.flow_attr,
            k=i,
            weight_type=self.weight_type,
            subpath_constraints=self.subpath_constraints,
            subpath_constraints_coverage=self.subpath_constraints_coverage,
            subpath_constraints_coverage_length=self.subpath_constraints_coverage_length,
            edge_length_attr=self.edge_length_attr,
            optimization_options=self.optimization_options,
            solver_options=self.solver_options,
        )

        fd_model.solve()

        if fd_model.is_solved():
            self.__solution = fd_model.get_solution()
            self.set_solved()
            self.solve_statistics = fd_model.solve_statistics
            self.solve_statistics["mfd_solve_time"] = time.time() - start_time

            # storing the fd_model object for further analysis
            self.fd_model = fd_model
            return True
    return False