Minimum Flow Decomposition
1. Definition
The Minimum Flow Decomposition problem on a directed acyclic graph (DAG) is defined as follows:
- 
INPUT: A directed graph \(G = (V,E)\), and a flow on \(G\), namely weights \(f(u,v)\) for every edge \((u,v)\) of \(G\), such that for every node \(v\) that is not a source or sink of \(G\), it holds that the sum of the flow values entering \(v\) equals the sum of the flow values exiting \(v\). This property is called flow conservation.  
- 
OUTPUT: A minimum number \(k\) of source-to-sink paths, \(P_1,\dots,P_k\), with a weight \(w_i\) associated to each \(P_i\), such that for every edge it holds that its flow value equals the sum of the weights of the paths going through the edge. Formally, 
$$
f(u,v) = \sum_{i \in \{1,\dots,k\} : (u,v) \in P_i }w_i, ~~\forall (u,v) \in E.
$$ 
Note
- This class support also graphs with flow values on nodes. Set the parameter flow_attr_origin = "node". For details on how these are handled internally, see Handling graphs with flows / weights on nodes.
- The graph may have more than one source or sink nodes, in which case the solution paths are just required to start in any source node, and end in any sink node.
 
For example, the directed graph below satisfies the flow conservation property:
flowchart LR
    s((s))
    a((a))
    b((b))
    c((c))
    d((d))
    t((t))
    s -->|6| a
    a -->|2| b
    s -->|7| b
    a -->|4| c
    b -->|9| c
    c -->|6| d
    d -->|6| t
    c -->|7| t
A decomposition into 3 paths, in red, orange and blue, of weights 4, 2 and 7, respectively is shown below. There is no decomposition into a smaller number of paths, and thus this decomposition is also a minimum flow decomposition.
flowchart LR
    s((s))
    a((a))
    b((b))
    c((c))
    d((d))
    t((t))
    s -->|4| a
    a -->|4| c
    c -->|4| d
    d -->|4| t
    linkStyle 0,1,2,3 stroke:red,stroke-width:3;
    s -->|2| a
    a -->|2| b
    b -->|2| c
    c -->|2| d
    d -->|2| t
    linkStyle 4,5,6,7,8 stroke:orange,stroke-width:3;
    s -->|7| b
    b -->|7| c
    c -->|7| t
    linkStyle 9,10,11 stroke:blue,stroke-width:3;
2. Solving the problem
We create the graph as a networkx DiGraph. In real project, you will likely have a method that transforms your graph to a DiGraph. We also give an attribute flow for every edge storing its flow value.
import flowpaths as fp
import networkx as nx
graph = nx.DiGraph()
graph.add_edge("s", "a", flow=6)
graph.add_edge("s", "b", flow=7)
graph.add_edge("a", "b", flow=2)
graph.add_edge("a", "c", flow=4)
graph.add_edge("b", "c", flow=9)
graph.add_edge("c", "d", flow=6)
graph.add_edge("c", "t", flow=7)
graph.add_edge("d", "t", flow=6)
flow of the edges. Note that MinFlowDecomp just creates the model. You need to call solve() to solve it.
mfd_model = fp.MinFlowDecomp(graph, flow_attr="flow")
mfd_model.solve()
The model might not be solved because the MILP solver couldn’t do it in the time it had allocated, or other problems. Thus, you need to check if it was solved, and then get its solution. The solution of MinFlowDecomp is a dictionary, with an key 'paths', and a key 'weights':
if mfd_model.is_solved():
    solution = mfd_model.get_solution()
    print(solution)
    # {'paths': [
    #   ['s', 'b', 'c', 't'], 
    #   ['s', 'a', 'c', 'd', 't'], 
    #   ['s', 'a', 'b', 'c', 'd', 't']], 
    # 'weights': [7, 4, 2]} 
3. References
There are several works on this problem, for example.
- 
Vatinlen, Benedicte, et al. Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths. European Journal of Operational Research 185.3 (2008): 1390-1401. 
- 
Shao, Mingfu, and Carl Kingsford. Theory and a heuristic for the minimum path flow decomposition problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics 16.2 (2017): 658-670. 
- 
Kloster, Kyle, et al. A practical fpt algorithm for flow decomposition and transcript assembly 2018 Proceedings of the Twentieth Workshop on Algorithm Engineering and Experiments (ALENEX). Society for Industrial and Applied Mathematics, 2018. 
- 
See also flowpaths References, and the other papers cited by these works. 
    
            
              Bases: AbstractPathModelDAG
        A class to decompose a network flow if a directed acyclic graph into a minimum number of weighted paths.
        Initialize the Minimum Flow Decomposition model, minimizing the number of paths.
Parameters
- 
G : nx.DiGraph
 The input directed acyclic graph, as networkx DiGraph. 
- 
flow_attr : str
 The attribute name from where to get the flow values on the edges. 
- 
flow_attr_origin : str, optional
 The origin of the flow attribute. Default is "edge". Options:
 
- "edge": the flow attribute is assumed to be on the edges of the graph.
- "node": the flow attribute is assumed to be on the nodes of the graph. See the documentation on how node-weighted graphs are handled.
 
- 
weight_type : type, optional
 The type of weights (intorfloat). Default isfloat.
 
- 
subpath_constraints : list, optional
 List of subpath constraints. Default is an empty list. 
Each subpath constraint is a list of edges that must be covered by some solution path, according 
to the subpath_constraints_coverageorsubpath_constraints_coverage_lengthparameters (see below).
 
- 
subpath_constraints_coverage: float, optional
 Coverage fraction of the subpath constraints that must be covered by some solution paths.  Defaults to 1.0, meaning that 100% of the edges (or nodes, ifflow_attr_originis"node") of 
the constraint need to be covered by some solution path). 
See subpath constraints documentation
 
- 
subpath_constraints_coverage_length : float, optional
 Coverage length of the subpath constraints. Default is None. If set, this overridessubpath_constraints_coverage, 
and the coverage constraint is expressed in terms of the subpath constraint length.subpath_constraints_coverage_lengthis then the fraction of the total length of the constraint (specified vialength_attr) needs to appear in some solution path.
See subpath constraints documentation
 
- 
length_attr: str, optional
 The attribute name from where to get the edge lengths (or node length, if flow_attr_originis"node"). Defaults toNone.
 
- If set, then the subpath lengths (above) are in terms of the edge/node lengths specified in the length_attrfield of each edge/node.
- If set, and an edge/node has a missing edge length, then it gets length 1.
 
- 
elements_to_ignore : list, optional
 List of edges (or nodes, if flow_attr_originis"node") to ignore when adding constrains on flow explanation by the weighted paths. 
Default is an empty list. See ignoring edges documentation
 
- 
additional_starts: list, optional
 List of additional start nodes of the paths. Default is an empty list. See additional start/end nodes documentation. You can set this only if flow_attr_originis"node".
 
- 
additional_ends: list, optional
 List of additional end nodes of the paths. Default is an empty list. See additional start/end nodes documentation. You can set this only if flow_attr_originis"node".
 
- 
optimization_options : dict, optional
 Dictionary with the optimization options. Default is an empty dict. See optimization options documentation.
This class also supports the optimization "optimize_with_greedy": True(this is the default value). This
will use a greedy algorithm to solve the problem, and if the number of paths returned by it equals a lowerbound on the solution size,
then we know the greedy solution is optimum, and it will use that. The lowerbound used currently is the edge-width of the graph,
meaning the minimum number of paths needed to cover all edges. This is a correct lowerbound because any flow decomposition must cover all edges, 
as they have non-zero flow.
 
- 
solver_options : dict, optional
 Dictionary with the solver options. Default is {}. See solver options documentation.
 
Raises
ValueError
- If weight_typeis notintorfloat.
- If some edge does not have the flow attribute specified as flow_attr.
- If the graph does not satisfy flow conservation on nodes different from source or sink.
- If the graph contains edges with negative (<0) flow values.
- If the graph is not acyclic.
- If flow_attr_originis not “node” or “edge”.
                    Source code in flowpaths/minflowdecomp.py
                    |  | def __init__(
    self,
    G: nx.DiGraph,
    flow_attr: str,
    flow_attr_origin: str = "edge",
    weight_type: type = float,
    subpath_constraints: list = [],
    subpath_constraints_coverage: float = 1.0,
    subpath_constraints_coverage_length: float = None,
    length_attr: str = None,
    elements_to_ignore: list = [],
    additional_starts: list = [],
    additional_ends: list = [],
    optimization_options: dict = {},
    solver_options: dict = {},
):
    """
    Initialize the Minimum Flow Decomposition model, minimizing the number of paths.
    Parameters
    ----------
    - `G : nx.DiGraph`
        The input directed acyclic graph, as [networkx DiGraph](https://networkx.org/documentation/stable/reference/classes/digraph.html).
    - `flow_attr : str`
        The attribute name from where to get the flow values on the edges.
    - `flow_attr_origin : str`, optional
        The origin of the flow attribute. Default is `"edge"`. Options:
        - `"edge"`: the flow attribute is assumed to be on the edges of the graph.
        - `"node"`: the flow attribute is assumed to be on the nodes of the graph. See [the documentation](node-expanded-digraph.md) on how node-weighted graphs are handled.
    - `weight_type : type`, optional
        The type of weights (`int` or `float`). Default is `float`.
    - `subpath_constraints : list`, optional
        List of subpath constraints. Default is an empty list. 
        Each subpath constraint is a list of edges that must be covered by some solution path, according 
        to the `subpath_constraints_coverage` or `subpath_constraints_coverage_length` parameters (see below).
    - `subpath_constraints_coverage: float`, optional
        Coverage fraction of the subpath constraints that must be covered by some solution paths. 
        Defaults to `1.0`, meaning that 100% of the edges (or nodes, if `flow_attr_origin` is `"node"`) of 
        the constraint need to be covered by some solution path). 
        See [subpath constraints documentation](subpath-constraints.md#3-relaxing-the-constraint-coverage)
    - `subpath_constraints_coverage_length : float`, optional
        Coverage length of the subpath constraints. Default is `None`. If set, this overrides `subpath_constraints_coverage`, 
        and the coverage constraint is expressed in terms of the subpath constraint length. 
        `subpath_constraints_coverage_length` is then the fraction of the total length of the constraint (specified via `length_attr`) needs to appear in some solution path.
        See [subpath constraints documentation](subpath-constraints.md#3-relaxing-the-constraint-coverage)
    - `length_attr: str`, optional
        The attribute name from where to get the edge lengths (or node length, if `flow_attr_origin` is `"node"`). Defaults to `None`.
        - If set, then the subpath lengths (above) are in terms of the edge/node lengths specified in the `length_attr` field of each edge/node.
        - If set, and an edge/node has a missing edge length, then it gets length 1.
    - `elements_to_ignore : list`, optional
        List of edges (or nodes, if `flow_attr_origin` is `"node"`) to ignore when adding constrains on flow explanation by the weighted paths. 
        Default is an empty list. See [ignoring edges documentation](ignoring-edges.md)
    - `additional_starts: list`, optional
        List of additional start nodes of the paths. Default is an empty list. See [additional start/end nodes documentation](additional-start-end-nodes.md). **You can set this only if `flow_attr_origin` is `"node"`**.
    - `additional_ends: list`, optional
        List of additional end nodes of the paths. Default is an empty list. See [additional start/end nodes documentation](additional-start-end-nodes.md). **You can set this only if `flow_attr_origin` is `"node"`**.
    - `optimization_options : dict`, optional
        Dictionary with the optimization options. Default is an empty dict. See [optimization options documentation](solver-options-optimizations.md).
        This class also supports the optimization `"optimize_with_greedy": True` (this is the default value). This
        will use a greedy algorithm to solve the problem, and if the number of paths returned by it equals a lowerbound on the solution size,
        then we know the greedy solution is optimum, and it will use that. The lowerbound used currently is the edge-width of the graph,
        meaning the minimum number of paths needed to cover all edges. This is a correct lowerbound because any flow decomposition must cover all edges, 
        as they have non-zero flow.
    - `solver_options : dict`, optional
        Dictionary with the solver options. Default is `{}`. See [solver options documentation](solver-options-optimizations.md).
    Raises
    ------
    `ValueError`
    - If `weight_type` is not `int` or `float`.
    - If some edge does not have the flow attribute specified as `flow_attr`.
    - If the graph does not satisfy flow conservation on nodes different from source or sink.
    - If the graph contains edges with negative (<0) flow values.
    - If the graph is not acyclic.
    - If `flow_attr_origin` is not "node" or "edge".
    """
    # Handling node-weighted graphs
    self.flow_attr_origin = flow_attr_origin
    if self.flow_attr_origin == "node":
        if G.number_of_nodes() == 0:
            utils.logger.error(f"{__name__}: The input graph G has no nodes. Please provide a graph with at least one node.")
            raise ValueError(f"The input graph G has no nodes. Please provide a graph with at least one node.")
        if len(additional_starts) + len(additional_ends) == 0:
            self.G_internal = nedg.NodeExpandedDiGraph(
                G=G, 
                node_flow_attr=flow_attr, 
                node_length_attr=length_attr)
        else:
            self.G_internal = nedg.NodeExpandedDiGraph(
                G=G, 
                node_flow_attr=flow_attr, 
                node_length_attr=length_attr,
                try_filling_in_missing_flow_attr=True,
                additional_starts=additional_starts,
                additional_ends=additional_ends,
            )
        subpath_constraints_internal = self.G_internal.get_expanded_subpath_constraints(subpath_constraints)
        edges_to_ignore_internal = self.G_internal.edges_to_ignore
        if not all(isinstance(element_to_ignore, str) for element_to_ignore in elements_to_ignore):
            utils.logger.error(f"elements_to_ignore must be a list of nodes (i.e strings), not {elements_to_ignore}")
            raise ValueError(f"elements_to_ignore must be a list of nodes (i.e strings), not {elements_to_ignore}")
        edges_to_ignore_internal += [self.G_internal.get_expanded_edge(node) for node in elements_to_ignore]
        edges_to_ignore_internal = list(set(edges_to_ignore_internal))
    elif self.flow_attr_origin == "edge":
        if G.number_of_edges() == 0:
            utils.logger.error(f"{__name__}: The input graph G has no edges. Please provide a graph with at least one edge.")
            raise ValueError(f"The input graph G has no edges. Please provide a graph with at least one edge.")
        if len(additional_starts) + len(additional_ends) > 0:
            utils.logger.error(f"additional_starts and additional_ends are not supported when flow_attr_origin is 'edge'.")
            raise ValueError(f"additional_starts and additional_ends are not supported when flow_attr_origin is 'edge'.")
        self.G_internal = G
        subpath_constraints_internal = subpath_constraints
        if not all(isinstance(edge, tuple) and len(edge) == 2 for edge in elements_to_ignore):
            utils.logger.error(f"elements_to_ignore must be a list of edges (i.e. tuples of nodes), not {elements_to_ignore}")
            raise ValueError(f"elements_to_ignore must be a list of edges (i.e. tuples of nodes), not {elements_to_ignore}")
        edges_to_ignore_internal = elements_to_ignore
    else:
        utils.logger.error(f"flow_attr_origin must be either 'node' or 'edge', not {self.flow_attr_origin}")
        raise ValueError(f"flow_attr_origin must be either 'node' or 'edge', not {self.flow_attr_origin}")
    self.G = self.G_internal
    self.subpath_constraints = subpath_constraints_internal
    self.edges_to_ignore = edges_to_ignore_internal
    self.flow_attr = flow_attr
    self.weight_type = weight_type
    self.subpath_constraints_coverage = subpath_constraints_coverage
    self.subpath_constraints_coverage_length = subpath_constraints_coverage_length
    self.length_attr = length_attr
    self.optimization_options = optimization_options
    self.solver_options = solver_options
    self.time_limit = self.solver_options.get("time_limit", sw.SolverWrapper.time_limit)
    self.solve_time_start = None
    self.solve_statistics = {}
    self._solution = None
    self._lowerbound_k = None
    self._is_solved = None
    # Internal variables
    self._generating_set = None
    self._all_subgraph_weights = None
    self._given_weights_model = None
    self._source_flow = None
    utils.logger.info(f"{__name__}: initialized with graph id = {utils.fpid(G)}")
 | 
 
  
            get_solution
    
        Retrieves the solution for the flow decomposition problem.
Returns
Raises
- exceptionIf model is not solved.
              Source code in flowpaths/minflowdecomp.py
              |  | def get_solution(self):
    """
    Retrieves the solution for the flow decomposition problem.
    Returns
    -------
    - `solution: dict`
        A dictionary containing the solution paths (key `"paths"`) and their corresponding weights (key `"weights"`).
    Raises
    -------
    - `exception` If model is not solved.
    """
    self.check_is_solved()
    return self._solution
 | 
 
     
 
            solve
    
        Attempts to solve the flow decomposition problem using a model with varying number of paths.
This method iterates over a range of possible path numbers, creating and solving a flow decomposition model for each count.
If a solution is found, it stores the solution and relevant statistics, and returns True. If no solution is found after
iterating through all possible path counts, it returns False.
    Returns:
    
      
        
| Name | Type | Description | 
      
      
          
| bool | bool | 
                True if a solution is found, False otherwise. | 
      
    
  Note
  This overloads the solve() method from AbstractPathModelDAG class.
 
            
              Source code in flowpaths/minflowdecomp.py
              |  | def solve(self) -> bool:
    """
    Attempts to solve the flow decomposition problem using a model with varying number of paths.
    This method iterates over a range of possible path numbers, creating and solving a flow decomposition model for each count.
    If a solution is found, it stores the solution and relevant statistics, and returns True. If no solution is found after
    iterating through all possible path counts, it returns False.
    Returns:
        bool: True if a solution is found, False otherwise.
    Note:
        This overloads the `solve()` method from `AbstractPathModelDAG` class.
    """
    self.solve_time_start = time.perf_counter()
    if self.optimization_options.get("optimize_with_given_weights", MinFlowDecomp.optimize_with_given_weights):            
        self._solve_with_given_weights()
    for i in range(self.get_lowerbound_k(), self.G.number_of_edges()):
        utils.logger.info(f"{__name__}: iteration with k = {i}")
        fd_model = None
        # Checking if we have already found a solution with the same number of paths
        # via the min gen set and given weights approach
        if self._given_weights_model is not None and self._given_weights_model.is_solved():
            if len(self._given_weights_model.get_solution(remove_empty_paths=True)["paths"]) == i:
                fd_model = self._given_weights_model
        fd_solver_options = copy.deepcopy(self.solver_options)
        if "time_limit" in fd_solver_options:
            fd_solver_options["time_limit"] = self.time_limit - self.solve_time_elapsed
        if fd_model is None:
            fd_model = kflowdecomp.kFlowDecomp(
                G=self.G,
                flow_attr=self.flow_attr,
                k=i,
                weight_type=self.weight_type,
                subpath_constraints=self.subpath_constraints,
                subpath_constraints_coverage=self.subpath_constraints_coverage,
                subpath_constraints_coverage_length=self.subpath_constraints_coverage_length,
                length_attr=self.length_attr,
                elements_to_ignore=self.edges_to_ignore,
                optimization_options=self.optimization_options,
                solver_options=fd_solver_options,
            )
            fd_model.solve()
        if fd_model.is_solved():
            self._solution = fd_model.get_solution(remove_empty_paths=True)
            if self.flow_attr_origin == "node":
                # If the flow_attr_origin is "node", we need to convert the solution paths from the expanded graph to paths in the original graph.
                self._solution["_paths_internal"] = self._solution["paths"]
                self._solution["paths"] = self.G_internal.get_condensed_paths(self._solution["paths"])
            self.set_solved()
            self.solve_statistics = fd_model.solve_statistics
            self.solve_statistics["mfd_solve_time"] = time.perf_counter() - self.solve_time_start
            self.fd_model = fd_model
            return True
        elif fd_model.solver.get_model_status() != sw.SolverWrapper.infeasible_status:
            # If the model is not solved and the status is not infeasible,
            # it means that the solver stopped because of an unexpected termination,
            # thus we cannot conclude that the model is infeasible.
            # In this case, we stop the search.
            self.set_not_solved()
            return False
    self.set_not_solved()
    return False
 |