Minimum Flow Decomposition
1. Definition
The Minimum Flow Decomposition problem on a directed acyclic graph (DAG) is defined as follows:
-
INPUT: A directed graph \(G = (V,E)\), and a flow on \(G\), namely weights \(f(u,v)\) for every edge \((u,v)\) of \(G\), such that for every node \(v\) that is not a source or sink of \(G\), it holds that the sum of the flow values entering \(v\) equals the sum of the flow values exiting \(v\). This property is called flow conservation.
-
OUTPUT: A minimum number \(k\) of source-to-sink paths, \(P_1,\dots,P_k\), with a weight \(w_i\) associated to each \(P_i\), such that for every edge it holds that its flow value equals the sum of the weights of the paths going through the edge. Formally,
$$
f(u,v) = \sum_{i \in \{1,\dots,k\} : (u,v) \in P_i }w_i, ~~\forall (u,v) \in E.
$$
Note
- This class support also graphs with flow values on nodes. Set the parameter
flow_attr_origin = "node"
. For details on how these are handled internally, see Handling graphs with flows / weights on nodes.
- The graph may have more than one source or sink nodes, in which case the solution paths are just required to start in any source node, and end in any sink node.
For example, the directed graph below satisfies the flow conservation property:
flowchart LR
s((s))
a((a))
b((b))
c((c))
d((d))
t((t))
s -->|6| a
a -->|2| b
s -->|7| b
a -->|4| c
b -->|9| c
c -->|6| d
d -->|6| t
c -->|7| t
A decomposition into 3 paths, in red, orange and blue, of weights 4, 2 and 7, respectively is shown below. There is no decomposition into a smaller number of paths, and thus this decomposition is also a minimum flow decomposition.
flowchart LR
s((s))
a((a))
b((b))
c((c))
d((d))
t((t))
s -->|4| a
a -->|4| c
c -->|4| d
d -->|4| t
linkStyle 0,1,2,3 stroke:red,stroke-width:3;
s -->|2| a
a -->|2| b
b -->|2| c
c -->|2| d
d -->|2| t
linkStyle 4,5,6,7,8 stroke:orange,stroke-width:3;
s -->|7| b
b -->|7| c
c -->|7| t
linkStyle 9,10,11 stroke:blue,stroke-width:3;
2. Solving the problem
We create the graph as a networkx DiGraph. In real project, you will likely have a method that transforms your graph to a DiGraph. We also give an attribute flow
for every edge storing its flow value.
import flowpaths as fp
import networkx as nx
graph = nx.DiGraph()
graph.add_edge("s", "a", flow=6)
graph.add_edge("s", "b", flow=7)
graph.add_edge("a", "b", flow=2)
graph.add_edge("a", "c", flow=4)
graph.add_edge("b", "c", flow=9)
graph.add_edge("c", "d", flow=6)
graph.add_edge("c", "t", flow=7)
graph.add_edge("d", "t", flow=6)
We now create a Minimum Flow Decomposition solver with default settings, by specifying that the flow value of each edge is in the attribute flow
of the edges. Note that MinFlowDecomp
just creates the model. You need to call solve()
to solve it.
mfd_model = fp.MinFlowDecomp(graph, flow_attr="flow")
mfd_model.solve()
The model might not be solved because the MILP solver couldn’t do it in the time it had allocated, or other problems. Thus, you need to check if it was solved, and then get its solution. The solution of MinFlowDecomp
is a dictionary, with an key 'paths'
, and a key 'weights'
:
if mfd_model.is_solved():
solution = mfd_model.get_solution()
print(solution)
# {'paths': [
# ['s', 'b', 'c', 't'],
# ['s', 'a', 'c', 'd', 't'],
# ['s', 'a', 'b', 'c', 'd', 't']],
# 'weights': [7, 4, 2]}
3. References
There are several works on this problem, for example.
-
Vatinlen, Benedicte, et al. Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths. European Journal of Operational Research 185.3 (2008): 1390-1401.
-
Shao, Mingfu, and Carl Kingsford. Theory and a heuristic for the minimum path flow decomposition problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics 16.2 (2017): 658-670.
-
Kloster, Kyle, et al. A practical fpt algorithm for flow decomposition and transcript assembly 2018 Proceedings of the Twentieth Workshop on Algorithm Engineering and Experiments (ALENEX). Society for Industrial and Applied Mathematics, 2018.
-
See also flowpaths References, and the other papers cited by these works.
Bases: AbstractPathModelDAG
A class to decompose a network flow if a directed acyclic graph into a minimum number of weighted paths.
Initialize the Minimum Flow Decomposition model, minimizing the number of paths.
Parameters
-
G : nx.DiGraph
The input directed acyclic graph, as networkx DiGraph.
-
flow_attr : str
The attribute name from where to get the flow values on the edges.
-
flow_attr_origin : str
, optional
The origin of the flow attribute. Default is "edge"
. Options:
"edge"
: the flow attribute is assumed to be on the edges of the graph.
"node"
: the flow attribute is assumed to be on the nodes of the graph. See the documentation on how node-weighted graphs are handled.
-
weight_type : type
, optional
The type of weights (int
or float
). Default is float
.
-
subpath_constraints : list
, optional
List of subpath constraints. Default is an empty list.
Each subpath constraint is a list of edges that must be covered by some solution path, according
to the subpath_constraints_coverage
or subpath_constraints_coverage_length
parameters (see below).
-
subpath_constraints_coverage: float
, optional
Coverage fraction of the subpath constraints that must be covered by some solution paths.
Defaults to 1.0
, meaning that 100% of the edges (or nodes, if flow_attr_origin
is "node"
) of
the constraint need to be covered by some solution path).
See subpath constraints documentation
-
subpath_constraints_coverage_length : float
, optional
Coverage length of the subpath constraints. Default is None
. If set, this overrides subpath_constraints_coverage
,
and the coverage constraint is expressed in terms of the subpath constraint length.
subpath_constraints_coverage_length
is then the fraction of the total length of the constraint (specified via length_attr
) needs to appear in some solution path.
See subpath constraints documentation
-
length_attr: str
, optional
The attribute name from where to get the edge lengths (or node length, if flow_attr_origin
is "node"
). Defaults to None
.
- If set, then the subpath lengths (above) are in terms of the edge/node lengths specified in the
length_attr
field of each edge/node.
- If set, and an edge/node has a missing edge length, then it gets length 1.
-
elements_to_ignore : list
, optional
List of edges (or nodes, if flow_attr_origin
is "node"
) to ignore when adding constrains on flow explanation by the weighted paths.
Default is an empty list. See ignoring edges documentation
-
additional_starts: list
, optional
List of additional start nodes of the paths. Default is an empty list. See additional start/end nodes documentation. You can set this only if flow_attr_origin
is "node"
.
-
additional_ends: list
, optional
List of additional end nodes of the paths. Default is an empty list. See additional start/end nodes documentation. You can set this only if flow_attr_origin
is "node"
.
-
optimization_options : dict
, optional
Dictionary with the optimization options. Default is an empty dict. See optimization options documentation.
This class also supports the optimization "optimize_with_greedy": True
(this is the default value). This
will use a greedy algorithm to solve the problem, and if the number of paths returned by it equals a lowerbound on the solution size,
then we know the greedy solution is optimum, and it will use that. The lowerbound used currently is the edge-width of the graph,
meaning the minimum number of paths needed to cover all edges. This is a correct lowerbound because any flow decomposition must cover all edges,
as they have non-zero flow.
-
solver_options : dict
, optional
Dictionary with the solver options. Default is {}
. See solver options documentation.
Raises
ValueError
- If
weight_type
is not int
or float
.
- If some edge does not have the flow attribute specified as
flow_attr
.
- If the graph does not satisfy flow conservation on nodes different from source or sink.
- If the graph contains edges with negative (<0) flow values.
- If the graph is not acyclic.
- If
flow_attr_origin
is not “node” or “edge”.
Source code in flowpaths/minflowdecomp.py
| def __init__(
self,
G: nx.DiGraph,
flow_attr: str,
flow_attr_origin: str = "edge",
weight_type: type = float,
subpath_constraints: list = [],
subpath_constraints_coverage: float = 1.0,
subpath_constraints_coverage_length: float = None,
length_attr: str = None,
elements_to_ignore: list = [],
additional_starts: list = [],
additional_ends: list = [],
optimization_options: dict = {},
solver_options: dict = {},
):
"""
Initialize the Minimum Flow Decomposition model, minimizing the number of paths.
Parameters
----------
- `G : nx.DiGraph`
The input directed acyclic graph, as [networkx DiGraph](https://networkx.org/documentation/stable/reference/classes/digraph.html).
- `flow_attr : str`
The attribute name from where to get the flow values on the edges.
- `flow_attr_origin : str`, optional
The origin of the flow attribute. Default is `"edge"`. Options:
- `"edge"`: the flow attribute is assumed to be on the edges of the graph.
- `"node"`: the flow attribute is assumed to be on the nodes of the graph. See [the documentation](node-expanded-digraph.md) on how node-weighted graphs are handled.
- `weight_type : type`, optional
The type of weights (`int` or `float`). Default is `float`.
- `subpath_constraints : list`, optional
List of subpath constraints. Default is an empty list.
Each subpath constraint is a list of edges that must be covered by some solution path, according
to the `subpath_constraints_coverage` or `subpath_constraints_coverage_length` parameters (see below).
- `subpath_constraints_coverage: float`, optional
Coverage fraction of the subpath constraints that must be covered by some solution paths.
Defaults to `1.0`, meaning that 100% of the edges (or nodes, if `flow_attr_origin` is `"node"`) of
the constraint need to be covered by some solution path).
See [subpath constraints documentation](subpath-constraints.md#3-relaxing-the-constraint-coverage)
- `subpath_constraints_coverage_length : float`, optional
Coverage length of the subpath constraints. Default is `None`. If set, this overrides `subpath_constraints_coverage`,
and the coverage constraint is expressed in terms of the subpath constraint length.
`subpath_constraints_coverage_length` is then the fraction of the total length of the constraint (specified via `length_attr`) needs to appear in some solution path.
See [subpath constraints documentation](subpath-constraints.md#3-relaxing-the-constraint-coverage)
- `length_attr: str`, optional
The attribute name from where to get the edge lengths (or node length, if `flow_attr_origin` is `"node"`). Defaults to `None`.
- If set, then the subpath lengths (above) are in terms of the edge/node lengths specified in the `length_attr` field of each edge/node.
- If set, and an edge/node has a missing edge length, then it gets length 1.
- `elements_to_ignore : list`, optional
List of edges (or nodes, if `flow_attr_origin` is `"node"`) to ignore when adding constrains on flow explanation by the weighted paths.
Default is an empty list. See [ignoring edges documentation](ignoring-edges.md)
- `additional_starts: list`, optional
List of additional start nodes of the paths. Default is an empty list. See [additional start/end nodes documentation](additional-start-end-nodes.md). **You can set this only if `flow_attr_origin` is `"node"`**.
- `additional_ends: list`, optional
List of additional end nodes of the paths. Default is an empty list. See [additional start/end nodes documentation](additional-start-end-nodes.md). **You can set this only if `flow_attr_origin` is `"node"`**.
- `optimization_options : dict`, optional
Dictionary with the optimization options. Default is an empty dict. See [optimization options documentation](solver-options-optimizations.md).
This class also supports the optimization `"optimize_with_greedy": True` (this is the default value). This
will use a greedy algorithm to solve the problem, and if the number of paths returned by it equals a lowerbound on the solution size,
then we know the greedy solution is optimum, and it will use that. The lowerbound used currently is the edge-width of the graph,
meaning the minimum number of paths needed to cover all edges. This is a correct lowerbound because any flow decomposition must cover all edges,
as they have non-zero flow.
- `solver_options : dict`, optional
Dictionary with the solver options. Default is `{}`. See [solver options documentation](solver-options-optimizations.md).
Raises
------
`ValueError`
- If `weight_type` is not `int` or `float`.
- If some edge does not have the flow attribute specified as `flow_attr`.
- If the graph does not satisfy flow conservation on nodes different from source or sink.
- If the graph contains edges with negative (<0) flow values.
- If the graph is not acyclic.
- If `flow_attr_origin` is not "node" or "edge".
"""
# Handling node-weighted graphs
self.flow_attr_origin = flow_attr_origin
if self.flow_attr_origin == "node":
if G.number_of_nodes() == 0:
utils.logger.error(f"{__name__}: The input graph G has no nodes. Please provide a graph with at least one node.")
raise ValueError(f"The input graph G has no nodes. Please provide a graph with at least one node.")
if len(additional_starts) + len(additional_ends) == 0:
self.G_internal = nedg.NodeExpandedDiGraph(
G=G,
node_flow_attr=flow_attr,
node_length_attr=length_attr)
else:
self.G_internal = nedg.NodeExpandedDiGraph(
G=G,
node_flow_attr=flow_attr,
node_length_attr=length_attr,
try_filling_in_missing_flow_attr=True,
additional_starts=additional_starts,
additional_ends=additional_ends,
)
subpath_constraints_internal = self.G_internal.get_expanded_subpath_constraints(subpath_constraints)
edges_to_ignore_internal = self.G_internal.edges_to_ignore
if not all(isinstance(element_to_ignore, str) for element_to_ignore in elements_to_ignore):
utils.logger.error(f"elements_to_ignore must be a list of nodes (i.e strings), not {elements_to_ignore}")
raise ValueError(f"elements_to_ignore must be a list of nodes (i.e strings), not {elements_to_ignore}")
edges_to_ignore_internal += [self.G_internal.get_expanded_edge(node) for node in elements_to_ignore]
edges_to_ignore_internal = list(set(edges_to_ignore_internal))
elif self.flow_attr_origin == "edge":
if G.number_of_edges() == 0:
utils.logger.error(f"{__name__}: The input graph G has no edges. Please provide a graph with at least one edge.")
raise ValueError(f"The input graph G has no edges. Please provide a graph with at least one edge.")
if len(additional_starts) + len(additional_ends) > 0:
utils.logger.error(f"additional_starts and additional_ends are not supported when flow_attr_origin is 'edge'.")
raise ValueError(f"additional_starts and additional_ends are not supported when flow_attr_origin is 'edge'.")
self.G_internal = G
subpath_constraints_internal = subpath_constraints
if not all(isinstance(edge, tuple) and len(edge) == 2 for edge in elements_to_ignore):
utils.logger.error(f"elements_to_ignore must be a list of edges (i.e. tuples of nodes), not {elements_to_ignore}")
raise ValueError(f"elements_to_ignore must be a list of edges (i.e. tuples of nodes), not {elements_to_ignore}")
edges_to_ignore_internal = elements_to_ignore
else:
utils.logger.error(f"flow_attr_origin must be either 'node' or 'edge', not {self.flow_attr_origin}")
raise ValueError(f"flow_attr_origin must be either 'node' or 'edge', not {self.flow_attr_origin}")
self.G = self.G_internal
self.subpath_constraints = subpath_constraints_internal
self.edges_to_ignore = edges_to_ignore_internal
self.flow_attr = flow_attr
self.weight_type = weight_type
self.subpath_constraints_coverage = subpath_constraints_coverage
self.subpath_constraints_coverage_length = subpath_constraints_coverage_length
self.length_attr = length_attr
self.optimization_options = optimization_options
self.solver_options = solver_options
self.time_limit = self.solver_options.get("time_limit", sw.SolverWrapper.time_limit)
self.solve_time_start = None
self.solve_statistics = {}
self._solution = None
self._lowerbound_k = None
self._is_solved = None
# Internal variables
self._generating_set = None
self._all_subgraph_weights = None
self._given_weights_model = None
self._source_flow = None
utils.logger.info(f"{__name__}: initialized with graph id = {utils.fpid(G)}")
|
get_solution
Retrieves the solution for the flow decomposition problem.
Returns
Raises
exception
If model is not solved.
Source code in flowpaths/minflowdecomp.py
| def get_solution(self):
"""
Retrieves the solution for the flow decomposition problem.
Returns
-------
- `solution: dict`
A dictionary containing the solution paths (key `"paths"`) and their corresponding weights (key `"weights"`).
Raises
-------
- `exception` If model is not solved.
"""
self.check_is_solved()
return self._solution
|
solve
Attempts to solve the flow decomposition problem using a model with varying number of paths.
This method iterates over a range of possible path numbers, creating and solving a flow decomposition model for each count.
If a solution is found, it stores the solution and relevant statistics, and returns True. If no solution is found after
iterating through all possible path counts, it returns False.
Returns:
Name | Type |
Description |
bool |
bool
|
True if a solution is found, False otherwise.
|
Note
This overloads the solve()
method from AbstractPathModelDAG
class.
Source code in flowpaths/minflowdecomp.py
| def solve(self) -> bool:
"""
Attempts to solve the flow decomposition problem using a model with varying number of paths.
This method iterates over a range of possible path numbers, creating and solving a flow decomposition model for each count.
If a solution is found, it stores the solution and relevant statistics, and returns True. If no solution is found after
iterating through all possible path counts, it returns False.
Returns:
bool: True if a solution is found, False otherwise.
Note:
This overloads the `solve()` method from `AbstractPathModelDAG` class.
"""
self.solve_time_start = time.perf_counter()
if self.optimization_options.get("optimize_with_given_weights", MinFlowDecomp.optimize_with_given_weights):
self._solve_with_given_weights()
for i in range(self.get_lowerbound_k(), self.G.number_of_edges()):
utils.logger.info(f"{__name__}: iteration with k = {i}")
fd_model = None
# Checking if we have already found a solution with the same number of paths
# via the min gen set and given weights approach
if self._given_weights_model is not None and self._given_weights_model.is_solved():
if len(self._given_weights_model.get_solution(remove_empty_paths=True)["paths"]) == i:
fd_model = self._given_weights_model
fd_solver_options = copy.deepcopy(self.solver_options)
if "time_limit" in fd_solver_options:
fd_solver_options["time_limit"] = self.time_limit - self.solve_time_elapsed
if fd_model is None:
fd_model = kflowdecomp.kFlowDecomp(
G=self.G,
flow_attr=self.flow_attr,
k=i,
weight_type=self.weight_type,
subpath_constraints=self.subpath_constraints,
subpath_constraints_coverage=self.subpath_constraints_coverage,
subpath_constraints_coverage_length=self.subpath_constraints_coverage_length,
length_attr=self.length_attr,
elements_to_ignore=self.edges_to_ignore,
optimization_options=self.optimization_options,
solver_options=fd_solver_options,
)
fd_model.solve()
if fd_model.is_solved():
self._solution = fd_model.get_solution(remove_empty_paths=True)
if self.flow_attr_origin == "node":
# If the flow_attr_origin is "node", we need to convert the solution paths from the expanded graph to paths in the original graph.
self._solution["_paths_internal"] = self._solution["paths"]
self._solution["paths"] = self.G_internal.get_condensed_paths(self._solution["paths"])
self.set_solved()
self.solve_statistics = fd_model.solve_statistics
self.solve_statistics["mfd_solve_time"] = time.perf_counter() - self.solve_time_start
self.fd_model = fd_model
return True
elif fd_model.solver.get_model_status() != sw.SolverWrapper.infeasible_status:
# If the model is not solved and the status is not infeasible,
# it means that the solver stopped because of an unexpected termination,
# thus we cannot conclude that the model is infeasible.
# In this case, we stop the search.
self.set_not_solved()
return False
self.set_not_solved()
return False
|