
Various utils

Flowpaths implements various helper functions on graphs. They can be accessed with the prefix flowpaths.utils (fp.utils when the package is imported as fp).

Graph visualization and drawing

You can create drawings like this one

An example of the graph drawing

using the following code:

import flowpaths as fp
import networkx as nx

# Create a simple graph
graph = nx.DiGraph()
graph.graph["id"] = "simple_graph"
graph.add_edge("s", "a", flow=6)
graph.add_edge("s", "b", flow=7)
graph.add_edge("a", "b", flow=2)
graph.add_edge("a", "c", flow=5)
graph.add_edge("b", "c", flow=9)
graph.add_edge("c", "d", flow=6)
graph.add_edge("c", "t", flow=7)
graph.add_edge("d", "t", flow=6)

# Solve the minimum path error model
mpe_model = fp.kMinPathError(graph, flow_attr="flow", k=3, weight_type=float)
mpe_model.solve()

# Draw the solution
if mpe_model.is_solved():
    solution = mpe_model.get_solution()
    fp.utils.draw(
        G=graph,
        filename="simple_graph.pdf",
        flow_attr="flow",
        paths=solution["paths"],
        weights=solution["weights"],
        draw_options={
            "show_graph_edges": True,
            "show_edge_weights": False,
            "show_path_weights": False,
            "show_path_weight_on_first_edge": True,
            "pathwidth": 2,
        },
    )

This saves the drawing to simple_graph.pdf; the output format is inferred from the file extension.

Sankey Diagram Visualization

For acyclic graphs (DAGs), you can create interactive Sankey diagrams using plotly. Sankey diagrams are particularly effective for visualizing flow decompositions, as they show:

  • Each node in the graph as a labeled box
  • Each path as a colored flow whose width represents the path weight

An example of a Sankey diagram

To create a Sankey diagram, set "style": "sankey" in the draw_options:

import flowpaths as fp
import networkx as nx

# Create a sample DAG
G = nx.DiGraph()
G.add_edge('s', 'a', flow=10)
G.add_edge('s', 'b', flow=5)
G.add_edge('a', 'c', flow=6)
G.add_edge('a', 'd', flow=4)
G.add_edge('b', 'c', flow=3)
G.add_edge('b', 'd', flow=2)
G.add_edge('c', 't', flow=9)
G.add_edge('d', 't', flow=6)

# Compute minimum flow decomposition
solver = fp.MinFlowDecomp(G, flow_attr='flow')
solver.solve()
solution = solver.get_solution()

# Draw as interactive Sankey diagram
fp.utils.draw(
    G=G,
    filename="flow_sankey.html",  # saves as HTML (interactive)
    flow_attr='flow',
    paths=solution['paths'],
    weights=solution['weights'],
    draw_options={
        "style": "sankey"
    }
)

Features:

  • Interactive: Hover over nodes and links to see details, and zoom and pan the diagram
  • Jupyter support: Automatically displays inline when run in Jupyter notebooks
  • Dual output: Automatically saves both HTML (interactive) and a static image (PDF by default)
  • Automatic coloring: Each path gets a distinct color; shared edges show blended colors
  • Graph identification: Uses the graph’s ID as the diagram title when the show_graph_title option is True
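To make the path-to-flow mapping concrete, here is a minimal, dependency-free sketch of how weighted paths can be turned into the source/target/value/color lists that a Plotly Sankey trace expects. It follows the approach visible in the draw() source further down (one link per edge of each path, a missing weight treated as 1, colors cycled through a palette by path index), but the names paths_to_sankey_links and PALETTE are hypothetical, the palette is shortened, and nodes are indexed by first appearance rather than in the topological order draw() uses.

```python
# Hypothetical sketch: weighted paths -> Sankey link lists.
# Mirrors the idea in draw(), not its exact node ordering or palette.
PALETTE = [
    "rgba(255, 0, 0, 0.4)",  # red
    "rgba(0, 0, 255, 0.4)",  # blue
    "rgba(0, 128, 0, 0.4)",  # green
]  # shortened for illustration; draw() defines 17 colors


def paths_to_sankey_links(paths, weights, palette=PALETTE):
    """Return (labels, sources, targets, values, colors) for a Sankey trace."""
    # Simplification: index nodes in order of first appearance.
    index = {}
    for path in paths:
        for node in path:
            index.setdefault(node, len(index))

    sources, targets, values, colors = [], [], [], []
    for path_idx, path in enumerate(paths):
        # A path without a weight counts as weight 1, as in draw()
        weight = weights[path_idx] if path_idx < len(weights) else 1
        # Colors repeat once the palette is exhausted
        color = palette[path_idx % len(palette)]
        # One link per consecutive pair of nodes on the path
        for u, v in zip(path, path[1:]):
            sources.append(index[u])
            targets.append(index[v])
            values.append(weight)
            colors.append(color)

    labels = [str(node) for node in index]
    return labels, sources, targets, values, colors


labels, sources, targets, values, colors = paths_to_sankey_links(
    [["s", "a", "t"], ["s", "b", "t"]], [4, 3]
)
```

Because every path contributes its own links, an edge shared by several paths becomes several semi-transparent links, one per path, each as wide as that path's weight.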

Requirements:

  • plotly: Installed automatically with flowpaths
  • kaleido: Installed automatically with flowpaths for static image export

File formats:

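The Sankey style always writes two files:

  • HTML file (interactive): [basename].html
  • Static image: [basename].pdf by default, or .png, .svg, .jpg or .jpeg when the filename ends with one of these extensions

If the static export fails (for example, because of missing system dependencies), a warning is logged and only the HTML file is written.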

# Assumes G and solution from the example above
paths = solution["paths"]
weights = solution["weights"]

# Saves both output.html and output.pdf
fp.utils.draw(G, "output", paths=paths, weights=weights,
              draw_options={"style": "sankey"})

# Saves both flow.html and flow.png
fp.utils.draw(G, "flow.png", paths=paths, weights=weights,
              draw_options={"style": "sankey"})

# Saves both diagram.html and diagram.svg
fp.utils.draw(G, "diagram.svg", paths=paths, weights=weights,
              draw_options={"style": "sankey"})
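The filename rule can be summarized in a few lines. The sketch below reproduces the extension handling from the Sankey branch of the draw() source (text after the last dot, lowercased; PDF unless it is png, pdf, svg, jpg or jpeg); sankey_output_files is a hypothetical helper name used only for illustration.

```python
def sankey_output_files(filename):
    """Hypothetical helper: predict the (html, static image) files draw() writes.

    Mirrors the Sankey branch of draw(): the HTML file is always written,
    and the static format falls back to PDF for unrecognized extensions.
    """
    if "." in filename:
        base, ext = filename.rsplit(".", 1)
        ext = ext.lower()
    else:
        base, ext = filename, ""
    static_format = ext if ext in ("png", "pdf", "svg", "jpg", "jpeg") else "pdf"
    return base + ".html", base + "." + static_format


print(sankey_output_files("flow_sankey.html"))  # HTML plus a PDF fallback
```

Note that passing an .html filename, as in the earlier example, still produces a PDF next to it.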

Note: Sankey diagrams require the graph to be acyclic (DAG). If the graph contains cycles, use the traditional graphviz rendering ("style": "default" or "style": "points").
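If you draw many graphs, you may want to pick the style automatically. draw() itself tests acyclicity with networkx's nx.is_directed_acyclic_graph; the dependency-free sketch below does the same check with Kahn's algorithm on a plain edge list. The helpers is_acyclic and choose_style are hypothetical, not part of flowpaths.

```python
from collections import defaultdict, deque


def is_acyclic(edges):
    """Kahn's algorithm: True if the directed graph given by `edges` has no cycle."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
        nodes.update((u, v))

    # Repeatedly remove nodes with no remaining incoming edges
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Nodes on a cycle never reach indegree 0, so they are never removed
    return removed == len(nodes)


def choose_style(edges):
    """Hypothetical helper: Sankey for DAGs, graphviz 'default' otherwise."""
    return "sankey" if is_acyclic(edges) else "default"
```

The result can then be passed as draw_options={"style": choose_style(list(G.edges()))}.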

See examples/sankey_demo.py and examples/sankey_demo.ipynb for complete examples.

Logging

flowpaths exposes a simple logging helper via fp.utils.configure_logging. Use it to control verbosity, enable console/file logging, and set file mode.

Basic usage (console logging at INFO level):

import flowpaths as fp

fp.utils.configure_logging(
    level=fp.utils.logging.INFO,
    log_to_console=True,
)

Also log to a file (append mode):

fp.utils.configure_logging(
    level=fp.utils.logging.DEBUG,  # default is DEBUG
    log_to_console=True,           # show logs in terminal
    log_file="flowpaths.log",      # write logs to this file
    file_mode="a",                 # "a" append (or "w" overwrite)
)

Notes:

  • Levels available: fp.utils.logging.DEBUG, INFO, WARNING, ERROR, CRITICAL.
  • The default level is DEBUG. For quieter output, use INFO or WARNING.
  • Internally, the package logs through its own logger; configure_logging replaces that logger's handlers and sets their level and formatter.
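The effect of the level setting can be seen with the standard library alone. The configure_logging source below sets the same level on the logger and on each handler; this sketch repeats that pattern on a throwaway logger named "levels_demo" (not flowpaths' own logger) and captures output in memory, with a simpler format than the one flowpaths uses.

```python
import io
import logging

stream = io.StringIO()
demo_logger = logging.getLogger("levels_demo")  # illustrative logger, not flowpaths'
demo_logger.handlers.clear()           # like configure_logging: drop old handlers
demo_logger.propagate = False          # keep messages out of the root logger
demo_logger.setLevel(logging.INFO)     # comparable to level=INFO

handler = logging.StreamHandler(stream)
handler.setLevel(logging.INFO)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
demo_logger.addHandler(handler)

demo_logger.debug("solver details")    # below INFO: dropped
demo_logger.info("model solved")       # emitted
demo_logger.warning("slow solve")      # emitted

lines = stream.getvalue().splitlines()
print(lines)
```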

API reference:

Configures logging for the flowpaths package.

Parameters:

  • level: int, optional

    Logging level (e.g., fp.utils.logging.DEBUG, fp.utils.logging.INFO). Default is fp.utils.logging.DEBUG.

  • log_to_console: bool, optional

    Whether to log to the console. Default is True.

  • log_file: str, optional

    File path to log to. If None, logging to a file is disabled. Default is None. If a file path is provided, the log will be written to that file. If the file already exists, it will be overwritten unless file_mode is set to “a”.

  • file_mode: str, optional

    Mode for the log file. “a” (append) or “w” (overwrite). Default is “w”.
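file_mode is handed straight to the standard library's logging.FileHandler (see the source below), so "w" truncates the file when the handler is created and "a" keeps earlier runs. The sketch below shows the difference with a temporary file; write_run is a hypothetical helper and "file_mode_demo" an illustrative logger name.

```python
import logging
import os
import tempfile


def write_run(path, mode, message):
    """Hypothetical helper: log one message to `path` with a FileHandler in `mode`."""
    logger = logging.getLogger("file_mode_demo")
    logger.propagate = False
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path, mode=mode)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        # Detach and close so the next run reopens the file with its own mode
        logger.removeHandler(handler)
        handler.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flowpaths.log")
    write_run(path, "w", "run 1")
    write_run(path, "a", "run 2")  # appended after run 1
    with open(path) as f:
        after_append = f.read().splitlines()
    write_run(path, "w", "run 3")  # file truncated before writing
    with open(path) as f:
        after_overwrite = f.read().splitlines()
```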

Source code in flowpaths/utils/logging.py
def configure_logging(
        level=logging.DEBUG, 
        log_to_console=True, 
        log_file=None, 
        file_mode="w"  # "a" for append, "w" for overwrite
    ):
    """
    Configures logging for the flowpaths package.

    Parameters:
    -----------

    - `level: int`, optional

        Logging level (e.g., fp.utils.logging.DEBUG, fp.utils.logging.INFO). 
        Default is fp.utils.logging.DEBUG.

    - `log_to_console: bool`, optional

        Whether to log to the console. Default is True.

    - `log_file: str`, optional

        File path to log to. If None, logging to a file is disabled. Default is None.
        If a file path is provided, the log will be written to that file.
        If the file already exists, it will be overwritten unless `file_mode` is set to "a".

    - `file_mode: str`, optional

        Mode for the log file. "a" (append) or "w" (overwrite). Default is "w".

    """
    # Remove existing handlers to avoid duplicate logs
    for handler in logger.handlers[:]:
        logger.removeHandler(handler)

    # Set the logger level
    logger.setLevel(level)

    # Define a formatter
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

    # Add console handler if enabled
    if log_to_console:
        console_handler = logging.StreamHandler()
        console_handler.setLevel(level)
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)

    if file_mode not in ["a", "w"]:
        raise ValueError("file_mode must be either 'a' (append) or 'w' (overwrite)")

    # Add file handler if a file path is provided
    if log_file:
        file_handler = logging.FileHandler(log_file, mode=file_mode)  # Use file_mode
        file_handler.setLevel(level)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    logger.info("Logging initialized: level=%s, console=%s, file=%s, mode=%s", 
                level, log_to_console, log_file, file_mode)

check_flow_conservation

check_flow_conservation(
    G: DiGraph, flow_attr
) -> bool

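Check whether the flow conservation property holds for the given graph: at every node that has both incoming and outgoing edges, the total incoming flow must equal the total outgoing flow. Source and sink nodes are skipped, and the check fails if an edge of a checked node has no flow_attr value.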

Parameters

  • G: nx.DiGraph

    The input directed acyclic graph, as networkx DiGraph.

  • flow_attr: str

    The attribute name from where to get the flow values on the edges.

Returns

  • bool:

    True if the flow conservation property holds, False otherwise.

Source code in flowpaths/utils/graphutils.py
def check_flow_conservation(G: nx.DiGraph, flow_attr) -> bool:
    """
    Check if the flow conservation property holds for the given graph.

    Parameters
    ----------
    - `G`: nx.DiGraph

        The input directed acyclic graph, as [networkx DiGraph](https://networkx.org/documentation/stable/reference/classes/digraph.html).

    - `flow_attr`: str

        The attribute name from where to get the flow values on the edges.

    Returns
    -------

    - bool: 

        True if the flow conservation property holds, False otherwise.
    """

    for v in G.nodes():
        if G.out_degree(v) == 0 or G.in_degree(v) == 0:
            continue

        out_flow = 0
        for x, y, data in G.out_edges(v, data=True):
            if data.get(flow_attr) is None:
                return False
            out_flow += data[flow_attr]

        in_flow = 0
        for x, y, data in G.in_edges(v, data=True):
            if data.get(flow_attr) is None:
                return False
            in_flow += data[flow_attr]

        if out_flow != in_flow:
            return False

    return True
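The same rule can be stated without networkx. This is a sketch operating on a plain {(u, v): flow} dictionary, with None standing for a missing flow value; flow_conserved is a hypothetical name, not a flowpaths function. Like the original, it compares sums with exact equality, so floating-point flows may need rounding first.

```python
from collections import defaultdict


def flow_conserved(edge_flows):
    """Hypothetical dict-based version of the check_flow_conservation rule."""
    in_flows = defaultdict(list)
    out_flows = defaultdict(list)
    for (u, v), flow in edge_flows.items():
        out_flows[u].append(flow)
        in_flows[v].append(flow)

    for node in set(in_flows) | set(out_flows):
        if not in_flows[node] or not out_flows[node]:
            continue  # sources and sinks are exempt
        if None in in_flows[node] or None in out_flows[node]:
            return False  # missing flow value on an edge of this node
        if sum(in_flows[node]) != sum(out_flows[node]):
            return False
    return True


# Edge flows from the first drawing example and from the Sankey example
first_example = {("s", "a"): 6, ("s", "b"): 7, ("a", "b"): 2, ("a", "c"): 5,
                 ("b", "c"): 9, ("c", "d"): 6, ("c", "t"): 7, ("d", "t"): 6}
sankey_example = {("s", "a"): 10, ("s", "b"): 5, ("a", "c"): 6, ("a", "d"): 4,
                  ("b", "c"): 3, ("b", "d"): 2, ("c", "t"): 9, ("d", "t"): 6}
```

The graph from the first drawing example fails the check at node a (in-flow 6, out-flow 7), while the Sankey example graph passes.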

draw

draw(
    G: DiGraph,
    filename: str,
    flow_attr: str = None,
    paths: list = [],
    weights: list = [],
    additional_starts: list = [],
    additional_ends: list = [],
    additional_edges: list = [],
    subpath_constraints: list = [],
    draw_options: dict = {
        "show_graph_edges": True,
        "show_edge_weights": False,
        "show_node_weights": False,
        "show_graph_title": False,
        "show_path_weights": False,
        "show_path_weight_on_first_edge": True,
        "pathwidth": 3.0,
        "style": "default",
        "color_nodes": False,
        "sankey_arrowlen": 0,
        "sankey_color_toggle": False,
        "sankey_arrow_toggle": False,
    },
)

Draw the graph with the paths and their weights highlighted.

Parameters

  • G: nx.DiGraph

    The input directed acyclic graph, as networkx DiGraph.

  • filename: str

    The name of the file to save the drawing. The file type is inferred from the extension. Supported extensions are ‘.bmp’, ‘.canon’, ‘.cgimage’, ‘.cmap’, ‘.cmapx’, ‘.cmapx_np’, ‘.dot’, ‘.dot_json’, ‘.eps’, ‘.exr’, ‘.fig’, ‘.gd’, ‘.gd2’, ‘.gif’, ‘.gtk’, ‘.gv’, ‘.ico’, ‘.imap’, ‘.imap_np’, ‘.ismap’, ‘.jp2’, ‘.jpe’, ‘.jpeg’, ‘.jpg’, ‘.json’, ‘.json0’, ‘.pct’, ‘.pdf’, ‘.pic’, ‘.pict’, ‘.plain’, ‘.plain-ext’, ‘.png’, ‘.pov’, ‘.ps’, ‘.ps2’, ‘.psd’, ‘.sgi’, ‘.svg’, ‘.svgz’, ‘.tga’, ‘.tif’, ‘.tiff’, ‘.tk’, ‘.vml’, ‘.vmlz’, ‘.vrml’, ‘.wbmp’, ‘.webp’, ‘.x11’, ‘.xdot’, ‘.xdot1.2’, ‘.xdot1.4’, ‘.xdot_json’, ‘.xlib’

  • flow_attr: str

    The attribute name from where to get the flow values on the edges. Default is None, in which case no edge weights are shown.

  • paths: list

    The list of paths to highlight, as lists of nodes, drawn in various colors. Default is an empty list, in which case no path is drawn.

  • weights: list

    The list of weights corresponding to the paths. If provided, it must have the same length as paths; otherwise a ValueError is raised. Default is an empty list.

  • additional_starts: list

    A list of additional nodes to highlight in green as starting nodes. Default is an empty list.
    
  • additional_ends: list

    A list of additional nodes to highlight in red as ending nodes. Default is an empty list.
    
  • additional_edges: list

    A list of additional edges to draw as dashed black lines if `show_graph_edges` is True. 
    Each edge should be a tuple `(u, v)`. Default is an empty list.
    
  • subpath_constraints: list

    A list of subpaths to highlight in the graph, of various colors. Each subpath can be:

    • A list of nodes, such as ['n1', 'n2', 'n3', ...]. The nodes are highlighted with the constraint color, and the edges between consecutive nodes are drawn as dashed lines.
    • A list of edges, such as [('n1', 'n2'), ('n2', 'n3'), ...]. The edges are drawn as dashed lines.

    Default is an empty list. There is no association between the subpath colors and the path colors.
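Both forms describe the same set of edges. The sketch below shows one way to normalize a subpath constraint to an edge list; subpath_edges is a hypothetical helper, and how draw() itself tells the two forms apart is not shown here, so treat the "every item is a 2-tuple" test as an assumption. It is ambiguous for graphs whose nodes are themselves 2-tuples.

```python
def subpath_edges(subpath):
    """Hypothetical helper: return the edges of a subpath constraint.

    Accepts a node list ['n1', 'n2', 'n3'] or an edge list
    [('n1', 'n2'), ('n2', 'n3')]. Assumes nodes are not 2-tuples.
    """
    if subpath and all(isinstance(item, tuple) and len(item) == 2 for item in subpath):
        return list(subpath)  # already a list of edges
    # Node list: pair up consecutive nodes
    return list(zip(subpath, subpath[1:]))
```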

  • draw_options: dict

    A dictionary with the following keys:

    • show_graph_edges: bool

      Whether to show the edges of the graph. Default is True.

    • show_edge_weights: bool

      Whether to show the edge weights in the graph from the flow_attr. Default is False.

    • show_node_weights: bool

      Whether to show the node weights in the graph from the flow_attr. Default is False.

    • show_graph_title: bool

      Whether to show the graph title (from graph id) in the figure. Default is False.

    • show_path_weights: bool

      Whether to show the path weights in the graph on every edge. Default is False.

    • show_path_weight_on_first_edge: bool

      Whether to show the path weight on the first edge of the path. Default is True.

    • pathwidth: float

      The width of the path to be drawn. Default is 3.0.

    • style: str

      The style of the drawing. Available options: default, points, sankey.

      • default: Standard graphviz rendering with nodes as rounded rectangles
      • points: Graphviz rendering with nodes as points
      • sankey: Interactive Sankey diagram using plotly (requires an acyclic graph). Always saves an interactive HTML file and also tries to save a static image (PDF unless the filename ends in .png, .svg, .jpg or .jpeg; static export uses kaleido). Automatically displays in Jupyter notebooks.

    • color_nodes: bool

      Whether to color the nodes.
      If `False` (default), all nodes use a neutral color.
      If `True`, nodes are colored, including `additional_starts`
      in green and `additional_ends` in red for the graphviz styles.
      
    • sankey_arrowlen: float

      Length of arrowheads for Sankey links (Plotly arrowlen). Default is 0 (no arrowheads).

    • sankey_color_toggle: bool

      Whether to add an interactive toggle (buttons) to switch Sankey links between colored and monochrome gray. Default is False.

    • sankey_arrow_toggle: bool

      Whether to add an interactive toggle (buttons) to switch Sankey link arrowheads on/off. Default is False.
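The Sankey arrow options are validated before plotting. The sketch below reproduces that logic from the draw() source: sankey_arrowlen must be numeric and non-negative, and when it is 0 the "Arrowheads on" toggle button falls back to a length of 15.0. The function name sankey_arrow_settings is hypothetical.

```python
def sankey_arrow_settings(draw_options):
    """Hypothetical helper mirroring draw()'s handling of sankey_arrowlen.

    Returns (arrowlen, arrowlen_used_by_the_toggle_button).
    """
    try:
        arrowlen = float(draw_options.get("sankey_arrowlen", 0))
    except (TypeError, ValueError):
        raise ValueError("draw_options['sankey_arrowlen'] must be numeric.") from None
    if arrowlen < 0:
        raise ValueError("draw_options['sankey_arrowlen'] must be >= 0.")
    # With no explicit length, the "Arrowheads on" button uses 15.0
    toggle_arrowlen = arrowlen if arrowlen > 0 else 15.0
    return arrowlen, toggle_arrowlen
```

For example, draw_options={"style": "sankey", "sankey_arrow_toggle": True} starts without arrowheads and lets the viewer switch on arrowheads of length 15.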

Source code in flowpaths/utils/graphutils.py
def draw(
        G: nx.DiGraph, 
        filename: str,
        flow_attr: str = None,
        paths: list = [], 
        weights: list = [], 
        additional_starts: list = [],
        additional_ends: list = [],
        additional_edges: list = [],
        subpath_constraints: list = [],
        draw_options: dict = {
            "show_graph_edges": True,
            "show_edge_weights": False,
            "show_node_weights": False,
            "show_graph_title": False,
            "show_path_weights": False,
            "show_path_weight_on_first_edge": True,
            "pathwidth": 3.0,
            "style": "default",
            "color_nodes": False,
            "sankey_arrowlen": 0,
            "sankey_color_toggle": False,
            "sankey_arrow_toggle": False,
        },
        ):
        """
        Draw the graph with the paths and their weights highlighted.

        Parameters
        ----------

        - `G`: nx.DiGraph 

            The input directed acyclic graph, as [networkx DiGraph](https://networkx.org/documentation/stable/reference/classes/digraph.html). 

        - `filename`: str

            The name of the file to save the drawing. The file type is inferred from the extension. Supported extensions are '.bmp', '.canon', '.cgimage', '.cmap', '.cmapx', '.cmapx_np', '.dot', '.dot_json', '.eps', '.exr', '.fig', '.gd', '.gd2', '.gif', '.gtk', '.gv', '.ico', '.imap', '.imap_np', '.ismap', '.jp2', '.jpe', '.jpeg', '.jpg', '.json', '.json0', '.pct', '.pdf', '.pic', '.pict', '.plain', '.plain-ext', '.png', '.pov', '.ps', '.ps2', '.psd', '.sgi', '.svg', '.svgz', '.tga', '.tif', '.tiff', '.tk', '.vml', '.vmlz', '.vrml', '.wbmp', '.webp', '.x11', '.xdot', '.xdot1.2', '.xdot1.4', '.xdot_json', '.xlib'

        - `flow_attr`: str

            The attribute name from where to get the flow values on the edges. Default is an empty string, in which case no edge weights are shown.

        - `paths`: list

            The list of paths to highlight, as lists of nodes. Default is an empty list, in which case no path is drawn. Default is an empty list.

        - `weights`: list

            The list of weights corresponding to the paths, of various colors. Default is an empty list, in which case no path is drawn.

        - `additional_starts`: list

                A list of additional nodes to highlight in green as starting nodes. Default is an empty list.

        - `additional_ends`: list

                A list of additional nodes to highlight in red as ending nodes. Default is an empty list.

        - `additional_edges`: list

                A list of additional edges to draw as dashed black lines if `show_graph_edges` is True. 
                Each edge should be a tuple `(u, v)`. Default is an empty list.

        - `subpath_constraints`: list

            A list of subpaths to highlight in the graph, of various colors. Each subpath can be:

            - A list of nodes: `['n1', 'n2', 'n3', ...]` — the nodes are highlighted with the constraint color, and edges between consecutive nodes are drawn as dashed lines.
            - A list of edges: `[('n1', 'n2'), ('n2', 'n3'), ...]` — edges are drawn as dashed lines (existing behavior).

            Default is an empty list. There is no association between the subpath colors and the path colors.

        - `draw_options`: dict

            A dictionary with the following keys:

            - `show_graph_edges`: bool

                Whether to show the edges of the graph. Default is `True`.

            - `show_edge_weights`: bool

                Whether to show the edge weights in the graph from the `flow_attr`. Default is `False`.

            - `show_node_weights`: bool

                Whether to show the node weights in the graph from the `flow_attr`. Default is `False`.

            - `show_graph_title`: bool

                Whether to show the graph title (from graph id) in the figure.
                Default is `False`.

            - `show_path_weights`: bool

                Whether to show the path weights in the graph on every edge. Default is `False`.

            - `show_path_weight_on_first_edge`: bool

                Whether to show the path weight on the first edge of the path. Default is `True`.

            - `pathwidth`: float

                The width of the path to be drawn. Default is `3.0`.

            - `style`: str

                The style of the drawing. Available options: `default`, `points`, `sankey`.

                - `default`: Standard graphviz rendering with nodes as rounded rectangles
                - `points`: Graphviz rendering with nodes as points
                - `sankey`: Interactive Sankey diagram using plotly (requires acyclic graph). 
                  Saves as HTML by default (interactive) or static image formats (png, pdf, svg) if kaleido is installed.
                  Automatically displays in Jupyter notebooks.

            - `color_nodes`: bool

                    Whether to use the existing node coloring behavior.
                    If `False` (default), all nodes use a neutral color.
                    If `True`, nodes are colored as before (including `additional_starts`
                    in green and `additional_ends` in red for graphviz styles).

            - `sankey_arrowlen`: float

                Length of arrowheads for Sankey links (Plotly `arrowlen`).
                Default is `0` (no arrowheads).

            - `sankey_color_toggle`: bool

                Whether to add an interactive toggle (buttons) to switch Sankey
                links between colored and monochrome gray.
                Default is `False`.

            - `sankey_arrow_toggle`: bool

                Whether to add an interactive toggle (buttons) to switch Sankey
                link arrowheads on/off.
                Default is `False`.

        """

        if len(paths) != len(weights) and len(weights) > 0:
            raise ValueError(f"{__name__}: Paths and weights must have the same length, if provided.")

        style = draw_options.get("style", "default")

        # Handle Sankey diagram separately
        if style == "sankey":
            # Check if graph is acyclic
            if not nx.is_directed_acyclic_graph(G):
                utils.logger.error(f"{__name__}: Sankey diagram requires an acyclic graph.")
                raise ValueError("Sankey diagram requires an acyclic graph.")

            try:
                sankey_arrowlen = float(draw_options.get("sankey_arrowlen", 0))
            except (TypeError, ValueError):
                utils.logger.error(f"{__name__}: draw_options['sankey_arrowlen'] must be numeric.")
                raise ValueError("draw_options['sankey_arrowlen'] must be numeric.")

            if sankey_arrowlen < 0:
                utils.logger.error(f"{__name__}: draw_options['sankey_arrowlen'] must be >= 0.")
                raise ValueError("draw_options['sankey_arrowlen'] must be >= 0.")

            sankey_color_toggle = bool(draw_options.get("sankey_color_toggle", False))
            sankey_arrow_toggle = bool(draw_options.get("sankey_arrow_toggle", False))
            color_nodes = bool(draw_options.get("color_nodes", False))
            show_graph_title = bool(draw_options.get("show_graph_title", False))
            default_arrowlen_for_toggle = sankey_arrowlen if sankey_arrowlen > 0 else 15.0

            try:
                import plotly.graph_objects as go
            except ImportError:
                utils.logger.error(f"{__name__}: plotly module not found. It should be installed with flowpaths. Try reinstalling: pip install --force-reinstall flowpaths")
                raise ImportError("plotly module not found. It should be installed with flowpaths. Try reinstalling: pip install --force-reinstall flowpaths")

            # Create node list in topological order, with sources and sinks at the end
            # This ordering can help preserve link ordering in the Sankey layout
            topo_order = list(nx.topological_sort(G))
            longest_path_len = nx.algorithms.dag.dag_longest_path_length(G)
            sankey_width = max(900, 500 + 50 * max(1, longest_path_len))

            # Identify sources (in-degree 0) and sinks (out-degree 0)
            sources = [node for node in topo_order if G.in_degree(node) == 0]
            sinks = [node for node in topo_order if G.out_degree(node) == 0]

            # Middle nodes (neither pure source nor pure sink)
            middle_nodes = [node for node in topo_order if node not in sources and node not in sinks]

            # Build node list: middle nodes in topo order, then sources, then sinks
            node_list = middle_nodes + sources + sinks
            node_dict = {node: idx for idx, node in enumerate(node_list)}

            # Define colors for paths (with transparency for blending)
            colors = [
                "rgba(255, 0, 0, 0.4)",      # red
                "rgba(0, 0, 255, 0.4)",      # blue
                "rgba(0, 128, 0, 0.4)",      # green
                "rgba(128, 0, 128, 0.4)",    # purple
                "rgba(165, 42, 42, 0.4)",    # brown
                "rgba(0, 255, 255, 0.4)",    # cyan
                "rgba(255, 255, 0, 0.4)",    # yellow
                "rgba(255, 192, 203, 0.4)",  # pink
                "rgba(128, 128, 128, 0.4)",  # grey
                "rgba(210, 105, 30, 0.4)",   # chocolate
                "rgba(0, 0, 139, 0.4)",      # darkblue
                "rgba(85, 107, 47, 0.4)",    # darkolivegreen
                "rgba(47, 79, 79, 0.4)",     # darkslategray
                "rgba(0, 191, 255, 0.4)",    # deepskyblue
                "rgba(95, 158, 160, 0.4)",   # cadetblue
                "rgba(139, 0, 139, 0.4)",    # darkmagenta
                "rgba(255, 193, 37, 0.4)",   # goldenrod 
            ]

            # Build links with path information to maintain consistent ordering at nodes
            # Structure: list of (source, target, weight, color, path_idx)
            links_with_metadata = []

            for path_idx, path in enumerate(paths):
                path_weight = weights[path_idx] if path_idx < len(weights) else 1
                path_color = colors[path_idx % len(colors)]

                # Add each edge in the path
                for i in range(len(path) - 1):
                    source = node_dict[path[i]]
                    target = node_dict[path[i + 1]]
                    links_with_metadata.append((source, target, path_weight, path_color, path_idx))

            # Sort links by path index to maintain consistent ordering throughout the diagram
            # This ensures edges from the same path appear in the same relative order at all nodes
            links_with_metadata.sort(key=lambda x: x[4])

            # Extract sorted components
            link_sources = [link[0] for link in links_with_metadata]
            link_targets = [link[1] for link in links_with_metadata]
            link_values = [link[2] for link in links_with_metadata]
            link_colors = [link[3] for link in links_with_metadata]

            # Create Sankey diagram
            link_dict = dict(
                source=link_sources,
                target=link_targets,
                value=link_values,
                color=link_colors,
            )
            if sankey_arrowlen > 0:
                link_dict["arrowlen"] = sankey_arrowlen

            node_color = "rgba(99, 110, 120, 0.85)" if not color_nodes else "rgba(31, 119, 180, 0.8)"

            base_fig = go.Figure(data=[go.Sankey(
                node=dict(
                    pad=15,
                    thickness=20,
                    line=dict(color="black", width=0.5),
                    label=[str(node) for node in node_list],
                    color=[node_color] * len(node_list),
                ),
                link=link_dict
            )])

            # Use graph ID as title if available
            graph_id = G.graph.get("id", fpid(G))
            title_text = f"{graph_id}" if show_graph_title and graph_id and graph_id != str(id(G)) else ""

            base_fig.update_layout(
                title_text=title_text,
                font_size=10,
                width=sankey_width,
                height=600,
            )

            fig = go.Figure(base_fig)

            updatemenus = []

            if sankey_color_toggle and len(link_colors) > 0:
                monochrome_colors = ["rgba(150, 150, 150, 0.6)"] * len(link_colors)
                updatemenus.append(
                    dict(
                        type="buttons",
                        direction="left",
                        x=0.0,
                        y=1.12,
                        showactive=True,
                        buttons=[
                            dict(
                                label="Colored links",
                                method="restyle",
                                args=[{"link.color": [link_colors]}],
                            ),
                            dict(
                                label="Monochrome links",
                                method="restyle",
                                args=[{"link.color": [monochrome_colors]}],
                            ),
                        ],
                    )
                )

            if sankey_arrow_toggle:
                updatemenus.append(
                    dict(
                        type="buttons",
                        direction="left",
                        x=0.0,
                        y=1.05,
                        showactive=True,
                        buttons=[
                            dict(
                                label="Arrowheads on",
                                method="restyle",
                                args=[{"link.arrowlen": [default_arrowlen_for_toggle]}],
                            ),
                            dict(
                                label="Arrowheads off",
                                method="restyle",
                                args=[{"link.arrowlen": [0]}],
                            ),
                        ],
                    )
                )

            if len(updatemenus) > 0:
                fig.update_layout(updatemenus=updatemenus)

            # Determine base filename and extension
            file_ext = filename.split('.')[-1].lower() if '.' in filename else ''
            base_filename = filename.rsplit('.', 1)[0] if '.' in filename else filename

            # Always save HTML (interactive version)
            html_filename = base_filename + '.html'
            fig.write_html(html_filename)
            utils.logger.info(f"{__name__}: Sankey diagram (HTML) saved as {html_filename}")

            # Also save static image (PDF by default, or specified format)
            static_format = file_ext if file_ext in ['png', 'pdf', 'svg', 'jpg', 'jpeg'] else 'pdf'
            static_filename = base_filename + '.' + static_format

            try:
                static_fig = go.Figure(base_fig)
                if static_format == 'pdf':
                    static_fig.update_layout(
                        width=sankey_width,
                        height=900,
                    )

                static_fig.write_image(static_filename, format=static_format)
                utils.logger.info(f"{__name__}: Sankey diagram (static) saved as {static_filename}")
            except Exception as e:
                utils.logger.warning(f"{__name__}: Could not save static image. Error: {e}")
                utils.logger.warning(f"{__name__}: Static image export may require additional system dependencies.")

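            # Show the figure when running inside IPython/Jupyter. globals() here is
            # this module's namespace, which never contains get_ipython, so import it.
            try:
                from IPython import get_ipython
                if get_ipython() is not None:
                    fig.show()
            except Exception:
                pass  # IPython missing or not interactive: the files are already saved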

            return

        try:
            import graphviz as gv

            color_nodes = bool(draw_options.get("color_nodes", False))

            dot = gv.Digraph(format="pdf")
            dot.graph_attr["rankdir"] = "LR"  # Display the graph in landscape mode

            # style already extracted above
            if style == "default":
                dot.node_attr["shape"] = "rectangle"  # Rectangle nodes
                dot.node_attr["style"] = "rounded"  # Rounded rectangle nodes
            elif style == "points":
                dot.node_attr["shape"] = "point"  # Point nodes
                dot.node_attr["style"] = "filled"  # Filled point nodes
                # dot.node_attr['label'] = '' 
                dot.node_attr['width'] = '0.1' 

            colors = [
                "red",
                "blue",
                "green",
                "purple",
                "brown",
                "cyan",
                "yellow",
                "pink",
                "grey",
                "chocolate",
                "darkblue",
                "darkolivegreen",
                "darkslategray",
                "deepskyblue2",
                "cadetblue3",
                "darkmagenta",
                "goldenrod1"
            ]

            dot.attr('node', fontname='Arial')

            if draw_options.get("show_graph_edges", True):
                # drawing nodes
                for node in G.nodes():
                    neutral_node_color = "gray40"
                    color = neutral_node_color
                    penwidth = "1.0"
                    if color_nodes:
                        color = "black"
                        if node in additional_starts:
                            color = "green"
                            penwidth = "2.0"
                        elif node in additional_ends:
                            color = "red"
                            penwidth = "2.0"

                    if draw_options.get("show_node_weights", False) and flow_attr is not None and flow_attr in G.nodes[node]:
                        label = f"{G.nodes[node][flow_attr]}\\n{node}" if style != "points" else ""
                        dot.node(
                            name=str(node),
                            label=label,
                            shape="record",
                            color=color, 
                            penwidth=penwidth)
                    else:
                        label = str(node) if style != "points" else ""
                        dot.node(
                            name=str(node),
                            label=label,
                            color=color,
                            penwidth=penwidth)

                # drawing edges
                for u, v, data in G.edges(data=True):
                    if draw_options.get("show_edge_weights", False):
                        dot.edge(
                            tail_name=str(u), 
                            head_name=str(v), 
                            label=str(data.get(flow_attr,"")),
                            fontname="Arial",)
                    else:
                        dot.edge(
                            tail_name=str(u), 
                            head_name=str(v))

                # drawing additional edges as dashed black lines
                for u, v in additional_edges:
                    dot.edge(
                        tail_name=str(u),
                        head_name=str(v),
                        color="black",
                        style="dashed",
                        penwidth="2.0"
                    )

            for index, path in enumerate(paths):
                pathColor = colors[index % len(colors)]
                for i in range(len(path) - 1):
                    show_label = (
                        (i == 0 and draw_options.get("show_path_weight_on_first_edge", True))
                        or draw_options.get("show_path_weights", True)
                    )
                    if show_label:
                        dot.edge(
                            str(path[i]),
                            str(path[i + 1]),
                            fontcolor=pathColor,
                            color=pathColor,
                            penwidth=str(draw_options.get("pathwidth", 3.0)),
                            label=str(weights[index]) if len(weights) > 0 else "",
                            fontname="Arial",
                        )
                    else:
                        dot.edge(
                            str(path[i]),
                            str(path[i + 1]),
                            color=pathColor,
                            penwidth=str(draw_options.get("pathwidth", 3.0)),
                            )
                if len(path) == 1:
                    dot.node(str(path[0]), color=pathColor, penwidth=str(draw_options.get("pathwidth", 3.0)))        

            # Process subpath constraints: auto-detect node-based vs edge-based
            # Build mapping of nodes to constraint colors for node-based constraints
            node_constraint_colors = {}  # node -> list of (index, color) tuples

            for index, constraint in enumerate(subpath_constraints):
                if not constraint:
                    continue

                constraint_color = colors[index % len(colors)]

                # Detect if this constraint is node-based or edge-based
                is_edge_based = isinstance(constraint[0], (tuple, list)) and len(constraint[0]) == 2

                if is_edge_based:
                    # Edge-based constraint: draw dashed edges
                    for i in range(len(constraint)):
                        if len(constraint[i]) != 2:
                            utils.logger.error(f"{__name__}: Subpath edges must be 2-tuples.")
                            raise ValueError("Subpath edges must be 2-tuples.")
                        dot.edge(
                            str(constraint[i][0]),
                            str(constraint[i][1]),
                            color=constraint_color,
                            style="dashed",
                            penwidth="2.0"
                        )
                else:
                    # Node-based constraint: nodes are a sequence
                    # Highlight nodes with constraint color (no dashed edges)
                    for node in constraint:
                        if node not in node_constraint_colors:
                            node_constraint_colors[node] = []
                        node_constraint_colors[node].append((index, constraint_color))

            # Re-draw nodes with constraint colors if any node-based constraints exist
            if node_constraint_colors:
                for node in node_constraint_colors:
                    constraint_list = node_constraint_colors[node]
                    # Use the color of the first constraint this node is in
                    # (or could use a blended approach if desired)
                    first_color = constraint_list[0][1]

                    # Re-draw the node with the constraint color as fillcolor
                    # Preserve node label (including weights) and styling
                    label = str(node) if style != "points" else ""
                    if draw_options.get("show_node_weights", False) and flow_attr is not None and flow_attr in G.nodes[node]:
                        label = f"{G.nodes[node][flow_attr]}\\n{node}" if style != "points" else ""

                    # Keep rounded corners for the default style; every other style is just filled
                    node_style = "rounded,filled" if style == "default" else "filled"

                    dot.node(
                        name=str(node),
                        label=label,
                        color="black",
                        fillcolor=first_color,
                        style=node_style,
                        penwidth="1.5"
                    )

            dot.render(outfile=filename, view=False, cleanup=True)

        except ImportError:
            utils.logger.error(f"{__name__}: graphviz module not found. Please install it via pip (pip install graphviz).")
            raise ImportError("graphviz module not found. Please install it via pip (pip install graphviz).")
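
In the Sankey branch above, the output names come from filename: an interactive .html file is always written, and the static copy keeps the given extension only when it is png, pdf, svg, jpg or jpeg, falling back to pdf otherwise. The stdlib-only helper below restates that rule so you can predict which files a call produces; sankey_output_filenames is a hypothetical name, not part of flowpaths. Keep in mind that the static copy goes through plotly's write_image, which may need extra dependencies (usually the kaleido package); if it fails, draw only logs a warning.

```python
STATIC_FORMATS = ["png", "pdf", "svg", "jpg", "jpeg"]


def sankey_output_filenames(filename):
    """Hypothetical helper: (html_file, static_file) written for a Sankey drawing."""
    # Extension = text after the last '.', lower-cased ('' when there is no '.').
    file_ext = filename.split(".")[-1].lower() if "." in filename else ""
    base = filename.rsplit(".", 1)[0] if "." in filename else filename
    # Unsupported or missing extensions fall back to PDF for the static copy.
    static_format = file_ext if file_ext in STATIC_FORMATS else "pdf"
    return base + ".html", base + "." + static_format
```

For example, filename="simple_graph.pdf" together with "style": "sankey" produces simple_graph.html and simple_graph.pdf.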

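The subpath-constraint code in draw decides, for each constraint and from its first element only, whether it is edge-based (the first element is a 2-tuple or 2-list; drawn as dashed edges in the constraint's color) or node-based (its nodes are filled with the color of the first constraint that contains them). The sketch below isolates that test; classify_subpath_constraint is a hypothetical helper, not a flowpaths function. Because only constraint[0] is inspected, a node-based constraint over nodes that are themselves pairs, such as grid coordinates, is classified as edge-based.

```python
def classify_subpath_constraint(constraint):
    """Hypothetical helper: return "edge", "node", or None for an empty constraint."""
    if not constraint:
        return None  # draw skips empty constraints
    first = constraint[0]
    # Only the first element is inspected: a pair means a list of (u, v) edges.
    if isinstance(first, (tuple, list)) and len(first) == 2:
        return "edge"
    return "node"
```
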
fpid

fpid(G) -> str

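Returns an identifier for the given graph: G.graph["id"] if G is a networkx DiGraph with an "id" graph attribute, otherwise str(id(G)). The fallback is only unique among objects that exist at the same time and may change between runs.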

Source code in flowpaths/utils/graphutils.py
def fpid(G) -> str:
    """
    Returns an identifier for the given graph: G.graph["id"] if set, otherwise str(id(G)).
    """
    if isinstance(G, nx.DiGraph):
        if "id" in G.graph:
            return G.graph["id"]

    return str(id(G))

get_subgraph_between_topological_nodes

get_subgraph_between_topological_nodes(
    graph: DiGraph,
    topo_order: list,
    left: int,
    right: int,
) -> nx.DiGraph

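Create a subgraph containing the nodes at positions left to right - 1 of topo_order (the loop runs over range(left, right), so the node at position right is not included), together with every edge that has at least one endpoint among these nodes; endpoints of such edges lying outside this range are added as well. Node and edge attributes, and graph.graph["id"] if present, are copied. A ValueError is raised unless 0 <= left <= right < len(topo_order).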

Source code in flowpaths/utils/graphutils.py
def get_subgraph_between_topological_nodes(graph: nx.DiGraph, topo_order: list, left: int, right: int) -> nx.DiGraph:
    """
    Create a subgraph with the nodes topo_order[left:right] (position right is excluded),
    including the edges between them, but also the edges from these nodes that are incident to nodes outside this range.
    """

    if left < 0 or right >= len(topo_order):
        utils.logger.error(f"{__name__}: Invalid range for topological order: {left}, {right}.")
        raise ValueError("Invalid range for topological order")
    if left > right:
        utils.logger.error(f"{__name__}: Invalid range for topological order: {left}, {right}.")
        raise ValueError("Invalid range for topological order")

    # Create a subgraph with the nodes between left and right in the topological order
    subgraph = nx.DiGraph()
    if "id" in graph.graph:
        subgraph.graph["id"] = graph.graph["id"]
    for i in range(left, right):
        subgraph.add_node(topo_order[i], **graph.nodes[topo_order[i]])

    fixed_nodes = set(subgraph.nodes())

    # Add the edges between the nodes in the subgraph
    for u, v in graph.edges():
        if u in fixed_nodes or v in fixed_nodes:
            subgraph.add_edge(u, v, **graph[u][v])
            if u not in fixed_nodes:
                subgraph.add_node(u, **graph.nodes[u])
            if v not in fixed_nodes:
                subgraph.add_node(v, **graph.nodes[v])

    return subgraph
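
As a concrete check of which nodes and edges survive, the hypothetical stdlib-only function below (window_subgraph, not part of flowpaths) applies the same selection to a plain edge list: the window is topo_order[left:right], matching range(left, right) in the listing, and an edge with one endpoint outside the window pulls that endpoint in.

```python
def window_subgraph(topo_order, edges, left, right):
    """Hypothetical sketch: return (window_nodes, outside_nodes, kept_edges)."""
    if left < 0 or right >= len(topo_order) or left > right:
        raise ValueError("Invalid range for topological order")
    # Same half-open window as range(left, right) in the listing.
    fixed = set(topo_order[left:right])
    # Keep every edge with at least one endpoint inside the window.
    kept_edges = [(u, v) for (u, v) in edges if u in fixed or v in fixed]
    # Endpoints outside the window are added so the kept edges stay well-defined.
    outside = {node for edge in kept_edges for node in edge} - fixed
    return fixed, outside, kept_edges
```

For topo_order ["s", "a", "b", "c", "t"] with left=1 and right=3, the window is {"a", "b"}; "c", at position 3, enters only as the endpoint of edge ("b", "c").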

max_bottleneck_path

max_bottleneck_path(
    G: DiGraph, flow_attr
) -> tuple

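Computes a maximum bottleneck path in a directed acyclic graph (DAG): a path from a source (in-degree 0) to a sink (out-degree 0) whose smallest edge flow value is as large as possible. The graph must be acyclic, since nodes are processed in topological order.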

Parameters

  • G: nx.DiGraph

    A directed graph where each edge has a flow attribute.

  • flow_attr: str

    Name of the edge attribute that holds the flow values.

Returns

  • tuple: A tuple containing:

    • The value of the maximum bottleneck.
    • The path corresponding to the maximum bottleneck (list of nodes). If the best bottleneck value is 0 (no source-to-sink path carries positive flow), returns (None, None) instead.

Source code in flowpaths/utils/graphutils.py
def max_bottleneck_path(G: nx.DiGraph, flow_attr) -> tuple:
    """
    Computes a maximum bottleneck path in a directed acyclic graph.

    Parameters
    ----------
    - `G`: nx.DiGraph

        A directed graph where each edge has a flow attribute.

    - `flow_attr`: str

        Name of the edge attribute that holds the flow values.

    Returns
    -------

    - tuple: A tuple containing:

        - The value of the maximum bottleneck.
        - The path corresponding to the maximum bottleneck (list of nodes).
            If no s-t flow exists in the network, returns (None, None).
    """
    B = dict()
    maxInNeighbor = dict()
    maxBottleneckSink = None

    # Computing the B values with DP
    for v in nx.topological_sort(G):
        if G.in_degree(v) == 0:
            B[v] = float("inf")
        else:
            B[v] = float("-inf")
            for u in G.predecessors(v):
                uBottleneck = min(B[u], G.edges[u, v][flow_attr])
                if uBottleneck > B[v]:
                    B[v] = uBottleneck
                    maxInNeighbor[v] = u
            if G.out_degree(v) == 0:
                if maxBottleneckSink is None or B[v] > B[maxBottleneckSink]:
                    maxBottleneckSink = v

    # No sink with an incoming edge (e.g. a graph without edges), or no s-t flow in the network
    if maxBottleneckSink is None or B[maxBottleneckSink] == 0:
        return None, None

    # Recovering the path of maximum bottleneck
    reverse_path = [maxBottleneckSink]
    while G.in_degree(reverse_path[-1]) > 0:
        reverse_path.append(maxInNeighbor[reverse_path[-1]])

    return B[maxBottleneckSink], list(reversed(reverse_path))
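
The dynamic program above computes, in topological order, B[v] = max over in-neighbours u of min(B[u], flow(u, v)), with B = infinity at sources, and then walks back from the best sink. Below is a stdlib-only sketch of the same recurrence, assuming a dict of edge flows instead of a networkx DiGraph and using graphlib.TopologicalSorter (Python 3.9+) for the ordering; max_bottleneck_path_sketch is a hypothetical name. Unlike the listing, it also returns (None, None) when no sink has an incoming edge, and ties between equally good in-neighbours may be broken differently.

```python
from graphlib import TopologicalSorter


def max_bottleneck_path_sketch(flows):
    """flows maps each edge (u, v) to its flow value; returns (bottleneck, path)."""
    preds, out_degree = {}, {}
    for u, v in flows:
        preds.setdefault(v, []).append(u)
        preds.setdefault(u, [])
        out_degree[u] = out_degree.get(u, 0) + 1
        out_degree.setdefault(v, 0)
    B, best_pred, best_sink = {}, {}, None
    # static_order() yields every node after all of its predecessors
    # (and raises graphlib.CycleError if the graph has a cycle).
    for v in TopologicalSorter(preds).static_order():
        if not preds[v]:
            B[v] = float("inf")  # sources start with an unbounded bottleneck
            continue
        B[v] = float("-inf")
        for u in preds[v]:
            candidate = min(B[u], flows[(u, v)])
            if candidate > B[v]:
                B[v], best_pred[v] = candidate, u
        if out_degree[v] == 0 and (best_sink is None or B[v] > B[best_sink]):
            best_sink = v
    if best_sink is None or B[best_sink] == 0:
        return None, None
    # Walk back along the best in-neighbours until reaching a source.
    path = [best_sink]
    while path[-1] in best_pred:
        path.append(best_pred[path[-1]])
    return B[best_sink], path[::-1]
```

On the example graph from the top of this page, the recurrence gives a bottleneck of 7 along s, b, c, t.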

max_occurrence

max_occurrence(
    seq,
    paths_in_DAG,
    edge_lengths: dict = {},
) -> int

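Return the largest number of edges of seq that appear together in a single path of paths_in_DAG. If edge_lengths is given, each matching edge counts edge_lengths.get(edge, 1) instead of 1, so the result is a total length.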

This assumes paths_in_DAG are paths in a directed acyclic graph.

Parameters

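  • seq (list): The sequence of edges to check, as (u, v) tuples.
  • paths_in_DAG (list): The list of paths to check against, as lists of nodes.
  • edge_lengths (dict, optional): Maps an edge (u, v) to its length; edges missing from the dict count as 1.

Returns

  • int: the largest number (or total length, when edge_lengths is given) of seq edges that appear in some path in paths_in_DAG
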
Source code in flowpaths/utils/graphutils.py
def max_occurrence(seq, paths_in_DAG, edge_lengths: dict = {}) -> int:
    """
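    Return the largest number of edges of seq that appear together in a single
    path of paths_in_DAG. If edge_lengths is given, each matching edge counts
    edge_lengths.get(edge, 1) instead of 1.

    This assumes paths_in_DAG are paths in a directed acyclic graph.

    Parameters
    ----------
    - seq (list): The sequence of edges to check, as (u, v) tuples.
    - paths_in_DAG (list): The list of paths to check against, as lists of nodes.
    - edge_lengths (dict, optional): Maps an edge to its length; missing edges count as 1.

    Returns
    -------
    - int: the largest number (or total length) of seq edges that appear in some path in paths_in_DAG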
    """
    best_occurrence = 0
    for path in paths_in_DAG:
        path_edges = set([(path[i], path[i + 1]) for i in range(len(path) - 1)])
        # Sum the lengths (default 1) of the seq edges that lie on this path
        occurrence = 0
        for edge in seq:
            if edge in path_edges:
                occurrence += edge_lengths.get(edge, 1)
        if occurrence > best_occurrence:
            best_occurrence = occurrence

    return best_occurrence
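
A compact restatement makes the role of edge_lengths visible: each edge of seq found on a path contributes its length (1 by default), and the best path wins. max_occurrence_sketch is a hypothetical name, not the library function; it uses None instead of a mutable {} default, which behaves the same here because the dict is only read.

```python
def max_occurrence_sketch(seq, paths_in_DAG, edge_lengths=None):
    """Largest total length of seq edges lying on a single path."""
    edge_lengths = edge_lengths or {}
    best = 0
    for path in paths_in_DAG:
        # The edges of a path given as a node list are its consecutive node pairs.
        path_edges = set(zip(path, path[1:]))
        # Sum the lengths of the seq edges this path contains (default length 1).
        score = sum(edge_lengths.get(edge, 1) for edge in seq if edge in path_edges)
        best = max(best, score)
    return best
```

With seq [("a", "b"), ("b", "c"), ("c", "e")] and paths [["a", "b", "c", "d"], ["b", "c", "e"]], both paths contain two seq edges, so the result is 2; giving ("c", "e") length 5 raises it to 6 via the second path.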

read_graph

read_graph(
    graph_raw,
) -> nx.DiGraph

Parse a single graph block from a list of lines.

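Accepts one or more header lines at the beginning (each prefixed by ‘#’), followed by a line containing the number of vertices (n), then any number of edge lines of the form “u v w” (whitespace-separated). Each edge line adds the edge (u, v) with attribute flow = float(w). The first header line that is not a “#S” line becomes G.graph["id"]; for a graph with a non-zero vertex count, G.graph["n"], G.graph["m"] and G.graph["w"] also store the node count, edge count and width.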

Subpath constraint lines

Lines starting with “#S” define a (directed) subpath constraint as a sequence of nodes: “#S n1 n2 n3 …”. For each such line we build the list of consecutive edge tuples [(n1,n2), (n2,n3), …] and append this edge-list (the subpath) to G.graph["constraints"]. Duplicate filtering is applied on the whole node sequence: if an identical sequence of nodes has already appeared in a previous “#S” line, the entire subpath line is ignored (its edges are not added again). Different subpaths may share edges; they are kept as separate entries. After all graph edges are parsed, every constraint edge is validated to ensure it exists in the graph; a missing edge raises ValueError.

Example block

    # graph number = 1 name = foo
    # any other header line
    #S a b c d          (adds subpath [(a,b),(b,c),(c,d)])
    #S b c e            (adds subpath [(b,c),(c,e)])
    #S a b c d          (ignored: exact node sequence already seen)
    5
    a b 1.0
    b c 2.5
    c d 3.0
    c e 4.0

Source code in flowpaths/utils/graphutils.py
def read_graph(graph_raw) -> nx.DiGraph:
    """
    Parse a single graph block from a list of lines.

    Accepts one or more header lines at the beginning (each prefixed by '#'),
    followed by a line containing the number of vertices (n), then any number
    of edge lines of the form: "u v w" (whitespace-separated).

    Subpath constraint lines:
        Lines starting with "#S" define a (directed) subpath constraint as a
        sequence of nodes: "#S n1 n2 n3 ...". For each such line we build the
        list of consecutive edge tuples [(n1,n2), (n2,n3), ...] and append this
        edge-list (the subpath) to G.graph["constraints"]. Duplicate filtering
        is applied on the whole node sequence: if an identical sequence of
        nodes has already appeared in a previous "#S" line, the entire subpath
        line is ignored (its edges are not added again). Different subpaths may
        share edges; they are kept as separate entries. After all graph edges
        are parsed, every constraint edge is validated to ensure it exists in
        the graph; a missing edge raises ValueError.

    Example block:
        # graph number = 1 name = foo
        # any other header line
        #S a b c d          (adds subpath [(a,b),(b,c),(c,d)])
        #S b c e            (adds subpath [(b,c),(c,e)])
        #S a b c d          (ignored: exact node sequence already seen)
        5
        a b 1.0
        b c 2.5
        c d 3.0
        c e 4.0
    """

    # Collect leading header lines (prefixed by '#') and parse constraint lines prefixed by '#S'
    idx = 0
    header_lines = []
    constraint_subpaths = []       # list of subpaths, each a list of (u,v) edge tuples
    subpaths_seen = set()          # set of full node sequences (tuples) to filter duplicate subpaths
    while idx < len(graph_raw) and graph_raw[idx].lstrip().startswith("#"):
        stripped = graph_raw[idx].lstrip()
        # Subpath constraint line: starts with '#S'
        if stripped.startswith("#S"):
            # Remove leading '#S' and split remaining node sequence
            nodes_part = stripped[2:].strip()  # drop '#S'
            if nodes_part:
                nodes_seq = nodes_part.split()
                seq_key = tuple(nodes_seq)
                # Skip if this exact subpath sequence already processed
                if seq_key not in subpaths_seen:
                    subpaths_seen.add(seq_key)
                    edges_list = [(u, v) for u, v in zip(nodes_seq, nodes_seq[1:])]
                    # Only append if there is at least one edge (>=2 nodes)
                    if edges_list:
                        constraint_subpaths.append(edges_list)
        else:
            # Regular header line (remove leading '#') for metadata / id extraction
            header_lines.append(stripped.lstrip("#").strip())
        idx += 1

    # Determine graph id from the first (non-#S) header line if present
    graph_id = header_lines[0] if header_lines else str(id(graph_raw))

    # Skip blank lines before the vertex-count line
    while idx < len(graph_raw) and graph_raw[idx].strip() == "":
        idx += 1

    if idx >= len(graph_raw):
        error_msg = "Graph block missing vertex-count line."
        utils.logger.error(f"{__name__}: {error_msg}")
        raise ValueError(error_msg)
    # Parse number of vertices (kept for information; not used to count edges here)
    try:
        n = int(graph_raw[idx].strip())
    except ValueError:
        utils.logger.error(f"{__name__}: Invalid vertex-count line: {graph_raw[idx].rstrip()}.")
        raise

    idx += 1

    G = nx.DiGraph()
    G.graph["id"] = graph_id
    # Store (possibly empty) list of subpaths (each a list of edge tuples)
    G.graph["constraints"] = constraint_subpaths

    if n == 0:
        utils.logger.info(f"Graph {graph_id} has 0 vertices.")
        return G

    # Parse edges: skip blanks and comment/header lines defensively
    for line in graph_raw[idx:]:
        if not line.strip() or line.lstrip().startswith('#'):
            continue
        elements = line.split()
        if len(elements) != 3:
            utils.logger.error(f"{__name__}: Invalid edge format: {line.rstrip()}")
            raise ValueError(f"Invalid edge format: {line.rstrip()}")
        u, v, w_str = elements
        try:
            w = float(w_str)
        except ValueError:
            utils.logger.error(f"{__name__}: Invalid weight value in edge: {line.rstrip()}")
            raise
        G.add_edge(u.strip(), v.strip(), flow=w)

    # Validate that every constraint edge exists in the graph
    for subpath in constraint_subpaths:
        for (u, v) in subpath:
            if not G.has_edge(u, v):
                utils.logger.error(f"{__name__}: Constraint edge ({u}, {v}) not found in graph {graph_id} edges.")
                raise ValueError(f"Constraint edge ({u}, {v}) not found in graph edges.")

    G.graph["n"] = G.number_of_nodes()
    G.graph["m"] = G.number_of_edges()
    # Lazy import here to avoid circular import at module load time
    from flowpaths import stdigraph as _stdigraph  # type: ignore
    G.graph["w"] = _stdigraph.stDiGraph(G).get_width()

    return G
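
To make the block format concrete, here is a hypothetical stdlib-only parser (parse_graph_block is not part of flowpaths) that follows the same steps as read_graph but returns plain Python values instead of a networkx.DiGraph and skips the width computation. Like the listing, it treats any line beginning with “#S” as a constraint, so a header such as “#Sample 3” would be read as one (and then rejected, since its edge is not in the graph).

```python
def parse_graph_block(lines):
    """Hypothetical sketch: return (graph_id, n, constraints, flows) for one block."""
    idx, headers, constraints, seen = 0, [], [], set()
    # Leading '#' lines: '#S' lines are subpath constraints, the rest are headers.
    while idx < len(lines) and lines[idx].lstrip().startswith("#"):
        stripped = lines[idx].lstrip()
        if stripped.startswith("#S"):
            nodes = stripped[2:].split()
            if nodes and tuple(nodes) not in seen:
                seen.add(tuple(nodes))  # duplicates are detected on the whole sequence
                subpath = list(zip(nodes, nodes[1:]))
                if subpath:  # a single node yields no edge and is dropped
                    constraints.append(subpath)
        else:
            headers.append(stripped.lstrip("#").strip())
        idx += 1
    # The library falls back to str(id(...)) when there is no header; None here.
    graph_id = headers[0] if headers else None
    # Skip blank lines, then read the vertex count.
    while idx < len(lines) and not lines[idx].strip():
        idx += 1
    if idx >= len(lines):
        raise ValueError("Graph block missing vertex-count line.")
    n = int(lines[idx])
    flows = {}
    for line in lines[idx + 1:]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        u, v, w = line.split()  # raises ValueError unless there are exactly 3 fields
        flows[(u, v)] = float(w)
    # Every constraint edge must be an edge of the graph.
    for subpath in constraints:
        for edge in subpath:
            if edge not in flows:
                raise ValueError(f"Constraint edge {edge} not found in graph edges.")
    return graph_id, n, constraints, flows
```
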

read_graphs

read_graphs(filename)

Read one or more graphs from a file.

Supports graphs whose header consists of one or multiple consecutive lines prefixed by ‘#’. Each graph block is:

  • one or more header lines starting with ‘#’
  • one line with the number of vertices (n)
  • zero or more edge lines “u v w”

Graphs are delimited by the start of the next header (a line starting with ‘#’) or the end of file.

Source code in flowpaths/utils/graphutils.py
def read_graphs(filename):
    """
    Read one or more graphs from a file.

    Supports graphs whose header consists of one or multiple consecutive lines
    prefixed by '#'. Each graph block is:
        - one or more header lines starting with '#'
        - one line with the number of vertices (n)
        - zero or more edge lines "u v w"

    Graphs are delimited by the start of the next header (a line starting with '#')
    or the end of file.
    """
    with open(filename, "r") as f:
        lines = f.readlines()

    graphs = []
    n_lines = len(lines)
    i = 0

    # Iterate through the file, capturing blocks that start with one or more '#' lines
    while i < n_lines:
        # Move to the start of the next graph header
        while i < n_lines and not lines[i].lstrip().startswith('#'):
            i += 1
        if i >= n_lines:
            break

        start = i

        # Consume all consecutive header lines for this graph
        while i < n_lines and lines[i].lstrip().startswith('#'):
            i += 1

        # Advance until the next header line (start of next graph) or EOF
        j = i
        while j < n_lines and not lines[j].lstrip().startswith('#'):
            j += 1

        graphs.append(read_graph(lines[start:j]))
        i = j

    return graphs
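
The splitting loop in read_graphs can be restated without any file I/O; split_graph_blocks below is a hypothetical stdlib-only helper operating on a list of lines. Lines before the first header are skipped, and, because blocks are delimited by ‘#’ lines, a ‘#’ line placed after a block's edge lines starts a new block instead of being skipped, so in practice comments should stay in the header.

```python
def split_graph_blocks(lines):
    """Hypothetical sketch: split a list of lines into per-graph blocks."""
    blocks, i, n = [], 0, len(lines)
    while i < n:
        # Skip anything before the next header line.
        while i < n and not lines[i].lstrip().startswith("#"):
            i += 1
        if i >= n:
            break
        start = i
        # Consume the run of consecutive header lines.
        while i < n and lines[i].lstrip().startswith("#"):
            i += 1
        # The block extends until the next header line or the end of input.
        j = i
        while j < n and not lines[j].lstrip().startswith("#"):
            j += 1
        blocks.append(lines[start:j])
        i = j
    return blocks
```
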

read_intron_graph

read_intron_graph(
    graph_dir,
) -> nx.DiGraph

Read one node-weighted graph from a folder produced in the intron-graph TSV format.

Expected files inside graph_dir (vertices.tsv and edges.tsv are required; a missing one raises ValueError):

  • vertices.tsv: node list. Each row becomes one graph node, with:

    • node id from vertex_id
    • node weight stored in G.nodes[node]["flow"] from the weight column
    • extra metadata copied from the row (type, chr, start, end)
  • edges.tsv: directed graph edges. The u and v columns define edges, and both endpoints must appear in vertices.tsv. If a weight column is present and non-empty, it is stored on the edge as flow.
  • read_subpaths.tsv (optional): subpath constraints. Each non-empty path_simple value is parsed as a comma-separated node list and appended to G.graph["constraints"]; exact duplicates are skipped.
  • paths.tsv (optional): ground-truth transcript paths. For each row, path_simple is parsed as a node list and stored in:

    • G.graph["groundtruth_paths_nodes"] (list of node lists)
    • G.graph["groundtruth_paths_edges"] (list of edge lists between consecutive nodes)

    and count_scaled, parsed as an integer, is stored in G.graph["groundtruth_weights"].
  • ref_edges.tsv (optional): reference edge metadata. Rows whose u_id and v_id are both set (not empty and not "*") are collected in two groups, skipping duplicates:

    • status == "in_graph" -> G.graph["reference_edges"]
    • status == "missing_edge" -> G.graph["additional_edges"]

The returned graph uses node weights (flow) in the same spirit as read_ngraph. It also stores:

  • G.graph["id"]: folder name
  • G.graph["source_folder"]: absolute folder path
  • G.graph["constraints"]: list of node lists from read_subpaths.tsv:path_simple
  • G.graph["groundtruth_paths_nodes"]: list of node lists from paths.tsv:path_simple
  • G.graph["groundtruth_paths_edges"]: list of edge lists induced by groundtruth_paths_nodes
  • G.graph["groundtruth_weights"]: list of count_scaled values from paths.tsv
  • G.graph["reference_edges"]: list of in-graph (u_id, v_id) pairs from ref_edges.tsv
  • G.graph["additional_edges"]: list of missing-edge (u_id, v_id) pairs from ref_edges.tsv
  • G.graph["n"], G.graph["m"], G.graph["w"]: node count, edge count, width

Source code in flowpaths/utils/graphutils.py
def read_intron_graph(graph_dir) -> nx.DiGraph:
    """
    Read one node-weighted graph from a folder produced in the intron-graph TSV format.

    Expected files inside `graph_dir`:
    - `vertices.tsv`: node list. Each row becomes one graph node, with:
      - node id from `vertex_id`
      - node weight stored in `G.nodes[node]["flow"]` from the `weight` column
      - extra metadata copied from the row (`type`, `chr`, `start`, `end`)
    - `edges.tsv`: directed graph edges. The `u` and `v` columns define edges.
      If a third `weight` column is present, it is stored on the edge as `flow`.
    - `read_subpaths.tsv` (optional): subpath constraints. Each non-empty `path_simple`
      value is parsed as a comma-separated node list and appended to
      `G.graph["constraints"]`.
    - `paths.tsv` (optional): ground-truth transcript paths. For each row,
      `path_simple` is parsed as a node list and stored in:
      - `G.graph["groundtruth_paths_nodes"]` (list of node lists)
      - `G.graph["groundtruth_paths_edges"]` (list of edge lists between consecutive nodes)
      and `count_scaled` is stored in `G.graph["groundtruth_weights"]`.
    - `ref_edges.tsv` (optional): reference edge metadata. Rows with concrete
      `u_id`, `v_id` values are collected in two groups:
      - `status == "in_graph"` -> `G.graph["reference_edges"]`
      - `status == "missing_edge"` -> `G.graph["additional_edges"]`

    The returned graph uses node weights (`flow`) in the same spirit as `read_ngraph`.
    It also stores:
    - `G.graph["id"]`: folder name
    - `G.graph["source_folder"]`: absolute folder path
    - `G.graph["constraints"]`: list of node lists from `read_subpaths.tsv:path_simple`
    - `G.graph["groundtruth_paths_nodes"]`: list of node lists from `paths.tsv:path_simple`
    - `G.graph["groundtruth_paths_edges"]`: list of edge lists induced by `groundtruth_paths_nodes`
    - `G.graph["groundtruth_weights"]`: list of `count_scaled` values from `paths.tsv`
    - `G.graph["reference_edges"]`: list of in-graph `(u_id, v_id)` pairs from `ref_edges.tsv`
    - `G.graph["additional_edges"]`: list of missing-edge `(u_id, v_id)` pairs from `ref_edges.tsv`
    - `G.graph["n"]`, `G.graph["m"]`, `G.graph["w"]`: node count, edge count, width
    """

    graph_path = Path(graph_dir)
    if not graph_path.is_dir():
        utils.logger.error(f"{__name__}: Graph directory not found: {graph_path}")
        raise ValueError(f"Graph directory not found: {graph_path}")

    vertices_path = graph_path / "vertices.tsv"
    edges_path = graph_path / "edges.tsv"
    read_subpaths_path = graph_path / "read_subpaths.tsv"
    paths_path = graph_path / "paths.tsv"
    ref_edges_path = graph_path / "ref_edges.tsv"

    if not vertices_path.is_file():
        utils.logger.error(f"{__name__}: Missing vertices.tsv in {graph_path}")
        raise ValueError(f"Missing vertices.tsv in {graph_path}")
    if not edges_path.is_file():
        utils.logger.error(f"{__name__}: Missing edges.tsv in {graph_path}")
        raise ValueError(f"Missing edges.tsv in {graph_path}")

    G = nx.DiGraph()
    G.graph["id"] = graph_path.name
    G.graph["source_folder"] = str(graph_path.resolve())
    G.graph["constraints"] = []
    G.graph["groundtruth_paths_nodes"] = []
    G.graph["groundtruth_paths_edges"] = []
    G.graph["groundtruth_weights"] = []
    G.graph["reference_edges"] = []
    G.graph["additional_edges"] = []

    for row in _read_tsv_rows(vertices_path):
        node_id = row["vertex_id"].strip()
        try:
            weight = float(row["weight"])
        except (TypeError, ValueError):
            utils.logger.error(f"{__name__}: Invalid node weight in {vertices_path}: {row}")
            raise

        G.add_node(
            node_id,
            flow=weight,
            type=row.get("type"),
            chr=row.get("chr"),
            start=row.get("start"),
            end=row.get("end"),
        )

    for row in _read_tsv_rows(edges_path):
        u = row["u"].strip()
        v = row["v"].strip()
        if u not in G.nodes or v not in G.nodes:
            utils.logger.error(
                f"{__name__}: Edge ({u}, {v}) references unknown node in {graph_path}"
            )
            raise ValueError(f"Edge ({u}, {v}) references unknown node in {graph_path}")

        edge_attrs = {}
        if row.get("weight") not in [None, ""]:
            try:
                edge_attrs["flow"] = float(row["weight"])
            except ValueError:
                utils.logger.error(f"{__name__}: Invalid edge weight in {edges_path}: {row}")
                raise
        G.add_edge(u, v, **edge_attrs)

    constraints = []
    constraints_seen = set()
    if read_subpaths_path.is_file():
        for row in _read_tsv_rows(read_subpaths_path):
            nodes = _parse_path_simple_nodes(row.get("path_simple", ""))
            if len(nodes) == 0:
                continue
            if not all(node in G.nodes for node in nodes):
                missing_nodes = [node for node in nodes if node not in G.nodes]
                utils.logger.error(
                    f"{__name__}: Constraint references unknown nodes {missing_nodes} in {graph_path}"
                )
                raise ValueError(f"Constraint references unknown nodes {missing_nodes} in {graph_path}")

            key = tuple(nodes)
            if key in constraints_seen:
                continue
            constraints_seen.add(key)
            constraints.append(nodes)
    G.graph["constraints"] = constraints

    groundtruth_paths_nodes = []
    groundtruth_paths_edges = []
    groundtruth_weights = []
    if paths_path.is_file():
        for row in _read_tsv_rows(paths_path):
            nodes = _parse_path_simple_nodes(row.get("path_simple", ""))
            if len(nodes) > 0 and not all(node in G.nodes for node in nodes):
                missing_nodes = [node for node in nodes if node not in G.nodes]
                utils.logger.error(
                    f"{__name__}: Groundtruth path references unknown nodes {missing_nodes} in {graph_path}"
                )
                raise ValueError(
                    f"Groundtruth path references unknown nodes {missing_nodes} in {graph_path}"
                )

            try:
                weight = int(row.get("count_scaled", 0))
            except (TypeError, ValueError):
                utils.logger.error(f"{__name__}: Invalid count_scaled value in {paths_path}: {row}")
                raise

            groundtruth_paths_nodes.append(nodes)
            groundtruth_paths_edges.append(_nodes_to_edges(nodes))
            groundtruth_weights.append(weight)

    G.graph["groundtruth_paths_nodes"] = groundtruth_paths_nodes
    G.graph["groundtruth_paths_edges"] = groundtruth_paths_edges
    G.graph["groundtruth_weights"] = groundtruth_weights

    reference_edges = []
    reference_edges_seen = set()
    additional_edges = []
    additional_edges_seen = set()
    if ref_edges_path.is_file():
        for row in _read_tsv_rows(ref_edges_path):
            u = (row.get("u_id") or "").strip()
            v = (row.get("v_id") or "").strip()
            if u in ["", "*"] or v in ["", "*"]:
                continue
            if u not in G.nodes or v not in G.nodes:
                utils.logger.error(
                    f"{__name__}: Reference edge ({u}, {v}) references unknown node in {graph_path}"
                )
                raise ValueError(f"Reference edge ({u}, {v}) references unknown node in {graph_path}")

            edge = (u, v)
            status = row.get("status")

            if status == "in_graph":
                if edge in reference_edges_seen:
                    continue
                reference_edges_seen.add(edge)
                reference_edges.append(edge)
            elif status == "missing_edge":
                if edge in additional_edges_seen:
                    continue
                additional_edges_seen.add(edge)
                additional_edges.append(edge)

    G.graph["reference_edges"] = reference_edges
    G.graph["additional_edges"] = additional_edges
    G.graph["n"] = G.number_of_nodes()
    G.graph["m"] = G.number_of_edges()
    from flowpaths import stdigraph as _stdigraph  # type: ignore
    G.graph["w"] = _stdigraph.stDiGraph(G).get_width()

    return G
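The reference-edge step at the end of this function keeps only rows whose endpoints are neither empty nor `*`. It then routes each edge by its `status` column: `in_graph` edges go to `G.graph["reference_edges"]` and `missing_edge` edges go to `G.graph["additional_edges"]`. Repeats are dropped and first-seen order is kept. The standalone sketch below mirrors that rule on an in-memory table. It assumes the file is tab-separated with a header row, because the helper `_read_tsv_rows` is not shown here. The function name `split_reference_edges` and the sample rows are hypothetical. Unlike the library, the sketch does not check that the endpoints exist in the graph.

```python
import csv
import io


def split_reference_edges(tsv_text):
    """Hypothetical helper mirroring the reference-edge rules of read_intron_graph.

    Assumes a tab-separated table with a header row containing u_id, v_id and status.
    """
    reference_edges, additional_edges = [], []
    seen_ref, seen_add = set(), set()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        u = (row.get("u_id") or "").strip()
        v = (row.get("v_id") or "").strip()
        # Rows with an empty or wildcard endpoint are skipped.
        if u in ("", "*") or v in ("", "*"):
            continue
        edge = (u, v)
        status = row.get("status")
        if status == "in_graph" and edge not in seen_ref:
            seen_ref.add(edge)
            reference_edges.append(edge)
        elif status == "missing_edge" and edge not in seen_add:
            seen_add.add(edge)
            additional_edges.append(edge)
        # Any other status value is ignored.
    return reference_edges, additional_edges


sample = (
    "u_id\tv_id\tstatus\n"
    "1\t2\tin_graph\n"
    "1\t2\tin_graph\n"
    "2\t4\tmissing_edge\n"
    "*\t3\tin_graph\n"
    "3\t4\tother\n"
)
ref, extra = split_reference_edges(sample)
print(ref)    # [('1', '2')]
print(extra)  # [('2', '4')]
```

The duplicate `1 → 2` row, the wildcard row and the row with an unknown status all leave the result unchanged.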

read_intron_graphs

read_intron_graphs(
    foldername,
) -> list

Read all intron-format graph folders inside foldername.

Behavior:
  • If foldername itself contains vertices.tsv, it is parsed as one graph folder and a single-element list is returned.
  • Otherwise, every immediate child directory containing vertices.tsv is parsed, and other children are ignored.

Returns a list of graphs in lexicographic folder-name order. Raises ValueError if foldername is not an existing directory.

Source code in flowpaths/utils/graphutils.py
def read_intron_graphs(foldername) -> list:
    """
    Read all intron-format graph folders inside `foldername`.

    Behavior:
    - If `foldername` itself contains `vertices.tsv`, it is parsed as one graph folder.
    - Otherwise, every immediate child directory containing `vertices.tsv` is parsed.

    Returns a list of graphs in lexicographic folder-name order.
    """

    folder_path = Path(foldername)
    if not folder_path.is_dir():
        utils.logger.error(f"{__name__}: Folder not found: {folder_path}")
        raise ValueError(f"Folder not found: {folder_path}")

    if (folder_path / "vertices.tsv").is_file():
        return [read_intron_graph(folder_path)]

    graph_dirs = sorted(
        child for child in folder_path.iterdir() if child.is_dir() and (child / "vertices.tsv").is_file()
    )
    return [read_intron_graph(graph_dir) for graph_dir in graph_dirs]
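The folder-selection rule of read_intron_graphs can be checked without parsing any graph. The sketch below is a hypothetical helper, `intron_graph_dirs`, that reproduces the same directory logic with pathlib and returns the folders that would be passed to read_intron_graph. The folder names in the usage part are invented. Only the presence of vertices.tsv is tested, not its contents.

```python
import tempfile
from pathlib import Path


def intron_graph_dirs(foldername):
    """Hypothetical helper: list the folders read_intron_graphs would parse, in order."""
    folder_path = Path(foldername)
    if not folder_path.is_dir():
        raise ValueError(f"Folder not found: {folder_path}")
    # A folder that holds vertices.tsv itself is treated as a single graph folder.
    if (folder_path / "vertices.tsv").is_file():
        return [folder_path]
    # Otherwise take every immediate child directory that holds vertices.tsv, sorted.
    return sorted(
        child
        for child in folder_path.iterdir()
        if child.is_dir() and (child / "vertices.tsv").is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["gene_b", "gene_a", "notes"]:
        (root / name).mkdir()
    (root / "gene_a" / "vertices.tsv").touch()
    (root / "gene_b" / "vertices.tsv").touch()
    names = [p.name for p in intron_graph_dirs(root)]
    print(names)  # ['gene_a', 'gene_b']
    single = intron_graph_dirs(root / "gene_a")
    print([p.name for p in single])  # ['gene_a']
```

The `notes` folder is skipped because it has no vertices.tsv. All children share the same parent, so sorting the Path objects orders them by folder name.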

read_ngraph

read_ngraph(
    graph_raw,
) -> nx.DiGraph

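Parse a single node-weighted ngraph block from a list of lines.

Expected block structure:
  • one or more leading header lines starting with ‘#’ (optional #S constraints can appear here); the first header that is not an #S line becomes G.graph[‘id’], with the ‘#’ removed
  • one line with the number of nodes n
  • a marker line starting with ‘#NODES’
  • exactly n node lines: “node_id node_weight”
  • a marker line starting with ‘#EDGES’
  • zero or more edge lines: “u v edge_weight”

Node and edge weights are parsed as floats and stored in the flow attribute.

Constraint lines:
  • ‘#S n1 n2 n3 …’ lines define subpath constraints. They may appear among the header lines or after the #EDGES marker.
  • Duplicates are filtered by exact node sequence, and sequences with fewer than two nodes are dropped.
  • Every constraint node must exist in the graph, but consecutive constraint nodes need not be joined by an edge.
  • Constraints are stored in G.graph[‘constraints’] as node lists.
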
Source code in flowpaths/utils/graphutils.py
def read_ngraph(graph_raw) -> nx.DiGraph:
    """
    Parse a single node-weighted ngraph block from a list of lines.

    Expected block structure:
        - one or more leading header lines starting with '#'
          (optional #S constraints can appear here)
        - one line with the number of nodes n
        - a marker line starting with '#NODES'
        - exactly n node lines: "node_id node_weight"
        - a marker line starting with '#EDGES'
        - zero or more edge lines: "u v edge_weight"

    Constraint lines:
        - '#S n1 n2 n3 ...' lines define subpath constraints.
        - Duplicates are filtered by exact node sequence.
        - Constraints are stored in G.graph['constraints'] as node lists.
    """

    idx = 0
    header_lines = []
    constraint_subpaths = []
    subpaths_seen = set()

    # Parse leading header lines and #S constraints.
    while idx < len(graph_raw) and graph_raw[idx].lstrip().startswith("#"):
        stripped = graph_raw[idx].lstrip()
        if stripped.startswith("#S"):
            nodes_part = stripped[2:].strip()
            if nodes_part:
                nodes_seq = nodes_part.split()
                seq_key = tuple(nodes_seq)
                if seq_key not in subpaths_seen:
                    subpaths_seen.add(seq_key)
                    if len(nodes_seq) >= 2:
                        constraint_subpaths.append(nodes_seq)
        else:
            header_lines.append(stripped.lstrip("#").strip())
        idx += 1

    graph_id = header_lines[0] if header_lines else str(id(graph_raw))

    while idx < len(graph_raw) and graph_raw[idx].strip() == "":
        idx += 1

    if idx >= len(graph_raw):
        error_msg = "ngraph block missing node-count line."
        utils.logger.error(f"{__name__}: {error_msg}")
        raise ValueError(error_msg)

    try:
        n = int(graph_raw[idx].strip())
    except ValueError:
        utils.logger.error(f"{__name__}: Invalid ngraph node-count line: {graph_raw[idx].rstrip()}.")
        raise

    idx += 1
    while idx < len(graph_raw) and graph_raw[idx].strip() == "":
        idx += 1

    if idx >= len(graph_raw) or not graph_raw[idx].lstrip().startswith("#NODES"):
        error_msg = "ngraph block missing #NODES section marker."
        utils.logger.error(f"{__name__}: {error_msg}")
        raise ValueError(error_msg)
    idx += 1

    G = nx.DiGraph()
    G.graph["id"] = graph_id
    G.graph["constraints"] = constraint_subpaths

    # Read exactly n node lines.
    nodes_read = 0
    while idx < len(graph_raw) and nodes_read < n:
        line = graph_raw[idx].strip()
        idx += 1
        if line == "":
            continue
        if line.lstrip().startswith("#"):
            utils.logger.error(f"{__name__}: Unexpected comment in #NODES section: {line}")
            raise ValueError(f"Unexpected comment in #NODES section: {line}")
        elements = line.split()
        if len(elements) != 2:
            utils.logger.error(f"{__name__}: Invalid node format in ngraph: {line}")
            raise ValueError(f"Invalid node format in ngraph: {line}")
        node_id, weight_str = elements
        try:
            weight = float(weight_str)
        except ValueError:
            utils.logger.error(f"{__name__}: Invalid node weight in ngraph: {line}")
            raise
        G.add_node(node_id.strip(), flow=weight)
        nodes_read += 1

    if nodes_read != n:
        error_msg = f"ngraph node section ended early: expected {n}, read {nodes_read}."
        utils.logger.error(f"{__name__}: {error_msg}")
        raise ValueError(error_msg)

    while idx < len(graph_raw) and graph_raw[idx].strip() == "":
        idx += 1

    if idx >= len(graph_raw) or not graph_raw[idx].lstrip().startswith("#EDGES"):
        error_msg = "ngraph block missing #EDGES section marker."
        utils.logger.error(f"{__name__}: {error_msg}")
        raise ValueError(error_msg)
    idx += 1

    # Parse edges until the end of the block.
    for line in graph_raw[idx:]:
        stripped = line.strip()
        if not stripped:
            continue

        if line.lstrip().startswith("#"):
            comment = line.lstrip()
            # Allow additional #S lines after #EDGES for flexibility.
            if comment.startswith("#S"):
                nodes_part = comment[2:].strip()
                if nodes_part:
                    nodes_seq = nodes_part.split()
                    seq_key = tuple(nodes_seq)
                    if seq_key not in subpaths_seen:
                        subpaths_seen.add(seq_key)
                        if len(nodes_seq) >= 2:
                            constraint_subpaths.append(nodes_seq)
            continue

        elements = stripped.split()
        if len(elements) != 3:
            utils.logger.error(f"{__name__}: Invalid edge format in ngraph: {line.rstrip()}")
            raise ValueError(f"Invalid edge format in ngraph: {line.rstrip()}")

        u, v, w_str = elements
        try:
            w = float(w_str)
        except ValueError:
            utils.logger.error(f"{__name__}: Invalid edge weight in ngraph: {line.rstrip()}")
            raise

        if u not in G.nodes or v not in G.nodes:
            utils.logger.error(
                f"{__name__}: Edge ({u}, {v}) references unknown node in graph {graph_id}."
            )
            raise ValueError(f"Edge ({u}, {v}) references unknown node in ngraph.")

        G.add_edge(u.strip(), v.strip(), flow=w)

    # For ngraph, constraints can encode node-pair evidence (MultiTrans R),
    # which is not necessarily an existing edge. Validate only node existence.
    for subpath in constraint_subpaths:
        for node_id in subpath:
            if node_id not in G.nodes:
                utils.logger.error(
                    f"{__name__}: Constraint references unknown node {node_id} in ngraph {graph_id}."
                )
                raise ValueError(f"Constraint references unknown node {node_id}.")

    G.graph["n"] = G.number_of_nodes()
    G.graph["m"] = G.number_of_edges()
    from flowpaths import stdigraph as _stdigraph  # type: ignore
    G.graph["w"] = _stdigraph.stDiGraph(G).get_width()

    return G
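When preparing input for read_ngraph, it can help to generate blocks programmatically. The sketch below is a hypothetical writer, `write_ngraph_block`, which is not part of flowpaths. It emits the layout described above: a `#` header used as the graph id, optional `#S` constraint lines, the node count, then the `#NODES` and `#EDGES` sections. It assumes node ids contain no whitespace, because read_ngraph splits each line on whitespace.

```python
def write_ngraph_block(graph_id, node_weights, edges, constraints=()):
    """Hypothetical helper: serialize a node-weighted graph in the ngraph block layout.

    node_weights: dict mapping node_id -> weight
    edges: iterable of (u, v, weight) triples
    constraints: iterable of node-id sequences, written as #S lines
    """
    lines = [f"# {graph_id}"]
    # #S constraint lines are placed among the leading header lines.
    for subpath in constraints:
        lines.append("#S " + " ".join(subpath))
    lines.append(str(len(node_weights)))
    lines.append("#NODES")
    for node_id, weight in node_weights.items():
        lines.append(f"{node_id} {weight}")
    lines.append("#EDGES")
    for u, v, weight in edges:
        lines.append(f"{u} {v} {weight}")
    # read_ngraph takes a list of lines, such as the output of file.readlines().
    return [line + "\n" for line in lines]


block = write_ngraph_block(
    "example",
    {"s": 10, "a": 6, "b": 4, "t": 10},
    [("s", "a", 6), ("s", "b", 4), ("a", "t", 6), ("b", "t", 4)],
    constraints=[("s", "a", "t")],
)
print("".join(block), end="")
```

Given the parser above, passing `block` to read_ngraph should produce a graph with id `example`, float `flow` values on nodes and edges, and `G.graph['constraints'] == [['s', 'a', 't']]`.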

read_ngraphs

read_ngraphs(filename)

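Read one or more ngraph blocks from a file and return them as a list of graphs.

Graph blocks are delimited by lines starting with ‘# graph’ or ‘#graph’ (case-insensitive). Each block runs from its delimiter line up to the next delimiter, and lines before the first delimiter are ignored. If no such delimiter exists, the whole file is parsed as one ngraph block.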

Source code in flowpaths/utils/graphutils.py
def read_ngraphs(filename):
    """
    Read one or more ngraph blocks from a file.

    Graph blocks are delimited by lines starting with '# graph' (case-insensitive).
    If no such delimiter exists, the whole file is parsed as one ngraph block.
    """

    with open(filename, "r") as f:
        lines = f.readlines()

    starts = []
    for i, line in enumerate(lines):
        stripped = line.lstrip().lower()
        if stripped.startswith("# graph") or stripped.startswith("#graph"):
            starts.append(i)

    if len(starts) == 0:
        return [read_ngraph(lines)]

    graphs = []
    for idx, start in enumerate(starts):
        end = starts[idx + 1] if idx + 1 < len(starts) else len(lines)
        graphs.append(read_ngraph(lines[start:end]))

    return graphs
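The delimiter handling in read_ngraphs can be illustrated on its own. The hypothetical helper `split_ngraph_blocks` below applies the same rule to a list of lines. Each block starts at a line beginning with `# graph` or `#graph` (case-insensitive) and ends just before the next one. When no delimiter is found, the whole input becomes one block. The sample text is invented.

```python
def split_ngraph_blocks(lines):
    """Hypothetical helper mirroring how read_ngraphs splits a file into blocks."""
    starts = [
        i
        for i, line in enumerate(lines)
        if line.lstrip().lower().startswith(("# graph", "#graph"))
    ]
    # Without any delimiter the whole input is a single block.
    if not starts:
        return [lines]
    # Each block runs from its delimiter up to the next delimiter (or the end).
    ends = starts[1:] + [len(lines)]
    return [lines[start:end] for start, end in zip(starts, ends)]


text = """# graph 1
2
#NODES
a 1
b 1
#EDGES
a b 1
# Graph 2
1
#NODES
c 3
#EDGES
"""
blocks = split_ngraph_blocks(text.splitlines(keepends=True))
print(len(blocks))           # 2
print(blocks[1][0].strip())  # # Graph 2
```

As in read_ngraphs, any lines before the first delimiter belong to no block and are silently discarded.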