An Introduction to Flows!

(Yu Peng here)

I'll be going through a pretty cool concept in graph theory: flows! I'll start off by explaining what a flow network is, then go on to demonstrate how we can make use of flows to easily prove some theorems in graph theory, for example, Hall's Marriage Theorem. Finally, we'll look at an olympiad problem.

Preferably, you should have some background knowledge in graph theory. In particular, you should know what are undirected and directed graphs, bipartite graphs, as well as some idea of connectivity in graphs. If you don't, just google these terms, the definitions shouldn't take too long to read :)

Definition

A flow network is a directed graph with two special vertices: a source $s$ and a sink $t$. All edges incident to $s$ must be out-edges, and all edges incident to $t$ must be in-edges.

Each edge $u \rightarrow v$ has a capacity $c(u, v) \in \mathbb{R}_{\geq 0}$, and has a flow $f(u, v) \in \mathbb{R}_{\geq 0}$ associated with it, such that the following two conditions are satisfied:

  • $f(u, v) \leq c(u, v)$ for every edge $u \rightarrow v$ in the graph.
  • (Flow conservation) For every vertex $u \neq s, t$, the total flow into the vertex must be the same as the total flow out of the vertex. Formally, we have $$ \sum_{u\rightarrow v} f(u, v) = \sum_{w\rightarrow u} f(w, u)$$

The size of a flow (we shall call it $|f|$) is $$|f| = \sum_{s\rightarrow v} f(s, v) = \sum_{v\rightarrow t} f(v, t)$$ The second equality should be intuitively true (since flow can't just disappear or appear in the middle), and can be proven by the flow conservation condition.

Non-mathematical interpretation

Picture the graph as a network of water pipes, where each directed edge is a pipe allowing water to flow in the direction of the edge. Do you see how this can be related to a flow network?

The capacity of the edge is then the maximum volume of water that can flow through per unit time, and the flow in the edge is then the volume of water flowing through per unit time. The size of the flow is then just the volume of water flowing into the source and out of the sink per unit time.

Applications

Flow networks can be used to model various problems, such as bipartite matching and assignment problems with constraints.

Bipartite Graphs

Take a bipartite graph $(A, B, E)$ where $A, B$ are the two groups of vertices and $E$ is the set of edges between vertices in $A$ and vertices in $B$. A matching $M \subseteq E$ is a set of edges such no two edges in $M$ share a vertex.

We can model a bipartite graph as a flow network with source $s$ and sink $t$:

  • For all $a \in A$, $b \in B$ connected by an edge, have an edge $a \rightarrow b$ with capacity 1.
  • For all $a \in A,$ have an edge $s\rightarrow a$ with capacity 1.
  • For all $b \in B,$ have an edge $b\rightarrow t$ with capacity 1.

Now given a matching $M$, we can choose the flow values:

  • For all $a \in A$, $b \in B$ connected by an edge, $f(a, b) = 1$ if the edge $a - b \in M$, 0 otherwise.
  • For all $a \in A,$ $f(s, a) = 1$ if $a$ is incident to some edge in the matching, 0 otherwise.
  • For all $b \in B,$ $f(b, t) = 1$ if $b$ is incident to some edge in the matching, 0 otherwise.

Here's a picture to demonstrate this (directions omitted):

The coloured edges show the edges with flow 1, and we see that the edges across the middle form a matching (although not maximum).

Thus, we can get a flow of size $|M|$ from a matching $M$.

If we restrict the flow in each edge to only have integral values i.e. 0 or 1 in this case, then a flow of size $m$ will correspond to a matching of size $m$.

Thus, we can use a flow network to model a bipartite matching problem! In particular, finding a matching with maximum size is equivalent to finding a flow with maximum size in the graph we created.

However, after we get a flow network, how do we use it to prove anything?

Max-Flow Min-Cut Theorem

We first define what a cut is. A cut $(S, T)$ is a partition of the vertices such that $s \in S$ and $t \in T$. The capacity of a cut is $$ c(S, T) = \sum_{u\in S, v\in T, u \rightarrow v} c(u, v) $$

Now from the name of the theorem, you might guess that there is some relation between the maximum flow and the minimum cut. So how is the quantity above related to the size of the flow?

Since removing all edges from $S$ to $T$ prevents the flow from travelling from $S$ to $T$, the capacity of a cut should be a bottleneck for the size of the flow. Let's formalise this.

Intuitively, the net flow from $S$ to $T$ should be equal to the size of the flow. Formally, this means that the size of the flow is $$ |f| = \sum_{u\in S, v\in T, u \rightarrow v} f(u, v) - \sum_{u\in S, v\in T, v \rightarrow u} f(v, u) $$ i.e. the sum of flows in the edges from $S$ to $T$ minus the sum of flows in the edges from $T$ to $S$. This can be properly proven with the flow conservation condition.

Since only the first part seems relevant to the capacity of the cut, we can observe the following inequality:

$$ |f| \leq \sum_{u\in S, v\in T, u \rightarrow v} f(u, v) \leq \sum_{u\in S, v\in T, u \rightarrow v} c(u, v) = c(S, T) $$

Consequently, this implies that the minimum capacity of a cut is at least the maximum size of a flow.

The max-flow min-cut theorem goes further to say:

The maximum size of a flow is equal to the minimum capacity of a cut in a flow network.

This statement is quite powerful, as it gives us a link between the maximum flow and the minimum cut. As such, when we want to find one of them, we might opt to find the other simply because it's easier. When we apply the theorem to flow networks constructed specifically for specific graphs, we can also get some nice links between properties of graphs.

Proof of Theorem

Before we get into proving the theorem, let's think about how we could construct a max-flow. In other words, let's find an algorithm that could compute the max-flow.

Sometimes, if you figure out how to construct / compute something through an algorithm when looking at a problem, you can find out interesting properties and structure in whatever you're constructing / computing by analysing the individual steps, or by analysing the properties and invariances that exist while performing the algorithm.

An Attempt at an Algorithm

Well let's just take a greedy approach first: given a flow network, we try to alter the flows in the edges to give us a larger flow.

Naively, we can just start from the source $s$, then look for edges forming a directed path to $t$ which aren't at maximum capacity, then increase the flow on all such edges:

In the graph we let $a / b$ mean the edge has flow $a$ and capacity $b$.

We can increase the flows on all the red edges to get a larger flow. So whenever we can find a directed path from $s$ to $t$ of edges not at maximum capacity, we can increase the flow. We shall experiment to see if this algorithm gives us a max flow

Sadly, there are situations where there isn't a directed path of edges not at maximum capacity from source to sink, but there exists a larger flow:

Improving the Algorithm?

Let's not limit ourselves to increasing the flow on edges!

By increasing the flow on some edges and decreasing the flow on some other edges, we might be able to increase the size of the flow. We can also observe that the edges we changed kind of form a path from source to sink, if you just ignore their directions. (Notice how this is similar to augmenting paths in bipartite matching, for those of you who have some idea of that)

So basically we want to keep track of which edges we can increase the flow for, and which edges we can decrease the flow for, and somehow do an alteration to increase the flow. Let's try to formalise this:

From a flow network on a graph $G$, construct a directed graph $G'$ as follows:

  • The vertex set of $G'$ is the same as the vertex set of $G$.
  • For each edge $u \rightarrow v$ in $G$:
    • if $c(u, v) > f(u, v)$ then add an edge $u \rightarrow v$ in with weight $c(u, v) - f(u, v)$ in $G'$.
    • if $f(u, v) > 0$ then add an edge $v \rightarrow u$ with weight $f(u, v)$ in $G'$.

Basically, the weight of the edges show how much we can increase / decrease the flow in each edge.

Now, if there exists a directed path from $s$ to $t$ in this graph $G'$, then let $m$ be the minimum weight of edges in this path. For each edge on the path, we can increase its flow by $m$ if it has the same direction in both $G$ and $G'$, and decrease its flow by $m$ if it has different directions in $G$ and $G'$. We can check that this still gives us a valid flow, but with size increased by $m$.

Algorithm Analysis

Ok, so now we have an algorithm to increase the flow, but does it give us the maximum flow? We'll check the terminating state of the algorithm to find out. It terminates since we increase the size of the flow at each step, and the maximum size is finite.

Ok not really, since the capacities might not be rationals. But if we have a max flow, then it's surely in the terminating state, so we'll just assume we have a flow in the terminating state.

When we can no longer increase the flow, it means that in the graph $G'$ we constructed, there is no directed path from $s$ to $t$. In other words, $s$ and $t$ are now disconnected. Doesn't this sound like a "cut" in the graph?

Let $S$ be the set of vertices reachable from $s$ in $G'$, and $T$ be all the other vertices, so that $t \in T$. We want to analyse the value of this cut in $G$.

Note that there cannot be edges from any vertex in $S$ to any vertex in $T$ in $G'$, otherwise some vertex in $T$ is reachable from $s$.

Thus, $f(u, v) = c(u, v)$ for all edges $u \rightarrow v$, where $u \in S, v \in T$, and $f(v, u) = 0$ for all edges $v \rightarrow u$, where $v \in T, u \in S$.

In other words,

$$ \begin{align*} |f| & = \sum_{u\in S, v\in T, u \rightarrow v} f(u, v) - \sum_{u\in S, v\in T, v \rightarrow u} f(v, u) \\ & = \sum_{u\in S, v\in T, u \rightarrow v} c(u, v)\\ & = c(S, T) \end{align*} $$

This shows that $f$ is a maximum flow, otherwise we can find a larger flow $f'$ and $|f'| > c(S, T)$, a contradiction. Similarly, $(S, T)$ must be a minimum cut. Thus we have proven the max-flow min-cut theorem.

Remarks

This algorithm is actually known as the Ford-Fulkerson Algorithm, and is one of the ways to find a max flow in a flow network when the capacities are rationals. If you're interested, you can look up other (faster) algorithms to find a max flow :)

From this, we've seen how algorithms can help us deduce some properties in a given setup, so stay open to thinking about algorithms to get a better understanding of problems, even if it doesn't seem like it would be immediately helpful to solving the problem. I might talk more about algorithms in a future post, but not here since this is supposed to be about flows.

Another consequence of this algorithm is the following theorem:

Integral Flow Theorem

If the capacities of all edges in a flow network are integers, then there exists a maximum flow where all edges have integral flow values.

This is because starting from 0 flow, all alterations made by the algorithm are integral and the algorithm terminates, hence the final max flow we get is integral too.

Applications of the Theorems

So I've mentioned that finding a maximum matching in a bipartite graph is equivalent to finding a maximum flow in a flow network constructed from that graph, and now we know that max-flow = min-cut in any flow network. Thus, let's attempt to prove a theorem related to maximum matchings: the Hall's Marriage Theorem.

Hall's Marriage Theorem

Let's first define $N(S)$ for a general graph as the set of vertices which share an edge with at least one vertex in $S$. Then the theorem states:

Given a bipartite graph partitioned into vertex sets $(A, B)$, there exists a matching containing every vertex in $A$ if and only if for all $S \subseteq A, |S| \leq |N(S)|$.

Proof of Theorem

The forward implication is obvious. For the reverse implication, we construct a network flow as we did before.

Let $|A| = n$, then our goal is to show that an integral flow with size $n$ exists. Our strategy is to simply analyse all cuts $(S, T)$ of the graph, and show that its capacity is always at least $n$, so that by the max-flow min-cut theorem we can find a max flow of size at least $n$, which must be exactly $n$ since the sum of capacities of edges going out from $s$ is $n$.

Take a cut $(S,T)$ that partitions $A = A_1 \cup A_2, B = B_1\cup B_2,$ and $A_1, B_1 \subseteq S, A_2, B_2 \subseteq T$. Then $c(S,T) = |A_2| + |B_1| +$ #edges between $A_1, B_2$. We need to show that $c(S,T) \geq n$.

Since $|N(A_1)| \geq |A_1|$, there are at least $|A_1| - |B_1|$ edges between $A_1$ and $B_2$. Thus $c(S,T) = |A_2| + |B_1| +$ #edges between $A_1, B_2 \geq |A_2| + |B_1| + |A_1| - |B_1| = |A_1| + |A_2| = n$, and we're done.


Now let's look at another theorem regarding bipartite graphs which can also be proven using flows, but with a slightly different construction of the flow network.

Kőnig's Theorem

In a graph, a vertex cover is a set of vertices $S$ such that every edge in the graph is incident to at least one vertex in $S$. Kőnig's Theorem states:

In a bipartite graph, the size of a maximum matching is equal to the size of a minimum vertex cover.

Proof of Theorem

Let the graph be partitioned as $(A, B)$.

Let's set up the flow network:

  • For all $a \in A,$ have an edge $s\rightarrow a$ with capacity 1.
  • For all $b \in B,$ have an edge $b\rightarrow t$ with capacity 1.
  • For all $a \in A$, $b \in B$ connected by an edge, have an edge $a \rightarrow b$ with capacity $+\infty$.

Why do we use $\infty$? Well, it simplifies the calculations when computing the capacity of the cut later on, and this simplification turns out to be necessary in this case actually. The size of flow is still finite since it's bounded by $\min(|A|, |B|)$, and we can still check that a flow of size $m$ corresponds to a matching of size $m$ in this graph.

Consider a cut $(S, T)$ which partitions $A = A_1 \cup A_2$, $B = B_1 \cup B_2$, such that $A_1, B_1 \subseteq S$ and $A_2, B_2 \subseteq T$.

If there is any edge from $A_1$ to $B_2$, then the capacity of the cut is $\infty$. Hence, a min-cut should have no edges between $A_1$ and $B_2$:

Thus, the capacity of the cut is just $|A_2| + |B_1|$. But note that $B_1 \cup A_2$ is a vertex cover in the graph $\iff$ there are no edges between $A_1, B_2$. Hence:

$$ \begin{align*} \text{Size of max matching} &= \text{Size of max flow} \\ &= \text{Capacity of min cut}\\ &= \text{Size of min vertex cover} \end{align*} $$

Remarks

Try to prove the result when the capacities of the edges between $A, B$ aren't set to infinity. What problems do you face?

We see that how we construct the flow network is important, and how we choose the capacities can also be important.

There's actually another Kőnig's Theorem relating independent sets and edge covers. An independent set in a graph is a set of vertices where no two are connected by an edge. An edge cover in a graph is a set of edges such that every vertex in the graph is incident to at least one edge in the set.

The other Kőnig's Theorem states:

In a bipartite graph with no isolated vertices (vertices with no edges), the size of a maximum indepedent set is equal to the size of a minimum edge cover.

This is quite a similar statement, because a matching is really just an independent set of edges (no two edges are "adjacent" as they don't share vertices). Proof is left as an exercise :)


Now we'll look at one last theorem, which isn't about bipartite graphs :O

Menger's Theorem

This theorem has two versions: one deals with edge connectivity, and the other with vertex connectivity. The two versions are stated as follows:

Let $G$ be a finite undirected graph and $x, y$ be two distinct vertices in $G$. Then the minimum number of edge removals needed to disconnect $x, y$ is equal to the maximum number of edge-disjoint paths from $x$ to $y$.

Let $G$ be a finite undirected graph and $x, y$ be two distinct vertices in $G$. Then the minimum number of vertex removals (except $x, y$) needed to disconnect $x, y$ is equal to the maximum number of internally vertex-disjoint paths from $x$ to $y$ (i.e. the paths share no vertices aside from $x, y$).

Let's deal with the first version on edge connectivity.

Proof of edge version

Flows are defined on directed graphs, so we first have to transform the graph to a directed one. Since we can't control which direction each edge is traversed in for each path, replace each edge $u-v$ in the graph by edges $u \rightarrow v$ and $v \rightarrow u$ both with capacity 1.

For sanity's sake, have a source $s$ and sink $t$, and add edges $s \rightarrow x, y \rightarrow t$ with capacities $+\infty$, since by definition we should only have out-edges from the sources and in-edges to the sink.

We first show that the maximum number of edge-disjoint paths from $x$ to $y$ is equal to the size of the max flow. It's clear that edge-disjoint paths translate to a flow, but what about the other direction?

Take an integral max flow, then each edge has a flow of 0 or 1. We may partition the edges with flow 1 into some (say $m$) edge disjoint paths from $x$ to $y$. However, the same edge may be traversed in both directions, and such a flow would not translate to edge-disjoint paths in $G$. Could we fix this?

Turns out it isn't hard to fix. If an edge is traversed in both directions by one path, then we can just remove a cycle from that path:

If an edge is traversed in both directions by different paths, then we just alter the two paths:

In both cases, the total length of paths decreases, so we can make a finite number of such alterations, leaving us with $m$ edge disjoint paths that don't traverse any edge in both directions.

Hence, size of max flow = maximum number of edge-disjoint paths from $x$ to $y$.

Now consider a cut $(S, T)$. We'll just assume that $x \in S$ and $y \in T$ otherwise the cut has capacity $\infty$. The capacity $c(S, T)$ is equal to the number of edges between $S$ and $T$, and the removal of all such edges certainly disconnects $x, y$.

On the other hand, it's not hard to show that for any set of edges $E$ whose removal disconnects $x, y$, there exists a subset $E' \subseteq E$ and a cut $(S, T)$ such that $E'$ is the set of edges between $S$ and $T$ (let $S$ be the vertices reachable from $x$ after removing the edges, for example).

Hence, capacity of min-cut = minimum number of edge removals needed to disconnect $x, y$. Then by max-flow min-cut theorem we obtain the edge connectivity version of Menger's Theorem.


Now let's look at the vertex version.

Proof (Sketch) of vertex version

Ok let's try to construct the flow network... but we need our vertices to have capacity now??? Recall that in a flow network, we only define capacities for edges, not vertices. So can we even construct a flow network to prove this version?

Well... if we need edges, then let's make edges! For each vertex $u$, create a copy $u'$ and add an edge $u \rightarrow u'$ with capacity 1. Now our vertex is effectively an edge! I'll leave the rest of the flow network construction and the proof to you, it's basically the same.

Remarks

The two versions are also true on directed graphs, the proof is basically identical.

On another note, I'll briefly introduce some notions of connectivity in graph theory.

We define the vertex-connectivity of a graph to be the minimum number of vertex removals needed to disconnect or empty the graph. We say a graph is $k$-connected if its vertex connectivity is at least $k$.

Similarly, define the edge-connectivity of a graph to be the minimum number of edge removals needed to disconnect the graph. We say a graph is $k$-edge-connected if its edge connectivity is at least $k$.

Menger's theorem gives us some results about paths in $k$-connected and $k$-edge connected graphs. In particular, for $k = 2$, $2$-connected graphs are also known as biconnected graphs (which have no articulation points), and $2$-edge-connected graphs have no bridges. Those of you doing IO might be familiar with these.


Olympiad Context

We've seen some ways to construct flow networks to obtain various results, so let's look at an olympiad problem that can be solved by flows now.

ISL 1998

A rectangular array of numbers is given. In each row and each column, the sum of all numbers is an integer. Prove that each nonintegral number $x$ in the array can be changed into either $\lceil x\rceil $ or $\lfloor x\rfloor $ so that the row-sums and column-sums remain unchanged. (Note that $\lceil x\rceil $ is the least integer greater than or equal to $x$, while $\lfloor x\rfloor $ is the greatest integer less than or equal to $x$.)

Spoilers immediately ahead. This problem is pretty nice, so you might want to try it a little before reading on to get some understanding of the setup in the problem.

Given that I said this can be solved with flow, you might wonder: where is the graph? More experienced people might know that grids are really just bipartite graphs, but even if you don't, this problem can be modelled as a flow problem quite naturally. So let's start off without the preconceived notion that we should be using graph theory.

Firstly, let there be $n$ rows and $m$ columns. Now, after defining the dimensions of the grid, we are unfortunately faced with with the dilemma of choosing floor or ceiling for each number. Every number also changes by a different amount when we change it to floor or ceiling, which is annoying.

To ease this burden of choice and non-uniformity, change everything to its ceiling, then choose which ones we want to change to floor. This sounds comparatively easier, since we're just choosing some numbers to $-1$ from. Our goal is to get back to the original row and column sums, so how much should we decrease the row sums and column sums by?

Let the original number in the $i^\text{th}$ row and the $j^\text{th}$ column be $a_{i, j}.$ Then define $$ r_i = \sum_{j = 1}^m (\lceil a_{i, j} \rceil - a_{i, j}), c_j = \sum_{i = 1}^n (\lceil a_{i, j} \rceil - a_{i, j})$$

In other words, $r_i$ is the amount we have to decrease the row sum of row $i$ by, and $c_j$ is the amount we have to decrease the column sum of $c_j$ by. We are allowed to $-1$ from some numbers in the grid, but only at most once for each number, and only for those that weren't originally integers.

So in a way, each number has capacity 1 or 0... doesn't this sound like a constraint problem, which we could possibly model as a flow problem?

Ok, so when we decrease $a_{i, j}$ by 1, it affects row $i$ and column $j$. This gives us an idea how to construct the flow network: we want to create some dependency between $a_{i, j}$ and row $i$ and column $j$. The capacity constraint is on $a_{i, j}$, so we should probably make it an edge between row $i$ and column $j$. We also want to decrease the sum in row $i$ by $r_i$, and the sum in column $j$ by $c_j$.

Create a graph with vertices $R_1, \dots, R_n, C_1 \dots, C_m$, source $s$ and sink $t$. Then:

  • For each $a_{i, j}$ that isn't an integer, add an edge $R_i \rightarrow C_j$ with capacity 1.
  • For each $1 \leq i \leq n$, add an edge $s \rightarrow R_i$ with capacity $r_i$.
  • For each $1 \leq j \leq m$, add an edge $C_j \rightarrow t$ with capacity $c_j$.

Our goal is to find an integral max flow with size $\sum_{i = 1}^n r_i = \sum_{j = 1}^m c_j$, as the edges $R_i \rightarrow C_j$ with flow 1 correspond to the numbers we want to $-1$ from.

So as usual, let's analyse the capacity of a cut $(S, T)$...

Or not. We actually already have a max flow with the correct size, just that it isn't integral. We let:

  • $f(R_i, C_j) = \lceil a_{i, j} \rceil - a_{i, j}$
  • $f(s, R_i) = r_i$
  • $f(C_j, t) = c_j$

I'll leave you to check that this is a valid max flow with size $\sum_{i = 1}^n r_i = \sum_{j = 1}^m c_j$.

So by the integral flow theorem, there exists an integral max flow with the same size, which is what we wanted to show.

Remarks

If you didn't observe that we already have a max flow, you can still do cut capacity analysis to solve the problem, it's just not as clean. So when trying to find the max flow, you might want to choose between directly constructing one and inducing the integral flow theorem, or doing cut analysis and inducing the max-flow min-cut theorem, or possibly some other way.


Conclusion

This was a pretty long post (I mean it's basically an introduction to flows), so I hope you've gained something if you took the time to read the whole thing :)

Modelling some problems as a flow problem could be a very useful technique in some cases, as the max-flow min-cut theorem is really powerful. However, you have to be quite careful in how you construct your flow network, and how you assign capacities to edges.

Hope you've gained some appreciation for algorithms in math, and math in algorithms as well!

Comments

Popular posts from this blog

SMO Open 2024 ??% Speedrun

Musings about the IMO