Coupling Arguments

(David here.) I'm going to explore various versions of an idea, starting from where I first saw it in Olympiads, and going beyond.

Changing the rules

I first encountered the idea from this ISL problem:

(ISL 2011 C5, simplified) There are some ants on a unit square. Each is moving with speed $1$ parallel to a side of the square. When two ants moving in opposite directions meet, they both turn $90^{\circ}$ clockwise and continue moving with speed $1$, and otherwise they pass through each other. Find the longest possible time until all the ants fall off.

A priori, it's not even clear that all ants must eventually fall off - maybe an ant could coil around indefinitely? Some vague sense tells us that this is not possible, since we would have to coordinate the other ants really well, but it's hard to clarify this intuition.

Here's the magic idea: when a left-ant meets a right-ant, they turn down and up respectively. But when a down-ant meet an up-ant, instead of turning right and left respectively, we switch their labels and pretend that they turned left and right instead. This is valid, because we don't really care about the ordering of the ants on the board, just the set of their positions. Thus, we can instead pretend we have "down-left" ants and "up-right" ants, and now it is clear that they must fall off by time $2$. With some more careful analysis, we can pinpoint the longest possible time (which is $3/2$).

In Pascal96's Combinatorics Book, we can find this problem in a section (aptly) titled "Problem Alteration: Don’t play by the rules". When it works out, it often feels like cheating.

Here's another problem where changing the rules helps:

(Chip firing) Put a number of chips on the vertices of a finite connected graph. If a vertex has at least as many chips as neighbors we can "fire" it by sending each neighbor a chip. Show that if there are less chips than edges, then each vertex fires finitely many times.

How do we change the rules here? We imagine that on every edge there is long ditch that can hold at most one chip. Furthermore, the chip can be on either end of the ditch. Now, when we fire a chip from a vertex to another, it lands in the ditch if the ditch is empty, and moves to the opposite end. If the ditch is not empty and there is a chip on the side of the original vertex, then we move the chip across the ditch. Otherwise, we just send a chip over to the recepient vertex as usual.

If we pretend that the chips in the ditch are part of the pile on the closest vertex, we recover the original game. However, we see that chips can never leave the ditches, so if a ditch never fills up, then the adjacent vertices must have never fired, and their adjacent vertices must have fired finitely many times and so on. Since the graph is connected, we conclude that we can indeed only fire finitely many times at each vertex.

One can describe this idea also as coupling - we are tying together two different processes, each running with their own rules, but somehow they have the same desired outcomes.

Sometimes for induction, the cases $n$ and $n-1$ can be coupled as well.

(Canada 2010/4) There are $n$ lamps in a room, with certain lamps connected by wires. Initially all lamps are off. You can press the on/off button on any lamp $A$, but this also switches the state of all the lamps connected to lamp $A$ from on to off and vice versa. Prove that by pressing enough buttons you can make all the lamps on. (Connections are two-way.)

If you've solved the problem for $n-1$, then with $n$ lamps you can ignore a lamp and run your algorithm. Then, if we're not done, the remaining lamp must be off. But since we can do this for any lamp, we can chain this algorithm for two lamps together: this allows us to turn on a single pair of lamps. Now the problem becomes easy.

Random variables with equal distributions

We can apply a similar spirit to random processes - you could try to invent a random process which results in an object of the same distribution. Then, any questions about the probability of events or expected values can be answered using this invented process.

(R. Stanley) Suppose I have an ordinary deck of 26 black cards and 26 red cards. I shuffle the deck thoroughly and start turning over cards one at a time. At any point in the process, before I turn over the next card, you can say 'stop'! You then win if the next card is red. Clearly saying ‘stop’ before the first card is turned gives you a 50% chance of success. Is there a strategy that can do better?

The answer is no. Imagine that whenever you stop, a magical spell brings the last card in the deck to the top of the deck and you open that instead. The distribution is still the same, so this is valid, but there's no way to guess the last card to better than 50% chance of success.

(easier version of 1992 Putnam A6) Three points are chosen at random on the circle. What is the probability that it is acute?

Instead of drawing three points, draw a point and two diameters, then for each diameter pick one of the endpoints. Then, exactly one of the 4 possible triangles will be acute, so the answer is $1/4$.

(From Persi Diaconis) Show that for sufficiently large $n$, the number of cycles in a random permutation of $\{1,2,\cdots, n\}$ is between $0.9\log n$ and $1.1\log n$ with probability at least 99.9%. (In other words, $99.9\%$ of permutations lie very near $\log n$ for large $n$.)

We consider the so-called Chinese restaurant process: customers 1 through $n$ enter a restaurant, and each customer chooses to sit between some two customers or at a new table uniformly at random. The result is exactly a random permutation, since the probability of getting a fixed permutation we want is $1/n!$.

However, the number of cycles is exactly then number of customers who decide to sit at a new table, so this is simply the sum of $n$ independent $0-1$ variables $X_1+...+X_n$, where the probability of $X_k$ being 1 is $1/k$. Hence, the variance is the sum of the individual variances, which is $\log n + O(1)$, so we're done by Chebyshev's inequality.

(From a quant interview.) (Magnetic Coins) You have 100 magnetic coins, and you start to toss them into two jars, each initially contaiing a single coin. Because the coins are magnetic, each coin you toss will land into either jar with probability proportional to the number of coins already in the jar.

Suppose that you knew that the 99th coin lands in the first jar. What is the probability that the 100th coin also lands in the first jar?

Imagine that every coin chooses one coin uniformly at random among all the coins in both jars and sticks to it.

Now we write this as a sequence. Label the first two coins $0,1$, and the rest of the coins $2,...,101$. When we attach $b$ to $a$, we write $b$ right after $a$ in the sequence. After tossing $n$ coins, we get precisely $0$ followed by a random permutation of $[0,...,101]$.

A coin lands in the first jar if it comes before $1$ in the permutation. So the probability of $n$ appearing before coin $1$ is 1/2, and the probability of both $100$ and $101$ appearing before coin $1$ is 1/3, so the conditional probability is 2/3.

Probabilistic coupling

In probability, the term coupling refers to a way of placing two random objects (or distributions) together on the same probability space while retaining their original distributions. For instance, valid couplings of two fair dice include

having two independent dice rolls
having their values always be the same
tossing one dice, then uniformly generating the other dice from one of three values with the same parity.

For the last possibility, you would have to check that the second dice was indeed fair.

A classic example of coupling is if you had two biased coins with probability of heads being $p < q$ respectively. Then, we can generate them together while enforcing that if the first coin shows heads, then the second coin must show heads as well. One could do this by generating a uniformly random number $u$ from [0,1], and then having the first coin be heads only if $u < p$ (and similarly for the second coin).

Here's a tricky problem you can demolish with this idea:

Given any graph, generate a subgraph by independently selecting each edge with probability $p$. Show that the probability that the graph is connected is increasing in $p$.

A first instinct might be to check this for a fixed connected subgraph, but the probability of having it is $p^{(\# \text{edges})}(1-p)^{(\# \text{non-edges})}$, which isn't always increasing in $p$. What's happening here is sort of subtle: near $p\to 1$, we only have 1 increasing function from the original graph, but this is (almost) perfectly balanced on the first-order with the graphs with exactly one missing edge.

Maybe one could get to a solution eventually this way, but otherwise it is immediate using coupling - if you had two different probabilities $p<q$, you could couple the generated graphs so that the $p$-graph is always a subgraph of the $q$-graph, so the latter must be connected if the former is connected.

Somewhat shockingly, there are Olympiad problems that could use this idea:

(USA TST 2015/3) A physicist encounters $2015$ atoms called usamons. Each usamon either has one electron or zero electrons, and the physicist can't tell the difference. The physicist's only tool is a diode. The physicist may connect the diode from any usamon $A$ to any other usamon $B$. (This connection is directed.) When she does so, if usamon $A$ has an electron and usamon $B$ does not, then the electron jumps from $A$ to $B$. In any other case, nothing happens. In addition, the physicist cannot tell whether an electron jumps during any given step. The physicist's goal is to isolate two usamons that she is sure are currently in the same state. Is there any series of diode usage that makes this possible?

The solution is to couple instances of the problem by assigning a real number (between 0 and 1) of electrons to each usamons. Then, the diode simply makes sure that $A$ has less electrons than $B$. We couple this to the original problem by choosing any cutoff we want, and saying that usamons above the cutoff have an electron and those below do not. Since we can at best make sure that the original real numbers were ordered, we cannot pick two usamons in the same state (since the cutoff could always be between them).

Markov Chain Mixing Times

We now come to a highlight application of probabilistic coupling. Suppose you did a random walk on a (connected) graph: start from a vertex, and with equal probability walk to a neighbor or stay put. If you walk a large number of steps $T$, what probability should you expect to be at some given vertex $v$?

Let's try a heuristic: suppose that for large $T$, we approach a steady state probability distribution $\pi(v)$. Taking one step, $\pi(v) / \deg(v)$ of the probability weight goes out to every neighboring vertex $u$, so if we balance the transfer of probability weight we get $$\frac{\pi(v)}{\deg(v)} = \frac{\pi(u)}{\deg(u)}$$ so perhaps $\pi(v)$ is proportional to the degree. However, this is not logically an assumption; one only requires that $$\pi(v) = \sum_{u\in N(v)} \frac{\pi(u)}{\deg(u)+1}, \quad \text{where } N(v) \text{ includes }v$$ So, was our guess right - that this is the limiting distribution? Could there be a different steady state distribution? Does which vertex we start on matter? And if this is right, how fast do we converge in terms of $T$?

For a general distribution $\mu$, we define a "step" as the distribution taking one step of the random walk after initializing according to $\mu$: $$(\mu P){(t+1)}(v) := \sum_{u\in N(v)} \frac{\mu(u)}{\deg(u)+1}$$

First, to define a "limiting" distribution, one needs a metric on distributions. Define the total variation distance between distributions $\mu, \nu$ as $$d_{TV}(\mu, \nu) = \max_{S\subseteq V} \{P_{v\sim \mu}(v\in S) - P_{v\sim \nu}(v\in S)\} = \frac 1 2 \sum_{v\in V} |\mu(v) - \nu(v)|$$ In words, this is the largest measurable deviation in probability between the two distributions. In our case, we would like to show that there must be a unique $\pi$ such that regardless of $\mu$, $d(\mu P^T, \pi) \to 0$.

To do this, it would suffice (why?) to simply show that $d(\mu P^T, \nu P^T)\to 0$.

Here is where coupling comes into play. Define two random walks $(X_0, X_1,...)$ and $(Y_0,Y_1,...)$ starting from distributions $\mu$ and $\nu$ respecitvely. Then if we could couple them together, we would have that $$d_{TV}(\mu P^t, \nu P^t) = \max_{S\subseteq V} \{P(X_t\in S) - P(Y_t\in S)\} \le P(X_t\neq Y_t)$$ since $X\in S$ and $Y\notin S$ implies $X\neq Y$. So if we could show that $P(X_t \neq Y_t)$ went to zero, then we would have the result.

So how do we make sure that $X_t$ and $Y_t$ match up? Here's the coupling rule:

when $X_t\neq Y_t$, they wander around independently.
when $X_t = Y_t$, they move around in sync with each other (but still a random walk)

Individually, $X_t$ and $Y_t$ still have the distribution of the original random walks! We're just left with showing that they have a high chance of eventually meeting up.

Let $D$ be the diameter of the graph (so there exists a path of length $D$ between any pair of vertices), and let $\Delta$ be the maximum degree. Thus, with probability at least $1/(\Delta+1)^{D+1}$, for $\lceil D/2\rceil$ time steps both $X_t, Y_t$ simply go towards each other and meet up. Thus, we must have $$P(X_{t+\lceil D/2\rceil}\neq Y_{t+\lceil D/2\rceil})\le \left(1-\frac 1 {(\Delta+1)^{D+1}}\right) P(X_t\neq Y_t).$$ Thus, in the long run, we can expect $$d_{TV}(\mu P^T, \nu P^T) \le P(X_T \neq Y_T) \le \left(1 - \frac{1}{D(\Delta + 1)^{D+1}}\right)^{T} P(X_0 \neq Y_0)$$ so in fact, the total variation distance must converge exponentially.

In general, the quantity $$\tau = \inf_{t>0} \{\sup_{\mu, \nu} d_{TV}(\mu P^\tau, \nu P^\tau) < 1/(2e)\}$$ is also called the mixing time of the Markov chain, and it governs the rate of the exponential convergence.

(Interestingly, a quick way to show that a limiting distribution exists is using Brouwer's fixed point theorem, since the map $\mu\to \mu P$ is continuous.)

Search This Blog

SIMO Retiree Blog