Inclusion-Exclusion

(David here.) In The Rising Sea, Ravi Vakil writes:

When introduced to a new idea, always ask why you should care. Do not expect an answer right away, but demand an answer eventually. Try at least to apply any new abstraction to some concrete example you can understand well.

We have something analogous for understanding proofs - with time, new perspectives show up, reveal new insight, and allow one to tackle problems that seemed impossible before.

(I have to stress that this happens eventually - the line of thought behind this article spanned years!)


PIE. Can you remember the proof of PIE off the top of your head?

Here's a small case: $$|A\cup B \cup C| = |A| + |B| + |C| - |A\cap B| - |B\cap C| - |C\cap A| + |A\cap B \cap C|$$
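If you'd like to see this with concrete sets first, here's a quick brute-force check (a minimal Python sketch; the example sets are arbitrary):

```python
# Brute-force check of the three-set inclusion-exclusion formula.
A, B, C = {1, 2, 3, 4}, {3, 4, 5}, {1, 4, 5, 6}

lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(B & C) - len(C & A)
       + len(A & B & C))
assert lhs == rhs == 6
```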

Perhaps someone told you to draw a Venn diagram, to look at how many times each region is counted by $|A|+|B|+|C|$, and then to correct for the overcounting, etc.

For the longest time, in my head the $n=3$ case justified the truth of the general PIE: I assumed there was some messy induction you could do to generalize the logic of the small cases, and didn't look much further into it. But it certainly didn't feel that my understanding was complete.


Significance. What did we even need this for? Maybe you've had to count how many numbers up to 1000 are not multiples of 2, 3, or 5. In general, it's good for counting things that are simultaneously "not X, not Y, ..." when it is easy to count things that are simultaneously X and Y and so on.
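For that concrete example: up to 1000 there are $500$, $333$, $200$ multiples of $2$, $3$, $5$ respectively; then $166$, $100$, $66$ multiples of $6$, $10$, $15$; and $33$ multiples of $30$. So PIE gives $$1000 - (500 + 333 + 200) + (166 + 100 + 66) - 33 = 266$$ numbers up to 1000 that avoid all three.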

A special case of this is counting "primitive" elements - how many numbers from 1 to $n$ are coprime to $n$? Well, if $n$ has distinct prime factors $p_1,\dots,p_k$, we need to count the numbers that are not divisible by any $p_i$. Thus, $$\varphi(n) = n - \left(\frac{n}{p_1} + \cdots + \frac{n}{p_k}\right) + \left(\frac{n}{p_1p_2} + \frac{n}{p_1p_3} + \cdots + \frac{n}{p_{k-1}p_k}\right) - \cdots.$$

If you know about the Chinese remainder theorem, you might know that taking a (uniformly) random number from 1 to $n$ should give you independently random residues mod $p_i$, so we can just write $$\varphi(n) = n \left(1-\frac 1 {p_1}\right) \cdots \left(1-\frac 1 {p_k}\right)$$ and it's not too hard to realize that these are equivalent expressions. Hmm.
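To watch the two expressions agree, here's a small Python sketch (the helpers `prime_factors`, `phi_pie`, `phi_direct` are mine, with naive trial division just for illustration):

```python
from itertools import combinations
from math import gcd, prod

def prime_factors(n):
    """Distinct prime factors of n, by trial division."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps

def phi_pie(n):
    """PIE form: alternating sum over subsets of the prime factors."""
    ps = prime_factors(n)
    return sum((-1) ** k * sum(n // prod(c) for c in combinations(ps, k))
               for k in range(len(ps) + 1))

def phi_direct(n):
    """Count 1 <= a <= n coprime to n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

for n in range(2, 300):
    ps = prime_factors(n)
    product_form = n * prod(p - 1 for p in ps) // prod(ps)
    assert phi_pie(n) == product_form == phi_direct(n)
```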


The proof. It was many years later that I learnt of a satisfactory proof. This involves switching gears and talking about the probability version of PIE. (Perhaps you guessed this was coming from the last paragraph? It took me a year or so.)

In probability, one talks about events (e.g. "sum of two dice rolls = 7"), but often one can equivalently talk about sets of outcomes (e.g. "the set of pairs of dice rolls that sum to 7"). If these outcomes are equally likely, then the probability of an event is proportional to the number of outcomes corresponding to the event.

So with events $A,B,C$ instead of sets, we must also have that $$P(\text{at least one of }A,B,C) = P(A) + P(B) + P(C) - P(A,B) - P(B,C) - P(C,A) + P(A,B,C)$$ where $P(A,B)$ denotes the probability of $A$ and $B$ simultaneously happening. $$\newcommand \I {\mathbb{I}} \newcommand \E {\mathbb E}$$ For each event $E$, we can make a random variable $\I_E$ which is 1 if event $E$ happens and 0 otherwise. The mean of $\I_E$ is exactly $$\E[\I_E] = P(E) \cdot 1 + P(E^c) \cdot 0 = P(E)$$ so probabilities of events correspond to expectations of their indicator functions. So the probabilistic statement corresponds to $$\E[\I_{A\cup B\cup C}] = \E\left[\I_A + \I_B + \I_C - \I_{A\cap B} - \I_{B\cap C} - \I_{C\cap A} + \I_{A\cap B \cap C}\right]$$

Now for the voila moment - $\I_{A\cap B} = \I_A \I_B$ (that's what it means for both events to happen), whereas if you're not in $A\cup B\cup C$, then you're in none of them: $$1-\I_{A\cup B \cup C} = (1-\I_A)(1-\I_B)(1-\I_C)$$ Expanding the product, rearranging, and summing both sides over all elements immediately recovers the original PIE.
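Since the identity is a statement about one element at a time, it can be machine-checked over all $2^3$ membership patterns (a Python sketch):

```python
from itertools import product

# Check 1 - I_{A∪B∪C} = (1 - I_A)(1 - I_B)(1 - I_C) for each of the
# 2^3 ways a single element can belong to A, B, C.
for a, b, c in product([0, 1], repeat=3):
    in_union = max(a, b, c)  # indicator of lying in the union
    assert 1 - in_union == (1 - a) * (1 - b) * (1 - c)
```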

For the first time, I felt like I could see that PIE was true in full clarity and immediacy.

Remark. To some extent, PIE is not just a statement about the sizes of the sets, but more generally about the composition of the sets: in the Venn diagram argument, to get the union, we start with $A$, $B$, $C$, then we remove the duplicates, and then we compensate for the duplicate removals. (So the statement would also be true if we measured the size of a set by $\sum_{a\in A} f(a)$ - this means that the indicator version has to be true!)


Alternating subset sums. Let's say you saw the following problem:

(CTST 2020, additional test Q3) Given a finite set $S$ of positive integers, define $$f_S(x) :=\sum_{\varnothing \neq A \subseteq S} \frac{(-1)^{|A|} x^{d(A)}}{\mathrm{lcm}(A)}$$ where $\mathrm{lcm}(A)$ denotes the least common multiple of the elements of $A$, and $d(A)$ denotes the number of prime factors of $\mathrm{lcm}(A)$ (counted with multiplicity).

Prove that if $0 \le x \le 2$, then $-1 \le f_S(x) \le 0$.
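To get a feel for the statement, it's worth brute-forcing a small case first (a Python sketch; the helpers `big_omega` and `f` are mine, and `math.lcm` needs Python 3.9+):

```python
from itertools import combinations
from math import lcm

def big_omega(n):
    """Number of prime factors of n, counted with multiplicity."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    return count + (1 if n > 1 else 0)

def f(S, x):
    """f_S(x): alternating sum over nonempty subsets A of S."""
    total = 0.0
    for k in range(1, len(S) + 1):
        for A in combinations(S, k):
            L = lcm(*A)
            total += (-1) ** k * x ** big_omega(L) / L
    return total

S = [2, 3, 4, 6, 10]
for i in range(21):
    x = 2 * i / 20                     # sample x across [0, 2]
    assert -1 - 1e-9 <= f(S, x) <= 1e-9
```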

This looks like it has nothing to do with PIE, unless you've written down PIE this way:

$$|\cup_I A| = \sum_{\varnothing \neq J\subseteq I} (-1)^{|J|-1}|\cap_J A|$$ where $\cup_I A$ is shorthand for $\cup_{i\in I} A_i$ (and $\cap_J A$ for $\cap_{j\in J} A_j$).

Then, we see many echoes of the previous proof. There is:

  • an alternating sum over subsets
  • a desired conclusion that the quantity is between $-1$ and $0$ (really, between $0$ and $1$ once we add in the case $A=\varnothing$) - perhaps this is a probability of sorts.

So, inspired by the "indicators" proof of the PIE, we want to cook up events $E_s$ such that

$$\sum_{A \subseteq S} \frac{(-1)^{|A|} x^{d(A)}}{\mathrm{lcm}(A)} = \mathbb E \left[ \prod_{s\in S} (1-\mathbb I_{E_s}) \right].$$ Notice that $x^{d(A)} / \mathrm{lcm} (A)$ is really a function of $\mathrm {lcm}(A)$, so this gives us no choice but to think prime-by-prime. In particular, each prime power $p^k$ dividing $s$ should contribute a factor of $x/p$ to the probability of $E_s$. Thus, we set up the events $\{E_s\}$ as follows:

  • For each prime power $p^k$, toss an independent coin $C(p^k)$ with probability $x/p$ of being heads. (This is a legitimate probability precisely because $0 \le x \le 2 \le p$, so $0 \le x/p \le 1$ - this is where the hypothesis on $x$ comes in.)
  • Event $E_s$ is the event that all the coins for prime powers dividing $s$ flip heads.
  • Immediately $E_s\cap E_t = E_{\mathrm {lcm} (s,t)}$, so each term is exactly what we expect. And since $\prod_{s\in S}(1-\mathbb I_{E_s})$ only takes the values 0 and 1, its expectation lies in $[0,1]$, which is exactly the claimed bound $-1 \le f_S(x) \le 0$. (A numerical sanity check follows below.)
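Here's the promised sanity check: a Monte Carlo simulation of the coins against a direct evaluation of the sum (a Python sketch; the agreement is only statistical):

```python
import random
from itertools import combinations
from math import lcm

def big_omega(n):
    """Number of prime factors of n, with multiplicity."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    return count + (1 if n > 1 else 0)

S, x = [2, 3, 4, 5, 6], 1.3

# Exact value of 1 + f_S(x): sum over ALL subsets, empty one included.
exact = 0.0
for k in range(len(S) + 1):
    for A in combinations(S, k):
        L = lcm(1, *A)                 # lcm of the empty set is 1
        exact += (-1) ** k * x ** big_omega(L) / L

# One coin per prime power q = p^j <= max(S); heads with probability x/p.
coins = []
for p in range(2, max(S) + 1):
    if all(p % r for r in range(2, p)):        # p is prime
        q = p
        while q <= max(S):
            coins.append((p, q))
            q *= p

trials, hits = 200_000, 0
for _ in range(trials):
    heads = {q for p, q in coins if random.random() < x / p}
    # E_s happens iff every prime power dividing s landed heads;
    # prod(1 - I_{E_s}) is 1 exactly when no E_s happens.
    if all(not all(q in heads for p, q in coins if s % q == 0) for s in S):
        hits += 1

print(exact, hits / trials)            # should agree to within ~0.01
```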

Sometimes, it does pay off to be unsatisfied with proofs, even if it takes years to find a resolution.


Empty indicator.

What does the "expectation of indicators" argument translate back into in set language? You'll find this proof on Wikipedia - take the perspective of a single element and count how many terms contain it. To me that feels slightly less immediate than the probabilistic interpretation, but we'll see that there's something to be gleaned here.

Let $\mathcal E$ be the "set of everything", and let $I = \{1, \dots, n\}$. Then, we can write PIE in the slightly more natural form

$$|\mathcal E \setminus (A_1\cup A_2 \cup \cdots \cup A_n)| = \sum_{J \subseteq I} (-1)^{|J|} |\cap_J A|$$ (with the convention that the empty intersection $\cap_\varnothing A$ is all of $\mathcal E$).

Now, we expand the RHS into a double sum, by asking for each element $x$ whether it lies in these sets:

$$ \begin{align*} \sum_{J \subseteq I} (-1)^{|J|} |\cap_J A| &= \sum_{J \subseteq I} (-1)^{|J|} \sum_{x \in \mathcal E} (x \in \cap_J A)\\ &= \sum_{x\in \mathcal E} \sum_{J \subseteq I} (-1)^{|J|} (x \in \cap_J A) \end{align*}$$

At this stage, the inner sum depends only on which of the sets contain $x$: writing $C_x := \{i \in I : x \in A_i\}$, the subset $J$ only contributes when $J \subseteq C_x$. Thus, $$ \sum_{x\in \mathcal E} \sum_{J \subseteq I} (-1)^{|J|} (x \in \cap_J A) = \sum_{x\in \mathcal E} \sum_{J \subseteq C_x} (-1)^{|J|}$$ Is this sum the number of elements that are not in any $A_i$? That would mean we want to detect whether $C_x = \varnothing$. Indeed, one can check that $$\sum_{J\subseteq I} (-1)^{|J|}= \begin{cases} 0 & \text{if }|I|>0 \\ 1 & \text{if }|I| = 0\end{cases}$$ so we're done. (Or, if you'd prefer, you can match up the odd-sized subsets and even-sized subsets, as we would do for D.I.E. counting.)

This is an indicator function for the empty set, so to some extent this makes it the "Boolean" version of Fourier analysis. A variation of this: if we know that $I'\subseteq I$, then we can detect whether they are equal by taking $$\sum_{I'\subseteq J \subseteq I} (-1)^{|J|} = \begin{cases} (-1)^{|I|} & \text{if }I' = I \\ 0 & \text{otherwise}\end{cases}$$
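Both detector identities are easy to verify exhaustively over a small ground set (a Python sketch):

```python
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))

I = [1, 2, 3, 4]

# Empty-set detector: the alternating sum is 1 iff the ground set is empty.
assert sum((-1) ** len(J) for J in subsets(I)) == 0
assert sum((-1) ** len(J) for J in subsets([])) == 1

# Interval detector: summing over I' <= J <= I picks out I' == I.
for Ip in subsets(I):
    total = sum((-1) ** len(J)
                for J in subsets(I) if set(Ip) <= set(J))
    assert total == ((-1) ** len(I) if set(Ip) == set(I) else 0)
```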

Here's a problem that uses the mod 2 version of this:

Given a finite (simple) graph, a subset $D$ of its vertices is dominating if every vertex is either in $D$ or adjacent to a vertex in $D$. Show that there must be an odd number of dominating subsets.

Here's the trick: a set $D$ is dominating exactly when its closed neighborhood $N(D)$ - the vertices in or adjacent to $D$ - is all of the vertices. So the complement of $N(D)$ has an odd number of subsets ($2^0 = 1$) exactly when $D$ is dominating, and an even number otherwise; summing over all $D$, the number of pairs $(D, E)$ with $E$ disjoint from $N(D)$ is, mod 2, the number of dominating sets. But the condition "$E$ is disjoint from $N(D)$" is symmetric in $D$ and $E$, so pairs with $D \neq E$ cancel mod 2, and the only pair with $D = E$ forces $D = E = \varnothing$. Hence the total is odd.
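The claim is also easy to stress-test by brute force on small random graphs (a Python sketch; `is_dominating` uses closed neighborhoods, matching $N(D)$ above):

```python
import random
from itertools import chain, combinations

def subsets(vs):
    vs = list(vs)
    return chain.from_iterable(combinations(vs, k) for k in range(len(vs) + 1))

def is_dominating(D, n, adj):
    D = set(D)
    return all(v in D or adj[v] & D for v in range(n))

for _ in range(20):
    n = 6
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if random.random() < 0.4:      # random graph on 6 vertices
            adj[u].add(v)
            adj[v].add(u)
    dominating = sum(1 for D in subsets(range(n)) if is_dominating(D, n, adj))
    assert dominating % 2 == 1
```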


Quadratic PIE. The insight above gives us a plausible line of attack for this problem.

(239 Olympiad 2016, Q6) Suppose that we have a union-closed finite family $\mathcal F$ of finite sets, each of which has size not divisible by 3. Then there must be two elements such that each set in the family contains at least one of them!

Let's bring back PIE as we stated it earlier:

$$|\cup_I A| = \sum_{\varnothing \neq J\subseteq I} (-1)^{|J|-1}|\cap_J A|.$$

By relating the sizes of unions to the sizes of intersections, we can almost perfectly capture the conditions in the statement of the problem.

  • Assume that the sets $A_i$, $i \in I$, all belong to the family $\mathcal{F}$. Then the union $\cup_I A$ also belongs to $\mathcal F$ (the family is union-closed), so its size is 1 or 2 mod 3 - and in either case $|\cup_I A|^2 \equiv 1 \pmod 3$.
  • On the other hand, if $I$ and $J$ together cover the entire family $\mathcal{F}$, then, supposing for contradiction that the conclusion fails, either $|\cap_I A|$ or $|\cap_J A|$ must be zero - if neither were zero, selecting one element from each intersection would give the two desired elements. So if we take some kind of sum over the products $|\cap_I|\cdot |\cap_J|$ for covering pairs, every term will be zero.

We get closer to it by squaring the statement of PIE: $$|\cup_K|^2 = \sum_{\varnothing \neq I \subseteq K} \sum_{\varnothing \neq J \subseteq K} (-1)^{|I|+|J|} |\cap_I|\cdot |\cap_J|$$ and then we have the problem of determining whether $I\cup J$ does indeed cover everything. But here's where our "Fourier" trick comes in: if we do an alternating sum over all $K$ between $I\cup J$ and everything (writing $\mathcal E$ for everything), then we kill off every pair with a gap, i.e. $$\sum_{I\cup J \subseteq K \subseteq \mathcal E} (-1)^{|K|} = \begin{cases}(-1)^{|\mathcal E|} & \text{if }I\cup J = \mathcal E \\ 0 & \text{otherwise}\end{cases}$$

So putting it all together - multiply the squared PIE by $(-1)^{|K|}$, sum over all $K \subseteq \mathcal E$, and swap the order of summation - every surviving term has $I \cup J = \mathcal E$ and hence (under our contradiction hypothesis) vanishes: $$\sum_{K\subseteq \mathcal E} (-1)^{|K|} |\cup_K|^2 = 0$$

We recall that $|\cup_K|^2$ is always 1 mod 3 for nonempty $K$ ($|\cup_K|$ is 1 or 2 mod 3, and both square to 1), while the $K = \varnothing$ term vanishes. So the LHS is congruent mod 3 to $\sum_{\varnothing \neq K\subseteq \mathcal E} (-1)^{|K|} = -1 \equiv 2$ (because $\sum_{K\subseteq \mathcal E} (-1)^{|K|} = 0$) - a contradiction!
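The double-sum manipulation above is easy to get wrong, so here's a brute-force check of the unconditional identity $\sum_{K\subseteq \mathcal E} (-1)^{|K|}|\cup_K|^2 = (-1)^{|\mathcal E|}\sum (-1)^{|I|+|J|}|\cap_I|\cdot|\cap_J|$ (sum over nonempty $I, J$ with $I \cup J = \mathcal E$) on random families (a Python sketch; helper names are mine):

```python
import random
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))

def union_size(family):
    return len(set().union(*family)) if family else 0

def inter_size(family):
    return len(set.intersection(*map(set, family)))

for _ in range(50):
    m = 4
    A = [set(random.sample(range(6), random.randint(1, 4))) for _ in range(m)]
    E = range(m)

    lhs = sum((-1) ** len(K) * union_size([A[i] for i in K]) ** 2
              for K in subsets(E))
    rhs = (-1) ** m * sum(
        (-1) ** (len(I) + len(J)) * inter_size([A[i] for i in I])
                                  * inter_size([A[j] for j in J])
        for I in subsets(E) if I
        for J in subsets(E) if J and set(I) | set(J) == set(E))
    assert lhs == rhs
```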

In retrospect, this might have been one of the hardest things I've had to invent to solve a contest problem - if you only read the proof, might you have just accepted that it was impossible to come up with? But one of the things I love about math is that sometimes, we're able to gradually refine and evolve our understanding, and with the right angle of attack, for a brief glimpse even the impossible becomes possible.
