20 Ideas: an Ode to Intuition

(Dylan here.)

0. Introduction

This post will be a bit adventurous, but hopefully the risk will pay off. As I am excited to jump into the main content, my introduction/message will be brief: I wish to convince you that everything can be put into words and pictures, even fuzzy intuition. I will attempt to do so by rambling abstractly about ideas that I believe I have understood to some extent, and I think are beautiful. Some meta-comments:

  • While I said 'pictures' above, I am too lazy to draw diagrams; I will thus describe them, and strongly encourage the use of imagination. 
  • The amount I write about each idea varies; this does not correlate with how interesting/useful/powerful the idea is. If anything, it correlates with how much I thought about the idea. 
  • Against my hopes, my perspectives on these ideas probably aren't the 'best', 'most complete', or perhaps even 'morally right' perspectives. You may understand things in a different way.
  • The list is non-linear; feel free to skip around.
  • The list is also nowhere close to being exhaustive. These are the first 20 ideas that came to mind.
Let's go!

1. Double counting and Abel summation

Here's an algebraic formulation of double counting:

$$\sum_{i=1}^n \sum_{j=1}^m a_{i,j} =  \sum_{j=1}^m \sum_{i=1}^n a_{i,j}$$

The intuition comes from drawing a $2$-dimensional grid and populating it with 'weights' $a_{i,j}$, and computing the total weight. Of course, this generalises to higher-dimensional counting (think: sum of divisors of $2^8\cdot 3^7\cdot 5^{10}$). Double counting is also the underlying reason behind the simple fact that $m\cdot n = n\cdot m$ for integers $m,n$.
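
To make the divisor example concrete, here is a minimal Python sketch (the setup is my own): the divisors of $2^8\cdot 3^7\cdot 5^{10}$ form a $9\times 8\times 11$ grid of weights, and the total weight may be summed cell by cell, or factored axis by axis.

```python
# Double counting in three dimensions: each divisor of 2^8 * 3^7 * 5^10
# is a cell (i, j, k) of a 9 x 8 x 11 grid carrying weight 2^i * 3^j * 5^k.

# Total weight, summed cell by cell...
cellwise = sum(2**i * 3**j * 5**k
               for i in range(9) for j in range(8) for k in range(11))

# ...or axis by axis: the triple sum factors into three geometric series.
factored = (sum(2**i for i in range(9))
            * sum(3**j for j in range(8))
            * sum(5**k for k in range(11)))

assert cellwise == factored  # both equal the sum of divisors
```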

A twist on this idea is Abel summation:

$$\sum_{i=1}^n a_ib_i = \sum_{i=1}^n (a_1+\dots + a_i)(b_i-b_{i+1}) + (a_1+\dots+a_n)b_{n+1}$$

The intuition comes from computing the area of a staircase-shaped object in two ways. While the combinatorial picture is similar to double counting, the algebraic picture is completely different: it somehow relates 'dot product'-type sums where the $a_i$ are replaced by partial sums while $b_i$ are replaced by successive differences. (This is an example of an 'obvious' fact being reformulated into a 'non-obvious' one; we draw parallels with problem solving: we typically wish to reformulate 'non-obvious' problems into 'obvious' ones.)
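
The index bookkeeping above is easy to fumble, so here is a quick numerical sanity check in Python (a throwaway sketch; note that the identity needs the extra value $b_{n+1}$):

```python
import random

# Check: sum_{i=1}^n a_i b_i
#      = sum_{i=1}^n (a_1+...+a_i)(b_i - b_{i+1}) + (a_1+...+a_n) b_{n+1}.
n = 10
a = [random.randint(-5, 5) for _ in range(n)]
b = [random.randint(-5, 5) for _ in range(n + 1)]  # b_1, ..., b_{n+1}

lhs = sum(a[i] * b[i] for i in range(n))

A, rhs = 0, 0
for i in range(n):
    A += a[i]                     # A = a_1 + ... + a_{i+1} (partial sum)
    rhs += A * (b[i] - b[i + 1])  # one staircase strip
rhs += A * b[n]                   # final term (a_1+...+a_n) * b_{n+1}

assert lhs == rhs
```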

Double counting also appears in other contexts such as graph theory: the interaction of vertices with edges enables several objects to be counted in more than one way. Some examples are the handshaking lemma, the proof of Mantel's theorem for triangle-free graphs, and edge-count bounds for $H$-free graphs for general graphs $H$.

In higher maths, this fits into a more general picture: treating the summations as operators, double counting is the statement that these operators commute (Fubini).

2. Pigeonhole principle and Ramsey

The pigeonhole principle is about excess: for instance, if you try to pour 10 litres of fruit punch into any number of cups with a total volume of 9 litres, you will necessarily create a mess by overflow. 

It is often used in the following formulation: there are a lot of objects, each of which belongs to one of a (fixed) finite number of types; then at least one type must also have a lot of objects. However, finding the 'objects' and 'types' for a specific problem may require some creativity.

Its power becomes apparent when applied to Ramsey-type problems, with the theme of 'finding rigid substructure in abundance'. Some examples: (Ramsey) for a fixed number of colours and target graph $H$, there is always a monochromatic $H$-subgraph in a large coloured complete graph; or the happy ending problem concerning finding convex polygons (which requires a clever perspective); or Szemeredi's theorem on finding arithmetic progressions (which needs a lot more technical firepower to capture the 'abundance'). 

3. Probabilistic method

Slightly adjacent to Ramsey-type ideas is the idea that 'everything happens in chaos', i.e. if you are looking for a construction with certain structure, it is helpful to check whether a random construction is likely to succeed. This is often entwined with expectation (i.e. mean) calculations, with clever applications of linearity of expectation (which could be thought of as a form of double counting). These may be further supplemented with advanced techniques such as alterations (being close to the desired structure is typically good enough, if the amount of modification needed is insignificant on average) and second moment methods (controlling the spread of the random outcomes via the variance).

Here's an (admittedly simplified) example: suppose you are playing a deterministic solitaire game involving a sequence of binary choices (say $A,B$), and you know that completely randomised choices should, on average (heuristically or otherwise), result in victory. One way to turn this into a deterministic algorithm is to compare the probability of winning after (i) picking $A$, then randomising all subsequent choices, versus (ii) picking $B$, then randomising all subsequent choices; commit to the better branch, and repeat until every choice is fixed.
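
Here is a Python sketch of this derandomisation (the method of conditional expectations), on a toy game with an arbitrary payoff rule of my own invention; for small $n$ the conditional probabilities can be computed exactly by brute force:

```python
from itertools import product

n = 12  # number of binary choices in the toy game

def wins(choices):
    # Arbitrary deterministic win condition, standing in for a real game.
    return sum(choices[i] * (i % 3) for i in range(n)) % 5 != 0

def win_prob(prefix):
    # Exact win probability if `prefix` is fixed and the remaining
    # choices are made uniformly at random (brute force over all tails).
    rest = n - len(prefix)
    tails = list(product((0, 1), repeat=rest))
    return sum(wins(prefix + t) for t in tails) / len(tails)

# Greedily fix each choice so the conditional win probability never drops;
# since the initial probability is positive, the final play must be a win.
prefix = ()
for _ in range(n):
    prefix += ((0,) if win_prob(prefix + (0,)) >= win_prob(prefix + (1,))
               else (1,))
assert wins(prefix)
```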

Some applications: crossing number bounds; existence of high-girth, high-chromatic number graphs; and games of choice. 

4. Algorithms and monovariants

Regardless of whether a process is mentioned in the problem or not, thinking algorithmically is always a bonus. Typically, you define your algorithm such that when it terminates, the problem is solved. For instance, to optimise a quantity, one may consider tweaking the system to repeatedly increase/decrease said quantity (this is the spirit of gradient descent). 

Typically, an algorithm is guaranteed to terminate by finiteness of total possibilities, together with a monovariant (a monotonically varying quantity). The monovariant is usually the part that needs creativity: for instance, the algorithm may involve a local perturbation that (e.g. by convexity) affects some quantity monotonically.

Algorithms can range from local (e.g. delete a vertex), to intermediate (e.g. invert a neighbour set), to global (e.g. apply a permutation to all indices). Some common monovariants are size norms (e.g. maximum absolute value of an entry, minimum subclique size, number of edge crossings), energy (sum of squares; a special case of norms), and entropy (more relevant to problems in information theory). Of course, the right choice is highly dependent on the specifics of the algorithm; even something as esoteric as the chromatic number could serve as one.
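
Here is a toy Python illustration (my own): sorting by swapping adjacent out-of-order pairs terminates precisely because the number of inversions is a strictly decreasing monovariant.

```python
def inversions(a):
    # Number of pairs i < j with a[i] > a[j]: our monovariant.
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a))
               if a[i] > a[j])

a = [3, 1, 4, 1, 5, 9, 2, 6]
while any(a[i] > a[i + 1] for i in range(len(a) - 1)):
    i = next(i for i in range(len(a) - 1) if a[i] > a[i + 1])
    before = inversions(a)
    a[i], a[i + 1] = a[i + 1], a[i]      # local perturbation
    assert inversions(a) == before - 1   # monovariant drops by exactly 1
print(a)  # sorted: termination is forced, since inversions >= 0
```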

Some famous examples of algorithms are greedy algorithms, sorting/searching algorithms (e.g. breadth-first/depth-first), max-flow min-cut, the Euclidean algorithm, sieves, Gram-Schmidt/Gaussian elimination, and dynamic programming-esque procedures. Those of you with a CS background can appreciate these in other ways as well (e.g. in terms of time complexity, or among randomised algorithms).

5. Invariants and colourings

Some algorithms preserve quantities; for instance, the algorithm that "evolves a closed physical system forward in time by 1 second" conserves total energy and momentum. Invariants typically hint at hidden structure, or a potential reduction in degrees of freedom.

In the discrete world, invariants may also be used to restrict possibilities. A common example is parity, or general colouring invariants (counts, possibly taken modulo $n$). Checkerboard colourings are among a plethora of creative possibilities. This also arises algebraically, for instance in roots-of-unity filters (see Fourier below). 
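
The classic mutilated-chessboard argument is a concrete instance, easy enough to spell out in a few lines of Python: each domino covers one black and one white cell, so unequal colour counts rule out a tiling.

```python
# Remove two opposite corners of an 8x8 board.  Both corners are the
# same colour under the checkerboard colouring, so the colour counts
# of the remaining 62 cells differ -- no domino tiling can exist.
cells = {(r, c) for r in range(8) for c in range(8)} - {(0, 0), (7, 7)}
black = sum((r + c) % 2 == 0 for r, c in cells)
print(black, len(cells) - black)  # 30 32: unequal, hence no tiling
```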

6. Symmetries, transformations and duals

The unassuming fact that the entries of an $n\times n$ grid may be reflected across its diagonal, or rotated $90^\circ$, or that a certain polynomial is symmetric in its variables, or homogeneous in its degree, or that vertices $B,C$ of a geometry diagram play interchangeable roles, could very well be the key idea to a solution.

Sometimes, one may wish to symmetrise a non-symmetric object. At other times, one may wish to anti-symmetrise. In inequalities, this may be seen as homogenisation (e.g. adding extra factors of $x_{n+1}$ to match degrees) or specialisation (e.g. setting $x_1=1$).

The dual is a powerful concept: consider, for instance, dual bases of a vector space, which illustrate the complementary relationship between vectors and functionals, or dual problems in linear optimisation.

Transformations along symmetries, such as translation, scaling (homothety), rotation (more generally, spiral similarity), reflection, inversion, pole-polar, and perspective-shifting are exceptionally useful (and often the 'morally right' proofs) in classical geometry, but transformations arise and are just as powerful in any general context (e.g. rewriting a polynomial $P(X)$ as $Q(Y)$ under the identification $Y=X+1$, with the coefficients getting mixed by binomial coefficients).

Symmetries may also be used to group objects: for instance, each up-right path from $(0,0)$ to $(m,n)$ may be placed in the same class as all other paths obtained by cyclic-shifting the $m+n$ moves. This leads to an expression for the Catalan numbers. In general, the right language to express this is group theory: in particular, a group action and its orbits and stabilisers. 
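
In Python, one can watch a group action at work by counting necklaces (colourings of $n$ beads up to rotation) two ways: brute-force orbit counting versus Burnside's lemma. (A sketch with parameters of my own choosing.)

```python
from math import gcd
from itertools import product

n, k = 6, 2  # 6 beads, 2 colours

# Brute force: represent each orbit by its lexicographically least
# rotation, so distinct canonical forms = distinct orbits.
def canonical(s):
    return min(s[i:] + s[:i] for i in range(n))
orbits = {canonical(s) for s in product(range(k), repeat=n)}

# Burnside: average, over rotations r, the number of colourings fixed
# by r, which is k^gcd(r, n).
burnside = sum(k ** gcd(r, n) for r in range(n)) // n

assert len(orbits) == burnside  # both give 14 for n = 6, k = 2
```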

7. Inclusion-exclusion and Mobius inversion

Counting by splitting into cases is typically only effective if the cases are disjoint (or $m$-to-$1$ for fixed $m$, e.g. double counting). Nevertheless, if one can easily understand all errors (i.e. the count for any intersection of cases), then one may account for them in the final count. Inclusion-exclusion is exactly that, generalising 

$$|{A\cup B}| = |A| + |B| - |A\cap B|.$$

A good way to formulate inclusion-exclusion algebraically is via indicator functions, or, relatedly, boolean rings (i.e. $0-1$ vectors with pointwise addition and multiplication modulo $2$). 

Mobius inversion is a special manifestation of inclusion-exclusion: 

$$a_n = \sum_{d\mid n} b_d \implies b_n = \sum_{d\mid n} \mu(n/d)\, a_d,$$

where $\mu(m)\in \{0,\pm 1\}$ vanishes unless $m$ is square-free, in which case it equals $(-1)^{\#\{\text{distinct prime factors of } m\}}$. This arises more generally in the theory of arithmetic functions and their (Dirichlet) convolutions, with applications in analytic number theory.
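
A quick numerical check of Mobius inversion in Python (with a hand-rolled $\mu$ and an arbitrary test sequence):

```python
def mu(n):
    # Mobius function: 0 if n has a squared prime factor, else
    # (-1)^(number of distinct prime factors).
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0       # p^2 divides the original n
            result = -result
        p += 1
    return -result if n > 1 else result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

b = {n: n * n - 3 * n for n in range(1, 50)}        # arbitrary sequence
a = {n: sum(b[d] for d in divisors(n)) for n in b}  # a_n = sum_{d|n} b_d
for n in b:
    assert b[n] == sum(mu(n // d) * a[d] for d in divisors(n))
```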

8. Discretisation and continualisation

Given a grid populated with real numbers, one may wish to round them to the nearest integers. In the other direction, one may wish to view an integer-valued grid as sitting among real-valued (or rational-valued) ones.

Discretising allows for a notion of finiteness (e.g. allowing one to run algorithms that terminate). This may allow one to prove an inherently analytic fact via successive discrete approximations (e.g. Cauchy's functional equation, or even Riemann integration).  

Conversely, continualisation allows us to apply analytic tools to inherently discrete objects, e.g. convexity bounds on binomial coefficients; or taking the adjacency matrix of a graph (a $0-1$ matrix) and bounding its eigenvalues to understand certain properties of the graph. The challenging step is to find the appropriate intermediate structure that bridges the discrete to the continuum.

9. Local-global

The Euler characteristic of a planar graph is an example of how local structure can be 'glued' together to obtain global structure. Chip-firing problems are another manifestation of this. This is typically a powerful perspective that requires a deep understanding of the rigidity of local structure. A neat physical analogue that illustrates the subtlety is why certain materials are magnetic while others aren't: imagine a material formed from bajillions of tiny arrows, each trying to influence its neighbours to point in the same direction (whilst being influenced itself); this results in global alignment in magnets (rigid structure!) but chaotic, disordered arrangements in other materials (non-rigid). 

In number theory, this takes a slightly different flavour: solving a problem mod $p$ for all $p$ could mean solving the problem over the integers (with a million subtleties). 

In higher mathematics, this is a strong idea in complex analysis (holomorphic functions) and geometry (Gauss-Bonnet). 

10. Just do it

With an infinite number of degrees of freedom, it is typically true (unless an 'obvious' obstruction is present) that one may satisfy finitely many constraints. One may push this a step further to satisfy infinitely many constraints, by addressing them one at a time (i.e. in a recursive, or iterative, way). 

Here's a nice exercise very much in the spirit of 'just do it': on an infinite grid, show that one may make a sequence of horizontal/vertical jumps (with the $n$-th jump having distance $n$), such that every cell is visited exactly once.

The extreme version of this is transfinite induction. 

11. (Strong) induction

Often, one needs to prove an infinite number of statements (e.g. "for all $n$/for all graphs with property $X$, something is true"). If it happens to be the case that the statements are interrelated (e.g. the property of $n$ is almost the property of $n+1$), it is convenient to order the statements, and prove them one at a time (getting to assume the result is true for all 'smaller' cases). 

Induction seems to be the gradual buildup of larger and larger truth statements, but in practice it is often done by iterative reduction to smaller statements. This is exactly captured by the idea of arguing by contradiction and picking a 'smallest/minimal counterexample', then showing it could actually be reduced further. In a sense, it is a contrapositive of the extremal argument (see below). An example that illustrates the power of induction is Hall's theorem concerning matchings of a bipartite graph.

Warning: just because a statement can be proved by induction, doesn't mean that is the 'morally correct' proof! There are a considerable number of cases where proof by induction is 'cheating', in the sense that it does not shed light on how one would motivate the problem statement in the first place. 

Take, for instance, the identity

$$1^3+2^3+\dots+n^3 = (1+2+\dots+n)^2.$$

The plain induction proof does not shed light on why we expect such identities to exist in the first place; for instance, it does not capture the fact that both sides should be polynomial in $n$, thus it suffices to check equality at finitely many $n$ (even allowing 'substituting negative values of $n$'). 
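
For the identity above, the 'finitely many checks' viewpoint can even be carried out mechanically: granting the standard fact that both sides are degree-$4$ polynomials in $n$, agreement at any $5$ values settles it (a Python sketch):

```python
# Two degree-4 polynomials agreeing at 5 points are identical,
# so checking n = 0, 1, 2, 3, 4 proves the identity for all n.
for n in range(5):
    lhs = sum(k**3 for k in range(1, n + 1))
    rhs = sum(range(1, n + 1)) ** 2
    assert lhs == rhs
```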

Tl;dr: induction is a powerful and sometimes essential tool; however, it can obscure the motivating intuition if used in a reverse-engineering manner.

12. Generalisation

The true statement

$$1^3+2^3+\dots+100^3 = (1+2+\dots+100)^2$$

is conveniently proved by realising that the statement holds true with $100$ replaced by a general $n$, and then running induction. Placing a statement among similar statements, or a structure among a wider class of structures, or a family as part of a larger family, is the spirit of generalisation. 

Some examples: a statement about $n\times n$ square grids may be generalised to $m\times n$ rectangles. The subtle identification $\{n\} \leftrightarrow\{(n,n)\} \subset \{(m,n)\}$ is also hidden in Ramsey numbers: avoiding a monochromatic $n$-clique may be generalised by picking different $n$ for different colours. At the structural level, a graph-theoretic result may be translated to properties of its adjacency matrix, which is then generalised to a statement on $0-1$ matrices. Or, a set-theoretic result may be abstracted to an incidence graph and then attacked in the language of graph theory: in particular, certain induction steps (e.g. removing an edge) may not translate well (or at all) to a corresponding operation on the original sets. In number theory, integer solutions may be considered as a subset of rational solutions (or even $p$-adic solutions).

At a higher level, this motivates abstraction (see below): e.g. metric spaces $\rightarrow$ topological spaces, surfaces $\rightarrow$ manifolds, polynomial diophantine equations $\rightarrow$ varieties.

In the opposite direction, to prove a very general statement, one may gain intuition by first proving it on a special class of well-behaved substructures (e.g. to prove a statement for all graphs, one may first prove it on all trees/bipartite graphs). The ideas that arise when trying to generalise the proof to wider and wider classes may motivate the exact ideas/stepping stone lemmas necessary for induction.

13. Extremal

As mentioned above, extremal ideas may be viewed as a contrapositive to induction, or more generally, to algorithms with a monovariant. They underlie pigeonhole and probabilistic ideas. The base idea is that one may speak about lower/upper bounds of any set, function, or quantity (in analysis jargon, the infimum/supremum respectively), and extremal/near-extremal points probably have rigid structure (think: a generic polynomial looks almost quadratic near a minimum).

But more generally, extremal choices may be made anywhere within the proof (e.g. characterising the local structure of a vertex with maximal degree; or reducing modulo the smallest prime divisor). These may be motivated by experimentation: e.g. if the problem is to show $m,n$ are coprime, but several near-counterexamples have $m,n$ sharing the smallest prime divisor, it hints toward an investigation in that direction.

14. Convexity and smoothing

Convex objects (think: rocks) have nice intuitive geometric properties which may be formalised and proved: to describe one, the supporting hyperplane theorem states that through any point on the boundary there is a plane for which the convex object lies entirely on one side (think: tangent to a U-shaped curve). 

This is useful in inequalities: for a convex function, linearly interpolating the values always gives at least the value at the interpolated point (chords lie above the graph). Almost all classical inequalities follow from this (except for order-dependent results such as the rearrangement inequality; and Schur's inequality, which somehow removes the order dependence, though I can't quite put my finger on the underlying mechanism). 

The 'tangent line method' is an example of using convexity arguments on a function that is not globally convex, but convex around the points that matter (i.e. the points that are close to being minimisers). Smoothing ideas are of the same spirit. 
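
A standard illustrative instance (my example, not from the text): for $a+b+c=1$ with $a,b,c\ge -\tfrac34$, one proves $\sum \frac{a}{a^2+1}\le \frac{9}{10}$ by bounding $f(x)=\frac{x}{x^2+1}$ with its tangent line at $x=\tfrac13$; the pointwise bound rearranges to $(3x-1)^2(4x+3)\ge 0$. A numerical sanity check in Python:

```python
# Tangent line trick: f(x) = x / (x^2 + 1) satisfies
#   f(x) <= (36x + 3) / 50   for all x >= -3/4,
# since 50x <= (36x + 3)(x^2 + 1) rearranges to (3x-1)^2 (4x+3) >= 0.
# Summing over a + b + c = 1 gives f(a) + f(b) + f(c) <= 45/50 = 9/10.
for i in range(10001):
    x = -0.75 + 0.001 * i  # sample points in [-0.75, 9.25]
    assert x / (x * x + 1) <= (36 * x + 3) / 50 + 1e-12
```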

It is also worth noting the geometric relevance of convexity: for instance, a result on points on the plane may be viewed in a different perspective by focusing on the convex hull.

15. Prime valuations

The idea is that

$$v_p(x) \leq v_p(y) \quad \forall p \implies x\mid y.$$

In particular, if you have solved the problem modulo $p^N$ for all primes $p$ and large $N$, then you have probably solved the problem. 

One may go beyond this to talk about the $p$-adics (viewing the problem modulo $p^N$ for all $N$ at once).
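
In Python, the implication reads as follows (a small sketch; checking primes up to $x$ suffices, since any prime dividing $x$ is at most $x$):

```python
def v(p, n):
    # p-adic valuation: exponent of the prime p in the positive integer n.
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e

def primes_up_to(m):
    return [p for p in range(2, m + 1)
            if all(p % q for q in range(2, int(p**0.5) + 1))]

x, y = 84, 5880  # arbitrary test pair; 5880 = 70 * 84
assert all(v(p, x) <= v(p, y) for p in primes_up_to(x)) == (y % x == 0)
```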

16. Analysis

The idea is that

$$x\leq y+\varepsilon\quad\forall \varepsilon>0 \implies x\leq y;$$

In particular, if you have solved the problem with some leeway, for any arbitrarily small leeway, then you have probably solved the problem. 

(Of course, this is not the only cool analysis idea; it's just one that I think is most useful and relevant at the high school olympiad level. Convexity and smoothing is an inherently analytic idea as well.)

17. Polynomials

The fact that long division and the Euclidean algorithm may be applied to polynomials (with the notion of degree) makes polynomials an interesting algebraic structure in their own right ($\mathbb{Z}[X]$ is a Noetherian ring and a unique factorisation domain). 

But it doesn't end there: one may move between a polynomial's formal sum representation (i.e. coefficients), its (geometric) graph (function interpretation: e.g. every continuous function may be approximated by polynomials), and its (analytic or number-theoretic) roots (i.e. in $\mathbb Q$, or modulo $p$, or in $\mathbb R$ or $\mathbb C$). 

This makes polynomials a wonderful mixing bowl for different paradigms to interact, allowing the possibility of integrating discrete ($\mathbb Z$) and continuous ($\mathbb C$) techniques. For instance, it is amazing how an integer-coefficient polynomial can be evaluated at a third root of unity to yield meaningful information.

And on top of all this, there is still additional structure from composing polynomials (perhaps linking to arithmetic dynamics).

Furthermore, polynomial expansions may involve binomial coefficients, which have combinatorial interpretations. It is no wonder that polynomials show up in combinatorics: e.g. chromatic polynomial of a graph, the Jones polynomial of knots, and generating functions in enumerative/additive combinatorics (power series also come into play, which link to analysis). 

The Chebyshev polynomials are a peculiar family that arises from trigonometric identities. Other 'common' classes of polynomials arise naturally as eigenfunctions of certain differential equations.
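
The trigonometric origin is the identity $T_n(\cos t)=\cos(nt)$, which pins down the whole family via the recurrence $T_{n+1}=2xT_n-T_{n-1}$; a quick numerical check in Python:

```python
import math

def T(n, x):
    # Chebyshev polynomials: T_0 = 1, T_1 = x, T_{n+1} = 2x T_n - T_{n-1}.
    a, b = 1.0, x
    for _ in range(n):
        a, b = b, 2 * x * b - a
    return a

t = 0.7
for n in range(8):
    assert abs(T(n, math.cos(t)) - math.cos(n * t)) < 1e-9
```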

Beyond olympiad mathematics, polynomials underlie algebraic varieties. In particular, the world of polynomials in more than one variable is vast and deeply structured.

Here is a measly attempt to explain why polynomials are interesting: just as the first non-trivial instance of double counting arose in the commutativity of integer multiplication, there is also a manifestly non-trivial way in which addition and multiplication interact (the distributive law), and polynomials precisely capture this structure. 

18. Fourier and convolutions

The core idea is that the time-data of a signal is equivalent to its frequency-data. 

Concretely, the roots-of-unity filter used to extract, say, only the terms of a polynomial whose exponent is a multiple of $3$ is a manifestation of the discrete Fourier transform. In number theory, the same circle of ideas appears in the proof of Fermat's little theorem, or in showing that there are infinitely many primes that are $3$ mod $4$; the latter generalises to Dirichlet characters and (with supplementary analytic firepower) Dirichlet's theorem.
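
Concretely, in Python (taking $P(x)=(1+x)^{10}$ as an arbitrary example), the filter $\frac{P(1)+P(\omega)+P(\omega^2)}{3}$ with $\omega = e^{2\pi i/3}$ extracts the coefficients at exponents divisible by $3$:

```python
import cmath
from math import comb

# The other terms cancel since 1 + w^j + w^(2j) = 0 whenever 3 does not
# divide j; what survives is the sum of coefficients of x^k with 3 | k.
w = cmath.exp(2j * cmath.pi / 3)
filtered = sum((1 + w**r) ** 10 for r in range(3)) / 3  # P(x) = (1+x)^10

direct = sum(comb(10, k) for k in range(0, 11, 3))  # C(10,0)+C(10,3)+...

assert abs(filtered - direct) < 1e-9
print(direct)  # 341
```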

One perspective arises in linear algebra: the Fourier transform is an orthogonal/unitary (i.e. norm-preserving) linear transformation that picks out multiplicative self-similarity (in the sense of geometric progressions). This perspective generalises to representation theory (an area of higher mathematics where a lot of interesting things happen). 

Convolutions are a way to translate between time-data operations and frequency-data operations. One may draw a parallel with convolutions of arithmetic functions (related to Mobius inversion above).

In analytic number theory (advanced), Fourier ideas may play the role of picking out sub-arithmetic progressions (via, for instance, Fourier transform of an indicator function). 

19. Projective geometry

Yes.

(That was initially all I had to say, but I feel compelled to explain: when I was young, I saw only the power, but not the underlying 'why it works', of projective techniques such as harmonic pencils. Now I still don't really know, except in the context of cross-ratios on the Riemann sphere ($\mathbb C$-geometry) being preserved by Möbius maps.)

(Also, even though I avoided learning and applying coordinate/complex/barycentric tools, I now wish I had at least tried to engage and find intuition regarding them: not for solving Euclidean geometry problems in particular, but more for drawing links to more advanced ideas in (general) geometry.)

20. Abstraction

In primary school, we had to get comfortable with algebra: using symbols to represent arbitrary numbers. We do so because it helps us work in generality: a useful perspective is that every expression is a function, and substituting in a value entails evaluating the function at that value.

In a similar vein, we start to expand our mathematical vocabulary so that we can (a) describe interesting structures, and (b) make general arguments. Here are some common ones you may have already come across in olympiad mathematics:

  • Sets, relations, and functions as an abstraction of things, relationships, and input-output machines. 
  • Posets as an abstraction of weak order (e.g. $a$ divides $b$, $b$ divides $c$ implies $a$ divides $c$). 
  • Graphs as an abstraction of networks and connectivity. Directed graphs may capture the notion of flow, or tournaments.
  • Groups as an abstraction of symmetries or function compositions. Rings as a generalisation of the integers (with operations $+$ and $\times$). Fields as a generalisation of the rationals.
  • Quotients as an abstraction of 'equal up to some equivalence'. Products as an abstraction of pairs. 
  • Polynomials, power series.
  • Primes as an abstraction of atoms (notion of irreducibility).
  • Vector spaces as an abstraction of space.
  • Functionals as an abstraction of coordinates.

The list goes on as you learn higher mathematics; it will become less and less clear what the 'right' definitions/properties/structures are, but you will also then be able to ask and answer more questions, eventually forming a big picture.

Bonus. Scaffolding

Typically, textbooks are associated with theory built from definitions, examples, conjectures, lemmas, propositions, theorems, and corollaries. This poses a barrier to entry for newcomers to mathematics, typically because the question "why is this relevant?" is neither asked nor answered until one pushes through and internalises a significant amount of buildup content.

This is noticeably similar to how people usually explain solutions or proofs: in bottom-up logical order (and not in the order of one's personal path to the solution, including wrong turns and soft ideas). While the former gets the content across, I see value in the latter: when done right, the intuition and roadmap are often more important than the actual details. 

The reason I am talking about this is that a good portion of hard olympiad problems (by hard, I mean IMO Q3/6 level) require mini-theory building: one needs to do some experimentation, observe patterns and form conjectures, find appropriate substructures, and lay down stepping-stone lemmas on their own. (The other half of hard problems require creativity blended with intuition and experience!)

As a small self-contained example, Dirac's theorem concerning a sufficient condition for Hamiltonian cycles has the stepping stone of paths: to extend a $k$-cycle to a $(k+1)$-cycle, one goes via the intermediate step of extending to a $k$-path. Several medium-difficulty inequalities also involve two steps, where the intermediate expression is entirely up to the problem solver to craft.

A more generic (but vague) example: if one wishes to show a function $f:\mathbb N\to \mathbb N$ is linear from a functional equation, one may consider the following subgoals:

  1. Show $f$ is non-decreasing.
  2. Show $f$ is bounded above by a linear function.
  3. Show that successive differences $f(n+1)-f(n)$ are bounded.
  4. Show that successive differences $f(n+1)-f(n)$ are (eventually) periodic.
  5. Show that successive differences $f(n+1)-f(n)$ are (eventually) constant, i.e. $f$ is eventually linear.
  6. Show that $f$ is linear.
A trailing remark: how to find these stepping stones is highly problem-dependent; however, one enlightening perspective may be to step into the shoes of the problem creator, in the sense that you are not trying to prove a given statement, but just trying to find any structure whatsoever (to verbalise this way of thinking: not "how do I get way over there?", but "what more can I get on top of what I have now?"). 

Honourable mentions.

The complex numbers and the fundamental theorem of algebra.

The Farey sequence. 

Generating functions and exponential generating functions.

Bernoulli numbers.

Bertrand's postulate. (On this note, I should mention that although an elementary proof exists, as do elementary proofs of other non-trivial results, the elementary route need not always be the 'morally right' approach: more often than not, the path that avoids theory and makes clever but specific choices to bypass obstacles is also the contrived, hindsight-20/20, unintuitive approach.)

Quadratic reciprocity.

Zsigmondy.

Pascal/Brianchon.

Ptolemy.

Power of point/pencil of circles. 

21. Conclusion

Thanks for making it this far! I hope that after reading this post, you

  1. believe that it is in fact possible to verbalise intuition;
  2. realise that you and I understand some things differently (which is good! it means we can both learn a new perspective); 
  3. are not offended by my mathematical taste (which blends with physics intuition);
  4. are inspired (more than terrified) by the amount of cool math that exists;
  5. try verbalising some of your own favourite ideas!

