Pell's Equation and Beyond

(Jit here). Today I am going to write about the Pell's equation $x^2 - dy^2 = 1$, which hopefully should be familiar to readers, and about generalizations of it to higher degrees. Let's first review some facts about the integer solutions of a Pell's equation.

By negating $x$ and $y$, we can focus on the solutions with $(x,y)$ being positive integers. If $d$ is a perfect square, then we can factorize the equation into $(x- \sqrt{d}y)(x + \sqrt{d}y) = 1$ and it is easy to solve for all solutions. Else if $d$ is not a perfect square, then there exists a solution $(x_0,y_0)$ that is minimal among all positive integers, say arranged by either size of $x$ or $y$, and more importantly one can generate all other solutions $(x_n,y_n)$ by the formula 

$$x_n + y_n \sqrt{d} = (x_0 + y_0 \sqrt{d})^n.$$

Most of you who are in national team (or even senior team) should at least know these facts but have no idea how to prove them. Let's try to prove them now. The first step is to view solving the equation $x^2 - dy^2 = 1$ from a different point of view. Consider the set 

$$\mathbb{Z}[\sqrt{d}] = \{a + b \sqrt{d} \mid a,b \in \mathbb{Z}\}.$$ 

This is closed under addition and multiplication and forms what we call a ring, but that is not really important. The important thing is that there is a notion of a norm, which we denote by $N( \cdot )$, which define as

$$N(a + b \sqrt{d}) = (a+b \sqrt{d})(a - b \sqrt{d}) = 1.$$

Thus solving the Pell's equation is equivalent to solving $N(a + b \sqrt{d}) = 1$. It might not seem we did anything useful, but the interesting thing about the norm is that it is multiplicative; i.e. 

$$N( \alpha \cdot \beta) = N(\alpha) N(\beta)$$

for $\alpha,\beta \in \mathbb{Z}[\sqrt{d}]$. It is not hard to simply expand and verify that this is true, but a more conceptual reason is that the norm multiplies $a + b \sqrt{d}$ with its conjugate $a - b \sqrt{d}$. From an algebraic perspective in $\mathbb{Q}$, it is impossible to differentiate an element with its conjugate (it is similar to how people say $i$ could be swapped with $-i$ and mathematics would still be the same). In particular, the conjugate of a product should be the product of conjugates (otherwise we could differentiate them) and so the norm of a product is the product of norms.

Accepting that the norm is multiplicative, it becomes clear that if $N(\alpha) = 1$, then so is $N(\alpha^n) = 1$. In particular given a solution $a+b \sqrt{d}$, we can generate more elements with norm $1$, and this is exactly how the solutions of the Pell's equation are generated from the fundamental solution. Let's assume that at least one solution among positive integers exist for $x^2 - dy^2 = 1$. Then clearly we can choose one such that it is minimal among all positive integers, say $\alpha$. We now have to show that all other solutions must be powers of $\alpha$.

If there exists $\beta = x + y \sqrt{d}$ with $x,y$ being positive integers, and $\beta$ not a power of $\alpha$, we will try to construct a smaller solution. Among all elements with $N(x) = 1$, we can put an ordering among them by size since certainly if $x_1 > x_0$, then $y_1$ is also $> y_0$ if $N(x_0 + y_0 \sqrt{d}) = N(x_1 + y_1 \sqrt{d}) = 1$. We check that $x > y$ is true if and only if $xz > yz$ is true. In particular, if we let $\alpha^r$ be the largest power such that $\alpha^r < \beta$, we have

$$\alpha^r < \beta < \alpha^{r+1} \implies 1 < \beta \cdot \bar{\alpha}^r < \alpha$$

which contradicts the minimality of $\alpha$. Thus if we had a minimal solution, then all other solutions are generated by taking powers of it.

It suffices to show that there exists at least one solution to the Pell's equation, which is the hardest part. Let's say we had $\alpha,\beta$ such that $N(\alpha) = N(\beta) = m$. Then $N(\alpha \beta^{-1}) = 1$ which gives us a solution, except that $\alpha \beta^{-1}$ might have rational coefficients and not integer coefficients. We now check what $\alpha \beta^{-1}$ looks like. If we let $\alpha = a_0 +b_0 \sqrt{d}$ and $\beta = a_1+ b_1 \sqrt{d}$, we get

$$\alpha \beta^{-1} = \frac{a_0 + b_0 \sqrt{d}}{a_1 + b_1 \sqrt{d}} = \frac{(a_0+b_0 \sqrt{d})(a_1 - b_1 \sqrt{d})}{a_1^2 - db_1^2} = \frac{ (a_0 a_1 - d b_0 b_1) + (b_0 a_1 - a_0 b_1) \sqrt{d}}{m}.$$

Note that if $a_0 \equiv a_1 \pmod m$ and $b_0 \equiv b_1 \pmod m$, the numerator reduces to $a_0^2 - db_0^2$ in mod $m$ and this is $0 \pmod m$. Thus it suffices to find a pair of solutions such that they are congruent mod $m$, and in particular if we had infinitely many $\alpha$'s with $N(\alpha) = m$, we can definitely find two which are congruent modulo $m$. Hence we reduced our problem of $N(\alpha) = 1$ to $N(\alpha)$ being small.

To find $\alpha$ with $N(\alpha)$ small, we see that $x^2 - dy^2$ being small is equivalent to $\frac{x}{y} - \sqrt{d}$ being small. Now we know how to find approximations of the latter, By Dirichlet approximation, we can find infinitely many pairs $(x,y)$ such that 

$$\left| \frac{x}{y} - \sqrt{d} \right| \leq \frac{1}{y^2} \implies |x- y \sqrt{d}| \leq \frac{1}{y} \implies x^2 - dy^2 \leq \frac{x + y \sqrt{d}}{y} \leq 2 \sqrt{d} + 1.$$

Since $2 \sqrt{d}$ is bounded, we are now done - there exists some $m$ with $m \leq 2 \sqrt{d} + 1$ such that $x^2 - dy^2 = m$ has infinitely many solutions, and then taking two solutions which are congruent to each other modulo $m$, their quotient gives us a solution to $x^2 - dy^2 = 1$ as desired. 

Let's move onto higher degrees. Now a naive generalization would be to simply replaces squares with cubes and get $x^3 - dy^3 = 1$. However this in fact only have finitely many integer solutions and in fact behaves very differently from Pell's equation. The correct generalization is to generalize the norm function $N( \cdot )$ from $\mathbb{Z}[\sqrt{d}]$ to other rings. For example, if we take $\mathbb{Z}[\sqrt[3]{2}]$, then every element can be written in the form $a + b \sqrt[3]{2} + c \sqrt[3]{4}$ and our norm function is given by

$$N(a + b \sqrt[3]{2} + c \sqrt[3]{4}) = (a + b \sqrt[3]{2} + c \sqrt[3]{4})( a + b \omega \sqrt[3]{2} + c \omega^2 \sqrt[3]{4}) (a + b \omega^2 \sqrt[3]{2} + c \omega \sqrt[3]{4}) = a^3 + 2b^3 + 4c^3 - 6abc.$$

To come up with the norm function, the conjugates of $\sqrt[3]{2}$ are $\omega \sqrt[3]{2}$ and $\omega^2 \sqrt[3]{2}$. We then replace $\sqrt[3]{2}$ with those, replace $\sqrt[3]{4}$ with the square of the replacement, and take their products. One can verify that the norm is multiplicative and in particular $N(\alpha) = 1 \implies N(\alpha^n) = 1$. Now we have

$$N(1 + \sqrt[3]{2} + \sqrt[3]{4}) = 1 \implies N( (1 + \sqrt[3]{2} + \sqrt[3]{4})^n) = 1 \text{ for all } n \geq 1.$$

This gives us infinitely many triplets $(a,b,c)$ such that $a^3 + 2b^3 + 4c^3 - 6abc = 1$, which is exactly a USA TST problem. Of course there is still more to be desired: we haven't showed that all solutions can be generated from a smallest solution. Since there are three variables in play this time, it is no longer clear if we can arrange them by size and perform the same argument. Also, in the case of $\sqrt[3]{2}$, we were lucky that $(1,1,1)$ gave a solution to our cubic Pell's equation. In general for $\sqrt[3]{n}$, we have to solve

$$a^3 + nb^3 + n^2 c^3 - 3nabc = 1$$

and it is no longer clear whether there is a solution in positive integers if $n$ is not a perfect cube. Let's first try to prove that all solutions are generated by a minimal one. We first try to embed $\mathbb{Z}[\sqrt[3]{n}]$ into a higher dimensional space so that it becomes a lattice. Currently it looks like a three dimensional lattice squeezed into the one dimensional line $\mathbb{R}$. 

We let $V = \mathbb{R} \times \mathbb{C}$ and we embed $\mathbb{Z}[\sqrt[3]{n}]$ into it by sending 

$$a + b \sqrt[3]{n} + c \sqrt[3]{n^2} \mapsto (a + b \sqrt[3]{n} + c \sqrt[3]{n^2}, a + b \omega \sqrt[3]{n} + c \omega^2 \sqrt[3]{n^2}).$$

We don't bother using the last factor since it is the complex conjugate of the second factor and so does not carry any extra information. Then clearly the map is injective as any two points with the same image must have all conjugates equal, and thus be equal. The more interesting claim is that this embeds $\mathbb{Z}[\sqrt[3]{n}]$ as a discrete lattice inside $V$. To prove this, it suffices to prove that the vectors $(1,1), (\sqrt[3]{n}, \omega \sqrt[3]{n}), (\sqrt[3]{n^2}, \omega^2 \sqrt[3]{n^2})$ are linearly independent over $\mathbb{R}$. But this is easy to check. Hence we have embedded our ring $\mathbb{Z}[\sqrt[3]{n}]$ as a discrete lattice inside a three-dimensional space $V$. 

Now where do our units lie? If we define a norm function on V by 

$$N(v) = x \cdot |y|^2 \text{ for } v = (x,y),$$

where $|y|$ is the norm of the complex number $y$, then it lies on the subset $N(v) = 1$. Let's try to understand what the subset $N(v) = 1$ looks like. It is closed under multiplication, so if we took log's we would get a vector subspace. So looking at $(\log |x|, 2 \log |y|) \subseteq \mathbb{R}^2$, we are looking at the subspace satisfying $x_1 + x_2 = 0$. Hence it is a one-dimensional line and since we are looking at lattice points inside this one-dimensional line, we should expect to get a one-dimensional lattice in it. 

Ok now there are two difficulties that I sidestepped. One is that taking absolute values destroys lots of information on the complex side. Hence one might ask whether looking at $L(v) = (\log |x|, 2 \log |y|)$ preserves information about our lattice. The answer is yes - for any point $(c,d)$ that lies in the image of our lattice, the other lattice points that map to the same image differ by $(x_1,x_2)$ where $|x_1| = |x_2| = 1$. Hence we are trying to classify elements of $\mathbb{Z}[\sqrt[3]{n}]$ with all conjugates having norm $1$. It is a theorem of Kronecker, that works in general, that says the only such elements are exactly the roots of unity. It uses a clever idea, but is not hard once you know it. If we let $\alpha$ be such a number, and let $\alpha_1,\alpha_2,\alpha_3$ be the three conjugates (including itself), then we know that $\alpha_i^r \in \mathbb{Z}[\sqrt[3]{n}]$ for any $r \geq 0$, and in particular the polynomial

$$(x - \alpha_1^r)(x-\alpha_2^r)(x-\alpha_3^r)$$

is a monic polynomial with integer coefficients. But $|\alpha_i^r| = 1$ and so expanding out, using the triangle inequality it is clear that there are only finitely many options for the coefficients. In particular there must exist $r_1,r_2$ such that the two polynomials obtained this way are the exact same polynomial, and hence share roots. In particular we get $\alpha_i^{r_1} = \alpha_j^{r_2}$ for some $i,j$. Repeating this for other $r_i$'s, it is easy to show that $\alpha$ must then be a root of unity.

So we don't lose much information by taking absolute values of our elements, at least about our lattice. The second difficulty sidestepped is that nobody said the image is going to remain discrete. It is possible that the image is something like how $\mathbb{Z}[\sqrt[3]{n}]$ was inside $\mathbb{R}$ and so it is not necessary it is a one dimensional lattice. To show our lattice is discrete, it suffices to check that there are only finitely many elements inside some open ball around the origin. This corresponds to the region

$$ -\frac{1}{C} \leq |x| , |y| \leq C$$

for some $C > 0$. But by the same argument of looking at the polynomial with roots being the three conjugates, it is clear that the possibilities of the coefficients of the polynomial formed is finite, and so again there are only finitely many elements of $\mathbb{Z}[\sqrt[3]{n}]$ that lie in this region and so the image of $\mathbb{Z}[\sqrt[3]{n}]$ under 

$$a + b \sqrt[3]{n} + c \sqrt[3]{n^2} \mapsto (\log |a + b \sqrt[3]{n} + c \sqrt[3]{n^2}|, 2 \log | a +  b  \omega\sqrt[3]{n} + c \omega^2 \sqrt[3]{n^2}|)$$

inside the line $x_1 + x_2 = 0$ is discrete, and since it is a lattice, it must be a one dimensional lattice and so generated by some minimal element as desired.

We now come to the question of whether a single solution exists. Much like how for Pell's equation we showed there are solutions with $x^2 - dy^2$ being small, before proving the existence for $x^2 - dy^2 = 1$, we will do the same for the cubic version. If we move back to $V = \mathbb{R} \times \mathbb{C}$, we have this lower dimensional set $\{N(v) = 1\}$ and we wish to exhibit lattice points in it. But since our set is lower dimensional, it is possible that our lattice simply misses it. Instead, we are going to fatten our set and then show that a lattice point in this enlarged version. This will correspond to solutions with $N(x)$ being small. 

To produce lattice points in a region, we are going to use Minkowski's theorem. I won't bother proving it here, although the proof is a simple pigeonhole, but you can google and read about it. Given a discrete $n$-dimensional lattice $L$ inside $\mathbb{R}^n$, there exists a fundamental domain for it, which is given by looking at 

$$D = \{a_1x_1 + a_2x_2 + \cdots + a_n x_n \mid 0 \leq a_i \leq 1\}$$

for a $\mathbb{Z}$-basis $x_1,\ldots,x_n$. Minkowski's theorem states that if we have a convex subset that is symmetric about the origin, meaning if $x \in S$ then so is $-x$, then if $\operatorname{vol}(S) > 2^n \operatorname{vol}(D)$, then $S$ must contain a non-zero lattice point of $D$. Now we are going to take the lattice $\mathbb{Z}[\sqrt[3]{n}]$ inside $V$ as our lattice. We then choose a convex symmetric subset $O$ with volume larger than $2^n \operatorname{vol}(D)$. Now given a $v$ such that $N(v) = 1$, we look at $v^{-1}O$, which is the subset $O$ multiplied by $v$. The claim is that multiplication by $v$ scales the volume by exactly $N(v)$, and so the volume of $v^{-1}O$ is the same as the volume of $O$. Also, it remains convex and symmetric about the origin, and so Minkowski's theorem applies and we obtain a non-zero lattice point inside $v^{-1}O$, say $a$. Then $va$ lies inside $O$ and so $N(va)$ is bounded, implying that $N(a)$ is bounded. We now repeat this argument with a sequence of $v_n = (x_n,y_n)$'s, say with $x_n$ much larger than $x_{n-1}$. This will ensure that the first coordinate of $a_n$ is much smaller than that of $a_{n-1}$ and so we obtain an infinite sequence of $a_n$'s with $N(a_n)$ all uniformly bounded. 

Now finally, it is a fact that for $x \in \mathbb{Z}[\sqrt[3]{n}]$, the norm $N(x)$ is exactly the size of distinct remainders when taken modulo $x$, or the size of the quotient $\mathbb{Z}[\sqrt[3]{n}]/(x)$, where $(x)$ is the set of multiples of $x$. Since $\mathbb{Z}[\sqrt[3]{n}]$ is a three dimensional lattice, we can prove that there are only finitely many sublattices whose quotient has a given size. Thus if we have infinitely many $x$ with $N(x)$ all equal, there must be two of them such that they generate the same set of multiples, and in particular $x$ is a multiple of $x'$ and so their quotient is an element of $\mathbb{Z}[\sqrt[3]{n}]$ with norm $1$, solving the equation

$$a^3 + nb^3 + n^2 c^3 - 3n abc = 1.$$

This solves the "cubic" Pell's equation that we have. A natural question to ask is what about higher degrees, such as $\mathbb{Z}[\sqrt[m]{n}]$, or even $\mathbb{Z}[\alpha]$ where $\alpha$ satisfies some monic polynomial with integer coefficients. It turns out that the argument for the cubic Pell's generalizes easily, and I leave it as an exercise to find out what the set of solutions look like. For example, with $\mathbb{Z}[\sqrt[5]{2}]$, it turns out that there are two fundamental solutions that are independent of each other, and together they generate all other solutions. So we get a two-dimensional lattice, and not a one-dimensional lattice like in the Pell's equation and the cubic case as we just saw. 


 

Comments

Popular posts from this blog

SMO Open 2024 ??% Speedrun

Musings about the IMO