Pell's Equation and Beyond

(Jit here). Today I am going to write about the Pell's equation x2dy2=1x^2 - dy^2 = 1, which hopefully should be familiar to readers, and about generalizations of it to higher degrees. Let's first review some facts about the integer solutions of a Pell's equation.

By negating xx and yy, we can focus on the solutions with (x,y)(x,y) being positive integers. If dd is a perfect square, then we can factorize the equation into (xdy)(x+dy)=1(x- \sqrt{d}y)(x + \sqrt{d}y) = 1 and it is easy to solve for all solutions. Else if dd is not a perfect square, then there exists a solution (x0,y0)(x_0,y_0) that is minimal among all positive integers, say arranged by either size of xx or yy, and more importantly one can generate all other solutions (xn,yn)(x_n,y_n) by the formula 

xn+ynd=(x0+y0d)n.x_n + y_n \sqrt{d} = (x_0 + y_0 \sqrt{d})^n.

Most of you who are in national team (or even senior team) should at least know these facts but have no idea how to prove them. Let's try to prove them now. The first step is to view solving the equation x2dy2=1x^2 - dy^2 = 1 from a different point of view. Consider the set 

Z[d]={a+bda,bZ}.\mathbb{Z}[\sqrt{d}] = \{a + b \sqrt{d} \mid a,b \in \mathbb{Z}\}. 

This is closed under addition and multiplication and forms what we call a ring, but that is not really important. The important thing is that there is a notion of a norm, which we denote by N()N( \cdot ), which define as

N(a+bd)=(a+bd)(abd)=1.N(a + b \sqrt{d}) = (a+b \sqrt{d})(a - b \sqrt{d}) = 1.

Thus solving the Pell's equation is equivalent to solving N(a+bd)=1N(a + b \sqrt{d}) = 1. It might not seem we did anything useful, but the interesting thing about the norm is that it is multiplicative; i.e. 

N(αβ)=N(α)N(β)N( \alpha \cdot \beta) = N(\alpha) N(\beta)

for α,βZ[d]\alpha,\beta \in \mathbb{Z}[\sqrt{d}]. It is not hard to simply expand and verify that this is true, but a more conceptual reason is that the norm multiplies a+bda + b \sqrt{d} with its conjugate abda - b \sqrt{d}. From an algebraic perspective in Q\mathbb{Q}, it is impossible to differentiate an element with its conjugate (it is similar to how people say ii could be swapped with i-i and mathematics would still be the same). In particular, the conjugate of a product should be the product of conjugates (otherwise we could differentiate them) and so the norm of a product is the product of norms.

Accepting that the norm is multiplicative, it becomes clear that if N(α)=1N(\alpha) = 1, then so is N(αn)=1N(\alpha^n) = 1. In particular given a solution a+bda+b \sqrt{d}, we can generate more elements with norm 11, and this is exactly how the solutions of the Pell's equation are generated from the fundamental solution. Let's assume that at least one solution among positive integers exist for x2dy2=1x^2 - dy^2 = 1. Then clearly we can choose one such that it is minimal among all positive integers, say α\alpha. We now have to show that all other solutions must be powers of α\alpha.

If there exists β=x+yd\beta = x + y \sqrt{d} with x,yx,y being positive integers, and β\beta not a power of α\alpha, we will try to construct a smaller solution. Among all elements with N(x)=1N(x) = 1, we can put an ordering among them by size since certainly if x1>x0x_1 > x_0, then y1y_1 is also >y0> y_0 if N(x0+y0d)=N(x1+y1d)=1N(x_0 + y_0 \sqrt{d}) = N(x_1 + y_1 \sqrt{d}) = 1. We check that x>yx > y is true if and only if xz>yzxz > yz is true. In particular, if we let αr\alpha^r be the largest power such that αr<β\alpha^r < \beta, we have

αr<β<αr+1    1<βαˉr<α\alpha^r < \beta < \alpha^{r+1} \implies 1 < \beta \cdot \bar{\alpha}^r < \alpha

which contradicts the minimality of α\alpha. Thus if we had a minimal solution, then all other solutions are generated by taking powers of it.

It suffices to show that there exists at least one solution to the Pell's equation, which is the hardest part. Let's say we had α,β\alpha,\beta such that N(α)=N(β)=mN(\alpha) = N(\beta) = m. Then N(αβ1)=1N(\alpha \beta^{-1}) = 1 which gives us a solution, except that αβ1\alpha \beta^{-1} might have rational coefficients and not integer coefficients. We now check what αβ1\alpha \beta^{-1} looks like. If we let α=a0+b0d\alpha = a_0 +b_0 \sqrt{d} and β=a1+b1d\beta = a_1+ b_1 \sqrt{d}, we get

αβ1=a0+b0da1+b1d=(a0+b0d)(a1b1d)a12db12=(a0a1db0b1)+(b0a1a0b1)dm.\alpha \beta^{-1} = \frac{a_0 + b_0 \sqrt{d}}{a_1 + b_1 \sqrt{d}} = \frac{(a_0+b_0 \sqrt{d})(a_1 - b_1 \sqrt{d})}{a_1^2 - db_1^2} = \frac{ (a_0 a_1 - d b_0 b_1) + (b_0 a_1 - a_0 b_1) \sqrt{d}}{m}.

Note that if a0a1(modm)a_0 \equiv a_1 \pmod m and b0b1(modm)b_0 \equiv b_1 \pmod m, the numerator reduces to a02db02a_0^2 - db_0^2 in mod mm and this is 0(modm)0 \pmod m. Thus it suffices to find a pair of solutions such that they are congruent mod mm, and in particular if we had infinitely many α\alpha's with N(α)=mN(\alpha) = m, we can definitely find two which are congruent modulo mm. Hence we reduced our problem of N(α)=1N(\alpha) = 1 to N(α)N(\alpha) being small.

To find α\alpha with N(α)N(\alpha) small, we see that x2dy2x^2 - dy^2 being small is equivalent to xyd\frac{x}{y} - \sqrt{d} being small. Now we know how to find approximations of the latter, By Dirichlet approximation, we can find infinitely many pairs (x,y)(x,y) such that 

xyd1y2    xyd1y    x2dy2x+ydy2d+1.\left| \frac{x}{y} - \sqrt{d} \right| \leq \frac{1}{y^2} \implies |x- y \sqrt{d}| \leq \frac{1}{y} \implies x^2 - dy^2 \leq \frac{x + y \sqrt{d}}{y} \leq 2 \sqrt{d} + 1.

Since 2d2 \sqrt{d} is bounded, we are now done - there exists some mm with m2d+1m \leq 2 \sqrt{d} + 1 such that x2dy2=mx^2 - dy^2 = m has infinitely many solutions, and then taking two solutions which are congruent to each other modulo mm, their quotient gives us a solution to x2dy2=1x^2 - dy^2 = 1 as desired. 

Let's move onto higher degrees. Now a naive generalization would be to simply replaces squares with cubes and get x3dy3=1x^3 - dy^3 = 1. However this in fact only have finitely many integer solutions and in fact behaves very differently from Pell's equation. The correct generalization is to generalize the norm function N()N( \cdot ) from Z[d]\mathbb{Z}[\sqrt{d}] to other rings. For example, if we take Z[23]\mathbb{Z}[\sqrt[3]{2}], then every element can be written in the form a+b23+c43a + b \sqrt[3]{2} + c \sqrt[3]{4} and our norm function is given by

N(a+b23+c43)=(a+b23+c43)(a+bω23+cω243)(a+bω223+cω43)=a3+2b3+4c36abc.N(a + b \sqrt[3]{2} + c \sqrt[3]{4}) = (a + b \sqrt[3]{2} + c \sqrt[3]{4})( a + b \omega \sqrt[3]{2} + c \omega^2 \sqrt[3]{4}) (a + b \omega^2 \sqrt[3]{2} + c \omega \sqrt[3]{4}) = a^3 + 2b^3 + 4c^3 - 6abc.

To come up with the norm function, the conjugates of 23\sqrt[3]{2} are ω23\omega \sqrt[3]{2} and ω223\omega^2 \sqrt[3]{2}. We then replace 23\sqrt[3]{2} with those, replace 43\sqrt[3]{4} with the square of the replacement, and take their products. One can verify that the norm is multiplicative and in particular N(α)=1    N(αn)=1N(\alpha) = 1 \implies N(\alpha^n) = 1. Now we have

N(1+23+43)=1    N((1+23+43)n)=1 for all n1.N(1 + \sqrt[3]{2} + \sqrt[3]{4}) = 1 \implies N( (1 + \sqrt[3]{2} + \sqrt[3]{4})^n) = 1 \text{ for all } n \geq 1.

This gives us infinitely many triplets (a,b,c)(a,b,c) such that a3+2b3+4c36abc=1a^3 + 2b^3 + 4c^3 - 6abc = 1, which is exactly a USA TST problem. Of course there is still more to be desired: we haven't showed that all solutions can be generated from a smallest solution. Since there are three variables in play this time, it is no longer clear if we can arrange them by size and perform the same argument. Also, in the case of 23\sqrt[3]{2}, we were lucky that (1,1,1)(1,1,1) gave a solution to our cubic Pell's equation. In general for n3\sqrt[3]{n}, we have to solve

a3+nb3+n2c33nabc=1a^3 + nb^3 + n^2 c^3 - 3nabc = 1

and it is no longer clear whether there is a solution in positive integers if nn is not a perfect cube. Let's first try to prove that all solutions are generated by a minimal one. We first try to embed Z[n3]\mathbb{Z}[\sqrt[3]{n}] into a higher dimensional space so that it becomes a lattice. Currently it looks like a three dimensional lattice squeezed into the one dimensional line R\mathbb{R}

We let V=R×CV = \mathbb{R} \times \mathbb{C} and we embed Z[n3]\mathbb{Z}[\sqrt[3]{n}] into it by sending 

a+bn3+cn23(a+bn3+cn23,a+bωn3+cω2n23).a + b \sqrt[3]{n} + c \sqrt[3]{n^2} \mapsto (a + b \sqrt[3]{n} + c \sqrt[3]{n^2}, a + b \omega \sqrt[3]{n} + c \omega^2 \sqrt[3]{n^2}).

We don't bother using the last factor since it is the complex conjugate of the second factor and so does not carry any extra information. Then clearly the map is injective as any two points with the same image must have all conjugates equal, and thus be equal. The more interesting claim is that this embeds Z[n3]\mathbb{Z}[\sqrt[3]{n}] as a discrete lattice inside VV. To prove this, it suffices to prove that the vectors (1,1),(n3,ωn3),(n23,ω2n23)(1,1), (\sqrt[3]{n}, \omega \sqrt[3]{n}), (\sqrt[3]{n^2}, \omega^2 \sqrt[3]{n^2}) are linearly independent over R\mathbb{R}. But this is easy to check. Hence we have embedded our ring Z[n3]\mathbb{Z}[\sqrt[3]{n}] as a discrete lattice inside a three-dimensional space VV

Now where do our units lie? If we define a norm function on V by 

N(v)=xy2 for v=(x,y),N(v) = x \cdot |y|^2 \text{ for } v = (x,y),

where y|y| is the norm of the complex number yy, then it lies on the subset N(v)=1N(v) = 1. Let's try to understand what the subset N(v)=1N(v) = 1 looks like. It is closed under multiplication, so if we took log's we would get a vector subspace. So looking at (logx,2logy)R2(\log |x|, 2 \log |y|) \subseteq \mathbb{R}^2, we are looking at the subspace satisfying x1+x2=0x_1 + x_2 = 0. Hence it is a one-dimensional line and since we are looking at lattice points inside this one-dimensional line, we should expect to get a one-dimensional lattice in it. 

Ok now there are two difficulties that I sidestepped. One is that taking absolute values destroys lots of information on the complex side. Hence one might ask whether looking at L(v)=(logx,2logy)L(v) = (\log |x|, 2 \log |y|) preserves information about our lattice. The answer is yes - for any point (c,d)(c,d) that lies in the image of our lattice, the other lattice points that map to the same image differ by (x1,x2)(x_1,x_2) where x1=x2=1|x_1| = |x_2| = 1. Hence we are trying to classify elements of Z[n3]\mathbb{Z}[\sqrt[3]{n}] with all conjugates having norm 11. It is a theorem of Kronecker, that works in general, that says the only such elements are exactly the roots of unity. It uses a clever idea, but is not hard once you know it. If we let α\alpha be such a number, and let α1,α2,α3\alpha_1,\alpha_2,\alpha_3 be the three conjugates (including itself), then we know that αirZ[n3]\alpha_i^r \in \mathbb{Z}[\sqrt[3]{n}] for any r0r \geq 0, and in particular the polynomial

(xα1r)(xα2r)(xα3r)(x - \alpha_1^r)(x-\alpha_2^r)(x-\alpha_3^r)

is a monic polynomial with integer coefficients. But αir=1|\alpha_i^r| = 1 and so expanding out, using the triangle inequality it is clear that there are only finitely many options for the coefficients. In particular there must exist r1,r2r_1,r_2 such that the two polynomials obtained this way are the exact same polynomial, and hence share roots. In particular we get αir1=αjr2\alpha_i^{r_1} = \alpha_j^{r_2} for some i,ji,j. Repeating this for other rir_i's, it is easy to show that α\alpha must then be a root of unity.

So we don't lose much information by taking absolute values of our elements, at least about our lattice. The second difficulty sidestepped is that nobody said the image is going to remain discrete. It is possible that the image is something like how Z[n3]\mathbb{Z}[\sqrt[3]{n}] was inside R\mathbb{R} and so it is not necessary it is a one dimensional lattice. To show our lattice is discrete, it suffices to check that there are only finitely many elements inside some open ball around the origin. This corresponds to the region

1Cx,yC -\frac{1}{C} \leq |x| , |y| \leq C

for some C>0C > 0. But by the same argument of looking at the polynomial with roots being the three conjugates, it is clear that the possibilities of the coefficients of the polynomial formed is finite, and so again there are only finitely many elements of Z[n3]\mathbb{Z}[\sqrt[3]{n}] that lie in this region and so the image of Z[n3]\mathbb{Z}[\sqrt[3]{n}] under 

a+bn3+cn23(loga+bn3+cn23,2loga+ b ωn3+cω2n23)a + b \sqrt[3]{n} + c \sqrt[3]{n^2} \mapsto (\log |a + b \sqrt[3]{n} + c \sqrt[3]{n^2}|, 2 \log | a +  b  \omega\sqrt[3]{n} + c \omega^2 \sqrt[3]{n^2}|)

inside the line x1+x2=0x_1 + x_2 = 0 is discrete, and since it is a lattice, it must be a one dimensional lattice and so generated by some minimal element as desired.

We now come to the question of whether a single solution exists. Much like how for Pell's equation we showed there are solutions with x2dy2x^2 - dy^2 being small, before proving the existence for x2dy2=1x^2 - dy^2 = 1, we will do the same for the cubic version. If we move back to V=R×CV = \mathbb{R} \times \mathbb{C}, we have this lower dimensional set {N(v)=1}\{N(v) = 1\} and we wish to exhibit lattice points in it. But since our set is lower dimensional, it is possible that our lattice simply misses it. Instead, we are going to fatten our set and then show that a lattice point in this enlarged version. This will correspond to solutions with N(x)N(x) being small. 

To produce lattice points in a region, we are going to use Minkowski's theorem. I won't bother proving it here, although the proof is a simple pigeonhole, but you can google and read about it. Given a discrete nn-dimensional lattice LL inside Rn\mathbb{R}^n, there exists a fundamental domain for it, which is given by looking at 

D={a1x1+a2x2++anxn0ai1}D = \{a_1x_1 + a_2x_2 + \cdots + a_n x_n \mid 0 \leq a_i \leq 1\}

for a Z\mathbb{Z}-basis x1,,xnx_1,\ldots,x_n. Minkowski's theorem states that if we have a convex subset that is symmetric about the origin, meaning if xSx \in S then so is x-x, then if vol(S)>2nvol(D)\operatorname{vol}(S) > 2^n \operatorname{vol}(D), then SS must contain a non-zero lattice point of DD. Now we are going to take the lattice Z[n3]\mathbb{Z}[\sqrt[3]{n}] inside VV as our lattice. We then choose a convex symmetric subset OO with volume larger than 2nvol(D)2^n \operatorname{vol}(D). Now given a vv such that N(v)=1N(v) = 1, we look at v1Ov^{-1}O, which is the subset OO multiplied by vv. The claim is that multiplication by vv scales the volume by exactly N(v)N(v), and so the volume of v1Ov^{-1}O is the same as the volume of OO. Also, it remains convex and symmetric about the origin, and so Minkowski's theorem applies and we obtain a non-zero lattice point inside v1Ov^{-1}O, say aa. Then vava lies inside OO and so N(va)N(va) is bounded, implying that N(a)N(a) is bounded. We now repeat this argument with a sequence of vn=(xn,yn)v_n = (x_n,y_n)'s, say with xnx_n much larger than xn1x_{n-1}. This will ensure that the first coordinate of ana_n is much smaller than that of an1a_{n-1} and so we obtain an infinite sequence of ana_n's with N(an)N(a_n) all uniformly bounded. 

Now finally, it is a fact that for xZ[n3]x \in \mathbb{Z}[\sqrt[3]{n}], the norm N(x)N(x) is exactly the size of distinct remainders when taken modulo xx, or the size of the quotient Z[n3]/(x)\mathbb{Z}[\sqrt[3]{n}]/(x), where (x)(x) is the set of multiples of xx. Since Z[n3]\mathbb{Z}[\sqrt[3]{n}] is a three dimensional lattice, we can prove that there are only finitely many sublattices whose quotient has a given size. Thus if we have infinitely many xx with N(x)N(x) all equal, there must be two of them such that they generate the same set of multiples, and in particular xx is a multiple of xx' and so their quotient is an element of Z[n3]\mathbb{Z}[\sqrt[3]{n}] with norm 11, solving the equation

a3+nb3+n2c33nabc=1.a^3 + nb^3 + n^2 c^3 - 3n abc = 1.

This solves the "cubic" Pell's equation that we have. A natural question to ask is what about higher degrees, such as Z[nm]\mathbb{Z}[\sqrt[m]{n}], or even Z[α]\mathbb{Z}[\alpha] where α\alpha satisfies some monic polynomial with integer coefficients. It turns out that the argument for the cubic Pell's generalizes easily, and I leave it as an exercise to find out what the set of solutions look like. For example, with Z[25]\mathbb{Z}[\sqrt[5]{2}], it turns out that there are two fundamental solutions that are independent of each other, and together they generate all other solutions. So we get a two-dimensional lattice, and not a one-dimensional lattice like in the Pell's equation and the cubic case as we just saw. 


 

Comments

Popular posts from this blog

SMO Open 2024 ??% Speedrun

Musings about the IMO