Pell's Equation and Beyond
(Jit here). Today I am going to write about the Pell's equation , which hopefully should be familiar to readers, and about generalizations of it to higher degrees. Let's first review some facts about the integer solutions of a Pell's equation.
By negating and , we can focus on the solutions with being positive integers. If is a perfect square, then we can factorize the equation into and it is easy to solve for all solutions. Else if is not a perfect square, then there exists a solution that is minimal among all positive integers, say arranged by either size of or , and more importantly one can generate all other solutions by the formula
Most of you who are in national team (or even senior team) should at least know these facts but have no idea how to prove them. Let's try to prove them now. The first step is to view solving the equation from a different point of view. Consider the set
This is closed under addition and multiplication and forms what we call a ring, but that is not really important. The important thing is that there is a notion of a norm, which we denote by , which define as
Thus solving the Pell's equation is equivalent to solving . It might not seem we did anything useful, but the interesting thing about the norm is that it is multiplicative; i.e.
for . It is not hard to simply expand and verify that this is true, but a more conceptual reason is that the norm multiplies with its conjugate . From an algebraic perspective in , it is impossible to differentiate an element with its conjugate (it is similar to how people say could be swapped with and mathematics would still be the same). In particular, the conjugate of a product should be the product of conjugates (otherwise we could differentiate them) and so the norm of a product is the product of norms.
Accepting that the norm is multiplicative, it becomes clear that if , then so is . In particular given a solution , we can generate more elements with norm , and this is exactly how the solutions of the Pell's equation are generated from the fundamental solution. Let's assume that at least one solution among positive integers exist for . Then clearly we can choose one such that it is minimal among all positive integers, say . We now have to show that all other solutions must be powers of .
If there exists with being positive integers, and not a power of , we will try to construct a smaller solution. Among all elements with , we can put an ordering among them by size since certainly if , then is also if . We check that is true if and only if is true. In particular, if we let be the largest power such that , we have
which contradicts the minimality of . Thus if we had a minimal solution, then all other solutions are generated by taking powers of it.
It suffices to show that there exists at least one solution to the Pell's equation, which is the hardest part. Let's say we had such that . Then which gives us a solution, except that might have rational coefficients and not integer coefficients. We now check what looks like. If we let and , we get
Note that if and , the numerator reduces to in mod and this is . Thus it suffices to find a pair of solutions such that they are congruent mod , and in particular if we had infinitely many 's with , we can definitely find two which are congruent modulo . Hence we reduced our problem of to being small.
To find with small, we see that being small is equivalent to being small. Now we know how to find approximations of the latter, By Dirichlet approximation, we can find infinitely many pairs such that
Since is bounded, we are now done - there exists some with such that has infinitely many solutions, and then taking two solutions which are congruent to each other modulo , their quotient gives us a solution to as desired.
Let's move onto higher degrees. Now a naive generalization would be to simply replaces squares with cubes and get . However this in fact only have finitely many integer solutions and in fact behaves very differently from Pell's equation. The correct generalization is to generalize the norm function from to other rings. For example, if we take , then every element can be written in the form and our norm function is given by
To come up with the norm function, the conjugates of are and . We then replace with those, replace with the square of the replacement, and take their products. One can verify that the norm is multiplicative and in particular . Now we have
This gives us infinitely many triplets such that , which is exactly a USA TST problem. Of course there is still more to be desired: we haven't showed that all solutions can be generated from a smallest solution. Since there are three variables in play this time, it is no longer clear if we can arrange them by size and perform the same argument. Also, in the case of , we were lucky that gave a solution to our cubic Pell's equation. In general for , we have to solve
and it is no longer clear whether there is a solution in positive integers if is not a perfect cube. Let's first try to prove that all solutions are generated by a minimal one. We first try to embed into a higher dimensional space so that it becomes a lattice. Currently it looks like a three dimensional lattice squeezed into the one dimensional line .
We let and we embed into it by sending
We don't bother using the last factor since it is the complex conjugate of the second factor and so does not carry any extra information. Then clearly the map is injective as any two points with the same image must have all conjugates equal, and thus be equal. The more interesting claim is that this embeds as a discrete lattice inside . To prove this, it suffices to prove that the vectors are linearly independent over . But this is easy to check. Hence we have embedded our ring as a discrete lattice inside a three-dimensional space .
Now where do our units lie? If we define a norm function on V by
where is the norm of the complex number , then it lies on the subset . Let's try to understand what the subset looks like. It is closed under multiplication, so if we took log's we would get a vector subspace. So looking at , we are looking at the subspace satisfying . Hence it is a one-dimensional line and since we are looking at lattice points inside this one-dimensional line, we should expect to get a one-dimensional lattice in it.
Ok now there are two difficulties that I sidestepped. One is that taking absolute values destroys lots of information on the complex side. Hence one might ask whether looking at preserves information about our lattice. The answer is yes - for any point that lies in the image of our lattice, the other lattice points that map to the same image differ by where . Hence we are trying to classify elements of with all conjugates having norm . It is a theorem of Kronecker, that works in general, that says the only such elements are exactly the roots of unity. It uses a clever idea, but is not hard once you know it. If we let be such a number, and let be the three conjugates (including itself), then we know that for any , and in particular the polynomial
is a monic polynomial with integer coefficients. But and so expanding out, using the triangle inequality it is clear that there are only finitely many options for the coefficients. In particular there must exist such that the two polynomials obtained this way are the exact same polynomial, and hence share roots. In particular we get for some . Repeating this for other 's, it is easy to show that must then be a root of unity.
So we don't lose much information by taking absolute values of our elements, at least about our lattice. The second difficulty sidestepped is that nobody said the image is going to remain discrete. It is possible that the image is something like how was inside and so it is not necessary it is a one dimensional lattice. To show our lattice is discrete, it suffices to check that there are only finitely many elements inside some open ball around the origin. This corresponds to the region
for some . But by the same argument of looking at the polynomial with roots being the three conjugates, it is clear that the possibilities of the coefficients of the polynomial formed is finite, and so again there are only finitely many elements of that lie in this region and so the image of under
inside the line is discrete, and since it is a lattice, it must be a one dimensional lattice and so generated by some minimal element as desired.
We now come to the question of whether a single solution exists. Much like how for Pell's equation we showed there are solutions with being small, before proving the existence for , we will do the same for the cubic version. If we move back to , we have this lower dimensional set and we wish to exhibit lattice points in it. But since our set is lower dimensional, it is possible that our lattice simply misses it. Instead, we are going to fatten our set and then show that a lattice point in this enlarged version. This will correspond to solutions with being small.
To produce lattice points in a region, we are going to use Minkowski's theorem. I won't bother proving it here, although the proof is a simple pigeonhole, but you can google and read about it. Given a discrete -dimensional lattice inside , there exists a fundamental domain for it, which is given by looking at
for a -basis . Minkowski's theorem states that if we have a convex subset that is symmetric about the origin, meaning if then so is , then if , then must contain a non-zero lattice point of . Now we are going to take the lattice inside as our lattice. We then choose a convex symmetric subset with volume larger than . Now given a such that , we look at , which is the subset multiplied by . The claim is that multiplication by scales the volume by exactly , and so the volume of is the same as the volume of . Also, it remains convex and symmetric about the origin, and so Minkowski's theorem applies and we obtain a non-zero lattice point inside , say . Then lies inside and so is bounded, implying that is bounded. We now repeat this argument with a sequence of 's, say with much larger than . This will ensure that the first coordinate of is much smaller than that of and so we obtain an infinite sequence of 's with all uniformly bounded.
Now finally, it is a fact that for , the norm is exactly the size of distinct remainders when taken modulo , or the size of the quotient , where is the set of multiples of . Since is a three dimensional lattice, we can prove that there are only finitely many sublattices whose quotient has a given size. Thus if we have infinitely many with all equal, there must be two of them such that they generate the same set of multiples, and in particular is a multiple of and so their quotient is an element of with norm , solving the equation
This solves the "cubic" Pell's equation that we have. A natural question to ask is what about higher degrees, such as , or even where satisfies some monic polynomial with integer coefficients. It turns out that the argument for the cubic Pell's generalizes easily, and I leave it as an exercise to find out what the set of solutions look like. For example, with , it turns out that there are two fundamental solutions that are independent of each other, and together they generate all other solutions. So we get a two-dimensional lattice, and not a one-dimensional lattice like in the Pell's equation and the cubic case as we just saw.
Comments
Post a Comment