Taylor series in number theory

Glen here.

At some point about a year ago, I scrolled through the blogger interface and found an old draft by Dylan, a skeletal outline of a post featuring a problem and a few section headings. The problem happened to be one I recognised:

Show that $2^{2000}$ divides the numerator of $2+\frac{2^2}2 + \frac{2^3}3 + \cdots + \frac{2^{2024}}{2024}$ when expressed as a simplified improper fraction.

I remembered first seeing this as a bonus problem in some old SIMO National Team set, when I'd looked at it, had absolutely no idea how to approach it, and promptly given up. But this time, having learnt some more math in the intervening years, I did have an idea of what to do, and ended up solving it in my head during some seminar that I wasn't quite following. The solution turned out to be pretty cool, so I made a note to myself to write a blog post about it at some point if Dylan's post never manifested. I then promptly forgot about it.

Fast forward to this week. Some of the ideas used in solving this problem appeared in some classes that I was teaching, and reminded me of this. So, as a post about this problem has yet to appear, I am writing about it now.

Preliminary thoughts

When I first saw this problem years and years ago, the only thing I could think of doing was to try to combine everything into one fraction. We can just cross-multiply in the most stupid way possible and get some giant thing over $2024!$, but this giant thing at the top doesn't have a nice formula? And also, we don't know how many things cancel out between the top and the bottom. We could instead take the lcm of the first $2024$ integers, but that's not much better...
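(As a sanity check before doing anything clever: the claim is easy to verify by brute force with exact rational arithmetic. This quick check is mine, not part of any intended solution.)

```python
from fractions import Fraction

# Sum 2 + 2^2/2 + ... + 2^2024/2024 exactly
total = sum(Fraction(2**n, n) for n in range(1, 2025))

# Fraction always stores its value in lowest terms, so this really does
# check the numerator of the simplified fraction
assert total.numerator % 2**2000 == 0
assert total.denominator % 2 == 1  # the denominator is odd
```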

But when I saw this again, I realised that if we replaced $2$ by $x$, what we had left was an expression that looked more familiar: $x+\frac{x^2}2 + \frac{x^3}3 + \cdots + \frac{x^{2024}}{2024}$. This looks like some calculus thing. Indeed, if we differentiate it, we get the far less scary expression $1+x+\cdots+x^{2023}$.

Let's forget about number theory for now and just do calculus. The second expression is just a truncation for the power series $1+x+x^2+\cdots = \frac1{1-x}$, and so, integrating, the first expression corresponds to the power series of $-\log(1-x)$. (This is $\log$ base $e$, just in case that wasn't clear.)

How does this relate to the original question? Let's instead consider the expression $2+\frac{2^2}2 + \frac{2^3}3 + \cdots$ where we now take the "infinite sum". This obviously doesn't converge to any value; each term we add is exponentially bigger than the last so it just increases to infinity. But let's ignore this problem for now. Each time we add some $\frac{2^n}{n}$ where $n\ge 2025$, this is always going to be a multiple of $2^{2000}$ (or rather, its numerator is going to be), since $v_2(2^n) - v_2(n) \ge n - \log_2 n \ge 2025-\log_2(2025) > 2000$. So in some sense, this infinite sum should also be a "multiple of $2^{2000}$", ignoring the fact that the sum doesn't actually exist.

But now, the number $2000$ is quite arbitrary, and so we'd expect this to be "divisible" by $2^n$ for all $n$, which means that the sum is...$0$?
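This heuristic is easy to test numerically: compute $v_2$ of the partial sums and watch it climb. A rough sketch (the helper `v2_frac` is just something I've cooked up for this post):

```python
from fractions import Fraction

def v2_frac(x):
    """2-adic valuation of a nonzero rational: v2(numerator) - v2(denominator)."""
    v, num, den = 0, x.numerator, x.denominator
    while num % 2 == 0:
        num //= 2
        v += 1
    while den % 2 == 0:
        den //= 2
        v -= 1
    return v

partial = Fraction(0)
for n in range(1, 101):
    partial += Fraction(2**n, n)
    if n % 25 == 0:
        print(n, v2_frac(partial))  # the valuation keeps climbing with n
```

So the partial sums really are divisible by ever-higher powers of $2$.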

A bunch of illegal things have happened here, but how could we make it legal?

The $2$-adic numbers

Jit and David have already written to some extent about the $p$-adic integers, taking different (but equivalent) definitions. Jit's is more analytic ("calculus-y") while David's is more abstract/group-theoretic. I'm going to try to bridge the gap between the two. In everything that follows, we're taking $p=2$ since this is what's relevant to the problem, but everything works for general primes $p$.

The analytic approach

The analytic approach uses a choice of a metric, which is something that measures how far away different elements in a set are from each other. For the integers $\mathbb{Z}$, there's an obvious metric that we're all used to: if we think of $\mathbb{Z}$ as the number line, then the distance between $m$ and $n$ is just $|m-n|$.

This isn't very meaningful, number-theoretically.

Instead, we could say that numbers are closer together if their difference is a higher power of $2$. This defines the $2$-adic metric: $|m-n|_2 = \frac1{2^{v_2(m-n)}}$.
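In code, this metric is a few lines (a sketch with my own naming, nothing standard):

```python
def v2(n):
    """Exponent of the largest power of 2 dividing the nonzero integer n."""
    v = 0
    while n % 2 == 0:
        n //= 2
        v += 1
    return v

def dist2(m, n):
    """The 2-adic distance |m - n|_2, with d(x, x) = 0."""
    if m == n:
        return 0.0
    return 2.0 ** (-v2(m - n))

# 48 and 16 differ by 32 = 2^5, so 2-adically they are very close;
# 48 and 17 differ by an odd number, so they are at distance 1
assert dist2(48, 16) == 2.0 ** -5
assert dist2(48, 17) == 1.0
```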

What does this look like? Well, let's look at the integers "close to" $0$, say with distance $\le \frac12$. These are exactly the even numbers. In fact, all even numbers are distance $\le \frac12$ away from each other. Conversely, we have the odd numbers, which are all $\le \frac12$ away from each other as well. However, each even number and each odd number are $1$ away. So at first glance, our picture should look something like this:

Now, within the even numbers, this again splits into two groups, the multiples of $4$ and the ones that are $2\pmod4$. Within each group, everything is within $\frac14$ of each other, whereas between any two things in different groups, the distance is $\frac12$. The same happens on the right, splitting the odd numbers into $1\pmod4$ and $3\pmod4$.

Each of these splits again, this time into residue classes mod $8$, and so on, so we get this fractal-looking thing. I gave up drawing at depth $4$.


Now, what are the integers? Each integer is determined purely by its residues modulo $2^n$ for each $n$. Looking at its remainder mod $2$, then mod $4$, etc. corresponds to choosing one of the big circles, then choosing one of the circles that's in that big circle, then choosing one of the next circles, going on infinitely as we pick circles that get smaller and smaller, shrinking to a point.

How many ways are there to pick such a sequence of circles? At each stage, we're picking between two choices. This corresponds exactly to picking a binary expansion of an integer: in the first step, we're choosing the units digit to be $0$ or $1$. Then, we're choosing the "twos" digit, and then the "fours" digit, etc. However, this process may not lead to an integer! We'd only get an integer if, after some point, we kept picking the digit $0$ (so we get a nonnegative integer) or if we kept picking the digit $1$ (so we get a negative integer - this is because the remainder of $-a$ mod $2^n$ for sufficiently large $n$ is always $111...1(\text{stuff})$ in binary). In fact, most of the sequences we pick in this way don't result in an integer: the set of such sequences surjects onto the interval $[0,1]$ (reverse the sequence to get a binary expansion of a real number), and so is uncountable!
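The parenthetical about negative integers is easy to see in action:

```python
# Residues of -5 modulo 2^n, written in binary: a growing run of 1s
# followed by the fixed tail "011" -- the expansion ...11111011 emerging
for n in range(4, 9):
    print(format((-5) % 2**n, "b"))
# prints 1011, 11011, 111011, 1111011, 11111011
```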

Now suppose we "fill in" each such sequence of shrinking circles with an element. This gives us the $2$-adic integers, $\mathbb{Z}_2$. We can view this as a binary expansion that goes infinitely to the left.

This is a process known as completion: given every sequence of integers that looks like it should converge to a limit (by virtue of belonging to a sequence of shrinking circles), we declare that such a limit exists. This is exactly how we get from $\mathbb{Q}$ to $\mathbb{R}$: to each sequence that looks like it should converge (with the usual notion of distance), e.g. $3,3.1,3.14,3.141,3.1415.\ldots$, we insist that it does, and so we get extra numbers like $\pi=3.14159\ldots$. Of course, under our new $2$-adic metric, a convergent sequence now looks more like $0,10,1010,101010,\ldots$, with us adding digits to the left instead, so the number that this converges to looks like "$\ldots1010101010$".

What if we had started with the rationals instead? Well, there is still a perfectly good notion of $v_2$ for rationals, which is $v_2\left(\frac{p}q\right) = v_2(p) - v_2(q)$, so we may do the same thing. Then, the picture above contains all the rationals with odd denominator. Those with denominators that are $2\pmod 4$ give us another copy of that picture. Those with denominators that are $4\pmod8$ give us a copy of both of those pictures. And so on, giving an expanding fractal structure.

We can again insist that there's a limit point for each sequence of shrinking circles, resulting in the $2$-adic numbers, $\mathbb{Q}_2$. In terms of binary expansions, we can think of these as stuff of the form $\ldots11001010101001.0001010$, binary expressions which are finite on the right but possibly infinite on the left. This means that the expressions are different from the usual expansions! For example, $\frac13$ is $0.01010101\ldots$ in regular binary. However, since $\frac13 \equiv \frac{2^n+1}3 \pmod{2^n}$ for $n$ odd and $\frac13 \equiv \frac{2^{n+1}+1}3 \pmod{2^n}$ for $n$ even, its "binary expansion" in $\mathbb{Q}_2$ looks like $\ldots1010101011$.
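We can watch this expansion emerge by computing the residues directly: the residue of $\frac13$ mod $2^n$ is just the modular inverse of $3$, which Python computes for us (`pow` with exponent `-1` needs Python 3.8+).

```python
# Residues of 1/3 modulo 2^n, in binary: the pattern ...10101011 emerges
for n in (4, 6, 8, 10):
    print(format(pow(3, -1, 2**n), "b"))
# prints 1011, 101011, 10101011, 1010101011
```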

Group-theoretic approach

Let's look at our picture of $\mathbb{Z}$ under the $2$-adic metric again. At the $n^\text{th}$ level, the circles correspond exactly to residue classes of integers modulo $2^n$, i.e. $\mathbb{Z}/(2^n)$. Under this correspondence, the inclusion of circles from one level to the previous one corresponds to taking the remainder, or forgetting the leftmost binary digit. That's how we get the inverse system in David's post, $$\mathbb{Z}/(2) \leftarrow \mathbb{Z}/(4) \leftarrow \mathbb{Z}/(8) \leftarrow \cdots$$
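Concretely, the inverse-limit condition just says that a $2$-adic integer is a sequence of residues, one mod each $2^n$, that are compatible under taking remainders. A small sketch:

```python
# A 2-adic integer as a compatible sequence of residues r_n mod 2^n:
# r_{n+1} must reduce to r_n. For example, -1 corresponds to
# (1, 3, 7, 15, 31, ...), i.e. "all 1s" in binary
residues = [(-1) % 2**n for n in range(1, 11)]
print(residues)

# forgetting the leftmost binary digit recovers the previous level
for n in range(1, 10):
    assert residues[n] % 2**n == residues[n - 1]
```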

Each of these arrows respects addition and multiplication, because modular arithmetic works. So, there is now some general "limit" construction that directly produces a set with addition and multiplication, i.e. a ring (see my past post on UFDs for more on rings). This produces the same set $\mathbb{Z}_2$, but now it also has addition and multiplication, i.e. the structure of a ring.

Notice that nothing we did here used the $2$-adic metric. Indeed, this is a special case of a general construction called pro-$p$-completion, where we're completing in some algebraic, instead of an analytic sense. But because our metric was so compatible with the number-theoretic structure of $\mathbb{Z}$, these two end up producing the same thing.

It's a bit harder to say what $\mathbb{Q}_2$ is under this formulation, but it so happens that we can get from $\mathbb{Z}_2$ to $\mathbb{Q}_2$ in the same way as we get from $\mathbb{Z}$ to $\mathbb{Q}$: taking fractions.

This makes $\mathbb{Q}_2$ into a field, and so we can do arithmetic and algebra as per usual.

Digression: Hensel's lifting lemma

Because of the similarities of how $\mathbb{Q}_p$ and $\mathbb{R}$ are constructed, we can import calculus-y ideas from the latter into the former. An example is:

(Hensel's lifting lemma) Let $P$ be a polynomial with integer coefficients and $n$ a simple root mod $p$, i.e. $P(n) \equiv 0 \pmod{p}$, but $P'(n) \not\equiv0\pmod{p}$. Then $P$ has a unique root $N$ in $\mathbb{Z}_p$ such that $N \equiv n \pmod{p}$.

First, a comment on the simple root condition: this is exactly the definition of a simple root in regular calculus.

We can think of this as some sort of Intermediate Value Theorem: $P(n) \equiv 0 \pmod{p}$ roughly means that $P$ has a root in some general vicinity (recall that the set of things that are $\equiv n\pmod{p}$ forms a "circle" of radius $\frac1p$), and Hensel's lifting lemma says that there then has to be an actual root in that circle.

It's not hard to see why $N$ has to be unique, if it exists: if there's some other root $N'$ with $N' \equiv n \pmod{p}$, then $P$ factors as $(x-N)(x-N')Q(x)$ over $\mathbb{Z}_p$, but upon modding out by $p$ this makes $n$ a repeated root, contradicting $P'(n) \not\equiv 0 \pmod{p}$.

The proof of the existence of $N$ is pretty cool: it comes from translating the Newton-Raphson method of approximation. Here's a way to find a root of a (differentiable) function: start from a point $x_0$, draw the tangent line at $(x_0,P(x_0))$ and follow it to where it meets the $x$-axis at $x_1$, then repeat.

If your function is "nice enough" and you start "near enough" to a root, this should converge to that root. Proof by picture.

Unfortunately, we can't really draw a picture for the $\mathbb{Z}_p$ case, so we have to proceed algebraically. Then, our sequence $(x_k)$ is given by the recurrence $$x_{k+1} = x_k - \frac{P(x_k)}{P'(x_k)}.$$ Setting $x_0$ to be the $n$ in the lemma, we shall induct to show that $v_p(P(x_{k+1})) > v_p(P(x_k))$ and $P'(x_k) \not\equiv 0\pmod{p}$ for each $k$.

Indeed, under the induction hypothesis, $v_p\left(\frac{P(x_k)}{P'(x_k)}\right) = v_p(P(x_k))$. Now, write $x_{k+1} = x_k - a$ where $a = \frac{P(x_k)}{P'(x_k)}$. Each monomial $x^r$ of $P$, evaluated at $x_k - a$, expands as $x_k^r - rax_k^{r-1} + \text{stuff divisible by }a^2$, so $$P(x_{k+1}) \equiv P(x_k) - aP'(x_k) \equiv P(x_k) - P(x_k) \equiv 0 \pmod{p^{2v_p(P(x_k))}}.$$

On the other hand, we also have, by the same reasoning, $P'(x_{k+1}) \equiv P'(x_k) \not\equiv 0\pmod{p}$, and the induction is complete.

Finally, each time we go from $x_k$ to $x_{k+1}$, we are adjusting by a multiple of an ever-larger power of $p$, and so (by the definition of $\mathbb{Z}_p$), this converges to some root $N$. $\square$
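The whole argument translates into a few lines of code. Here's a rough sketch (my own helper, not from any library); the example uses $p = 7$ and $x^2-2$, since for $p=2$ the derivative condition fails for square roots:

```python
def hensel_lift(f, df, n, p, k):
    """Newton's method in Z/p^k: lift a root of f mod p to a root mod p^k.

    Assumes f(n) = 0 mod p and df(n) != 0 mod p (the derivative condition)."""
    mod = p**k
    x = n % mod
    while f(x) % mod:
        # v_p(f(x)) at least doubles each step, so this terminates quickly
        x = (x - f(x) * pow(df(x), -1, mod)) % mod
    return x

# Lift the root 3 of x^2 - 2 mod 7 (since 3^2 = 9 = 2 mod 7) to a
# "square root of 2" modulo 7^30
r = hensel_lift(lambda x: x * x - 2, lambda x: 2 * x, 3, 7, 30)
assert (r * r - 2) % 7**30 == 0 and r % 7 == 3
```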

Now, taking residues mod $p^k$ for arbitrary $k$, we recover the usual elementary statement of Hensel's lifting lemma:

(Hensel's lifting lemma, olympiad version) Let $P$ be a polynomial with integer coefficients and $n$ a simple root mod $p$, i.e. $P(n) \equiv 0 \pmod{p}$, but $P'(n) \not\equiv0\pmod{p}$. Then for each $k\ge 1$, $P$ has a unique root $N$ mod $p^k$ such that $N \equiv n \pmod{p}$.

All of this, just from the usual calculus fact that $\frac{f(y)-f(x)}{y-x} \approx f'(x)$! But as we know from school math, we can also keep differentiating to get a Taylor series...

Back to the problem

When we left off, we had this expression: $$-\log(1-2) = 2+\frac{2^2}2 + \frac{2^3}3 + \cdots$$ As noted, the RHS doesn't make sense with our usual definitions, but what if we work in the $2$-adic metric? Well, note that $\left|\frac{2^n}{n}\right|_2 \le 2^{-n+\log_2 n}$, which decreases exponentially as the $\log_2 n$ term is far outweighed by the $n$ term. So the partial sums correspond to our shrinking circles, and the series converges to some limit in $\mathbb{Q}_2$. (The jargon-y way to say this is that Cauchy sequences converge in a complete metric space.)

What about the LHS? I guess we're used to thinking of $\log x$ as the inverse to $e^x$, but what even is $e$? There are two definitions of $e$ that I can remember knowing in high school: one is the "limit of compound interest", $$e = \lim_{n\rightarrow\infty} \left(1+\frac1n\right)^n.$$ The other is putting $x=1$ into the Taylor series expansion for $e^x$, giving $$e = 1 + 1 + \frac1{2!} + \frac1{3!} + \cdots$$ But do either of these make sense in $\mathbb{Q}_2$? The second is obviously bad, since $\left|\frac1{n!}\right|_2$ grows exponentially large. For the first, setting $n$ to be a power of $2$ gives large numbers in the $2$-adic metric, and even if we somehow ignored them, there's no guarantee that whatever's left would converge.

So defining $\log x$ as the inverse of $e^x$ is kind of a no-go. Instead, what we could do is to define $\log$ using the power series, i.e. define $$-\log(1-x) = x+\frac{x^2}2 + \frac{x^3}3 + \cdots$$ Our proof of convergence from earlier really only used the fact that $|2|_2 < 1$, and similarly, we may see that this defines $\log(1-x)$ for all $|x|_2 < 1$.

Finishing up

We have reduced our problem to showing that in $\mathbb{Q}_2$, we have $\log(-1) = 0$.

Admittedly, this looks a little weird, because what on earth is the log of a negative number? But remember that this log isn't our usual log defined as an inverse to exponentials; it's just some power series.

Nevertheless, let's pretend it works like a regular log. Then, we'd have $\log(1) = 0$. How does this relate to what we want? Well, the only other thing we know about logs is that they satisfy $\log(ab) = \log(a) + \log(b)$, so setting $a=b=-1$, we have...$\log(-1)+\log(-1) = \log(1) = 0$, and so we'd be done. What?!

Let's try to check if the steps we've done above are legal. The first is pretty clear; if we put $x=0$ in the power series we indeed get $\log(1) = 0$. Phew. The second normally comes from properties of exponentials, but we don't have access to that here. Let's instead rewrite it in a form more compatible with our setup. We want $$-\log(1-x) - \log(1-y) = -\log(1-(x+y-xy))$$ for all $x,y$ such that $|x|_2,|y|_2 < 1$.

First, let's check that the expression on the right is legal. Indeed, as $x$ and $y$ both have even numerators, so does $x+y-xy$, and hence $|x+y-xy|_2 < 1$. Thus, we want to show that $$\sum_{n=1}^\infty \frac{x^n}n + \sum_{n=1}^\infty \frac{y^n}n = \sum_{n=1}^\infty \frac{(x+y-xy)^n}n.$$

To show this, the intuitive thing to do is to expand the RHS and compare coefficients. This turns out to be legal, but somewhat technical and not very exciting, so refer to the next subsection if you're interested.

But this is now a completely algebraic thing, and hence we may think of this in $\mathbb{R}$ instead of $\mathbb{Q}_2$. Indeed $-\log(1-x) - \log(1-y) = -\log(1-(x+y-xy))$ is a true statement in $\mathbb{R}$ whenever the RHS is defined, say when $|x|,|y| < \frac14$. Now, we may recover the $x^ay^b$ term on either side by just differentiating both sides by $x$ $a$ times and by $y$ $b$ times and plugging in $0$ (and dividing by $a!b!$); this is just like our usual Taylor series but with two variables. But evidently both would produce the same results, so we're done.
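For the skeptical, the coefficient comparison can be machine-checked up to any fixed degree. Here's a sketch with exact rational arithmetic (the truncation degree and all names are my own choices):

```python
from fractions import Fraction

N = 8  # compare coefficients up to total degree 8

def mul(a, b):
    """Multiply bivariate polynomials {(i, j): coeff}, truncating past degree N."""
    out = {}
    for (i1, j1), c1 in a.items():
        for (i2, j2), c2 in b.items():
            i, j = i1 + i2, j1 + j2
            if i + j <= N:
                out[i, j] = out.get((i, j), Fraction(0)) + c1 * c2
    return out

def neg_log_1m(u):
    """-log(1 - u) = u + u^2/2 + u^3/3 + ... as a truncated series."""
    total, power = {}, {(0, 0): Fraction(1)}
    for n in range(1, N + 1):
        power = mul(power, u)
        for k, c in power.items():
            total[k] = total.get(k, Fraction(0)) + c / n
    return total

x = {(1, 0): Fraction(1)}
y = {(0, 1): Fraction(1)}
z = {(1, 0): Fraction(1), (0, 1): Fraction(1), (1, 1): Fraction(-1)}  # x + y - xy

lhs = neg_log_1m(x)
for k, c in neg_log_1m(y).items():
    lhs[k] = lhs.get(k, Fraction(0)) + c
rhs = neg_log_1m(z)

# every coefficient up to total degree N matches exactly
keys = set(lhs) | set(rhs)
assert all(lhs.get(k, Fraction(0)) == rhs.get(k, Fraction(0)) for k in keys)
```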

Yucky analysis for pedants

Previously, we had a step where we expanded the RHS, which gives us a double summation $$\sum_{n=1}^\infty \frac1n \sum_{i+j+k=n} \binom{n}{i,j,k} x^i y^j (-xy)^k.$$ When we compare coefficients between this and the LHS, what we're effectively doing is reordering the terms of our summation. This is not a thing we can do in general, even for series whose sum is defined! Here's a silly example: consider $$1-1+\frac12-\frac12+\frac13-\frac13+\frac14-\frac14+\cdots$$ The partial sums are $1,0,\frac12,0,\frac13,0,\frac14,0,\ldots$, which clearly go to $0$. Now, let's try to rearrange terms to make the sum tend to $1$ instead. We can do something like $$1+\frac12-1+\frac13+\frac14-\frac12+\frac15+\frac16+\frac17-\frac13+\cdots$$ where at each stage, we add the next unit fraction if our partial sum is $\le 1$ and subtract the next unit fraction if it's $>1$. By construction, this has to go to $1$.
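Here's that greedy rearrangement in code, if you want to watch it happen (floats are fine for a demo):

```python
# Rearrange 1 - 1 + 1/2 - 1/2 + 1/3 - 1/3 + ... to converge to 1:
# add the next unused +1/n while the partial sum is <= 1,
# otherwise subtract the next unused -1/n
add_n, sub_n, s = 1, 1, 0.0
for _ in range(100_000):
    if s <= 1:
        s += 1 / add_n
        add_n += 1
    else:
        s -= 1 / sub_n
        sub_n += 1
print(s)  # very close to 1
```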

However, there is a condition in which rearranging terms is allowed, which is when the series is absolutely convergent, which means that the sums of the absolute values of each of the terms converges. The example above fails that, because we'd have $2\sum_{n=1}^\infty \frac1n = \infty$.

In the case of $\mathbb{Q}_2$, the same thing holds, except we replace absolute value by the $2$-adic norm. Note that each summand has norm $$\left| \binom{n}{i,j,k} x^i y^j (-xy)^k \right|_2 \le 2^{-n},$$ since the multinomial coefficient is an integer, and $|x|_2,|y|_2,|xy|_2 \le \frac12$. On the other hand, for each $n$, the number of such terms is $\binom{n+2}2$, which is a polynomial in $n$. And as before, we may bound $\left|\frac1n\right|_2 \le 2^{\log_2 n}$. So overall, what we want to check is that $$\sum_{n=1}^\infty 2^{\log_2n -n}\binom{n+2}2$$ converges, which is true because the exponential $2^{-n}$ term dominates. (For people who have learnt analysis, the thing to apply here would be the ratio test.)

There is also a minor technicality from when we worked in $\mathbb{R}$. From our knowledge of properties of $\log$ in $\mathbb{R}$, we have that $$\sum_{n=1}^\infty \frac{x^n}n + \sum_{n=1}^\infty \frac{y^n}n = \sum_{n=1}^\infty \frac{(x+y-xy)^n}n = \sum_{n=1}^\infty \frac1n \sum_{i+j+k=n} \binom{n}{i,j,k} x^i y^j (-xy)^k$$ for all $x,y$ with $|x|,|y| < \frac14$. What we want is that differentiating the LHS by $x$ $a$ times and by $y$ $b$ times yields $$\sum_{n=1}^\infty \frac{\partial^{a+b}}{\partial x^a \partial y^b}\frac{x^n}n + \sum_{n=1}^\infty \frac{\partial^{a+b}}{\partial x^a \partial y^b} \frac{y^n}n$$ and likewise differentiating the RHS gives $$\sum_{n=1}^\infty \frac1n \sum_{i+j+k=n} \binom{n}{i,j,k} \frac{\partial^{a+b}}{\partial x^a \partial y^b} x^i y^j (-xy)^k$$ for all $a,b$.

For this to be true, it suffices to show that the sums of the absolute values of the summands in both cases are uniformly bounded, i.e. they converge, and are bounded by some constant independent of $x,y$. This is clear for the first one; we only have to care about the cases $a=0$ or $b=0$ (if both are positive, every term differentiates to zero), in which case (WLOG $b=0$) we get something that's $$\le \sum_{n=1}^\infty n^{a-1}\left(\frac14\right)^{n-a}$$ which is certainly a finite constant independent of $x,y$.

For the second one, note that for $i+j+k = n$ we have $$\left|\frac{\partial^{a+b}}{\partial x^a \partial y^b} x^i y^j (-xy)^k\right| \le n^{a+b}\left(\frac14\right)^{n-a-b}.$$ Now since $\sum_{i+j+k=n}\binom{n}{i,j,k} = 3^n$, our overall sum is $$\le \sum_{n=1}^\infty \frac1n 3^n n^{a+b}\left(\frac14\right)^{n-a-b} = \sum_{n=1}^\infty \frac{(4n)^{a+b}}n \left(\frac34\right)^n$$ which again converges to a constant independent of $x,y$ since the exponential term beats out the polynomial term (or, again, ratio test).

Recovering a more elementary solution

Recall that we were trying to show something stronger than the original problem. In fact, all we needed to show was that $$\sum_{n=1}^{2024} \frac{x^n}n + \sum_{n=1}^{2024} \frac{y^n}n \equiv \sum_{n=1}^{2024} \frac{(x+y-xy)^n}n \pmod{2^{2001}}$$ whenever $|x|_2,|y|_2 < 1$ (the tail estimate from earlier gives us room to spare, since the discarded terms all have valuation above $2014$). Plugging in $x=y=2$ then makes the RHS vanish, since $2+2-2\cdot2 = 0$, so the LHS, which is twice our sum $S$, satisfies $2S \equiv 0 \pmod{2^{2001}}$, i.e. $v_2(S) \ge 2000$, which is what we wanted.

Now, we have finite sums, so we can compare coefficients.

However, while this lets us avoid the first bit of technical analysis, I don't quite see how to avoid analysis completely when we work with real numbers, though the argument should at least simplify. To the other old people who read this, I am open to suggestions.

One could, of course, just directly expand the multinomials to get some combinatorial identity, which brings me to...

Epilogue

After solving the problem (or at least coming up with the general steps and convincing myself that the analysis could be done), I went to look for the problem on AoPS. Here it is. The statement is a little different, but it's in essence the same problem. Interestingly, the first solution posted is by a then 15 year-old Peter Scholze. Yes, that Peter Scholze.

Anyway, what he did was pretty much what I talked about above, except that instead of using the reals, he tried to expand the multinomials out to get some combinatorial identity, which he couldn't prove. Have a go at it! If you figure it out, you'll have beaten a future Fields Medallist.

Source: xkcd
