Sorting books on a shelf

(This is Yan Sheng.)

The time has come for me to finally write a post about actually solving an olympiad-level problem, in particular the thought process and the winding path that got me to the solution.

This came from one of my Discord servers a few months ago:

You have a bookshelf of (finitely many) unsorted books. Every minute you are forced to move one book not already in the correct location to its correct (absolute) location, shifting the other books as you do so. This is the only allowed movement. If the books ever become sorted, the world will explode. Will the world explode?

(For clarity, all the books have the same thickness, and the bookshelf has exactly enough space for all the books.)

Here's an example of the above operation: suppose the bookshelf looks like 312 (i.e., book 3 is in position 1, book 1 in position 2, book 2 in position 3).

I can choose to pick up book 2 to move it to position 2, while pushing book 1 to the right, so the bookshelf now looks like 321.
Now I can't touch book 2 (since it's already in its home position), and the two remaining choices are symmetric. So let's say I move book 3 to position 3, pushing the other two books to the left, so the bookshelf is now 213.
Now I can't touch book 3, and moving either book 1 or book 2 gives me 123, and the world explodes. It's not hard to verify that the process also must terminate for the other two choices at the first step.

This is a good place to pause and attempt this question yourself.

Solution

Of course, the first step here would be to enumerate the small cases, for 2 books (12 done, 21 dies in one move) and 3 books (6 cases, but the example above essentially covers all of them).

All the small cases terminate after finitely many moves, so we boldly conjecture that's true in general. If the process doesn't terminate then the state of the bookshelf must enter a cycle, so probably some extremal state gives us trouble; at this point there are only a few possible ideas to try, and hopefully one of them works out.

Indeed, consider the book with the lowest index that we picked up and moved to the left. Since we're in a cycle, this book must have been moved right from its home position at some point; the only way this could happen is if another book moved left past this book, contradicting the minimality assumption. So no books ever move to the left, and similarly no books ever move to the right, so no books ever move and a cycle can't exist, and we are done.

Another solution

The above solution isn't mine; my first instinct for "show there are no cycles" isn't "extremal configuration", but "monovariant". If we want to show that the books must eventually end up in the correct order, the monovariant should somehow measure how disordered the current arrangement is. Again, there are not that many possible ideas to try, and hopefully one of them works.

The sum of absolute distances from each book to its home position almost works, since it never increases. When does this sum stay the same? For example, when you shift a book B to the left, each book that got shifted one spot to the right must have increased their distance to its home position, so the home positions must all be to the left of the home position of B. (Right?) Hence the number of {pairs of books in the correct order} strictly decreases, so the pair (sum of distances, number of correctly ordered pairs) always strictly decreases in lexicographical ordering, so at some point we can't move anymore, at which point the books are sorted.

This argument has the advantage of giving an explicit upper bound on the number of moves: each value of the pair can take at most $O(n^2)$ values (where $n$ is the number of books), so we can make at most $O(n^4)$ moves. Much better than the previous solution, which doesn't give a bound better than $n!-1$.

Better bounds?

At this point I wondered what the best construction was, since $O(n^4)$ seemed kinda strange to be sharp. So I drew out the complete state space for $n=4$, ordered by (sum of distances, number of correctly ordered pairs):

Can you spot what's wrong? For one thing, 3142 is in the wrong spot: it should be at the $(6,3)$ level instead of the $(4,3)$ level, but I didn't want to relayout the whole graph.

The more serious problem is the horizontal arrows on the $(6,2)$ level, which my argument says shouldn't exist. I've already helpfully signposted where the error in my argument was; for example, when we perform the operation from 4132 to 4213 by moving book 2 to the left, book 3 has to move to the right away from its home position, but that doesn't mean its home position is to the left of 2.

Okay, so now we no longer have any nontrivial upper bound. We don't have a construction yet either, so maybe the maximum number of moves is just not a nice problem?

Spoiler: it's nice. It's also hard, definitely the hardest combinatorics problem that I've solved after IMO: I took 3 days. This is a good place to pause and attempt this question yourself; here's a motivational GIF in the meantime.

I Get Knocked Down GIFfrom I Get Knocked Down GIFs

Upper and lower bounds

Let's try to extract some sort of upper bound from the first (correct) argument. How many times can each book get picked up and moved to the left?

Book 1 can move left once, then it's forever stuck in position 1;
Book 2 can move left once, then it's stuck there until book 1 moves past it and bumps it to the right, for a total of 2 times;
Book 3 can move left once, then it's stuck there until it gets bumped to the right by book 1 or 2, for a total of 1+1+2=4 times; etc.

So book $m$ can be picked up and moved to the left at most $2^{m-1}$ times; summing across $1\le m\le n-1$ gives a total of $2^{n-1}-1$ moves to the left. Similarly we can have $2^{n-1}-1$ moves to the right, for a total of at most $2^n-2$ operations.

The bound for number of moves to the left is sharp: considering the first 3 books, we can do

xxxxx…
xx3xx…
x2x3x…
x23xx…
1x23x…
1x32x…
12x3x…
123xx…

Extending this up to book $n-1$, we get the construction $(n,1,2,3,\ldots,n-2,n-1)$, where each operation moves the rightmost book for a total of $2^{n-1}-1$ moves.

So now our upper and lower bounds are a factor of 2 apart, and we know the maximal number of moves is $O(2^n)$.

Computational experiments

At this point someone in the Discord server said they've computationally verified that the maximal number of moves is $2^{n-1}-1$ up to $n\le5$. Our work so far has already manually shown this for $n\le4$ (with an alarming number of equality cases for $n=4$).

It's actually a nice little puzzle in itself to figure out an efficient way to represent the operation in code. For example, depending on the language and implementation, it might be better to keep track of the positions of each book instead of the books at each position (i.e., use the inverse permutation). Anyway here's what I threw together in Python:

from functools import cache
from itertools import permutations
from time import time


def place_book(t, i):
    """
    Given tuple of positions of each book, move the i-th
    book to position i (assuming it is not already there).
    """
    N = len(t)
    if t[i] < i:
        return tuple(i if j == i
                     else t[j] - 1 if t[i] < t[j] <= i  # books in these positions shift left by 1
                     else t[j]
                     for j in range(N))
    else:
        return tuple(i if j == i
                     else t[j] + 1 if i <= t[j] < t[i]  # books in these positions shift right by 1
                     else t[j]
                     for j in range(N))


@cache
def max_moves(t):
    """
    Given tuple of positions of each book, find the maximum
    number of moves that can be made before they are all sorted.
    """
    N = len(t)
    if N == 1:
        return 0
    if t[-1] == N-1:
        return max_moves(t[:-1])
    if t[0] == 0:
        return max_moves(tuple(x-1 for x in t[1:]))
    return max(max_moves(place_book(t, i)) + 1 for i in range(N) if t[i] != i)


for n in range(1, 11):
    tt = time()
    print(f"n: {n:2}  max moves: {max(max_moves(t) for t in permutations(range(n))):4}  T: {time()-tt:7.3f}")

And here's the output:

n:  1  max moves:    0  T:   0.000
n:  2  max moves:    1  T:   0.000
n:  3  max moves:    3  T:   0.000
n:  4  max moves:    7  T:   0.000
n:  5  max moves:   15  T:   0.001
n:  6  max moves:   31  T:   0.007
n:  7  max moves:   63  T:   0.051
n:  8  max moves:  127  T:   0.514
n:  9  max moves:  255  T:   5.506
n: 10  max moves:  511  T:  79.503

So the conjecture holds up to $n\le10$, which seems pretty solid evidence that $2^{n-1}-1$ is the correct answer. I guess you could use a faster language and optimise the code or whatever, but even with a 1000x speedup you would only be able to reach $n\le13$; I don't think that's worth the effort.

Another great feature of having code is that we can see what the equality cases look like. Here's an example for $n=6$ with $2^5-1=31$ moves, where books in their home position are highlighted:

546123
543612 after 1 move
563412 after 2 more moves
523461 after 4 more moves
623451 after 8 more moves
123456 after 16 more moves

Here at the start of each phase we pick one of the outer books to move to join the $m$ books in middle block, bumping all of them so that we can take $2^m-1$ operations to move them all back. (After the $m=4$ phase, 5 of the books are in the correct position, so the last book must be correct as well.)

This tells us quite a lot about the problem:

There are a lot of equality cases, one for each way to grow the middle block outwards.
A lot of potential approaches to the problem can now be ruled out. At one point I was wondering if we can just improve the upper bound by showing that the books have to either always move left or always move right, and that's just dead now. Also, any monovariant would now have to be very smart in order to somehow preserve all the possible equality cases.

Sign words

Another idea: maybe we only need to track on which side of its home position each book is currently. As such, let's write a word of $n$ signs, where a -/+/0 in the $m$-th position indicates that book $m$ is left of/right of/exactly on its home position. The allowed operations now look like:

Choose some + and change it to 0; change all 0's on its right into +'s; possibly change some -'s to 0's; or
Choose some - and change it to 0; change all 0's on its left into -'s; possibly change some +'s to 0's.

For our previous examples, we have

xxxxx…: +++…
xx3xx…: ++0…
x2x3x…: +0+…
x23xx…: +00…
1x23x…: 0++…
1x32x…: 0+0…
12x3x…: 00+…
123xx…: 000…

Thus optimal sequences of operations are just counting down in binary in sign words. It's a good exercise to work this out in our $n=6$ example: the counting down happens in the middle block either left-to-right or right-to-left, depending on which way the block extends:

546123: +++---
543612: ++0---
563412: ++00--
523461: +000--
623451: +0000-
123456: 000000

Reformulating the problem in terms of sign words loses some information about the actual book numbers; in particular, we have to track when the "possibly"s happen, and if we don't, some sequences of sign words wouldn't correspond to actual operations on books. However, the "possibly"s never happen in the equality case; skipping numbers when counting down is clearly bad.

Okay, I lied: there is one major exception where the "possibly" must happen. At the last step, a sign word like 00+-00 cannot become 000-++, and must immediately become 000000. (5 of the books are in the correct position, so the last book must be correct as well.)

We could consider the alternative problem where we operated entirely on sign words using the two operations given above; to make things even simpler, let's just remove the "possibly" clauses. Now this is an even easier problem computationally (there are only $3^n$ states instead of $n!$), and after some more coding we verify that the longest sequence of sign words survive for $2^n-1$ moves for $n\le11$. Notice that the exponent is $n$ instead of $n-1$; this number is double that for the bookshelf problem! What happens is that optimal sequences will reach +-0000 and barrel through to 0-++++, which is only the halfway point of the sequence.

This suggests that maybe there's some weird "boundary effect" going on at the end of the bookshelf problem, because we're dealing with actual books that need to have actual positions, and there's not enough room or something. Schematically the situation is something like this:

For the first $2^{n-2}-1$ moves, we don't need to (and don't want to) touch book 1 or book $n$;
We get to 623451, and so we're forced to touch book 1 or book $n$ for the next move;
Now we're reduced to the problem with $n$ replaced by $n-1$, so by induction we have at most $2^{n-2}-1$ moves left.

So the insight here is that the last point is the "boundary effect", and the real meat of the problem happens in the first point, where we (hopefully) don't actually have to keep track of the positions of all the books, only the sign words.

This is probably worth 2 points

So we want to show that if we never touch the first and the last signs of a sign word of length $n$, then we can make at most $2^{n-2}-1$ moves.

But imagine we have a much longer sign word, and we just focus on $n-2$ of the signs in that word. If we only touched those signs and completely ignore the rest of the word, the $n-2$ signs do exactly the same things as if the rest of the word isn't there. Hence the statement we want is equivalent to the following:

Claim 1: On a bookshelf with any number of books, if we only touch at most $k$ books, then we can make at most $2^k-1$ moves.

At this point we can throw away $n$.

Solving the sign word problem

The equality cases for the sign word problem turn out to be of the form ${++}\cdots{+}{--}\cdots{-}$, the sequence of moves being to count in binary in the middle. What if our sign word doesn't look like that? Then it has to look like ${+}^i(\text{bad}){-}^j$, where a bad subword is either empty, or both the following conditions hold:

The leftmost sign is not +; and
The rightmost sign is not -.

Let $m$ be the length of the bad subword; it suffices to show that we can make at most $2^{m-1}$ moves if we are limited to touching signs in the bad subword, because after that we have to touch a sign outside, and we just end up with a longer bad subword in the middle of this word.

Now consider what the leftmost sign of a bad subword can do: if it's 0 it can't do anything, and if it's - then it can only change signs to the left of itself, so it can't change the signs in the rest of the bad subword.

What can we do to the leftmost sign of a bad subword? The only way to change it is if it is a 0 and we touched a - in the core (i.e., middle $m-2$ signs), which changes the leftmost sign to a -.

This tells us that the number of times that we can touch the leftmost sign is at most 1 plus the number of times we touch a - in the core (i.e., middle $m-2$ signs). An analogous bound holds for the rightmost sign. Hence $$\begin{align*} \#\text{moves}(\text{bad})&\le\#\text{moves}(\text{core})+\#({+}\text{ touches})+1+\#({-}\text{ touches})+1\\ &=2\#\text{moves}(\text{core})+2. \end{align*}$$ But by Claim 1, $\#\text{moves}(\text{core})\le2^{m-2}-1$, so we're done by (very careful) induction!

And we're done

Here's what the argument looks like back in bookshelf space:

We will prove Claim 1 by induction. On any bookshelf, we call a subset $S$ of books wide if the book with the leftmost home position is to the left of or on the home position, and similarly for the book with the rightmost home position. (We call these books the ends of $S$.) The following properties are left as exercises:

If $S$ is wide and we only touch books in $S$, then $S$ remains wide;
Furthermore, each such move shifts at most one of the end books of $S$;
After we touch a book $B\not\in S$, the set of books $S\cup{B}$ becomes wide.

Claim 2: On a bookshelf with any number of books, let $S$ be a wide subset of (at most $k-1$) books. If we only touch books in $S$, then we can make no moves if $\#S\le1$, and at most $2^{\#S-1}$ moves otherwise. Moreover, equality can only hold if neither end book of $S$ is in its home position.

We prove Claim 2 by induction. Note that touching the end books of $S$ does not affect the relative ordering of the other books to each other and to their home positions. We can touch each end book at most once before touching the other books; then by induction hypothesis for Claim 1, we can touch the other books at most $2^{\#S-2}-1$ times, after which we get to touch at most one of the end books. This yields at most $2+2(2^{\#S-2}-1)=2^{\#S}$ moves in total, with equality possible only if we can touch both end books before the other books, which was what we wanted.

Now we finish the proof of Claim 1. Let $K$ be the set of $k$ books we will touch; note that (exercise) the maximal wide subset in $K$ consists of all books in $K$ whose indices are between:

the smallest index of a book to the left of or on home position; and
the largest index of a book to the right of or on home position. (This set may be empty.)

Let us keep track of the size $m$ of the maximal wide subset among the $k$ books we will touch; by Claim 2 and the last bullet point before it, $m$ is non-decreasing, and can only stay at a constant value $m=i$ for at most $2^{i-1}$ moves. Furthermore, when $m$ increases, the book we just touched becomes one of the end books of the new maximal wide subset (exercise), and since it is in home position now, equality cannot hold in Claim 2. Hence if $m=j$ initially then we can make at most $$2^{j-1}+1+(2^j-1)+1+(2^{j+1}-1)+\cdots+1+(2^{k-1}-1)=2^k-2^{j-1}$$ moves if $j\ge2$. For $j\le1$ the first summand is 0 instead of $2^{j-1}$, and we get a sum of $2^k-2$ for $j=1$ and $2^k-1$ for $j=0$, which finishes the proof of Claim 1.

Search This Blog

SIMO Retiree Blog