Turbo is more than a newspaper puzzle: IMO 2024/5

(This week: a special guest post from Andrew, a member of the IMO 2024 PSC!)

This post is going to be a discussion of everything that could possibly be construed as related to Turbo, i.e. IMO 2024/5, from my perspective. It will be long. Apparently I will also be up against basically all of the Singaporean old people, so this should be fun. (Spoilers for IMO 2024/5. The opinions represented here are those of an individual and not necessarily the opinion shared by all or even a majority of PSC members.) 

Turbo

(IMO 2024/5) Turbo the snail plays a game on a board with $2024$ rows and $2023$ columns. There are hidden monsters in $2022$ of the cells. Initially, Turbo does not know where any of the monsters are, but he knows that there is exactly one monster in each row except the first row and the last row, and that each column contains at most one monster. Turbo makes a series of attempts to go from the first row to the last row. On each attempt, he chooses to start on any cell in the first row, then repeatedly moves to an adjacent cell sharing a common side. (He is allowed to return to a previously visited cell.) If he reaches a cell with a monster, his attempt ends and he is transported back to the first row to start a new attempt. The monsters do not move, and Turbo remembers whether or not each cell he has visited contains a monster. If he reaches any cell in the last row, his attempt ends and the game is over. Determine the minimum value of $n$ for which Turbo has a strategy that guarantees reaching the last row on the $n$-th attempt or earlier, regardless of the locations of the monsters. 

I'll leave the details of the solution to Evan Chen and instead discuss why everyone is saying 2024 will be remembered as the Turbo year. Some details will have to be omitted for privacy reasons. The discussion will also incorporate ideas and conclusions from numerous extensive conversations I have had with various people, which have been helpful and insightful in many ways. I thank everybody who has contributed to this article, whether they intended to or not. For later reference, the commonly attempted log strategy is usually something along the lines of "the worst case for Turbo is when the monsters are 'diagonally placed' but because of the board size there must be a break in the diagonal so we bisect to find the gap". I will refer to this as the log solution or log trap, even though it's not a solution.

Remarks: 
  • Turbo became 'Mr Bean' in the Chinese translation.
  • The PSC considered formulating this in a Squid Game setting.
  • The proposer made this in an English listening exam?!?! He also managed to troll his own teammates pretty badly.

Criticisms of Turbo

Why is it on the PSC only when things go wrong?

Turbo is extremely controversial because a number of criticisms have been made, which I attempt to summarise below:
  1. Turbo isn't a maths problem. According to certain Chinese people, "this is a WeChat minigame". Enough said.
  2. Turbo is a troll problem hence/or doesn't belong on the IMO/ISL. There is the now infamous log trap where students believe the answer is $\log_2(n)$ where $n$ is the size of the board, and coordination reflects that a substantial number of students did fall into this trap. The mark scheme specifically says no marks for a log bound so these students end up with zero unless they serendipitously made some other useful observations along the way. (Sidenote: one of the coordinators suggested a mark scheme of getting $10-k$ where $k$ is the bound proved by the student. There are worse mark schemes out there.)
  3. Turbo isn't in the right position in the IMO/ISL. This is an interesting one. At least one leader/observer fell into the log trap and couldn't dig themselves out, so they thought it was harder than its position. After talking to students after the contest, they thought it would be much better done if placed as Problem 4 instead. (This option was discussed but rejected in the jury meeting.)
  4. Turbo will screw up the cutoffs/balance of the paper. The argument here is something along the lines of if you get stuck in the log trap you won't have enough time to dig yourself out and make progress on P6, or that the number of people who solve exactly one of P2 and P5 will be quite high, resulting in lots of bunching at the silver cutoffs and lots of people being lucky/unlucky no matter which way the cookie crumbles. Others have pointed out that both P2 and P5 are rather binary: either you solve them or you don't, and partials aren't particularly forthcoming, which could lead to strange things at cutoffs.
These are related to each other, so in what follows my answer to these criticisms will also involve some amount of cross-referencing.

There is one other criticism I will not address in much detail, for obvious reasons: there are better problems on the ISL. What I will say is this: first of all, that is mostly out of the PSC's hands. Secondly, if the leaders wanted a different problem, they should have made a case for one of the others, which in our view shouldn't be hard: there were strong cases to be made. Instead, they let the discussion be dominated by "to Turbo or not to Turbo", and those against Turbo did such a good job of explaining their position that they convinced everyone to put Turbo on the IMO.

Getting around the criticisms

I will address the criticisms in the order listed. 

1. Turbo isn't a maths problem

You're wrong. Glen has written at length about P3 and why he likes the style of combinatorics where one has to stare at the process until it clicks. That's a great style of problem which I like very much too. However, I don't think that is the only valid style of combinatorics. Many people think this problem belongs to the 'Russian school of combinatorics', because you have to find the right algorithm which has a bit of a magical feel to it. You might find this stylistically unappealing, which Glen very much does, but that is a matter of taste, and I think to call 'Russian combi' not maths is ridiculous. Whether or not Turbo is a maths problem/has mathematical content very much depends on your definition of the corresponding terms, but I think looking for an algorithm/strategy against an adversary is very much within the scope of (Olympiad) mathematics. It is true that this could appear as a newspaper puzzle or a game, but that is purely a consequence of the setup and answer. After all, Euclidea is a mobile phone app that is still very much Euclidean geometry, and chess puzzles appear in the newspaper which are very much still chess. Why should Turbo be any different?

If you think that the solution is magical or unmotivated, here's how you could arrive at it systematically, and also how to see the problem has mathematical content: 
  1. Observe the shoulders argument (for when the monster in the second row isn't on the edge).
  2. Assume monster the is in the top left square. The best case is when you're done after O(1) moves after finding a monster, so you should try to visit as many cells as possible (where you would be done in O(1) if you hit a monster). Then you get the upside-down triangle, also known as the happy triangle, plus everything to the right of the happy triangle. This is an extension of the shoulders argument, because you should already have found the idea of going straight down behind a monster.
  3. Simply realise that if you've cleared an interval in a row, you can iterate/induct to keep going down. The board size means you get to the end.
Of course, you can also somehow draw the staircase, but this doesn't mean there is no mathematical content. This is also related to the issue of Turbo screwing up the cutoffs, which we return to later: lots of students being able to do Turbo quickly by finding the right trick. In theory anybody could do any problem really quickly by being lucky or just having the right idea off the bat, but when it's a P3/6 and some clever student from China/South Korea finds a really short solution nobody complains. The above shows it is reasonable to find a solution with a bit of time and a clear head, so if Turbo really is as trivial as people make it out to be, then something is going wrong in the training of students who didn't solve it. (Here is more evidence that students' training is not done correctly: the number of insanely overcomplicated solutions that the P1 coordinators had the cosmic misfortune of having to decipher. I know that in a contest you do what you have to blah blah blah, but come on. Whether you solved this problem or not, go read the official solutions and see what a vaguely sensible solution looks like. I seriously want to compile a list of the most unhinged P1 solutions. It would make for a great comedy show if it weren't so tragic.)

2. Turbo is a troll problem hence/or doesn't belong on the IMO/ISL.

Ok, I'll admit I didn't realise there was a troll component until the leaders complained. There may well have been other people who realised Turbo is a troll problem and decided to advocate for it anyway, but this particular issue wasn't brought up at any point prior to the jury meeting. The discussion thread on AoPS brings up APMO 2019/4 and IMO 2010/5 for comparison. I feel that the difference is that Turbo's strategy is not that hard to find, whereas the APMO strategy is really well hidden. I don't have that good a read on IMO 2010 P5, but I feel that Turbo's strategy is at least much easier than the one for 2010/5. (This is at least empirically true since if anything, 2010/4 was easier than this year's P4.)  I think that for Turbo there are at least some hints that you can do it in 3 attempts: e.g. it's really easy to come up with the 'shoulders' argument for when the monster in the second row isn't on the edge, and if you attempt to prove the lower bound of log you will obviously fail. I think if you have a gap between upper and lower bound, you should alternate between trying to improve them, just like if there is a yes/no question you should alternate between proving yes and proving no. (One commentator suggests if there is disparity between upper and lower bounds the lower bound always prevails, but I'm less sure about that.)

There isn't an agreed on definition of troll, but I'll concede that it's a troll problem for most sensible definitions of the term (although at least one coordinator has a completely ridiculous definition and should just shut up). However, what makes it troll is exactly what makes it attractive in the first place: the surprising punchline that you can do it in 3 moves, regardless of board size!  The log strategy is a perfectly valid thing to do, but that would be so boring. (Dylan pointed out that this is probably why the PSC like the problem: it's non-standard and too many standard problems have been submitted, boring everyone to death. That should partially answer your question Mr. Proposer.)

We make a big fuss about problems not being reused to make sure well-trained contestants don't just quote results from their training to trivialise the problem, or standard techniques being done to death resulting in IMO being an issue of which country has the best training program (looking at you 2022/5). Now we have the perfect problem to remedy this: it's novel, it's largely agnostic of previous training (because it's novel, not because it has no content) so provides a level playing field, it tests flexibility of thought/approach to see if you can dig yourself out of the log trap... Why on earth shouldn't this be suitable? Maybe 6 Turbos a year every year is probably not the way to go, but I think having a Turbo every now and then to keep people on their toes is great. Otherwise you end up having problems within the comfort zone of the stronger countries every year and that's not very interesting.

3. Turbo isn't in the right position in the IMO/ISL.

I have a decent amount of sympathy for this viewpoint. I think psychologically some candidates expected log to be correct because it's 'the right level of sophistication', and I do agree that this would be much better done if given as P4, because that signals to students they should keep it simple. Turbo tripping students up might remind tennis fans of Michael Chang's win in 1989 by adopting 'unconventional tactics'. It was sort of hard to avoid Turbo ending up where it did though. When the PSC get the problems, we don't have any indication of how hard it is, so it takes us longer than it would have if we had some upper bound on difficulty, which contestants do have. (It's interesting to note the proposer's remark in the AoPS thread that their trainer, a three-time gold medalist, also didn't manage to do the problem the first time round when just given the problem without context.) This then results in us putting it somewhere in the middle of the shortlist instead of as C2 or something. I presume if we had put it as C2 fewer people would fall into the log trap, and the jury would have been more amenable to putting Turbo as P4, but the way things turned out was pretty hard to avoid since we either didn't realise or severely underestimated the amount of psychology that goes into solving this problem. This isn't an isolated incident though. I'm told that when 2014/2 was given to Swedish students in a national round there were around 5 solves, because they didn't expect it to be that hard, so they just did it. One can check that Sweden scored 17 points on 2014/2 in the actual IMO. It would be interesting to find out just how much psychology is affecting contestants. The other bit of psychology is 'it was log last year'. If you conjecture log and don't stop to think what if I'm wrong, something is going wrong, e.g. overtraining, but if 2023/5 is your reason for thinking the answer is log you deserve 0. I think the upshot is that difficulty is hard to judge but Turbo is a rather extreme case. (For other instances of misjudgement, look at IMO 2017 lol, but also 2011/2, 2021/2, 2023/3. We also had mishaps this year: on a scale of 1-5, 5 being pretty hard, we thought that P3 was about 4 in difficulty, but the contestants did not. At least we were less off than Glen, who apparently thinks it's the easiest problem in the history of the IMO or something. One PSC member, after thinking about P6 for two months, decided that it was a complete triviality and we would get 60 solves. I told him he would eat his words and watched with great aplomb as he did.)

One might also argue that telling students that P1/4 are meant to be easy, P2/5 medium, P3/6 hard is a bad thing, since this is artificial information purely added to for contests' sake. Limiting your search space based on the problem difficulty is silly, but may generally be a sensible piece of exam technique. I'm not sure how to correct this, but I do believe that it's something to think about.

4. Turbo will screw up the cutoffs/balance of the paper.

To the astonishment and frankly bewilderment of some, the cutoffs look sane and no macroscopic disasters happened. USA, China, and South Korea (ha I got the name right! IMO 1, Olympics 0) finished in the top three, and most countries finished in a neighbourhood of where they expected to. Individually, of course there were some upsets and records. Belarus had their best ever performance and Singapore owes its return to the top 10 in part to Turbo. Algeria got their first ever gold medal, and Bosnia and Herzegovina got not only their first gold medal, but their first three gold medals. I think it's a great thing that different countries get a shot at medals in different years, but the people who lost out might reasonably feel bummed out. One might argue that we got lucky that it all worked out in the end, which is sort of valid. I did feel that Turbo wouldn't derail that many gold medallists, so I suppose the issue would have been at the bronze cutoff, where loads of people could potentially do P1, P4, P5. and we might have too high a bronze cutoff. I guess this was balanced out by P4 being on the hard side, but this too is controversial: some of the coordinators felt P4 was very standard, and the statistics suggest it wasn't particularly difficult, contrary to jury and PSC fears. Figuring out cutoffs before the paper is sat is well beyond me.

As for the balance of the paper, I don't think the jury gives this much thought beyond subject area. I thought this hinged a lot on what ended up being P3 and P6. It seems that P3 being a very different style of combi, and P6 requiring quite a lot of algebraic expertise ended up smoothing over most of the potential Turbo chaos, since you would probably need all the time savings you can get from P5 to make real progress in the FE. It also made for some curious statistics: notably, South Korea's P6 score is more than three times their P5 score. It all worked out in the end.

Conclusion

If you have read this far, thank you for your persistence! I think there are valid reasons to not like Turbo. Beauty lies in the eyes of the beholder and whatnot. I also understand that there are good reasons to prefer setting Turbo in a different contest, or a different position in the contest, although I think it is fine as an IMO problem and it has definite mathematical content. I will happily debate this with anyone. It has been a surprise how many interesting questions/issues about maths contests Turbo has raised, which I have greatly enjoyed discussing with others. I hope they will prove food for thought for all the maths olympiad enthusiasts out there.

Comments

Popular posts from this blog

SMO Open 2024 ??% Speedrun

Musings about the IMO