
ProofWiki problem 21: Show that the equation $1 + a^n = 2^m$ has no solutions in the integers for $n, m > 1$.
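As a quick sanity check of the statement (not a proof, and not part of GPT-4's output), a brute-force search over a small range can be sketched as follows. The function name `find_solutions` and the search bounds are illustrative assumptions.

```python
def find_solutions(max_abs_a=50, max_n=12):
    """Search for integers a, n, m with n, m > 1 and 1 + a**n == 2**m.

    The bounds max_abs_a and max_n are arbitrary illustrative choices.
    """
    solutions = []
    for a in range(-max_abs_a, max_abs_a + 1):
        for n in range(2, max_n + 1):
            value = 1 + a**n
            # A positive integer is a power of two iff it has exactly one set bit.
            if value > 0 and value & (value - 1) == 0:
                m = value.bit_length() - 1  # exponent of that power of two
                if m > 1:
                    solutions.append((a, n, m))
    return solutions


print(find_solutions())  # prints []
```

An empty result over a finite range only fails to find a counterexample; the claim for all integers still needs the proof discussed here.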

Problem 21 is a simple Diophantine equation. The problem is quite obscure (a verbatim Google search gives 10 results) and is thus unlikely to appear repeatedly in training material. The model took very reasonable steps towards solving the problem: it started by stating that the proof is by contradiction and proceeded to reason about an assumed solution with $n, m > 1$.

It begins well by reasoning that $a$ must be odd: $1 + a^n = 2^m$ is even, so $a^n$ is odd, and hence so is $a$. No explanation is given for this, but an experienced human wouldn't explain this step either, given the routine nature of parity arguments in number theory.

The next step is to take the expression $\left(2k + 1\right)^n$, which appears once the odd number $a$ is written as $2k + 1$, and expand it using the binomial theorem. However, it does this in a surprising way, splitting the resulting sum into the first two terms and then a sum for the remaining terms. $$(2k + 1)^n = \sum_{i=0}^n {n \choose i} (2k)^i = 1 + n(2k) + \sum_{i=2}^n {n \choose i} (2k)^i$$ This is impressive because GPT-4 is exhibiting some planning. It clearly has in mind to work modulo $4$, and it can see that all of the terms of the final sum might vanish modulo $4$. Indeed, this is the very next claim that it makes.
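The split of the binomial expansion, and the claim that the remaining sum vanishes modulo $4$, can be checked numerically for small values. The sketch below is only an illustration; the helper name `split_expansion` and the tested ranges are assumptions of ours.

```python
from math import comb


def split_expansion(k, n):
    """Return (head, tail) with (2k + 1)**n == head + tail.

    head is the first two binomial terms, 1 + n(2k);
    tail is the sum of the terms with i >= 2.
    """
    head = 1 + n * (2 * k)
    tail = sum(comb(n, i) * (2 * k) ** i for i in range(2, n + 1))
    return head, tail


# Check the identity and the divisibility of the tail by 4 on a small grid.
for k in range(-5, 6):
    for n in range(2, 9):
        head, tail = split_expansion(k, n)
        assert head + tail == (2 * k + 1) ** n
        assert tail % 4 == 0
```

For example, `split_expansion(3, 4)` returns `(25, 2376)`, and $25 + 2376 = 2401 = 7^4$.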

Whilst it didn't explain why every term of the final sum is divisible by $4$, it was asked on subsequent generations to explain this step, and it did so correctly.
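One way to justify the divisibility claim (our own sketch of the standard argument, not a transcript of GPT-4's explanation) is to factor a $4$ out of each term with $i \geq 2$: $$(2k)^i = 4 \cdot 2^{i-2} k^i \quad \text{for } i \geq 2,$$ so every term ${n \choose i} (2k)^i$ with $i \geq 2$ is a multiple of $4$, and therefore $$(2k + 1)^n \equiv 1 + 2kn \pmod{4}.$$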

However, things do not go so well from here. It now claims that we can write the original equation $1 + a^n = 2^m$ as $$1 + 2kn + 4s = 2^m$$ for some $s$. This is a beguiling step that a human might accept as correct at a glance, but it is not. The expression $1 + 2kn + 4s$ is the expression for $a^n$, not for $1 + a^n$. GPT-4 has made an algebraic error. This sort of slip is unfortunately very common and lets GPT-4 down on many examples.
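For comparison, here is what the step would look like with the missing $1$ restored. This is our reconstruction under the same notation ($a = 2k + 1$, and $4s$ standing for the terms with $i \geq 2$), not something GPT-4 produced. Substituting $a^n = 1 + 2kn + 4s$ into $1 + a^n = 2^m$ gives $$2 + 2kn + 4s = 2^m,$$ and dividing by $2$ gives $$1 + kn + 2s = 2^{m-1}.$$ Since $m > 1$, the right-hand side is even, so $kn$ would have to be odd. This does not finish the proof on its own, but it shows how the corrected equation differs from the one GPT-4 wrote down.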

Asking GPT-4 to self-correct did not help it notice and fix its mistake. To see if it could eventually produce a completely correct proof, it was asked numerous times to solve the problem. Whilst its overall strategy was good on each generation, different algebraic mistakes occurred each time, so a correct proof was never reached.