Thoughts after NeurIPS, 18 Dec 2023
This is my first NeurIPS - last year I had three papers and no visa.
The general atmosphere is contagious: the echo chamber of AI is indeed comfortable and exciting. When so many people get together for low-latency conversations that plastically reshape their views, an electrifying air ensues. I was enlightened countless times during a mere week. However, it’s worth remembering that it is still an echo chamber, and we should ground ourselves firmly in reality >95% of the time so as not to become intoxicated and completely detached.
Two main themes dominate the conference for me: AI for maths and language models. Here I want to discuss them, first individually and then how they relate to each other.
AI for maths:
More people than ever are getting interested in AI for maths. The tutorial on machine learning for theorem proving captivated a room of ~300 people; for the first time, I heard “theorem proving” mentioned in the halls of a machine learning conference venue by random people.
Behind the popularity are two seemingly contradictory views: one is that AI is finally good enough that we have a glimpse of what it can do for maths; the other is that AI is still so poor at reasoning that mathematical abilities are something we desperately need to improve. These views are not really contradictory. Moravec’s paradox rules over mathematics too: while very helpful in certain tasks, such as embedding an entire formal library and enabling search over it with hybrid natural and formal languages, even the most advanced LLMs by themselves can’t reliably verify abstract reasoning steps (even very simple ones).
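As a toy illustration of the hybrid search task mentioned above, here is a minimal sketch of ranking a (hypothetical, three-statement) formal library against a natural-language query. The bag-of-words “embedding” is a deliberate stand-in for a real pretrained embedding model, and the library entries are made up for the demo:

```python
# Toy sketch of hybrid natural/formal search over a formal library.
# The statements and the bag-of-words "embedding" are illustrative
# stand-ins for a real Lean library and a real embedding model.
import math
from collections import Counter

LIBRARY = {
    "Nat.add_comm": "theorem Nat.add_comm (a b : Nat) : a + b = b + a",
    "Nat.mul_comm": "theorem Nat.mul_comm (a b : Nat) : a * b = b * a",
    "List.length_append":
        "theorem List.length_append (l1 l2 : List a) : "
        "(l1 ++ l2).length = l1.length + l2.length",
}

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding: a bag of lowercase tokens.
    return Counter(text.lower().replace("+", " + ").split())

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query: str) -> str:
    # Return the library statement closest to the query.
    q = embed(query)
    return max(LIBRARY, key=lambda name: cosine(q, embed(LIBRARY[name])))

print(search("addition of naturals is commutative, a + b = b + a"))
# → Nat.add_comm
```

The point is only the shape of the pipeline: embed everything once, then the same similarity search works whether the query is informal English, a formal statement, or a mix.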
So, what are the promising directions?
There are many and my opinions here are likely to be quite limited and should not deter you from carrying out research in completely different directions.
- Build a vibrant community for people working on AI for maths research.
- Lower the entry barrier for people. MATH-AI is still a hazardous and messy place
- Where are the entry points? What are the benchmarks?
- Why do I need to do metaprogramming in Lean or Scala + ML to get anything to work?
- To grow MATH-AI as a scientific field, these central practices need defining.
- Make basic research tools more available and low-cost
- Running basic proof search that was SOTA in 2022 still seems elusive for academics
- The HTPS project had 400 GPUs training for 2 weeks, a dedicated CPU cluster to host Lean servers, and a stellar team to build the search, so it was an engineering feat.
- But 1.5 years later, we have: better and more efficient pre-trained models, better quantisation techniques, Lean 4, which is much faster and so runs comfortably on the CPUs attached to GPUs, and a much larger community.
- We should be able to reproduce it with something similar that runs on an M1, or even on CPUs alone.
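For concreteness, the core of such a proof search is a fairly small loop. Below is a minimal best-first search over a toy tactic environment; the hand-written TACTICS table and uniform step cost are stand-ins for a real Lean interaction layer and a learned policy/value model (this is not the actual HTPS algorithm, which uses a more sophisticated hypertree search):

```python
# Minimal best-first proof-search skeleton, heavily simplified.
# A "goal state" is a tuple of open subgoals (strings); applying a
# tactic rewrites the first subgoal into zero or more new subgoals.
import heapq

# Toy tactic environment (illustrative, not real Lean tactics).
TACTICS = {
    "a + b = b + a": [("add_comm", ())],      # closes the goal
    "P and Q": [("split", ("P", "Q"))],       # splits into two subgoals
    "P": [("assumption", ())],
    "Q": [("assumption", ())],
}

def best_first_search(root_goal, max_steps=100):
    # Frontier ordered by cost; a learned value model would score here.
    frontier = [(0, 0, (root_goal,), [])]  # (cost, tiebreak, goals, proof)
    tiebreak = 1
    while frontier and max_steps > 0:
        cost, _, goals, proof = heapq.heappop(frontier)
        max_steps -= 1
        if not goals:
            return proof  # all subgoals closed: proof found
        head, rest = goals[0], goals[1:]
        for tactic, new_subgoals in TACTICS.get(head, []):
            child = (cost + 1, tiebreak, new_subgoals + rest, proof + [tactic])
            tiebreak += 1
            heapq.heappush(frontier, child)
    return None  # search exhausted or budget spent

print(best_first_search("P and Q"))
# → ['split', 'assumption', 'assumption']
```

Everything expensive in a real system lives behind two of these lines: the tactic lookup (a language model proposing tactics, checked by Lean) and the cost (a value model). The search loop itself is cheap, which is why reproducing it on modest hardware feels within reach.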
- Train a verifier that works beyond simple arithmetic reasoning.
- It’s unclear what data/architecture is needed for a good verifier. Even within companies very focused on mathematical reasoning, there are differing opinions on the right way to do it: formal or informal.
- Formal signals are nice because they are guaranteed to be “signals”, though very conservative and not flexible.
- I have less confidence than before the conference that fine-tuning a currently existing pretrained LLM can yield a good verifier. Most popular LLMs are pretrained to imitate large internet corpora, which cannot be expected to induce mathematical reasoning capabilities. Fine-tuning can inject more knowledge into an LLM, or even shift it into a more maths-friendly mode, but picking up a complicated novel skill such as maths is probably quite hard.
- Just pick your preferred way and do it! We need to try out both at some stage!
- Get data for “how the sausage is made” for mathematical proofs.
- Mathematicians often demolish the scaffolding after the house is built: the literature contains only polished proofs, without any of the trial and error. This is not great training data for reasoning.
- Use LLMs to construct the missing traces.
- Maybe encouraging mathematicians to elaborate (as advocated by Timothy Gowers at some point) is easier?
- Elicit and utilise abstract knowledge/skills.
- There are some abstract skills and lots of data can be seen as instantiations of them. The question is how to extract and apply them.
- The auto-regressive decoding objective is not great for this. Making the objective more “abstract” at fine-tuning time (while keeping GPU utilisation high!) could be an interesting topic for a new PhD student to explore.
I leave with more questions than when I came.
Language models
My thoughts on language models are a bit more fragmented:
- It’s very sad to see some labs only publish things that don’t work (or second-best methods). People should be extra cautious about what they choose to believe, even for NeurIPS posters, spotlights, and orals. It’s OK if you don’t feel like sharing your trade secrets; just don’t mislead people.
- The sentence stuck in my head is Paige Bailey’s great motto: “Keep it simple, make it scale”. I don’t believe extremely fancy and brittle prompting techniques will ever be of much use, but the motto extends beyond just prompting.
- The reason I started working at Mistral AI (apart from it being a great company with great people) is that I really don’t like having to choose between subpar open models and API calls in my research. So I want to push open models to be as good as the best closed models. In this day and age, the reality is that there are orgs serving different purposes, working on language models with different levels of openness: science in the open, open science, open-sourced, open-weights, etc.
Both together
If it wasn’t clear 2 years ago, it should be very clear now: language models are an essential component of building general AI mathematicians/assistants, and mathematics is a great arena in which to test language models. It’s time to push hard in both directions.