Hi 👋 I'm new here! I'm a mathematics Master's student in Bonn, Germany, at one of the most prestigious mathematics schools in Europe. I have research experience in number theory with faculty members at Oxford University, and I previously held a DAAD STIBET scholarship funded by the German Federal Foreign Office.


Over the past year I've become increasingly preoccupied with a specific, concrete bottleneck in mathematical AI — one that I don't see being addressed systematically anywhere. I went looking for the community that takes AI risk most seriously. That search led me here.


The short version: Large Reasoning Models (LRMs) are the next generation of mathematical AI. Unlike LLMs, they can in principle achieve genuine mathematical accuracy: not by predicting plausible-sounding tokens, but by constructing formally verified proofs. This matters enormously, because mathematical AI is increasingly being trusted with automated decisions in finance, engineering, and science. The difference between an LLM that sounds confident and an LRM that is formally correct could, at scale, be the difference between reliable automation and catastrophic errors.


But LRMs have a hard ceiling. They need formally verified proofs as training data (Lean 4 code, contributed to Mathlib), and that data barely exists beyond the undergraduate level. Synthetic generation fails at graduate mathematics. The human mathematicians who can produce such proofs are rare and scattered, because producing them requires expertise both in graduate-level mathematics and in Lean formalisation. And having taken one of the Lean-related courses at my department, I realised that not many people, even at such a top school, care about formalisation.
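To make the data requirement concrete, here is a minimal sketch of what a single unit of such training data looks like: a short, machine-checked Lean 4 proof written against Mathlib. This is a toy example of my own for illustration, not material from the programme.

```lean
import Mathlib.Tactic

-- Toy example: the sum of two even natural numbers is even.
-- Unlike a natural-language argument from an LLM, every step here is
-- checked by Lean's kernel: if the file compiles, the proof is correct.
example (m n : ℕ) (hm : Even m) (hn : Even n) : Even (m + n) := by
  obtain ⟨a, ha⟩ := hm      -- from hm, some a with m = a + a
  obtain ⟨b, hb⟩ := hn      -- from hn, some b with n = b + b
  exact ⟨a + b, by omega⟩   -- witness a + b; omega checks m + n = (a + b) + (a + b)
```

Mathlib already covers statements at this level (this fact is `Even.add`); the bottleneck is that graduate-level analogues of such files largely do not exist, and writing them demands exactly the dual expertise described above.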


There is no structured pipeline to grow that community, no programme training graduate students as formalisers from scratch, and no systematic corpus being built.
The AI safety community has benchmarks for capabilities, interpretability tools, and alignment research. For formal mathematical verification (the one technique that could make mathematical AI genuinely trustworthy rather than merely convincing), the training data infrastructure barely exists and needs to scale massively.


I'm a Director of BMUCO, a science non-profit that has run programmes with Nobel laureates and Fields Medallists and partnered with the Hausdorff Center for Mathematics and the London Institute for Mathematical Sciences. I'm building a Directed Reading Programme and formalisation workshop in Bonn (May–June 2026): training 15–20 graduate mathematicians as working Lean 4 formalisers, contributing their proofs to Mathlib, and developing a pipeline of expert human formalisers.


I posted a Manifund project (https://manifund.org/projects/stipend-to-produce-human-formal-proof-training-data-for-mathematical-ai-safety) to cover my personal stipend while doing this work: $5,000 for 6 months, so that I can treat this as a full-time commitment rather than splitting focus with part-time work, since I am still a Master's student. The programme itself will be funded separately through industry partnerships. The stipend is purely about protecting my time: I am not a professor or a full-time employee earning 100k a year who can afford to work on one thing at a time. Without personal funding, I would have to juggle my studies, part-time work, my health (as someone with ADHD), and this project. What I do have, which many lack, is incredible resilience and a passion for mathematics, and for a shared future that does not get erased by AI safety risks. That is a large part of why I am doing this.


I'd genuinely welcome feedback on whether the formalisation bottleneck resonates as a priority here, and whether the path from "more human graduate formalisers + larger proof corpus" to "more trustworthy mathematical AI" is legible to this community. Happy to share more detail on the programme structure, the pipeline architecture, or the theory of change.


Rajarshi
Mathematics, University of Bonn | Director, BMUCO

Project link: https://manifund.org/projects/stipend-to-produce-human-formal-proof-training-data-for-mathematical-ai-safety


Comments (1)

> I'd genuinely welcome feedback on whether the formalisation bottleneck resonates as a priority here

It is not clear from your post why getting more people to contribute to mathlib is good for AI safety.

So far, it sounds like this is more about the AI safety risks from AI being dumb rather than AI being smart. The latter is more important for existential risk, but maybe you mean non-xrisk safety?

Suppose you have a math AI that spits out Lean proofs: what would you use it for? Which mathematical questions would you like it to tackle?

I would also note that there are already some math AI startups such as Axiom and Harmonic, who also do Lean (and they don't seem to be about AI safety at all). EA often focuses on things that are neglected, and it is not clear why non-profit money is needed for the same goals.

I don't want to be too discouraging; Lean could be useful somehow for reducing AI xrisk, but imo you haven't made the case.
