The Google DeepMind mechanistic interpretability team has made a strategic pivot over the past year, from ambitious reverse-engineering to a focus on pragmatic interpretability:
Trying to directly solve problems on the critical path to AGI going well[1]
Measuring progress with empirical feedback on proxy tasks
We believe that, on the margin, more researchers who share our goals should take a pragmatic approach to interpretability, both in industry and academia, and we call on people to join us
Our proposed scope is broad and includes much non-mech interp work, but we see this as the natural approach for mech interp researchers to have impact
Specifically, we’ve found that the skills, tools and tastes of mech interp researchers transfer well to important and neglected problems outside “classic” mech interp
Most existing interpretability techniques struggle on today’s important behaviours, which tend to involve large models, complex environments, agentic behaviour and long chains of thought
Problem: It is easy to do research that doesn't make real progress.
Our approach: ground your work with a North Star - a meaningful stepping-stone goal towards AGI going well - and a proxy task - empirical feedback that stops you fooling yourself and that tracks progress toward the North Star.
We see two main approaches to research projects: focused projects (proxy task driven), and exploratory projects (curiosity-driven, proxy task validated)
Curiosity-driven work can be very effective, but can also get caught in rabbit holes. We recommend starting in a robustly useful setting, time-boxing your exploration[3], and finding a proxy task as a validation step[4]
We advocate method minimalism: start solving your proxy task with the simplest methods (e.g. prompting, steering, probing, reading chain-of-thought). Introduce complexity or design new methods only once baselines have failed.
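As a rough illustration of what a "simplest method first" baseline might look like, here is a minimal sketch of a difference-of-means linear probe scored on a held-out split. Everything in it is an assumption for illustration: the activations are synthetic Gaussian data with a planted concept direction rather than real model activations, the function names are hypothetical, and held-out accuracy merely stands in for whatever proxy task metric a real project would choose.

```python
import numpy as np


def fit_mean_difference_probe(acts, labels):
    """Fit a difference-of-means linear probe; returns (direction, threshold)."""
    pos_mean = acts[labels == 1].mean(axis=0)
    neg_mean = acts[labels == 0].mean(axis=0)
    direction = pos_mean - neg_mean
    # Place the threshold at the midpoint of the class means along the direction.
    threshold = 0.5 * (pos_mean + neg_mean) @ direction
    return direction, threshold


def probe_accuracy(acts, labels, direction, threshold):
    """Fraction of examples the probe classifies correctly."""
    preds = (acts @ direction > threshold).astype(int)
    return float((preds == labels).mean())


def make_synthetic_activations(n, dim, rng):
    """Hypothetical stand-in for model activations: Gaussian noise plus a
    planted concept direction that is added only for label-1 examples."""
    labels = rng.integers(0, 2, size=n)
    concept = np.zeros(dim)
    concept[0] = 3.0
    acts = rng.normal(size=(n, dim)) + np.outer(labels, concept)
    return acts, labels


rng = np.random.default_rng(0)
train_acts, train_labels = make_synthetic_activations(400, 16, rng)
test_acts, test_labels = make_synthetic_activations(200, 16, rng)

direction, threshold = fit_mean_difference_probe(train_acts, train_labels)
accuracy = probe_accuracy(test_acts, test_labels, direction, threshold)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

In the spirit of the point above, a more complex method would only be worth introducing if a baseline like this clearly fell short on the chosen proxy task.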
Read the full post here, and the companion piece on promising AGI Safety relevant research directions here