In March of this year, 30,000 people, including leading AI figures like Yoshua Bengio and Stuart Russell, signed a letter calling on AI labs to pause the training of AI systems. While it seems unlikely that this letter will succeed in pausing the development of AI, it did draw substantial attention to slowing AI as a strategy for reducing existential risk.
While initial work has been done on this topic (this sequence links to some relevant work), many areas of uncertainty remain. I’ve asked a group of participants to discuss and debate various aspects of the value of advocating for a pause on the development of AI on the EA Forum, in a format loosely inspired by Cato Unbound.
- On September 16, we will launch with three posts:
- David Manheim will share a post giving an overview of what a pause would include, how a pause would work, and some possible concrete steps forward
- Nora Belrose will post outlining some of the risks of a pause
- Thomas Larsen will post a concrete policy proposal
- After this, we will release one post per day, each from a different author
- Many of the participants will also be commenting on each other’s work
Responses from Forum users are encouraged; you can share your own posts on this topic or comment on the posts from participants. You’ll be able to find the posts by looking at this tag (remember that you can subscribe to tags to be notified of new posts).
I think it is unlikely that this debate will result in a consensus agreement, but I hope that it will clarify the space of policy options, why those options may be beneficial or harmful, and what future work is needed.
People who have agreed to participate
These are in random order, and they’re participating as individuals, not representing any institution:
- David Manheim (ALTER)
- Matthew Barnett (Epoch AI)
- Zach Stein-Perlman (AI Impacts)
- Holly Elmore (AI pause advocate)
- Buck Shlegeris (Redwood Research)
- Anonymous researcher (Major AI lab)
- Anonymous professor (Anonymous University)
- Rob Bensinger (Machine Intelligence Research Institute)
- Nora Belrose (EleutherAI)
- Thomas Larsen (Center for AI Policy)
- Quintin Pope (Oregon State University)
Scott Alexander will be writing a summary/conclusion of the debate at the end.
Thanks to Lizka Vaintrob, JP Addison, and Jessica McCurdy for help organizing this, and Lizka (+ Midjourney) for the picture.
It's definitely good to think about whether a pause is a good idea. Together with Joep from PauseAI, I wrote down my thoughts on the topic here.
Since then, I have been thinking more about a pause, comparing it to a more frequently mentioned option: applying model evaluations (evals) to see how dangerous a model is after training.
I think the difference between the supposedly more reasonable approach of evals and the supposedly more radical approach of a pause is smaller than it seems. Evals aim to detect dangerous capabilities. What will need to happen when those evals find that a model has indeed developed such capabilities? Then we'll need to implement a pause. Choosing between evals and a pause is mostly a choice about timing, not between fundamentally different approaches.
With evals, however, we move right up to the brink, look straight into the abyss, and plan to halt at the last possible moment. Unfortunately, we're in thick mist and can't see the abyss (this is true even with evals, since we don't know which capabilities will prove existentially dangerous, and an existential event may occur before the evals are even run).
And even if we knew where to halt: we'd need to make sure the leading labs can actually pause themselves (thousands of people may work there), that the models aren't leaked, that we implement the policy that's needed, that we sign international agreements, and that we gain support from the general public. This is all difficult work that will realistically take time.
Pausing isn't as simple as pressing a button; it's a social process. No one knows how long getting everyone on the same page will take, but it could be quite a while. Is it wise to start that process at the last possible moment, namely when the evals turn red? I don't think so. The sooner we start, the higher our chance of survival.
Also, there's a separate point that I think hasn't been sufficiently addressed yet: we don't know how to implement a pause that lasts beyond a few years. If hardware and algorithms improve, frontier models could democratize. While I believe this problem can be solved by international (peaceful) regulation, this will be hard, and we need good plans in advance (hardware or data regulation proposals) for how to do it. We currently don't have these, so I think working on them should be a much higher priority.
Evals showing dangerous capabilities (such as how to build a nuclear weapon) can be used to convince lawmakers that this stuff is real and imminent.
Of course, you don't need that if lawmakers already agree with you – in that case, it's strictly better not to tinker with anything dangerous.
But assuming that many lawmakers will remain skeptical, one function of evals could be "drawing out an AI warning shot, making it happen in a contained and controlled environment where there's no damage."
Of course, we wouldn't want evals teams to come up with AI capability ...