In March of this year, 30,000 people, including leading AI figures like Yoshua Bengio and Stuart Russell, signed a letter calling on AI labs to pause the training of AI systems. While it seems unlikely that this letter will succeed in pausing the development of AI, it did draw substantial attention to slowing AI as a strategy for reducing existential risk.
While initial work has been done on this topic (this sequence links to some relevant work), many areas of uncertainty remain. I’ve asked a group of participants to discuss and debate various aspects of the value of advocating for a pause on the development of AI on the EA Forum, in a format loosely inspired by Cato Unbound.
- On September 16, we will launch with three posts:
    - David Manheim will share a post giving an overview of what a pause would include, how a pause would work, and some possible concrete steps forward
    - Nora Belrose will post outlining some of the risks of a pause
    - Thomas Larsen will post a concrete policy proposal
- After this, we will release one post per day, each from a different author
- Many of the participants will also be commenting on each other’s work
Responses from Forum users are encouraged; you can share your own posts on this topic or comment on the posts from participants. You’ll be able to find the posts by looking at this tag (remember that you can subscribe to tags to be notified of new posts).
I think it is unlikely that this debate will result in a consensus agreement, but I hope that it will clarify the space of policy options, why those options may be beneficial or harmful, and what future work is needed.
People who have agreed to participate
These are in random order, and they’re participating as individuals, not representing any institution:
- David Manheim (ALTER)
- Matthew Barnett (Epoch AI)
- Zach Stein-Perlman (AI Impacts)
- Holly Elmore (AI pause advocate)
- Buck Shlegeris (Redwood Research)
- Anonymous researcher (Major AI lab)
- Anonymous professor (Anonymous University)
- Rob Bensinger (Machine Intelligence Research Institute)
- Nora Belrose (EleutherAI)
- Thomas Larsen (Center for AI Policy)
- Quintin Pope (Oregon State University)
Scott Alexander will be writing a summary/conclusion of the debate at the end.
Thanks to Lizka Vaintrob, JP Addison, and Jessica McCurdy for help organizing this, and Lizka (+ Midjourney) for the picture.
The 'final paragraph' was simply noting that when you try to make AI risks concrete, considering not an abstract machine that is overwhelmingly smarter than humans and randomly aligned but a real machine that humans have to train and run on their computers, numerous technical mitigation methods become obvious. The one I was alluding to was (1).
(1) https://www.lesswrong.com/posts/C8XTFtiA5xtje6957/deception-i-ain-t-got-time-for-that
Sparsity and myopia are general alignment strategies and, as it happens, also general software engineering practices. Many of the alignment enthusiasts on LessWrong have rediscovered software architectures that already exist, and that are core to software systems ranging from avionics software to web hyperscalers.
Sparsity happens to be TDD.
Myopia happens to be stateless microservices.
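To make the stateless-service half of that analogy concrete, here is a minimal sketch. It is only an illustration: `Request` and `handle_request` are hypothetical names, not taken from the linked post or from any real system. The handler computes its answer from the request alone and keeps nothing between calls, so one call cannot influence a later one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical request: everything the handler may use is in here."""
    task: str
    payload: tuple[int, ...]


def handle_request(req: Request) -> int:
    """Stateless handler: the result depends only on `req`.

    Nothing is read from or written to module-level state, so one call
    cannot influence any later call.
    """
    if req.task == "sum":
        return sum(req.payload)
    if req.task == "max":
        return max(req.payload)
    raise ValueError(f"unsupported task: {req.task}")


# The same request gives the same answer regardless of call history.
first = handle_request(Request("sum", (1, 2, 3)))
handle_request(Request("max", (10, 20)))
second = handle_request(Request("sum", (1, 2, 3)))
assert first == second == 6
```

Because the handler has no memory, a small set of assertions like the one at the end pins down its behaviour completely for the cases tested, which is roughly where the TDD comparison comes in.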