Slowing AI is many-dimensional. This post presents variables for determining whether a particular kind of slowing improves safety. Then it applies those variables to evaluate some often-discussed scenarios.
Variables
Many variables affect whether an intervention improves AI safety. Here are four crucial variables at stake when slowing AI progress:
1. Time until critical systems are deployed. More time seems good for alignment research, governance, and demonstrating risks of powerful AI.
2. Length of crunch time. In this post, "crunch time" means the time near critical systems before they are deployed. More time until critical systems are deployed is good; more such time near critical systems is especially good. A lab is more likely to (be able to) pay an alignment tax for a critical system if it has more time to pay the tax for that system. Time near critical systems also seems especially good for alignment research and potentially for demonstrating risks of powerful AI and doing governance.
3. Safety level of labs that develop critical systems. This can be improved both by making labs safer and by differentially slowing unsafe labs.
4. Propensity to coordinate or avoid racing. Many factors affect this, but the ones most relevant to slowing AI seem to be whether there are few leading labs, whether they like and trust each other, and whether they are all in the same country (or at least in allied countries), in part because regulation is one possible cause of not racing.
One lab's progress, especially on the frontier, tends to boost other labs. Labs leak their research both intentionally (publishing research and deploying models) and unintentionally.
Some interventions would differentially slow relatively safe labs (relevant to 3). Some interventions (especially policies that put a ceiling on AI capabilities or inputs) would differentially slow leading labs (relevant to 4). Both outcomes are worse than uniform slowing and potentially net-negative.
If something slows progress temporarily, then after it ends progress may gradually catch up, at least partially, to the pre-slowing trend. In that case powerful AI is delayed but crunch time is shortened, because progress is faster than trend during the period near critical systems (relevant to 1 and 2).
Coordination may facilitate more coordination later (relevant to 4).
It seems lucky that the current leading labs (Google DeepMind, OpenAI, and maybe Anthropic) are relatively safety-conscious (relevant to 3) and that they are concentrated in America (relevant to 4).
Some endogeneities in AI progress may give rise to considerations about the timing of slowing. For example, the speed at which the supply of (ML training) compute responds to (expected) demand determines the effect of slowing soon on future supply. Or perhaps slowing affects the distribution of talent between dangerous AI paths, safe AI paths, and non-AI stuff. Additionally, some kinds of slowing increase or decrease the probability of similar slowing later.
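The catch-up consideration above (temporary slowing delays powerful AI but can shorten crunch time) can be illustrated with a toy model. This is only a sketch under assumptions that are not in the post: capability is a single number that grows linearly on trend, "crunch time" is the period between two arbitrary capability thresholds, and after a pause progress speeds up in proportion to the gap from the pre-slowing trend. The function name, parameters, and default values are all hypothetical.

```python
def simulate(pause_start, pause_end, catchup_rate,
             crunch_threshold=8.0, critical_threshold=10.0,
             base_rate=1.0, dt=0.01, horizon=100.0):
    """Toy model: return (time critical threshold is reached, crunch time length).

    All quantities are illustrative assumptions, not estimates.
    - Capability grows at base_rate per year on the pre-slowing trend.
    - During [pause_start, pause_end) progress stops entirely.
    - Outside the pause, progress runs at base_rate plus catchup_rate times
      the gap between the pre-slowing trend and actual capability.
    - Crunch time is the time spent between the two thresholds.
    """
    capability = 0.0
    t = 0.0
    crunch_start = None
    while t < horizon:
        trend = base_rate * t  # capability if there had been no slowing
        if pause_start <= t < pause_end:
            rate = 0.0
        else:
            rate = base_rate + catchup_rate * (trend - capability)
        capability += rate * dt
        t += dt
        if crunch_start is None and capability >= crunch_threshold:
            crunch_start = t
        if capability >= critical_threshold:
            return t, t - crunch_start
    raise ValueError("critical threshold not reached within horizon")


if __name__ == "__main__":
    scenarios = {
        "no slowing": simulate(0.0, 0.0, 0.2),
        "2-year pause, partial catch-up": simulate(5.0, 7.0, 0.2),
        "2-year pause, no catch-up": simulate(5.0, 7.0, 0.0),
    }
    for name, (arrival, crunch) in scenarios.items():
        print(f"{name}: critical systems at t={arrival:.2f}, "
              f"crunch time={crunch:.2f}")
```

Under these assumptions, the pause with catch-up delays critical systems relative to no slowing but leaves less crunch time, while the pause without catch-up delays them by the full pause length and leaves crunch time unchanged. The numbers carry no empirical weight; the point is only the direction of the effect.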
Scenarios
Magic uniform slowing of all dangerous AI: great. This delays dangerous AI and lengthens crunch time. It has negligible downside.
A leading safety-conscious lab slows now, unilaterally: bad. This delays dangerous AI slightly. But it makes the lab irrelevant, which makes the labs that develop critical systems less safe on average. It also forfeits the lab's option to extend crunch time by staying at the frontier for now and slowing later.
All leading labs coordinate to slow during crunch time: great. This delays dangerous AI and lengthens crunch time. Ideally the leading labs slow until risk of inaction is as great as risk of action on the margin, then deploy critical systems.
All leading labs coordinate to slow now: bad. This delays dangerous AI. But it burns leading labs' lead time, making them less able to slow progress later (because further slowing would cause them to fall behind, such that other labs would drive AI progress and the slowed labs' safety practices would be irrelevant).
Strong global treaty: great. A strong global agreement to stop dangerous AI, with good operationalization of 'dangerous AI' and strong verification, would seem to stop labs from acting unsafely and thus eliminate AI risk. The downside is the risk of the treaty collapsing and progress being faster and distributed among more labs and jurisdictions than otherwise.
Strong US regulation: good. Like "strong global treaty," this stops labs from acting unsafely—but not in all jurisdictions. Insofar as this differentially slows US AI progress, it could eventually cause AI progress to be driven by labs outside the regulation's reach. If so, the regulation—and the labs it slowed—would cease to be relevant, and it would likely have been net-negative: it would cause critical systems to be created by labs other than the relatively-safety-conscious currently-leading ones and cause leading labs to be more globally diffuse.
US moratorium now: bad. A short moratorium (unless succeeded by a strong policy regime) would slightly delay dangerous AI on net, but also cause progress to be faster for a while after it ends (when AI is stronger and so time is more important), increase the number of leading labs (especially by adding leading labs outside the US), and result in less-safe leading labs (because current leading labs are relatively safety-conscious). A long moratorium would delay dangerous AI, but as in "strong US regulation," the slowed labs would eventually be surpassed by labs outside the moratorium's reach.
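The lead-time reasoning that runs through several of these scenarios (slowing now "burns" the lead that leading labs could otherwise spend on slowing during crunch time) can be written as simple arithmetic. This is a minimal sketch, assuming lead time can be measured in months and that followers experience some fixed fraction of any slowdown; the function name and parameters are hypothetical and not from the post.

```python
def remaining_lead(initial_lead_months, months_slowed_now,
                   follower_slowdown_fraction=0.0):
    """Lead (in months) that leading labs still have after slowing now.

    follower_slowdown_fraction is an illustrative assumption:
    0.0 means followers are unaffected (only leaders slow),
    1.0 means the slowing is perfectly uniform across all labs.
    """
    lead_burned = months_slowed_now * (1.0 - follower_slowdown_fraction)
    # A lead cannot go below zero: at that point other labs set the frontier.
    return max(0.0, initial_lead_months - lead_burned)


if __name__ == "__main__":
    # Leaders with a 12-month lead slow for 6 months while followers do not.
    print(remaining_lead(12, 6))
    # Uniform slowing: everyone slows equally, so no lead is burned.
    print(remaining_lead(12, 6, follower_slowdown_fraction=1.0))
    # Slowing for longer than the lead leaves nothing to spend later.
    print(remaining_lead(12, 18))
```

On this framing, "magic uniform slowing" corresponds to a fraction of 1.0, and the coordinated and unilateral "slow now" scenarios correspond to fractions near 0.0, which is why the latter leave less room to slow during crunch time.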
Which scenarios are realistic? Which interventions are tractable? These questions are vital for determining optimal actions, but I will not consider them here.
Thanks to Rose Hadshar, Harlan Stewart, and David Manheim for comments on a draft.
This post is part of AI Pause Debate Week. Please see this sequence for other posts in the debate.
I would be more inclined to agree with this if we had a set of criteria indicating that we are in "crunch time," criteria that we are very likely to meet before dangerous systems arrive and that we have not met yet. Have people generated such a set? Without that, how do we know when "crunch time" is, or for that matter, whether we're already there?