
Slowing AI[1] is many-dimensional. This post presents variables for determining whether a particular kind of slowing improves safety. Then it applies those variables to evaluate some often-discussed scenarios.

Variables

Many variables affect whether an intervention improves AI safety.[2] Here are four crucial variables at stake when slowing AI progress:[3]

  1. Time until critical systems are deployed.[4] More time seems good for alignment research, governance, and demonstrating risks of powerful AI.
  2. Length of crunch time. In this post, "crunch time" means the time near critical systems before they are deployed.[5] More time until critical systems are deployed is good; more such time near critical systems is especially good. A lab is more likely to (be able to) pay an alignment tax for a critical system if it has more time to pay the tax for that system. Time near critical systems also seems especially good for alignment research and potentially for demonstrating risks of powerful AI and doing governance.
  3. Safety level of labs that develop critical systems.[6] This can be improved both by making labs safer and by differentially slowing unsafe labs.
  4. Propensity to coordinate or avoid racing.[7] This is associated with many factors, but the factors most relevant to slowing AI seem to be that there are few leading labs, that they like and trust each other, and that they are all in the same country or at least in allied countries (in part because regulation is one possible cause of not-racing).

One lab's progress, especially on the frontier, tends to boost other labs. Labs leak their research both intentionally (publishing research and deploying models) and unintentionally.

Some interventions would differentially slow relatively safe labs (relevant to 3). Some interventions (especially policies that put a ceiling on AI capabilities or inputs) would differentially slow leading labs (relevant to 4). Both outcomes are worse than uniform slowing and potentially net-negative.

If something slows progress temporarily, after it ends progress may gradually partially catch up to the pre-slowing trend, such that powerful AI is delayed but crunch time is shortened (relevant to 1 and 2).[8]
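
As a rough illustration of this dynamic, here is a toy model (a sketch with made-up numbers, not anything rigorous): progress accrues toward a critical threshold at a constant baseline rate; a two-year slowdown halves that rate; afterward progress partially catches up, moving faster in proportion to how far it remains behind the pre-slowing trend. The thresholds, rates, and catch-up coefficient below are arbitrary assumptions chosen only to show the direction of the effect.

```python
# Toy model (illustrative assumptions only): a temporary slowdown followed by a
# partial catch-up can delay the critical threshold while shortening crunch time.

DT = 0.01            # simulation time step (years)
BASE_RATE = 1.0      # baseline progress per year (arbitrary units)
CRUNCH_AT = 8.0      # progress level at which "crunch time" begins (assumed)
CRITICAL_AT = 10.0   # progress level at which critical systems arrive (assumed)

def crossing_times(rate_fn):
    """Integrate progress over time; return (year crunch begins, year critical systems arrive)."""
    t, progress, crunch_start = 0.0, 0.0, None
    while progress < CRITICAL_AT:
        progress += rate_fn(t, progress) * DT
        t += DT
        if crunch_start is None and progress >= CRUNCH_AT:
            crunch_start = t
    return crunch_start, t

def baseline_rate(t, progress):
    return BASE_RATE

def slowdown_rate(t, progress):
    if t < 2.0:                        # two-year slowdown at half speed
        return 0.5 * BASE_RATE
    gap = BASE_RATE * t - progress     # shortfall relative to the pre-slowing trend
    return BASE_RATE + 0.15 * gap      # partial catch-up: faster while behind trend

for name, fn in [("baseline", baseline_rate), ("temporary slowdown", slowdown_rate)]:
    crunch_start, critical = crossing_times(fn)
    print(f"{name:20s} critical systems at year {critical:.1f}, "
          f"crunch time ~{critical - crunch_start:.1f} years")
```

In this sketch the slowdown scenario reaches the critical threshold a few months later than baseline, but it crosses the near-critical-to-critical stretch slightly faster, so total time is lengthened while crunch time is shortened.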

Coordination may facilitate more coordination later (relevant to 4).

The current leading labs (Google DeepMind, OpenAI, and maybe Anthropic) luckily seem to be relatively safety-conscious (relevant to 3). They also luckily seem to be concentrated in America (relevant to 4).[9]

Some endogeneities in AI progress may give rise to considerations about the timing of slowing. For example, the speed at which the supply of (ML training) compute responds to (expected) demand determines the effect of slowing soon on future supply. Or perhaps slowing affects the distribution of talent between dangerous AI paths, safe AI paths, and non-AI stuff. Additionally, some kinds of slowing increase or decrease the probability of similar slowing later.

Scenarios

Magic uniform slowing of all dangerous AI: great. This delays dangerous AI and lengthens crunch time. It has negligible downside.

A leading safety-conscious lab slows now, unilaterally: bad. This delays dangerous AI slightly. But it makes the lab irrelevant: the labs that develop critical systems become less safe, and the slowed lab loses the option to extend crunch time by staying at the frontier for now and slowing later.

All leading labs coordinate to slow during crunch time: great. This delays dangerous AI and lengthens crunch time. Ideally the leading labs slow until risk of inaction is as great as risk of action on the margin, then deploy critical systems.

All leading labs coordinate to slow now: bad. This delays dangerous AI. But it burns leading labs' lead time, making them less able to slow progress later (because further slowing would cause them to fall behind, such that other labs would drive AI progress and the slowed labs' safety practices would be irrelevant).

Strong global treaty: great. A strong global agreement to stop dangerous AI, with good operationalization of 'dangerous AI' and strong verification, would seem to stop labs from acting unsafely[10] and thus eliminate AI risk. The downside is the risk of the treaty collapsing and progress being faster and distributed among more labs and jurisdictions than otherwise.

Strong US regulation:[11] good. Like "strong global treaty," this stops labs from acting unsafely—but not in all jurisdictions. Insofar as this differentially slows US AI progress, it could eventually cause AI progress to be driven by labs outside the regulation's reach.[12] If so, the regulation—and the labs it slowed—would cease to be relevant, and it would likely have been net-negative: it would cause critical systems to be created by labs other than the relatively-safety-conscious currently-leading ones and cause leading labs to be more globally diffuse.

US moratorium now: bad. A short moratorium (unless succeeded by a strong policy regime) would slightly delay dangerous AI on net, but also cause progress to be faster for a while after it ends (when AI is stronger and so time is more important), increase the number of leading labs (especially by adding leading labs outside the US), and result in less-safe leading labs (because current leading labs are relatively safety-conscious). A long moratorium would delay dangerous AI, but, as in "strong US regulation," the frontier of AI progress would eventually be driven by labs outside the moratorium's reach.


Which scenarios are realistic; what interventions are tractable? These questions are vital for determining optimal actions, but I will not consider them here.

Thanks to Rose Hadshar, Harlan Stewart, and David Manheim for comments on a draft.

 

This post is part of AI Pause Debate Week. Please see this sequence for other posts in the debate.

  1. ^

    That is, slowing progress toward dangerous AI, or AI that would cause an existential catastrophe. Many kinds of AI seem safe, such as vision, robotics, image generation, medical imaging, narrow game-playing, and prosaic data analysis—maybe everything except large language models, some bio/chem stuff, and some reinforcement learning. Note that in this post, I assume that AI safety is sufficiently hard that marginal changes in my variables are very important.

  2. ^

    This post is written from the perspective that powerful AI will eventually appear and AI safety is mostly about increasing the probability that it will be aligned. Note that insofar as other threats arise before powerful AI or intermediate AI systems pose threats, it's better for powerful AI to arrive faster—but I ignore this here.

  3. ^

     See my Slowing AI: Foundations for more.

  4. ^

    In this post, a critical system is one whose deployment would cause an existential catastrophe if misaligned or be able to execute a pivotal act if aligned. This concept is a simplification: capabilities that could cause catastrophe are not identical to capabilities that could execute a pivotal act, 'cause catastrophe' and 'execute a pivotal act' depend on not just the system but also the world, 'catastrophe or not' and 'pivotal act or not' aren't really binary, and deployment is not binary. Nevertheless, it is a useful concept.

  5. ^

    This concept is a simplification insofar as "near critical systems" is not binary. Separately, note that some interventions could lengthen total time to critical systems but reduce crunch time or vice versa. For example, slowing now in a way that causes progress to partially catch up to the old trend later would lengthen total time but reduce crunch time.

    Separately, I believe we are not currently in crunch time. I expect we will be able to predict crunch time decently well (say) a year in advance by noticing AI systems' near-dangerous capabilities.

  6. ^

    This concept is a simplification: non-lab actors may be central to safety, especially the creators of tools/plugins/scaffolding/apps to integrate with ML models.

  7. ^

    The other variables are implicitly about what happens by default, without much coordination.

  8. ^
  9. ^

    Coordination seems easier if leading labs are concentrated in a single state, in part because it can be caused by regulation. (Additionally, the AI safety community has relatively more influence over government in the US, so US regulatory effectiveness and thus US lead is good, all else equal.)

    Observations about current leads are relevant insofar as (1) those leads will be sustained over time and (2) dangerous AI is sufficiently close that current leaders are likely to be leaders in crunch time by default.

    On the risk of differentially slowing US labs, see my Cruxes on US lead for some domestic AI regulation.

  10. ^

    Or in terms of the above variables, a strong global treaty would delay dangerous AI, cause labs to be safer, and (insofar as it discriminates between safe and unsafe labs) differentially slow unsafe labs.

  11. ^

    I imagine "strong global treaty" and "strong US regulation" as including miscellaneous safety standards/regulations but focusing on oversight of large training runs: enforcing a ceiling on training compute and/or doing model evals during large training runs, and stopping runs that fail an eval until the lab can ensure the model is safe.

  12. ^

    Labs outside US regulation's reach could eventually dominate AI progress due to some combination of the following (overlapping):

    • The US fails to get a large coalition to join it
    • Labs in coalition states can effectively move to non-coalition states to escape the regulation
    • Labs in non-coalition states can quickly catch up to the frontier given slowed progress in the coalition
    • Coalition export controls fail to deny compute to labs in non-coalition states
    • Other attempted extraterritorialization of the regulation fails
    • (Also just there being a substantial tradeoff between speed and (legible) safety, such that the regulation substantially slows the labs it affects)
    • (Also just powerful AI being far off, such that outside labs have longer to catch up to the slowed coalition labs)
Comments



"All leading labs coordinate to slow during crunch time: great. This delays dangerous AI and lengthens crunch time. Ideally the leading labs slow until risk of inaction is as great as risk of action on the margin, then deploy critical systems.

All leading labs coordinate to slow now: bad. This delays dangerous AI. But it burns leading labs' lead time, making them less able to slow progress later (because further slowing would cause them to fall behind, such that other labs would drive AI progress and the slowed labs' safety practices would be irrelevant)."

 

I would be more inclined to agree with this if we had a set of criteria indicating we were in "crunch time", criteria we are very likely to meet before dangerous systems arrive and haven't met now. Have people generated such a set? Without one, how do we know when "crunch time" is, or for that matter, whether we're already in it?

The problem I have with the scenarios is that they are end-state scenarios without considering who does anything or how negotiations proceed. But unlike in idealized thought experiments, in social and geopolitical systems, the process by which the goal is pursued, not the stated goal state, actually determines what the end state looks like.

(totally agree thinking about end-states is insufficient, but I think it's a necessary first step and this kind of thinking reveals big cruxes and some real disagreements)

I believe we are not currently in crunch time. I expect we will be able to predict crunch time decently well (say) a year in advance by noticing AI systems' near-dangerous capabilities.


We are already in crunch time, doubly so post GPT-4. What predictors are you using that aren't yet being triggered?

I also agree with David Manheim that the path matters; and therefore incremental steps such as a US moratorium are likely net positive, especially considering that it is crunch time, now. International treaties can be built from such a precedent, and the US is probably at least 1-2 years ahead of the rest of the world currently.

"Crunch time" has many meanings, but in this post it mostly means a time shortly before critical systems in which alignment research is much more productive. We don't seem to be in that crunch time yet.

I agree that US domestic policy can lead to international law; that should be a consideration.

That makes sense. But like Greg_Colbourn says, it seems like a non-trivial assumption that alignment research will become significantly more productive with newer systems. 

Also, different researchers may expect very different degrees of "more productive." It seems plausible to me that we could learn more about the motivations of AI models once we move to a paradigm that isn't just "training next-token prediction on everything on the internet." At the same time, it seems outlandish to me that there'd ever come a point where new systems could help us with the harder parts of alignment (due to the expert delegation problem: delegating well in an environment where the assistants may not all be competent and well-intentioned becomes impossible if you don't already have the expertise yourself).

Thanks. I don't share the expectation that alignment research will be much more productive shortly before critical systems. At least not to a degree where it reduces relative risk. We should only have systems more advanced than those we've already got once we've solved mechanistic interpretability for the current ones (and we're so far from that; the frontier of interpretability research is looking at GPT-2-sized models and smaller!). Also, I think there is a non-zero chance that the next generation of models will be critical, so we're basically at crunch time now in terms of having a good shot at averting extinction.

I am actually interested in answers to my question; it wasn't rhetorical (and I'm not sure why my comment was downvoted; disagreement votes are fine).

There's also a lot of overlap between disagreeing with someone and disliking a post: if you disagree with something, you are more likely to not like it. I don't love this about the voting system, but I don't really have a better alternative to suggest.
