This is more of a personal how-I-think post than an invitation to argue the details.

This post discusses two sources of bloat, mental gymnastics, and panic in AI Safety:

  1. p(doom)/timeline creep
  2. embedded libertarian assumptions in LessWrong dogma


I think there’s been an inflation of the stakes of AI Safety, unchecked for a long time. I’ve felt it myself since I switched into AI Safety. When you’re thinking about AI capabilities every day, it’s easy to become more and more scared and convinced that doom is almost certain, SOON.

My general take on timelines and p(doom) is that the threshold for trying to get AI paused should be pretty low (a 5% chance of extinction or catastrophe seems more than enough to me) and that, above that threshold, differences in p(doom) are worth tracking but generally aren’t action-relevant. But the subjective sense of doom component of my p(doom) has risen in the last year. In 2023 my p(doom) was something like 20-40% for the worst outcomes (extinction, societal collapse, s-risk), with 40-50% probability mass on very bad outcomes short of x-risk, such as society or sentient life becoming worse through mass job loss and instability, AI-enabled authoritarianism, or the creation of solipsistic little AI worlds around people that strangle sentient-to-sentient connection. Now I feel my p(x-risk) going up, more like 60% by my feels, somewhat due to actual AI advances and political outcomes, but mainly because I have felt scared and powerless about the development of AI every day in the intervening time. I’ve generally refused to participate in timeline talk because I think it’s too reductive and requires too much explanation to be useful for conveying your worldview, but ditto for my subjective internal sense of timelines. The scope of the problem has already reached its physical limit, the lightcone, in the minds of many here. How could this not also be happening for others who have been living on this IV drip of fear longer than me?

This p(doom) creep is not harmless. I wrote last week that freaking out about AI x-risk doesn’t help, and it sure doesn’t. It sabotages the efforts most likely to improve the AI risk situation.

Many people think that they need the sense of urgency to take effective action, that if the problem were smaller or we had more time, they and others would just put it off or bounce off of it. But 1) people are bouncing off of the insane and false-sounding levels of urgency you are presenting (at some point you’re just a street preacher yelling about the apocalypse), and 2) maybe you can scare yourself into working a little harder and a little faster in the short term, but inflated urgency is a recipe for burnout. The reason I hate timeline talk is that we’re not just approaching one event: there are lots of harms and risks it would be good to mitigate, they dawn at different speeds, and some can be undone while others can’t. Only in the very worst scenarios does everything happen in an instant, and even in that case thinking of FOOM as some kind of deadline is a really unhelpful guide to action. Action should be guided by when we can intervene, not when we’re fucked.

So, here’s my vision of the stakes in AI Safety, cut down to size:

  • The problem with powerful AI is that our world is fragile, depending on many delicate equilibria, many of which we are unaware of because our capabilities level is too low to easily upset them. Introducing a more powerful intelligence, one without our many biological constraints, will disrupt many of those equilibria in ways we can’t control or constrain, or can’t even anticipate because we take those equilibria completely for granted.
  • The harms of this disruption will not come all at once, and shades of many of them are already here (algorithmic bias and deepfakes now, major job loss likely soon). It’s not that there could never be *better* equilibria established for a lot of these things with the aid of AI, like a world where people don’t have to work to have what they need, but disrupting the equilibrium we have causes chaos->suffering, and we do not have a plan for getting to the better equilibrium. (Up until recently, the AI Safety community’s deadass plan was that the aligned AI would figure it out.)
  • Given the capabilities of current models, at any time there could be a bad accident: models pursuing goals that we didn’t anticipate they would have or that they hid during training and testing, models following human instructions that are antisocial (like war or terrorism), or models following instructions that were just unwise (the operator doesn’t realize they’re instructing the model to destroy the atmosphere or something). The nature and magnitude of possible accidents grow as model training scales, because the model becomes more and more able to plow through the evolved societal ecosystem that we depend on.
  • The scale of the danger really could cripple civilization or cause extinction, and that possibility alone is reason enough to pursue pausing frontier AI development. Furthermore, the people of the world (I’m judging mostly by US polls, but everything I’ve seen is consistent) don’t want their world radically disrupted, so it’s not okay for the people who want to build AGI, gambling on a crazy chance of improving the world, to impose that gamble on everyone else.

One way my take above differs from the standard LessWrong debate positions is that I don’t think we need extraordinary circumstances to justify telling people they can’t build something dangerous or sufficiently disruptive. I’m not that much of a libertarian, so I’m allowed to care about the softer harms leading up to the possibility of extinction. The fact that I’m not articulating AI danger in a way that accommodates strong libertarianism makes my version less tortured and way easier to understand.

My headcanon is that one reason the traditional AI Safety crowd has such high (90+%) p(doom) estimates is that they need AI to be a uniquely dangerous technology to justify intervening to control it, rather than feeling free to acknowledge that advancing technological capabilities almost by definition carries risks to society that society might want to mitigate. Or, rather, they need a bright red line to distinguish AI from the rest of the reference class of technology, which they hold to be always good. The idea that building more advanced technology is always “progress” and that “progress” is always good just doesn’t stand up to the reality of society’s vulnerability to major disruptions, and I believe the libertarian AI Safety wing really struggles to reconcile the danger they see and want to prevent with their political and economic ideology. Free of the libertarian identity and ideology, I have no trouble acting on a lower and, I believe, more realistic p(doom).

I also feel free to share that I put my remaining 10-15% probability mass on us just missing something entirely in our models and nothing that bad happening with AGI development– say, LLMs are aligned by default and scaled-up LLMs protect us from other architectures that could be misaligned, or we get miraculously lucky and do “hit a wall” with machine learning in a way that buys us crucial time. I think there is a real possibility that things with advanced AI will accidentally turn out fine even without intervention. But, Holly, you run PauseAI US– doesn’t that undermine your message? A lot of people in AI Safety seem to think you need a guarantee of catastrophic AI harm for it to be worth trying to pause or regulate it. Why would you need that? Imo it’s because they are strong libertarians and think that, with anything less than a guarantee of harm, the developers’ rights to do whatever they want take precedence. They don’t want to look like “Luddites.” They may also have Pollyannaish beliefs about the market sorting everything out as long as not literally everyone dies, and callous “price of progress” thinking toward everyone who does suffer and die in that scenario.

I’m not saying you shouldn’t be a libertarian. I am saying that you need to realize that libertarianism is a separate set of values that you bring to the facts and probabilities of AI danger. The two have been so tightly intertwined for decades in AI Safety that the ideological assumptions of libertarianism in so much AI Safety dogma have been obscured and are not properly owned up to as political views. You can be someone who thinks that you can’t interfere with the development of AI if it is merely very harmful to society and doesn’t kill everyone, but that’s not a fact about AI– that’s a fact about your values. And can you imagine if we applied this standard to any other danger in our lives? It’s been pointed out that there are more regulations on selling a sandwich than on making AGI. Any given sandwich made in unsanitary conditions probably won’t poison you. There are extreme libertarians who think the government shouldn’t be able to control how hygienically people make sandwiches they sell. So? That doesn’t make it a fact that it’s immoral to demand safety guarantees. I think it’s extremely moral to demand safety on something we have as much reason to believe to be dangerous and deadly as advanced AI. Rather than (I allege) ratcheting up your p(doom) to a near certainty of doom in order to be allowed to want regulations on AI, you could just have a less extreme view on when it is okay to regulate potential dangers.


Comments

I agree with the main ideas of this post. But I want to flag that, as someone who's engaged with the AI safety community (outside of the Bay Area) for several years, I don't recognize this depiction. In my experience, it's very common to say "even a [1, 5, 10]% chance of AI x-risk justifies taking this very seriously."

I don't have time to track down many examples, but just to illustrate:

  • Neel Nanda's "Simplify EA Pitches to 'Holy Shit, X-Risk'": "If you believe the key claims of "there is a >=1% chance of AI causing x-risk and >=0.1% chance of bio causing x-risk in my lifetime" this is enough to justify the core action relevant points of EA."
  • Toby Ord estimated AI x-risk at 10% in The Precipice.
  • Scott Alexander is sympathetic to libertarianism and put AI x-risk at 20-25% in 2023. 

To address another claim: "The Dial of Progress" by Zvi, a core LessWrong contributor, makes the case that technology is not always good (similar to the "Technology Bucket Error" post) and the comments overwhelmingly seem to agree.

I'm sure someone has said that 10-25% x-risk would not be worth addressing due to libertarianism -- but I don't believe I've heard this argument, and wasn't able to find someone making it after a few minutes of searching. (But it's a hard thing to search for.)

I don't doubt that Holly is accurately reporting her experiences, and she's almost certainly engaged more widely than I have with people in AI safety. I wouldn't be surprised if there are people saying these things in the Bay Area. But I don't have the impression that they represent the mainstream of the AI safety community in any forum I'm familiar with (Twitter, LW, and definitely EA).

I am not making a case for who should be referred to as “the AI Safety community”, and if you draw a larger circle then you get a lot more people with lower p(doom)s. But you still get mostly people focused on x-risk as opposed to other risks, and people who think that only x-risk justifies intervening, implicitly if not explicitly.

> To address another claim: "The Dial of Progress" by Zvi, a core LessWrong contributor, makes the case that technology is not always good (similar to the "Technology Bucket Error" post) and the comments overwhelmingly seem to agree.

This post is a great example of my point. If you pay attention to the end, Zvi says that the dial thinking is mostly right and that he is morally sympathetic to it; it’s just that AGI is an exception. I wanted it to mean what you thought it meant :(

Again, just giving my impressions from interacting with AI safety people: it doesn't seem to me like I get this impression by drawing a larger circle -- I don't recall hearing the types of arguments you allude to even from people I consider "core" to AI safety. I think it would help me understand if you were able to provide some examples? (Although like I said, I found examples either way hard to search for, so I understand if you don't have any available.)

I still disagree about the Dial post: at the end Zvi says

> Seeing highly intelligent thinkers who are otherwise natural partners and allies making a variety of obvious nonsense arguments, in ways that seem immune to correction, in ways that seem designed to prevent humanity from taking action to prevent its own extinction, is extremely frustrating. Even more frustrating is not knowing why it is happening, and responding in unproductive ways.

So my read is that he wants to explain and understand the position as well as possible, so that he can cooperate as effectively as possible with people who take the Dial position. He also agrees on lots of object-level points with the people he's arguing against. But ultimately actually using the Dial as an argument is "obvious nonsense," for the same reason the Technology Bucket Error is an error.

I was going on my memory of that post and I don't have the spoons to go through it again, so I'll take your word for it. 
