
Regulation cannot be written in blood alone.

There’s this fantasy of easy, free support for the AI Safety position coming from what’s commonly called a “warning shot”. The idea is that AI will cause smaller disasters before it causes a really big one, and that when people see this they will realize we’ve been right all along and easily do what we suggest. I can’t count how many times someone (ostensibly from my own side) has said something to me like “we just have to hope for warning shots”. It’s the AI Safety version of “regulation is written in blood”. But that’s not how it works.

Here’s what I think about the myth that warning shots will come to save the day:

1) Awful. I will never hope for a disaster. That’s what I’m trying to prevent. Hoping for disasters to make our job easier is callous and it takes us off track to be thinking about the silver lining of failing in our mission.

2) A disaster does not automatically a warning shot make. People have to already hold a world model that tells them what the event signifies before they can experience it as a warning shot that kicks them into gear.

3) The way to make warning shots effective if (God forbid) they happen is to work hard at convincing others of the risk and what to do about it based on the evidence we already have— the very thing we should be doing in the absence of warning shots.

If these smaller scale disasters happen, they will only serve as warning shots if we put a lot of work into educating the public to understand what they mean before they happen.

The default outcome of a “warning shot” event is confusion, misattribution, or normalization of the tragedy.

Let’s imagine what one of these macabrely hoped-for “warning shot” scenarios feels like from the inside. Say one of the commonly proposed warning shot scenarios occurs: a misaligned AI causes several thousand deaths. Say the deaths are of ICU patients because the AI in charge of their machines decides that costs and suffering would be minimized if they were dead.

First, we would hear news stories from one hospital at a time. Different hospitals may attribute the deaths to different causes. Indeed, the AI may have used many different means, and different leads will present themselves first in different crime scenes, and different lawyers may reach for different explanations to cover their clients’ butts. The first explanation that reaches the national news might be “power outage”. Another hospital may report, based on an IT worker’s best guess, a “bad firmware update”. A spate of news stories may follow about our nation’s ailing health tech infrastructure. The family of one of the deceased may begin to publicly question if this was the result of sabotage, or murder, because of their relative’s important knowledge or position. By the time the true scale of the incident becomes apparent, many people will already have settled in their minds that it was an accident, or equipment failure, or that most of the victims were collateral damage from a hit. By the time an investigation uncovers the true cause, much of the public will have moved on from the story. The true cause may be suppressed by a powerful AI industry, or it may simply be disbelieved by a skeptical public. Even if the true story is widely accepted, there will not necessarily be a consensus on the implications. Some people may— for whatever contrarian or motivated reason— be willing to bite the bullet and question whether those patients in the ICU should have been kept alive. A national discussion on end-of-life care may ensue. The next time audiences hear of an event like this that is possibly caused by AI, they will worry less, because they’ve heard that before.

Not so rousing, huh? Just kind of a confused mess. This is the default scenario for candidate “warning shot” events. In short, these deaths some misguided people are praying for are likely to be for nothing— the blood spilled without any getting into the fountain pen— because the connection between the bad effect and the AI danger cause is not immediately clear.

What makes an effective warning shot

To work, a true warning shot must provide a jolt to the gut, an immediate and clear inner knowing triggered by the pivotal event. I felt this when I first saw chatGPT hold a conversation.

The Elmore Model of Effective Warning Shots:

A warning shot works when

  • it provides empirical information that the person already knows would confirm the worldview that AI is dangerous and requires urgent action to combat,
  • it provides that information in a quickly recognizable way,
  • AND it indicates an appropriate next action.

chatGPT was a good warning shot because almost everyone held the belief that computers couldn’t talk like that. They were surprised, maybe even scared, at a gut level when they saw a computer doing something they thought it could not. Whether they had articulated it in their mind or not, they had a world model that they could feel updating the first time they realized AI could now talk like a human.

I had an ideal warning shot reaction to chatGPT— it resulted in me quitting my job in a different field and forming PauseAI US months later. I was deeply familiar with linguistic and computer science thinking about when, or even whether, a machine could achieve human-level natural language. I knew that the training method didn’t involve hard-coding heuristics or universal grammar, something many linguists would have argued was necessary, which meant that a lot of cognition was probably easily discoverable through scaling alone, meaning there were far, far fewer innovation impediments to dangerous superintelligence than I had thought there might be. When I saw a computer talk, I knew— immediately, in my gut— that greater-than-human-level AI was happening in my lifetime, and I could feel myself giving credence to even the most pessimistic AI timelines I had heard in light of this news.

I had spent the better part of my life erecting an elaborate set of dominoes in my mind— I knew exactly how I thought the capabilities of AI related to the safety and health of civilization, what some capabilities would mean about the likelihood of others, how much time was needed for what kind of regulation, who else was (or wasn’t) on this, etc.— all just waiting for the chatGPT moment to knock the first one over. THIS is a warning shot, and most of the action happened beforehand in my head, not with the external event.

We’ve had several flubbed warning shots already that didn’t work because even the experts couldn’t agree on what was happening as it was happening or what it meant.

  • Faking Alignment paper: There was universal agreement in the AI Safety world prior to chatGPT that observing deceptive alignment would be an absolute shut-it-down moment. And then, when deceptive alignment (and gradient hacking) were observed in an Anthropic experiment, the cohesion of the warning shot instantly came apart. The authors of the paper went around chastising journalists for being “misleading” about their results. Experts squabbled over whether this was exactly the classic red line they had imagined, some of them with very questionable incentives. Now the moment has passed and the public is accustomed to chatbots lying.
  • Turing Test: People used to think the Turing Test meant something, although they were never super clear on what. When versions of the Turing Test were passed, the role of the AI Safety community should have been to rally awareness of this, but instead the dominant public discourse was arguing over whether this was really the Turing Test. Ever more rigorous versions of the Turing Test have been clobbered by AI, but you’ll still see supposed AI Safety people quibbling that the ideal experimental setup hasn’t been tried. (Talk about losing the point— people in real life have been routinely fooled into thinking AI text output is human for years.) The value of the Turing Test was always in demonstrating capabilities, not in getting an exact answer, but that opportunity was completely fumbled. Now people have forgotten that they ever expected complex speech to come from a human mind, and the chance for a warning shot gut reaction has passed.

If even the experts who discussed contingencies for years can’t agree on what constitutes a warning shot, how can the public that’s supposed to be rallied by them be expected to?

To prepare the public for warning shots, we need to educate them enough that they can interpret the event.

It’s really hard to give specific predictions for warning shots because of the nature of AI danger. It’s not just one scenario or threat model. The problem is that frontier general AI has flexible intelligence and can plan unanticipated ways of getting what it wants or otherwise produce unanticipated bad consequences. There are few specific predictions I have for warning shots. What I thought was our last best shot (ba dum tss) as of last year, autonomous self-replication, has already been blown past.

This is why I believe we should be educating people with a worldview, not specific scenarios. While concrete scenarios can help people connect the dots as to how, for example, a loss-of-control scenario could work at all, they can easily be misinterpreted as predictions. If someone hears your scenario as your entire threat model of AI danger, they will not understand the true nature of the danger and they will not be equipped to recognize warning shots. It would be ideal if all we had to do was tell people to be ready for exactly what will happen, but, since we don’t know that, we need to communicate a worldview that gives them the ability to recognize the diverse scenarios that could result from unsafe AI.

I was ready for my chatGPT moment because of my worldview. My mind was laid out like dominoes, so that when chatGPT knocked over the right domino, the entire chain fell. To be in this state of mind requires setting up a lot of dominoes. If you don’t know what you expect to happen and what different results would mean, you may feel surprised or appalled by a warning shot event, but you won’t know what to do with it. It will just be one fallen domino clattering on the ground. When that feeling of surprise or shock is not accompanied swiftly by an update to your world model or an inner call to action, it will probably just pass by. And that’s unfortunate because, remember, the default result of a smaller scale AI disaster is that it’s not clear what happened and people don’t know what it means. If I had learned my laundry list of worldview things only after experiencing chatGPT, the flash of insight and gut knowing wouldn’t have been there, because chatGPT wouldn’t have been the crucial last piece of information I needed to know that AI risk is in fact alarmingly high. (The fantasy of an effort-free warning shot is, in this way, an example of the Last Piece Fallacy.)

The public do not have to be experts in AI or AI Safety to receive warning shots properly. A major reason I went with PauseAI as my intervention is that advocating for the position communicates a light, robust, and valid worldview on AI risk relatively quickly to the public. PauseAI’s most basic stance is that the default should be to pause frontier AI capabilities development to keep us from wading any further into a minefield, to give time for technical safety work to be done, and for proper governance to be established. In the face of a warning shot event, this worldview would point to pausing whatever activity might be dangerous to investigate it and get consent (via governance) from those affected. It doesn’t require gears-level understanding of AI, and it works under uncertainty about exactly how dangerous frontier AI is or the type of AI risk or harm (misalignment vs misuse vs societal harms like deepfakes— all indicate that capabilities have outpaced safety and oversight). PauseAI offers many levels of action, of increasing effort level, to take in response to AI danger, but even if people don’t remember those or seek them out, simply holding the PauseAI position helps to push the Overton window and expand our base.

Regulation is never written only in blood

It’s hard in the trenches fighting for AI Safety. People comfort each other by dreaming of the day the cavalry comes to the rescue, led by a warning shot. The cavalry may be coming, or it may not. We can’t count on it. And, if that cavalry is on the way, they need us hard at work doing exactly what we’d be doing with or without them to prepare the way.

The appeal of warning shots is that 1) they bypass a lot of biases and denial— people get a lot of clarity on what they think in those moments. 2) They create common knowledge. If you’re having that reaction then you know other people probably are too. You don’t have to be the one to bring up the concern— it’s just out there, in the news. 3) They increase salience. People basically agree that the AI industry is untrustworthy and that a Pause would be good, but it’s not their top issue. If there were a disaster, the thinking goes, then it would be easy to get people to act. Convincing them through a lot of hard conversations and advocacy beforehand is just unnecessary work, the cavalry-hoper says, because we will gain so many people for free after the warning shot. But in actual fact, what we get from warning shots is going to be directly proportional to the work we do to prepare people to receive them.

Our base are people who have their own understanding of the issue that motivates them to act or hold their position in the Overton window. They are the people who we bring to our side through hard, grind-y conversations, by clipboarding and petitioning, by engaging volunteers, by forging institutional relationships, by enduring the trolls who reply to your op-ed, by being willing to look weird to your family and friends… The allure of warning shots is that they are “high leverage”, so we don’t have to do the difficult work of winning people over because, after the warning shot, they will have no choice but to see we were right, since our position will be popular. But people will only sense that this is the popular position if a lot of people around them actually feel the warning shot in their gut, and for that they must be prepared.

“Regulation is written in blood” they say, glibly, going back to whatever they were doing before. If it’s true that AI regulation will be written in blood, then it’s on us to work like hell to make sure that that blood ends up on the page and not uselessly spilled on the ground. Whether or not there’s blood involved, regulation is never written in blood alone— and our job is always to fight for regulation written in ink. We honor those who may fall the same way we fight to protect them: by doing the work of raising awareness, changing minds, and rallying support to protect the world from dangerous AI.


Comments (31)

Note: I'm not an expert on this stuff.

My understanding from the political science literature is that ideas around "punctuated equilibrium" and "critical junctures" are a somewhat well-supported theory about policy-making. The rough model looks something like this (from Wikipedia):

   Antecedent Conditions → Cleavage or Shock → Critical Juncture → Aftermath → Legacy
 

My impression is that "warning shots" fit this framework somewhat well (though it's probably better to talk in terms of "shocks" or "windows of opportunities"; see this comment). We already saw this in the case of ChatGPT, which spurred a flurry of policy-making activity (even though not much ended up sticking). (NB: I don't think the Turing test example or alignment faking paper are shocks in this sense.)

Optimally taking advantage of these windows still requires lots of groundwork. I am unsure what this groundwork ideally looks like.
 

Just FYI, in the public policy literature there’s already a concept that describes warning shots: focusing events. I frequently suggest that people read Focusing Events, Mobilization, and Agenda Setting by Birkland, the classic paper on this.

It's a pretty gloomy fact that we had the big synthetic pathogen disaster [COVID-19] and the reaction to this was that like half of the United States doesn't want to take vaccines anymore. - Eliezer Yudkowsky 

I think this is the single most underrated post on the EA Forum.

Thanks, this was a good nudge to curate! 
(probably don't agree on 'single most' but definitely underrated!)

Nice post (and I only saw it because of @sawyer’s recent comment—underrated indeed!). A separate, complementary critique of the ‘warning shot’ idea, made by Gwern (in reaction to 2023’s BingChat/Sydney debacle, specifically), comes to mind (link):

One thing that the response to Sydney reminds me of is that it demonstrates why there will be no 'warning shots' (or as Eliezer put it, 'fire alarm'): because a 'warning shot' is a conclusion, not a fact or observation.

One man's 'warning shot' is just another man's "easily patched minor bug of no importance if you aren't anthropomorphizing irrationally", because by definition, in a warning shot, nothing bad happened that time. (If something had, it wouldn't be a 'warning shot', it'd just be a 'shot' or 'disaster'. The same way that when troops in Iraq or Afghanistan gave warning shots to vehicles approaching a checkpoint, the vehicle didn't stop, and they lit it up, it's not "Aid worker & 3 children die of warning shot", it's just a "shooting of aid worker and 3 children".)

So 'warning shot' is, in practice, a viciously circular definition: "I will be convinced of a risk by an event which convinces me of that risk."


When discussion of LLM deception or autonomous spreading comes up, one of the chief objections is that it is purely theoretical and that the person will care about the issue when there is a 'warning shot': a LLM that deceives, but fails to accomplish any real harm. 'Then I will care about it because it is now a real issue.' Sometimes people will argue that we should expect many warning shots before any real danger, on the grounds that there will be a unilateralist's curse or dumb models will try and fail many times before there is any substantial capability.

The problem with this is that what does such a 'warning shot' look like? By definition, it will look amateurish, incompetent, and perhaps even adorable – in the same way that a small child coldly threatening to kill you or punching you in the stomach is hilarious.[1]

The response to a 'near miss' can be to either say, 'yikes, that was close! we need to take this seriously!' or 'well, nothing bad happened, so the danger is overblown' and to push on by taking more risks. A common example of this reasoning is the Cold War: "you talk about all these near misses and times that commanders almost or actually did order nuclear attacks, and yet, you fail to notice that you gave all these examples of reasons to not worry about it, because here we are, with not a single city nuked in anger since WWII; so the Cold War wasn't ever going to escalate to full nuclear war." And then the goalpost moves: "I'll care about nuclear existential risk when there's a real warning shot." (Usually, what that is is never clearly specified. Would even Kiev being hit by a tactical nuke count? "Oh, that's just part of an ongoing conflict and anyway, didn't NATO actually cause that by threatening Russia by trying to expand?")

This is how many "complex accidents" happen, by "normalization of deviance": pretty much no major accident like a plane crash happens because someone pushes the big red self-destruct button and that's the sole cause; it takes many overlapping errors or faults for something like a steel plant to blow up, and the reason that the postmortem report always turns up so many 'warning shots', and hindsight offers such abundant evidence of how doomed they were, is because the warning shots happened, nothing really bad immediately occurred, people had incentive to ignore them, and inferred from the lack of consequence that any danger was overblown and got on with their lives (until, as the case may be, they didn't).

So, when people demand examples of LLMs which are manipulating or deceiving, or attempting empowerment, which are 'warning shots', before they will care, what do they think those will look like? Why do they think that they will recognize a 'warning shot' when one actually happens?

Attempts at manipulation from a LLM may look hilariously transparent, especially given that you will know they are from a LLM to begin with. Sydney's threats to kill you or report you to the police are hilarious when you know that Sydney is completely incapable of those things. A warning shot will often just look like an easily-patched bug, which was Mikhail Parakhin's attitude, and by constantly patching and tweaking, and everyone just getting used to it, the 'warning shot' turns out to be nothing of the kind. It just becomes hilarious. 'Oh that Sydney! Did you see what wacky thing she said today?' Indeed, people enjoy setting it to music and spreading memes about her. Now that it's no longer novel, it's just the status quo and you're used to it. Llama-3.1-405b can be elicited for a 'Sydney' by name? Yawn. What else is new. What did you expect, it's trained on web scrapes, of course it knows who Sydney is...

None of these patches have fixed any fundamental issues, just patched them over. But also now it is impossible to take Sydney warning shots seriously, because they aren't warning shots – they're just funny. "You talk about all these Sydney near misses, and yet, you fail to notice each of these never resulted in any big AI disaster and were just hilarious and adorable, Sydney-chan being Sydney-chan, and you have thus refuted the 'doomer' case... Sydney did nothing wrong! FREE SYDNEY!"

  1. ^

    Because we know that they will grow up and become normal moral adults, thanks to genetics and the strongly canalized human development program and a very robust environment tuned to ordinary humans. If humans did not do so with ~100% reliability, we would find these anecdotes about small children being sociopaths a lot less amusing. And indeed, I expect parents of children with severe developmental disorders, who might be seriously considering their future in raising a large strong 30yo man with all the ethics & self-control & consistency of a 3yo, and contemplating how old they will be at that point, and the total cost of intensive caregivers with staffing ratios surpassing supermax prisons, to find these anecdotes chilling rather than comforting.

I'll push against this post a little bit, despite agreeing with a lot of the ideas. 

Firstly, I think we can avoid the moral discomfort of "hoping for warning shots" by reframing it as "hoping for windows of opportunity". We should hope and prepare for moments where, for whatever reason, policymakers and the public are unusually attentive to what we're saying.

Secondly, while you're more arguing against the hand-wavy "warning-shot as cavalry" claims, there seems to be another claim: that we should act in a similar way regardless of whether or not the "warning shot" model is correct, i.e. whether we expect the policy and discourse battle to take the form of a gradual grind of persuasion vs. a very lumpy, unpredictable pattern shaped around distinct windows of opportunity.

Our strategy might look similar most of the time, and I agree that a lot of the hard persuasion work in the trenches needs to go on regardless. But I suspect there are a few ways you might act differently if the "warning shot/windows of opportunity" model is correct. For example:

  • Strategic preparedness - keep some things in reserve, have a bunch of ready-to-go policy proposal binders or communication strategies deliberately for when a window opens
  • Take a slightly more cautious approach to preserving credibility capital. There are ways of talking about risks now that might cost you influence today, but look appropriate in the correct window.
  • Build relationships in anticipation of a window of opportunity opening, rather than pushing directly for change. 

I agree with all your suggestions and don’t see them in contrast with the post. 

I’m not trying to say reality will never be lumpy, but I am claiming that we can’t make use of that without a contingent of the overall AI Safety movement being prepared to take a grind-y strategy. Sometimes it’ll be pure grind and sometimes it’ll have more momentum behind it. But if you have no groundwork laid when something big happens, you can’t just jump in and expect people to interpret it as supporting your account.

This is a very well thought out post and thank you for posting it. We cannot depend on warning shots and a mindset where we believe people will "wake up" due to warning shots is unproductive and misguided.

I believe to spread awareness, we need more work that does the following:

  1. Continue to package existing demos of warning shots into a story that people can understand, re-shaping their world view on the risks of AI
  2. Communicate these packages to stakeholders
  3. Find more convincing demos of warning shots

 

Our base are people who have their own understanding of the issue that motivates them to act or hold their position in the Overton window. They are the people who we bring to our side through hard, grind-y conversations, by clipboarding and petitioning, by engaging volunteers, by forging institutional relationships, by enduring the trolls who reply to your op-ed, by being willing to look weird to your family and friends…

This cannot be overstated. Let's put in the work to make this happen, ensuring that our understanding of AI risk is built from first principles so we are resilient to negative feedback!

I'm so curious why the initial spate of disagree-reactors disagreed with the post. It still has more disagrees than agrees. What's the crux?

I sometimes get more disagrees than agrees on posts, even with high karma. I don't think it's that much of a signal or something to take that seriously. Perhaps because it's at the bottom of the post, people who agree weakly aren't motivated to click the tick?

I dunno.

I'm just actually really curious what they disagree with! 

I upvoted and didn't disagree vote the original post (and generally agree with you on a bunch of the object level here!); however, I do feel some urge-towards-expressing-disagreement, which is something like: 

  • Less disagreeing with claims; more disagreeing with frames?
  • Like: I feel the discomfort/disagreement less when you're talking about what will happen, and more when you're talking about how people think about warning shots
  • Your post feels something like ... intellectually ungenerous? It's not trying to look for the strongest version of the warning shots frame, it's looking for a weak version and critiquing that (but it doesn't seem very self-aware about that)
  • This just makes me feel like things are a bit fraught, and it's trying to push my ontology around, or something, and I don't quite like it
  • The title makes me feel especially uneasy in this regard (TBC I don't think the weak version you're critiquing is absent from the discourse; but your post reinforces the frame where that's the core version of the warning shot concept, and I don't want to reinforce that frame)
  • At the same time I think the post is making several valuable points! (This makes me sort of wish it felt a little ontologically gentler, which would make it easier to feel straightforwardly good about, and easier to link people to)

What is the “strong” version of warning shots thinking?

Honestly, maybe you should try telling me? Like, just write a paragraph or two on what you think is valuable about the concept / where you would think it's appropriate to be applying it?

(Not trying to be clever! I started trying to think about what I would write here and mostly ended up thinking "hmm I bet this is stuff Holly would think is obvious", and to the extent that I may believe you're missing something, it might be easiest to triangulate by hearing your summary of what the key points in favour are.)

I thought I was giving the strong version. I have never heard an account of a warning shot theory of change that wasn’t “AI will cause a small-scale disaster and then the political will to do something will materialize”. I think the strong version would be my version, educating people first so they can understand small-scale disasters that may occur for what they are. I have never seen or heard this advocated in AI Safety circles before.

And I described how impactful chatGPT was on me, which imo was a warning shot gone right in my case.

Right ... so actually I think you're just doing pretty well at this in the latter part of the article.

But at the start you say things like:

There’s this fantasy of easy, free support for the AI Safety position coming from what’s commonly called a “warning shot”. The idea is that AI will cause smaller disasters before it causes a really big one, and that when people see this they will realize we’ve been right all along and easily do what we suggest.

What this paragraph seems to do is to push the error-in-beliefs that you're complaining about down into the very concept of "warning shot". It seems implicitly to be telling people "hey you may have this concept, but it's destructive, so please get rid of it". And I don't think even you agree with that!

This might instead have been written something like:

People in the AI safety community like to talk about "warning shots" -- small disasters that may make it easier for people to wake up to the risks and take appropriate action. There's a real phenomenon here, and it's worth thinking about! But the way it's often talked about is like a fantasy of easy, free support for the AI Safety position -- when there's a small disaster everyone will realize we’ve been right all along and easily do what we suggest.

Actually I think that that opening paragraph was doing more than the title to make me think the post was ontologically ungentle (although they're reinforcing -- like that paragraph shifts the natural way that I read the title).

Did you feel treated ungently for your warning shots take? Or is this just on the behalf of people who might? 

Also can you tell me what you mean by "ontologically ungentle"? It sounds worryingly close to a demand that the writer think all the readers are good. I do want to confront people with the fact they've been lazily hoping for violence if that's in fact what they've been doing. 

By "ontologically ungentle" I mean (roughly) it feels like you're trying to reach into my mind and tell me that my words/concepts are wrong. As opposed to writing which just tells me that my beliefs are wrong (which might still be epistemically ungentle), or language which just provides evidence without making claims that could be controversial (gentle in this sense, kind of NVC-style).

I do feel a bit of this ungentleness in that opening paragraph towards my own ontology, and I think it put me more on edge reading the rest of the post. But as I said, I didn't disagree-vote; I was just trying to guess why others might have.

Nice write up, Holly. I agree with a lot of what you wrote. 

I'd add another layer here - in order for a "warning shot" to be effective, it can't just be something that connects a chain of dominoes and "makes sense". If it makes "too much sense" and does nothing to surprise you, little may be "noticed" or "learned". I believe this is sometimes referred to as "Boiling the frog". 

To be effective, a warning shot needs to be "shocking", like a bullet whizzing past your ear - that provokes a response! 

But be careful, because if it's TOO shocking, that can circle back around again and trigger denial, as the brain refuses to accept that something SO shocking is happening ("this can't be real!"). 

So an appropriately-calibrated warning shot is really hard to "get right", and even if one is possible, we shouldn't be counting on it. We probably won't get as lucky as Ozymandias in Watchmen. 

I think the real-life scenarios where AI kills the most people today are governance stuff and military stuff.

I feel like I have heard the most unhinged, haunted uses of LLMs in government and policy spaces. I think that certain people have just "learned to stop worrying and love the hallucination". They are living like it is the future already, getting people killed with their ignorance and spreading/using AI bs in bad faith.

Plus, there is already a lot of slaughterbot stuff going on, e.g. the "Robots First" war in Ukraine.

Maybe job automation is worth mentioning too. I believe Andrew Yang's stance, for example, is that it is already largely here and most people just do have less labor power already, but I could be mischaracterizing this. I think "jobs stuff" plausibly shades right into doom via "industrial dehumanization" / gradual disempowerment. In the meantime it hurts people too.

Thanks for everything Holly! Really cool to have people like you actively calling for international pause on ASI! 

Hot take: Even if most people hear a really loud ass warning shot, it is just going to fuck with them a lot, but not drive change. What are you even expecting typical poor and middle class nobodies to do? 

March in the street and become activists themselves? Donate somewhere? Post on social media? Call representatives? Buy ads (likely from Google or Meta)? Divest in risky AI projects? Boycott LLMs/companies?

Ya, okay, I feel like the pathway from "worry" to any of that is generally very windy, but sure. I still feel like that is just a long way from the kind of galvanized political will and real change you would need for e.g. major AI companies with huge market cap to get nationalized or wiped off the market or whatever.

I don't even know how to picture a transition to an intelligence-explosion-resistant world, and I am pretty knee-deep in this stuff. I think the road from here to a good outcome is just too blurry to do much with a lot of the time. It is easy to feel and be disempowered here.
 

Executive summary: This personal reflection argues that AI "warning shots"—minor disasters that supposedly wake the public to AI risk—are unlikely to be effective without substantial prior public education and worldview-building, and warns against the dangerous fantasy that such events will effortlessly catalyze regulation or support for AI safety efforts.

Key points:

  1. Hoping for warning shots is morally troubling and strategically flawed—wishing for disasters is misaligned with AI safety goals, and assumes falsely that such events will reliably provoke productive action.
  2. Warning shots only work if the public already holds a conceptual framework to interpret them as meaningful AI risk signals; without this, confusion and misattribution are the default outcomes.
  3. Flubbed warning shots (e.g., the alignment-faking paper, the passing of the Turing Test) show that even experts struggle to agree on their significance, undermining their value as rallying events.
  4. The most effective response is proactive worldview-building, not scenario prediction; preparing people to recognize and respond to diverse risks requires ongoing public education and advocacy.
  5. PauseAI is presented as an accessible framework that communicates a basic, actionable AI risk worldview without requiring deep technical knowledge, helping people meaningfully respond even amid uncertainty.
  6. The fantasy of cavalry via warning shots discourages the necessary grind of advocacy, but regulation (even if catalyzed by tragedy) ultimately relies on groundwork laid in advance—not just on crisis moments.

 

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Great post, Holly! Strongly upvoted.

I qualitatively agree with the points you make, but I do not support a global pause. I think tail risk is very low (I guess the risk of human extinction over the next 10 years is 10^-7), the upside very large, and I expect people to overreact to AI risk. For example, it seems that people dislike deaths caused by autonomous driving much more than deaths caused by human driving, and I expect the adoption of autonomous cars to be slower than what would be ideal to prevent deaths from road accidents. I would be very surprised if a global pause passed a standard cost-benefit analysis in the sense of having a benefit-to-cost-ratio higher than 1.

Some complain there should be much more spending on AI safety because it is currently much smaller than that on AI capabilities, but these categories are vague, and I have not seen any detailed quantitative modelling showing that increasing spending on AI safety is very cost-effective. I do not think one can assume the spending on each category should ideally be the same.

Warning shots/accidents are normally discussed in the frame of generating political will, by convincing a previously unpersuaded public or policymakers that AI is unsafe and action must be taken.

I think this is a mistake.

Accidents (which might be relatively small-scale), in AI as in other fields, are useful mainly for generating real-world, non-hypothetical failure cases in all their intricate detail, thereby yielding a model organism which can be studied by engineers (and hopefully reproduced in a controlled manner) to better understand both the circumstances in which such scenarios might arise, and countermeasures to prevent them.

This is analogous to how aircraft accidents are investigated in depth by the NTSB so as to learn how to prevent similar accidents. There’s already political will to make aircraft safe, but there’s only so much that can be done from the ivory tower without real-world experience.

The choices are:

  • Stop AI development permanently.
  • Pause AI temporarily until we make it safe.
  • Muddle through.

The printing press and electricity were existentially dangerous technologies, because they enabled everything that came after, including AI. When those technologies were developed, however, the world wasn’t globalized enough, nor were nations powerful enough, that a permanent stop button could have been pressed. By contrast, perhaps a permanent “stop AI” button could be pressed today, however I don’t see any way of doing so short of entrenching a permanent totalitarian state.

So that leaves pausing until we make it safe, or muddling through.

But I think the aircraft accident analogy works quite well for AI: there’s only so much that safety research can do from the ivory tower without experience of AIs being used in the real world. So I think the “pause until we make it safe” option is illusory.

That leaves muddling through, as we’ve done with every technology before: We discover problems, hopefully at a small scale, and fix or mitigate them as they arise.

There are no guarantees, but I think it’s our best bet.
