Regulation cannot be written in blood alone.
There’s this fantasy of easy, free support for the AI Safety position coming from what’s commonly called a “warning shot”. The idea is that AI will cause smaller disasters before it causes a really big one, and that when people see this they will realize we’ve been right all along and easily do what we suggest. I can’t count how many times someone (ostensibly from my own side) has said something to me like “we just have to hope for warning shots”. It’s the AI Safety version of “regulation is written in blood”. But that’s not how it works.
Here’s what I think about the myth that warning shots will come to save the day:
1) Awful. I will never hope for a disaster. That’s what I’m trying to prevent. Hoping for disasters to make our job easier is callous, and dwelling on the silver lining of failing in our mission takes us off track.
2) A disaster does not automatically a warning shot make. People have to already hold a world model that tells them what such an event would signify in order to experience it as a warning shot that kicks them into gear.
3) The way to make warning shots effective if (God forbid) they happen is to work hard at convincing others of the risk and what to do about it based on the evidence we already have— the very thing we should be doing in the absence of warning shots.
If these smaller-scale disasters happen, they will only serve as warning shots if, before they happen, we put a lot of work into educating the public to understand what they would mean.
The default outcome of a “warning shot” event is confusion, misattribution, or normalization of the tragedy.
Let’s imagine what one of these macabrely hoped-for “warning shot” scenarios feels like from the inside. Say one of the commonly proposed warning shot scenarios occurs: a misaligned AI causes several thousand deaths. Say the deaths are of ICU patients because the AI in charge of their machines decides that costs and suffering would be minimized if they were dead.
First, we would hear news stories from one hospital at a time. Different hospitals may attribute the deaths to different causes. Indeed, the AI may have used many different means, different leads will surface first at different crime scenes, and different lawyers may reach for different explanations to cover their clients’ butts. The first explanation that reaches the national news might be “power outage”. Another hospital may report, based on an IT worker’s best guess, a “bad firmware update”. A spate of news stories may follow about our nation’s ailing health tech infrastructure. The family of one of the deceased may begin to publicly question whether this was the result of sabotage, or murder, because of their relative’s important knowledge or position.
By the time the true scale of the incident becomes apparent, many people will already have settled in their minds that it was an accident, or equipment failure, or that most of the victims were collateral damage from a hit. By the time an investigation uncovers the true cause, much of the public will have moved on from the story. The true cause may be suppressed by a powerful AI industry, or it may simply be disbelieved by a skeptical public. Even if the true story is widely accepted, there will not necessarily be a consensus on the implications. Some people may— for whatever contrarian or motivated reason— be willing to bite the bullet and question whether those patients in the ICU should have been kept alive. A national discussion on end-of-life care may ensue. The next time audiences hear of an event like this that is possibly caused by AI, they will worry less, because they’ve heard that before.
Not so rousing, huh? Just kind of a confused mess. This is the default scenario for candidate “warning shot” events. In short, the deaths some misguided people are praying for are likely to be for nothing— the blood spilled without any getting into the fountain pen— because the connection between the harm and the AI danger that caused it is not immediately clear.
What makes an effective warning shot
To work, a true warning shot must provide a jolt to the gut, an immediate and clear inner knowing triggered by the pivotal event. I felt this when I first saw chatGPT hold a conversation.
The Elmore Model of Effective Warning Shots:
A warning shot works when
- it provides empirical information that the person already knows would confirm the worldview that AI is dangerous and requires urgent action to combat it,
- it provides that information in a quickly recognizable way,
- AND it indicates an appropriate next action.
chatGPT was a good warning shot because almost everyone knew they believed that computers couldn’t talk like that. They were surprised, maybe even scared, at a gut level when they saw a computer doing something they thought it could not. Whether they had articulated it in their minds or not, they had a world model that they could feel updating the first time they realized AI could now talk like a human.
I had an ideal warning shot reaction to chatGPT— it resulted in me quitting my job in a different field and forming PauseAI US months later. I was deeply familiar with thinking in linguistics and computer science about when, or even whether, a machine could achieve human-level natural language. I knew that the training method didn’t involve hard-coding heuristics or universal grammar, which many linguists would have argued was necessary, so a lot of cognition was probably easily discoverable through scaling alone, meaning there were far, far fewer innovation impediments to dangerous superintelligence than I had thought there might be. When I saw a computer talk, I knew— immediately, in my gut— that greater-than-human-level AI was happening in my lifetime, and I could feel myself giving credence to even the most pessimistic AI timelines I had heard in light of this news.
I had spent the better part of my life erecting an elaborate set of dominoes in my mind— I knew exactly how I thought the capabilities of AI related to the safety and health of civilization, what some capabilities would mean about the likelihood of others, how much time was needed for what kind of regulation, who else was (or wasn’t) on this, etc.— all just waiting for the chatGPT moment to knock the first one over. THIS is a warning shot, and most of the action happened beforehand in my head, not with the external event.
We’ve had several flubbed warning shots already that didn’t work because even the experts couldn’t agree on what was happening as it was happening, or on what it meant.
- Faking Alignment paper: There was universal agreement in the AI Safety world prior to chatGPT that observing deceptive alignment would be an absolute shut-it-down moment. And then, when deceptive alignment (and gradient hacking) were observed in an Anthropic experiment, the cohesion of the warning shot instantly came apart. The authors of the paper went around chastising journalists for being “misleading” about their results. Experts squabbled over whether this was exactly the classic red line they had imagined, some of them with very questionable incentives. Now the moment has passed and the public is accustomed to chatbots lying.
- Turing Test: People used to think the Turing Test meant something, although they were never super clear on what. When versions of the Turing Test were passed, the role of the AI Safety community should have been to rally awareness around this, but instead the dominant public discourse was arguing over whether this was really the Turing Test. Ever more rigorous versions of the Turing Test have been clobbered by AI, but you’ll still see supposed AI Safety people quibbling that the ideal experimental setup hasn’t been tried. (Talk about losing the point— people in real life have been routinely fooled into thinking AI text output is human for years.) The value of the Turing Test was always in demonstrating capabilities, not in getting an exact answer, but that opportunity was completely fumbled. Now people have forgotten they ever expected complex speech to come from a human mind, and the chance for a warning shot gut reaction has passed.
If even the experts who discussed contingencies for years can’t agree on what constitutes a warning shot, how can the public that’s supposed to be rallied by them be expected to?
To be prepared for warning shots, the public needs to be educated enough to interpret the event.
It’s really hard to give specific predictions for warning shots because of the nature of AI danger. It’s not just one scenario or threat model. The problem is that frontier general AI has flexible intelligence and can plan unanticipated ways of getting what it wants or otherwise produce unanticipated bad consequences. There are few specific predictions I have for warning shots. What I thought was our last best shot (ba dum tss) as of last year, autonomous self-replication, has already been blown past.
This is why I believe we should be educating people with a worldview, not specific scenarios. While concrete scenarios can help people connect the dots as to how, for example, a loss-of-control scenario could work at all, they can easily be misinterpreted as predictions. If someone hears your scenario as your entire threat model of AI danger, they will not understand the true nature of the danger and they will not be equipped to recognize warning shots. It would be ideal if all we had to do was tell people to be ready for exactly what will happen, but, since we don’t know that, we need to communicate a worldview that gives them the ability to recognize the diverse scenarios that could result from unsafe AI.
I was ready for my chatGPT moment because of my worldview. My mind was laid out like dominoes, so that when chatGPT knocked over the right domino, the entire chain fell. To be in this state of mind requires setting up a lot of dominoes. If you don’t know what you expect to happen and what different results would mean, you may feel surprised or appalled by a warning shot event, but you won’t know what to do with it. It will just be one fallen domino clattering on the ground. When that feeling of surprise or shock is not accompanied swiftly by an update to your world model or an inner call to action, it will probably just pass by. And that’s unfortunate because, remember, the default result of a smaller-scale AI disaster is that it’s not clear what happened and people don’t know what it means. If I had learned my laundry list of worldview things only after experiencing chatGPT, the flash of insight and gut knowing wouldn’t have been there, because chatGPT wouldn’t have been the crucial last piece of information I needed to know that AI risk is in fact alarmingly high. (The fantasy of the effort-free warning shot is, in this way, an example of the Last Piece Fallacy.)
The public do not have to be experts in AI or AI Safety to receive warning shots properly. A major reason I went with PauseAI as my intervention is that advocating for the position communicates a light, robust, and valid worldview on AI risk relatively quickly to the public. PauseAI’s most basic stance is that the default should be to pause frontier AI capabilities development to keep us from wading any further into a minefield, to give time for technical safety work to be done, and for proper governance to be established. In the face of a warning shot event, this worldview would point to pausing whatever activity might be dangerous to investigate it and get consent (via governance) from those affected. It doesn’t require gears-level understanding of AI, and it works under uncertainty about exactly how dangerous frontier AI is or the type of AI risk or harm (misalignment vs misuse vs societal harms like deepfakes— all indicate that capabilities have outpaced safety and oversight). PauseAI offers many levels of action, at increasing levels of effort, to take in response to AI danger, but even if people don’t remember those or seek them out, simply holding the PauseAI position helps to push the Overton window and expand our base.
Regulation is never written only in blood
It’s hard in the trenches fighting for AI Safety. People comfort each other by dreaming of the day the cavalry comes to the rescue, led by a warning shot. The cavalry may be coming, or it may not. We can’t count on it. And, if the cavalry is on the way, they need us hard at work doing exactly what we’d be doing with or without them to prepare the way.
The appeal of warning shots is threefold:
1) They bypass a lot of biases and denial— people get a lot of clarity on what they think in those moments.
2) They create common knowledge. If you’re having that reaction, then you know other people probably are too. You don’t have to be the one to bring up the concern— it’s just out there, in the news.
3) They increase salience. People basically agree that the AI industry is untrustworthy and a Pause would be good, but it’s not their top issue. If there were a disaster, the thinking goes, then it would be easy to get people to act.
Convincing them beforehand through a lot of hard conversations and advocacy is just unnecessary work, the cavalry-hoper says, because we will gain so many people for free after the warning shot. But in fact, what we get from warning shots is going to be directly proportional to the work we do to prepare people to receive them.
Our base is made up of people who have their own understanding of the issue, one that motivates them to act or hold their position in the Overton window. They are the people we bring to our side through hard, grind-y conversations, by clipboarding and petitioning, by engaging volunteers, by forging institutional relationships, by enduring the trolls who reply to our op-eds, by being willing to look weird to our family and friends… The allure of warning shots is that they are “high leverage”: we wouldn’t have to do the difficult work of winning people over because, after the warning shot, people would have no choice but to see we were right and our position would be popular. But people will only sense that this is the popular position if a lot of people around them actually feel the warning shot in their gut, and for that they must be prepared.
“Regulation is written in blood” they say, glibly, going back to whatever they were doing before. If it’s true that AI regulation will be written in blood, then it’s on us to work like hell to make sure that that blood ends up on the page and not uselessly spilled on the ground. Whether or not there’s blood involved, regulation is never written in blood alone— and our job is always to fight for regulation written in ink. We honor those who may fall the same way we fight to protect them: by doing the work of raising awareness, changing minds, and rallying support to protect the world from dangerous AI.
Great post, Holly! Strongly upvoted.
I qualitatively agree with the points you make, but I do not support a global pause. I think the tail risk is very low (I guess the risk of human extinction over the next 10 years is 10^-7), the upside very large, and I expect people to overreact to AI risk. For example, it seems that people dislike deaths caused by autonomous driving much more than deaths caused by human driving, and I expect the adoption of autonomous cars to be slower than would be ideal for preventing deaths from road accidents. I would be very surprised if a global pause passed a standard cost-benefit analysis, in the sense of having a benefit-to-cost ratio higher than 1.
Some complain there should be much more spending on AI safety because it is currently much smaller than that on AI capabilities, but these categories are vague, and I have not seen any detailed quantitative modelling showing that increasing spending on AI safety is very cost-effective. I do not think one can assume the spending on each category should ideally be the same.