Epistemic status: some thoughts I wanted to get out quickly
A lot of fantastic work has been done by people in the AI existential risk research community and related communities over the last several months in raising awareness about risks from advanced AI. However, I have some cause for unease that I’d like to share.
These efforts may have been too successful too soon.
Or, more specifically: achieving this level of outreach success this far ahead of the development of AI capable of posing existential risk may have fallout, and we should consider steps to mitigate it.
(1) Timelines
I know that there are well-informed people in the AI and existential risk communities who believe AI capable of posing existential risk may be developed within 10 years. I certainly can’t rule this out, and given the possible consequences, even a small chance of this is worth working to prevent or mitigate to the extent possible. My own timelines are longer, although my intuitions don’t have a rigorous model underpinning them (they are roughly in line with the 15-40 year timelines mentioned in this recent blog post by Matthew Barnett from Epoch).
Right now, the nature of media communications means that the message is coming across with a lot of urgency. From speaking to lay colleagues, my impression is that many have come away with short timelines (and some figures, e.g. Geoff Hinton, have explicitly said 5-20 years, sometimes with uncertainty caveats and sometimes without).
It may be that those with short (<10 years) timelines are right. And even if they’re not, and we’ve got decades before this technology poses an existential threat, many of the attendant challenges – alignment, governance, distribution of benefits – will need that additional time to be addressed. And I think it’s entirely plausible that the current level of buy-in will be needed in order to initiate the steps needed to avoid the worst outcomes, e.g. recruiting expertise and resources to alignment, development and commitment to robust regulation, even coming to agreements not to pursue certain technological developments beyond a certain point.
However, if short timelines do not transpire, I believe there’s a need to consider a scenario I think is reasonably likely.
(2) Crying wolf
I propose that it is most likely we are in a world where timelines are >10 years, perhaps >20 or 30 years. Right now many of the most prominent AI scientists and CEOs are signed up to this issue, and political leaders worldwide are committing to examine it seriously (examples from last week). What happens, then, in the >10-year-timeline world?
The extinction-level outcomes that the public is hearing about, that these experts are raising, and that policymakers are making costly reputational investments in don’t transpire. What does happen is all the benefits of near-term AI that have been talked about, plus all the near-term harms that are being predominantly raised by the AI ethics/FAccT communities. Perhaps these harms include somewhat more extreme versions than what is currently talked about, but nowhere near catastrophic. Suddenly the year is 2028, and that whole 2023 furore is starting to look a bit silly. Remember when everyone agreed AI was going to make us all extinct? Yeah, like Limits to Growth all over again. Except that we’re not safe. In reality, in this scenario, we’re just entering the period in which risk is most acute, and in which gaining or maintaining the support of leaders across society for coordinated action is most important. And it’s possibly even harder to convince them, because people remember how silly lots of people looked the last time. [1] [2]
(3) How to navigate this scenario (in advance)
Suggestions:
- Have our messaging make clear that we don’t know when extinction-potential AI will be developed, and it’s quite likely that it will be over a decade, perhaps much longer. But it needs to be discussed now, because
- we can’t rule out that it will be developed sooner;
- there are choices to be made now that will have longer-term consequences;
- the challenges need a lot more dedicated time and effort than they’ve been getting.
Uncertainty is difficult to communicate in media, but it’s important to try.
- Don’t be triumphalist about winning the public debate now; it may well be ‘lost’ again in 5 years.
- Don’t unnecessarily antagonise the AI ethics/FAccT folk [3], because they’re quite likely to look like the ones who were right in 5 years (and because it’s just unhelpful).
- Build bridges with the AI ethics/FAccT folk on the range of issues and interventions that seem set to overlap in that time, and work together where possible. Lots of people from those communities are making proposals that are relevant to, and overlap with, the challenges associated with the path to transformative AI. This includes external evaluation; licensing and liability; oversight of powerful tech companies developing frontier AI; international bodies for governing powerful AI; and much more. E.g. see this and this, as well as CAIS's recent blog post.
- Don’t get fooled into thinking everyone now agrees. A lot more senior names are now signing onto statements and speaking up, which is making it easier for previously silent-but-concerned researchers to do the same. However, I think a majority of AI researchers probably still don’t agree that this is a serious, imminent concern (Yann LeCun’s silent majority is probably still real), and this disconnect in perceptions may result in significant pushback to come.
- Think carefully about the potential political fallout if and when this becomes an embarrassing thing for the politicians who have spoken up, and how to manage this.
To sum up: I’m not saying it was wrong to push for this level of broad awareness and consensus-building; I think it may well turn out to be necessary this early in order to navigate the challenges on the path to transformative AI, even if we still have decades until that point (and we may not). But there’s the potential for a serious downside/backlash that this community, and everyone who shares our concern about existential risk from AI, should be thinking carefully about, in terms of positioning for effectiveness on slightly longer timelines.
Thank you to Shakeel Hashim, Shahar Avin, Haydn Belfield and Ben Garfinkel for feedback on a previous draft of this post.
[1] Pushing against this, it seems likely that AI will have continued advancing as a technology, leading to ever-greater scientific and societal impacts. This may maintain or increase the salience of the idea that AI could pose extremely significant risks.
[2] A ‘softer’ version of this scenario is that some policy happens now, but then quietly drops off or gets dismantled over time as political attention shifts elsewhere.
[3] I don’t know how much this is happening in practice (there’s just so much online discourse right now that it’s hard to track), but I have seen it remarked on several times, e.g. here.
Super great post. I've been thinking about posting a nuance in (what I think about) the Eliezer class of threat models but haven't gotten around to it. (Warning: negative valence, as I will recall the moment I first underwent visceral sadness at the alignment problem).
Rob Bensinger tweeted something like "if we stick the landing on this, I'm going to lose an unrecoverable amount of bayes points", and for two years already I've had a massively different way of thinking about deployment of advanced systems because I find something like a "law of mad science" very plausible.
The high level takeaway is that (in this class of threat models) we can "survive takeoff" (not that I don't hate that framing) and accumulate lots of evidence that the doomcoin landed on heads (really feeling like we're in the early stages of a glorious transhuman future or a more modest FALGSC), for hundreds of years. And then someone pushes a typo in a yaml file to the server, and we die.
There seems to be very little framing of "mostly Eliezer-like 'flipping the doomcoin' scenario, where forecasters thus far have only concerned themselves with the date of the first flip, but from then on the doomcoin is flipped on new years eve at midnight every year until it comes up tails and we die". In other words, if we are obligated to hustle the weight of the doomcoin now before the first flip, then we are at least as obligated to apply at least constant vigilance, forevermore, and there's a stronger case to be made for demanding strictly increasing vigilance (pulling the weight of the doomcoin further and further every year). (this realization was my visceral sadness moment, in 2021 on discord, whereas before I was thinking about threat models as like a fun and challenging video game RNG or whatever).
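To make the repeated-flips point concrete, here's a minimal sketch of how a constant annual risk compounds, assuming independent flips with a fixed weight. The per-year probabilities below are made-up illustrative numbers, not anyone's actual forecast:

```python
# Illustrative only: how a constant annual "doomcoin" risk compounds over time.
# Assumes independent flips with a fixed per-year probability p of coming up tails (doom).
# The p values and horizons below are hypothetical, chosen just to show the shape of the curve.

def cumulative_doom_probability(p_annual: float, years: int) -> float:
    """Probability that at least one of `years` independent annual flips comes up tails."""
    return 1.0 - (1.0 - p_annual) ** years

for p in (0.001, 0.01, 0.05):
    for horizon in (10, 100, 500):
        print(f"p = {p:.3f}, {horizon:>3} years: "
              f"{cumulative_doom_probability(p, horizon):.1%} cumulative risk")
```

Even at 1% a year, the cumulative risk passes 60% within a century, which is the sense in which constant (or increasing) vigilance becomes obligatory in this framing.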
I think the Oxford folks have some literature on "existential security", which I just don't buy or expect at all. It seems deeply unlikely to me that there will be tricks we can pull, after the first time the doomcoin lands on heads, to keep it from flipping again. I think the "pivotal act" literature from MIRI tries to discuss this, by thinking about ways we can get some freebie years thrown in there (new years eve parties with no doomcoin flip), which is better than nothing. But this constant/increasing vigilance factor, or the repeated flips of the doomcoin, seems like a niche informal inside view among people who've been hanging out longer than a couple years.
Picking on Eliezer as a public intellectual for a second: insofar as my model of him is accurate (that his confidence that we die is more of an "eventually" thing, and that he has very little relation to Conjecture, who in many worlds will just take a hit to their Brier score in 2028, which Eliezer will be shielded from because he doesn't commit to dates), I would have liked to see him retweet the Bensinger comment and warn us about all the ways in which we could observe wildly transformative AI not kill everyone, declare victory, and then a few hundred years later push a bad yaml file to the server and die.
(All of this modulo my feeling that "doomcoin" is an annoying and thought-destroying way of characterizing the distribution over how you expect things to go well and poorly, probably at the same time, but that's its own jar of paperclips.)
Yeah, I think "ASI implies an extreme case of lock-in" is a major tendency in the literature (especially Sequences-era), but 1. people disagree about whether "alignment" refers to something that outsmarts even this implication or not, and then they disagree about the relative tractability and plausibility of the different alignment visions, and 2. this is very much a separate set of steps that provides room for disagreement among people who broadly accept Eliezer-like threat models (doomcoin stuff).
I don't want to zero in on actually-existing Eliezer (at which...