I was at an AI safety retreat recently and there seemed to be two categories of researchers:
- Those who thought most AI safety research was useless
- Those who thought all AI safety research was useless
This is a darkly humorous anecdote illustrating a larger pattern of intense pessimism I’ve noticed among a certain contingent of AI safety researchers.
I don’t disagree with the more moderate version of this position. If things continue as they are, anywhere up to a 95% chance of doom seems defensible.
What I disagree with is the degree of confidence. While we certainly shouldn’t be confident that everything will turn out fine, we also shouldn’t feel confident that it won’t. This post might have easily been titled the same as Rob Bensinger’s similar post: we shouldn’t be maximally pessimistic about AI alignment.
The two main reasons for not being overly confident of doom are:
- All of the arguments saying that it’s hard to be confident that transformative AI (TAI) isn’t just around the corner also apply to safety research progress.
- It’s still early days and we’ve had about as much progress as you’d predict given that up until recently we’ve only had double-digit numbers of people working on the problem.
The arguments that apply to TAI potentially being closer than we think also apply to alignment
It’s really hard to predict research progress. In ‘There’s no fire alarm for artificial general intelligence’, Eliezer Yudkowsky points out that historically, ‘it is very often the case that key technological developments still seem decades away, five years before they show up’ - even to scientists who are working directly on the problem.
Wilbur Wright thought that heavier-than-air flight was fifty years away; two years later, he helped build the first heavier-than-air flyer. This is because it often feels the same when the technology is decades away and when the technology is a year away: in either case, you don’t yet know how to solve the problem.
These arguments apply not only to TAI, but also to TAI alignment. Heavier-than-air flight felt like it was decades away when it was actually just around the corner. Similarly, researchers’ sense that alignment is decades away - or even that it is impossible - is consistent with the possibility that we’ll solve alignment next year.
AI safety researchers are more likely to be pessimistic about alignment than the general public because they are deeply embroiled in the weeds of the problem. They are viscerally aware, from firsthand experience, of the difficulty. They are the ones who have to feel the day-to-day confusion, frustration, and despair of bashing their heads against a problem and making inconsistent progress. But this is how it always feels to be on the cutting edge of research. If it felt easy and smooth, it wouldn’t be the edge of our knowledge.
AI progress thus far has been highly discontinuous; there have been times of fast advancement interspersed with ‘AI winters’ where enthusiasm waned, and then several important advances in the last few months. This could also be true for AI safety - even if we’re in a slump now, massive progress could be around the corner.
It’s not surprising to see this little progress when we have so few people working on it
I understand why some people are in despair about the problem. Some have been working on alignment for decades and have still not figured it out. I can empathize. I’ve dedicated my life to trying to do good for the last twelve years and I’m still deeply uncertain whether I’ve even been net positive. It’s hard to stay optimistic and motivated in that scenario.
But let’s take a step back: this is an extremely complex question, and we haven’t attacked the problem with all our strength yet. Some of the earliest pioneers of the field are no doubt some of the most brilliant humans out there. Yet, they are still only a small number of people. There are currently only about one hundred and fifty people working full-time on technical AI safety, and even that is recent - ten years ago, it was more like five. We probably need more like tens of thousands of people researching this for several decades.
I’m reminded of the great bit in Harry Potter and the Methods of Rationality where Harry explains to Fred and George how to think about something. For context, Harry just asked the twins to creatively solve a problem for him:
‘Fred and George exchanged worried glances.
"I can't think of anything," said George.
"Neither can I," said Fred. "Sorry."
Harry stared at them.
And then Harry began to explain how you went about thinking of things.
It had been known to take longer than two seconds, said Harry.
You never called any question impossible, said Harry, until you had taken an actual clock and thought about it for five minutes, by the motion of the minute hand. Not five minutes metaphorically, five minutes by a physical clock….
So Harry was going to leave this problem to Fred and George, and they would discuss all the aspects of it and brainstorm anything they thought might be remotely relevant. And they shouldn't try to come up with an actual solution until they'd finished doing that, unless of course they did happen to randomly think of something awesome, in which case they could write it down for afterward and then go back to thinking. And he didn't want to hear back from them about any so-called failures to think of anything for at least a week. Some people spent decades trying to think of things.’
We’ve definitely set a timer and thought about this for five minutes. But this is the sort of problem that won’t just be solved by a small number of geniuses. We need way more “quality-adjusted researcher-years” if we’re going to get through this.
This is one of the most difficult intellectual challenges of our time, if not the most difficult. Even understanding the problem is difficult, and to solve it we will probably require a mix of math, philosophy, programming, and a healthy dose of political acumen.
Think about how many scientists it took before we made progress on practically any important scientific discovery. Except for the lucky ones at the beginning of the Enlightenment period, when there were few scientists and lots of low-hanging fruit, there are usually thousands to tens of thousands of scientists banging their heads against walls for decades for every one who makes a significant breakthrough. And we’ve got around one hundred in a field barely over a decade old!
When you look at it this way, it’s no wonder we haven’t made a lot of progress yet. In fact, it would be quite surprising if we had. We are a small field that’s just getting started.
We’re currently Fred and George, feeling discouraged after having pondered the world’s most important and challenging question for a few metaphorical seconds. Let’s be inspired by Harry to think about it not just for five minutes, but for decades, with a massive community of other people trying to do the same. Let’s field-build and get thousands of people banging their heads against this wicked problem.
Who knows - one of the new researchers might be just a year away from making the crucial insight that ushers in the AI alignment summer.
Reminder that you can listen to EA Forum/LessWrong posts on your podcast player using The Nonlinear Library.
This post was written collaboratively by Kat Woods and Amber Dawn Ace as part of Nonlinear’s experimental Writing Internship program. The ideas are Kat’s; Kat explained them to Amber, and Amber wrote them up. We would like to offer this service to other EAs who want to share their as-yet unwritten ideas or expertise.
If you would be interested in working with Amber to write up your ideas, fill out this form.
I think this is still in the framework of thinking that large groups of people having to coordinate leads to stagnation. To change my mind, you'd have to make the case that having a larger number of startups leads to less innovation, which seems like a hard case to make.
I think this is a separate issue that might be caused by the size of the movement, but a different hypothesis is that it's simply an idea that has traction in the movement, one which has been around for a long time, even while we were a lot smaller. Spending your "weirdness points" and similar considerations have been around since the very beginning.
(On a side note, I think we're overly concerned about this, but that's a whole other post. Suffice it to say here that a lot of the probability mass is on this not being caused by the size of the movement, but rather by a particularly sticky idea.)
🎯 I 100% agree. I'm thinking of spending some more time thinking on and writing up ways that we could make it so the movement could usefully take on more researchers. I also encourage others to think on this, because it could unlock a lot of potential.
I think this is where we disagree. It'd be very surprising if ~150 researchers were the optimal number, or if having fewer would lead to more innovation and more/better research agendas.
An alternative hypothesis is that the people you've been talking to have been becoming more pessimistic about having hope at all (if you hang out with MIRI folk a lot, I'd expect this to be more acute). It might not be because there are more people having bad ideas or because having more people in the movement leads to a decline in quality, but rather because a certain contingent thinks alignment is impossible or deeply improbable, so that all ideas seem bad. In this paradigm/POV, the default is that all new research agendas seem bad. It's not that the agendas got worse. It's that people think the problem is even harder than they originally thought.
Another hypothesis is that the idea of epistemic humility has been spreading, combined with the idea that you need intensive mentorship. This makes new people coming in less likely to come up with new research agendas of their own and more likely to defer to authority. (A whole other post there!)
Anyways, just some alternatives to consider :) It's hard to convey tone over text, but I'm enjoying this discussion a lot and you should read all my writing assuming a lot of warmth and engagement. :)