I was at an AI safety retreat recently and there seemed to be two categories of researchers:
- Those who thought most AI safety research was useless
- Those who thought all AI safety research was useless
This is a darkly humorous anecdote that illustrates a larger pattern of intense pessimism I’ve noticed among a certain contingent of AI safety researchers.
I don’t disagree with the more moderate version of this position. If things continue as they are, anywhere up to a 95% chance of doom seems defensible.
What I disagree with is the degree of confidence. While we certainly shouldn’t be confident that everything will turn out fine, we also shouldn’t feel confident that it won’t. This post could easily have been given the same title as Rob Bensinger’s similar post: we shouldn’t be maximally pessimistic about AI alignment.
The two main reasons not to be overly confident of doom are:
- All of the arguments saying that it’s hard to be confident that transformative AI (TAI) isn’t just around the corner also apply to safety research progress.
- It’s still early days, and we’ve made about as much progress as you’d predict given that, until recently, only a double-digit number of people were working on the problem.
The arguments that apply to TAI potentially being closer than we think also apply to alignment
It’s really hard to predict research progress. In ‘There’s no fire alarm for artificial general intelligence’, Eliezer Yudkowsky points out that historically, ‘it is very often the case that key technological developments still seem decades away, five years before they show up’ - even to scientists who are working directly on the problem.
Wilbur Wright thought that heavier-than-air flight was fifty years away; two years later, he helped build the first heavier-than-air flyer. This is because it often feels the same when the technology is decades away and when the technology is a year away: in either case, you don’t yet know how to solve the problem.
These arguments apply not only to TAI, but also to TAI alignment. Heavier-than-air flight felt like it was decades away when it was actually around the corner. Similarly, researchers’ sense that alignment is decades away, or even that it is impossible, is consistent with the possibility that we’ll solve alignment next year.
AI safety researchers are more likely to be pessimistic about alignment than the general public because they are deep in the weeds of the problem. They are viscerally aware, from firsthand experience, of the difficulty. They are the ones who have to feel the day-to-day confusion, frustration, and despair of bashing their heads against a problem and making inconsistent progress. But this is how it always feels to be on the cutting edge of research. If it felt easy and smooth, it wouldn’t be the edge of our knowledge.
AI progress thus far has been highly discontinuous: periods of fast advancement have alternated with ‘AI winters’ in which enthusiasm waned, and the last few months alone have brought several important advances. The same could be true for AI safety. Even if we’re in a slump now, massive progress could be around the corner.
It’s not surprising to see so little progress when we have so few people working on it
I understand why some people are in despair about the problem. Some have been working on alignment for decades and have still not figured it out. I can empathize. I’ve dedicated my life to trying to do good for the last twelve years and I’m still deeply uncertain whether I’ve even been net positive. It’s hard to stay optimistic and motivated in that scenario.
But let’s take a step back: this is an extremely complex question, and we haven’t attacked the problem with all our strength yet. Some of the earliest pioneers of the field are no doubt some of the most brilliant humans out there. Yet, they are still only a small number of people. There are currently only about one hundred and fifty people working full-time on technical AI safety, and even that is recent - ten years ago, it was more like five. We probably need more like tens of thousands of people researching this for several decades.
I’m reminded of the great bit in Harry Potter and the Methods of Rationality where Harry explains to Fred and George how to think about something. For context, Harry just asked the twins to creatively solve a problem for him:
‘Fred and George exchanged worried glances.
"I can't think of anything," said George.
"Neither can I," said Fred. "Sorry."
Harry stared at them.
And then Harry began to explain how you went about thinking of things.
It had been known to take longer than two seconds, said Harry.
You never called any question impossible, said Harry, until you had taken an actual clock and thought about it for five minutes, by the motion of the minute hand. Not five minutes metaphorically, five minutes by a physical clock….
So Harry was going to leave this problem to Fred and George, and they would discuss all the aspects of it and brainstorm anything they thought might be remotely relevant. And they shouldn't try to come up with an actual solution until they'd finished doing that, unless of course they did happen to randomly think of something awesome, in which case they could write it down for afterward and then go back to thinking. And he didn't want to hear back from them about any so-called failures to think of anything for at least a week. Some people spent decades trying to think of things.’
We’ve definitely set a timer and thought about this for five minutes. But this is the sort of problem that won’t just be solved by a small number of geniuses. We need way more “quality-adjusted researcher-years” if we’re going to get through this.
This is one of the most difficult intellectual challenges of our time, if not the most difficult. Even understanding the problem is hard, and solving it will probably require a mix of math, philosophy, programming, and a healthy dose of political acumen.
Think about how many scientists it took before we made progress on practically any important scientific discovery. Except for the lucky ones at the beginning of the Enlightenment, when there were few scientists and lots of low-hanging fruit, there are usually thousands to tens of thousands of scientists banging their heads against walls for decades for every one who makes a significant breakthrough. And we’ve got around one hundred in a field barely over a decade old!
When you look at it this way, it’s no wonder we haven’t made a lot of progress yet. In fact, it would be quite surprising if we had. We are a small field that’s just getting started.
We’re currently Fred and George, feeling discouraged after having pondered the world’s most important and challenging question for a few metaphorical seconds. Let’s be inspired by Harry to think about it not just for five minutes, but for decades, with a massive community of other people trying to do the same. Let’s build the field and get thousands of people banging their heads against this wicked problem.
Who knows - one of the new researchers might be just a year away from making the crucial insight that ushers in the AI alignment summer.
This post was written collaboratively by Kat Woods and Amber Dawn Ace as part of Nonlinear’s experimental Writing Internship program. The ideas are Kat’s; Kat explained them to Amber, and Amber wrote them up. We would like to offer this service to other EAs who want to share their as-yet unwritten ideas or expertise.
If you would be interested in working with Amber to write up your ideas, fill out this form.
I think we should generally have a prior that the social dynamics of large groups push heavily towards conformity, and that those pressures can cancel out many orders of magnitude of growth in the number of people who could, in theory, explore different directions.
As a concrete case study, I like this Robin Hanson post "The World Forager Elite":
The number of nations, as well as the number of communities and researchers capable of doing innovative things in response to COVID, was vastly greater in 2020 than for any previous pandemic. But what we saw was much less global variance and innovation in pandemic responses. I think there was scientific innovation, and that innovation was likely greater than for previous pandemics, but overall, despite the vastly greater number of nations and people in the international community of 2020, this only produced more risk-aversion about stepping out of line with elite consensus.
I think by default we should expect similar effects in fields like AI Alignment. Maintaining a field that is open to new ideas and approaches is actively difficult. If you grow the field without trying to preserve the concrete and specific mechanisms that allow innovation to happen, more people will not result in more innovation; they will result in less, even from the people who were previously part of the same community.
In the case of COVID, the global research community spent a substantial fraction of its effort actively preventing people from performing experiments like variolation or challenge trials, and we see the same in fields like psychology research, where a substantial fraction of energy is spent on ever-increasing ethical review requirements.
We see the same in the construction industry (a recent strong interest of mine), which despite its quickly growing size, is performing substantially fewer experiments than it was 40 years ago, and is spending most of its effort actively regulating what other people in the industry can do, and limiting the type of allowable construction materials and approaches to smaller and smaller sets.
By default, I expect fast growth of the AI Alignment community to reduce innovation for the same reasons. I expect a larger community to increase pressure towards forming an elite consensus, and that consensus to be enforced through various legible and illegible means. Most of the world is really not great at innovation. The default outcome for large groups of people, even when they are pointed towards a shared goal, is not innovation but conformity, and if we grow recklessly, I think we will drift towards that same common outcome.