AI safety has become a big deal in EA, and so I'm curious about how much "due diligence" on it has been done by the EA community as a whole. Obviously there have been many in-person discussions, but it's very difficult to evaluate whether these contain new or high-quality content. Probably a better metric is how much work has been done which:
1. Is publicly available;
2. Engages in detail with core arguments for why AI might be dangerous (type A), OR tries to evaluate the credibility of the arguments without directly engaging with them (type B);
3. Was motivated or instigated by EA.
I'm wary of focusing too much on credit assignment, but it seems important to be able to answer a question like "if EA had never formed, to what extent would it have been harder for an impartial observer in 2019 to evaluate whether working on AI safety is important?" The clearest evidence would be a substantial body of relevant work produced by people who were employed at EA orgs, funded by EA grants, or convinced to work on AI safety through their involvement with EA. Some such work comes to mind, and I've listed it below; what am I missing?
Type A work which meets my criteria above:
- A lot of writing by Holden Karnofsky
- A lot of writing by Paul Christiano
- This sequence by Rohin Shah
- These posts by Jeff Kaufman
- This agenda by Allan Dafoe
- This report by Tom Sittler
Type A work which only partially meets criterion 3 (or which I'm uncertain about):
- These two articles by Luke Muehlhauser
- This report by Eric Drexler
- This blog by Ben Hoffman
- AI Impacts
Type B work which meets my criteria above:
Things which don't meet those criteria:
- This 80,000 Hours report (which mentions the arguments, but doesn't thoroughly evaluate them)
- Superintelligence
- The AI Foom debate
Edited to add: Wei Dai asked why I didn't count Nick Bostrom as "part of EA", and I wrote quite a long answer which explains the motivations behind this question much better than my original post. So I've copied most of it below:
The three questions I am ultimately trying to answer are: a) how valuable is it to build up the EA movement? b) how much should I update when I learn that a given belief is a consensus in EA? and c) how much evidence do the opinions of other people provide in favour of AI safety being important?
To answer the first question, assuming that analysis of AI safety as a cause area is valuable, I should focus on contributions by people who were motivated or instigated by the EA movement itself. Here Nick doesn't count (except insofar as EA made his book come out sooner or better).
To answer the second question, it helps to know whether the focus on AI safety in EA came about because many people did comprehensive due diligence and shared their findings, or whether there wasn't much investigation and the ubiquity of the belief was instead driven by an information cascade. For this purpose, I should count work by people to the extent that they, or people like them, are likely to critically investigate other beliefs that are or will become widespread in EA. Being motivated to investigate AI safety by membership in the EA movement is the best evidence, but for the purpose of answering this question I probably should have used "motivated by the EA movement, or by very similar things to what EAs are motivated by", and should partially count Nick.
To answer the third question, it helps to know whether the people who have become convinced that AI safety is important are a relatively homogeneous group who might all have highly correlated biases and hidden motivations, or whether a wide range of people have become convinced. For this purpose, I should count work by people to the extent that they are dissimilar to the transhumanists and rationalists who came up with the original safety arguments, and also to the extent that they rederived the arguments for themselves rather than being influenced by the existing arguments. Here EAs who started off with no inclination towards transhumanism or rationalism at all count the most, and Nick counts very little.
Thanks for the stab, Anthony. It's fairly fair. :-)
Some clarifying points:
First, I should note that my piece was written from the perspective of suffering-focused ethics.
Second, I would not say that "investment in AI safety work by the EA community today would only make sense if the probability of AI-catalyzed GCR were decently high". Even setting aside the question of what "decently high" means, I would note that:
1) Whether such investments in AI safety make sense depends in part on one's values. (Though another critique I would make is that "AI safety" is less well-defined than people often seem to think: https://magnusvinding.com/2018/12/14/is-ai-alignment-possible/, but more on this below.)
2) Even if "the probability of AI-catalyzed GCR" were decently high — say, >2 percent — this would not imply that one should focus on "AI safety" in a standard narrow sense (roughly: constructing the right software), nor that other risks are not greater in expectation (compared to the risks we commonly have in mind when we think of "AI-catalyzed catastrophic risks").
You write of "scenarios in which AGI becomes a catastrophic threat". But a question I would raise is: what does this mean? Do we all have a clear picture of this in our minds? This sounds to me like a rather broad class of scenarios, and a worry I have is that we default to "poorly written software" scenarios, even though such scenarios could well constitute a relatively narrow subset of the entire class that is "catastrophic scenarios involving AI".
Zooming out, my critique can be crudely summarized as a critique of two significant equivocations that I see doing an exceptional amount of work in many standard arguments for "prioritizing AI".
First, there is what we may call the AI safety equivocation (or motte and bailey): people commonly fail to distinguish between 1) a focus on future outcomes controlled by AI and 2) a focus on writing "safe" software. Accepting that we should adopt the former focus by no means implies we should adopt the latter. By (imperfect) analogy, to say that we should focus on future outcomes controlled by humans does not imply that we should focus primarily on writing safe human genomes.
The second is what we may call the intelligence equivocation, which is the one you described. We operate with two very different senses of the term "intelligence", namely 1) the ability to achieve goals in general (derived from Legg & Hutter, 2007), and 2) "intelligence" in the much narrower sense of "advanced cognitive abilities", roughly equivalent to IQ in humans.
These two are often treated as virtually identical, and we fail to appreciate the rather enormous difference between them, as argued in (or evident from) books such as The Knowledge Illusion: Why We Never Think Alone, The Ascent of Man, The Evolution of Everything, and The Secret of Our Success. This was also the main point in my Reflections on Intelligence.
Intelligence2 lies entirely in the brain, whereas intelligence1 includes the brain and much more: all the rest of our well-adapted body parts (vocal cords, hands, upright walk; completely remove just one of these in all humans, and human civilization is likely gone for good). Not to mention our culture and technology as a whole, which is the level at which our ability to achieve goals really emerges: it derives not from any single advanced machine but from our entire economy, a vastly greater toolbox than what intelligence2 covers.
Thus, to assume that by boosting intelligence2 to vastly super-human levels we necessarily get intelligence1 at a vastly super-human level is a mistake, not least since "human-level intelligence1" already includes vastly super-human intelligence2 in many cognitive domains.