Hi, I'm Rohin Shah! I work as a Research Scientist on the technical AGI safety team at DeepMind. I completed my PhD at the Center for Human-Compatible AI at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don't initially know what the user wants.
I'm particularly interested in big picture questions about artificial intelligence. What techniques will we use to build human-level AI systems? How will their deployment affect the world? What can we do to make this deployment go better? I write up summaries and thoughts about recent work tackling these questions in the Alignment Newsletter.
In the past, I ran the EA UC Berkeley and EA University of Washington groups.
I think that most of the difference between classic EA and the rest of the world is a difference in preferences / values, rather than a difference in beliefs.
I somewhat disagree but I agree this is plausible. (That was more of a side point, maybe I shouldn't have included it.)
most people really really don't want to die in the next ten years
Is your claim that they really really don't want to die in the next ten years, but they are fine dying in the next hundred years? (Else I don't see how you're dismissing the anti-aging vs sports team example.)
So, for x-risk to be high, many people (e.g. lab employees, politicians, advisors) have to catastrophically fail at pursuing their own self-interest.
Sure, I mostly agree with this (though I'd note that it can be a failure of group rationality, without being a failure of individual rationality for most individuals). I think people frequently do catastrophically fail to pursue their own self-interest when that requires foresight.
Most people really don’t want to die, or to be disempowered in their lifetimes. So, for existential risk to be high, there has to be some truly major failure of rationality going on.
... What is surprising about the world having a major failure of rationality? That's the default state of affairs for anything requiring a modicum of foresight. A fairly core premise of early EA was that there is a truly major failure of rationality going on in the project of trying to improve the world.
Are you surprised that ordinary people spend more money and time on, say, their local sports team, than on anti-aging research? For most of human history, aging had a ~100% chance of killing someone (unless something else killed them first).
If you think the following claim is true - 'non-AI projects are never undercut but always outweighed'
Of course I don't think this. AI definitely undercuts some non-AI projects. But "non-AI projects are almost always outweighed in importance" seems very plausible to me, and I don't see why anything in the piece is a strong reason to disbelieve that claim, since this piece is only responding to the undercutting argument. And if that claim is true, then the undercutting point doesn't matter.
We are disputing a general heuristic that privileges the AI cause area and writes off all the others.
I think the most important argument towards this conclusion is "AI is a big deal, so we should prioritize work that makes it go better". But it seems you have placed this argument out of scope:
[The claim we are interested in is] that the coming AI revolution undercuts the justification for doing work in other cause areas, rendering work in those areas useless, or nearly so (for now, and perhaps forever).
[...]
AI causes might be more cost-effective than projects in other areas, even if AI doesn’t undercut those projects’ efficacy. Assessing the overall effectiveness of these broad cause areas is too big a project to take on here.
I agree that lots of other work looks about as valuable as it did before, and isn't significantly undercut by AI. This seems basically irrelevant to the general heuristic you are disputing, whose main argument is "AI is a big deal so it is way more important".
I agree with some of the points on point 1, though other than FTX, I don't think the downside risk of any of those examples is very large.
Fwiw I find it pretty plausible that lots of political action and movement building for the sake of movement building has indeed had a large negative impact, such that I feel uncertain about whether I should shut it all down if I had the option to do so (if I set aside concerns like unilateralism). I also feel similarly about particular examples of AI safety research but definitely not for the field as a whole.
Agree that criticisms of AI companies can be good; I don't really consider them EA projects, but it wasn't clear that that was what I was referring to in my post.
Fair enough for the first two, but I was thinking of the FrontierMath thing as mostly a critique of Epoch, not of OpenAI, tbc, and that's the sense in which it mattered -- Epoch made changes, afaik OpenAI did not. Epoch is at least an EA-adjacent project.
Sign seems pretty negative to me.
I agree that if I had to guess I'd say that the sign seems negative for both of the things you say it is negative for, but I am uncertain about it, particularly because of people standing behind a version of the critique (e.g. Habryka for the Nonlinear one, Alexander Berger for the Wytham Abbey one, though certainly in the latter case it's a very different critique than what the original post said).
I think I stand by the claim that there aren't many criticisms that clearly mattered, but this was a positive update for me.
Fwiw, I think there are probably several other criticisms that I alone could find given some more time, let alone impactful criticisms that I never even read. I didn't even start looking for the genre of "critique of individual part of GiveWell cost-effectiveness analysis, which GiveWell then fixes"; I think there's been at least one and maybe multiple such public criticisms in the past.
I also remember there being a StrongMinds critique and a Happier Lives Institute critique that very plausibly caused changes? But I don't know the details and didn't follow it.
I'm not especially pro-criticism but this seems way overstated.
Almost all EA projects have low downside risk in absolute terms
I might agree with this on a technicality, in that depending on your bar or standard, I could imagine agreeing that almost all EA projects (at least for more speculative causes) have negligible impact in absolute terms.
But presumably you mean that almost all EA projects are such that their plausible good outcomes are way bigger in magnitude than their plausible bad outcomes, or something like that. This seems false, e.g.
There are almost no examples of criticism clearly mattering
I'd be happy to endorse something like "public criticism rarely causes an organization to choose to do something different in a major org-defining way" (but note that's primarily because people in a good position to change an organization through criticism will just do so privately, not because criticism is totally ineffective).
Of course, it's true that they could ignore serious criticism if they wanted to, but my sense is that people actually quite often feel unable to ignore criticism.
As someone sympathetic to many of Habryka's positions, while also disagreeing with many of them, my immediate reaction to this was "well that seems like a bad thing", cf.
shallow criticism often gets valorized
I'd feel differently if you had said "people feel obliged to take criticism seriously if it points at a real problem" or something like that, but I agree with you that the mechanism is more like "people are unable to ignore criticism irrespective of its quality" (the popularity of the criticism matters, but sadly that is only weakly correlated with quality).
Tbc if the preferences are written in words like "expected value of the lightcone" I agree it would be relatively easy to tell which was which, mainly by identifying community shibboleths. My claim is that if you just have the input/output mapping of (safety level of AI, capabilities level of AI) --> utility, then it would be challenging. Even longtermists should be willing to accept some risk, just because AI can help with other existential risks (and of course many safety researchers -- probably the majority at this point -- are not longtermists).
What you call the "lab's" utility function isn't really specific to the lab; it could just as well apply to safety researchers. One might assume that the parameters would be set in such a way as to make the lab more C-seeking (e.g. it takes less C to produce 1 util for the lab than for everyone else).
But at least in the case of AI safety, I don't think this is the case. I doubt I could easily distinguish a lab capabilities researcher (or lab leadership, or some "aggregate lab utility function") from an external safety researcher if you just gave me their utility functions over C and S. (AI safety has significant overlap with transhumanism; relative to the rest of humanity, people in AI safety are way more likely to think there are huge benefits to the development of safe AGI.) In practice the issue seems to be more one of epistemic disagreement.
You could still recover many of the conclusions in this post by positing that an increase to S leads to a proportional decrease in probability of non-survival, and the proportion is the same between the lab and everyone else, but the absolute numbers aren't. I'd still feel like this was a poor model of the real situation though.
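For concreteness, here is a minimal sketch of the kind of posit I mean (the symbols and functional form are my own illustration, not anything from the post):

P_i(doom | S) = beta_i * f(S)
U_i(C, S) = (1 - P_i(doom | S)) * B_i(C) - P_i(doom | S) * D_i

where f is a shared decreasing function of the safety level S, so an increase to S cuts the probability of non-survival by the same proportion for the lab and for everyone else, while the party-specific baselines beta_i supply the differing absolute numbers; B_i(C) is party i's benefit from capabilities conditional on survival, and D_i is its loss from catastrophe.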
"Leadership / strategy" and "government and policy expertise" are emphatically not management or communications. There's quite a lot of effort on building a talent pipeline for "government and policy expertise". There isn't one for "leadership / strategy" but I think that's mostly because no one knows how to do it well (broadly speaking, not just limited to EA).
If you want to view things through the lens of status (imo often a mistake), I think "leadership / strategy" is probably the highest status role in the safety community, and "government and policy expertise" is pretty high as well. I do agree that management / communications are not as high status as the chart would suggest they should be, though I suspect this is mostly due to tech folks consistently underestimating the value of these fields.
If I were hiring for a manager and somehow had to choose between only these two applicants with only this information, I would choose applicant A. (Though of course the actual answer is to find better applicants and/or get more information about them.)
I can always train applicant A to be an adequate people manager (and have done so in the past). I can't train applicant B to have enough technical understanding to make good prioritization decisions.
(Relatedly, at tech companies, the people managers often have technical degrees, not MBAs.)
I've done a lot of hiring, and I suppose I do look for "value alignment" in the sense of "are you going to have the team's mission as a priority", but in practice I have a hard time imagining how any candidate who actually was mission-aligned could somehow fail to demonstrate it. My bar is not high, and I care way more about other factors. (And in fact I've hired multiple people who looked less "EA-value aligned" than either applicant A or B; I can think of four off the top of my head.)
It's possible that other EA hiring cares more about this, but I'd weakly guess that this is a mostly-incorrect community-perpetuated belief.
(There is another effect which does advantage e.g. MATS -- we understand what MATS is, and what excellence at it looks like. Of the four people I thought of above, I think we plausibly would have passed over 2-3 of them in a nearby world where the person reviewing their resume didn't realize what made them stand out.)