Hi, I'm Rohin Shah! I work as a Research Scientist on the technical AGI safety team at DeepMind. I completed my PhD at the Center for Human-Compatible AI at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don't initially know what the user wants.
I'm particularly interested in big picture questions about artificial intelligence. What techniques will we use to build human-level AI systems? How will their deployment affect the world? What can we do to make this deployment go better? I write up summaries and thoughts about recent work tackling these questions in the Alignment Newsletter.
In the past, I ran the EA UC Berkeley group and the EA at the University of Washington group.
I'm not especially pro-criticism but this seems way overstated.
Almost all EA projects have low downside risk in absolute terms
I might agree with this on a technicality: depending on your bar or standard, I could imagine agreeing that almost all EA projects (at least in the more speculative causes) have negligible impact in absolute terms.
But presumably you mean that almost all EA projects are such that their plausible good outcomes are way bigger in magnitude than their plausible bad outcomes, or something like that. This seems false, e.g.
There are almost no examples of criticism clearly mattering
I'd be happy to endorse something like "public criticism rarely causes an organization to choose to do something different in a major org-defining way" (but note that's primarily because people in a good position to change an organization through criticism will just do so privately, not because criticism is totally ineffective).
Of course, it's true that they could ignore serious criticism if they wanted to, but my sense is that people actually quite often feel unable to ignore criticism.
As someone sympathetic to many of Habryka's positions, while also disagreeing with many others, my immediate reaction to this was "well, that seems like a bad thing", cf.
shallow criticism often gets valorized
I'd feel differently if you had said "people feel obliged to take criticism seriously if it points at a real problem" or something like that, but I agree with you that the mechanism is more like "people are unable to ignore criticism irrespective of its quality" (the popularity of the criticism matters, but sadly that is only weakly correlated with quality).
Tbc if the preferences are written in words like "expected value of the lightcone" I agree it would be relatively easy to tell which was which, mainly by identifying community shibboleths. My claim is that if you just have the input/output mapping of (safety level of AI, capabilities level of AI) --> utility, then it would be challenging. Even longtermists should be willing to accept some risk, just because AI can help with other existential risks (and of course many safety researchers -- probably the majority at this point -- are not longtermists).
What you call the "lab's" utility function isn't really specific to the lab; it could just as well apply to safety researchers. One might assume that the parameters would be set in such a way as to make the lab more C-seeking (e.g. it takes less C to produce 1 util for the lab than for everyone else).
But at least in the case of AI safety, I don't think this is the case. I doubt I could easily distinguish a lab capabilities researcher (or lab leadership, or some "aggregate lab utility function") from an external safety researcher if you just gave me their utility functions over C and S. (AI safety has significant overlap with transhumanism; relative to the rest of humanity they are way more likely to think there are huge benefits to development of safe AGI.) In practice it seems like the issue is more like epistemic disagreement.
You could still recover many of the conclusions in this post by positing that an increase to S leads to a proportional decrease in probability of non-survival, and the proportion is the same between the lab and everyone else, but the absolute numbers aren't. I'd still feel like this was a poor model of the real situation though.
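To pin down the structure of that alternative, here is a minimal sketch in my own notation (nothing here is taken from the post; the functional form and symbols are illustrative assumptions): each actor values capabilities only conditional on survival, and safety reduces the probability of non-survival by the same proportional rate for the lab and for everyone else, while the absolute baseline risks differ.

```latex
% Minimal sketch (my notation, not from the post): actor i is either the lab
% or everyone else, C is capabilities level, S is safety level.
%
% Utility is the benefit of capabilities conditional on survival:
\[
  U_i(C, S) \;=\; B_i(C)\,\bigl(1 - p_i(S)\bigr)
\]
% Non-survival probability falls by the same proportion per unit of S for both
% actors (shared rate k), but from different absolute baselines p_i(0):
\[
  p_i(S) \;=\; p_i(0)\, e^{-kS},
  \qquad
  p_{\text{lab}}(0) \neq p_{\text{else}}(0)
\]
```

The shared rate k is meant to encode "the proportion is the same between the lab and everyone else", while the differing baselines p_i(0) encode "the absolute numbers aren't"; the exponential form is just one convenient way to write a proportional decrease.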
I agree that reductions in infant mortality likely have better long-run effects on capacity growth than an equivalent amount of population growth achieved while keeping infant mortality rates constant, which could mean you still want to focus on infant mortality without prioritizing increased fertility.
I would just be surprised if the decision from the global capacity growth perspective ended up being "continue putting tons of resources into reducing infant mortality, but not much into increasing fertility" (which I understand to be the status quo for GHD), because:
That said, it's been many years since I closely followed the GHD space, and I could easily be wrong about a lot of this.
?? It's the second bullet point in the cons list, and reemphasized in the third bullet?
If you're saying "obviously this is the key determinant of whether you should work at a leading AI company, so there shouldn't even be a pros / cons table", then obviously 80K disagrees, given that they recommend some such roles (and many other people, e.g. me, also disagree, so this isn't 80K ignoring expert consensus). In that case I think you should try to convince 80K on the object level rather than applying political pressure.
There’s currently very little work going into issues that arise even if AI is aligned, including the deployment problem
The deployment problem (as described in that link) is a non-problem if you know that AI is aligned.
Fwiw I find it pretty plausible that lots of political action, and movement building for the sake of movement building, have indeed had a large negative impact, such that I feel uncertain about whether I should shut it all down if I had the option to do so (setting aside concerns like unilateralism). I also feel similarly about particular examples of AI safety research, but definitely not about the field as a whole.
Fair enough for the first two, but I was thinking of the FrontierMath thing as mostly a critique of Epoch, not of OpenAI, tbc, and that's the sense in which it mattered -- Epoch made changes, afaik OpenAI did not. Epoch is at least an EA-adjacent project.
If I had to guess, I'd agree that the sign seems negative for both of the things you say it is negative for, but I am uncertain about it, particularly because people stand behind a version of the critique (e.g. Habryka for the Nonlinear one, Alexander Berger for the Wytham Abbey one, though in the latter case it's certainly a very different critique from what the original post said).
Fwiw, I think there are probably several other criticisms that I alone could find given some more time, let alone impactful criticisms that I never even read. I didn't even start looking for the genre of "critique of an individual part of a GiveWell cost-effectiveness analysis, which GiveWell then fixes"; I think there has been at least one such public criticism in the past, and maybe several.
I also remember there being a StrongMinds critique and a Happier Lives Institute critique that very plausibly caused changes? But I don't know the details and didn't follow it closely.