Hi, I'm Rohin Shah! I work as a Research Scientist on the technical AGI safety team at DeepMind. I completed my PhD at the Center for Human-Compatible AI at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don't initially know what the user wants.
I'm particularly interested in big picture questions about artificial intelligence. What techniques will we use to build human-level AI systems? How will their deployment affect the world? What can we do to make this deployment go better? I write up summaries and thoughts about recent work tackling these questions in the Alignment Newsletter.
In the past, I ran the EA UC Berkeley and EA at the University of Washington groups.
I am in fact claiming it is causally upstream. Idk why you think it's implausible.
The main reason I'm not persuaded by your politician analogy, is that the politician analogy bakes in the assumption that there is a zero-sum conflict going on. But the whole question here is why there is a conflict in the first place.
I don't really care whether it's "democratic" or "undemocratic" and wish I hadn't used the word in my original comment (I was mostly just mirroring the original language).
My main claim is that AI safety / EA likely created their own enemies due to an intense focus on gaining influence and power.
I am not claiming it is inherently bad to gain influence and power. I do it myself. I just think AI safety / EA is pretty naive in how it goes about it.
Tbc some of your explanations would still go against my claim if they were true. I don't think they're true, but I agree I haven't justified that here.
What's your explanation for why they attack EAs rather than, say, the AI ethics crowd?
Why was SB 1047 so controversial, while other much more onerous AI bills (esp for "little tech") were barely discussed?
If you think their goal is just to win, why attack the movement that has power and can coordinate funding to counter their actions? What exactly are they trying to win, and why would EA stop them from achieving that (if EA were not seeking power and influence)?
(I am not claiming that their target-selection rubric is calibrated to who is actually bad or good and idk why you would think that. I feel like you are committing some kind of fallacy where in any conflict there is a "good" side and a "bad" side and this is causing you to read implications into my comments that I don't intend.)
Immense, undemocratic political spending is ramping up (though spending by the enemies of AI safety is also growing)
You present the parenthetical as a meliorating factor, but I expect that these enemies exist due to previous undemocratic power-seeking actions by the AI safety community.
(This isn't based on any private information, I just think there must be some reason these enemies single out EAs in particular. Bad faith actors don't just randomly pick targets to attack. My best guess at the reason is the intense focus on gaining influence and power.)
Or is this a stronger claim that safety work is inherently a more short-time horizon thing?
It is more like this stronger claim.
I might not use "inherently" here. A core safety question is whether an AI system is behaving well because it is aligned, or because it is pursuing convergent instrumental subgoals until it can takeover. The "natural" test is to run the AI until it has enough power to easily take over, at which point you observe whether it takes over, which is extremely long-horizon. But obviously this was never an option for safety anyway, and many of the proxies that we think about are more short horizon.
Oh sorry, I missed the weights on the factors, and thought you were taking an unweighted average.
Is it because you have to run large evals or do pre-training runs? Do you think this argument applies to all areas of capabilities research?
All tasks in capabilities are ultimately trying to optimize the capability-cost frontier, which usually benefits from measuring capability.
If you have an AI that will do well at most tasks you give it that take (say) a week, then you have the problem that the naive way of evaluating the AI (run it on some difficult tasks and see how well it does) now takes a very long time to give you useful signal. So you now have two options:
This doesn't apply for training / inference efficiency (since you hold the AI and thus capabilities constant, so you don't need to measure capability). And there is already a good proxy for pretraining improvements, namely perplexity. But for all the other areas, this is going to increasingly be a problem that they will need to solve.
On reflection this is probably not best captured in your "task length" criterion, but rather the "feedback quality / verifiability" criterion.
Great analysis of factors impacting automatability.
Looking at your numbers though, I feel like you didn't really need this; you could have just said "I think scheming risk is by far the most important factor in automatability of research areas, therefore capabilities will come first". EDIT Overstated, I missed the fact that scheming risk factor had lower weight than the others.
I don't agree with that conclusion for two main reasons:
I don't think that's an attack on the AI ethics crowd. I think that's an attack on wokeness which maybe deals a glancing blow to AI ethics as an incidental side effect.
Like, if you look at the purpose:
This has very little to do with what the "AI ethics" crowd wants in my experience. The topics I hear about are more like algorithmic discrimination, misinformation, the right to an explanation, child safety, job loss, copyright, accessibility, etc.
You can also skim through the papers at FAccT 2025, I think this also suggests that it's not an attack on that crowd except incidentally.