AI safety researcher
Being really good at your job is a good way to achieve impact in general, because your "impact above replacement" is what counts. If a replacement-level employee who is barely worth hiring has productivity 100, and the average productivity is 150, the average employee contributes 50 impact above replacement. If you do your job 1.67x better than average (250 productivity), you earn 150 impact above replacement, triple the average.
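To make the arithmetic explicit (a minimal formalization, assuming impact above replacement is just productivity minus replacement-level productivity):

$$\text{IAR} = p - p_{\text{repl}}, \qquad \frac{250 - 100}{150 - 100} = \frac{150}{50} = 3.$$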
I strongly disagree with a couple of claims:
MIRI's business model relies on the opposite narrative. MIRI pays Eliezer Yudkowsky $600,000 a year. It pays Nate Soares $235,000 a year. If they suddenly said that the risk of human extinction from AGI or superintelligence is extremely low, in all likelihood that money would dry up and Yudkowsky and Soares would be out of a job.
[...] The kind of work MIRI is doing and the kind of experience Yudkowsky and Soares have isn't really transferable to anything else.
However, there are things I agree with:
If the Mechanize co-founders wanted to focus on safety rather than capabilities, they could.
the Mechanize co-founders decided to start the company after forming their views on AI safety.
The Yudkowsky/Soares/MIRI argument about AI alignment is specifically that an AGI's goals and motivations are highly likely to be completely alien to human goals and motivations, in a way that poses existential danger.
See the GPT-5 report. "Working lower bound" is maybe too strong; it might be more accurate to describe it as an initial guess at a warning threshold for rogue replication and 10x uplift (if we can even measure time horizons that long). I don't know the exact reasoning behind 40 hours, but one fact is that humans can't really start viable companies using plans that take only ~a week of work. IMO, if AIs could do the equivalent with only a 40-human-hour time horizon and continuously evade detection, they'd need to be exploiting their own advantages and have made up for many of their current disadvantages relative to humans (like being bad at adversarial and multi-agent settings).
What scale is the METR benchmark on? I see a note that "Scores are normalized such that 100% represents a 50% success rate on tasks requiring 8 human-expert hours," but is the 0% point on the scale 0 hours?
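To make concrete what I'm asking (a toy sketch under my own assumptions, not METR's documented method; the function names and the one-minute floor are hypothetical), the quoted sentence is consistent with at least two scales:

```python
import math

# Two illustrative readings of the normalization (my assumptions, not
# METR's documented method); both pin an 8-hour time horizon to 100%.

def linear_score(horizon_hours: float) -> float:
    # Reading 1: score linear in raw hours, so the 0% point is 0 hours.
    return 100 * horizon_hours / 8

def log_score(horizon_hours: float, h0_hours: float = 1 / 60) -> float:
    # Reading 2: score linear in log(hours), with 0% at a hypothetical
    # floor h0 (one minute here, an arbitrary placeholder).
    return 100 * math.log(horizon_hours / h0_hours) / math.log(8 / h0_hours)

# A 1-hour horizon scores 12.5% under reading 1 but ~66% under reading 2.
print(linear_score(1), log_score(1))
```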
METR does not think that 8 human hours is sufficient autonomy for takeover; in fact 40 hours is our working lower bound.
What if we decide that the Amazon rainforest has a negative WAW (wild animal welfare) sign? Would you be in favor of completely replacing it with a parking lot, if doing so could be done without undue suffering of the animals that already exist there?
Definitely not completely replacing it, because biodiversity has diminishing returns to land. If we pave the whole Amazon we'll probably drive entire families extinct (not to mention we'd probably cause ecological crises elsewhere, disrupt ecosystem services, etc.), whereas on the margin we'd only drive extinct the species endemic to the deforested regions.
If the research on WAW comes out super negative, I could imagine it being OK to replace half the Amazon with higher-welfare ecosystems now, and work on replacing the rest when some crazy AI tech allows all changes to be fully reversible. But the moral parliament would probably still not be happy about this. E.g., killing is probably bad, and there is no feasible way to destroy half the Amazon in the near term without killing most of the animals in it.
It's plausible to me that biodiversity is valuable, but with AGI on the horizon it seems a lot cheaper in expectation to do more out-there interventions, like influencing AI companies to care about biodiversity (alongside wild animal welfare), recording the DNA of undiscovered rainforest species about to go extinct, and buying the cheapest land possible (middle of Siberia or Australian desert, not productive farmland). Then, when the technology is available in a few decades and we're better at constructing stable ecosystems de novo, we can terraform the deserts into highly biodiverse nature preserves. Another advantage of this is that we'll know more about animal welfare by then; as it stands now, the sign of habitat preservation is pretty unclear.
Thanks for the reply.
[1] Or just whether they grew up vegetarian, the way people are often disgusted by unfamiliar foods
On a global scale I agree. My point is more that, given salary standards in the industry, Eliezer isn't necessarily out of line in drawing $600k, and it's probably not much more than he could earn elsewhere; the financial incentive is therefore fairly weak compared to the incentive at Mechanize or other AI capabilities companies.