Claude 3.7's coding ability forced me to reanalyze whether there will be a SWE job for me after college. That pushed me to re-explore AI safety and its arguments, and I have been re-radicalized towards the safety movement.

What I can’t understand, though, is how contradictory so much of Effective Altruism (EA) feels. It hurts my head, and I want to explore my thoughts in this post.

EA seems far too friendly toward AGI labs and feels completely uncalibrated to the actual existential risk (from an EA perspective) and the probability of catastrophe from AGI (p(doom)). Why aren’t we publicly shaming AI researchers every day? Are we too unwilling to be negative in our pursuit of reducing the chance of doom? Why are we friendly with Anthropic? Anthropic actively accelerates the frontier, currently holds the best coding model, and explicitly aims to build AGI—yet somehow, EAs rally behind them? I’m sure almost everyone agrees that Anthropic could contribute to existential risk, so why do they get a pass? Do we think their AGI is less likely to kill everyone than that of other companies? If so, is this just another utilitarian calculus that we accept even if some worlds lead to EA engineers causing doom themselves? What is going on...

I suspect that many in the AI safety community avoid adopting the "AI doomer" label. I also think that many AI safety advocates quietly hope to one day work at Anthropic or other labs and will not publicly denounce a future employer.

Another possibility is that Open Philanthropy (OP) plays a role. Their former CEO now works at Anthropic, and they have personal ties to its co-founder. Given that most of the AI safety community is funded by OP, could there be financial incentives pushing the field more toward research than toward advocacy against AI labs? This is just a suspicion, and I don’t have high confidence in it, but I’m looking for opinions.

Spending time in the EA community does not calibrate me to the urgency of AI doomerism or the necessary actions that should follow. Watching For Humanity’s AI Risk Special documentary made me feel far more emotionally in tune with p(doom) and AGI timelines than engaging with EA spaces ever has. EA feels like business as usual when it absolutely should not. More than 700 people attended EAG, most of whom accept X-risk arguments, yet AI protests in San Francisco still draw fewer than 50 people. I bet most of those protesters aren’t even EAs.

What are we doing?

I’m looking for discussion. Please let me know what you think.

Comments (18)

I appreciate the concern that you (and clearly many other Forum users) have, and I do empathise. Still, I'd like to present a somewhat different perspective to others here.

EA seems far too friendly toward AGI labs and feels completely uncalibrated to the actual existential risk (from an EA perspective)

I think that this implicitly assumes that there is such a thing as "an EA perspective", but I don't think this is a useful abstraction. EA has many different strands, and in general it seems a lot more fractured post-FTX.

e.g. You ask "Why aren’t we publicly shaming AI researchers every day?", but if you're an AI-sceptical EA working in GH&D, that seems entirely useless to your goals! If you take 'we' to mean all EAs already convinced of AI doom, then that's assuming the conclusion: whether there is an action-significant amount of doom is precisely the question here.

Why are we friendly with Anthropic? Anthropic actively accelerates the frontier, currently holds the best coding model, and explicitly aims to build AGI—yet somehow, EAs rally behind them? I’m sure almost everyone agrees that Anthropic could contribute to existential risk, so why do they get a pass? Do we think their AGI is less likely to kill everyone than that of other companies?

Anthropic's alignment strategy, at least the publicly facing version, is found here.[1] I think Chris Olah's tweets about it, found here, include one particularly useful chart:

https://cdn.xcancel.com/pic/AEB4CA8B293FF/media%2FFyCIR2paYAEKIEL.jpg%3Fname%3Dsmall%26format%3Dwebp

The probable cruxes here are that 'Anthropic', or various employees there, are much more optimistic about the difficulty of AI safety than you are. They also likely believe that empirical feedback from actual frontier models is crucial to a successful science of AI Safety. I think if you hold these two beliefs, then working at Anthropic makes a lot more sense from an AI Safety perspective.

For the record, the more technical work I've done, and the more understanding I have of AI systems as they exist today, the more 'alignment optimistic' I've become, and the more skeptical I've grown of OG-MIRI-style alignment work, or of AI Safety work done in the absence of actual models. We must have contact with reality to make progress,[2] and I think the AI Safety field cannot update on this point strongly enough. Beren Millidge has really influenced my thinking here, and I'd recommend reading Alignment Needs Empirical Evidence and other blog posts of his to get this perspective (which I suspect many people at Anthropic share).

Pushing the frontier of model performance also isn't a priori bad, especially if you don't accept MIRI-style arguments. Like, I don't see Sonnet 3.7 as increasing the risk of extinction from AI. In fact, it seems to be both a highly capable model and one that's very well aligned according to Anthropic's HHH criteria. All of my experience using Claude and engaging with the research literature about the model has pushed my distribution over AI Safety difficulty towards the 'Steam Engine' level in the chart above, instead of the P vs NP/Impossible level.

Spending time in the EA community does not calibrate me to the urgency of AI doomerism or the necessary actions that should follow

Finally, on the 'necessary actions' point: even if we had a clear empirical understanding of what the current p(doom) is, there are no clear necessary actions. There are still lots of arguments to be had here! See, for example, Matthew Barnett arguing in these comments that one can make utilitarian arguments for AI acceleration even in the presence of AI risk,[3] or Nora Belrose arguing that pause-style policies will likely be net-negative. You don't have to agree with either of these, but they do mean that there aren't clear 'necessary actions', at least from my PoV.

  1. ^

    Of course, if one has completely lost trust in Anthropic as an actor, then this isn't useful information to you at all. But I think that's conceptually a separate problem, because I think I have given information that answers the questions you raise, perhaps not to your satisfaction.

  2. ^

    Theory will only take you so far

  3. ^

    Though this isn't what motivates Anthropic's thinking afaik

  4. ^

    To the extent that word captures the classic 'single superintelligent model' form of risk


"Why aren’t we publicly shaming AI researchers every day? Are we too unwilling to be negative in our pursuit of reducing the chance of doom? Why are we friendly with Anthropic? Anthropic actively accelerates the frontier, currently holds the best coding model, and explicitly aims to build AGI—yet somehow, EAs rally behind them? I’m sure almost everyone agrees that Anthropic could contribute to existential risk, so why do they get a pass? Do we think their AGI is less likely to kill everyone than that of other companies? If so, is this just another utilitarian calculus that we accept even if some worlds lead to EA engineers causing doom themselves? What is going on..."

Preface: I have no skin in this game and no inside knowledge; this is just from reading the forums for a few years plus some chats.

I think you've put this well. Yes, I think many people believe Anthropic is more likely than other labs not to kill us all, which is why you'll still see their jobs advertised on the forum and why big EA people like Holden Karnofsky have joined their team.

There are a lot of people who will agree with you that we should be fighting and shaming, not pandering (see PauseAI), along with a lot of people who won't. There's certainly a (perhaps healthy?) split within the effective altruism community between those who think we should work technically on the "inside" towards safety and those who think we should simply oppose the labs.

Personally I think there's some "sunk cost" fallacy here. After Open Phil pumped all that money into OpenAI, many EAs joined the safety teams of labs and there was a huge push towards getting EAs doing technical safety research. After all that, it might now feel very hard to turn around and oppose the labs.

I also think that the general demeanor of many EAs is perhaps bent towards quiet research, policy, and technical work rather than protest and loud public criticism, which also pushes against protest being a core EA contribution to AI safety.

I don't think you're alone at all. EY and other prominent rationalists (like LW webmaster Habryka) have been saying for quite a while already that they believe EA has been net-negative for human survival; EleutherAI's Connor Leahy has recently released the strongly EA-critical Compendium, which has been praised by many leading longtermists, particularly FLI's Max Tegmark; and Anthropic's recent antics, like calling for recursive self-improvement to beat China, are definitely souring a lot of previously unconvinced people in those spaces on OP. From personal conversations, I can tell you PauseAI in particular is increasingly hostile to EA leadership.

I don't think Eliezer Yudkowsky and the rationalists should be throwing stones here. Sam Altman himself claimed that "eliezer has IMO done more to accelerate AGI than anyone else".  They've spent decades trying to convince people of the miraculous powers of AI, and now are acting shocked that this motivated people to try and build it. 

Well, they're not claiming the moral high ground; they can consistently say that EA has been net negative while having been net negative for human survival themselves.

Yeah, IIRC EY does consider himself to have been net-negative overall so far, hence the whole "death with dignity" spiral. But I don't think one can claim his role has been more negative than OPP/GV deciding to bankroll OpenAI and Anthropic (at least when setting aside the indirect consequences of his having influenced the development of EA in the first place).

I'm admittedly unusual within the EA community on the issue of AI, but I'll just give my thoughts on why I don't think it's productive to shame people who work at AI companies advancing AI capabilities. 

In my view, there are two competing ethical priorities that I think we should try to balance:

  1. Making sure that AI is developed safely and responsibly, so that AIs don't harm humans in the future.
  2. Making sure that AI is developed quickly, in order to take advantage of the enormous economic and technological benefits of AI sooner in time. This would, among other things, enable us to save lives by hastening AI-assisted medical progress.

If you believe that AI safety (priority 1) is the only meaningful ethical concern and that accelerating AI progress (priority 2) has little or no value in comparison, then it makes sense why you might view AI companies like Anthropic as harmful. From that perspective, any effort to advance AI capabilities could be seen as inherently trading off against an inviolable goal.

However, if you think—as I do—that both priorities matter substantially, then what companies like Anthropic are doing seems quite positive. They are not simply pushing forward AI development; rather, they are working to advance AI while also trying to ensure that it is developed in a safe and responsible way.

This kind of balancing act isn’t unusual. In most industries, we typically don’t perceive safety and usefulness as inherently opposed to each other. Rather, we usually recognize that both technological progress and safe development are important objectives to push for.

Reaping the benefits of AGI later is pretty insignificant in my opinion. If we get aligned AGI utopia, we will have utopia for millions of years. Acceleration by a few years is negligible if it increases p(doom) by >1%.

1% × 1 million utopia years = 10 thousand utopia years (better than 2 utopia years)
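To spell out the expected-value comparison behind this arithmetic (using the purely illustrative numbers above: 2 utopia years gained by accelerating, 1 million utopia years at stake, and a 1% increase in p(doom); none of these are anyone's actual estimates):

\[
\Delta \mathbb{E}[\text{utopia years}] \;\approx\; \underbrace{2}_{\text{years gained by acceleration}} \;-\; \underbrace{0.01 \times 1{,}000{,}000}_{\text{expected loss from }+1\%\ p(\text{doom})} \;=\; -9{,}998.
\]

Under these made-up numbers, acceleration trades away roughly 10,000 expected utopia years to gain 2.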

Dario gives a 25% p(doom), if I'm not mistaken. He still continues to build tech that, by his own admission, could bring doom. According to a LW'er, Dario and Anthropic are pro-acceleration via their messaging and actions. How is this position coherent?

I don't think you can name another company that admits to building technology with a >1% chance of killing everyone... besides maybe OpenAI.

Reaping the benefits of AGI later is pretty insignificant in my opinion. If we get aligned AGI utopia, we will have utopia for millions of years. Acceleration by a few years is negligible if it increases p(doom) by >1%.

This is not necessarily true, depending on what you think AGI utopia will look like. There's some math outlined in What We Owe the Future about this dilemma, i.e. the area under the curve of these hypothetical AGI utility functions.

Getting utopia 1 year faster creates a 2x better universe (hypothetically).

I was reluctant to get into the weeds here, but how can anything near this model be possible if 2^300 is around how many atoms there are in the universe and we have already conquered 2^150 of them? At some point there will likely be no more growing, and then there will be millions of stable utopia years.
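One way to make this concrete (a rough sketch with illustrative numbers of my own, not the commenter's): suppose value doubles each year for roughly 150 more years (the gap between 2^150 and 2^300 above), hits a resource ceiling \(V_{\max}\), and then stays flat for \(T \approx 10^{6}\) years of stable utopia. Then

\[
\text{Total value} \;\approx\; \underbrace{\sum_{t=0}^{150} 2^{t}}_{\approx\, 2V_{\max}\ \text{(growth phase)}} \;+\; \underbrace{V_{\max}\,(T - 150)}_{\text{stable phase}} \;\approx\; V_{\max}\,T,
\]

so arriving one year earlier adds at most about one extra year at \(V_{\max}\): a relative gain of roughly \(1/T \approx 0.0001\%\), not the 2x suggested by an unbounded-growth model.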

I think the benefits of AGI arriving sooner are substantial. Many of my family members, for example, could be spared from death or serious illness if advanced AI accelerates medical progress. However, if AGI is delayed for many years, they will likely die before such breakthroughs occur, leaving me to live without them. 

I'm not making a strictly selfish argument here either, since this situation isn't unique to me—most people have loved ones in similar circumstances. Therefore, speeding up the benefits of AGI would have substantial ethical value from a perspective that values the lives of all humans who are alive today.

A moral point of view in which we give substantial weight to people who exist right now is indeed one of the most common ethical frameworks applied to policy. This may even be the most common mainstream ethical framework, as it's implicit in most economic and political analysis. So I don't think I'm proposing a crazy ethical theory here—just an unusual one within EA.

To clarify, I’m not arguing that AI should always be accelerated at any cost. Instead, I think we should carefully balance between pushing for faster progress and ensuring AI safety. If you either (1) believe that p(doom) is low, or (2) doubt that delaying AGI would meaningfully reduce p(doom), then it makes a lot of sense—under many common ethical perspectives—to view Anthropic as a force for good.

I see your point.

In the interest of the people alive today, there is an argument to be made for taking on a risk of extinction. However, unless one takes a purely utilitarian view, I think it's extremely careless and condemnable to impose this risk on humanity just because you have personally deemed it acceptable. This would be a deontological nightmare. Who gave AI labs the right to risk the lives of 8 billion people?

I think it's extremely careless and condemnable to impose this risk on humanity just because you have personally deemed it acceptable.

I'm not sure I fully understand this criticism. From a moral subjectivist perspective, all moral decisions are ultimately based on what individuals personally deem acceptable. If you're suggesting that there is an objective moral standard—something external to individual preferences—that we are obligated to follow, then I would understand your point. 

That said, I’m personally skeptical that such an objective morality exists. And even if it did, I don’t see why I should necessarily follow it if I could instead act according to my own moral preferences—especially if I find my own preferences to be more humane and sensible than the objective morality.

This would be a deontological nightmare. Who gave AI labs the right to risk the lives of 8 billion people?

I see why a deontologist might find accelerating AI troublesome, especially given their emphasis on act-omission asymmetry—the idea that actively causing harm is worse than merely allowing harm to happen. However, I don’t personally find that distinction very compelling, especially in this context. 

I'm also not a deontologist: I approach these questions from a consequentialist perspective. My personal ethics can be described as a mix of personal attachments and broader utilitarian concerns. In other words, I both care about people who currently exist, and more generally about all morally relevant beings. So while I understand why this argument might resonate with others, it doesn’t carry much weight for me.

Oh I see. I was quick to bifurcate between deontology and utilitarianism; I guess I'm less familiar with other branches of consequentialism. Sorry for being unclear in the critique. My whole reply was just centered around this being bad deontologically.

That makes sense. For what it’s worth, I’m also not convinced that delaying AI is the right choice from a purely utilitarian perspective. I think there are reasonable arguments on both sides. My most recent post touches on this topic, so it might be worth reading for a better understanding of where I stand.

Right now, my stance is to withhold strong judgment on whether accelerating AI is harmful on net from a utilitarian point of view. It's not that I think a case can't be made: it's just that I don't think the existing arguments are decisive enough to justify a firm position. In contrast, the argument that accelerating AI benefits people who currently exist seems significantly more straightforward and compelling to me.

This combination of views leads me to see accelerating AI as a morally acceptable choice (as long as it's paired with adequate safety measures). Put simply:

  • When I consider the well-being of people who currently exist, the case for acceleration appears fairly strong and compelling.
  • When I take an impartial utilitarian perspective—one that prioritizes long-term outcomes for all sentient beings—the arguments for delaying AI seem weak and highly uncertain.

Since I give substantial weight to both perspectives, the stronger and clearer case for acceleration (based on the interests of people alive today) outweighs the much weaker and more uncertain case for delay (based on speculative long-term utilitarian concerns) in my view.

Of course, my analysis here doesn’t apply to someone who gives almost no moral weight to the well-being of people alive today—someone who, for instance, would be fine with everyone dying horribly if it meant even a tiny increase in the probability of a better outcome for the galaxy a billion years from now. But in my view, this type of moral calculus, if taken very seriously, seems highly unstable and untethered from practical considerations. 

Since I think we have very little reliable insight into what actions today will lead to a genuinely better world millions of years down the line, it seems wise to exercise caution and try to avoid overconfidence about whether delaying AI is good or bad on the basis of its very long-term effects.

On getting a software job in the age of AI tools, I tried to collect some thoughts here.

Similar-ish discussions about Anthropic keep coming up; see a recent one here. (I think it would be better to discuss these things in a central place, since parts of the conversation repeat themselves. I don't think a central place currently exists, but maybe you'd prefer merging over opening a new one.)

Minor question: What's the For Humanity’s AI Safety Special documentary? Is there a way to watch it already? I can't find it online.
