TL;DR
I estimate the expected impact of an additional early-career AI safety researcher by combining assumptions about AI risk, tractability, counterfactual replaceability, and the population at stake, expressing the result in GiveWell-equivalent terms. Under what I believe are very conservative inputs, the estimate is on the order of a few million dollars per year in equivalent donations. The results are very sensitive to some unknown parameters, though. Access the calculator/model used here (tweakable to a wide range of beliefs and judgments).
Written for readers not necessarily familiar with EA, so some basic concepts and orgs are explained in more detail than is typical on the Forum.
The Question (Almost) Nobody Calculates
“AI safety is important” is something EAs say a lot. But how important, exactly? Like, in dollars? That’s the question I faced a month ago, and I’ve just finished answering it (at least enough to convince myself).
Background: I’m a freshman at Penn. I have to decide what to do with my career in the next few years. “Work on AI safety” kept coming up, but I couldn’t find an expected value calculation for how much good one more safety researcher actually generates.* Probably because it’s too hard and any attempt ends up wishy-washy. Nevertheless, I sat down with Opus 4.6 and tried to actually calculate it.
I made every assumption as modest as possible. I anchored to the most skeptical credible forecasters I could find. I pressure-tested (nearly) every parameter. Even after all that, the answer was about $5M in GiveWell-equivalent donations per year.
I built a calculator[1] where you can plug in your own assumptions and see what you get. The rest of this post walks through the logic behind it.
*The one prior attempt I found is @Jordan Taylor's “Expected impact of a career in AI safety under different opinions” (2022). Taylor’s estimates are more optimistic (they count a lot of future people). Mine is deliberately skeptical, trying to (1) establish a floor for undergrads / early-career EAs considering AI safety, and (2) make the case to people who aren’t already convinced (most of my college friends). The idea for (1) and this entire project was inspired by William MacAskill’s description of “lower-bound reasoning” (chapter 6 of Doing Good Better) to help decide between two careers. For context, my other contender is entrepreneurship with earning-to-give.
How to Calculate E(AI Safety Career)?
The expected impact of one AI safety career, measured in GiveWell-equivalent dollars, is:
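Schematically, it's a product of the factors below (each gets its own section; the exact cell-level formulas live in the spreadsheet):

E(impact in GiveWell $) ≈ P(AI catastrophe) × (field's relative risk reduction) × (your share of the field's quality-adjusted output) × (counterfactual weight) × (people at stake) × ($ cost to save a life via GiveWell)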
Each piece has sub-components. The calculator lets you tweak all of them. I’ll walk through the important ones.
How Likely Is AI Catastrophe?
This is the parameter with the widest disagreement, and the one that matters most.
The Existential Risk Persuasion Tournament (Karger et al. 2023) brought together 89 superforecasters — people with strong track records on real-world prediction questions — and 80 domain experts, then had them deliberate and give final estimates.[2] Superforecasters landed at 0.38% chance of AI-caused extinction by 2100. Domain experts landed at 3%. (This was the largest disagreement in the entire tournament.)
Meanwhile, a survey of 2,778 AI researchers who’d published at NeurIPS or ICML gave a median of 5% and a mean of ~9%.[3]
The calculator defaults to 0.75% — roughly where Samotsvety, a high-performing forecasting group, would land based on their AGI timelines. That’s between the superforecasters and the domain experts. You can change it.
When Does Your Career Window End?
Here’s something important to note: your career window ends when AGI arrives, because after that, either AI solves everything (and your career is moot) or it doesn’t go well (we all die).
The calculator uses your chosen AGI timeline to set this automatically. Pick “Samotsvety” and your career ends around 2033. Pick “Metaculus” and it’s similar. Pick the superforecaster timeline and you get until ~2040.
For a freshman graduating in May 2029 with a Samotsvety anchor, that’s about 3.6 working years.
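The window arithmetic is simple (a sketch; the exact anchor date is a dropdown in the calculator):

```python
# Working years from graduation to an AGI anchor (dates are illustrative).
grad = 2029 + 5 / 12   # May 2029 as a decimal year
agi = 2033.0           # Samotsvety-style anchor year
print(round(agi - grad, 1))  # ~3.6 working years
```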
How Much Does Safety Research Actually Help?
This is the parameter I’m least confident about. Nobody has measured it empirically. The only framework I found: assume some percentage of relative risk reduction per doubling of cumulative research effort.[4] One EA Forum commenter suggested 5-10% per doubling. The calculator defaults to 5%.
Whether the entire field of ~1,100 researchers[5] reduces AI catastrophe risk by 0.03 or 3 percentage points depends almost entirely on this one assumption. If you believe the research being done today is useless (against future AI systems), the number is near zero. If you believe a few key breakthroughs may change the trajectory, then it’s higher.
I can’t resolve this. Neither can anyone else right now (I think). So for now, I’ll have to make it up. The calculator lets you try different values.
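To make the per-doubling assumption concrete, here is a minimal sketch of one way to operationalize it (my reading of the framework; the calculator's exact cells may differ):

```python
# "5% relative risk reduction per doubling of cumulative research effort":
# after d doublings, risk is scaled by (1 - r)^d.
p0 = 0.0075   # baseline P(AI catastrophe), the Samotsvety-anchored default
r = 0.05      # relative risk reduction per doubling

def risk_after(doublings: float) -> float:
    """Remaining catastrophe risk after some doublings of cumulative effort."""
    return p0 * (1 - r) ** doublings

print(risk_after(1))  # one full doubling: 0.0075 -> ~0.007125
```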
How Productive Are You?
A fresh graduate doesn’t produce as much research as a senior researcher. Academic data gives us a baseline: at Norwegian universities, the bottom 50% of researchers produce 15% of total output.[6] That’s a per-person average of 0.30x the field mean. Juniors cluster in this bottom half. The boundary between the top and bottom half sits around 0.65-0.70x.
The calculator uses these empirically derived multipliers (0.30x for your first two years, 0.70x for years 3-4, 0.80x for year 5+) and weights them by your working years to get quality-adjusted researcher-years.
With 3.6 years and the default multipliers, you produce about 1.7 quality-adjusted researcher-years out of a field total of roughly 15,000.
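A quick check of that arithmetic (a sketch; the year boundaries are my reading of the multipliers above):

```python
# Quality-adjusted researcher-years (QARY) over a 3.6-year window,
# using the default productivity multipliers.
working_years = 3.6
qary = 0.30 * min(working_years, 2)                  # years 1-2 at 0.30x
qary += 0.70 * max(0.0, min(working_years, 4) - 2)   # years 3-4 at 0.70x
qary += 0.80 * max(0.0, working_years - 4)           # year 5+ at 0.80x
print(round(qary, 2))  # ~1.72
```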
What About the Counterfactual?
If you don’t take this role, does someone equally good fill it? If yes, your contribution was zero regardless of how important the work is.
The answer is somewhere between “definitely yes” and “definitely no” (note to self: no sh*t, Sherlock).
Evidence for the answer being less than “definitely yes”: MATS (a major AI safety training program) reports that fellow applications are growing 1.8x per year, but actual deployed research talent is only growing 1.25x per year.[7]
The calculator defaults to what I believe is a modest 30% (a judgment call, not a derivation from anything). I couldn’t find any empirical measurements of this. In fact, I’m not sure how it could ever be measured.
How Many People Are at Stake?
Not just the current 8 billion. If you prevent AI catastrophe, future generations exist too. But each future generation should be discounted by the probability it goes extinct from non-AI causes (think: nuclear war, pandemics, climate change).
The only peer-reviewed estimate of natural extinction risk is Snyder-Beattie et al. (2019, Nature Scientific Reports): less than 1-in-14,000 per year.[8] But that excludes anthropogenic (human-caused) risk. Domain experts put total non-AI risk at roughly 3% per century.[2]
Using a geometric series with 2% per-generation risk (deliberately pessimistic), 3.3 billion births per generation (UN data[9]), and 50% chance that AI catastrophe is truly permanent[10], the calculator arrives at about 90 billion (probability-weighted) people at stake.
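That figure roughly reproduces as a geometric series (my sketch; the calculator may treat the current generation or generation length slightly differently):

```python
# Probability-weighted people at stake across future generations.
births_per_gen = 3.3e9   # ~132M births/year over a ~25-year generation
p_gen_risk = 0.02        # per-generation non-AI extinction risk (pessimistic)
p_permanent = 0.50       # chance an AI catastrophe is truly unrecoverable

# sum over g of births_per_gen * (1 - p_gen_risk)^g = births_per_gen / p_gen_risk
expected_births = births_per_gen / p_gen_risk
at_stake = p_permanent * expected_births
print(f"{at_stake:.3g}")  # ~8.25e10, on the order of the quoted ~90 billion
```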
What Is a GiveWell Donation Actually Worth?
GiveWell’s top charities save a life for roughly $3,000-$5,500.[11] The calculator defaults to $4,000. But under short AI timelines, that $4,000 doesn’t really buy a full life.
There are three scenarios. The calculator groups them explicitly:
Scenario A: AGI goes well (28.5% chance with Samotsvety defaults). AI solves poverty, malaria, clean water (all of global poverty, basically) within a few years. The child you saved for $4,000 would have been saved anyway. Your donation was redundant. Maybe 10% of the value is retained (again, making numbers up), accounting for a deployment lag.
Scenario B: AI catastrophe (0.2% chance). Everyone dies or civilization collapses. The child you saved at age 2 dies at age ~4 in the catastrophe. You bought 2 years of life, not 58. That’s 2/58 = 3.1% of the value. This number is derived from your AGI timeline (change the timeline and it updates).
Scenario C: No AGI yet (71.3% chance). Business as usual. GiveWell retains full value.
Blended: a GiveWell donation is worth about 74% of its face value. The effective cost per life is roughly $5,400 instead of $4,000.
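The blend is just a probability-weighted average of the three scenarios (a sketch with the default numbers above):

```python
# Value of a GiveWell dollar, weighted across the three scenarios.
scenarios = [
    (0.285, 0.10),   # A: AGI goes well; ~10% of value retained
    (0.002, 0.031),  # B: AI catastrophe; 2/58 of the life's value retained
    (0.713, 1.00),   # C: no AGI yet; full value
]
blended = sum(p * v for p, v in scenarios)
print(f"{blended:.0%}")           # ~74%
print(f"${4000 / blended:,.0f}")  # effective cost per life, ~$5,394
```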
Note that this discount only applies to GiveWell donations. Direct AI safety work isn’t discounted by short timelines because if you reduce the probability of catastrophe, that value doesn’t depend on when AGI arrives.
The Results
With all the defaults (Samotsvety’s P(doom), Samotsvety AGI timeline, modest assumptions everywhere else), one marginal freshman pursuing an AI safety research career produces:
- $13.6 million in total career impact (GiveWell-equivalent)
To match that through GiveWell donations, you’d need to donate:
- $16.9 million as a lump sum at graduation (GiveWell discount is smallest here because AGI hasn’t arrived yet)
- $18.4 million as a lump sum at mid-career (discount is larger because AGI is more likely by then)
- $5.1 million per year on average
That last number is the bottom line. Can you donate an average of $5.1 million per year (inflation-adjusted, of course) to GiveWell from graduation until AGI? If not, AI safety research is the better career option in terms of expected value. (At least, that’s how I was thinking about it, because my other option for doing good was entrepreneurship + earning-to-give at scale.)
What If You’re More Skeptical?
Change the P(AI catastrophe) to the superforecaster level (0.38%) and everything scales down proportionally. The per-year threshold drops to roughly $2.5M per year. That’s still more than 99%+ of American adults could donate in a year.
Change the tractability parameter (risk reduction per doubling) to the very-skeptical 3% and it drops to $1M per year.
Try it yourself.[1] Every parameter is a dropdown with sourced options. The three scenarios in the GiveWell section adapt automatically when you change the AGI timeline. The sources sheet lists 22 cited works with URLs.
What Would It Take to Reject AI Safety?
You’d need to believe many** of these simultaneously:
- AI catastrophe risk is below 0.38% (lower than superforecasters)
- Safety research reduces risk by less than 3% per doubling (i.e. the entire field is nearly useless)
- Your counterfactual contribution is way below 30% (i.e. you’re highly replaceable)
- Non-AI extinction risk exceeds 2% per generation (i.e. civilization is fragile enough that solving alignment barely matters)
- AI catastrophe is recoverable most of the time
- You can donate ~$5M per year to GiveWell
Some are defensible; however, holding multiple of these at once is too confident a position (in my opinion).
**I’m too lazy to find the configuration with the smallest number of simultaneous beliefs.
I might be wrong about a lot of things here. I’m a freshman, not an expert. I used Claude extensively for the calculations and literature sourcing (the judgment calls and final reasoning are largely mine, though). If you have better numbers for any parameter, especially tractability (AI risk reduction per doubling of AI safety researchers) and per‑generation non-AI-caused extinction risk, please share them! Those two drive the estimate strongly, and neither has any empirical basis (at least, that I can find).
- ^
Calculator: https://docs.google.com/spreadsheets/d/1ucXLecZ1OA42I9pJh-xAUzr4ICUg2ccfvMbwj8D0bh4/edit?usp=sharing. All parameters are tweakable dropdowns. The sources sheet has 22 citations.
- ^
Karger et al. (2023), “Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament.” 89 superforecasters and 80 domain experts. Superforecasters: 0.38% AI extinction by 2100. Domain experts: 3%.
- ^
Grace et al. (2024), “Thousands of AI Authors on the Future of AI.” 2,778 researchers surveyed. Median: 5%. Mean: ~9%. 38-51% placed at least 10% on extinction-level outcomes.
- ^
The “relative risk reduction per doubling” framework comes from comments on Jordan Taylor’s EA Forum post (2022), “Expected impact of a career in AI safety under different opinions.”
- ^
EA Forum field growth analysis (September 2025). ~600 technical + ~500 non-technical AI safety FTEs across 113 organizations. Growing at ~21% per year on the technical side.
- ^
Kyvik (1989), “Productivity differences, fields of learning, and Lotka’s law.” Norwegian universities. “About 20% of the tenured faculty produce 50% of the total output, and the most prolific half of the researchers account for almost 85% of the output.” Bottom 50% average: 15%/50% = 0.30x field mean.
- ^
LessWrong (2025), “AI safety undervalues founders.”
- ^
Snyder-Beattie, Ord, and Bonsall (2019), “An upper bound for the background rate of human extinction.” Nature Scientific Reports. Natural extinction risk: “almost guaranteed to be less than one in 14,000 per year, and likely to be less than one in 87,000.”
- ^
UN Department of Economic and Social Affairs (2024), World Population Prospects. ~132 million births per year currently.
- ^
RAND (2025), “Could AI Really Kill Off Humans?” Found that true human extinction is mechanistically very hard (e.g. AI-initiated nuclear war would probably not kill every human).
- ^
GiveWell (2024), “How Much Does It Cost to Save a Life?” Range: $3,000-$5,500. Note: they “generally expect the cost to save a life to increase over time.”

All of the difficulty here is having the sign of your impact be positive. It's very hard to end up neutral; e.g., if your work is just nonsense, it's negative because it's a distraction and attention sink. And it's quite easy to end up negative, for example, if you exaggerate the impact of your work and feed the hopium ecosystem that desperately touts any sign of progress.
When you mess with AI, whatever you do will of course outweigh any other impacts of your life. It's having the sign end up positive that is the hard part.
This is a nice analysis and I wish more people would do things like this. A lot of the setup looks sensible to me. I thought it was clever how you used non-AI x-risk to determine the total number of people affected by AI x-risk, since this doesn't get you unwieldy Astronomical Waste-style numbers, but it also assigns positive value to future people. I'm not sure it's actually a good approach, but at least it's a clever idea.
It would be more useful to compare AI safety work vs. other longtermist interventions, since it's unlikely that donations to GiveWell would beat longtermist interventions from a longtermist POV. But I realize that would be a lot more work, and you've already put an admirable amount of work into this.
The biggest thing missing from the model is the possibility that safety research is net harmful. I believe much historical safety research ended up being harmful by making AI easier to commercialize and thus accelerating development (it would've been better for researchers to focus more on theoretical work that doesn't directly enable commercialization). I'm less sure about this but there may also be a replacement effect where empirical work on aligning current-gen models—which I don't think is very useful for aligning ASI—crowds out more important long-term work.
I agree that would be incredibly useful; maybe I'll do that next (20% chance). The same model can be used for pandemics and nuclear risk -- I'd just need to update (1) P(doom) for each, (2) tractability (for AI, that's the 'AI safety decreases risk by 7% per doubling of staff'), and (3) personal contribution. It could be a quick tool for anyone to realize how impactful longtermist careers are and, based on their beliefs about the world and their own ability, choose the career with the highest EV, though I'd only recommend acting on that comparison if the difference is quite large (my hunch is 5x or higher) given the uncertainty involved.
It would also force people to hold self-consistent beliefs. If there's a separate calculator for AI safety and one for biosecurity, someone could claim that non-[x-risk at hand] is much higher than [x-risk at hand] in each case, but that wouldn't be consistent across the two, because each [x-risk at hand] would factor into the other's non-[x-risk at hand]. In other words, it can be used as a tool to calibrate beliefs about existential risks (I think it would do that for me, at least).
This is quite interesting; I hadn't thought of this. Do you think it should be approximated as "% chance that AI safety is actually bad" and "increase in AI risk per doubling of staff"? e.g. it would look like this:
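Something like (a minimal formalization of those two parameters):

E(risk change per doubling of staff) = P(net harmful) × (risk increase per doubling) − (1 − P(net harmful)) × (risk reduction per doubling)

(negative values mean the work helps in expectation)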
Or is that too rudimentary, you think?
re: the latter, maybe you can get inspiration from RP's CCM > existential risk > "small-scale AI misalignment project" and check out the graphics below. Their default params are 96.4% chance no effect, 70% chance +ve outcome conditional on effect, +30% rise in p(extinction) conditional on -ve outcome, and you can change them and see how the EV updates; these defaults don't matter as much as the takeaway that AIS work needs to be robustly +ve and that folks whose risk aversion is greater than zero (probably wise) will do well to prioritise resolving this sign uncertainty, which boils down to Michael's advice above (cf. the advice to build deep models, or Dave Banerjee's advice more specifically).
I'm not sure. I would probably say that you shouldn't start a career in AI safety unless you can articulate a theory of why safety work has been harmful in the past, and how you're going to avoid more of the same. Building that theory is more important than adjusting the model inputs on a cost-effectiveness model.
Nice analysis!
If you were thinking about earning to give at scale, I hope you consider funding AI safety. Based on these calculations, it seems that in this model, a lot less than $1 million per year in funding for AI safety work would have more impact-in-expectation than $5 million per year to GiveWell.
This is interesting, and actually an under-discussed but important topic in the EA community.
However, I think you could compare direct work vs. donating to support AI safety research directly, rather than donating to GiveWell (which mainly focuses on improving global health), because for some people, donating to longtermist funds is much more effective than GiveWell.
I compared GiveWell to convince people who believe in global dev but are skeptical of AI risk. I could have skipped the explanation of why dollars to GiveWell should be discounted and instead said "Donate [smaller yet still big amount] to AI safety / longtermist solutions," which would be equivalent to "Donate $5M to GiveWell" (assuming my discounting is accurate), but I feared it would sound circular. Some people's natural response would be "I already don't believe AI risk is that big of a deal!" even though the two framings are logically equivalent.