TL;DR
I estimate the expected impact of an additional early-career AI safety researcher by combining assumptions about AI risk, tractability, counterfactual replaceability, and the population at stake, expressing the result in GiveWell-equivalent terms. Under what I believe are very conservative inputs, the estimate is on the order of a few million dollars per year in equivalent donations. The results are very sensitive to some unknown parameters, though. Access the calculator/model used here (tweakable to a wide range of beliefs and judgments).
Written for readers not necessarily familiar with EA, so some basic concepts and orgs are explained in more detail than is typical on the Forum.
The Question (Almost) Nobody Calculates
“AI safety is important” is something EAs say a lot. But how important, exactly? Like, in dollars? That’s the question I faced a month ago, and I’ve just finished answering it (at least enough to convince myself).
Background: I’m a freshman at Penn. I have to decide what to do with my career in the next few years. “Work on AI safety” kept coming up, but I couldn’t find an expected value calculation for how much good one more safety researcher actually generates.* Probably because it’s too hard and any attempt ends up wishy-washy. Nevertheless, I sat down with Opus 4.6 and tried to actually calculate it.
I made every assumption as modest as possible. I anchored to the most skeptical credible forecasters I could find. I pressure-tested (nearly) every parameter. Even after all that, the answer was about $5M in GiveWell-equivalent donations per year.
I built a calculator[1] where you can plug in your own assumptions and see what you get. The rest of this post walks through the logic behind it.
*The one prior attempt I found is @Jordan Taylor's “Expected impact of a career in AI safety under different opinions” (2022). Taylor’s estimates are more optimistic (they count a lot of future people). Mine is deliberately skeptical, trying to (1) establish a floor for undergrads / early-career EAs considering AI safety, and (2) make the case to people who aren’t already convinced (most of my college friends). The idea for (1) and this entire project was inspired by William MacAskill’s description of “lower-bound reasoning” (chapter 6 of Doing Good Better) to help decide between two careers. For context, my other contender is entrepreneurship with earning-to-give.
How to Calculate E(AI Safety Career)?
The expected impact of one AI safety career, measured in GiveWell-equivalent dollars, is:
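Schematically, it's a product of the factors below (each gets its own section; the exact cell-level formulas live in the spreadsheet):

E(impact in GiveWell $) ≈ P(AI catastrophe) × (field's relative risk reduction) × (your share of the field's quality-adjusted output) × (counterfactual weight) × (people at stake) × ($ cost to save a life via GiveWell)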
Each piece has sub-components. The calculator lets you tweak all of them. I’ll walk through the important ones.
How Likely Is AI Catastrophe?
This is the parameter with the widest disagreement, and the one that matters most.
The Existential Risk Persuasion Tournament (Karger et al. 2023) brought together 89 superforecasters — people with strong track records on real-world prediction questions — and 80 domain experts, then had them deliberate and give final estimates.[2] Superforecasters landed at 0.38% chance of AI-caused extinction by 2100. Domain experts landed at 3%. (This was the largest disagreement in the entire tournament.)
Meanwhile, a survey of 2,778 AI researchers who’d published at NeurIPS or ICML gave a median of 5% and a mean of ~9%.[3]
The calculator defaults to 0.75% — roughly where Samotsvety, a high-performing forecasting group, would land based on their AGI timelines. That’s between the superforecasters and the domain experts. You can change it.
When Does Your Career Window End?
Here’s something important to note: your career window ends when AGI arrives, because after that, either AI solves everything (and your career is moot) or it doesn’t go well (we all die).
The calculator uses your chosen AGI timeline to set this automatically. Pick “Samotsvety” and your career ends around 2033. Pick “Metaculus” and it’s similar. Pick the superforecaster timeline and you get until ~2040.
For a freshman graduating in May 2029 with a Samotsvety anchor, that’s about 3.6 working years.
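The window arithmetic is simple (a sketch; the exact anchor date is a dropdown in the calculator):

```python
# Working years from graduation to an AGI anchor (dates are illustrative).
grad = 2029 + 5 / 12   # May 2029 as a decimal year
agi = 2033.0           # Samotsvety-style anchor year
print(round(agi - grad, 1))  # ~3.6 working years
```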
How Much Does Safety Research Actually Help?
This is the parameter I’m least confident about. Nobody has measured it empirically. The only framework I found: assume some percentage of relative risk reduction per doubling of cumulative research effort.[4] One EA Forum commenter suggested 5-10% per doubling. The calculator defaults to 5%.
Whether the entire field of ~1,100 researchers[5] reduces AI catastrophe risk by 0.03 or 3 percentage points depends almost entirely on this one assumption. If you believe the research being done today is useless (against future AI systems), the number is near zero. If you believe a few key breakthroughs may change the trajectory, then it’s higher.
I can’t resolve this. Neither can anyone else right now (I think). So for now, I’ll have to make it up. The calculator lets you try different values.
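To make the per-doubling assumption concrete, here is a minimal sketch of one way to operationalize it (my reading of the framework; the calculator's exact cells may differ):

```python
# "5% relative risk reduction per doubling of cumulative research effort":
# after d doublings, risk is scaled by (1 - r)^d.
p0 = 0.0075   # baseline P(AI catastrophe), the Samotsvety-anchored default
r = 0.05      # relative risk reduction per doubling

def risk_after(doublings: float) -> float:
    """Remaining catastrophe risk after some doublings of cumulative effort."""
    return p0 * (1 - r) ** doublings

print(risk_after(1))  # one full doubling: 0.0075 -> ~0.007125
```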
How Productive Are You?
A fresh graduate doesn’t produce as much research as a senior researcher. Academic data gives us a baseline: at Norwegian universities, the bottom 50% of researchers produce 15% of total output.[6] That’s a per-person average of 0.30x the field mean. Juniors cluster in this bottom half. The boundary between the top and bottom half sits around 0.65-0.70x.
The calculator uses these empirically derived multipliers (0.30x for your first two years, 0.70x for years 3-4, 0.80x for year 5+) and weights them by your working years to get quality-adjusted researcher-years.
With 3.6 years and the default multipliers, you produce about 1.7 quality-adjusted researcher-years out of a field total of roughly 15,000.
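A quick check of that arithmetic (a sketch; the year boundaries are my reading of the multipliers above):

```python
# Quality-adjusted researcher-years (QARY) over a 3.6-year window,
# using the default productivity multipliers.
working_years = 3.6
qary = 0.30 * min(working_years, 2)                  # years 1-2 at 0.30x
qary += 0.70 * max(0.0, min(working_years, 4) - 2)   # years 3-4 at 0.70x
qary += 0.80 * max(0.0, working_years - 4)           # year 5+ at 0.80x
print(round(qary, 2))  # ~1.72
```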
What About the Counterfactual?
If you don’t take this role, does someone equally good fill it? If yes, your contribution was zero regardless of how important the work is.
The answer is somewhere between “definitely yes” and “definitely no” (note to self: no sh*t, Sherlock).
Evidence for the answer being less than “definitely yes”: MATS (a major AI safety training program) reports that fellow applications are growing 1.8x per year, but actual deployed research talent is only growing 1.25x per year.[7]
The calculator defaults to what I believe is a modest 30% (a judgment call, not a derivation from anything). I couldn’t find any empirical measurements of this. In fact, I’m not sure how it could ever be measured.
How Many People Are at Stake?
Not just the current 8 billion. If you prevent AI catastrophe, future generations exist too. But each future generation should be discounted by the probability it goes extinct from non-AI causes (think: nuclear war, pandemics, climate change).
The only peer-reviewed estimate of natural extinction risk is Snyder-Beattie et al. (2019, Nature Scientific Reports): less than 1-in-14,000 per year.[8] But that excludes anthropogenic (human-caused) risk. Domain experts put total non-AI risk at roughly 3% per century.[2]
Using a geometric series with 2% per-generation risk (deliberately pessimistic), 3.3 billion births per generation (UN data[9]), and 50% chance that AI catastrophe is truly permanent[10], the calculator arrives at about 90 billion (probability-weighted) people at stake.
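That figure roughly reproduces as a geometric series (my sketch; the calculator may treat the current generation or generation length slightly differently):

```python
# Probability-weighted people at stake across future generations.
births_per_gen = 3.3e9   # ~132M births/year over a ~25-year generation
p_gen_risk = 0.02        # per-generation non-AI extinction risk (pessimistic)
p_permanent = 0.50       # chance an AI catastrophe is truly unrecoverable

# sum over g of births_per_gen * (1 - p_gen_risk)^g = births_per_gen / p_gen_risk
expected_births = births_per_gen / p_gen_risk
at_stake = p_permanent * expected_births
print(f"{at_stake:.3g}")  # ~8.25e10, on the order of the quoted ~90 billion
```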
What Is a GiveWell Donation Actually Worth?
GiveWell’s top charities save a life for roughly $3,000-$5,500.[11] The calculator defaults to $4,000. But under short AI timelines, that $4,000 doesn’t really buy a full life.
There are three scenarios. The calculator groups them explicitly:
Scenario A: AGI goes well (28.5% chance with Samotsvety defaults). AI solves poverty, malaria, clean water (all of global poverty, basically) within a few years. The child you saved for $4,000 would have been saved anyway. Your donation was redundant. Maybe 10% of the value is retained (again, making numbers up), accounting for a deployment lag.
Scenario B: AI catastrophe (0.2% chance). Everyone dies or civilization collapses. The child you saved at age 2 dies at age ~4 in the catastrophe. You bought 2 years of life, not 58. That’s 2/58 = 3.1% of the value. This number is derived from your AGI timeline (change the timeline and it updates).
Scenario C: No AGI yet (71.3% chance). Business as usual. GiveWell retains full value.
Blended: a GiveWell donation is worth about 74% of its face value. The effective cost per life is roughly $5,400 instead of $4,000.
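The blend is just a probability-weighted average of the three scenarios (a sketch with the default numbers above):

```python
# Value of a GiveWell dollar, weighted across the three scenarios.
scenarios = [
    (0.285, 0.10),   # A: AGI goes well; ~10% of value retained
    (0.002, 0.031),  # B: AI catastrophe; 2/58 of the life's value retained
    (0.713, 1.00),   # C: no AGI yet; full value
]
blended = sum(p * v for p, v in scenarios)
print(f"{blended:.0%}")           # ~74%
print(f"${4000 / blended:,.0f}")  # effective cost per life, ~$5,394
```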
Note that this discount only applies to GiveWell donations. Direct AI safety work isn’t discounted by short timelines because if you reduce the probability of catastrophe, that value doesn’t depend on when AGI arrives.
The Results
With all the defaults (Samotsvety’s P(doom), Samotsvety AGI timeline, modest assumptions everywhere else), one marginal freshman pursuing an AI safety research career produces:
- $13.6 million in total career impact (GiveWell-equivalent)
To match that through GiveWell donations, you’d need to donate:
- $16.9 million as a lump sum at graduation (GiveWell discount is smallest here because AGI hasn’t arrived yet)
- $18.4 million as a lump sum at mid-career (discount is larger because AGI is more likely by then)
- $5.1 million per year on average
That last number is the bottom line. Can you donate an average of $5.1 million per year (inflation-adjusted, of course) to GiveWell from graduation until AGI? If not, AI safety research is the better career option in terms of expected value. (At least, that’s how I was thinking about it, because my other option for doing good was entrepreneurship + earning-to-give at scale.)
What If You’re More Skeptical?
Change the P(AI catastrophe) to the superforecaster level (0.38%) and everything scales down proportionally. The per-year threshold drops to roughly $2.5M per year. That’s still more than 99%+ of American adults could donate in a year.
Change the tractability parameter (risk reduction per doubling) to the very-skeptical 3% and it drops to $1M per year.
Try it yourself.[1] Every parameter is a dropdown with sourced options. The three scenarios in the GiveWell section adapt automatically when you change the AGI timeline. The sources sheet lists 22 cited works with URLs.
What Would It Take to Reject AI Safety?
You’d need to believe many** of these simultaneously:
- AI catastrophe risk is below 0.38% (lower than superforecasters)
- Safety research reduces risk by less than 3% per doubling (i.e. the entire field is nearly useless)
- Your counterfactual contribution is way below 30% (i.e. you’re highly replaceable)
- Non-AI extinction risk exceeds 2% per generation (i.e. civilization is fragile enough that solving alignment barely matters)
- AI catastrophe is recoverable most of the time
- You can donate ~$5M per year to GiveWell
Some are defensible; however, holding multiple of these at once is too confident a position (in my opinion).
**I’m too lazy to find the configuration with the smallest number of simultaneous beliefs.
I might be wrong about a lot of things here. I’m a freshman, not an expert. I used Claude extensively for the calculations and literature sourcing (the judgment calls and final reasoning are largely mine, though). If you have better numbers for any parameter, especially tractability (AI risk reduction per doubling of AI safety researchers) and per‑generation non-AI-caused extinction risk, please share them! Those two drive the estimate strongly, and neither has any empirical basis (at least, that I can find).
- ^
Calculator: https://docs.google.com/spreadsheets/d/1ucXLecZ1OA42I9pJh-xAUzr4ICUg2ccfvMbwj8D0bh4/edit?usp=sharing. All parameters are tweakable dropdowns. The sources sheet has 22 citations.
- ^
Karger et al. (2023), “Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament.” 89 superforecasters and 80 domain experts. Superforecasters: 0.38% AI extinction by 2100. Domain experts: 3%.
- ^
Grace et al. (2024), “Thousands of AI Authors on the Future of AI.” 2,778 researchers surveyed. Median: 5%. Mean: ~9%. 38-51% placed at least 10% on extinction-level outcomes.
- ^
The “relative risk reduction per doubling” framework comes from comments on Jordan Taylor’s EA Forum post (2022), “Expected impact of a career in AI safety under different opinions.”
- ^
EA Forum field growth analysis (September 2025). ~600 technical + ~500 non-technical AI safety FTEs across 113 organizations. Growing at ~21% per year on the technical side.
- ^
Kyvik (1989), “Productivity differences, fields of learning, and Lotka’s law.” Norwegian universities. “About 20% of the tenured faculty produce 50% of the total output, and the most prolific half of the researchers account for almost 85% of the output.” Bottom 50% average: 15%/50% = 0.30x field mean.
- ^
LessWrong (2025), “AI safety undervalues founders.”
- ^
Snyder-Beattie, Ord, and Bonsall (2019), “An upper bound for the background rate of human extinction.” Nature Scientific Reports. Natural extinction risk: “almost guaranteed to be less than one in 14,000 per year, and likely to be less than one in 87,000.”
- ^
UN Department of Economic and Social Affairs (2024), World Population Prospects. ~132 million births per year currently.
- ^
RAND (2025), “Could AI Really Kill Off Humans?” Found that true human extinction is mechanistically very hard (e.g. AI-initiated nuclear war would probably not kill every human).
- ^
GiveWell (2024), “How Much Does It Cost to Save a Life?” Range: $3,000-$5,500. Note: they “generally expect the cost to save a life to increase over time.”

All of the difficulty here is having the sign of your impact be positive. It's very hard to end up neutral; e.g., if your work is just nonsense, it's negative because it's a distraction and attention sink. And it's quite easy to end up negative, for example, if you exaggerate the impact of your work and feed the hopium ecosystem that desperately touts any sign of progress.
When you mess with AI, whatever you do will of course outweigh any other impacts of your life. It's having the sign end up positive that is the hard part.
This is a nice analysis and I wish more people would do things like this. A lot of the setup looks sensible to me. I thought it was clever how you used non-AI x-risk to determine the total number of people affected by AI x-risk, since this doesn't get you unwieldy Astronomical Waste-style numbers, but it also assigns positive value to future people. I'm not sure it's actually a good approach, but at least it's a clever idea.
It would be more useful to compare AI safety work vs. other longtermist interventions, since it's unlikely that donations to GiveWell would beat longtermist interventions from a longtermist POV. But I realize that would be a lot more work, and you've already put an admirable amount of work into this.
The biggest thing missing from the model is the possibility that safety research is net harmful. I believe much historical safety research ended up being harmful by making AI easier to commercialize and thus accelerating development (it would've been better for researchers to focus more on theoretical work that doesn't directly enable commercialization). I'm less sure about this but there may also be a replacement effect where empirical work on aligning current-gen models—which I don't think is very useful for aligning ASI—crowds out more important long-term work.
I agree that would be incredibly useful; maybe I'll do that next (20% chance). The same model can be used for pandemics and nuclear risk -- I'd just need to update (1) P(doom) for each, (2) tractability (for AI, that's the 'AI safety decreases risk by 7% per doubling of staff'), and (3) personal contribution. It could be a quick tool for anyone to realize how impactful longtermist careers are and, based on their beliefs about the world and their own ability, choose the career with the highest EV, though I'd only recommend acting on that comparison if the difference is quite large (my hunch is 5x or higher) given the uncertainty involved.
It would also force people to hold self-consistent beliefs. If there's a separate calculator for AI safety and one for biosecurity, someone could claim that non-[x-risk at hand] is much higher than [x-risk at hand] in each case, but that wouldn't be consistent across the two, because each [x-risk at hand] would factor into the other's non-[x-risk at hand]. In other words, it can be used as a tool to calibrate beliefs about existential risks (I think it would do that for me, at least).
This is quite interesting; I hadn't thought of this. Do you think it should be approximated as "% chance that AI safety is actually bad" and "increase in AI risk per doubling of staff"? e.g. it would look like this:
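Something like (a minimal formalization of those two parameters):

E(risk change per doubling of staff) = P(net harmful) × (risk increase per doubling) − (1 − P(net harmful)) × (risk reduction per doubling)

(negative values mean the work helps in expectation)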
Or is that too rudimentary, you think?
re: the latter, maybe you can get inspiration from RP's CCM > existential risk > "small-scale AI misalignment project" and check out the graphics below. Their default params are 96.4% chance no effect, 70% chance +ve outcome conditional on effect, +30% rise in p(extinction) conditional on -ve outcome, and you can change them and see how the EV updates; these defaults don't matter as much as the takeaway that AIS work needs to be robustly +ve and that folks whose risk aversion is greater than zero (probably wise) will do well to prioritise resolving this sign uncertainty, which boils down to Michael's advice above (cf. the advice to build deep models, or Dave Banerjee's advice more specifically).
I'm not sure. I would probably say that you shouldn't start a career in AI safety unless you can articulate a theory of why safety work has been harmful in the past, and how you're going to avoid more of the same. Building that theory is more important than adjusting the model inputs on a cost-effectiveness model.
Nice analysis!
If you were thinking about earning to give at scale, I hope you consider funding AI safety. Based on these calculations, it seems that in this model, a lot less than $1 million per year in funding for AI safety work would have more impact-in-expectation than $5 million per year to GiveWell.
This is interesting, and actually an under-discussed but important topic in the EA community.
However, I think you could compare direct work vs. donating to support AI safety research directly, rather than donating to GiveWell (which mainly focuses on improving global health), because for some people, donating to longtermist funds is much more effective than GiveWell.
I compared GiveWell to convince people who believe in global dev but are skeptical of AI risk. I could have skipped the explanation of why dollars to GiveWell should be discounted and instead said "Donate [smaller yet still big amount] to AI safety / longtermist solutions," which would be equivalent to "Donate $5M to GiveWell" (assuming my discounting is accurate), but I feared it would sound circular. Some people's natural response would be "I already don't believe AI risk is that big of a deal!" even though the two framings are logically equivalent.