
tl;dr

AI career advice orgs, prominently 80,000 Hours, encourage career moves into AI safety roles, including mid‑career pivots. I analyse the quality of this advice from the private-decision, public-good, and counterfactual-equilibrium perspectives, and learn the following things:

  1. Rational Failure: If you value personal, direct impact highly, it can be rational to attempt a pivot that will probably fail (i.e. one with a low per-attempt success chance).
  2. Misalignment: The optimal success rate for the field (maximizing total impact) often differs from the optimal rate for individuals (maximizing personal EV). The advice ecosystem appears calibrated to neither.
  3. Counterfactual impact: The counterfactual value of a pivot can be lower than the naïve estimate suggests; in highly contested roles you need not only to be the best but to be the best by a wide enough margin to justify all the effort, and career pivots are negative EV for many plausible parameter ranges.
  4. Donations: If the EV of a pivot is uncertain, donating the attempt costs (e.g., the sabbatical expenses you were willing to pay) provides a guaranteed positive counterfactual impact.
  5. Opacity: To work out whether this advice is doing good, we would need data on both success rates and candidate quality distributions, which are currently unavailable.

The problem is acute for mid-career professionals with high switching costs. Mitigations, both personal and institutional, are proposed below, and an explorable widget is provided to help individuals evaluate their own pivots.

This is a draft for Draft Amnesty Week. Hopefully I get time to revisit this post and edit it for style.

Epistemic status

The analysis here should not be astonishing; we know advice calibration is hard, and we know that the collective and private goals might be misaligned. Grinding through the details nonetheless reveals some less obvious implications.

Here's a sketch of a toy model of the AI-safety career-advice economy. It explains how advice can honestly recommend gambles that mostly fail and why, without calibration data, we can't tell whether that failure rate is healthy (field-optimal) or wasteful (value-dissipating).

A field can rationally budget for many individual failures---but it owes its members transparency. If the field that studies alignment relies on opaque, misaligned mechanisms, it risks both dissipating value and weakening credibility.

For this note, an AI pivot means 'applying to roles in existing AI safety organizations'. (We return later to how alternatives may change the game.) This logic likely extends to other impact fields with constrained hiring pipelines, e.g., climate tech, biosecurity, global health, and so on.

Key Concepts:

  • We evaluate decisions based on Impact, measured in donation-equivalent dollars.
  • Private costs are the candidate's own losses (sabbatical spend, opportunity cost).
  • Public or social costs are losses at the field level.
  • Systemic waste is the inefficiency that occurs when miscalibrated entry dissipates value rather than merely transferring it.

Part A---Private decision model

Meet Alice

Alice is a senior software engineer in her mid-30s. She has been donating roughly 10% of her income to effective charities and now wonders whether to switch lanes entirely. She has saved six months of runway funds to explore AI-safety roles --- research engineering, governance, or technical coordination --- but each month out of work costs her real money and momentum. Her question is simple: Is this pivot worth the gamble? I will answer this question in the rest of this note.

An uncertain career pivot is a gamble, and as such we model it the usual way that we model gambles.

Suppose you have a stable job, and make donations to your impact causes, but you think you might be able to pivot into something more impactful --- say, AI safety research. You take an unpaid sabbatical to prepare and apply for one of the AI Safety jobs you've seen advertised, maybe at a lower wage but with higher impact. That is what we model here.

We assume that you have preferences over both remuneration and impact, and an estimate of your potential impact in the new role. To keep things simple, we don't deal with risk aversion or diminishing returns. We also ignore job satisfaction and other non-pecuniary benefits, but it is easy to add those in, or count them as part of the wage.

We evaluate the decision in donation‑equivalent dollars (after‑tax wage + donations + impact valuation). All flows are valued in present value (PV), discounted at a continuous rate $\delta$. We use a simple linear utility function (two dollars of take-home wage is twice as good as one, etc.)[1] and ignore risk aversion and non-pecuniary benefits (like job satisfaction) for simplicity.

Utility and Surplus:

  • Utility function $U = w + \theta\,(d + v)$, where $w$ is take-home wage, $d$ is donations, $v$ is impact valuation, and $\theta$ is your weight on (impact + donations) relative to personal consumption.
  • $U_0$ is your baseline annual utility (current role).
  • $U_1$ is your annual utility in the new role (if successful).
  • $\Delta U = U_1 - U_0$ is the per-year surplus if the pivot succeeds.

The Search Process:

A stream of jobs arrives in the inbox from the job board.

  • $p$ is the per‑application success probability (assumed i.i.d.).
  • $\lambda$ is the rate of job opportunities (opportunities/year), modeled as a Poisson process.
  • $\ell$ is the maximum sabbatical length (the "runway" in years).

Caveat: the i.i.d. assumption (each job is independent of the others) is strong, and likely optimistic. In practice, applications are correlated: if you are the best fit for one job, you are likely a good fit for others.

Costs and Discounting:

  • $c$ is the burn rate while on sabbatical (k$/year, donation-equivalent), capturing all net opportunity costs (foregone pay, progression, etc.).
  • $\delta$ is the continuous discount rate (per year). You can approximate $\delta \approx 1/T$, where $T$ is your "effective horizon" (e.g., your AGI timeline).

We assume that if you succeed, you start immediately; if you fail after time $\ell$, you return to your old job (which feels optimistic in this economy, so beware).

Modeling the Sabbatical

We want to calculate the expected value (EV) of this gamble. Since successes arrive at rate $\lambda p$, the probability of success within the runway is $P_{\mathrm{succ}} = 1 - e^{-\lambda p \ell}$.

With discounting at rate $\delta$, the expected present value (PV) of attempting the pivot is (derived in Appendix A):

$$\mathrm{EV}(p) = \frac{1 - e^{-(\lambda p + \delta)\ell}}{\lambda p + \delta}\left[\frac{\lambda p\,\Delta U}{\delta} - c\right].$$

Intuition: the bracket compares the discounted expected gain rate $\lambda p\,\Delta U/\delta$ to the burn rate $c$. The prefactor is the discounted expected duration of the sabbatical "clock".

The sign of the EV is determined by the bracketed term. Thus, the break-even per-application success probability (where EV = 0) is

$$p_0 = \frac{c\,\delta}{\lambda\,\Delta U}.$$

All else being equal we can say:

  • A higher burn rate $c$ or higher discount rate $\delta$ increases $p_0$ (making it harder for us to break even).
  • A higher application rate $\lambda$ or higher upside $\Delta U$ decreases $p_0$.
  • $p_0$ is independent of the runway $\ell$ (though $\ell$ still scales the level of the EV).

Another sanity check: when $\ell \to \infty$, $\mathrm{EV}(p) \to \frac{1}{\lambda p + \delta}\left[\frac{\lambda p\,\Delta U}{\delta} - c\right]$, so the same closed-form threshold $p_0$ drops out immediately.

  1. Valuation sensitivity: $p_0$ is highly sensitive to the discount rate $\delta$ and the upside $\Delta U$. Overestimating impact or underestimating wage cuts quickly leads to negative EV. Estimating impact is also highly uncertain, potentially making EV calculations fragile.
  2. Correlated applications: Job applications are not i.i.d. The effective number of independent shots is lower than the raw application count, reducing the effective $\lambda$.
  3. Estimating $p$: The missing variable is the individual's $p$. Career advisors rarely provide the data needed to estimate this.
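
If you want to sanity-check these formulas yourself, here is a minimal Python sketch of the closed-form EV and the break-even threshold. The function names and the demo parameter values are placeholders of mine, not the linked calculator's code or Alice's exact numbers.

```python
import math

def pivot_ev(p, lam, delta, ell, dU, c):
    """Expected PV of attempting the pivot (closed form above).

    p     : per-application success probability
    lam   : arrival rate of job opportunities (per year)
    delta : continuous discount rate (per year)
    ell   : maximum sabbatical length / runway (years)
    dU    : annual utility surplus if the pivot succeeds (k$/year, donation-equivalent)
    c     : burn rate during the sabbatical (k$/year)
    """
    rate = lam * p + delta
    clock = (1.0 - math.exp(-rate * ell)) / rate   # discounted expected sabbatical duration
    bracket = lam * p * dU / delta - c             # expected gain rate minus burn rate
    return clock * bracket

def break_even_p(lam, delta, dU, c):
    """Per-application success probability p0 at which the EV is zero."""
    return c * delta / (lam * dU)

# Purely illustrative numbers (not Alice's calibration):
if __name__ == "__main__":
    lam, delta, ell, dU, c = 8.0, 0.1, 0.5, 40.0, 50.0
    p0 = break_even_p(lam, delta, dU, c)
    print(f"p0 = {p0:.3f}")
    for p in (0.5 * p0, p0, 2.0 * p0):
        print(f"p = {p:.3f}: EV = {pivot_ev(p, lam, delta, ell, dU, c):+.1f} k$")
```

Running it with your own $\lambda$, $\delta$, $\ell$, $\Delta U$, and $c$ reproduces the qualitative behaviour described above: the EV switches sign exactly at $p_0$.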

Worked example

Now, let's plug in some plausible representative numbers for Alice, our mid‑career technical professional considering a pivot into an AI safety role in a developed economy. She is a successful software engineer taking home a comfortable wage, donating roughly 10% of it post tax, and having no on-the-job impact (i.e. no net harm, no net good). A target role offers a lower wage but a higher impact valuation, so the surplus $\Delta U$ is positive. Set the runway to $\ell = 0.5$ years, pick a plausible application rate $\lambda$ and discount rate $\delta$, and take the burn rate to be $c = 50$k\$/year (so $c\ell = 25$k\$ over the six months). From these we can compute the break-even probability $p_0$, the chance of at least one success within the runway, the expected sabbatical length, and the expected length conditional on success. Under these assumptions the sabbatical is expected to roughly break even, because the job offers enough upside to compensate for a greater-than-even risk of failure.

We plot Alice's EV against $p$ for a few different upsides $\Delta U$, to visualize the trade-offs.

If you wish to play around with the assumptions check out the interactive Pivot EV Calculator (source at danmackinlay/career_pivot_calculator).

For Alice, the threshold probability $p_0$ marks the point where a pivot breaks even; below that, she burns runway faster than she accrues expected value, and if she wants to end up making a satisfactory impact, she might consider donating money instead. If a 6‑month pivot fails, Alice burns $c\ell = \$25$k that could have been donated now to AI‑safety orgs, and she would have been better off donating that money directly.

Part B --- Field-level model

tl;dr

In a world with a heavy-tailed distribution of candidate impact, the field benefits from many attempts because a few "hits" dominate. In light-tailed worlds, the same encouragement becomes destructive. We simply don't know which world we're in.

That was Alice's private decision. Let us now consider the field-level perspective. What if everyone has Alice's decision rule? Does that help the world at large? For this we need to add some heterogeneity in candidate quality.

From personal gambles to field strategy

Our goal is to move beyond Alice's private break-even ($p_0$) to calculate the field's optimum applicant pool size ($K^\star$). This tells us how many Alices society can afford to have 'roll the dice' before the costs of the failures outweigh the value of the discoveries.

We switch from the private ledger of Part A to a public ledger measured in pure impact dollars.

  • Private cost (Part A): $c$ includes the individual's full opportunity cost during the sabbatical (foregone wages, consumption, career progression).
  • Public cost (Part B): $c_s$ captures only the foregone public good during the sabbatical (baseline impact, donations, and externalities). The private wage hit is sad for the candidate but not for the field.

When an individual fails a pivot, they bear the private cost $c$ per year of sabbatical. The field bears the public cost $c_s$, which captures the fact that people who give up earning to switch into AI safety are, among other things, donating less in the meantime.

From a field-level perspective, wages and private consumption are transfers and drop out of the analysis; what matters is the net production of public goods, i.e. all that impact we are notionally delivering by working on AI safety.

For this to be anything but trivial, we have to include some way of capturing heterogeneity in candidate quality, otherwise everyone is identical and only one person should ever apply for any job. The minimally complicated way to do this is to assume that each candidate has a different true impact in the role, drawn i.i.d. from some distribution $F$.

  • We fix the number of roles per year at $M$ and just consider the per-year scale. From Alice's perspective the roles were arriving randomly; from the field's perspective, each year there is a fixed number of seats $M$, and we want to fill them with the best candidates.
  • Each applicant has a true per‑year impact in the role, $X_i \sim F$ i.i.d. (measured in impact dollars per year).
  • Employers perfectly observe $X_i$ (a strong assumption) and hire the top $M$ candidates.
  • Applicants do not observe their own $X_i$.
  • From a public-goods perspective, each failed pivot costs the field the foregone public impact during the sabbatical, captured by the social burn rate $c_s$.
  • We switch from a time-discounted flow to a steady-state annual perspective; if we do it in continuous time there is lots of annoying book-keeping around sabbatical durations and so on which obscures the main point, so there is a hidden fudge factor here which I am suppressing to keep the algebra simple.

The intuition here is that candidates can bring different capacities for impact, and the field benefits from having more candidates to choose from. The capacity would depend on the role, and include stuff like mathematical ability, interpersonal skills, grit, prestige, access to funding, or strategic personal connections etc.

Let the field‑side annual impact of the hires be the sum of the impact of the top $M$ candidates (we are assuming the employer picks the best perfectly),

$$B(K) = \mathbb{E}\!\left[\sum_{j=1}^{M} X_{(j)}\right],$$

where $X_{(1)} \ge X_{(2)} \ge \dots \ge X_{(K)}$ are the order statistics of the $K$ applicants. We care about the expected total impact of these best candidates, and in particular its marginal value each time we add one extra applicant,

$$\Delta(K) = B(K) - B(K-1).$$

From a public-good perspective, one failed attempt costs the field $c_s$ per year in foregone public impact. Define the social burn rate during the sabbatical as $c_s = v_0 + d_0 + e$, where $v_0$ is the candidate's baseline on-the-job impact per year (often ≈ 0), $d_0$ is their baseline donations per year, and $e$ captures any externalities you choose to include (e.g., reviewer time, cohort displacement). Most of a candidate's private consumption loss is a transfer and drops out of the public ledger; what matters socially is the public good that didn't happen during the sabbatical.

The present value of foregone public impact for one failed attempt is

$$C_s = c_s\,\frac{1 - e^{-\delta_s \ell}}{\delta_s},$$

where $\delta_s$ is the social discount rate (which we can set equal to $\delta$ for simplicity, though they need not be identical). Under the sabbatical-then-stop rule, all unsuccessful candidates impose this cost on the field.

We do not model employer congestion from reviewing lots of applicants --- on the rationale that it is empirically small because employers stop looking at candidates when they're overwhelmed [@Horton2021Jobseekers]. Note, however, that we have also claimed that employers perfectly observe $X_i$, which means that we are being optimistic about the field's ability to sort candidates. The social benefit in the model comes from the fact that, all else being equal, more applicants mean a better top‑$M$, and we have more effective researchers doing their research.

  1. With $M$ seats fixed, adding one more applicant increases the expected number of failures by exactly 1. Hence the marginal social cost of one extra applicant is $C_s$, the expected foregone public impact per failed candidate.

  2. Whether adding one applicant is socially worthwhile depends on how the marginal benefit $\Delta(K)/\delta_s$ compares to $C_s$.

How much better does one extra applicant make the top-$M$? Extreme value theory tells us this depends on the tail of the candidate quality distribution $F$. The shape of the tail determines how quickly the marginal value from adding candidates diminishes as the pool grows.

Tail distributions and diminishing returns

We consider two families: light-tailed (Exponential) and heavy-tailed (Fréchet). Heavy tails capture the intuition that "unicorn" candidates (e.g., 10x engineers) exist and might be waiting in the career-pivot recruiting pool. The exponential family is a light-tailed distribution that decays quickly, and is also super easy to work with, so I've kept it in to check my working on the heavy-tailed case, which is not at all easy to work with.

Light tails: If $F$ is Exponential with mean $\mu$, the marginal value decays hyperbolically: $\Delta(K) = \mu M / K$. The intuition is that returns diminish quickly; applicants are relatively similar.

Heavy tails: If $F$ is Fréchet with shape $\alpha > 1$ (where lower $\alpha$ means a heavier tail), the marginal value decays much more slowly, as a power law: $\Delta(K) \propto K^{1/\alpha - 1}$. If $\alpha$ is close to 1 (very heavy tail), $\Delta(K)$ decays extremely slowly, justifying a wide search for unicorns. As $\alpha$ increases, we recover the light-tailed behaviour.

Field-level optima

Once there are enough people for roles (when $K > M$), adding more applicants ("widening the recruiting funnel") improves the hired cohort on average, because some of the new entrants might be stronger. At the same time, each candidate burning sabbatical time is one more failed attempt that burns public-good resources. We compare these in present-value impact dollars, using a social discount rate $\delta_s$ (which we can set equal to $\delta$ from Part A for simplicity, though they need not be identical). This rate reflects both time preference and any expected shortening of future impact streams (e.g. job changes, role turnover, project endings).

A filled role produces a stream of impact over time which we will discount, even though we are filling the roles in annualized buckets. The total present value of \$1k/year of annual impact is $1\text{k}/\delta$ for discount rate $\delta$.

The total welfare curve $W(K) = B(K)/\delta_s - (K - M)\,C_s$ rises up to a maximum and then declines.

The break-even condition $\Delta(K)/\delta_s = C_s$ therefore defines the welfare-maximizing pool size $K^\star$. $K^\star$ coincides (up to discretization) with the point where the gradient of $W(K)$ changes sign.

Substituting $C_s = c_s\,(1 - e^{-\delta_s \ell})/\delta_s$ and canceling from both sides gives:

$$\Delta(K^\star) = c_s\left(1 - e^{-\delta_s \ell}\right).$$

The $1/\delta_s$ factors cancel, so the threshold depends mainly on $c_s$ and $\ell$ (via the factor $1 - e^{-\delta_s \ell}$).

Solving for the welfare-maximizing pool size $K^\star$ gives:

  • Exponential ($F = \mathrm{Exp}$, mean $\mu$ in impact dollars/year): $K^\star_{\mathrm{exp}} = \dfrac{\mu M}{c_s\left(1 - e^{-\delta_s \ell}\right)}$.

  • Fréchet (shape $\alpha > 1$, scale $s$): with $\Delta(K) \approx \dfrac{C_{M,\alpha}\, s}{\alpha}\, K^{1/\alpha - 1}$, $K^\star_{\mathrm{Fr}} \approx \left(\dfrac{C_{M,\alpha}\, s}{\alpha\, c_s\left(1 - e^{-\delta_s \ell}\right)}\right)^{\frac{\alpha}{\alpha - 1}}$. Here $s$ is the Fréchet scale parameter (units of impact per year) and $C_{M,\alpha}$ is a constant depending only on $M$ and $\alpha$.

With light tails, there's a finite pool size after which turning up the hype (growing $K$) destroys net welfare. Every extra applicant burns $C_s$ in foregone public impact while adding a marginal benefit that shrinks like $1/K$.

With heavy tails, it's different: as the tail gets heavier ($\alpha \to 1$), the exponent $\alpha/(\alpha - 1)$ grows and $K^\star$ explodes; with very heavy tails, very wide funnels can still be net positive. We might want, as a society, to widen the recruiting funnel relative to the light-tailed case, even if the average candidate quality is the same; it is worth spending a lot of money to find the unicorns.

Here $c_s$ is in impact dollars/year, $1/\delta_s$ converts annual impact to present value, and $C_s$ is in impact dollars (PV).

  • Exponential:

    • $\Delta(K) = \mu M / K$.
    • $K^\star = \dfrac{\mu M}{c_s\left(1 - e^{-\delta_s \ell}\right)}$.
  • Fréchet ($\alpha > 1$):

    • $\Delta(K) \approx \dfrac{C_{M,\alpha}\, s}{\alpha}\, K^{1/\alpha - 1}$.
    • There is no exact form for $K^\star$, but the large-$K$ asymptotic gives $K^\star \approx \left(\dfrac{C_{M,\alpha}\, s}{\alpha\, c_s\left(1 - e^{-\delta_s \ell}\right)}\right)^{\frac{\alpha}{\alpha - 1}}$.
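
Here is a small Python sketch of these field-level quantities, under the stated assumptions. The Fréchet constant is computed as $C_{M,\alpha} = \sum_{j=1}^{M} \Gamma(j - 1/\alpha)/\Gamma(j)$, which is one standard large-$K$ choice from extreme value theory; that choice, the function names, and the demo numbers are my own assumptions rather than the post's exact calibration.

```python
import math

def frechet_constant(M, alpha):
    """C_{M,alpha} = sum_{j=1..M} Gamma(j - 1/alpha) / Gamma(j) (one standard EVT choice)."""
    return sum(math.gamma(j - 1.0 / alpha) / math.gamma(j) for j in range(1, M + 1))

def marginal_value_exp(K, M, mu):
    """Delta(K) for Exponential(mean mu) talent: mu * M / K."""
    return mu * M / K

def marginal_value_frechet(K, M, alpha, s):
    """Large-K approximation to Delta(K) for Frechet(shape alpha, scale s) talent."""
    return frechet_constant(M, alpha) * s / alpha * K ** (1.0 / alpha - 1.0)

def pv_failed_attempt(c_s, delta_s, ell):
    """C_s: present value of foregone public impact for one failed attempt."""
    return c_s * (1.0 - math.exp(-delta_s * ell)) / delta_s

def k_star_exp(M, mu, c_s, delta_s, ell):
    """Welfare-maximising pool size, light-tailed case."""
    return mu * M / (c_s * (1.0 - math.exp(-delta_s * ell)))

def k_star_frechet(M, alpha, s, c_s, delta_s, ell):
    """Welfare-maximising pool size, heavy-tailed case (large-K asymptotic)."""
    rhs = c_s * (1.0 - math.exp(-delta_s * ell))
    return (frechet_constant(M, alpha) * s / (alpha * rhs)) ** (alpha / (alpha - 1.0))

# Purely illustrative numbers: M roles, social burn c_s, runway ell, etc.
if __name__ == "__main__":
    M, c_s, delta_s, ell = 100, 18.0, 0.1, 0.5
    print(f"Exponential (mean 100): K* ~ {k_star_exp(M, 100.0, c_s, delta_s, ell):.3g}")
    for alpha in (3.0, 1.5, 1.2):
        ks = k_star_frechet(M, alpha, 50.0, c_s, delta_s, ell)
        print(f"Frechet alpha={alpha}, scale 50: K* ~ {ks:.3g}")
```

The qualitative pattern matches the text: the light-tailed $K^\star$ stays finite and moderate, while the Fréchet $K^\star$ blows up as $\alpha \to 1$.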

We set the expected impact per hire per year to match the expected impact of Alice's hypothetical target role, but this is just for exposition; you can change it.

We can, of course, plot this.

  • This plots total net welfare $W(K)$ and marks the maximum for each family, showing where total welfare peaks. The dashed line at $K = M$ shows where failures begin: $K - M$ people each impose public cost $C_s$. The markers show $K^\star$, the pool size where widening further would reduce total impact.
  • Units: $\Delta(K)$ is in impact dollars per year and is converted to PV by multiplying by $1/\delta_s$. The subtraction uses the discounted per-failure cost $C_s$.
  • Fréchet curves use the large-$K$ asymptotic. We could work harder to get exact values for the Fréchet case, but the asymptotic is good enough to illustrate the qualitative behaviour.
  • We treat all future uncertainties about role duration, turnover, or project lifespan as already captured in the overall discount rate $\delta_s$.

We can combine these perspectives to visualize the tension between private incentives and public welfare.

This visualization combines the private and public views by assuming an illustrative mapping between pool size and success probability, $p(K) \propto M/K$ (with a proportionality factor that bundles screening efficiency). The black curve (left axis) shows the candidate's private EV vs success probability $p$. The colored curves (right axis) show field welfare $W(K)$. The private break-even $p_0$ (black dashed line) can fall far to the left of the field-optimal point (colored vertical lines). This gap represents the region where individuals may be rationally incentivized to enter (at the individual level), even though the field is already saturated or oversaturated (at the field level).


Without evidence of heavy tails at your current applicant pool size $K$, widening the funnel likely increases social loss.

Part C --- Counterfactual Impact and Equilibrium

Part A modeled a "naive" applicant who evaluates their pivot based on the absolute impact of the role (the impact term in $\Delta U$), ignoring pool dynamics. Part B analyzed the field-level optimum ($K^\star$), showing how the marginal value ($\Delta(K)$) of an applicant decreases as the pool grows.

Now we tie those together. If Alice is sophisticated, and understands these dynamics and aims to maximize her counterfactual impact, will she make the same choice?

This changes the game, introducing a feedback loop where individual incentives depend on the crowd size ($K$) and the candidate quality distribution $F$.

The Counterfactual Impact Model

If we assume applicants cannot know their quality relative to the pool ex-ante (formally: they are exchangeable), the expected counterfactual impact of the decision to apply is exactly $\Delta(K)$ per year.

Alice should use this in her EV calculation. However, the private EV formula (Part A) requires the impact conditional on success, not the ex-ante expected impact due to her decision in isolation.

Let $v_{\mathrm{cf}}(K)$ be the expected counterfactual impact conditional on success. Let $P_{\mathrm{succ}}(K)$ be the probability of success given the pool size $K$. (In the static model of Part B, with $K$ applicants competing for $M$ slots, $P_{\mathrm{succ}}(K) = M/K$.) If the attempt fails, the counterfactual impact is zero.

We can derive the relationship (see Appendix) $\Delta(K) = P_{\mathrm{succ}}(K)\, v_{\mathrm{cf}}(K)$. Therefore, the impact conditional on success is $v_{\mathrm{cf}}(K) = \Delta(K)/P_{\mathrm{succ}}(K) = \frac{K}{M}\,\Delta(K)$. We recalibrate the private decision by defining the counterfactual private surplus, $\Delta U_{\mathrm{cf}}(K)$, replacing the naive absolute impact with the counterfactual estimate $v_{\mathrm{cf}}(K)$.

This changes the dynamics; previously the gamble's EV depended on the pool size only insofar as it affected the per-application success probability $p$ and thus the overall success probability. Now the value of the upside also depends on the pool size $K$. As $K$ grows, $P_{\mathrm{succ}}(K)$ decreases, but $\Delta(K)$ also decreases. The behavior of $v_{\mathrm{cf}}(K)$ depends on how these balance, which is determined by the tail of the impact distribution.
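
As a quick sketch of how $v_{\mathrm{cf}}(K) = (K/M)\,\Delta(K)$ behaves under the two tail assumptions (using the same hypothetical Fréchet constant as the earlier sketch; all names and numbers are illustrative):

```python
import math

def frechet_constant(M, alpha):
    return sum(math.gamma(j - 1.0 / alpha) / math.gamma(j) for j in range(1, M + 1))

def v_cf_exponential(K, M, mu):
    """(K/M) * Delta(K) = (K/M) * (mu*M/K) = mu: constant in K."""
    return mu

def v_cf_frechet(K, M, alpha, s):
    """(K/M) * Delta(K) ~ (C_{M,alpha}*s / (alpha*M)) * K**(1/alpha): grows with K."""
    return frechet_constant(M, alpha) * s / (alpha * M) * K ** (1.0 / alpha)

if __name__ == "__main__":
    M, mu, alpha, s = 100, 100.0, 1.5, 50.0
    for K in (200, 2_000, 20_000):
        print(K, v_cf_exponential(K, M, mu), round(v_cf_frechet(K, M, alpha, s), 1))
```

The exponential value stays flat at $\mu$ while the Fréchet value keeps growing with $K$, which is the whole story of Cases 1 and 2 below.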

The Dynamics of Counterfactual Impact

The behavior of $v_{\mathrm{cf}}(K)$ leads to different implications depending on whether the recruitment pool is light-tailed or heavy-tailed.

Case 1: Light Tails

In a light-tailed recruitment pool (where applicants are relatively similar), the math reveals (see Appendix) that the expected counterfactual impact conditional on success, $v_{\mathrm{cf}}(K)$, is constant and equal to the population average impact $\mu$, regardless of how crowded the field is. Intuition: While a larger pool increases the quality of the very best hire, it also increases the quality of the person they displace. In the stylized light-tailed model (Exponential distribution), these effects perfectly cancel out. More generally, in light-tailed talent pools, the gap between the hire and the displaced candidate does not grow significantly with the pool size.

Implication: If the average impact is modest and the candidate skills are relatively evenly distributed, pivots involving significant pay cuts are likely negative EV for the average applicant, no matter the pool size.

Case 2: Heavy Tails

In a heavy-tailed model, "unicorns" are hiding in the recruiting pool. Here, $v_{\mathrm{cf}}(K)$ increases as the field gets more crowded, and under certain assumptions can increase fast enough to offset the costs of all the sabbaticals and foregone donations. For Fréchet with shape $\alpha$, $v_{\mathrm{cf}}(K)$ grows proportionally to $K^{1/\alpha}$. Intuition: As $K$ increases, the expected quality of the top candidates increases significantly faster than the quality of the candidates they displace. Success in a large pool is a strong signal that you are likely a high-impact individual, and the gap between you and the displaced candidate is large.

Implication: In a heavy-tailed world, the pivot can become highly attractive if the field is sufficiently crowded, even with significant pay cuts.

Alice Revisited

With light‑tailed assumptions, $v_{\mathrm{cf}}$ equals the population mean $\mu$ and is too small to offset Alice's pay cut and lost donations---her counterfactual surplus is negative regardless of $K$. Under heavy‑tailed assumptions, $v_{\mathrm{cf}}(K)$ rises with $K$; under a broad range of conditions, the pivot can become attractive despite large pay cuts (i.e. if Alice truly might be a unicorn). The sign and size of this effect hinge on the tail parameter and scale, which are currently unmeasured.

Visualizing Private Incentives vs. Public Welfare

We can now visualize the dynamics of private, public and counterfactual private valuations together by assuming an illustrative mapping between pool size and success probability, $p(K) \propto M/K$. This allows us to see how the incentives change as the field gets more crowded (moving left on the x-axis).

This visualization combines all three perspectives using Alice's parameters. There are a lot of lines and a lot of assumptions wrapped up in this one plot. The main takeaway is that if you care about solving the problem, for many variants of this model, in a trade-off of pivoting versus donating, you should probably donate. The only exception is if you believe the talent pool is very heavy-tailed (Fréchet with $\alpha$ close to 1), in which case, if you are one of those unicorns, you should probably pivot. Otherwise, donating is likely to have higher expected impact.

  • Left Axis (Private EV):

    • A (Black Solid): The naive applicant's EV (Part A). Crosses zero at the naive break-even $p_0$.
    • C (Colored Dashed): The sophisticated applicant's EV (Part C), using the counterfactual impact $v_{\mathrm{cf}}(K)$. Where these curves cross zero defines the equilibrium $K_{\mathrm{eq}}$.
  • Right Axis (Public Welfare):

    • B (Colored Solid): The field's total welfare $W(K)$ (Part B). The peak defines the social optimum $K^\star$.
  1. The Information Gap (A vs C): The Naive EV (Black) is significantly higher than the Counterfactual EV (Colored Dashed) across most of the range. Applicants relying on naive valuations of the impact of a career pivot (using the personal impact change rather than the counterfactual impact change $v_{\mathrm{cf}}$) will drastically overestimate their counterfactual impact, and thus the expected value of the pivot.

  2. The Impact of Costs vs. Tails:

    • In light-tailed talent pools (Exponential, Purple; higher-$\alpha$ Fréchet, Red), the Counterfactual EV is always negative. Alice's 78k financial loss dominates the expected impact. The equilibrium pool $K_{\mathrm{eq}}$ is minimal, leading to Under-Entry relative to the optimum $K^\star$.
    • In heavy-tailed talent pools (lower-$\alpha$ Fréchet, Green and Orange), the dynamics change dramatically.
  3. Complex Dynamics in Heavy Tails (The "Hump Shape"): For heavy tails (Green and Orange dashed lines), the Counterfactual EV is non-monotonic. It starts positive, increases as $K$ grows (because $v_{\mathrm{cf}}(K)$ increases rapidly), and eventually decreases as the success probability drops too low.

  4. The Structural Misalignment (B vs C): In heavy-tailed talent pools, the equilibrium $K_{\mathrm{eq}}$ is vastly larger than the optimum $K^\star$. The efficient search process (high $\lambda$) means the private cost of trying is low, incentivizing entry long past the social optimum. This leads to massive over-entry. (For example, in one of the heavy-tailed cases plotted, $K^\star$ is around 30k while $K_{\mathrm{eq}}$ is over 400k.)

This visualization confirms the analysis: the system's calibration is highly sensitive to the tail distribution and private costs. Depending on the parameters, the system can structurally incentivize either severe under-entry or massive over-entry, even if applicants are sophisticated.

Equilibrium vs. Optimum

This feedback mechanism---where incentives depend on $K$---creates a natural equilibrium. Applicants will enter until the EV for the marginal entrant is zero. This defines the equilibrium candidate pool size, $K_{\mathrm{eq}}$.

To analyze this, we must reintegrate the counterfactual surplus into the dynamic search model (Part A). We assume the pool size $K$ determines the surplus $\Delta U_{\mathrm{cf}}(K)$ and the per-application success probability $p(K)$. The equilibrium occurs when the expected gain rate equals the burn rate (the bracketed term in the EV formula is zero):

$$\frac{\lambda\, p(K_{\mathrm{eq}})\, \Delta U_{\mathrm{cf}}(K_{\mathrm{eq}})}{\delta} = c.$$

Does this equilibrium align with the socially optimal pool size $K^\star$ (Part B)?

Generally, no. The alignment depends on the comparison between private costs ($c$), social costs ($c_s$), and the efficiency of the job search process ($\lambda$).

Search Efficiency

The equilibrium condition reveals a dependence on the application rate $\lambda$. We can rewrite the equilibrium condition as

$$p(K_{\mathrm{eq}})\,\Delta U_{\mathrm{cf}}(K_{\mathrm{eq}}) = \frac{c\,\delta}{\lambda}.$$

The left side is the expected counterfactual surplus per application attempt. The right side, $c\delta/\lambda$, represents the effective private cost hurdle per application attempt (scaled by the discount rate).

If the job search process is highly efficient (high $\lambda$), the private cost hurdle is low. This encourages individuals to enter even when the expected counterfactual impact per application is small, because trying is cheap.

We can compare this private incentive to the social optimum. As derived in Appendix C.3, if the private cost hurdle ($c\delta/\lambda$) is significantly lower than the social cost of failure (related to $c_s$), the system structurally leads to Over-Entry ($K_{\mathrm{eq}} > K^\star$).

Let's check Alice's numbers. Her private cost hurdle $c\delta/\lambda$ comes to roughly \$0.69k per application, while the social cost rate of failure (her foregone donations) is roughly \$18k per year.

Since $0.69\text{k} \ll 18\text{k}$, the system strongly favors over-entry. The efficiency of the search process dramatically lowers the private barrier to entry compared to the social costs incurred.

For example, in the heavy-tailed case, we might find $K_{\mathrm{eq}}$ an order of magnitude or more above $K^\star$, as in the 30k-versus-400k example above.
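
To make this concrete, here is a rough numerical sketch of the equilibrium condition. It assumes $p(K) = M/K$ and a deliberately crude counterfactual surplus, $\Delta U_{\mathrm{cf}}(K) = v_{\mathrm{cf}}(K)$ minus a fixed financial hit; the grid search, function names, and parameter values are mine, not the post's calibration.

```python
import math

def frechet_constant(M, alpha):
    return sum(math.gamma(j - 1.0 / alpha) / math.gamma(j) for j in range(1, M + 1))

def v_cf_frechet(K, M, alpha, s):
    """Counterfactual impact conditional on success, Frechet talent (large-K asymptotic)."""
    return frechet_constant(M, alpha) * s / (alpha * M) * K ** (1.0 / alpha)

def equilibrium_K(M, alpha, s, financial_hit, c, delta, lam, K_max=10**9):
    """Largest pool size at which the marginal entrant still breaks even, i.e.
    p(K) * dU_cf(K) >= c*delta/lam with p(K) = M/K and
    dU_cf(K) = v_cf(K) - financial_hit (a deliberately crude surplus model).
    Scans a geometric grid of K values; a sketch, not an exact solver."""
    hurdle = c * delta / lam
    K_eq, K = None, float(M + 1)
    while K <= K_max:
        dU_cf = v_cf_frechet(K, M, alpha, s) - financial_hit
        if (M / K) * dU_cf >= hurdle:
            K_eq = K
        K *= 1.05
    return K_eq

# The 78k financial hit and c*ell = 25k (so c = 50) echo figures in the text;
# the rest (M, alpha, s, delta, lam) are hypothetical placeholders.
if __name__ == "__main__":
    k_eq = equilibrium_K(M=100, alpha=1.5, s=50.0, financial_hit=78.0,
                         c=50.0, delta=0.1, lam=8.0)
    print(f"K_eq ~ {k_eq:.3g}")
```

With these placeholder numbers, $K_{\mathrm{eq}}$ comes out orders of magnitude above any plausible $K^\star$, echoing the over-entry result; the precise figures depend entirely on the assumed parameters.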

Implications and Solutions

The analysis suggests the AI safety field may be oversubscribed. The core problem is misalignment: organizations influencing the funnel size do not internalize the costs borne by unsuccessful applicants. This incentivizes maximizing application volume (a visible proxy) rather than welfare-maximizing matches---a classic setup for Goodhart's Law.

A healthy field can rationally accept high individual failure rates if it measures and communicates the odds. An unmeasured one turns the same logic into waste. The ethical burden shifts when the system knowingly asks people to take low-probability gambles without making that explicit.

For Individuals: Knowing the Game

For mid-career individuals, the decision is high-stakes. (For early-career individuals, costs are lower, making the gamble more favorable, but the need to estimate $p$ remains.)

  1. Calculate your threshold ($p_0$): Use the model in Part A (and the linked calculator). Without strong evidence that $p > p_0$, a pivot involving significant unpaid time is likely EV-negative.
  2. Seek cheap signals: Seek personalized evidence of fit---such as applying to a few roles before leaving your current job---before committing significant resources.
  3. Use grants as signals: Organizations like Open Philanthropy offer career transition grants. These serve as information gates. If received, a grant lowers the private cost ($c$). If denied, it is a valuable calibration signal. If a major funder declines to underwrite the transition, candidates should update downwards. (If you don't get that Open Phil transition grant, don't quit your current job.)

For Organizations: Transparency and Feedback

Employers and advice organizations control the information flow. Until they provide evidence-based estimates of success probabilities, their generic encouragement should be treated with skepticism.

  1. Publish stage-wise acceptance rates (Base Rates). Employers must publish historical data (applicants, interviews, offers) by track and seniority. This is the single most impactful intervention for anchoring $p$.
  2. Provide informative feedback and rank. Employers should provide standardized feedback or an indication of relative rank (e.g., "top quartile"). This feedback is costly, but this cost must be weighed against the significant systemic waste currently externalized onto applicants and the long-term credibility of the field.
  3. Track advice calibration. Advice organizations should track and publish their forecast calibration (e.g., Brier scores) regarding candidate success. If an advice organization does not track outcomes, its advice cannot be calibrated except by coincidence.

For the Field: Systemic Calibration

To optimize the funnel size, the field needs to measure costs and impact tails.

  1. Estimate applicant costs ($c$, $c_s$). Advice organizations or funders should survey applicants (successful and unsuccessful) to estimate typical pivot costs.
  2. Track realized impact proxies. Employers should analyze historical cohorts to determine if widening the funnel is still yielding significantly better hires, or if returns are rapidly diminishing.
  3. Experiment with mechanism design. In capacity-constrained rounds, implementing soft caps---pausing applications after a certain number---can reduce applicant-side waste without significantly harming match quality [@Horton2024Reducing].

Where next?

I'd like feedback from people deeper in the AI safety career ecosystem. I'd love to chat with people from 80,000 Hours, MATS, FHI, CHAI, Redwood Research, Anthropic, etc., about this. What is your model about the candidate impact distribution, the tail behaviour, and the costs? What have I got wrong? What have I missed? I'm open to the possibility that this is well understood and being actively managed behind the scenes, but I haven't seen it laid out this way anywhere.

Further reading

Resources that complement the mechanism-design view of the AI safety career ecosystem:

Appendix: Mathematical Details

Part A: Private Decision Model Derivations

We model the career pivot attempt as a continuous-time process during a sabbatical of maximum length $\ell$.

Setup:

  • Job opportunities arrive as a Poisson process with rate $\lambda$.
  • Per-application success probability is $p$ (i.i.d.).
  • The success process is a Poisson process with rate $\lambda p$.
  • The time to the first success is $\tau \sim \mathrm{Exp}(\lambda p)$.
  • The actual sabbatical duration is the stopping time $\min(\tau, \ell)$.
  • The continuous discount rate is $\delta$.
  • The annual utility surplus if the pivot succeeds is $\Delta U$.
  • The burn rate during the sabbatical is $c$.

Sabbatical Duration and Success Statistics

The probability of success within the runway is $P_{\mathrm{succ}} = P(\tau \le \ell) = 1 - e^{-\lambda p \ell}$. We calculate the expected duration $\mathbb{E}[\min(\tau, \ell)]$ using the survival function. For $t < \ell$, $\min(\tau, \ell) > t$ if and only if no success has occurred by time $t$, so $P(\min(\tau, \ell) > t) = e^{-\lambda p t}$ for $t < \ell$. Hence

$$\mathbb{E}[\min(\tau, \ell)] = \int_0^{\ell} e^{-\lambda p t}\,dt = \frac{1 - e^{-\lambda p \ell}}{\lambda p}.$$

The expected duration conditional on success, $\mathbb{E}[\tau \mid \tau \le \ell]$, uses the truncated exponential distribution. The PDF of $\tau$ conditional on $\tau \le \ell$ is $\frac{\lambda p\, e^{-\lambda p t}}{1 - e^{-\lambda p \ell}}$ for $0 \le t \le \ell$. Using integration by parts:

$$\mathbb{E}[\tau \mid \tau \le \ell] = \frac{1}{\lambda p} - \frac{\ell\, e^{-\lambda p \ell}}{1 - e^{-\lambda p \ell}}.$$

Derivation of the Expected Present Value (EV)

The expected value of the pivot attempt is the expected discounted benefit minus the expected discounted cost.

Expected Discounted Benefit ($B$): If success occurs at time $\tau = t$, the benefit is the present value (PV) of the stream $\Delta U$ starting at $t$: $e^{-\delta t}\,\Delta U/\delta$. The expectation is taken over the time of success $\tau$, up to the runway limit $\ell$, using the density $\lambda p\, e^{-\lambda p t}$:

$$B = \frac{\Delta U}{\delta}\int_0^{\ell} e^{-\delta t}\,\lambda p\, e^{-\lambda p t}\,dt = \frac{\Delta U}{\delta}\cdot\frac{\lambda p}{\lambda p + \delta}\left(1 - e^{-(\lambda p + \delta)\ell}\right).$$

Expected Discounted Cost ($C$): The cost is incurred at rate $c$ during the sabbatical $[0, \min(\tau, \ell)]$. We calculate $C = c\,\mathbb{E}\big[\int_0^{\min(\tau,\ell)} e^{-\delta t}\,dt\big]$. We swap expectation and integration (by Fubini's theorem, as the integrand is positive), using the survival function $e^{-\lambda p t}$ for $t < \ell$:

$$C = c\int_0^{\ell} e^{-\delta t}\, e^{-\lambda p t}\,dt = c\,\frac{1 - e^{-(\lambda p + \delta)\ell}}{\lambda p + \delta}.$$

Total Expected Value: Factoring out the common term $\frac{1 - e^{-(\lambda p + \delta)\ell}}{\lambda p + \delta}$ (the expected discounted duration) and substituting:

$$\mathrm{EV} = B - C = \frac{1 - e^{-(\lambda p + \delta)\ell}}{\lambda p + \delta}\left[\frac{\lambda p\,\Delta U}{\delta} - c\right].$$

Break-even Probability ($p_0$)

The EV is zero when the bracketed term is zero (since the prefactor is strictly positive):

$$p_0 = \frac{c\,\delta}{\lambda\,\Delta U}.$$
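
As a sanity check on this closed form, here is a small Monte Carlo sketch in Python (the parameter values are arbitrary placeholders):

```python
import math
import random

def ev_closed_form(p, lam, delta, ell, dU, c):
    rate = lam * p + delta
    return (1.0 - math.exp(-rate * ell)) / rate * (lam * p * dU / delta - c)

def ev_monte_carlo(p, lam, delta, ell, dU, c, n=200_000, seed=0):
    """Simulate: first success time tau ~ Exp(lam*p); if tau <= ell the benefit is the
    stream dU/delta discounted back from tau; costs accrue at rate c until min(tau, ell)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        tau = rng.expovariate(lam * p)
        t_end = min(tau, ell)
        cost = c * (1.0 - math.exp(-delta * t_end)) / delta
        benefit = math.exp(-delta * tau) * dU / delta if tau <= ell else 0.0
        total += benefit - cost
    return total / n

if __name__ == "__main__":
    args = dict(p=0.03, lam=8.0, delta=0.1, ell=0.5, dU=40.0, c=50.0)
    print(ev_closed_form(**args), ev_monte_carlo(**args))
```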

Part B: Field-Level Model Derivations

We analyze the field-level optimum using a public ledger in impact dollars.

Setup:

  • $K$ applicants, $M$ seats. Impacts $X_i \sim F$ i.i.d.
  • Hires are the top $M$ order statistics: $X_{(1)} \ge X_{(2)} \ge \dots \ge X_{(M)}$.
  • Expected annual benefit $B(K) = \mathbb{E}\big[\sum_{j=1}^{M} X_{(j)}\big]$.
  • Marginal value $\Delta(K) = B(K) - B(K-1)$.
  • Social discount rate $\delta_s$.
  • Social burn rate (foregone public impact) $c_s$.

We do not model congestion costs. Generally, employers who have filled a given role can just ignore excess applications, and there's a lot of evidence to suggest that they do so [@Horton2017Digital; @Horton2021Jobseekers; @Horton2024Reducing].

Welfare Function and Optimality

Present Value Horizon: an impact stream of 1 per year has present value $1/\delta_s$.

PV of Failed Attempt: Assuming a failed attempt uses the full runway (this simplifies the calculation of the marginal cost of an additional applicant):

$$C_s = c_s\,\frac{1 - e^{-\delta_s \ell}}{\delta_s}.$$

Total Welfare ($W$): $W(K) = B(K)/\delta_s - (K - M)\,C_s$ for $K \ge M$. The welfare-maximizing pool size $K^\star$ occurs (for $K > M$) when the marginal benefit equals the marginal cost. Adding one applicant adds exactly one expected failure, so the marginal cost is $C_s$. Substituting the expressions and canceling $1/\delta_s$:

$$\Delta(K^\star) = c_s\left(1 - e^{-\delta_s \ell}\right).$$

Distribution-Specific Results

We solve for $K^\star$ based on the behavior of $\Delta(K)$ for different distributions $F$.

Exponential Distribution (Light Tail)

Let $X_i \sim \mathrm{Exp}$ with mean $\mu$. The expected sum of the top $M$ order statistics out of $K$ draws has a known closed form, often derived using the Rényi representation of exponential spacings:

$$B(K) = \mu\left(M H_K - \sum_{j=1}^{M} H_{j-1}\right),$$

where $H_n = \sum_{i=1}^{n} 1/i$ is the $n$-th harmonic number (with $H_0 = 0$).

Marginal Value: $\Delta(K) = B(K) - B(K-1) = \mu M (H_K - H_{K-1}) = \mu M / K$. Returns diminish hyperbolically ($\propto 1/K$).

Optimal Pool Size $K^\star$: Setting $\Delta(K^\star)$ equal to the marginal social cost $c_s\left(1 - e^{-\delta_s \ell}\right)$:

$$K^\star = \frac{\mu M}{c_s\left(1 - e^{-\delta_s \ell}\right)}.$$
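
A quick simulation check of the harmonic-number formula for the expected top-$M$ sum (illustrative values only):

```python
import random

def top_m_sum_exact(K, M, mu):
    """mu * (M*H_K - sum_{j=1}^{M} H_{j-1}), with H_0 = 0."""
    H = [0.0] * (K + 1)
    for i in range(1, K + 1):
        H[i] = H[i - 1] + 1.0 / i
    return mu * (M * H[K] - sum(H[j - 1] for j in range(1, M + 1)))

def top_m_sum_sim(K, M, mu, n=5_000, seed=0):
    """Monte Carlo estimate of E[sum of the top M out of K Exp(mean mu) draws]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        xs = sorted(rng.expovariate(1.0 / mu) for _ in range(K))
        total += sum(xs[-M:])
    return total / n

if __name__ == "__main__":
    print(top_m_sum_exact(500, 10, 100.0), top_m_sum_sim(500, 10, 100.0))
```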

Fréchet Distribution (Heavy Tail)

Let $X_i \sim \mathrm{Fr\acute{e}chet}(\alpha, s)$, with shape $\alpha > 1$ (required for a finite mean) and scale $s$. We use asymptotic results from extreme value theory for large $K$. The expected sum of the top $M$ scales as $K^{1/\alpha}$:

$$B(K) \approx C_{M,\alpha}\, s\, K^{1/\alpha},$$

where $C_{M,\alpha}$ is a constant independent of $K$ and $s$.

Marginal Value: We approximate the marginal value using the derivative of the asymptotic expression:

$$\Delta(K) \approx \frac{\partial B(K)}{\partial K} = \frac{C_{M,\alpha}\, s}{\alpha}\, K^{1/\alpha - 1}.$$

Returns diminish as a power law ($\propto K^{1/\alpha - 1}$), slower than the Exponential case.

Optimal Pool Size $K^\star$: Setting $\Delta(K^\star)$ equal to the marginal social cost $c_s\left(1 - e^{-\delta_s \ell}\right)$ and solving for $K^\star$:

$$K^\star \approx \left(\frac{C_{M,\alpha}\, s}{\alpha\, c_s\left(1 - e^{-\delta_s \ell}\right)}\right)^{\frac{\alpha}{\alpha - 1}}.$$

As $\alpha \to 1$ (heavier tails), the exponent $\frac{\alpha}{\alpha - 1} \to \infty$, causing $K^\star$ to explode.

Appendix C: Counterfactual Impact and Equilibrium Derivations

Derivation of $\Delta(K) = P_{\mathrm{succ}}(K)\, v_{\mathrm{cf}}(K)$

We want to relate the marginal value of entry ($\Delta(K)$) to the expected counterfactual impact conditional on success ($v_{\mathrm{cf}}(K)$).

$\Delta(K)$ is the expected increase in total field impact when an applicant joins the pool (moving from $K - 1$ to $K$ applicants). Let $S$ be the event of success (being hired), and $P_{\mathrm{succ}}(K) = P(S)$ the probability of success. We assume applicants are exchangeable.

By the law of total expectation, $\Delta(K) = P(S)\,\mathbb{E}[\text{counterfactual impact} \mid S] + P(\bar{S})\,\mathbb{E}[\text{counterfactual impact} \mid \bar{S}]$. If the applicant enters and fails ($\bar{S}$), their counterfactual impact is 0. The expected impact conditional on success is $v_{\mathrm{cf}}(K)$.

Therefore $\Delta(K) = P_{\mathrm{succ}}(K)\, v_{\mathrm{cf}}(K)$. Assuming exchangeable applicants competing for $M$ slots in a pool of $K$, $P_{\mathrm{succ}}(K) = M/K$.

C.2 $v_{\mathrm{cf}}(K)$ for Specific Distributions

Exponential (Light Tail): Mean impact $\mu$. From Appendix B, $\Delta(K) = \mu M/K$, so $v_{\mathrm{cf}}(K) = \frac{K}{M}\,\Delta(K) = \mu$. The expected counterfactual impact conditional on success is constant.

Fréchet (Heavy Tail): From Appendix B, $\Delta(K) \approx \frac{C_{M,\alpha}\, s}{\alpha}\, K^{1/\alpha - 1}$ (asymptotically for large $K$), so $v_{\mathrm{cf}}(K) = \frac{K}{M}\,\Delta(K) \approx \frac{C_{M,\alpha}\, s}{\alpha M}\, K^{1/\alpha}$. The expected counterfactual impact conditional on success grows with the pool size $K$.

C.3 Equilibrium Condition and Misalignment

The equilibrium occurs when the private EV, using the counterfactual surplus $\Delta U_{\mathrm{cf}}(K)$, is zero. This happens when the bracketed term in the EV formula (Part A) is zero:

$$p(K_{\mathrm{eq}})\,\Delta U_{\mathrm{cf}}(K_{\mathrm{eq}}) = \frac{c\,\delta}{\lambda}.$$

The RHS, $c\delta/\lambda$, is the effective private cost hurdle per application attempt.

We compare this to the social optimality condition $\Delta(K^\star) = c_s\left(1 - e^{-\delta_s \ell}\right)$, defined in Part B. For simplicity, we approximate the social cost of failure as $c_s$ (assuming the runway is large enough that $1 - e^{-\delta_s \ell} \approx 1$) and set $\delta_s = \delta$. The optimality condition becomes:

$$\Delta(K^\star) \approx c_s.$$

Analyzing Over/Under Entry:

To illustrate the misalignment, consider a simplified case where private financial losses (pay cuts) are negligible, so $\Delta U_{\mathrm{cf}}(K) \approx v_{\mathrm{cf}}(K)$. Also assume the per-application success rate approximates the overall success probability, $p(K) \approx P_{\mathrm{succ}}(K) = M/K$.

In this case, $p(K)\,\Delta U_{\mathrm{cf}}(K) \approx P_{\mathrm{succ}}(K)\, v_{\mathrm{cf}}(K) = \Delta(K)$.

The private equilibrium condition simplifies to $\Delta(K_{\mathrm{eq}}) = c\delta/\lambda$. The social optimum condition remains $\Delta(K^\star) = c_s$.

Since $\Delta(K)$ is decreasing in $K$, Over-Entry ($K_{\mathrm{eq}} > K^\star$) occurs if the private threshold is lower than the social threshold:

$$\frac{c\,\delta}{\lambda} < c_s.$$

This happens when the private cost hurdle per attempt is less than the social cost rate of failure. As shown in the main text using Alice's parameters (0.69k vs 18k), this inequality often holds strongly, indicating a structural tendency toward over-entry even with sophisticated, counterfactual reasoning.

References


  1. To be consistent we need to take this to be a local linear approximation at your current wage and impact level; so we are implicitly looking at marginal utility. ↩︎


Comments

JD from Christians for Impact has recently been posting about the downside risks of unsuccessful pivots, which reminded me of this post. Thank you for taking the effort to write this up; I've shared it with advisors in my network.

I wonder why this hasn't attracted more upvotes - seems like a very interesting and high-effort post!

Spitballing - I guess there's such a lot of math here that many people (including me) won't be able to fully engage with the key claims of the post, which limits the surface area of people who are likely to find it interesting.

I note that when I play with the app, the headline numbers don't change for me when I change the parameters of the model. May be a bug?

Ah that's why it's for draft amnesty week ;-) Somewhere inside this dense post there is a simpler one waiting to get out, but I figured this was worth posting. Right now it is in the form of "my own calculations for myself" and it's not that comprehensible nor the model of good transdisciplinary communication to which I aspire. I'm trying to collaborate with a colleague of mine to write that shorter version. (And to improve the app. Thanks for the bug report @Henry Stanley 🔸 !)

I guess I should flag that I'm up for collaborations and this post can be edited on github including the code to generate the diagrams, so people should feel free to dive in and improve this

One thing that occurs to me (as someone considering a career pivot) is the case where someone isn't committed to a specific cause area. Here you talk about someone who is essentially choosing between EtG for AI safety or doing AI safety work directly.

But in my case, I'm considering a pivot to AI safety from EtG - but currently I exclusively support animal welfare causes when I donate. Perhaps this is just irrational on my part. My thinking is that I'm unlikely, given my skillset, to be any good at doing direct work in the animal welfare space, but consider it the most important issue of our time. I also think AI safety is important and timely but I might actually have the potential to work on it directly, hence considering the switch.

So in some cases there's a tradeoff of donations foregone in one area vs direct work done in another, which I guess is trickier to model.

Yes, I sidestepped the details of relative valuation entirely here by collapsing the calculation of “impact” into “donation-equivalent dollars.” That move smuggles in multiple subjective factors — specifically, it incorporates a complex impact model and a private valuation of impacts. We’ll all have different “expected impacts,” insofar as anyone thinks in those terms, because we each have different models of what will happen in the counterfactual paths, not to mention differing valuations of those outcomes.

One major thing I took away from researching this is that I don’t think enough about substitutability when planning my career (“who else would do this?”), and I suppose part of that involves modelling comparative advantage. This holds even relative to my private risk/reward model. But thinking in these terms isn’t natural: my estimated impact in a cause area depends on how much difference I can make relative to others who might do it — which itself requires modelling the availability and willingness of others to do each thing.

Another broader philosophical question worth unpacking is whether these impact areas are actually fungible. I lean toward the view that expected value reasoning makes sense at the margins (ultimately, I have a constrained budget of labour and capital, and I must make a concrete decision about how to spend it — so if Bayes didn’t exist, I’d be forced to invent him). But I don’t think it is a given that we can take these values globally seriously, even within an individual. Perhaps animal welfare and AI safety involve fundamentally different moral systems and valuations?

Still, substitutability matters at the margins. If you move into AI safety instead of animal welfare, ideally that would enable someone else — with a better match to animal welfare concerns — to move into AI safety despite their own preferences. That isn’t EtG per se, but it could still represent a globally welfare-improving trade in the “impact labour market.”

If we take that metaphor seriously, though, I doubt the market is very efficient. Do we make these substitution trades as much as we should? The labour market is already weird; the substitutability of enthusiasm and passion is inscrutable; and the transactions are costly. Would it be interesting or useful to make it more efficient somehow? Would we benefit from better mechanisms to coordinate on doing good — something beyond the coarse, low-bandwidth signal of job boards? What might that look like?
