Thank you, David, for an important and well-argued piece. The "brain misallocation" trap is a concept that resonates deeply with me.
I previously worked in an HQ role at MSF, and I can confirm that the central tensions you highlight from 2003 are not only real but continue to be a subject of active, near-constant, and often painful debate within the organization.
Your post and Ivan’s account capture two of the most difficult ethical binds that organizations like MSF face:
1. Pay Inequity: The challenge of "equity" between internationally-hired and locally-hired staff is a massive, unresolved issue and a difficult knot to untangle. If you pay a global "fair" wage, you completely distort the local market, creating exactly the "trap" you describe. If you pay purely local wages, you struggle to deploy specialized international staff, and you create a deeply visible and demoralizing "caste" system within the team. This isn't an excuse, but it's a constant source of moral injury and operational challenge.
2. The "Slippery Slope": Your post also hits on the tension between emergency and development. MSF's identity is built on emergency response: show up, treat the cholera (or malaria, or measles, or…), patch the wounds, and leave. The problem is that you often come for the emergency but stay because the local health system is absent or barely functioning. This is the true "slippery slope." The team on the ground often can't ethically leave: there is no option for a transfer of care, and they know people will die the day they pack up. But by staying, the organization becomes a parallel or de facto health system, which does contribute to the long-term distortions and dependencies the original post critiques.
This is where I think we need to draw a distinction, and where I find an EA-informed justification for the emergency component of this work.
The critique of "brain misallocation" is most powerful when applied to the development sector, which operates in relatively stable (even if low-resource) environments. In that context, long-term, systemic change is the goal, and misallocating the best local minds to aid bureaucracy is a significant (and often neglected) negative externality.
However, "pure" emergency work (like in an active conflict, a natural disaster, or a major outbreak) operates under a different ethical framework.
• Tractability & Neglectedness: In a true emergency, the "local system" is not just weak; it may be effectively non-existent, overwhelmed, or actively hostile. The counterfactual is not "this nurse would be working for the local government." The counterfactual may be "this nurse would be a refugee, dead, or working with no supplies or medicine."
• The Triage Model: The value proposition of an organization like MSF is not systemic change; it is immediate, tractable, and measurable harm reduction. It is a triage model. The goal is not to fix the country's economy or governance (the needed underlying solutions). The goal is to stop this specific person from dying of this specific bullet wound or this specific case of cholera, today.
From an EA perspective, this is a powerful justification. The intervention is highly tractable (we know how to treat cholera) and serves a highly neglected population (those who will die in the next 24 hours without intervention).
The moral and strategic tragedy is when the triage (emergency) is forced to become the long-term ward (development) because the "underlying solutions" - stable governance, peace, infrastructure - are so often in retreat.
This doesn't invalidate the original post's critique. In fact, it reinforces it. The "Trap" is what happens when the emergency response model is incorrectly or indefinitely applied to a development problem. It highlights the immense difficulty of working in a world where the problems are so deep that even the solutions can cause harm.
P.S. I also want to humbly acknowledge that critiques of aid's unintended consequences are not new, and are most powerfully articulated by economists with direct experience in the Global South. This is a central thesis of Dambisa Moyo’s "Dead Aid," where she specifically details how the aid industry can siphon talent and distort local markets and governance.
Yarrow, thank you for this sharp and clarifying discussion.
You have completely convinced me that my earlier arguments from "investment as a signal" or "LHC/Pascal's Wager" were unrigorous, and I concede those points.
I think I can now articulate my one, non-speculative crux.
The "so what" of Toby Ord's (excellent) analysis is that it provides a perfect, rigorous, "hindsight" view of the last paradigm—what I've been calling "Phase 1" RL for alignment.
My core uncertainty isn't speculative "what-if" hope. It's that the empirical ground is shifting.
The very recent papers we discussed (Khatri et al. on the "art" of scaling, and Tan et al. on math reasoning) are, for me, the first public, rigorous evidence for a "Phase 2" capability paradigm.
• They provide a causal mechanism for why the old, simple scaling data may be an unreliable predictor.
• They show this "Phase 2" regime is different: it's not a simple power law but a complex, recipe-dependent "know-how" problem (Khatri), and it has different efficiency dynamics (Tan).
This, for me, is the action-relevant dilemma.
We are no longer in a state of "pure speculation". We are in a state of grounded, empirical uncertainty where the public research is just now documenting a new, more complex scaling regime that the private labs have been pursuing in secret.
Given that the lead time for any serious safety work is measured in years, and the nature of the breakthrough is a proprietary, secret "recipe," the "wait for public proof" strategy seems non-robust.
That's the core of my concern. I'm now much clearer on the crux of the argument, and I can't thank you enough for pushing me to be more rigorous. This has been incredibly helpful, and I'll leave it there.
Yarrow, these are fantastic, sharp questions. Your “already accounted for” point is the strongest counter-argument I’ve encountered.
You’re correct in your interpretation of the terms. And your core challenge—if LLM reward models and verifiable domains have existed for ~3 years, shouldn’t their impact already be visible?—is exactly what I’m grappling with.
Let me try to articulate my hypothesis more precisely:
The Phase 1 vs Phase 2 distinction:
I wonder if we’re potentially conflating two different uses of RL that might have very different efficiency profiles:
1. Phase 1 (Alignment/Style): This is the RLHF that created ChatGPT—steering a pretrained model to be helpful/harmless. This has been done for ~3 years and is probably what’s reflected in public benchmark data.
2. Phase 2 (Capability Gains): This is using RL to make models fundamentally more capable at tasks through extended reasoning or self-play (e.g., o1, AlphaGo-style approaches).
My uncertainty is: could “Phase 2” RL have very different efficiency characteristics than “Phase 1”?
Recent academic evidence:
Some very recent papers seem to directly address this question:
• A paper by Khatri et al., "The Art of Scaling Reinforcement Learning Compute for LLMs" (arXiv: 2510.13786), appears to show that simple RL methods do hit hard performance ceilings (validating your skepticism), but that scaling RL is a complex “art.” It suggests a specific recipe (ScaleRL) can achieve predictable scaling. This hints the bottleneck might be “know-how” rather than a fundamental limit.
• Another paper, by Tan et al., "Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning" (arXiv: 2509.25300), found that RL performance on math reasoning is bound more by data quality (e.g., from verifiable domains) than by compute alone, and that larger models are more compute- and sample-efficient at these tasks.
Why this seems relevant:
This research suggests “Phase 1” RL (simple, public methods) and “Phase 2” RL (complex recipes, high-quality data, large models) might have quite different scaling properties.
This makes me wonder if the scaling properties from prior RL research might not fully capture what’s possible in this new regime: very large models + high-quality verifiable domains + substantial compute + the right training recipe. Prior research isn’t irrelevant, but perhaps extrapolation from it is unreliable when the conditions are changing this much?
If labs have found (or are close to finding) these “secret recipes” for scalable RL, that could explain continued capital investment from well-informed actors despite public data showing plateaus.
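To make the "hard ceiling vs. predictable scaling" distinction concrete, here is a toy curve-fitting sketch. The data, functional forms, and numbers are entirely made up (they are not from Khatri et al. or Tan et al.); the only point is that two curves which fit the observed range equally well can diverge wildly when extrapolated, which is why I worry about reading too much into the public scaling data.

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy illustration with made-up data: why extrapolating RL compute-vs-performance
# curves is sensitive to the assumed functional form.

def saturating(c, ceiling, c_mid, slope):
    # Levels off at `ceiling` no matter how much compute is added.
    return ceiling / (1.0 + (c_mid / c) ** slope)

def power_law(c, a, b):
    # Keeps improving (slowly) with more compute.
    return a * c ** b

compute = np.array([1e2, 3e2, 1e3, 3e3, 1e4, 3e4])          # hypothetical GPU-hours
pass_rate = np.array([0.22, 0.31, 0.42, 0.50, 0.55, 0.58])  # hypothetical benchmark scores

sat_params, _ = curve_fit(saturating, compute, pass_rate, p0=[0.7, 1e3, 0.5])
pow_params, _ = curve_fit(power_law, compute, pass_rate, p0=[0.05, 0.2])

extrapolation_point = 3e6  # 100x the largest observed run
print("Both fits match the observed range, but diverge out of distribution:")
print(f"  saturating-curve prediction at 100x compute: {saturating(extrapolation_point, *sat_params):.2f}")
print(f"  power-law prediction at 100x compute:        {power_law(extrapolation_point, *pow_params):.2f}")
# (The naive power-law fit even extrapolates above a 100% pass rate, which is
#  itself a reminder that the chosen functional form is doing a lot of work.)
```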
The action-relevant dilemma:
Even granting the epistemic uncertainty, there seems to be a strategic question: Given long lead times for safety research, should researchers hedge by preparing for RL efficiency improvements, even if we can’t confidently predict them?
The asymmetry: if we wait for public evidence before starting safety work, and RL does become substantially more efficient (because a lab finds the right “recipe”), we’ll have even less lead time. But if we prepare unnecessarily, we’ve misallocated resources.
I don’t have a clean answer to what probability threshold for a potential breakthrough justifies heightened precautionary work. But the epistemic uncertainty itself—combined with some papers suggesting the scaling regime might be fundamentally different than assumed—makes me worry that we’re evaluating the efficiency of propellers while jet engines are being invented in private.
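One crude way to frame that threshold question is as a simple expected-value comparison. The costs below are entirely invented placeholders (arbitrary units), meant only to show the structure of the asymmetry, not to estimate it:

```python
# Entirely hypothetical costs, in arbitrary units, purely to illustrate the asymmetry.
cost_if_we_prepare_unnecessarily = 1.0    # misallocated safety-research effort
cost_if_we_are_caught_unprepared = 50.0   # lost lead time if efficient RL arrives

# Prepare now if: p * cost_unprepared > (1 - p) * cost_wasted,
# which gives a break-even probability p* = wasted / (wasted + unprepared).
p_star = cost_if_we_prepare_unnecessarily / (
    cost_if_we_prepare_unnecessarily + cost_if_we_are_caught_unprepared
)
print(f"With these made-up costs, preparing pays off once P(breakthrough) > {p_star:.1%}")
```

The real disagreement, of course, is over both cost estimates and over P(breakthrough) itself; the sketch only shows why a large cost asymmetry can justify acting on a fairly small probability.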
Does this change your analysis at all, or do you think the burden of proof still requires more than theoretical papers about potential scaling regimes?
YB, thank you for the pushback. You’ve absolutely convinced me that my “science vs. engineering” analogy was unrigorous, and your core point about extrapolating a trend by assuming a new causal factor will appear is the correct null hypothesis to hold.
What I’m still trying to reconcile, specifically regarding RL efficiency improvements, is a tension between what we can observe and what may be hidden from view.
I expect Toby’s calculations are 100% correct. Your case is also rigorous and evidence-based: RL has been studied for decades, PPO (2017) was incremental, and we shouldn’t assume 10x-100x efficiency gains without evidence. The burden of proof is on those claiming breakthroughs are coming.
But RL research seems particularly subject to information asymmetry:
• Labs have strong incentives to keep RL improvements proprietary (competitive advantage in RLHF, o1-style reasoning, agent training)
• Negative results rarely get published (we don’t know what hasn’t worked)
• The gap between “internal experiments” and “public disclosure” may be especially long for RL
We’ve seen this pattern before - AlphaGo’s multi-year information lag, GPT-4’s ~7-month gap. But for RL specifically, the opacity seems greater. OpenAI uses RL for o1, but we don’t know their techniques, efficiency gains, or scaling properties. DeepMind’s work on RL is similarly opaque.
This leaves me uncertain about future RL scaling specifically. On one hand, you’re right that decades of research suggest efficiency improvements are hard. On the other hand, recent factors (LLMs as reward models, verifiable domains for self-play, unprecedented compute for experiments) combined with information asymmetry make me wonder if we’re reasoning from incomplete data.
The specific question: Does the combination of (a) new factors like LLMs/verifiable domains, plus (b) the opacity and volume of RL research at frontier labs, warrant updating our priors on RL efficiency? Or is this still the same “hand-waving” trap - just assuming hidden progress exists because we expect the trend to continue?
On the action-relevant side: if RL efficiency improvements would enable significantly more capable agents or self-improvement, should safety researchers prepare for that scenario despite epistemic uncertainty? The lead times for safety work seem long enough that “wait and see” may not be viable.
For falsifiability: we should know within 18-24 months. If RL-based systems (agents, reasoners) don’t show substantial capability gains despite continued investment, that would validate skepticism. If they do, it would suggest there were efficiency improvements we couldn’t see from outside.
I’m genuinely uncertain here and would value a better sense of whether the information asymmetry around RL research specifically changes how we should weigh the available evidence.
Thank you, Toby et al., for this characteristically clear and compelling analysis and discussion. The argument that RL scaling is breathtakingly inefficient and may be hitting a hard limit is a crucial consideration for timelines.
This post made me think about the nature of this bottleneck, and I'm curious to get the forum's thoughts on a high-level analogy. I'm not an ML researcher, so I’m offering this with low confidence, but it seems to me there are at least two different "types" of hard problems.
1. A Science Bottleneck (Fusion Power): Here, the barrier appears to be fundamental physics. We need to contain a plasma that is inherently unstable at temperatures hotter than the sun. Despite decades of massive investment and brilliant minds, we can't easily change the underlying laws of physics that make this so difficult. Progress is slow, and incentives alone can't force a breakthrough.
2. An Engineering Bottleneck (Manhattan Project): Here, the core scientific principle was known (nuclear fission). The barrier was a set of unprecedented engineering challenges: how to enrich enough uranium, how to build a stable reactor, etc. The solution, driven by immense incentives, was a brute-force, parallel search for any viable engineering path (e.g., pursuing gaseous diffusion, electromagnetic separation, and plutonium production all at once).
This brings me back to the RL scaling issue. I'm wondering which category this bottleneck falls into.
From the outside, it feels more like an engineering or "Manhattan Project" problem. The core scientific discovery (the Transformer architecture, the general scaling paradigm) seems to be in place. The bottleneck Ord identifies is that one specific method (RL, likely PPO-based) is significantly compute-inefficient and hard to continue scaling.
But the massive commercial incentives at frontier labs aren't just to make this one inefficient method 1,000x or 1,000,000x bigger. The incentive is to invent new, more efficient methods that achieve the same or a similar goal.
We've already seen a small-scale example of this with the rapid shift from complex RLHF to the more efficient Direct Preference Optimization (DPO). This suggests the problem may not be a fundamental "we can't continue to improve models" barrier, but an engineering one: "this way of improving models is too expensive and unstable."
If this analogy holds, it seems plausible that the proprietary work at the frontier isn't just grinding on the inefficient RL problem, but is in a "Manhattan"-style race to find a new algorithm or architecture that bypasses this specific bottleneck.
This perspective makes me less confident that this particular bottleneck will be the one that indefinitely pushes out timelines, as it seems like exactly the kind of challenge that massive, concentrated incentives are historically good at solving.
I could be completely mischaracterizing the nature of the challenge, though, and still feel quite uncertain. I'd be very interested to hear from those with more technical expertise if this framing seems at all relevant or if the RL bottleneck is, in fact, closer to a fundamental science or "Fusion" problem.
The “depopulation bad” framing, while helpful for engagement, misses key longtermist concerns in my opinion. The real question isn’t just how many people exist, but whether humanity (and other life) can flourish sustainably within planetary boundaries.
We’re already in ecological overshoot, degrading biosphere systems essential to all sentient life. Climate change is just one facet of a more complex set of systems facing challenges. A smaller, well-supported population—achieved via voluntary, rights-based policies—could reduce existential risk by stabilizing Earth’s life-support systems, supporting biodiversity, and improving welfare per capita.
Yes, demographic decline poses economic and institutional challenges. But these are solvable. Civilizational collapse from ecological breakdown is not.
Optimizing for total population without sufficient ecological resilience risks long-term value. We should aim for a population trajectory that preserves planetary habitability over the long run.
Thanks for the discussion!
M
Hi all,
Thank you, Allegra, for the well-presented initial post, and for the constructive replies. I thought I would share how I’ve been grappling with this standard EA dilemma regarding donations to acute humanitarian crises (as we see now in Sudan). The default position is that, while morally urgent, these donations may be 1-2 orders of magnitude less cost-effective (in $ per life saved) than top GiveWell picks. This often leads to a “head vs. heart” framework, where one might allocate a 10-20% “moral” portion to crisis relief.
However, in thinking this through, I believe this binary view misses several distinct, high-impact frameworks that are defensible from an EA perspective.
1. The Middle Ground: Anticipatory Action (AA)
The first and most obvious “fix” is to reframe disaster aid as proactive rather than reactive. This is the “Anticipatory Action” (or Early Warning, Early Action) model.
This framework applies most significantly to predictable shocks (floods, droughts) and represents investment rather than mere relief. The data on its cost-effectiveness is compelling:
High ROI: Multiple agencies, including the UN’s Food and Agriculture Organization (FAO) and the World Food Programme (WFP), have demonstrated that every $1 invested in anticipatory action can save over $7 in avoided losses and added benefits for beneficiaries (e.g., protecting assets, avoiding debt, and reducing the need for costly emergency food aid) [1, 2].
A Proven Case (Somalia 2011 vs. 2017): This is a powerful A/B test. In 2011, a delayed, reactive response to famine warnings saw over 260,000 deaths. In 2017, similar drought forecasts were met with a large-scale early response, which successfully averted a full-blown famine [3].
For predictable crises, AA seems to be a high-CE, evidence-based intervention that fits comfortably within EA frameworks.
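To make the headline ratio concrete, here is a rough expected-value sketch. The spend, hit rate, and ratio below are placeholders (the 7:1 figure is the ballpark from the FAO/WFP work cited, but the rest is invented), so this illustrates the logic rather than estimating it:

```python
# All figures below are invented placeholders, not taken from the cited studies.
aa_spend = 100.0              # USD spent on anticipatory action per household
benefit_cost_ratio = 7.0      # ballpark "over $7 saved per $1" headline figure
p_shock_materialises = 0.6    # hypothetical forecast hit rate

# Expected avoided losses when the forecasted shock only sometimes materialises.
expected_avoided_losses = p_shock_materialises * benefit_cost_ratio * aa_spend
print(f"Spend: ${aa_spend:.0f}, expected avoided losses: ${expected_avoided_losses:.0f}")
print(f"Expected net benefit per household: ${expected_avoided_losses - aa_spend:.0f}")
```

Even with a forecast that only verifies 60% of the time, the expected return stays well above break-even under these assumptions, which is why AA looks attractive for predictable shocks.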
2. The “Tourniquet” Framework for Protracted Crises
This is the point I’m most focused on, as it applies to complex, ongoing crises like Sudan, where the “anticipation” window may seem to have passed.
My concern is that even in a full-blown crisis, things can always get worse. The crisis “cascades.” An initial conflict (a security crisis) triggers a displacement crisis, which triggers a health system collapse, which triggers a cholera epidemic and a food security crisis (IPC Phase 4).
The “tourniquet” framework argues that humanitarian aid in this context, while reactive to the initial shock, is critically preventative of the next, worse cascade.
This is not a hypothetical. This is what is happening in Sudan right now:
Preventing Mass Famine: As of September 2024, the Integrated Food Security Phase Classification (IPC) has confirmed that Famine (IPC Phase 5) is already occurring in towns like El Fasher and Kadugli. Over 21 million people are in Crisis (IPC Phase 3+) [4]. Aid from the WFP is not just “reacting” to hunger; it may be the only tourniquet preventing the 375,000+ people currently in Catastrophe (IPC 5) and the millions in Emergency (IPC 4) from cascading into mass starvation.
Preventing Epidemic Collapse: As of late 2024, Sudan is facing one of the world’s worst cholera outbreaks, with over 120,000 suspected cases [5]. With 70-80% of health facilities non-functional, UNICEF and WHO’s work to provide oral rehydration salts and clean water is not just “reacting” to cholera; it is applying the tourniquet to prevent an uncontrolled epidemic from killing tens of thousands.
I recognize the cost-effectiveness evidence here is less precise than for GiveWell interventions. But in crisis contexts, waiting for perfect data is itself a choice with costs. The cascade logic—preventing famine from becoming mass starvation, preventing cholera from becoming an uncontrolled epidemic—is structurally high-impact even when exact figures are elusive. Demanding randomized controlled trial (RCT) level evidence for crisis response is effectively a decision not to act, which itself has an implicit CE assumption baked in.
3. The Portfolio Risk-Return Framework
It may be worth thinking about crisis giving in terms of portfolio theory. Different interventions have different risk-return profiles:
• GiveWell charities: Steady, reliable impact with strong evidence (high certainty, proven returns)
• Anticipatory Action: Strong expected value with good evidence (7:1 ROI, growing evidence base)
• Crisis “tourniquet”: Higher variance, but with significant upside potential and bounded downside
In crisis response, the downside is bounded—even if Sudan aid is “only” as cost-effective as a standard charity, you’ve still saved lives. But the upside is significant: if you successfully prevent cascade collapse (famine to epidemic to state failure), you may have 10-100x impact. Focusing on reputable organizations (MSF, IRC, UNICEF) with operational excellence increases the probability of realizing those high returns.
A sophisticated giving portfolio can include high-certainty interventions alongside some allocation to higher-variance, higher-upside opportunities. This isn’t abandoning EA principles; it’s applying them with more nuance.
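As a rough sketch of what that portfolio logic looks like quantitatively: the toy calculation below uses the 80/10/10 split proposed further down, but every multiplier and probability is an invented placeholder, and “impact units” are deliberately abstract. The point is the structure of the comparison, not the numbers, which would obviously drive the conclusion either way.

```python
# Toy portfolio sketch: all multipliers and probabilities are invented placeholders.
portfolio = {
    # name: (allocation, impact per $ if it works, probability it works)
    "GiveWell core":       (0.80, 1.0, 0.95),
    "Anticipatory action": (0.10, 3.0, 0.70),
    "Crisis tourniquet":   (0.10, 10.0, 0.25),
}

budget = 10_000  # hypothetical annual giving in USD
total = 0.0
for name, (share, impact, p_works) in portfolio.items():
    ev = budget * share * impact * p_works
    total += ev
    print(f"{name:20s} expected impact: {ev:8,.0f}")

all_core = budget * 1.0 * 0.95  # counterfactual: 100% to the core bucket
print(f"{'Mixed portfolio':20s} expected impact: {total:8,.0f}")
print(f"{'100% core':20s} expected impact: {all_core:8,.0f}")
```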
4. Portfolio Sustainability and Donor Engagement
Finally, there’s a meta-consideration: donor sustainability. Allocating 10-20% to emotionally engaging, tangible crises—even if the CE evidence is less certain—may sustain long-term commitment to effective giving. If this prevents burnout and maintains 80% highly effective giving over decades rather than years, the expected value is substantial.
This isn’t “giving in” to emotion; it’s recognizing that sustained impact requires sustainable practice. Humans are the optimization engine in EA, and maintaining that engine matters too. Personal connection to tangible crises may also help prevent the abstract utilitarian failure modes that can come from treating all suffering as mere numbers.
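To illustrate the sustainability point with deliberately invented numbers (the burnout horizon, budget, and impact multipliers below are all placeholders, not estimates):

```python
# Invented placeholder numbers; only the structure of the comparison matters.
annual_budget = 5_000            # USD given per year
core_impact_per_dollar = 1.0     # impact units per $ to top charities
crisis_impact_per_dollar = 0.3   # assumed lower (but nonzero) impact per $ for crisis giving

# Scenario A: 100% to top charities, but the donor disengages after 5 years.
scenario_a = 5 * annual_budget * core_impact_per_dollar

# Scenario B: an 80/20 split the donor sustains for 30 years.
scenario_b = 30 * annual_budget * (0.8 * core_impact_per_dollar
                                   + 0.2 * crisis_impact_per_dollar)

print(f"Burnout after 5 years:    {scenario_a:,.0f} impact units")
print(f"Sustained 80/20, 30 yrs:  {scenario_b:,.0f} impact units")
```

Under these assumptions the sustained, slightly “impure” portfolio dominates, which is the whole argument: the comparison is only as strong as the claim that the 20% allocation is what keeps the donor engaged.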
A Revised Framework
This suggests a more nuanced “moral portfolio” for donations:
• 80% Core (GiveWell): Maximizing CE on chronic, tractable problems
• 10% High-Impact Crisis (AA): High-CE prevention for acute, predictable problems
• 10% Urgent “Tourniquet” (Acute Response): High-leverage prevention for ongoing, cascading problems
• Portfolio Sustainability: Maintaining long-term donor engagement and impact
From this perspective, donating to a high-quality organization (like MSF, UNICEF, or the IRC) in Sudan isn’t just a low-CE “heart” donation. It’s a defensible allocation that combines cascade prevention logic, portfolio diversification, and long-term sustainability considerations.
EA’s strength is rigorous thinking about impact. But rigor shouldn’t mean rigidity. A sophisticated giving approach can include all of these elements while remaining committed to doing the most good possible.
Sources
[1] FAO (Food and Agriculture Organization). (2023). “Impact of disasters on agriculture and food 2023.” Cites cost-benefit ratios of up to 7.1 (i.e., $1 saves $7.10) for anticipatory action.
<https://www.fao.org/3/cc7900en/online/impact-of-disasters-on-agriculture-and-food-2023/anticipatory-action-interventions.html>
[2] WFP (World Food Programme). (2025). “COP30: How Anticipatory Action helps people prepare.” Notes their first large-scale AA rollout was “at half the cost” of a traditional response.
<https://www.wfp.org/stories/cop30-how-anticipatory-action-helps-people-prepare-extreme-weather-strikes>
[3] Refugees International. (2022). “We Were Warned: Unlearned Lessons of Famine in the Horn of Africa.” Details how the “hard lessons” from the 2011 famine failure were applied to successfully avert famine in 2017-2018.
<https://www.refugeesinternational.org/reports-briefs/we-were-warned-unlearned-lessons-of-famine-in-the-horn-of-africa/>
[4] IPC (Integrated Food Security Phase Classification). (September 2024). “Sudan: Famine confirmed in El Fasher and Kadugli towns.” The official analysis confirming Famine (IPC Phase 5) and detailing that 21.2 million people faced high acute food insecurity (IPC 3+) as of September 2024.
<https://www.ipcinfo.org/ipcinfo-website/countries-in-focus-archive/issue-137/en/>
[5] Wikipedia / Health Authorities. (October 2024). “2024–2025 Sudanese cholera epidemic.” Synthesizes WHO, UNICEF, and Ministry of Health reports, noting over 120,496 cases and 3,368 deaths recorded by mid-October 2024.
<https://en.wikipedia.org/wiki/2024%E2%80%932025_Sudanese_cholera_epidemic>