
The post "Doing EA Better" contains some critiques of the EA movement's approach to studying and ranking x-risks. These criticisms resonated with me and I wish we paid more attention to them. There were concerns about the original post being quite long and mixing a lot of different topics together, so I decided to extract some relevant sections into a separate post to enable focused discussion.

The original post is, per site policy, available under a Creative Commons BY 4.0 license, so I am excerpting it as permitted by this license.

We need to stop reinventing the wheel

Summary: EA ignores disciplines highly relevant to its main areas of focus, notably Disaster Risk Reduction, Futures Studies, and Science & Technology Studies, and in their place attempts to derive methodological frameworks from first principles. As a result, many orthodox EA positions would be considered decades out of date by domain experts, and important decisions are being made using unsuitable tools.

Even within the EA community, EA is known for reinventing the wheel. This poses a significant problem given the stakes and urgency of problems like existential risk.

There are entire disciplines, such as Disaster Risk Reduction, Futures Studies, and Science and Technology Studies, that are profoundly relevant to existential risk reduction yet which have been almost entirely ignored by the EA community. The consequences of this are unsurprising: we have started near to the beginning of the history of each discipline and are slowly learning each of their lessons the hard way.

For instance, the approach to existential risk most prominent in EA, what Cremer and Kemp call the “Techno-Utopian Approach” (TUA), focuses on categorising individual hazards (called “risks” in the TUA),[41] attempting to estimate the likelihood that they will cause an existential catastrophe within a given timeframe, and trying to work on each risk separately by default, with a homogenous category of underlying “risk factors” given secondary importance.

However, such a hazard-centric approach was abandoned within Disaster Risk Reduction decades ago and replaced with one that places a heavy emphasis on the vulnerability of humans to potentially hazardous phenomena.[42] Indeed, differentiating between “risk” (the potential for harm), “hazards” (specific potential causes of harm) and “vulnerabilities” (aspects of humans and human systems that render them susceptible to the impacts of hazards) is one of the first points made on any disaster risk course. Reducing human vulnerability and exposure is generally a far more effective method of reducing risk posed by a wide variety of hazards, and far better accounts for “unknown unknowns” or “Black Swans”.[43]
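To make this distinction concrete, disaster risk work often uses a schematic decomposition along the lines of risk ≈ hazard × exposure × vulnerability. The toy sketch below is only an illustration of that heuristic, with invented numbers and hypothetical hazard categories; it is not a model from the original post or from the DRR literature. It shows why reducing a shared vulnerability lowers risk across the whole hazard portfolio, including hazards we have not anticipated, whereas hazard-specific work only affects one term.

```python
# Illustrative only: a toy version of the schematic "risk = hazard x exposure x
# vulnerability" heuristic. All numbers and hazard categories are invented.

hazards = {
    # hazard name: (annual probability of occurrence, exposure share of population)
    "pandemic":       (0.03, 0.9),
    "major_conflict": (0.02, 0.5),
    "unknown_shock":  (0.01, 0.7),   # stands in for "unknown unknowns"
}

def total_risk(vulnerability: float) -> float:
    """Sum of hazard probability * exposure * shared vulnerability across hazards."""
    return sum(p * exposure * vulnerability for p, exposure in hazards.values())

baseline = total_risk(vulnerability=0.6)

# Hazard-centric intervention: halve the probability of ONE hazard.
hazards["pandemic"] = (0.015, 0.9)
hazard_centric = total_risk(vulnerability=0.6)
hazards["pandemic"] = (0.03, 0.9)   # reset to baseline

# Vulnerability-centric intervention: cut the shared vulnerability by a third.
vulnerability_centric = total_risk(vulnerability=0.4)

print(f"baseline risk:              {baseline:.4f}")
print(f"after hazard-specific work: {hazard_centric:.4f}")
print(f"after vulnerability work:   {vulnerability_centric:.4f}")
```

The specific numbers are irrelevant; the structural point is that vulnerability and exposure are common factors across many hazards, so reducing them also buys protection against “unknown unknowns” that never appear in any hazard list.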

Disaster risk scholarship is also revealing the growing importance of complex patterns of causation, the interactions between threats, and the potential for cascading failures. This area is largely ignored by EA existential risk work, and has been dismissed out of hand by prominent EAs.

As another example, Futures & Foresight scholars noted the deep limitations of numerical/probabilistic forecasting of specific trends/events in the 1960s-70s, especially with respect to long timescales as well as domains of high complexity and deep uncertainty[44], and low-probability high-impact events (i.e. characteristics of existential risk). Practitioners now combine or replace forecasts with qualitative foresight methods like scenario planning, wargaming, and Causal Layered Analysis, which explore the shape of possible futures rather than making hard-and-fast predictions. Yet, EA’s existential risk work places a massive emphasis on forecasting and pays little attention to foresight. Few EAs seem aware that “Futures Studies” as a discipline exists at all, and EA discussions of the (long-term) future often imply that little of note has been said on the topic outside of EA.[45]

These are just two brief examples.[46] There is a wealth of valuable insights and data available to us if we would only go out and read about them: this should be a cause for celebration!

But why have they been so neglected? Regrettably, it is not because EAs read these literatures and provided robust arguments against them; we simply never engaged with them in the first place. We tried to create the field of existential risk almost from first principles using the methods and assumptions that were already popular within our movement, regardless of whether they were suitable for the task.[47]

We believe there could be several disciplines or theoretical perspectives that EA, had it developed a little differently early on, would recognise as fellow travellers or allies. Instead, we threw ourselves wholeheartedly into the Founder Effect, and through our over-dependence on a few early canonical thinkers (e.g. MacAskill, Ord, Bostrom, Yudkowsky), we have thus far lost out on all that these other fields have to offer.

This expands to a broader question: if we were to reinvent (EA approaches to) the field of Existential Risk Studies from the ground up, how confident are we that we would settle on our current way of doing things?

The above is not to say that all views within EA ought always to reflect mainstream academic views; there are genuine shortcomings to traditional academia. However, EA's sometimes hostile attitude towards academia has hurt our ability to listen to its contributions, as well as those of experts more generally.


On the hasty prioritization of AI risk and biorisk

OpenPhil’s global catastrophic risk/longtermism funding stream is dominated by two hazard-clusters – artificial intelligence and engineered pandemics[56] – with little attention given to other aspects of the risk landscape. Even within this, AI seems to be seen as “the main issue” by a wide margin, both within OpenPhil and throughout the EA community.

This is a problematic practice, given that, for instance:

• The prioritisation relies on questionable forecasting practices, which themselves sometimes take contestable positions as assumptions and inputs
• There is significant second-order uncertainty around the relevant risk estimates
• The ITN framework has major issues, especially when applied to existential risk:
  • It is extremely sensitive to how a problem is framed, and often relies on rough and/or subjective estimates of ambiguous and variable quantities (see the toy calculation after this list)
    • This poses serious issues when working under conditions of deep uncertainty, and can allow implicit assumptions and subconscious biases to pre-determine the result
    • Climate change, for example, is typically considered low-neglectedness within EA, but extreme/existential risk-related climate work is surprisingly neglected
    • What exactly makes a problem “tractable”, and how do you rigorously put a number on it?
  • It ignores co-benefits, response risks, and tipping points
  • It penalises projects that seek to challenge concentrations of power, since this appears “intractable” until social tipping points are reached[57]
  • It is extremely difficult and often impossible to meaningfully estimate the relevant quantities in complex, uncertain, changing, and low-information environments
  • It focuses on evaluating actions as they are presented, and struggles to sufficiently value exploring the potential action space and increasing future optionality
• Creativity can be limited by the need to appeal to a narrow range of grantmaker views[58]
• The current model neglects areas that do not fit [neatly] into the two main “cause areas”, and indeed it is arguable whether global catastrophic risk can be meaningfully chopped up into individual “cause areas” at all
• A large proportion (plausibly a sizeable majority, depending on where you draw the line) of catastrophic risk researchers would, and if you ask, do, reject[59]:
  • The particular prioritisations made
  • The methods used to arrive at those prioritisations, and/or
  • The very conceptualisation of individual “risks” itself
• The prioritisation is the product of a small homogenous group of people with very similar views
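To illustrate the framing-sensitivity point above: ITN-style prioritisation is typically operationalised by multiplying rough estimates of importance (scale), tractability (solvability), and neglectedness, so the output inherits the full uncertainty of every subjective input. The sketch below uses entirely invented numbers for two hypothetical problem areas; it is not anyone's actual estimate, but it shows how revising a single order-of-magnitude guess can flip the resulting ranking.

```python
# Toy illustration of how sensitive an ITN-style ranking can be to subjective
# inputs. All numbers are invented and carry no particular units.

def itn_score(importance, tractability, neglectedness):
    """Marginal-value-per-dollar proxy: the product of the three ITN factors."""
    return importance * tractability * neglectedness

# Two hypothetical problem areas, each factor an order-of-magnitude guess.
estimates = {
    "problem_A": dict(importance=1e9, tractability=0.3, neglectedness=2e-7),
    "problem_B": dict(importance=1e8, tractability=0.5, neglectedness=1e-6),
}

print("initial guesses:")
for name, e in estimates.items():
    print(f"  {name}: {itn_score(**e):.1f}")

# Revise a single subjective input downward by roughly 3x, well within the
# noise of "rough and/or subjective estimates": the ranking flips.
estimates["problem_A"]["tractability"] = 0.1
print("after revising one guess:")
for name, e in estimates.items():
    print(f"  {name}: {itn_score(**e):.1f}")
```

Under deep uncertainty, each factor can easily be off by more than the factor of three used here, which is why the ranking that comes out can be largely pre-determined by whoever supplies the initial guesses.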

There are important efforts to mitigate some of these issues, e.g. cause area exploration prizes, but the central issue remains.

The core of the problem here seems to be one of objectives: optimality vs robustness. Some quick definitions, in terms of funding allocation (a toy illustration follows the list below):

  • Optimality = the best possible allocation of funds
    • In EA this is usually synonymous with “the allocation with the highest possible expected value”
    • This typically has an unstated second component: “assuming that our information and our assumptions are accurate”
  • Robustness = capacity of an allocation to maintain near-optimality given conditions of uncertainty and change
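As a toy contrast between the two objectives (all payoffs and credences invented for illustration): an expected-value-optimal allocation is chosen under one set of assumptions, while a robust allocation, here selected by minimax regret as one of several possible robustness criteria, is chosen to perform acceptably even if those assumptions turn out to be wrong.

```python
# Toy contrast between expected-value optimisation and a robustness criterion
# (minimax regret). Payoffs and credences are invented for illustration.

allocations = ["all_in_on_A", "all_in_on_B", "split"]
scenarios = ["assumptions_hold", "assumptions_wrong"]

# payoff[allocation][scenario]
payoff = {
    "all_in_on_A": {"assumptions_hold": 100, "assumptions_wrong": 5},
    "all_in_on_B": {"assumptions_hold": 60,  "assumptions_wrong": 20},
    "split":       {"assumptions_hold": 80,  "assumptions_wrong": 40},
}

# Optimality: maximise expected value under our (possibly wrong) credences.
credence = {"assumptions_hold": 0.8, "assumptions_wrong": 0.2}
ev = {a: sum(credence[s] * payoff[a][s] for s in scenarios) for a in allocations}
ev_optimal = max(ev, key=ev.get)

# Robustness: minimise the worst-case regret across scenarios.
best_in_scenario = {s: max(payoff[a][s] for a in allocations) for s in scenarios}
max_regret = {a: max(best_in_scenario[s] - payoff[a][s] for s in scenarios)
              for a in allocations}
robust_choice = min(max_regret, key=max_regret.get)

print("expected values:   ", ev)
print("worst-case regrets:", max_regret)
print("EV-optimal:", ev_optimal, "| robust:", robust_choice)
```

In this toy example the concentrated bet maximises expected value under the stated credences, but the diversified allocation has the smallest worst-case regret; this is exactly the kind of trade-off the rest of this section is about.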

In seeking to do the most good possible, EAs naturally seek optimality, and developed grantmaking tools to this end. We identify potential strategies, gather data, predict outcomes, and take the actions that our models tell us will work the best.[60] This works great when you’re dealing with relatively stable and predictable phenomena, for instance endemic malaria, as well as most of the other cause areas EA started out with.

However, now that much of EA’s focus has turned to global catastrophic risk, existential risk, and the long-term future, we have entered areas where optimality becomes fragility. We don’t want most of our eggs in one or two of the most speculative baskets, especially when those eggs contain billions of people. We should also probably adjust for the fact that we may over-rate the importance of things like AI for reasons discussed in other sections.

Given the fragility of optimality, robustness is extremely important. Existential risk is a domain of high complexity and deep uncertainty, dealing with poorly-defined low-probability high-impact phenomena, sometimes covering extremely long timescales, with a huge amount of disagreement among both experts and stakeholders along theoretical, empirical, and normative lines. Ask any risk analyst, disaster researcher, foresight practitioner, or policy strategist: this is not where you optimise, this is where you maintain epistemic humility and cover all your bases. Innumerable people have learned this the hard way so we don’t have to.

Thus, we argue that, even if you strongly agree with the current prioritisations / methods, it is still rational for you to support a more pluralist and robustness-focused approach given the uncertainty, expert disagreement, and risk management best-practices involved.

Comments



Thank you for extracting these things!

Ironically, this comment will not be an object-level criticism, and is more a meta-rant.

As someone who believes that the existential risk from AI is significant, and more significant than other existential risks, I am becoming more annoyed that a lot of the arguments for taking AI xrisk less seriously are not object-level arguments, but indirect arguments.

If you are worried that EA prioritizes AI xrisk too much, maybe you should provide clear arguments why the chance that advanced AI will kill all of humanity this century is extremely small (e.g. below 2%), or provide other arguments like "actually the risk is 10% but there is nothing you can do to improve it".

The following are not object-level arguments: "You are biased", "You are a homogeneous group", "You are taking this author of Harry Potter fanfiction too seriously, please listen to the people we denote as experts™ instead", "Don't you think that, as a math/CS person, it aligns suspiciously well with your own self-interest to read a 100-page Google doc on eliciting latent knowledge in a rundown hotel in northern England instead of working for Google and virtuously donating 10% of your income to malaria bednets?"

Maybe I am biased, but that does not mean I should completely dismiss my object-level beliefs, such as my opinion on deceptive mesa-optimization.

Argue that the neural networks that Google will build in 2039 do not contain any mesa-optimization at all!

Relay the arguments by domain-experts from academic disciplines such as "Futures Studies", and "Science & Technology Studies", so that new EAs can decide by themselves whether they believe the common arguments about orthogonality thesis and instrumental convergence!

Argue that the big tech companies will solve corrigibility on their own, and don't need any help from a homogeneous group of EA nerds!

To be fair, I have seen arguments that say something like "AGI is very unlikely to be built this century". But the fact that some readers will have doubts about whether these very lines were produced by ChatGPT or a human should cast doubt on the position that reaching human-level intelligence with trillions of parameters is impossible in the next 70 years.

I think the problem here is that you are requiring your critics to essentially stay within the EA framework of quantitative thinking and splitting risks, but that framework is exactly what is being criticized.

I think something that's important here is that indirect arguments can show that, given other approaches, you may come to different conclusions; not just on prioritisation of 'risks' (I hate using that word!), but also on techniques to reduce them. For instance, I still think that AI and biorisk are extremely significant contributors to risk, but would probably take pretty different approaches to how we deal with this, based on trying to consider these more indirect criticisms of the methodologies etc. used.

Exactly. For example, by looking at vulnerabilities in addition to hazards like AGI and engineered pandemics, we might find a vulnerability that is more pressing to work on than AI risk.

That said, the EA x-risk community has discussed vulnerabilities before: Bostrom's paper "The Vulnerable World Hypothesis" proposes the semi-anarchic default condition as a societal vulnerability to a broad class of hazards.

To be clear, if you make arguments of the form "X is a more pressing problem than AI risk" or "here is a huge vulnerability X, we should try to fix that", then I would consider that an object-level argument, if you actually name X.
