I often think about "the road to hell is paved with good intentions".[1] I'm unsure to what degree this is true, but it does seem that people trying to do good have caused more negative consequences in aggregate than one might naively expect.[2] "Power corrupts" and "power-seekers using altruism as an excuse to gain power" are two often-cited reasons for this, but I think they don't explain all of it.

A more subtle reason is that even when people are genuinely trying to do good, they're not entirely aligned with goodness. Status-seeking is a powerful motivation for almost all humans, including altruists, and we frequently award social status to people for merely trying to do good, before seeing all of the consequences of their actions. This is in some sense inevitable as there are no good alternatives. We often need to award people with social status before all of the consequences play out, both to motivate them to continue to try to do good, and to provide them with influence/power to help them accomplish their goals.

A person who consciously or subconsciously cares a lot about social status will not optimize strictly for doing good, but also for appearing to do good. One way these two motivations diverge is in how to manage risks, especially risks of causing highly negative consequences. Someone who wants to appear to do good would be motivated to hide or downplay such risks, from others and perhaps from themselves, as fully acknowledging such risks would often amount to admitting that they're not doing as much good (in expectation) as they appear to be.

How to mitigate this problem

Individually, altruists (to the extent that they endorse actually doing good) can make a habit of asking themselves and others what risks they may be overlooking, dismissing, or downplaying.[3]

Institutionally, we can rearrange organizational structures to take these individual tendencies into account, for example by creating positions dedicated to or focused on managing risk. These could be risk management officers within organizations, or people empowered to manage risk across the EA community.[4]

Socially, we can reward people/organizations for taking risks seriously, or punish (or withhold rewards from) those who fail to do so. This is tricky because, due to information asymmetry, we can easily create "risk management theater" akin to "security theater" (which, come to think of it, is a type of risk management theater). But I think we should at least take notice when someone or some organization fails, in a clear and obvious way, to acknowledge risks or to do good risk management, for example by not writing down a list of important risks to be mindful of and keeping it updated, or by avoiding/deflecting questions about risk.[5] More optimistically, we can try to develop a culture where people and organizations are monitored and held accountable for managing risks substantively and competently.

  1. ^

    due in part to my family history

  2. ^

    Normally I'd give some examples here, but we can probably all think of some from the recent past.

  3. ^

    I try to do this myself in the comments.

  4. ^

    an idea previously discussed by Ryan Carey and William MacAskill

  5. ^

    However, see this comment.





My main altruistic endeavor involves thinking and writing about ideas that seem important and neglected. Here is a list of the specific risks that I'm trying to manage/mitigate in the course of doing this. What other risks am I overlooking or not paying enough attention to, and what additional mitigations should I be doing?

  1. Being wrong or overconfident, distracting people or harming the world with bad ideas.
    1. Think twice about my ideas/arguments. Look for counterarguments/risks/downsides. Try to maintain appropriate uncertainties and convey them in my writings.
  2. The idea isn't bad, but some people take it too seriously or too far.
    1. Convey my uncertainties. Monitor subsequent discussions and try to argue against people taking my ideas too seriously or too far.
  3. Causing differential intellectual progress in an undesirable direction, e.g., speeding up AI capabilities relative to AI safety, spreading ideas that are more useful for doing harm than doing good.
    1. Check ideas/topics for this risk. Self-censor ideas or switch research topics if the risk seems high.
  4. Being first to talk about some idea, but not developing/pursuing it as vigorously as someone else might if they were first, thereby causing a net delay in intellectual or social progress.
    1. Not sure what to do about this one. So far not doing anything except to think about it.
  5. PR/political risks, e.g., talking about something that damages my reputation or relationships, and in the worst case harms people/causes/ideas associated with me.
    1. Keep this in mind and talk more diplomatically or self-censor when appropriate.

There's also the unilateralist's curse: suppose someone publishes an essay about a dangerous, viral idea that they misjudge to be net-positive, even though 20 other people had already thought about it and judged it to be net-negative.

Individually, altruists [...] can make a habit of asking themselves and others what risks they may be overlooking, dismissing, or downplaying.

Institutionally, we can rearrange organizational structures to take these individual tendencies into account, for example by creating positions dedicated to or focused on managing risk.

I’ve been surprised by how this seems to be a bit of a blind spot in our community.[1] I’ve previously written a couple of comments—excerpted below—on this theme, about the state of community building. These garnered a decent number of upvotes, but I don’t think they led to any concrete actions or changes. (For instance, the second comment never received a reply from Open Phil.)

My attempts to raise this concern [about optimizing for numbers/hype at the expense of i) cause prio, ii) addressing particular talent bottlenecks, and iii) mitigating downside risks] with other community builders, including those above me, were mostly dismissed. This worried me. It seemed like the community building machine was not open to the hypothesis that (some of) what it was doing might be ineffective, or, worse, net negative. (More on the latter below.) On top of this, there seemed to be a tricky second-order effect at play: evaporative cooling whereby the community builders who developed concerns like mine exited, only to be replaced by more bullish community builders. The result: a disproportionately bullish community building machine. And there didn't appear to be any countermeasures in place. For example, there was plenty of funding available if one wanted a paid role doing community building. But, in addition to the social disincentive, there was no funding available for evaluating/critiquing the impact of community building—at least, not that I was aware of.


There was near-consensus that Open Phil should generously fund promising AI safety community/movement-building projects they come across

Would you be able to say a bit about to what extent members of this working group have engaged with the arguments around AI safety movement-building potentially doing more harm than good? For instance, points 6 through 11 of Oli Habryka's second message in the “Shutting Down the Lightcone Offices” post (link). If they have strong counterpoints to such arguments, then I imagine it would be valuable for these to be written up.


  1. ^

    I mean, if one has a high prior on one’s actions being robustly positive, then it makes sense to continue full steam ahead without worrying about risks. (Because there is a tradeoff: spending time considering risks means spending less time acting.) However, I don’t think this level of confidence is warranted for the vast majority of longtermist interventions. For more, see this comment by Linch.

While drafting this post, I wrote down and then deleted an example of "avoiding/deflecting questions about risk". The person I asked such a question is probably already trying to push their organization to take risks more seriously, and probably had their own political considerations for not answering my question, so I don't want to single them out for criticism. I also don't want to damage my relationship with this person or make them want to engage less with me or people like me in the future.

Trying to enforce good risk management via social rewards/punishments might be pretty difficult for reasons like these.

Individually, altruists (to the extent that they endorse actually doing good) can make a habit of asking themselves and others what risks they may be overlooking, dismissing, or downplaying.

I think this works well when done in private, but asking around among friends is difficult for people who don't have an extensive EA network, and they risk inadvertently asking only within their filter bubble.

Asking around publicly, e.g., on the Forum, is something that I and probably others too have mostly come to regret. Currently it's still uncommon to try to red-team your own interventions publicly, so when someone does do it, the intervention is not perceived as particularly well red-teamed but as particularly risky.

This could be avoided by making such red-teaming a lot more common, but that is hard. Perhaps a dedicated subforum could help too, one where only people interested in helping with such red-teaming efforts see the posts.
