
Does this already happen? If not, should it? And with which metrics? If it already happens, what metrics are used to assess the health and changes in health of the EA community?


Interesting question.

I think there are essentially two different angles here: how good the EA community is at achieving its stated purpose, and how healthy its members are.

For the first, an obvious test is how many people donate at least 10% of their labour income. Another would be the extent to which EA research breaks new ground rather than going round in circles.

For the second, many standard measures of social dysfunction would presumably be relevant, e.g. depression, crime, drug addiction, or unemployment. Conversely, we would also care about positive indicators, such as professional success, having children, and good family relationships. However, you would want to distinguish selection effects (does EA attract healthy people?) from treatment effects (does EA make people healthy?). If we (hypothetically) made some people so depressed that they rapidly dropped out, our depression statistics could look good, despite this being clearly bad!
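To make the selection-effect point concrete, here is a minimal Python sketch with made-up numbers (the `Member` record, its fields, and the cohort counts are all hypothetical). It compares the depression rate that a survey of current members would report with the rate across everyone who ever joined, in the scenario where the people harmed are exactly the ones who left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    depressed: bool     # hypothetical: depressed at the time of measurement
    still_active: bool  # hypothetical: still engaged enough to answer a survey


def depression_rate(members: list[Member]) -> float:
    """Share of the given members who are depressed (0.0 for an empty list)."""
    if not members:
        return 0.0
    return sum(m.depressed for m in members) / len(members)


def survey_vs_cohort(members: list[Member]) -> tuple[float, float]:
    """Return (rate among people a survey of current members would reach,
    rate among everyone who ever joined, including drop-outs)."""
    active = [m for m in members if m.still_active]
    return depression_rate(active), depression_rate(members)


# Made-up cohort of 100 joiners: 10 became depressed and left,
# 5 are depressed but still active, 85 are fine and still active.
cohort = (
    [Member(depressed=True, still_active=False) for _ in range(10)]
    + [Member(depressed=True, still_active=True) for _ in range(5)]
    + [Member(depressed=False, still_active=True) for _ in range(85)]
)
survey_rate, cohort_rate = survey_vs_cohort(cohort)
print(f"survey of active members: {survey_rate:.1%}")
print(f"all joiners:              {cohort_rate:.1%}")
```

With these invented figures, the survey of active members reports about 5.6% while the full cohort sits at 15.0%, so the headline metric improves precisely because of the harm. Following up with people who leave is one way to catch this, though it is harder to do.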

Another issue is judging whether someone is a member of the community. A survey could be unrepresentative if it doesn't reach enough people, or if it reaches only peripherally attached people.

Some more ideas for metrics that might be useful for tracking 'the health of the EA community' (not sure whether they fit in the first category):

How much runway do EA orgs have?

How diverse is the 'EA funding portfolio'? [EDIT: I'm referring here to the diversity of donors rather than the diversity of funding recipients.]
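One possible way to turn these two questions into numbers, sketched in Python: runway as reserves divided by monthly spending, and donor diversity via the Herfindahl-Hirschman index (HHI), a standard concentration measure borrowed here from economics rather than anything the answer specifies. All donor names and amounts are hypothetical.

```python
def runway_months(reserves: float, monthly_spending: float) -> float:
    """Months an organisation could keep operating on its current reserves
    with no new income, assuming spending stays flat."""
    if monthly_spending <= 0:
        raise ValueError("monthly_spending must be positive")
    return reserves / monthly_spending


def donor_hhi(donations: dict[str, float]) -> float:
    """Herfindahl-Hirschman index of donor shares.

    1.0 means one donor provides all the funding; values near 0 mean
    funding is spread thinly across many donors."""
    total = sum(donations.values())
    if total <= 0:
        raise ValueError("total funding must be positive")
    return sum((amount / total) ** 2 for amount in donations.values())


def effective_number_of_donors(donations: dict[str, float]) -> float:
    """Inverse HHI: the number of equal-sized donors that would give the
    same concentration."""
    return 1.0 / donor_hhi(donations)


# Hypothetical funding picture for the whole community, by donor.
funding = {"donor_a": 8_000_000, "donor_b": 1_000_000, "donor_c": 1_000_000}
print(runway_months(reserves=1_200_000, monthly_spending=100_000))  # 12.0
print(round(donor_hhi(funding), 2))                   # 0.66
print(round(effective_number_of_donors(funding), 2))  # 1.52
```

The inverse HHI reads as an "effective number of donors": in this made-up example there are three donors on paper, but funding is about as concentrated as it would be with one and a half equal-sized donors.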

Some metrics for local groups we gathered for this post were:

  • How welcoming do people find the group?
  • Do those focused on a non-prioritised cause feel welcome?
  • Do people from relevant (context-dependent) minority groups feel welcome?
  • Do group members feel like they are part of a community?
  • Do group members feel supported by the local group?
  • Have there been any major community issues?
  • How were they dealt with?
  • How did this affect other group members’ perception of the group?

Ideally, you could aggregate these across local groups and get a sense of local group health.
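A minimal sketch of that aggregation in Python, assuming (hypothetically) that the rating-style questions are asked on a 1-5 scale and that results are stored per group; the group names, question keys, and ratings are invented. It averages within each group first and then across groups, so every group counts equally. Weighting by number of respondents would be an equally defensible choice, but it lets large hubs dominate the picture.

```python
from statistics import mean

# Hypothetical survey data: group -> question -> individual 1-5 ratings.
responses = {
    "group_a": {"welcoming": [5, 4, 4], "feel_supported": [3, 4, 3]},
    "group_b": {"welcoming": [2, 3], "feel_supported": [4, 4]},
}


def aggregate(responses):
    """Average within each group, then average the group means per question.

    Each group counts equally, so one large hub cannot mask a struggling
    small group."""
    per_group = {
        group: {question: mean(ratings) for question, ratings in answers.items()}
        for group, answers in responses.items()
    }
    questions = sorted({q for answers in per_group.values() for q in answers})
    overall = {
        q: mean(scores[q] for scores in per_group.values() if q in scores)
        for q in questions
    }
    return per_group, overall


def weakest_group(per_group, question):
    """Group with the lowest mean rating on a question, e.g. for follow-up."""
    scored = {g: s[question] for g, s in per_group.items() if question in s}
    return min(scored, key=scored.get)


per_group, overall = aggregate(responses)
print(overall)
print(weakest_group(per_group, "welcoming"))  # group_b
```

Yes/no items such as "Have there been any major community issues?" would need separate handling, e.g. counting how many groups answer yes.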

You would probably also want to look at metrics of the "EA achieving its system goals" kind, e.g. Larks's suggestion of donating 10%, but also career changes and volunteer projects taken on (e.g. organizing a local group or volunteering on an EA-aligned project).

Looking at how the two kinds of metrics correlate would be fairly important: for example, if it turned out that an EA hub has lower community health but higher output, that would be an important finding.
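A sketch of that comparison in Python, using invented figures for four hypothetical hubs. The health score could be an aggregate of local-group survey results and the output score something like new 10% pledges per 100 members; both choices are assumptions. It computes a Pearson correlation and flags hubs that combine below-median health with above-median output.

```python
from math import sqrt
from statistics import median


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical hubs with a community-health score (1-5) and an output metric.
hubs = ["hub_w", "hub_x", "hub_y", "hub_z"]
health = [4.2, 3.8, 3.1, 4.5]
output = [3.0, 5.0, 8.0, 2.0]

r = pearson(health, output)
print(f"r = {r:.2f}")

# Flag hubs with below-median health but above-median output.
h_med, o_med = median(health), median(output)
flagged = [
    hub for hub, h, o in zip(hubs, health, output) if h < h_med and o > o_med
]
print(flagged)
```

With only a handful of hubs, a correlation like this is extremely noisy and says nothing about causation; the flagged hubs are mainly candidates for a closer look.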


I just want to say that this is exactly what I want the question feature to be used for. Strong upvoted. Very interested in seeing answers!
