(Content warning: this post mentions a question from the 2024 EA Survey. If you haven't answered it yet and plan to do so, please do that first before reading on)
The 2024 EA Survey asks people which of the following two interventions they prefer:
- An intervention that averts 1,000 DALYs with 100% probability
- An intervention that averts 100,000 DALYs with 1.5% probability
In theory, this is a simple question: intervention (2) averts 1,500 DALYs in expectation, 50% more than intervention (1).
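The expected-value arithmetic, as a minimal sketch:

```python
# Expected DALYs averted by each intervention in the survey question
ev_1 = 1.000 * 1_000    # intervention (1): certain, 1,000 DALYs
ev_2 = 0.015 * 100_000  # intervention (2): 1.5% chance of 100,000 DALYs

print(ev_1, ev_2)       # 1000.0 1500.0
print(ev_2 / ev_1 - 1)  # 0.5, i.e. 50% more expected value
```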
In practice, I believe the premise is absurd, the kind of situation that never happens in real life. How would you know that the probability of an intervention working is 1.5%?
My rule of thumb is that most real-world probabilities could be off by a percentage point or so. Note that this is an absolute error, not a relative one: the estimate could be off by an entire percentage point, not merely by 1% of its value. For the survey question, it might well be that intervention (1)'s success rate is only 99%, and intervention (2)'s success rate could be anywhere in the low single-digit percentages.
I don't have a good justification for this rule of thumb[1]. Part of it is probably psychological: humans are most familiar with coarse concepts like "rare". We occasionally use percentages, but rarely (no pun intended) use permilles or smaller units. Part of it is technical: small probabilities are harder to measure directly, so they are derived from a model. The model is imperfect, and its inputs are likely to be imprecise.
For intervention (1), my rule of thumb has little effect on the overall impact. For intervention (2), the effect is very large[2]. This is what makes the survey question so hard to answer, and the answers so hard to interpret.
There are, of course, established mathematical ways to deal with this. For example, one could use a portfolio approach that allocates some fraction of resources to intervention (2). Such strategies are valuable, even necessary, for dealing with this type of question. As a survey respondent, though, I felt frustrated by having just two options. I feel that the survey question creates a false sense of "all you need is expected value"; it asks for a black-and-white answer where reality has many shades.[3]
My recommendation and plea: please communicate humbly, especially when using very low probabilities. Consider that all your numbers, but especially low probabilities, might be inaccurate. When designing thought experiments, keep them as realistic as possible, so that they elicit better answers. This reduces misunderstandings, pitfalls, and potentially compounding errors, and produces better communication overall.
- I welcome pointers to research about this! ↩︎
- The effect is large in the sense that the expected intervention value could be anywhere from 500 to 2,500 DALYs. However, the expectation over this uncertainty (the "expected expected value") does not change if we merely add symmetric error margins. ↩︎
- Caveat: I don't know what the survey question was intended to measure. It might well be a good question, given its goal. ↩︎
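Footnote 2's point can be illustrated numerically. This is a minimal sketch under an assumption of my own choosing: the true success probability of intervention (2) is, with equal likelihood, one percentage point below, exactly at, or one percentage point above the stated 1.5%.

```python
# A symmetric error of one percentage point around the stated 1.5%:
# the per-scenario value swings widely, but the overall expectation
# is unchanged, because the error margins are symmetric.
dalys = 100_000
scenarios = [0.005, 0.015, 0.025]  # equally likely success probabilities

per_scenario_ev = [p * dalys for p in scenarios]
overall_ev = sum(per_scenario_ev) / len(per_scenario_ev)

print(per_scenario_ev)  # [500.0, 1500.0, 2500.0]
print(overall_ev)       # 1500.0 (same as with p fixed at 1.5%)
```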
My intuitive reaction to this is "Way to screw up a survey."
Considering that three people agree-voted your post, I realize I should probably come away from this with a very different takeaway, more like "oops, survey designers need to put in extra effort if they want accurate results, and I would've totally fallen for this pitfall myself."
Still, I struggle with understanding your and the OP's point of view. My reaction to the original post was something like:
> Why would this matter? If the estimate could be off by 1 percentage point, it could be down to 0.5% or up to 2.5%, which is still 1.5% in expectation. Also, if this question were meant to probe the likelihood of EA orgs being biased, surely they would've asked much more directly about how much respondents trust an estimate from some example EA org.
We seem to disagree on the use of thought experiments. The OP writes:

> When designing thought experiments, keep them as realistic as possible, so that they elicit better answers.
I don't think this is necessary, and I could even see it backfiring. If someone goes out of their way to make a thought experiment particularly realistic, respondents might get the impression that it is asking about a real-world situation where they are invited to bring in all kinds of potentially confounding considerations. But that would defeat the point of the thought experiment (e.g., people might answer based on how much they trust the modesty of EA orgs, as opposed to revealing their personal tolerance for the risk of having had no effect/wasted money in hindsight). The way I see it, the whole point of thought experiments is to get ourselves to think very carefully and cleanly about the principles we find most important. We do this by stripping away all the potentially confounding variables. See here for a longer explanation of this view.
Maybe future surveys should have a test to figure out how people understand the use of thought experiments. Then, we could split responses between people who were trying to play the thought experiment game the intended way, and people who were refusing to play (i.e., questioning premises and adding further assumptions).
*On some occasions, it makes sense to question the applicability of a thought experiment. For instance, in the classic "what if you're a doctor who has the opportunity to kill a healthy patient during a routine check-up so that you could save the lives of 4 people needing urgent organ transplants," it makes little sense to just go "all else is equal! Let's abstract away all other societal considerations or the effect on the doctor's moral character."
So, if I were to write a post on thought experiments today, I would add something about the importance of re-contextualizing lessons learned within a thought experiment to the nuances of real-world situations. In short, I think my formula would be something like, "decouple within thought experiments, but make sure to add an extra thinking step from 'answers inside a thought experiment' to 'what we can draw from this in terms of real-life applications.'" (Credit to Kaj Sotala, who once articulated a similar point, probably better.)