Can we trust wellbeing surveys? A pilot study of comparability, linearity, and neutrality

Conrad S; CasparKaiser; MichaelPlant; Samuel Dupret

Comments 10

NickLaing

Thanks for this, it is interesting and important.

However, I don't think these issues with point estimates are the biggest problem with wellbeing research. They matter for calibration, yes, but a bigger problem is whether reported increases in wellbeing after an intervention are real or biased. I have said this before; apologies for being a stuck record.

Two biases may not affect point estimates (which you discuss above) but do affect before-and-after measurements:

Demand/courtesy bias. Giving a higher wellbeing score after the intervention because you think that is what the researcher wants.
"Future hope" bias. Giving higher scores after any intervention, thinking (often rationally and correctly) that the positive report will make you more likely to get other, even different, types of help in future. This could be a huge problem in surveys among the poor, but there's close to no research on it.

These might be hard to research and are undrafted, but I think it is important to try.

We should keep in mind, though, that these two biases don't only affect wellbeing surveys but, to some degree, any self-reported survey, for example the majority of GiveDirectly's data.

Samuel Dupret

Hi Nick,

Thanks for pointing out both kinds of biases. These biases can cause a failure of comparability. Concretely, if an intervention causes you to give counterfactually higher scores as a matter of ‘courtesy’ to the researcher, then the intervention changed the meaning of each given response category.

I therefore take it that you don’t think that our particular tests of comparability will cover the two biases you mention. If so, I agree. However, my colleague has given reasons for why we might not be as worried about these sorts of biases.

I don’t think this can be tested in our current survey format, but it might be testable in a different design. We are open to suggestions!

NickLaing

Not only courtesy, but also future hope (which I think may be more important here).

Yeah, it's really hard to test. I think the validity of point estimates is pretty reasonable for wellbeing surveys, and I agree with most of the reasoning in this post.

It's very hard to test those biases ethically, but probably possible. Not in this kind of survey, anyway.

The reasons he gave for not being worried about those biases were not unreasonable, but based on flimsy evidence. Especially future hope bias which may not have been researched at all.

geoffrey

I enjoyed this a lot. I've been meaning to delve into well-being measurement and this was a nice entry-point into the field.

One thing I'm not clear on is whether vignette anchors (or any of the comparability methods) can correct for non-overlapping well-being scales. You talked about an example like this:

But I'm more interested in examples like this:

Measuring these larger SWB (subjective well-being) differences seems crucial for detecting interpersonal differences across societies and picking up on how intense pain / pleasure can be at the long tails. The non-overlap area seems like it can get extremely big.

CasparKaiser

Hi geoffrey!

Yes, you are right.

All of the methods we are currently thinking of require that, for all respondents i, j, the top response threshold for person i must be at least as large as the bottom response threshold for person j.

However, with the vignettes, I believe that this is in part testable.
Suppose that for a given vignette no person selected the top response category, and no person selected the bottom response category. Additionally suppose that the assumptions in section 4.1.1 of the report hold (i.e. that people perceive vignettes similarly, and use the same scale for their own wellbeing as for the vignettes). In that case all respondents’ scales must have at least some overlap with each other.

We have not checked this, though I imagine that it would show overlap of scales. Would this kind of test convince you?
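The vignette check described above could be sketched roughly as follows. This is only an illustration, not code from the report: the function name `vignette_suggests_overlap`, the 0 to 10 response scale, and the example ratings are all assumptions made for the sketch. It also takes the section 4.1.1 assumptions as given, so a `True` result is only suggestive of overlap under those assumptions.

```python
from typing import Sequence


def vignette_suggests_overlap(
    ratings: Sequence[int], bottom: int = 0, top: int = 10
) -> bool:
    """Check whether nobody rated one vignette at either end of the scale.

    `ratings` holds every respondent's rating of the same vignette.
    `bottom` and `top` are the lowest and highest response categories
    (a hypothetical 0-10 scale by default).

    If every rating is strictly between the endpoints, then (under the
    stated assumptions) the vignette sits inside every respondent's scale,
    so all scales share at least that one point.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    return all(bottom < r < top for r in ratings)


# Made-up ratings of a single vignette by five respondents.
example_ratings = [3, 5, 6, 4, 7]
print(vignette_suggests_overlap(example_ratings))
```

Note that the check is one-directional: a `False` result (someone used an endpoint) does not show that scales fail to overlap, only that this particular vignette cannot establish it.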

As an aside, in section 4.6.1 we show that almost all respondents choose either “The most/least satisfied that any human could possibly be” or “The most/least satisfied that you personally think you could become” as the endpoints of the scale. Since the latter set of endpoints is contained by the former set of endpoints, this evidence also seems to suggest that scales overlap.

geoffrey

Hi Caspar,

Thanks for the response. On second thought, my objection might be different than what I initially suggested. I do think the test of overlap of scales as you mentioned would be an interesting test to run, but it doesn't seem to be capturing the overlap I ultimately care about.

Maybe this comment captures my complaint better. We don't have any access to what "the most/least satisfied that any human could possibly be" is. We don't even have access to "the most/least satisfied you personally think you could become".

As a personal example, I would take most of my worst post-therapy days over most of my best pre-therapy days. Younger me had no way of realizing how much more satisfied I could be with life, or even how broadly satisfied people are in general.

I might be using the language wrong, but I think I'm hinting at differences in the latent scale of well-being or satisfaction... which doesn't feel like it's knowable.

Guy Raveh

I have difficulty with this idea of a neutral point, below which it is preferable to not exist. At the very least, this is another baked-in assumption: that the worst wellbeing imaginable is worse than non-existence.

There are two reasons for me being troubled with this assumption:

1. I've been living with a chronic illness for many years, which causes constant suffering. I'm expected to keep living like that for decades to come. I can't accept the idea that there's a point of suffering beyond which I should not live.
2. Giving such a point will allow one to make decisions about whether people should live or die. As a rule that I personally believe in, we should never make such decisions.

MichaelPlant

Hello Guy. This is an important, tricky, and often unpleasant, issue to discuss. I'm speaking for myself here: HLI doesn't have an official view on this issue, except that it's complicated and needs more thought; I'm still not sure how to think about this.

I'll respond to your second comment first. You say we should not decide whether people live or die. Whilst I respect the sentiment, this choice is unfortunately unavoidable. Healthcare systems must, for instance, make choices between quality and quantity of lives - there are not infinite resources. The well-worn QALY and DALY measures exist in the hope of making such choices in a more principled way. Charitable donors, when deciding where to give, might support any one of a variety of life-saving charities, or charities that focus on something else - in a sense, they are choosing whether people live or die. Because we have to make such choices anyway, the need to make them doesn't depend on where the neutral point is fixed. As we note in section 2.2 of this recent HLI report, QALYs allow states worse than death, whereas DALYs do not. Yet both trade off quality vs quantity of life.

Turning to your first comment, I note this is a topic about which opinion seems to split. Some people, such as yourself, think existence is always better than non-existence. Others think that life can be worse than death - a life of unrelenting suffering, perhaps - and that people can be rational in seeking to end their own lives. Curiously, I note that people are quite ready to accept that, when it comes to factory farming, those animals would lead bad lives, so it is better that they never exist. That said, I don't think this is an issue we can expect, or need, to find unanimity on. Those who think life is always worth living would, presumably, want to make comparisons differently from those who think life is not always worth living. One way to capture these different views, and explore their implications, is exactly by varying the level of the neutral point.

Guy Raveh

I'm aware that by prioritising how to use limited resources, we're making decisions about people's lives. But there's a difference between saying "we want to save everyone, but can't" and saying "This group should actually not be saved, because their lives are so bad".

"Curiously, I note that people are quite ready to accept that, when it comes to factory farming, those animals would lead bad lives, so it is better that they never exist."

I actually agree! But I don't think it's the same thing. I don't want to kill existing animals; I want to not intentionally create new ones for factory farms. Continued existence is better than death if you already exist. Creating someone just to suffer is a different matter. This isn't symmetric (and as a mathematician, I note that that means it can't be described by just giving some "local" numerical rating to each state of being and comparing them).

Geoffrey Miller

Guy - thank you for this comment. I'm very sorry about your suffering.

I think EAs should take much more seriously the views of people like you who have first-hand experience with these issues. We should not be assuming that 'below neutral utility' implies 'it's better not to be alive'. We should be much more empirical about this, and not make strong a priori assumptions grounded in some over-simplified, over-abstracted view of utilitarianism.

We should listen to the people, like you, who have been living with chronic conditions -- whether pain, depression, PTSD, physical handicaps, cognitive impairments, or whatever -- and try to understand what keeps people going, and why they keep going.

Author note: Conrad Samuelsson, Samuel Dupret, and Caspar Kaiser contributed to the conceptualization, methodology, investigation, analysis, data curation, and writing (original as well as review and editing) of the project. Michael Plant contributed to the conceptualization, supervision, and writing (review and editing) of the project.

A standard philosophical explanation for what makes death bad for us (assuming it can be bad for us) is deprivationism, which says that death is bad because and to the extent it deprives us of the goods of life. Hence, death is bad for us if we would have had a good life and, conversely, death is good for us if we would have had a bad life. Here we take it that a good(/bad/neutral) life is one with overall positive(/negative/neutral) wellbeing. See, for example, Nagel (1970).

The need to, and difficulty of, assigning values to both various states of life and to death is also a familiar challenge for measures of quality- and disability-adjusted life years (QALYs and DALYs). For discussion, see, for example, Sassi (2006).

The fact that the scale mixes three concepts into one seems problematic.

This is entailed by, for instance, a standard formulation of utilitarianism. In classical utilitarianism, the value of an outcome is the sum total of wellbeing in it, where wellbeing consists in happiness. On this view, ceteris paribus, extending an overall happy life is good, whereas extending an overall unhappy life is bad. ‘Good’ and ‘bad’ are understood either in terms of being good/bad for the person or good/bad ‘for the world’. We are not endorsing classical utilitarianism here, but merely point out that aligning the neutral point with the zero point on the appropriate wellbeing scale (whatever that happens to be) would be a textbook view in ethics.

Summary

1. Linearity, comparability, and neutrality as challenges for wellbeing research

1.1 Comparability

1.2 Linearity

1.3 Neutrality

2. General outline of the survey