
Background: Earlier this year, I attended a great presentation by Natália Mendoça about experience sampling. Here's the deck from her presentation.

A takeaway from the presentation was that QALYs are constructed in a way that skews cause prioritization towards particular causes. Alternative metrics have different skews, so using an alternative metric could lead to very different cause prioritization.

For example, under the QALY framework, one year with "some problems walking about" is considered to be about as bad as one year with "moderate anxiety or depression."

For anyone who's had some experience with depression or anxiety, as well as with "some problems walking about," it should be obvious that moderate depression or anxiety are (much) worse than moderate mobility problems, pound for pound. (Please reach out if you disagree with this, I want to pick your brain if you do.)

An alternative metric to QALYs is called experience sampling. Last month, Natália posted about experience sampling on the Forum. The post was moderately upvoted, though no one commented on it.

A takeaway from that post is that rolling out an experience-sampling framework seems very tractable.


This research direction seems plausibly high-priority for EA, as basing cause prioritization on a different metric could lead to notably different priority causes.

In particular, experience sampling appears to give a higher weight to mental health disorders than QALYs does, so it's plausible that under an experience-sampling framework, mental health interventions would be higher priority than global health interventions.

Given the potential magnitude of this delta in prioritization (between the experience-sampling & QALY frameworks), it's surprising to me that there's not been more interest in investigating alternatives to the QALY in the EA community.

To be clear, I'm not claiming that the experience-sampling method is superior to QALYs. I'm claiming that it is constructed in an equally plausible way to the QALY, and that it probably results in drastically different cause prioritization. One potentially robust path forward could be to split the difference between the prioritization implied by QALYs and the prioritization implied by experience sampling.


[Disclosure: In February 2019, I corresponded about the experience-sampling idea with Alex Foster of the EA Meta Fund. He said my points were "certainly quite compelling," but the correspondence fell off.

I heard later from another source that the EA Meta Fund didn't end up getting excited about the idea, though they didn't say why not.]


Three thoughts. First, it's not really the case that EAs use QALYs/DALYs. GWWC and GiveWell used to use them, but GWWC no longer exists as an independent entity and GiveWell now use their own metric. 80k mostly focus on the far future and so QALYs/DALYs aren't of primary interest. Have I missed someone? I think Founders Pledge do use them. Not sure what goes on 'under the hood' for The Life You Can Save's recommendations.

Second, even if you wanted to use the experience sampling method (ESM) as your measure of wellbeing, you couldn't because there isn't enough data on it. There are only two academic projects which have tried to collect data en masse - trackyourhappiness and mappiness. The former is now defunct (Killingsworth works for Microsoft now, I believe) and the latter isn't actively being used (I spoke to the creator, George MacKerron, a couple of months ago). I discuss this in a previous forum post. The best I think we can do, if we want to use a subjective wellbeing (SWB) measure, is life satisfaction.

Third, I think ESM is the theoretically ideal measure of happiness and thus EA - indeed, everyone - should use it as the outcome measure of impact (I assume wellbeing consists in happiness). What follows is that ESM is superior to all other measures of wellbeing, including QALYs/DALYs, wealth, etc. I'm hoping to do some research using ESM at some point in the future if I can.

>I think ESM is the theoretically ideal measure of happiness and thus EA - indeed, everyone - should use it as the outcome measure of impact (I assume wellbeing consists in happiness).

As you laid out in this comment, it looks like experience sampling is not getting strong uptake in academia.

Here's a short argument:

  • (a) Experience-sampling is theoretically the best way to measure happiness
  • (b) It's feasible to build experience-sampling infrastructure, e.g. Natália's mobile app proposal
  • (c) Academics & other stakeholders aren't planning
...
MichaelPlant
I think your short argument misses the point. The obstacle isn't the lack of such infrastructure - I imagine academics could use the existing tools if they asked politely or created their own - but the lack of demand for such infrastructure.
Milan Griffes
I'm imagining that EA could provide the demand for such infrastructure (EA cause prioritizers would be its customers).
>First, it's not really the case that EAs use QALYs/DALYs. GWWC and GiveWell used to use them, but GWWC no longer exists as an independent entity and GiveWell now use their own metric.

This is a good point.

I think that GWWC & GiveWell's earlier use of QALYs created a lot of path dependence, such that current EA prioritization remains influenced by the QALY framework even though no organization explicitly uses it at present.

Considering an alternate timeline can help draw out the path dependence:

Imagine a world where the DCP project started in 2
...
MichaelPlant
I find this to be the most plausible explanation of what has happened. Your counterfactual story is rather helpful!

A minor correction: GiveWell uses DALYs to measure mortality and morbidity. (Well, for malaria they actually don't look at the impact of prevention on morbidity, only mortality, since the former is relatively small -- see row 22 here.) Maybe what you had in mind is their "moral weights", which they use to convert between life years and income.


As cole_haus points out below, ESM's results would enter disability weights (which are used to construct DALYs) to affect how health interventions are prioritized. Currently disability weights invo...

I don't really see ESM as being in opposition to QALYs. It seems like it's a method that you would use as an input in QALY weight determinations. Wikipedia lists some of the current methods for deriving QALY weights as:

Time-trade-off (TTO): Respondents are asked to choose between remaining in a state of ill health for a period of time, or being restored to perfect health but having a shorter life expectancy.
Standard gamble (SG): Respondents are asked to choose between remaining in a state of ill health for a period of time, or choosing a medical intervention which has a chance of either restoring them to perfect health, or killing them.
Visual analogue scale (VAS): Respondents are asked to rate a state of ill health on a scale from 0 to 100, with 0 representing being dead and 100 representing perfect health. This method has the advantage of being the easiest to ask, but is the most subjective.
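
To make the three elicitation methods concrete, here is a minimal sketch of how each response is conventionally turned into a 0-1 weight and then into QALYs. The function names and example numbers are my own illustrations, not taken from Wikipedia or from any particular study, and real valuation exercises involve more steps (sampling, discounting, states worse than death).

```python
def tto_weight(years_full_health: float, years_in_state: float) -> float:
    """Time-trade-off: the respondent is indifferent between
    `years_in_state` years in the ill-health state and
    `years_full_health` years in perfect health."""
    if years_in_state <= 0 or not 0 <= years_full_health <= years_in_state:
        raise ValueError("expected 0 <= years_full_health <= years_in_state")
    return years_full_health / years_in_state


def sg_weight(indifference_probability: float) -> float:
    """Standard gamble: the weight is the probability of a full cure
    at which the respondent is indifferent between taking the gamble
    (cure vs. death) and staying in the ill-health state."""
    if not 0 <= indifference_probability <= 1:
        raise ValueError("probability must be between 0 and 1")
    return indifference_probability


def vas_weight(rating: float) -> float:
    """Visual analogue scale: rescale a 0-100 rating to 0-1."""
    if not 0 <= rating <= 100:
        raise ValueError("rating must be between 0 and 100")
    return rating / 100.0


def qalys(weight: float, years: float) -> float:
    """QALYs accrued by living `years` in a state with this weight."""
    return weight * years


# Hypothetical respondent: indifferent between 10 years in the state
# and 8 years in perfect health.
w = tto_weight(years_full_health=8, years_in_state=10)
print(w, qalys(w, years=5))
```

An ESM-based approach would have to supply the `weight` input in some other way, since it measures moment-to-moment experience rather than stated trade-offs.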

There's also the "day reconstruction method" (DRM). The Oxford Handbook of Happiness talks about ESM, DRM and other relevant measurement approaches at various points.

I'd guess the trouble with using ESM, DRM and some other methods like them for QALY weights is it's hard to isolate the causal effect of particular conditions using these methods.

>I don't really see ESM as being in opposition to QALYs. It seems like it's a method that you would use as an input in QALY weight determinations.

It can be, yes, but QALYs trade off length and quality of life, whereas ESM would only tell you about QoL/wellbeing. There would still need to be some other process (or some major assumptions) to anchor the results on the QALY (or DALY) scale.

I suspect experience sampling is much more costly and time-consuming to get data on than alternatives, and there's probably much less data. Life satisfaction or other simple survey questions about subjective wellbeing might be good enough proxies, and there's already a lot of available data out there.

Here's a pretty comprehensive post on using subjective wellbeing:

A Happiness Manifesto: Why and How Effective Altruism Should Rethink its Approach to Maximising Human Welfare by Michael Plant

Another good place to read more about this is https://whatworkswellbeing.org/our-work/measuring-evaluating/


>For anyone who's had some experience with depression or anxiety, as well as with "some problems walking about," it should be obvious that moderate depression or anxiety are (much) worse than moderate mobility problems, pound for pound.

That's obvious for rich people, but not at all obvious for someone who risks hunger as a result of mobility problems.

This is a great point.

From my (limited) experience witnessing poverty in the developing world, it's not clear to what extent moderate mobility problems increase risk of hunger.

They certainly don't help, but in many developing-world contexts, it seems like there's a measure of social surplus / social safety net which provides some buffer against hunger for people with chronic health problems.

I wonder if there's a paper on this...

I tried experience sampling myself for about a year and a half (intro, conclusion) and it made me much more skeptical of the system. I'm just not that sure how happy I am at any given point:

When I first started rating my happiness on a 1-10 scale I didn't feel like I was very good at it. At the time I thought I might get better with practice, but I think I'm actually getting worse at it. Instead of really thinking "how do I feel right now?" it's really hard not to just think "in past situations like this I've put down '6' so I should put down '6' now".

And:

I don't have my phone ping me during the night, because I don't want it to wake me up. Before having a kid this worked properly: I'd plug in my phone, which turns off pings, promptly fall asleep, wake up in the morning, unplug my phone. Now, though, my sleep is generally interrupted several times a night. Time spent waiting to see if the baby falls back asleep on her own, or soothing her back to sleep if she doesn't, or lying awake at 4am because it's hard to fall back asleep when you've had 7hr and just spent an hour walking around and bouncing the baby; none of these are counted. On the whole, these experiences are much less enjoyable than my average; if the baby started sleeping through the night such that none of these were needed anymore I wouldn't see that as a loss at all. Which means my data is biased upward. I'm curious how happiness sampling studies have handled this; people with insomnia would be in a similar situation.
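
A rough sketch of the upward bias described above, under made-up numbers: if pings only arrive during the day, the plain average of the ratings ignores waking hours at night. Weighting each period by the hours spent in it gives a lower figure. The ratings, hour counts, and the guessed night-time rating are all hypothetical, not from my actual data.

```python
def mean(values: list[float]) -> float:
    """Plain average of the sampled ratings."""
    return sum(values) / len(values)


def time_weighted_mean(
    sampled_mean: float,
    sampled_hours: float,
    unsampled_mean: float,
    unsampled_hours: float,
) -> float:
    """Average over all waking hours, weighting each period by its length."""
    total_hours = sampled_hours + unsampled_hours
    return (
        sampled_mean * sampled_hours + unsampled_mean * unsampled_hours
    ) / total_hours


# Hypothetical day: five daytime pings on a 1-10 scale.
daytime_ratings = [6, 7, 6, 5, 6]
naive = mean(daytime_ratings)

# Assumption: 16 sampled waking hours, plus 2 unsampled hours awake
# at night that would have been rated about 3 had a ping arrived.
corrected = time_weighted_mean(naive, 16, 3, 2)

print(f"naive={naive:.2f} corrected={corrected:.2f} bias={naive - corrected:.2f}")
```

The correction only works if you have some estimate for the unsampled hours, which is exactly what the pings fail to collect.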

I agree that DALY/QALY measurements aren't great either, though.

>Instead of really thinking "how do I feel right now?" it's really hard not to just think "in past situations like this I've put down '6' so I should put down '6' now".

Interesting. I've done experience-sampling to track my mood for about 3 years, and haven't noticed this dynamic. (It generally feels like I'm answering the question "how do I feel right now?")

Just another data point.

>Before having a kid this worked properly...

This is a great point. I don't have children & this hasn't been a problem for me. Totally makes sense that this comes up once you're a parent.

>I agree that DALY/QALY measurements aren't great either, though.

My intuition is that aggregating the results of the two methods would outperform either method individually, because they skew in different directions.

>[Disclosure: In February 2019, I corresponded about the experience-sampling idea with Alex Foster of the EA Meta Fund. He said my points were "certainly quite compelling," but the correspondence fell off.

Please note that the content of my correspondence with Milan was exploratory but primarily from a position of skepticism. Whilst technically accurate, I find this quote to be misleading and not very good form.

Sorry, I wasn't trying to misrepresent you.

My story about what happened here:

  • Over email, I pitched a "compare QALYs to experience-sampling" project
  • You were initially skeptical about the idea
  • Over the course of our correspondence, you updated to being less skeptical about the project, though not totally sold on it
  • Eventually, you got busy & the correspondence stopped

Does that match your story?

That's fine! Thanks.
