
The Happier Lives Institute (HLI) currently builds its cost-effectiveness evaluations on life satisfaction (LS) scores, without adjusting for known issues with those scores, such as scale norming. Research makes clear that treating raw LS scores as directly comparable is problematic. The solution remains less clear, but there are promising directions researchers could take to inform charity evaluations.

Fin Moorhouse, now a staff member at Longview Philanthropy, introduced scale norming to the EA Forum here back in November 2020, describing scale norming as occurring “when the scale I use for reporting my SWB changes compared to your scale, or my scale at a different time”. He proposed several methodologies to correct for it: retrospective rescaling, reference class projection, and momentary affect calibration. More recent research has identified other methods like cognitive interviewing or complex non-linear models that explain more variation in life satisfaction.

No one has, to my knowledge, directly applied scale norming to HLI’s methodology or discussed it in that context. HLI's 2025 Research Agenda acknowledges some challenges, such as “how comparable peoples’ answers to wellbeing questionnaires are” and concerns about “reliability and validity” in low and middle income countries, but these remain exploratory. HLI’s 2023 working paper, “Can I Get a Little Less Life Satisfaction, Please?” by Michael Plant, emphasizes the limitations of life satisfaction, but I still see no formal methodology offered or adopted in HLI's published work to address interpersonal comparability, scale drift, or norming (e.g., retrospective rescaling, reference class projection).

At present, HLI continues to rank interventions by wellbeing-per-dollar using LS data as if it were stable, linear, and interpersonally comparable across vastly different contexts. Until HLI formally accounts for this in its cost-effectiveness models, its conclusions, though promising, remain more uncertain than they need to be.

As proposed by Moorhouse, one adjustment would be to use follow-up interviews or surveys that ask participants to retrospectively re-rate their past life satisfaction from their current perspective. This allows researchers to observe how internal scales shift over time (e.g., someone now rating a “7” might say their past “7” was actually a “5”). These re-ratings can be used to estimate and correct for scale drift.
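To make the retrospective re-rating idea concrete, here is a minimal sketch in Python. It assumes each participant has three scores: the rating they originally gave, their later re-rating of that same period, and their current rating. The function names and the numbers are hypothetical and purely illustrative; they are not HLI data or HLI's method.

```python
from statistics import mean


def estimate_scale_drift(original_past, retrospective_past):
    """Mean difference between re-rated and originally reported past scores.

    A negative value means people now rate their past lower than they did
    at the time, i.e. their internal scale has shifted upward.
    """
    if len(original_past) != len(retrospective_past):
        raise ValueError("Each participant needs both ratings")
    return mean(r - o for o, r in zip(original_past, retrospective_past))


def drift_adjusted_gain(retrospective_past, current):
    """Mean gain with both scores expressed on the participants' current scale."""
    if len(retrospective_past) != len(current):
        raise ValueError("Each participant needs both ratings")
    return mean(c - r for r, c in zip(retrospective_past, current))


# Hypothetical ratings on a 0-10 scale (illustrative only, not real data).
original_past = [5, 6, 4, 7]       # reported before the intervention
retrospective_past = [3, 5, 3, 5]  # the same period, re-rated afterwards
current = [6, 6, 5, 7]             # reported after the intervention

drift = estimate_scale_drift(original_past, retrospective_past)
naive_gain = mean(c - o for o, c in zip(original_past, current))
adjusted_gain = drift_adjusted_gain(retrospective_past, current)

print(f"Estimated scale drift: {drift}")
print(f"Naive gain:            {naive_gain}")
print(f"Drift-adjusted gain:   {adjusted_gain}")
```

In this sketch the adjusted gain is simply the naive gain minus the estimated drift. A real analysis would also have to deal with recall bias in the re-ratings, which this toy version ignores.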

Alternatively, instead of (or alongside) LS scores, HLI could use affect measures (positive and negative emotion) or the Day Reconstruction Method (DRM). These methods ask participants to report their actual experiences during the day rather than provide abstract ratings. They are less vulnerable to scale drift, more anchored in concrete experiences, and have been shown to predict life outcomes robustly.

This critique should not be mistaken for a rejection of HLI’s project. I believe that Subjective Well-Being (SWB) is the “least wrong” metric we currently have for quantifying moral impact for humans and more central than alternatives like QALYs, DALYs, or abstract preference satisfaction. In fact, scale norming can obscure improvements. For example, someone whose wellbeing rises sharply may still report similar LS scores because their internal standards have shifted upward. If so, HLI’s current impact estimates may actually be too conservative.

It is also worth highlighting that scale norming doesn’t necessarily undermine HLI’s recommendations and may actually support them. If beneficiaries of an intervention report only modest LS gains due to rising internal standards, the true impact on their lived experience could be greater than the data reflect. For example, take the following assumptions:

  • An intervention (say psychotherapy or cash transfers) is estimated by HLI to improve LS by +0.5 points on a 0–10 scale.
  • Let's assume from retrospective rescaling that beneficiaries later report their pre-intervention “5” was actually a “3”, so the reported baseline overstates their starting point by 2 points on their current scale.
  • This suggests their internal scale shifted: their current “5” corresponds to a higher actual well-being than their earlier “5”.

Original estimate (HLI): Gain = 0.5 LS points

Adjusted for scale norming:

  • True baseline LS was 3
  • New LS is 5.5 (0.5 improvement on the reported 5)
  • True gain = 5.5 – 3 = 2.5 LS points

That equates to a 400% increase in the estimated effect (2.5 points versus 0.5) once scale norming is corrected for. This is a fictional example, but it shows how distortive scale drift in reporting could be and why it needs direct consideration.
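The arithmetic in this fictional example can be checked with a few lines of Python. This is only a sketch of the calculation above; the function names are hypothetical and the inputs are the made-up numbers from the example, not HLI estimates.

```python
def norming_adjusted_gain(reported_baseline, reported_gain, rerated_baseline):
    """Gain measured against the baseline as re-rated on the current scale."""
    new_score = reported_baseline + reported_gain
    return new_score - rerated_baseline


def percent_increase(original, adjusted):
    """Relative increase of the adjusted estimate over the original, in percent."""
    return (adjusted - original) / original * 100


# Numbers from the fictional example: reported baseline 5, reported gain 0.5,
# and a baseline that beneficiaries later re-rate as 3.
adjusted = norming_adjusted_gain(
    reported_baseline=5, reported_gain=0.5, rerated_baseline=3
)
increase = percent_increase(original=0.5, adjusted=adjusted)

print(f"Adjusted gain: {adjusted} LS points")
print(f"Increase over original estimate: {increase}%")
```

Note that a 400% increase means the adjusted effect is five times the original estimate, not four times.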

Comments (4)



This should have way more attention than it's currently receiving

I'm currently working on a paper which suggests 'scale norming' could lead to quite a large bias/underestimate of average national life satisfaction. Hope to post a version of this on the Forum soon.

Thanks to your post, I see HLI's 2023 pilot study ("Can we trust wellbeing surveys?") explores methods to correct for interpersonal scale-use differences. Although it doesn’t appear these methods have been incorporated into HLI’s cost-effectiveness models, maybe we’ll see how much scale norming might alter cost-effectiveness results like WELLBYs-per-dollar in time. 
