The Happier Lives Institute (HLI) currently builds its cost-effectiveness evaluations on life satisfaction (LS) scores, without adjusting for issues with those scores such as scale norming. Research makes clear that treating LS reports as stable and directly comparable is problematic. The solution is less clear, but there are promising directions researchers could take to inform charity evaluations.
Fin Moorhouse, now a staff member at Longview Philanthropy, introduced scale norming to the EA Forum here back in November 2020, describing it as occurring “when the scale I use for reporting my SWB changes compared to your scale, or my scale at a different time”. He proposed several methodologies to correct for it: retrospective rescaling, reference class projection, and momentary affect calibration. More recent research has identified other methods, such as cognitive interviewing and more complex non-linear models that explain more of the variation in life satisfaction.
No one has, to my knowledge, applied the issue of scale norming directly to HLI’s methodology or discussed it in that context. HLI's 2025 Research Agenda acknowledges some challenges, such as “how comparable peoples’ answers to wellbeing questionnaires are” and concerns about “reliability and validity” in low- and middle-income countries, but these remain exploratory. HLI’s 2023 working paper, “Can I Get a Little Less Life Satisfaction, Please?” by Michael Plant, emphasizes the limitations of life satisfaction, but I still see no formal methodology offered or adopted in HLI's published work to address interpersonal comparability, scale drift, or norming (e.g., retrospective rescaling, reference class projection).
At present, HLI continues to rank interventions by wellbeing-per-dollar using LS data as if it were stable, linear, and interpersonally comparable across vastly different contexts. Until HLI formally accounts for this in its cost-effectiveness models, its conclusions, though promising, remain more uncertain than they need to be.
As proposed by Moorhouse, the adjustment could take the form of follow-up interviews or surveys asking participants to retrospectively re-rate their past life satisfaction from their current perspective. This lets researchers observe how internal scales shift over time (e.g., someone now rating a “7” might say their past “7” was actually a “5”). These re-ratings can be used to estimate and correct for scale drift.
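To make this concrete, here is a minimal Python sketch of how such re-ratings might feed into an adjustment. The additive drift model, the function names (`estimate_scale_drift`, `drift_adjusted_gain`), and the toy numbers are all my own illustrative assumptions, not anything HLI or Moorhouse has specified:

```python
# Illustrative sketch of a retrospective-rescaling adjustment.
# The additive-drift model and all numbers are hypothetical assumptions.

from statistics import mean

def estimate_scale_drift(reported_baselines, retrospective_baselines):
    """Average amount by which people now say their past LS was over-rated.

    reported_baselines      -- LS given at baseline, on each person's scale at the time
    retrospective_baselines -- the same moments re-rated later, on the current scale
    """
    return mean(r - retro for r, retro in zip(reported_baselines, retrospective_baselines))

def drift_adjusted_gain(naive_gain, drift):
    """Under a simple additive model, unobserved drift adds to the measured gain."""
    return naive_gain + drift

# Hypothetical follow-up survey: five participants re-rate their pre-intervention scores.
reported = [5, 6, 5, 4, 5]
retrospective = [3, 5, 4, 3, 4]

drift = estimate_scale_drift(reported, retrospective)  # average upward drift in standards
print(drift)                                           # 1.2 points in this toy data
print(drift_adjusted_gain(0.5, drift))                 # a naive +0.5 becomes +1.7
```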
Alternatively, instead of (or alongside) LS scores, HLI could use affect measures (positive and negative emotion) or the Day Reconstruction Method (DRM). These methods ask participants to report their actual experiences during the day rather than provide abstract ratings. They are less vulnerable to scale drift, more anchored in concrete experiences, and have been shown to predict life outcomes robustly.
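As a rough illustration of how DRM-style data turns into a single metric, the sketch below computes a duration-weighted net-affect score from a day's episodes. The episode structure and all numbers are hypothetical; this is a simplified summary, not the full DRM protocol:

```python
# Illustrative DRM-style summary: duration-weighted net affect over a day's episodes.
# Episode structure and numbers are hypothetical, for illustration only.

episodes = [
    # (duration in minutes, positive affect 0-6, negative affect 0-6)
    (60,  4.0, 1.0),   # commuting
    (480, 3.5, 2.0),   # working
    (120, 5.0, 0.5),   # time with family
]

total_minutes = sum(d for d, _, _ in episodes)

# Weight each episode's net affect (positive minus negative) by its duration.
net_affect = sum(d * (pos - neg) for d, pos, neg in episodes) / total_minutes
print(round(net_affect, 2))  # roughly 2.18 on this toy day
```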
This critique should not be mistaken for a rejection of HLI’s project. I believe that Subjective Well-Being (SWB) is the “least wrong” metric we currently have for quantifying moral impact on humans, and more central than alternatives like QALYs, DALYs, or abstract preference satisfaction. In fact, scale norming can obscure improvements: someone whose wellbeing rises sharply may still report similar LS scores because their internal standards have shifted upward. If so, HLI’s current impact estimates may actually be too conservative.
It is also worth highlighting that scale norming doesn’t necessarily undermine HLI’s recommendations and may actually support them. If beneficiaries of an intervention report only modest LS gains due to rising internal standards, the true impact on their lived experience could be greater than the data reflect. For example, take the following assumptions:
- An intervention (say psychotherapy or cash transfers) is estimated by HLI to improve LS by +0.5 points on a 0–10 scale.
- Let's assume that, through retrospective rescaling, beneficiaries later report their pre-intervention “5” was really a “3” on their current scale, meaning the original rating overstated their baseline by 2 points.
- This suggests their internal scale shifted: their current “5” corresponds to a higher actual well-being than their earlier “5”.
Original estimate (HLI): Gain = 0.5 LS points
Adjusted for scale norming:
- True baseline LS was 3
- New LS is 5.5 (0.5 improvement on the reported 5)
- True gain = 5.5 – 3 = 2.5 LS points
That equates to a 400% increase in the estimated effect once scale norming is corrected for. This is a fictional example, but it shows how distortive reporting-scale drift could be and why it needs direct consideration.
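For readers who want to plug in their own numbers, here is the same fictional calculation as a few lines of Python (again, illustrative figures only, not HLI’s model):

```python
# The fictional example above, spelled out. All numbers are illustrative.

reported_baseline = 5.0       # LS reported before the intervention
reported_gain = 0.5           # HLI-style estimated effect
retrospective_baseline = 3.0  # what respondents later say the "5" really was

new_ls = reported_baseline + reported_gain              # 5.5 on the current scale
true_gain = new_ls - retrospective_baseline             # 5.5 - 3.0 = 2.5
increase = (true_gain - reported_gain) / reported_gain  # (2.5 - 0.5) / 0.5 = 4.0

print(true_gain)          # 2.5 LS points
print(f"{increase:.0%}")  # 400% larger than the naive estimate
```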
This issue deserves far more attention than it’s currently receiving.
Thanks to your post, I see that HLI's 2023 pilot study ("Can we trust wellbeing surveys?") explores methods to correct for interpersonal differences in scale use. These methods don't appear to have been incorporated into HLI’s cost-effectiveness models yet, but perhaps in time we'll see how much scale norming alters results like WELLBYs per dollar.