Warning: Discussions of extreme pain and suffering.

In the US, one QALY is valued at around $100,000–$150,000 [1]. In the UK, a WELLBY is valued at £13,000 [2]. Such thresholds help governments and philanthropists decide whether interventions are cost-effective and worth implementing.

However, these metrics (plus the DALY) are largely insensitive to instances of extreme suffering (at least in their current form), so if we want to perform cost-effectiveness analyses of interventions aimed at reducing extreme suffering, we need either to supplement these metrics or introduce new ones.

This post explores some considerations relevant to deciding how much we should value averting instances of extreme suffering. In particular, we’re interested in strategies to estimate the dollar value of averting a “Day Lived in Extreme Suffering” (DLES), defined as a day spent in “the most urgent suffering at the level of approximately 9/10 and above.” [3] However, I won’t attempt to calculate specific numbers in this post, but rather outline possible approaches for future research.

As a motivating example, consider the work of the US nonprofit National Religious Campaign Against Torture, which aims (among other things) to “ensure that U.S.-sponsored torture of detainees never happens again.” One might ask how many instances of torture their work has averted and at what cost, and how that calculation compares to, say, Clusterbusters’ efforts to prevent cluster headache attacks. Similarly, we might ask how much a government should be willing to spend to ensure that all patients with terminal cancer get access to adequate pain relief.

To state the obvious: Putting a dollar value on some unit of extreme suffering doesn’t necessarily mean we can trade one for the other. This is perhaps more intuitive in discussions of the value of a statistical life, which aims to quantify real-life trade-offs rather than to make a normative claim about the value of life (which philosophers may argue is incommensurable with other goods and therefore impossible to compare with them [4]). However, governments, insurance companies, philanthropists, etc. are already making such trade-offs, perhaps implicitly, and the severity of the suffering involved demands that we make them transparent.

Reasoning about a Day Lived in Extreme Suffering

Since discussing dollar values and numeric pain scales risks obscuring the severity of extreme suffering, it’s worth developing intuitions for what “≥9/10 suffering” means, so that every time we read “DLES,” we can mentally substitute it with a better intuition. As one would expect, though, 9/10 is not a crisp threshold, and the heavy-tailed valence hypothesis [5] suggests that vast suffering is compressed at the end of the scale: the upper limit to how severely someone can suffer is unfathomably high.

To develop some intuitions, I suggest we start by thinking of a DLES as 24 independent one-hour instances of extreme suffering.

The motivation for splitting a day into one-hour instances is twofold:

  1. Several conditions that cause extreme suffering typically last on the order of hours, which facilitates comparisons. For example, cluster headache attacks (considered by many the most painful medical condition [6]) often last between 15 minutes and 3 hours. Passing a kidney stone also usually takes hours [7].
  2. Using hours as the default order of magnitude elicits better intuitions regarding trade-offs. That is, it is preferable to ask, for example, “how many hours of moderate suffering would you trade against an hour of extreme suffering” than “against a second (or a minute) of extreme suffering.”

But what exactly counts as extreme suffering? And how do we compare physical vs psychological suffering?

Let us start with physical pain.[1] One way to think about a DLES could be as 24 invasive surgical procedures performed without anesthesia, each lasting one hour. I believe one such procedure is roughly what a single, severe cluster headache attack feels like. More specifically, based on hundreds of testimonials I’ve read, I imagine a severe cluster headache attack being similar to undergoing something like one of these two surgical procedures without anesthesia:[2]

  1. Microvascular Decompression (MVD): Typically performed on patients with trigeminal neuralgia, MVD involves drilling[3] a coin-sized hole through the skull, roughly behind the ear, and inserting a microscope and a metal micro-dissector to locate the blood vessel that's pressing on the trigeminal nerve. A small pad is then inserted between the nerve and the blood vessel to prevent the vessel from touching the nerve again. (Duration: 1.5–3.5 hours under general anesthesia.)

  2. Enucleation (removal of the entire eye): This procedure is carried out in cases of severe damage to the eye or cancer in the eye’s middle layer. It involves cutting the optic nerve, extraocular muscles, and connective tissue. The area is richly innervated by the ophthalmic branch of the trigeminal nerve, so it’s one of the most sensitive in the body. (Duration: 45–75 minutes under general anesthesia.)

Fortunately, anesthesia is widely available in the US: 40 million anesthetics are administered every year [8], and over 60,000 patients per day receive general anesthesia [9].

We recently estimated that cluster headache causes about 3 million DLES per year worldwide [10]—about 189,000 in the US alone. That’s 4.5 million hours in the US. Even if our estimates were off by an order of magnitude, that’d still be 450,000 hours of extreme suffering every year in the US from cluster headache alone: nearly half a million hour-long invasive surgical procedures without anesthesia annually. This truly is a tragedy of incredible proportions hiding in plain sight.
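The arithmetic behind these figures is easy to check. The snippet below is only an illustrative sketch: the function name is made up for this example, and the single input is the US estimate quoted above.

```python
HOURS_PER_DAY = 24


def dles_to_hours(dles: int) -> int:
    """Convert Days Lived in Extreme Suffering into hours of extreme suffering."""
    return dles * HOURS_PER_DAY


us_dles_per_year = 189_000  # US estimate quoted in the text [10]
us_hours = dles_to_hours(us_dles_per_year)

# Sensitivity check: what if the estimate were too high by an order of magnitude?
us_hours_if_10x_too_high = us_hours / 10

print(f"{us_hours:,} hours per year")
print(f"{us_hours_if_10x_too_high:,.0f} hours per year if off by 10x")
```

The exact product is slightly above the rounded 4.5 million and 450,000 used in the text.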

Extreme psychological suffering

While psychological pain may be qualitatively different from physical pain, it can be just as severe. Consider these quotes from patients with depression [11]:

I have suffered from severe, recurrent depression for 40 years. The psychological pain that I felt during my depressed periods was horrible and more severe than my current physical pain associated with metastases in my bones from cancer.

The pain from my recent episode passing a urinary stone did not compare in severity to the pain and suffering I experienced during my depression when I was so intensely suicidal.

It is like being in a black hole and trying to claw my way up to get out of it but I keep slipping further and further down that hole. The suffering is torture. It is the worst pain that I know.

One can reasonably debate whether, for example, the most severe form of physical torture feels worse than the worst psychological suffering, for some reasonable definition of “worse.” Or whether certain characteristics of psychological pain make it more unbearable than physical pain (such as a dimension of hopelessness). But for our purposes, it may suffice to establish a threshold (e.g., 9/10) at which either physical or psychological pain becomes unbearable. Anything above that would count as “extreme.”

Is there a proxy or metric for extreme psychological pain analogous to the ≥9/10 threshold for physical pain?

Several scales have been developed in the literature to measure psychological pain. Widely used scales include the Psychological Assessment Pain Scale, the Orbach and Mikulincer Mental Pain Scale, and the Psychache Scale (PAS). These are trait-level scales, i.e., they assess a person's tendency or capacity to experience psychological pain across different situations and over extended periods. For example, the questionnaires may ask the respondent to what extent they agree with statements such as "My pain makes my life seem dark and hopeless," "I feel empty inside," or "I feel that my mental pain is too intense to bear."

However, a 0–10 visual analog scale has also been proposed for psychological pain, namely the Physical and Psychological Pain Visual Analogue Scale (PPP-VAS) [12]. This scale asks patients to rate six items from 0 to 10: their current level of psychological pain, the maximum and mean psychological pain experienced in the past 15 days, and the same three ratings for physical pain. Research has shown the scale's validity and reliability in measuring psychological pain [13]. So perhaps we could operationalize extreme psychological pain analogously to physical pain, i.e., pain rated at ≥9/10 on the PPP-VAS.
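As a rough sketch of what such an operationalization could look like, the snippet below stores the six ratings and flags those at or above the threshold. The class, field names, and the choice of which items to flag are my own assumptions for illustration; they are not part of the published instrument.

```python
from dataclasses import dataclass

EXTREME_THRESHOLD = 9  # the >=9/10 cut-off proposed in the text


@dataclass
class PPPVASResponse:
    """Six 0-10 ratings. Field names are hypothetical, not from the instrument."""
    psych_current: float
    psych_max_15d: float
    psych_mean_15d: float
    phys_current: float
    phys_max_15d: float
    phys_mean_15d: float

    def __post_init__(self):
        # Reject ratings outside the 0-10 range of the scale.
        for name, value in vars(self).items():
            if not 0 <= value <= 10:
                raise ValueError(f"{name} must be between 0 and 10, got {value}")


def is_extreme(rating: float, threshold: float = EXTREME_THRESHOLD) -> bool:
    return rating >= threshold


def classify(response: PPPVASResponse) -> dict:
    """Flag current and peak ratings that reach the threshold (an assumed rule)."""
    return {
        "psychological_current": is_extreme(response.psych_current),
        "psychological_peak_15d": is_extreme(response.psych_max_15d),
        "physical_current": is_extreme(response.phys_current),
        "physical_peak_15d": is_extreme(response.phys_max_15d),
    }


example = PPPVASResponse(9.5, 10, 7, 2, 3, 1)  # made-up ratings
print(classify(example))
```

Whether the current, peak, or mean rating should define an "extreme" day is an open design choice that this sketch does not settle.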

Situations that may give rise to such severe levels of pain include the acute shock of a tragic loss (such as a parent losing a child), experiencing a crisis of self-hatred as a result of emotional abuse or bullying, severe benzodiazepine withdrawal leading to psychosis and terror, or various severe mental illnesses (such as psychotic depression).

Improving existing metrics vs developing new metrics

Should we insist on using a new metric like the DLES or find ways to adapt existing metrics (such as the QALY, DALY, or WELLBY) to better capture extreme suffering?

To illustrate the challenge of using existing metrics, let’s look at the DALY in the context of cluster headache. Recall that the DALY burden of a disease is the sum of two terms: Years Lived with Disability (YLD) and Years of Life Lost (YLL). The Global Burden of Disease considers that headache conditions do not cause death, so YLL = 0. Headache YLDs are then calculated as:

For migraine (4th largest source of YLD globally), we get:

For cluster headache, there’s no official disability weight,[4] but even if it were equal to 1.0, we’d get (using rough orders of magnitude):

This is a rounding error compared to other diseases.

Note that the disability weight is really the only lever we can modify, but for the total DALY burden of cluster headache to be significant and comparable to the most burdensome conditions, its DALY weight would have to be much, much larger than one.[5] It seems unlikely that health economists would consider such changes, especially given how controversial analogous efforts have been (such as introducing negative QALY weights to account for states worse than death [14]).
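To make the "only lever" point concrete, here is a minimal sketch. It assumes the usual prevalence-based structure for episodic conditions (YLD = prevalent cases × fraction of time symptomatic × disability weight). The function names and every input number are placeholders chosen purely for illustration; they are not Global Burden of Disease figures.

```python
def yld(prevalent_cases: float, fraction_time_symptomatic: float,
        disability_weight: float) -> float:
    """Years Lived with Disability under the assumed prevalence-based form."""
    return prevalent_cases * fraction_time_symptomatic * disability_weight


def required_disability_weight(target_yld: float, prevalent_cases: float,
                               fraction_time_symptomatic: float) -> float:
    """Disability weight a condition would need in order to reach target_yld."""
    return target_yld / (prevalent_cases * fraction_time_symptomatic)


# Hypothetical common condition (placeholder inputs, NOT GBD data).
common = yld(prevalent_cases=1e9, fraction_time_symptomatic=0.05,
             disability_weight=0.4)

# Hypothetical rare but excruciating condition, with the weight capped at 1.0.
rare = yld(prevalent_cases=1e6, fraction_time_symptomatic=0.01,
           disability_weight=1.0)

needed = required_disability_weight(common, prevalent_cases=1e6,
                                    fraction_time_symptomatic=0.01)

print(f"common condition: {common:,.0f} YLD")
print(f"rare condition:   {rare:,.0f} YLD")
print(f"weight needed to match: {needed:,.0f}")
```

With prevalence and time spent symptomatic fixed by epidemiology, the only way for the rare condition to register is a weight far above the 0–1 range the metric allows.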

Perhaps more importantly, the DALY, QALY, and WELLBY were simply not originally designed to quantify instances of extreme suffering. As a thought experiment, imagine you’re told you have to undergo an invasive surgical procedure lasting one hour, but no anesthesia is available. Furthermore, imagine that there is no risk of dying and that you will be free of pain after the surgery. While such a situation would result in only one hour of disability (resulting in a negligible DALY burden), clearly, we would do almost anything we could to avoid it.

Having dedicated metrics like the DLES or the YLSS (Years Lived in Severe Suffering, capturing suffering at ≥7/10 [3]) would help legitimize the problem and cleanly separate the questions we’re trying to address. The QALY, DALY, and WELLBY would still prove valuable within their scope of validity.

Challenges in quantifying extreme suffering

There are a multitude of challenges in attempting to quantify extreme suffering, including the cognitive biases of sufferers and non-sufferers, and the methodological difficulties in gathering reliable data.

To begin with, given the heavy-tailed nature of valence [5], most people can’t properly understand how severe the extremes can be. This fact causes the 0–10 linear pain scale commonly used in medicine to break down, particularly at the upper end. As David Pearce put it [15]:

It's easy to convince oneself that things can't really be that bad, that the horror invoked is being overblown, that what is going on elsewhere in space-time is somehow less real than this here-and-now, or that the good in the world somehow offsets the bad. Yet however vividly one thinks one can imagine what agony, torture or suicidal despair must be like, the reality is inconceivably worse. Hazy images of Orwell's 'Room 101' barely hint at what I'm talking about. The force of 'inconceivably' is itself largely inconceivable here.

One consequence is that people who have not been exposed to the most severe forms of pain may overestimate the severity of their own pain. For example, patients with severe migraines may rate the pain as close to 9 or 10, whereas people who get both migraines and cluster headaches may rate migraine pain as a 5/10 [6]. Cluster headache patients themselves often rate attacks at higher than 10 (e.g., 11/10 or 12/10) when they experience a new, previously unthinkable level of pain. Ceiling effects also reflect this fact [10].

Recognizing that numerical ratings may be subject to different interpretations, the Welfare Footprint Institute (WFI) has suggested using a nominal scale with four discrete categories instead of numerical scales (such as the 0–10 VAS) [19]. The categories they use are Annoying, Hurtful, Disabling, and Excruciating, and each is accompanied by a detailed description to minimize misinterpretations and facilitate comparisons across conditions and individuals. While this categorization was developed in the context of animal welfare, it may also prove useful for discussions of human suffering.

Using this framework, the suffering captured by a DLES would tentatively correspond to the Excruciating category,[6] described as follows:

All conditions and events associated with extreme levels of pain that are not normally tolerated even if only for a few seconds. In humans, it would mark the threshold of pain under which many people choose to take their lives rather than endure the pain. This is the case, for example, of scalding and severe burning events. Behavioral patterns associated with experiences in this category may include loud screaming, involuntary shaking, extreme muscle tension, or extreme restlessness. Another criterion is the manifestation of behaviors that individuals would strongly refrain from displaying under normal circumstances, as they threaten body integrity (e.g. running into hazardous areas or exposing oneself to sources of danger, such as predators, as a result of pain or of attempts to alleviate it). The attribution of conditions to this level must therefore be done cautiously. Concealment of pain is not possible.

Future work could involve extending WFI’s framework to include psychological suffering in humans. For instance, Excruciating psychological pain could be described as follows:

Intense mental agony that is truly unbearable. Even momentary exposure to it feels intolerable. The suffering is so extreme that the person loses the will to live if it cannot be relieved. Immediate escape behaviors occur or are strongly contemplated. At this level, concealing the pain is impossible: the person may be wailing, distraught, or in a state of panic or catatonic despair. It is often marked by active suicidal ideation or attempts, as many would choose to take their own lives rather than endure such agony.

Another problem involves framing effects. In particular, people report inconsistent trade-off numbers when comparing conditions of different severity depending on the elicitation method. In a US study by Ubel et al. [16], respondents were asked about their preferences for treating four conditions of differing severity:

  1. A cyst on the hand that would not disturb functioning, but occasionally would cause mild pain;
  2. Knee damage that would prevent people from exercising, cause some difficulty when walking, and cause moderate pain for one hour daily;
  3. Constant, often severe headaches that can be reduced with medication but not eliminated without reducing the ability to concentrate; and
  4. Appendicitis (which untreated will cause death within hours or days).

Asking participants to assign a 0–1 utility value to each condition resulted in very different implied trade-offs compared to asking them directly how many people they’d rather cure from each condition:

| Comparison | Implied X | Directly measured X |
| --- | --- | --- |
| 1 appendicitis vs. X cysts | 100 | No trade-off. Always priority to appendicitis. |
| 1 appendicitis vs. X knee damages | 17 | 12,000 |
| 1 appendicitis vs. X headaches | 10 | 800 |

Table 1: Implied vs directly measured trade-offs for different conditions, reproduced from [17].
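For readers unfamiliar with how an "implied" trade-off falls out of a 0–1 utility, here is a minimal sketch. It assumes a linear model in which death has utility 0, full health has utility 1, and all durations are equal. The utility values are illustrative placeholders, not the ones elicited in the study.

```python
def implied_tradeoff(utility: float) -> float:
    """Cures of a condition that a linear utility model equates with averting one death.

    Assumes death = 0, full health = 1, and equal durations for all outcomes.
    """
    if not 0 <= utility < 1:
        raise ValueError("utility must be in [0, 1)")
    loss_from_death = 1.0
    loss_from_condition = 1.0 - utility
    return loss_from_death / loss_from_condition


# Illustrative utilities (placeholders, not the study's values).
for label, u in [("mild", 0.99), ("moderate", 0.95), ("severe", 0.90)]:
    print(f"{label:>8}: utility {u:.2f} -> about {implied_tradeoff(u):.0f} cures per death averted")
```

Under linearity, the milder the condition, the more cures it takes to match one death averted, but the ratios stay in the tens or hundreds, far below what respondents state when asked directly.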

If directly-measured trade-off ratios are much higher than what linear utilities suggest in cases of death (in the example above, from appendicitis), then we should expect to see even higher ratios for conditions so severe that patients often opt to take their lives.

Generally, many studies have documented people’s preferences to prioritize those who are worse off, casting serious doubts on the validity of unidimensional priority setting based on the QALY or DALY. (See [17,18] for a review of the literature.)

Another consequence of underestimating the severity of suffering is that it can affect our willingness-to-pay (WTP) to avert such suffering. We already systematically fail to take small steps or incur minor costs to avoid relatively large risks (e.g., from pollution, traffic accidents, or respiratory diseases). Such willingness may also depend on our level of wealth. A government’s WTP for any given intervention also depends on its budget and available infrastructure. For example, in 2019, public spending on health per capita in the US was $9,386, whereas in Uganda it was only $24 in 2017 [20].

Another challenge is that our willingness to avoid instances of extreme suffering may scale nonlinearly with the duration of the instance. For example, many cluster headache patients manage to cope and persevere thanks to having breaks in between attacks and bouts. Coping would be even harder if the attacks lasted much longer. Suicidality among chronic cluster headache patients is indeed higher than among episodic patients [21].

Relatedly, we seem to have developed coping mechanisms that lead us to underestimate the severity of our past suffering.

Experiencing extreme suffering may also warp our perception of time, further complicating the picture (cf. the pseudo-time arrow).

Overall, more research is needed on these questions. Unfortunately, as Magnus Vinding has pointed out, most of us avoid thinking about suffering most of the time, in part because doing so can be very unpleasant, which contributes to a lack of research on the topic:

The worst forms of suffering are so terrible that merely thinking about them for a brief moment can leave the average sympathetic person in a state of horror and darkness for a good while, and therefore, quite naturally, we strongly prefer not to contemplate these things. [22]

For example, in a survey of Irish medical trainees, the most commonly cited reason for not going into pain medicine was the “psychologically challenging patient cohort.” [23] And as Magnus emphasizes, groupthink exacerbates the problem: others don’t seem to think the problem is that bad, and they aren’t working on it, so why would we?

Potential approaches

The question “how much should we value averting a DLES?” is relevant insofar as it helps us determine which interventions governments or philanthropists should fund to avert a DLES.

Governmental approaches

Let’s begin with government decisions. Governments are often interested in answering “what’s the cost-effectiveness threshold (CET) at which we should be willing to pay for an extra unit of benefit?” The unit of benefit may be, for example, a QALY. There are four widely used methods to calculate CETs: willingness-to-pay (WTP) methods, precedent methods, opportunity cost methods, and GDP-based methods [20,24].

1. Willingness-to-pay methods

WTP methods estimate CETs by determining how much individuals would pay for health improvements (typically per QALY). Such WTP can be measured directly (through surveys) or indirectly (e.g., through market behavior analysis).

Direct methods typically follow a two-step process:

  1. Estimating the utility (usually in terms of QALYs) of a given condition, using e.g.:
    1. Time trade-offs (“You have 20 years left with chronic pain. How many years of perfect health would be equally valuable?”), or
    2. Standard gambles (“Choose between living with chronic pain for certain, or a treatment with 80% success rate and 20% chance of death. At what success probability would you be indifferent?”)
  2. Asking for WTP, e.g., through:
    1. Open-ended questions (“What’s the maximum you’d pay annually for this treatment?”)
    2. Bidding games (“Would you pay $3k for this treatment?” → “Yes” → “Would you pay $4k for this treatment?” → “No” → iterate until convergence)
    3. Discrete choice (ask different people “Would you pay $X for this treatment?”, then gather statistics: “90% said ‘yes’ to $X, 70% said ‘yes’ to $Y,” etc.)

One can then divide the WTP by the utility (e.g. $3,000 per 0.25 QALY = $12,000 per QALY).
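The two steps can be sketched in a few lines. This is a simplified illustration, not a survey protocol: the bidding game is implemented as a bisection search (a convenient stand-in for the step-by-step bids described above), and the respondent is a hypothetical function with a hidden maximum WTP.

```python
from typing import Callable


def wtp_per_qaly(wtp_dollars: float, qaly_gain: float) -> float:
    """Divide stated willingness-to-pay by the utility gain it buys."""
    if qaly_gain <= 0:
        raise ValueError("qaly_gain must be positive")
    return wtp_dollars / qaly_gain


def bidding_game(accepts: Callable[[float], bool], low: float = 0.0,
                 high: float = 100_000.0, tolerance: float = 100.0) -> float:
    """Narrow down a respondent's maximum WTP by bisection.

    Assumes the respondent accepts `low`, rejects `high`, and answers consistently.
    """
    while high - low > tolerance:
        bid = (low + high) / 2
        if accepts(bid):
            low = bid   # they would pay at least this much
        else:
            high = bid  # this bid is too high
    return (low + high) / 2


# Hypothetical respondent whose true maximum WTP is $3,500.
def respondent(bid: float) -> bool:
    return bid <= 3_500


estimate = bidding_game(respondent)
print(f"Estimated WTP: ${estimate:,.0f}")
print(f"Worked example from the text: ${wtp_per_qaly(3_000, 0.25):,.0f} per QALY")
```

For a DLES-based threshold, the denominator would be DLES averted rather than QALYs gained; the division itself is unchanged.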

Here are some recommendations to estimate a CET for a DLES using WTP methods (apologies for all the acronyms!):

  • Use realistic and detailed descriptions of extreme suffering, however uncomfortable it might be. Analogies with torture or surgical procedures without anesthesia may help (see “Reasoning about a Day Lived in Extreme Suffering”), as well as testimonials by patients with those conditions.
  • Alternatively, deprioritize input from the general public on healthcare allocation decisions, a position supported by a large-scale survey of doctors, healthcare managers, and the general public [25].
  • Prioritize surveying patients who suffer or have suffered from extremely painful conditions.
  • To minimize recall bias, prioritize responses from patients currently experiencing such suffering (such as cluster headache patients in an active bout or people experiencing severe benzodiazepine withdrawal).
  • Prioritize responses from patients with a longer history of extreme pain, since they will be better calibrated.[7]
  • Include family members, caregivers, and medical professionals who have witnessed extreme suffering.

2. Precedent methods

These methods estimate thresholds by examining the cost-effectiveness of previously funded interventions. The commonly reported values of $50,000–$150,000 per QALY in the US are obtained this way [24]. Such methods may lead to arbitrary thresholds, since previous values may have been estimated in an ad hoc manner.

For DLES estimates, one could look at how much the government is already willing to spend on:

  • Anesthesia: The US likely spends tens or hundreds of billions of dollars annually on anesthesia, including e.g. the cost of anesthetic drugs, anesthesiologist salaries, specialized equipment, training and certification programs, insurance costs, etc. (Recall that 40 million anesthetics are administered every year in the US, and over 60,000 patients per day receive general anesthesia.)
  • Torture victims: If the US government has compensated victims of torture, such compensation amounts could be used as a benchmark. Here’s an example of a California man who received $900,000 in a settlement after being wrongly detained and abused by local authorities. This article provides a few additional examples from other countries.
  • Other: One could look at how much is spent on pain management, palliative care, suicide prevention, humanitarian aid in crisis zones, etc.

3. Opportunity cost methods

Opportunity cost methods consider additional budgetary constraints, such that an intervention is worth funding if it is more cost-effective than the least cost-effective interventions already being funded. Such methods are considered more robust than, e.g., WTP or precedent methods, but they’re more difficult to implement in practice [26].
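In its simplest form, the decision rule looks like the sketch below. The program names and costs are hypothetical, and the sketch assumes every intervention is scored in the same unit of benefit, which is a strong assumption.

```python
def marginal_cost_per_unit(funded: dict[str, float]) -> float:
    """Cost per unit of benefit of the least cost-effective funded program.

    A higher cost per unit means lower cost-effectiveness, so this is the maximum.
    """
    return max(funded.values())


def worth_funding(candidate_cost_per_unit: float, funded: dict[str, float]) -> bool:
    """A candidate qualifies if it beats the least cost-effective funded program."""
    return candidate_cost_per_unit < marginal_cost_per_unit(funded)


# Hypothetical portfolio: cost in dollars per unit of benefit.
funded_programs = {"Program A": 20_000, "Program B": 45_000, "Program C": 80_000}

print(marginal_cost_per_unit(funded_programs))
print(worth_funding(60_000, funded_programs))
print(worth_funding(95_000, funded_programs))
```

In practice the hard part is estimating the marginal program's cost-effectiveness, not applying the comparison.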

It’s unclear how best to include DLES considerations, for at least two reasons:

  1. It may not be possible to directly compare interventions aimed at reducing DLES burden with existing QALY/DALY-based interventions.
  2. It may not be advisable to have a separate budget for DLES interventions specifically, as some DLES interventions could be eliminated due to not being as cost-effective as other DLES interventions, while potentially still being overwhelmingly more cost-effective (given certain philosophical assumptions) than other non-DLES interventions.

Any ideas on how to use this approach are welcome.

4. GDP-based methods

In 2015, the WHO-CHOICE suggested that interventions with an incremental cost-effectiveness ratio less than 3x the GDP per capita of a country should be considered cost-effective. However, despite its widespread use, this threshold has been heavily criticized, to the extent that the WHO-CHOICE has distanced itself from this recommendation [24]. The main issue is that it tends to vastly overestimate cost-effectiveness thresholds, especially in LMICs. As a result, we won’t explore this option.

Multi-criteria decision analysis (MCDA)

Over-reliance on a single cost-effectiveness threshold may lead to suboptimal allocation of resources and a host of misaligned incentives. In practice, however, governments rarely rely on a single criterion to prioritize health interventions. At the same time, the process to set priorities is often done ad hoc or based on (possibly irrational) historical precedents [27]. Recognizing that such decisions involve multiple competing criteria, some have suggested applying ideas from multi-criteria decision analysis (MCDA), which is routinely applied in other fields.

In the context of health prioritization, MCDA may involve developing performance matrices capturing different relevant dimensions, for example:

| Options | Cost-effectiveness | Severity of disease | Disease of the poor | Age |
| --- | --- | --- | --- | --- |
| Antiretroviral treatment in HIV/AIDS | US$200 per DALY | 4 | Yes | 15 years and older |
| Treatment of childhood pneumonia | US$20 per DALY | 4 | Yes | 0–14 years |
| Inpatient care for acute schizophrenia | US$2,000 per DALY | 2 | No | 15 years and older |
| Plastering for simple fractures | US$50 per DALY | 1 | No | All |

Table 2: Simplified example of a performance matrix with four different decision criteria, reproduced from [27].

Decision-makers can then prioritize interventions using qualitative or quantitative criteria:

  • Qualitative criteria: Perhaps one of the options dominates, i.e., it performs at least as well on all criteria and better on at least one. Or one could incorporate certain ethical criteria (such as a concern for those worst off) into a holistic decision.
  • Quantitative criteria: One could try to assign numerical values to each category, and then aggregate them according to some mathematical model.
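Both kinds of criteria can be illustrated with the performance matrix in Table 2. The sketch below is only one possible formalization: it leaves out the age column, treats higher severity and "disease of the poor" as reasons for priority, and uses weights and a log-scaled cost score that I made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    cost_per_daly: float    # US$, lower is better
    severity: int           # 1-4, higher assumed to mean higher priority
    disease_of_poor: bool   # True assumed to mean higher priority


OPTIONS = [
    Option("Antiretroviral treatment in HIV/AIDS", 200, 4, True),
    Option("Treatment of childhood pneumonia", 20, 4, True),
    Option("Inpatient care for acute schizophrenia", 2000, 2, False),
    Option("Plastering for simple fractures", 50, 1, False),
]


def oriented(o: Option) -> tuple:
    """Express every criterion so that a larger value is better."""
    return (-o.cost_per_daly, o.severity, int(o.disease_of_poor))


def dominates(a: Option, b: Option) -> bool:
    """a is at least as good as b on all criteria and strictly better on one."""
    pairs = list(zip(oriented(a), oriented(b)))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def weighted_score(o: Option, weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted sum of 0-1 scores; the weights are hypothetical."""
    logs = [math.log10(x.cost_per_daly) for x in OPTIONS]
    ce = (max(logs) - math.log10(o.cost_per_daly)) / (max(logs) - min(logs))
    sev = (o.severity - 1) / 3
    poor = float(o.disease_of_poor)
    w_ce, w_sev, w_poor = weights
    return w_ce * ce + w_sev * sev + w_poor * poor


ranking = sorted(OPTIONS, key=weighted_score, reverse=True)
for o in ranking:
    dominated = [b.name for b in OPTIONS if dominates(o, b)]
    print(f"{weighted_score(o):.2f}  {o.name}  (dominates {len(dominated)} options)")
```

On these three criteria, childhood pneumonia treatment dominates every other option, so no weighting is needed to rank it first; the weights only matter for ordering the rest.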

MCDA is much more likely than single-threshold approaches to prioritize conditions involving extreme suffering. As Sun et al. point out, single-criterion CET methods are rejected in some countries because they do not fully consider certain societal values, such as the concern for those who are worst off [26].

In particular, the severity of a disease should be given significant weight, which does not happen by default in “QALY egalitarian” calculations where “a QALY is a QALY is a QALY” [18,28]. Indeed, several studies have shown that respondents in countries like Norway, the US, Iceland and the UK give at least equal priority to patients with “severe health problems which improve a little with treatment” as they do to patients with “moderate health problems which improve considerably with treatment,” which goes against naive QALY-maximizing approaches [18].

To elicit accurate weights for the importance of severe health conditions relative to mild or moderate ones, the recommendations from the section on WTP methods apply.

Philanthropic approaches

Philanthropic organizations are not subject to the same constraints as governments, so they might use different CETs. For instance, GiveWell may use a metric like “cost per life saved” and decide that an intervention is worth funding if it meets the threshold “10x as cost-effective as direct cash transfers.”

Currently, no major philanthropists prioritize healthcare interventions using metrics such as “DLES averted per dollar,” but much of the DALY/QALY methodological machinery developed by GiveWell (or others) could be translated into philanthropic efforts aimed at reducing the most extreme forms of suffering. For example, one could compile a portfolio of interventions that reduce the DLES burden and develop benchmarks accordingly.

Some ideas for interventions in this space may include:

  • Efforts to distribute morphine and other opioids to treat terminal cancer in low-income countries.
  • Medical research into next-generation flumazenil analogs to reverse benzodiazepine dependence.
  • Advocacy campaigns to accelerate access to tryptamines for cluster headache.
  • Scalable psychotherapeutic treatments in low-income countries that prioritize cases of severe depression.
  • Advocacy efforts to prevent the use of torture.
  • Policy initiatives to ensure that hospitals worldwide have well-equipped and well-staffed burn units.

Discussion

Given the near-universal agreement across ethical traditions regarding the importance of prioritizing those who suffer most, it is surprising how little effort goes into understanding and addressing extreme suffering. Even in communities where triaging is common sense (such as the effective altruism community or even the medical community at large), actual investment into reducing extreme suffering is minuscule, especially relative to its importance.

Taking triage seriously (and assuming widespread sympathy for intense suffering) means, in my opinion, that any government’s top health priority should be to bring the DLES burden down as much as possible.[8] Indeed, if we or a loved one were experiencing easily preventable torture, we would want the government not to spare any expense (and so many other government priorities would seem insignificant in comparison, and justifiably so). Alarmingly, though, we still have a very poor understanding of the largest sources of DLES globally.[9] Mapping the global DLES burden should therefore be a top priority.

If reducing extreme suffering is overwhelmingly important, should we be concerned about such interventions absorbing all available healthcare funding? I.e., might this be an instance of a utility monster? In practice, no. I suspect that we can significantly reduce the global DLES burden with modest investments, especially for suffering caused by medical conditions. (Averting instances of torture by totalitarian regimes or terrorist groups is, however, significantly less tractable.)

For instance, morphine is very cheap—it is only due to regulatory restrictions, supply chain issues, and infrastructure limitations that terminal cancer patients in low-income countries struggle to access it.

Similarly, ensuring that every cluster headache patient has unrestricted access to high-flow oxygen, psilocybin, or DMT vape pens would require very modest investments, especially compared to conditions requiring very expensive medications or sophisticated equipment. And if the pain of a severe cluster headache attack is comparable to torture or surgery without anesthesia, then one could even argue that cluster headache patients (or others in a similar situation) should sue the government for not providing all possible pain relief options (as should someone who gets denied anesthesia for surgery). Relative to the potential legal settlement costs, providing universal pain relief would be a bargain.

Conclusion

Governments and philanthropists who take triage seriously should make extreme suffering much more prominent in their prioritization discussions. We still know very little about the global burden of DLES (sources, prevalence, duration, and severity), how individuals would trade a QALY or a WELLBY against a DLES, or what the most cost-effective interventions are to reduce the DLES burden. Much more research is needed.

The EA community seems uniquely well-positioned to make progress on these questions. We can repurpose many techniques and frameworks that have been stress-tested in the QALY or WELLBY context to tackle questions of intense suffering. The neglectedness of these topics means that any contributions now would significantly add to our understanding of the problem. I hope some of the ideas in this essay spark further discussions and publications on these topics on the forum.

References

1. Institute for Clinical and Economic Review. ICER’s Reference Case for Economic Evaluations: Elements and Rationale. (Institute for Clinical and Economic Review, 2024). at <https://icer.org/wp-content/uploads/2024/02/Reference-Case-4.3.25.pdf>

2. HM Treasury. Wellbeing Guidance for Appraisal: Supplementary Green Book Guidance. (2021). at <https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1005388/Wellbeing_guidance_for_appraisal_-_supplementary_Green_Book_guidance.pdf>

3. Leighton, J. The Tango of Ethics: Intuition, Rationality and the Prevention of Suffering. (Imprint Academic, 2023). at <https://www.imprint.co.uk/product/tango/>

4. Value of life. Wikipedia (2025). at <https://en.wikipedia.org/w/index.php?title=Value_of_life&oldid=1289599760>

5. Gómez-Emilsson, A. & Percy, C. The heavy-tailed valence hypothesis: the human capacity for vast variation in pleasure/pain and how to test it. Front. Psychol. 14, (2023).

6. Burish, M. J., Pearson, S. M., Shapiro, R. E., Zhang, W. & Schor, L. I. Cluster headache is one of the most intensely painful human conditions: Results from the International Cluster Headache Questionnaire. Headache J. Head Face Pain 61, 117–124 (2021).

7. Gómez-Emilsson, A. & Parra-Hinojosa, A. The Quest for a Stone-Free World: Chanca Piedra (Phyllanthus niruri) as an Acute and Prophylactic Treatment for Kidney Stones and Their Associated Extreme Negative Valence. Eff. Altruism Forum (2025). at <https://forum.effectivealtruism.org/posts/JNNrkeWdTHrS87opd/the-quest-for-a-stone-free-world-chanca-piedra-phyllanthus-1>

8. Anesthesia and Sedation. at <https://www.jointcommission.org/resources/for-consumers/speak-up-campaigns/anesthesia-and-sedation/>

9. Brown, E. N., Lydic, R. & Schiff, N. D. General Anesthesia, Sleep, and Coma. N. Engl. J. Med. 363, 2638–2650 (2010).

10. Parra-Hinojosa, A., Percy, C. & Gómez-Emilsson, A. The Heavy Tail of Extreme Pain Exacerbates Health Inequality: Evidence from Cluster Headache Underinvestment. SSRN Scholarly Paper (2024). at <https://doi.org/10.2139/ssrn.5255179>

11. Mee, S., Bunney, B. G., Reist, C., Potkin, S. G. & Bunney, W. E. Psychological pain: A review of evidence. J. Psychiatr. Res. 40, 680–690 (2006).

12. Olié, E., Guillaume, S., Jaussent, I., Courtet, P. & Jollant, F. Higher psychological pain during a major depressive episode may be a factor of vulnerability to suicidal ideation and act. J. Affect. Disord. 120, 226–230 (2010).

13. Alacreu-Crespo, A. et al. Are visual analogue scales valid instruments to measure psychological pain in psychiatric patients? J. Affect. Disord. 358, 150–156 (2024).

14. Schneider, P. The QALY is ableist: on the unethical implications of health states worse than dead. Qual. Life Res. 31, 1545–1552 (2022).

15. Pearce, D. Negative Utilitarianism. Hedonistic Imperative at <https://www.hedweb.com/negutil.htm>

16. Ubel, P. A., Loewenstein, G., Scanlon, D. & Kamlet, M. Individual Utilities Are Inconsistent with Rationing Choices: A Partial Explanation of Why Oregon’s Cost-Effectiveness List Failed. Med. Decis. Making 16, 108–116 (1996).

17. Nord, E. Severity of illness versus expected benefit in societal evaluation of healthcare interventions. Expert Rev. Pharmacoecon. Outcomes Res. 1, 85–92 (2001).

18. Shah, K. K. Severity of illness and priority setting in healthcare: A review of the literature. Health Policy 93, 77–84 (2009).

19. Welfare Footprint Institute. Pain Intensity Categories. Welf. Footpr. Inst. at <https://welfarefootprint.org/technical-definitions/pain-intensities/>

20. Kazibwe, J. et al. The Use of Cost-Effectiveness Thresholds for Evaluating Health Interventions in Low- and Middle-Income Countries From 2015 to 2020: A Review. Value Health 25, 385–389 (2022).

21. Jürgens, T. P. et al. Impairment in episodic and chronic cluster headache. Cephalalgia 31, 671–682 (2011).

22. Vinding, M. The Principle of Sympathy for Intense Suffering. Magnus Vinding (2018). at <https://magnusvinding.com/2018/09/03/the-principle-of-sympathy-for-intense-suffering/>

23. O’Sullivan, M. & O’Gara, A. Perceived barriers to a career in pain medicine in the Republic of Ireland. Ir. J. Med. Sci. (1971 -) 193, 371–374 (2024).

24. Santos, A. S., Guerra-Junior, A. A., Godman, B., Morton, A. & Ruas, C. M. Cost-effectiveness thresholds: methods for setting and examples from around the world. Expert Rev. Pharmacoecon. Outcomes Res. 18, 277–288 (2018).

25. Heginbotham, C. in Rationing in Action 141–156 (BMJ Publishing Group, 1993).

26. Sun, L., Peng, X., Li, S. & Huang, Z. Cost-effectiveness thresholds or decision-making threshold: a novel perspective. Cost Eff. Resour. Alloc. 21, 72 (2023).

27. Baltussen, R. & Niessen, L. Priority setting of health interventions: the need for multi-criteria decision analysis. Cost Eff. Resour. Alloc. 4, 14 (2006).

28. Culyer, A. J. The morality of efficiency in health care—some uncomfortable implications. Health Econ. 1, 7–18 (1992).


  1. ^

     I’ll use “pain” and “suffering” interchangeably, even though they are not generally synonymous.

  2. ^

     It is also worth pointing out that, unlike situations of severe physical trauma involving massive endorphin release that masks the pain, the neurological pain of cluster headache is not masked by such mechanisms, which likely explains why patients often describe the pain as “unadulterated” or “exquisite”.

  3. ^

     It is common for cluster headache patients to use drilling analogies to describe their pain. Here are some from r/clusterheads:

    “If you imagine drilling through your eyebrow ridge and into your eye, then backing the drill out, then in, then out.... That's how it feels to me.”

    “A demon drilling your temple, while simultaneously stabbing you in the eye with an ice pick. Literally the worst pain I’ve experienced in 53 years.”

    “I've described it as a large metal drill a mile long covered in rough, flaking rust, that is slowly drilling into my eyesocket and out through my ear.”

  4. ^

     But cluster headache is expected to be included in the next version of the Global Burden of Disease, likely with a very high disability weight of 0.7.

  5. ^

     Alternatively, since we only care about the relative values of the weights, one could compress the 0–1 scale nonlinearly, which is what I did here.

  6. ^

     I used WFI’s Hedonic-Track GPT tool to estimate the annual global burden of cluster headache in DLES using their four categories. You can find the results in this spreadsheet. We previously estimated the global DLES burden to be between 2.6 and 3.6 million days. Using their tool, I get 700k–2m days counting only Excruciating time and 11m–33m days counting both Excruciating and Disabling time. However, the numbers are very sensitive to the choice of value for the “fraction of attacks whose maximum intensity reaches excruciating/disabling/hurtful levels.” Data on this point is very sparse.

  7. ^

     Here’s an example of a patient who suffered from cluster headaches chronically for 51 years, who also suffered from “many other severe, painful conditions” such as end-stage 5 COPD, adenocarcinoma lung cancer, achalasia disease, abdominal hernias, a broken back, severe osteoporosis, etc.

  8. ^

     I also strongly agree with this sentiment by Magnus Vinding: “In sum, by my lights, effective altruism proper is equivalent to effectively reducing extreme suffering. This, I would argue, is the highest meaning of “improving the world” and “benefiting others”, and hence what should be considered the ultimate goal of effective altruism.” [22]

  9. ^

     OPIS is actively working on this question; WFI also had some related plans; and I hope to continue working on these questions with QRI.


Comments

Executive summary: This exploratory post argues that extreme suffering—such as a “Day Lived in Extreme Suffering” (DLES), encompassing intense physical or psychological pain—is vastly undervalued by existing metrics like QALYs and DALYs, and calls for dedicated research into how we might better quantify and prioritize the alleviation of such suffering in policy and philanthropy.

Key points:

  1. Existing metrics inadequately capture extreme suffering: Tools like QALYs, DALYs, and WELLBYs often overlook short-term but intense suffering (e.g. torture, cluster headaches), as they emphasize duration and average impact over extremity.
  2. Proposed new metric—DLES: The author introduces the concept of a “Day Lived in Extreme Suffering” as a more appropriate unit for evaluating acute, excruciating pain, whether physical or psychological, and outlines ways to conceptualize and communicate its severity.
  3. Current burden may be substantial: For instance, cluster headaches alone may cause millions of DLES annually, with each attack likened to enduring surgery without anesthesia—underscoring an underappreciated public health burden.
  4. Adapting evaluation frameworks: The post explores how DLES-based assessments could fit into existing cost-effectiveness paradigms (e.g. willingness-to-pay, precedent spending, multi-criteria decision analysis), and how both governments and philanthropists might integrate such metrics.
  5. Major uncertainties and research needs: Key gaps include how people would trade a DLES against QALYs/WELLBYs, what interventions most cost-effectively avert DLES, and how to rigorously define and measure extreme suffering.
  6. Call to action for the EA community: Given its focus on neglected and tractable problems, the author suggests effective altruists are particularly well-suited to develop tools, metrics, and priorities around extreme suffering and should treat this as a top research and advocacy frontier.

 

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Excellent post Alfredo. Agree wholeheartedly. 

I really like the specific, detailed surgical procedures performed without anaesthetic as a way to get a sense of what it's like to have these experiences.

Great post, Alfredo. I added a reference to it in my Preparatory Notes for the Measurement of Suffering

You say "Given the near-universal agreement across ethical traditions regarding the importance of prioritizing those who suffer most, it is surprising how little effort goes into understanding and addressing extreme suffering. Even in communities where triaging is common sense (such as the effective altruism community or even the medical community at large), actual investment into reducing extreme suffering is minuscule, especially relative to its importance."

I will just answer that with this Call for Collaboration to Systematically Address the Problem of Suffering in the World.

Thank you for your feedback, Robert, and for your commitment to reducing suffering! I'll stay tuned!

Thanks for the post, Alfredo.

"Using this framework, the suffering captured by a DLES would tentatively correspond to the Excruciating category,[6] described as follows"

Cynthia Schuck-Paim from WFI said "Examples [of excruciating pain] would include severe burning in large areas of the body, dismemberment, or extreme torture".

"Taking triage seriously (and assuming widespread sympathy for intense suffering) means, in my opinion, that any government’s top health priority should be to bring the DLES burden down as much as possible."

Would you still believe this under expectational total hedonistic utilitarianism?

Thanks, Vasco!

"Would you still believe this under expectational total hedonistic utilitarianism?"

Someone who is fully bought into expectational total hedonistic utilitarianism and nothing else would probably not agree with that conclusion, no. (I don't endorse such a form of utilitarianism. My views align much more closely to something like xNU+.)

I do, however, hope that people across the ethical spectrum can acknowledge that we could be doing much more to relieve extreme suffering without necessarily making significant compromises elsewhere. We already managed to provide universal access to anesthesia (at least in developed countries), so we could do much more for those who are still being left behind at a small fraction of the cost.
