Strong longtermism is subject to worries about fanaticism; it might require
“sacrific[ing] arbitrarily much, with probability arbitrarily close to 1 in ‘fanatical’ pursuit of an extremely unlikely but enormously larger payoff.”
Depending on the orders of magnitude involved, fanaticism can run strongly counter to our intuitions, to the point that fanatical demands have been cited as a reductio against certain decision theories. In “The case for strong longtermism,” Greaves and MacAskill identify the accusation of fanaticism as one of the most compelling arguments against longtermism. Despite this, fanaticism receives relatively little attention in the rest of the volume. Askell and Neth discuss “future fanaticism,” but only to the extent of identifying it as a potential objection to longtermism and suggesting that well-calibrated longtermists are less likely to act fanatically than might be assumed. Other authors (notably Unruh) note the “demandingness” of longtermism, but this is a more general worry; a decision theory can be demanding (require arbitrarily large sacrifices) without being fanatical (e.g. by only requiring such sacrifices when payoff is likely or assured).
Greaves and MacAskill offer a tentative defense of fanaticism, on both intuitive and technical grounds. Additional technical arguments are found in Beckstead and Thomas (2021), who try to show (among other things) that seeking to temper fanaticism by discarding probabilities below some threshold is doomed to failure. Here I offer replies to these arguments, renewing the intuitive concern about fanaticism and attempting to rescue the method of discarding small probabilities.
Bike helmets and seatbelts
One problem with trying to reason intuitively about fanaticism is that we simply don’t have good intuitions about very large or very small numbers in the abstract. As such, Greaves and MacAskill instead inspect everyday situations where we already make decisions involving very small numbers. They use the examples of riding a bike or driving a car, each of which carries roughly one-in-a-million odds (a very small probability) of death (a very large harm) per 35 miles ridden or 500 miles driven, and point out that it doesn’t seem at all fanatical to choose to wear a helmet or seatbelt and so reduce the risk somewhat. Since we can reasonably assume that we can “positively affect the very long-term future with probabilities well above [one in a million],” we then ought not to regard longtermism as problematically fanatical, or at least no more so than helmet- or seatbelt-wearing.
These analogies seem to me to fail on two counts. First, recall our earlier definition of fanaticism. Helmet- and seatbelt-wearing certainly involve extremely unlikely but very rewarding payoffs (averting an otherwise fatal crash), but are hardly representative of a willingness to sacrifice arbitrarily much. Indeed, I suspect it is precisely because the costs of helmet- and seatbelt-wearing are so trivial that we tend not to regard the decision to wear a helmet or seatbelt as fanatical. Such decisions seem closer to the choice whether or not to pick up a free lottery ticket off the ground than to, for instance, decisions to fund existential risk mitigation instead of near-term charities[1].
Second, by choosing a risk-per-distance as the point of comparison, Greaves and MacAskill have smuggled in confusion about the interaction of fanaticism with probabilities in continuous spaces. They want to show that acting on one in a million odds is not fanatical, and seem willing to countenance that acting on one in a billion odds might be. These are roughly the probabilities of death associated with driving 500 and 1 miles, respectively. The issue, of course, is that one can drive 500 miles by driving 1 mile 500 times. If the anti-fanatic is only permitted to evaluate marginal probabilities, they risk incoherence: concluding at the start of each mile that wearing a seatbelt reduces odds of death by less than one in a billion and is thus fanatical, they travel 500 miles with no seatbelt, despite having previously concluded that wearing a seatbelt for a 500 mile journey is not fanatical[2]. Since distance is continuous, the same argument structure can be applied at any level of risk-per-distance to arrive at a contradiction.
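To make the compounding concrete, here is a minimal sketch in Python. The numbers are illustrative rather than actuarial: the per-mile risk is simply one five-hundredth of the one-in-a-million-per-500-miles figure above, and the assumption that a seatbelt halves the risk of dying in a crash is mine.

```python
# Illustrative numbers only: one-in-a-million odds of death per 500 miles driven,
# and a (hypothetical) seatbelt that halves the risk of dying in a crash.
P_DEATH_PER_MILE = 1e-6 / 500                # ~2e-9 per mile
MILES = 500

p_trip_no_belt = 1 - (1 - P_DEATH_PER_MILE) ** MILES            # ~1e-6
p_trip_with_belt = 1 - (1 - P_DEATH_PER_MILE / 2) ** MILES      # ~5e-7

# Evaluated mile by mile, the seatbelt's benefit is roughly one in a billion...
print(f"per-mile reduction:   {P_DEATH_PER_MILE / 2:.1e}")      # 1.0e-09
# ...but those margins accumulate over the whole journey.
print(f"whole-trip reduction: {p_trip_no_belt - p_trip_with_belt:.1e}")  # ~5.0e-07
```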
Previewing arguments to come, I suggest that the anti-fanatic should instead be allowed to integrate risk, and devise a policy based on the total. The lifetime odds of dying in a car crash are about 1 in 100, surely well above the threshold for fanaticism. Perhaps we should then view the decision to wear a seatbelt for a 500 mile drive not as reflecting the result of a marginal expected-value calculation, but as reflecting a more general commitment to seatbelt-wearing as a means of reducing this overall risk. I conclude for the moment that, while Greaves and MacAskill are correct that helmet- and seatbelt-wearing are not fanatical, they are wrong to infer from this that arbitrary sacrifices for one in a million odds of very large payoff are intuitively justified.
Nicolausian discounting revisited
Confronted with an event probability on the order of e.g. 10^-20, the temptation is to simply disregard it (discount it down to zero), no matter how weighty the consequences might be should the event come to pass. Monton (2019) defends a version of this strategy, which he terms “Nicolausian discounting” after Nicolaus Bernoulli, who in 1714 proposed it as a solution to the St. Petersburg paradox. Nicolausian discounting allows us to avoid fanaticism without resorting to ethically contentious strategies like discounting the value of future lives, and is intuitively attractive, but presents some technical issues in need of clarification. Here I’ll defend Nicolausian discounting against a few objections.
The first, rhyming with the seatbelt example, concerns fine-graining of probability spaces. Suppose we wish to evaluate some action whose payoff is sampled at random from a continuous space (say, the real numbers between 0 and 1). As Beckstead and Thomas point out, “each specific outcome has probability zero,” and so naive Nicolausian discounting appears to endorse disregarding the whole lot[3], whereas standard expected-value reasoning allows us to integrate over the distribution. Can Nicolausian discounting handle these cases?
I think it can, if we allow for some integration prior to discounting. Each point in the payoff space, while itself having probability zero, lies in some neighborhood that, taken together, has probability above the discount threshold. We thus ought to be able to integrate over each of these neighborhoods until we have a discrete distribution of probabilities that we can deal with normally. If we think of ordinary integration as being described by a converging series of Riemann sums, this suggestion amounts to something like stopping at the sum whose rectangles each carry probability mass equal to (or exceeding) our discount threshold.
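As a concrete illustration, here is a minimal sketch of this binning procedure, assuming a payoff drawn uniformly from [0, 1] and a discount threshold of one in a million; the bin boundaries and the use of midpoint payoffs are my own simplifications, not anything from Monton or Beckstead and Thomas.

```python
import numpy as np

THRESHOLD = 1e-6  # discount threshold (illustrative)

def discounted_value_uniform(threshold: float = THRESHOLD) -> float:
    """Value of a payoff drawn uniformly from [0, 1]: first integrate the
    continuous distribution into bins whose probability equals the discount
    threshold, then apply Nicolausian discounting bin by bin. For the uniform
    case no bin falls below the threshold, so the result recovers the ordinary
    expected value of 0.5."""
    n_bins = int(round(1 / threshold))           # each bin carries probability = threshold
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    midpoints = (edges[:-1] + edges[1:]) / 2     # representative payoff for each bin
    probs = np.full(n_bins, threshold)
    kept = probs >= threshold                    # discounting applied to the bins
    return float(np.sum(probs[kept] * midpoints[kept]))

print(discounted_value_uniform())  # 0.5
```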
A similar strategy allows us to deal with Beckstead and Thomas’ first objection to “timidity.” Suppose we hold a prospect that delivers some reward with certainty. We are repeatedly offered the chance to trade our current prospect for one that pays off with very slightly worse probability but vastly greater reward. Each prospect intuitively seems better than the one before, and so it seems unreasonably “timid” to refuse to trade, but after many such trades we find ourselves transformed into fanatics, holding tiny probabilities of enormous reward.
To resolve the paradox, we’ll apply discounting twice. Whatever change in probability is offered is assumed to be below our discount threshold (otherwise, we should feel comfortable using normal expected-value reasoning). As such, we discount it, and appraise the new prospect as presenting the same probability of reward as the previous one, making it obviously superior. If we are offered another trade, we integrate the total change (the previous, discounted change plus the next potential one) and discount again if it is still below our threshold; else, we consider the trade as offering probability worse by the total integrated change. In effect, we quantize changes in probability to the smallest unit that we care about[4]. If the payoff probability of the offered prospect ever falls below our discount threshold, we discount it down to zero and refuse to make additional trades, avoiding timidity without falling prey to fanaticism.
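Below is a rough sketch of this bookkeeping. The setup is my own (each offer shaves a fixed sliver off the payoff probability and doubles the reward), and the function and parameter names are hypothetical; the point is just to show that pooled sub-threshold changes eventually get applied, and that trading stops once the offered prospect itself drops below the threshold.

```python
THRESHOLD = 1e-6  # discount threshold (illustrative)

def run_trades(p0, v0, delta, factor, max_offers, threshold=THRESHOLD):
    """Evaluate a sequence of offered trades, each of which lowers the payoff
    probability by `delta` and multiplies the payoff by `factor`, pooling
    sub-threshold probability decrements until they amount to something we
    care about (the quantization scheme described above)."""
    p, v = p0, v0     # the probability we credit the current prospect with, and its payoff
    pending = 0.0     # accumulated, not-yet-applied probability decrements
    for _ in range(max_offers):
        true_new_p = p - pending - delta
        if true_new_p < threshold:
            break                      # the offered prospect is itself negligible: refuse
        pending += delta
        if pending < threshold:
            v *= factor                # pooled change still negligible: same appraised odds, bigger payoff, so trade
        elif (p - pending) * (v * factor) > p * v:
            p, v, pending = p - pending, v * factor, 0.0   # pooled change now matters: apply it and trade
        else:
            break                      # pooled change matters and the trade no longer pays
    return p, v

# Start from a 1-in-100,000 shot at one unit of utility; each offer doubles the
# payoff and shaves 3e-7 off the probability. Trading stops near the threshold,
# with the payoff grown to about 2^30 units.
print(run_trades(p0=1e-5, v0=1.0, delta=3e-7, factor=2.0, max_offers=100))
```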
An additional “timidity” objection arises when we consider pairs of prospects with payoff probabilities on opposite sides of the discount threshold. Suppose our current prospect pays off with probability above, but arbitrarily close to, the discount threshold. If we are offered the chance to trade it for one that pays off with probability below the discount threshold, we will always refuse, even if the absolute difference in probability is arbitrarily small and the difference in payoff arbitrarily large.
To the extent that this objection differs from the previous one, it is by degree rather than kind. Monton offers a clear solution in the form of a discount function that approaches zero smoothly, such that probabilities arbitrarily close to the discount threshold are discounted arbitrarily close to zero. Beckstead and Thomas briefly consider this sort of approach in the context of Buchak’s (2013) “risk-weighted expected utility theory,” but conclude that it will still be vulnerable to fanaticism unless utility is bounded above. This seems mistaken; a hyperbolic discount function, for instance, one that goes smoothly to zero at some finite probability, nicely manages edge cases around the threshold while still discarding sufficiently small probabilities.
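One candidate functional form (my illustration, not necessarily Monton's own proposal): weight each probability p by max(0, 1 - t/p), which is exactly zero for p at or below the threshold t and rises smoothly toward 1 as p grows, so a payoff at any sub-threshold probability contributes nothing no matter how large it is.

```python
THRESHOLD = 1e-6  # discount threshold (illustrative)

def discount_weight(p: float, t: float = THRESHOLD) -> float:
    """A hyperbolic-style weight: exactly 0 for p <= t, rising smoothly toward 1.
    Probabilities just above the threshold are discounted to just above zero,
    so there is no cliff at the threshold itself."""
    return max(0.0, 1.0 - t / p) if p > 0 else 0.0

def discounted_contribution(p: float, payoff: float) -> float:
    """Contribution of one outcome: weight times ordinary expected value."""
    return discount_weight(p) * p * payoff

for p in (1e-7, 1e-6, 1.01e-6, 2e-6, 1e-3):
    print(f"p = {p:.2e}  weight = {discount_weight(p):.4f}")
# Any payoff at p <= 1e-6 contributes nothing, however large it is:
print(discounted_contribution(1e-7, 1e100))  # 0.0
```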
It will, of course, still be the case that any prospect above the discount threshold will be preferred to any below it, but for those who find Nicolausian discounting intuitively plausible this is more of a feature than a bug; if I think prospects with sufficiently low payoff probability are worthless, then of course I will reject them in favor of more probable prospects, even with lower payoff. Smooth discount functions simply capture the additional intuition that, if prospects with a given payoff probability are worthless, prospects with almost that probability are almost worthless[5].
Order of operations
I have tried to reconcile Nicolausian discounting with continuous, or nearly continuous, probability distributions by allowing for some integration prior to discounting. The skeptical reader might point out here that the details of this procedure will bear importantly on the contours of the resulting decision theory. Indeed, we already have a theory wherein one first integrates over possible outcomes, then makes decisions; it is called expected-value theory. As such, I face the question of how much integrating to do and when. If we first integrate over everything then discount, we’re just taking expected values (and will never discount, since the probability of getting some outcome is always 1), whereas if we discount then integrate, we can’t handle continuous distributions.
Earlier I gestured at discretizing such distributions by integrating into “bins” of size equal to our discount threshold. As it stands, this is still a bit underspecified. Suppose we maintain a discount threshold of one in a million. We have a prospect of obtaining one unit of utility with certainty, and can trade it for either of the following prospects[6]:
A:
- Ten million units with probability one in 1.5 million
- One unit with probability one in 1.5 million
- Zero units otherwise
B:
- Ten million units with probability one in 1.5 million
- Nine million units with probability one in 1.5 million
- Zero units otherwise
In both cases, the total payoff probability is higher than the discount threshold, but is divided among outcomes each with probability below the discount threshold. Integrating as little as possible, just until we reach our discount threshold (which in this case just means averaging the two non-zero payoffs for each prospect), we arrive at the following redescription:
A:
- An average of five million (plus a little bit) units with probability ~1.3 in a million
- Zero otherwise
B:
- An average of 9.5 million units with probability ~1.3 in a million
- Zero otherwise
Now everything is above our discount threshold, so we apply standard expected-value reasoning and conclude that, while B is better than A, both are better than our current prospect. Yet the two offered prospects seem importantly different, and not just in terms of expected payoff. Prospect B presents us with two small probabilities of enormous payoffs that, taken together, are probable enough for us to care. Prospect A, on the other hand, presents a probability of enormous payoff small enough that we have committed not to care about it; it is only when averaged together with a similarly improbable mediocre payoff that it avoids being discounted. Our averaging strategy has introduced a hole through which some residual fanaticism has managed to leak; in valuing Prospect A over our current prospect, we are leaning on a tiny probability of enormous payoff, which is exactly what we adopted Nicolausian discounting to avoid.
We can plug this hole, at the cost of introducing another free parameter. Instead of integrating willy-nilly until the probability of the integrand reaches our discount threshold, we’ll set some tolerance, and only integrate possible outcomes with payoffs within the tolerance of each other. This could be additive (e.g. integrate outcomes with payoffs within ±1000 units of utility of each other) or multiplicative (e.g. integrate outcomes with payoffs within an order of magnitude of each other). Returning to our two prospects, we might then say that Prospect A offers a negligible chance of a small payoff and a negligible chance of a big payoff, so we assess it as having zero expected payoff, whereas Prospect B offers a small (but just barely non-negligible) chance of a big payoff, so we assess it as having better expected payoff than our current prospect.
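Here is a minimal sketch of the two prospects evaluated under this tolerance rule, with the threshold at one in a million and a multiplicative tolerance of one order of magnitude; the simple adjacent-merge grouping below is my own choice of implementation, not something fixed by the proposal itself.

```python
THRESHOLD = 1e-6   # discount threshold
TOLERANCE = 10.0   # multiplicative tolerance: only group payoffs within one order of magnitude

# (payoff, probability) pairs for the nonzero outcomes of each prospect
PROSPECT_A = [(1e7, 1 / 1.5e6), (1.0, 1 / 1.5e6)]
PROSPECT_B = [(1e7, 1 / 1.5e6), (9e6, 1 / 1.5e6)]

def group_with_tolerance(outcomes, tolerance=TOLERANCE):
    """Merge adjacent outcomes (sorted by payoff) whose payoffs fall within the
    multiplicative tolerance of the smallest payoff in the current group."""
    groups = []
    for payoff, prob in sorted(outcomes):
        if groups and payoff <= groups[-1][0][0] * tolerance:
            groups[-1].append((payoff, prob))
        else:
            groups.append([(payoff, prob)])
    return groups

def discounted_value(outcomes, threshold=THRESHOLD, tolerance=TOLERANCE):
    """Discard any group whose pooled probability is below the threshold;
    value the remaining groups by ordinary expectation."""
    total = 0.0
    for group in group_with_tolerance(outcomes, tolerance):
        if sum(prob for _, prob in group) >= threshold:
            total += sum(payoff * prob for payoff, prob in group)
    return total

print(discounted_value(PROSPECT_A))  # 0.0  -- each outcome is negligible on its own
print(discounted_value(PROSPECT_B))  # ~12.7 -- the two big payoffs pool together and survive
```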
Fine, says the skeptic. How about then (supposing a discount threshold of one in a million and a tolerance of an order of magnitude) a prospect with two million possible outcomes, all equiprobable, with payoffs each separated by two orders of magnitude? How are we to evaluate such a prospect?
Here I will throw up my hands and admit defeat; integration with tolerances will not prevent a Nicolausian discounter from discarding every possible outcome. In my defense I can say only that I challenge any decision theory to evaluate this prospect in a way that seems reasonable on its face. The expected value, for instance, is many, many orders of magnitude away from almost every outcome, and so it seems the conclusions that one can draw from it are rather limited. If I have succeeded in forcing objections to Nicolausian discounting to reach this level of pathology, I will be content.
In summary, we have arrived at a version of Nicolausian discounting augmented with smooth discount functions to avoid threshold effects, integration to deal with continuous or nearly continuous distributions and repeated gambles, and a tolerance when choosing which possibilities to integrate over to avoid creeping fanaticism. Is this an ad hoc monstrosity of a decision theory, constructed merely to avoid a handful of specific objections? On the contrary, I would argue that each addition is quite reasonable on its own merits. Discount functions, as mentioned, reflect our intuition that almost-equivalent things should be treated almost-equivalently (the same intuition that continuum arguments try to leverage in defense of fanaticism); integration, the intuition that one should consider collective outcomes collectively; tolerances, the intuition that alike outcomes combine much more naturally than unlike outcomes. Perhaps there is some master decision theory that captures all of this more elegantly, but lacking such a theory I think augmented Nicolausian discounting presents a more reasonable heuristic for decision-making than either bare Nicolausian discounting or vanilla expected-value theory[7].
Fanaticism and longtermism
If the preceding arguments are to be believed, our everyday decision making with respect to e.g. seatbelts and bike helmets does not imply that fanaticism is reasonable, and it may be possible to construct a coherent decision theory that discards very small probabilities rather than allowing them to dominate decision making via sufficiently high payoffs. Does this weaken the case for strong longtermism?
To first order, no, or at least I don’t think so. As Greaves and MacAskill point out, even quite conservative estimates of risk reduction from spending e.g. $1B on AI safety give probabilities on the order of 10^-5, and this seems implausibly high for a discount threshold. In some respects, my account is even favorable to longtermists; an individual donation of, say, $10 might reduce x-risk by a negligible amount and thus seem fanatical, but I have argued that repeated or collective actions ought to be evaluated collectively, and so if the $10 is part of humanity’s strategy to spend $1T and in doing so reduce x-risk by a non-negligible probability (or something like that), the worry disappears.
That said, employing augmented Nicolausian discounting means that the enormous potential value of the future does not grant longtermists quite the same carte blanche as it does under expected-value reasoning. Taking Greaves and MacAskill's main estimate of 10^24 future beings, and assuming ~$4,000 saves a life today via e.g. bednets, expected-value reasoning says that any use of $4,000 that reduces x-risk by at least 1 in 10^24 is as good as or better than bednets. At the moment, many organizations focused on x-risk reduction can do much better than 1 in 10^24, but if we fund the Planetary Society, Center for Health Security, Alignment Research Center, and other high-impact groups to saturation, we might hit our discount threshold well before the expected-value calculation tells us that bednets are a better decision. At the point where we’re considering funding Goofus the Independent Alignment Researcher™ whose work plausibly reduces x-risk by 1 in 10^20, I am quite comfortable discounting this probability down to zero and funding near-term charity instead.
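To make the threshold comparison concrete, here is a minimal sketch under the assumptions above (10^24 future beings, ~$4,000 per present life via bednets) plus a hypothetical discount threshold of one in a million; the collective $1T scenario in the last line is likewise illustrative.

```python
FUTURE_BEINGS = 1e24
COST_PER_LIFE_BEDNETS = 4_000      # dollars per present life saved (rough)
THRESHOLD = 1e-6                   # hypothetical Nicolausian discount threshold

def lives_saved_xrisk(dollars, risk_reduction_per_dollar, threshold=None):
    """Expected future lives saved by an x-risk donation; if a threshold is
    supplied, risk reductions below it are discounted to zero."""
    delta_risk = dollars * risk_reduction_per_dollar
    if threshold is not None and delta_risk < threshold:
        return 0.0
    return delta_risk * FUTURE_BEINGS

def lives_saved_bednets(dollars):
    return dollars / COST_PER_LIFE_BEDNETS

# Expected-value reasoning: $4,000 beats bednets whenever it buys at least a
# 1-in-10^24 reduction in x-risk.
print(lives_saved_xrisk(4_000, 1e-24 / 4_000) >= lives_saved_bednets(4_000))   # True

# With a discount threshold, that same marginal donation is worthless on its own...
print(lives_saved_xrisk(4_000, 1e-24 / 4_000, threshold=THRESHOLD))            # 0.0

# ...but a collective $1T strategy buying a 1-in-a-million reduction clears it.
print(lives_saved_xrisk(1e12, 1e-6 / 1e12, threshold=THRESHOLD))               # 1e+18
```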
Still, these are concerns for a funding landscape dramatically different from the one of 2025. While worries about fanaticism make the case for strong longtermism more contingent, the core argument that we ought to devote substantially greater resources to safeguarding the far future survives unscathed.
References
Askell, A., & Neth, S. (2025). Longtermist myopia. In H. Greaves, J. Barrett, & D. Thorstad (Eds.), Essays on Longtermism: Present Action for the Distant Future. Oxford UP.
Beckstead, N., & Thomas, T. (2021). A paradox for tiny probabilities and enormous values. Global Priorities Institute.
Buchak, L. (2013). Risk and Rationality. Oxford UP.
Greaves, H., & MacAskill, W. (2025). The case for strong longtermism. In H. Greaves, J. Barrett, & D. Thorstad (Eds.), Essays on Longtermism: Present Action for the Distant Future. Oxford UP.
Monton, B. (2019). How to Avoid Maximizing Expected Utility. Philosophers’ Imprint, 19(18).
Unruh, C. F. (2025). Against a moral duty to make the future go best. In H. Greaves, J. Barrett, & D. Thorstad (Eds.), Essays on Longtermism: Present Action for the Distant Future. Oxford UP.
Wilkinson, H. (2022). In Defence of Fanaticism. Ethics, 132(2), 445–477. https://doi.org/10.1086/716869
- ^
Of course, the cost of putting on a seatbelt isn’t literally zero; it’s a little bit inconvenient. Does this inconvenience become comparable to the lives counterfactually lost by diverting philanthropic funding when scaled up? Taking Greaves and MacAskill's example, suppose 10^8 dollars buys either one in a million odds of averting AI catastrophe, saving (minimally) 10^14 future lives, or certainty of saving 10^5 present lives via e.g. bednets. Supposing the odds of choosing to wear a seatbelt saving your life are comparable, the analogy holds if having to put on a seatbelt is about one billionth as bad as death. My intuition says it is not, but this intuition, being about a very small number, is subject to skepticism.
- ^
As in the previous footnote, this logic could fail if the cost of putting on a seatbelt genuinely exceeds the benefit for a one-mile journey (or a longer one). Then 500 one-mile journeys mean 500 seatbelt applications, and the cost still exceeds the benefit. But this assumes normal expected-value reasoning, and in examining fanaticism we are interested in precisely those cases where we might not want to use this sort of reasoning.
- ^
This precise example is somewhat curious; strictly, Nicolausian discounting should endorse discounting each probability down to zero, but as every specific probability is zero already, it’s unclear what this changes. The force of the objection stands if we instead replace the continuous distribution with very many possible payoffs of infinitesimal (but nonzero) probability.
- ^
Here I am echoing Monton, who argues that agents employing Nicolausian discounting should evaluate the consequences of repeated decisions collectively, as opposed to marginally (as agents playing e.g. the iterated prisoner’s dilemma are permitted to do). He declines to describe what this collective evaluation might look like; I am attempting to do so.
- ^
I note that smooth discount functions, discounting small differences, and integrating over repeated small differences seem sufficient to defuse Wilkinson’s “scale independence or absurd sensitivity” dilemma, as presented in Wilkinson (2022). Wilkinson seems to acknowledge this in footnote 45, but ignores it in the subsequent discussion.
- ^
The exact numbers are not so important, and were chosen mainly to present a situation in which neither the pre-integration nor the post-integration probabilities are arbitrarily close to the discount threshold. Still, they are small, and dwelling on borderline cases somewhat betrays the commonsense spirit of Nicolausian discounting; but since the objections in the literature are full of just this sort of edge case, I am attempting to preempt one.
- ^
Many of the remaining technical questions – how do we choose a discount threshold or function? Is such a choice arbitrary? – are addressed deftly by Monton. I fear I have already spent enough of this essay reproducing his arguments, and so omit further discussion.
Great post, thanks for writing!
I buy that individuals should try to pick "policies" and psychologically commit themselves to them, rather than only evaluate actions one at a time. I think this totally makes sense for seatbelts and helmets. However, I'm not sure it requires evaluating actions collectively at a fundamental normative level rather than just practically, especially across individuals. I think we can defend wearing seatbelts and helmets with Nicolausian discounting without thereby recommending longtermism or x-risk work to most individuals, even if the marginal x-risk opportunity were similar to the average or best already funded.
In particular, I know that if I don't wear my seatbelt this time in a car by some logic that is not very circumstance-specific, I could use similar logic in the future to keep talking myself out of wearing a seatbelt, and those risks would accumulate into a larger risk that could be above the discount threshold. So I should stop myself now to minimize that risk. I should consider the effects of my reasoning and decision now on my own future decisions.
However, I don't have nearly as much potential influence over humanity's x-risk strategy (causally or acausally) and the probability of an existential catastrophe. The typical individual has hardly any potential influence.
Also, separately, how would you decide who or what is included in the collective? Should we include the very agents creating the problems for us?