
What risks should we be willing to take when deploying superintelligent systems? Such systems could prove quite dangerous, and so we may wish to delay their use in order to improve their safety. But superintelligent systems can also be used to defend ourselves against other existential threats, such as bioweapons, nuclear weapons, or indeed, other malicious AI systems. In deciding the optimal time to deploy superintelligent systems, we must therefore trade off the risks from deploying these systems with the gain in protection they afford us once deployed.

My aim here is to develop a simple model in which to explore this trade-off. While I will phrase the model around the question of when to deploy a superintelligent system, similar issues arise more generally any time we must trade state risks, which are risks from existing in a vulnerable state, against transition risks, which are risks that arise from transitioning out of the vulnerable state.[1] For instance, developing certain biological capabilities might permanently reduce biorisk in the long term but cause a short-lived increase in biorisk along the way; likewise, certain geoengineering projects might permanently reduce risks from climate change in the long term but could cause a catastrophe of their own if the deployment is botched. In each of these cases, waiting longer increases the state risk we are exposed to. But it also gives us more time to find and mitigate potential transition risks.

The Model

Let us assume that

  1. At any given time $t$ we face a background extinction rate of $\delta(t)$.
  2. At any given time we can build a superintelligent system. Let $p(t)$ be the probability that the system we build at time $t$ is aligned. We assume that if the system is aligned, we end up in a "win-state": with the help of its aligned superintelligent friend, humanity achieves its full potential. If the system is unaligned, however, humanity goes extinct.
  3. Our only goal is to maximize the probability that we end up in the "win-state".[2]

This model is very simplistic. In particular, we assume that we will always succeed at building a superintelligent system if we try. This assumption, however, is not as restrictive as it first appears. If we will only succeed at building a superintelligent system with some probability, and otherwise fail, then so long as trying and failing costs little, the value of that success probability does not change whether trying to create a superintelligence now is net positive or negative, relative to doing nothing. We can therefore apply our model even to cases where success is not guaranteed, so long as failure is not too costly.

We also ignore the possibility of S-risks, either background or caused by our superintelligent system. It is straightforward to allow for this, but at the cost of making the model a bit more complicated.

Solving the Model

Let us begin by considering the probability $S(t)$ that humanity survives to time $t$, assuming that no attempt is made to build a superintelligent system. This quantity satisfies the differential equation

$$\frac{dS}{dt} = -\delta(t)\,S(t), \qquad S(0) = 1,$$

so that $S(t) = \exp\!\left(-\int_0^t \delta(s)\,ds\right)$. If we decide to build a superintelligent system at time $T$, the overall probability that humanity realizes its potential is then given by

$$W(T) = p(T)\,S(T) = p(T)\exp\!\left(-\int_0^T \delta(s)\,ds\right).$$

Differentiating with respect to $T$, we find that

$$\frac{dW}{dT} = \big[p'(T) - \delta(T)\,p(T)\big]\exp\!\left(-\int_0^T \delta(s)\,ds\right).$$

From this we conclude that if we are at some time $T$ for which $p'(T) > \delta(T)\,p(T)$, then we can always improve our survival probability by waiting. Note in particular that because $p(T) \le 1$, it is always optimal to wait if $p'(T) > \delta(T)$. If instead we have $p'(T) < \delta(T)\,p(T)$, delaying superintelligent deployment decreases our survival probability.[3] The optimal time to deploy the system occurs when

$$\frac{p'(T)}{p(T)} = \delta(T),$$

or, in other words, when the relative gain in $p$ is precisely equal to the background extinction rate.
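As a concrete (and entirely illustrative) check of this condition, the sketch below assumes a constant background rate and an alignment probability that approaches 1 exponentially, numerically maximizes $W(T)$, and verifies that the maximizer satisfies $p'(T)/p(T) = \delta(T)$. The functional forms, parameter values, and function names here are assumptions made for the example, not part of the model itself.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Illustrative (assumed) inputs: a constant background hazard rate and an
# alignment probability that approaches 1 exponentially over time.
def delta(t):
    """Background extinction rate (per year); assumed constant for illustration."""
    return 0.001

def p(t):
    """Probability that a system built at time t is aligned (assumed form)."""
    return 1 - 0.5 * np.exp(-0.1 * t)

def survival(T):
    """P(humanity survives to T with no deployment) = exp(-integral of delta)."""
    integral, _ = quad(delta, 0.0, T)
    return np.exp(-integral)

def win_probability(T):
    """P(reaching the win-state) if we deploy at time T: p(T) * survival(T)."""
    return p(T) * survival(T)

# Numerically find the deployment time that maximizes the win probability.
opt = minimize_scalar(lambda T: -win_probability(T), bounds=(0.0, 200.0), method="bounded")
T_star = opt.x

# Check the first-order condition: p'(T*)/p(T*) should equal delta(T*).
eps = 1e-4
p_prime = (p(T_star + eps) - p(T_star - eps)) / (2 * eps)
print(f"T* ~ {T_star:.1f} years, win probability ~ {win_probability(T_star):.3f}")
print(f"p'/p at T* = {p_prime / p(T_star):.5f}  vs  delta(T*) = {delta(T_star):.5f}")
```

With these particular numbers the optimum comes out around 39 years, but nothing that follows depends on this illustration.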

Even without further specifying $\delta(t)$ and $p(t)$, it is immediately clear that so long as the background risk is bounded from below, we should always eventually deploy our superintelligent system, no matter how dangerous the system is. After all, a background risk that never vanishes guarantees that humanity will eventually go extinct, and so any gamble, no matter how small its chance of success, is preferable to this. We can make this argument more quantitative by assuming that the background extinction risk is bounded so that $\delta(t) \ge \delta_{\min} > 0$, in which case the optimal time to deploy $T^*$ satisfies the inequality

$$T^* \;\le\; \frac{1}{\delta_{\min}}\log\frac{1}{p(0)}.$$

This can be derived by directly integrating the condition under which waiting is worthwhile, or, alternatively, by noting that the win probability from deploying at time $T$ can never exceed the survival probability $S(T) \le e^{-\delta_{\min} T}$, which eventually falls below $p(0)$, the win probability from deploying immediately.
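To spell out the first route (a short derivation in the notation above): waiting is only worthwhile while $p'(t)/p(t) \ge \delta(t) \ge \delta_{\min}$, so integrating this condition from $0$ to $T^*$ and using $p(T^*) \le 1$ gives

$$\log\frac{1}{p(0)} \;\ge\; \log\frac{p(T^*)}{p(0)} \;=\; \int_0^{T^*}\frac{p'(t)}{p(t)}\,dt \;\ge\; \delta_{\min}\,T^*,$$

which rearranges to the bound above.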

Quantitative Estimates

To make further progress, let us assume that the background extinction rate is some constant $\delta$. Let us furthermore assume that the transition risk decreases exponentially, so that

$$1 - p(t) = \big(1 - p(0)\big)\,e^{-\lambda t}.$$

We then find that at the optimal time to deploy our superintelligent system,

$$1 - p(T^*) = \frac{\delta}{\delta + \lambda}.$$

Because the transition risk monotonically decreases in our model, $\delta/(\delta + \lambda)$ also represents the greatest possible transition risk we should be willing to take: if ever we find that $1 - p(t) \le \delta/(\delta + \lambda)$, then we should deploy the superintelligent system, and otherwise we should wait.
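For readers who want the intermediate steps (same notation as above, writing $q(t) = 1 - p(t)$ for the transition risk): from $q(t) = q(0)\,e^{-\lambda t}$ we have $p'(t) = \lambda\,q(t)$, so the optimality condition $p'/p = \delta$ becomes

$$\frac{\lambda\,q}{1 - q} = \delta \quad\Longleftrightarrow\quad q\,(\lambda + \delta) = \delta \quad\Longleftrightarrow\quad q(T^*) = \frac{\delta}{\delta + \lambda},$$

recovering the threshold quoted above.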

To get a feel for what kind of risk we should be willing to take, let us assume that the AGI deployment risk decreases by 10% per year and that the baseline extinction rate is 0.1% per year (so that $\lambda = 0.1$ per year and $\delta = 0.001$ per year). We then find that $1 - p(T^*) = \delta/(\delta + \lambda) \approx 1\%$.
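A quick arithmetic check of this figure (the function name below is mine; the numbers are the ones just stated):

```python
# Greatest transition risk worth accepting: 1 - p = delta / (delta + lambda).
def deployment_threshold(delta: float, lam: float) -> float:
    """Deployment threshold given background extinction rate `delta` and
    safety-improvement rate `lam`, both per year."""
    return delta / (delta + lam)

# Background rate 0.1%/yr, deployment risk shrinking by 10%/yr.
print(f"{deployment_threshold(0.001, 0.1):.2%}")  # -> 0.99%
```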

So far we have been thinking of $\delta$ as the background rate of extinction for all humanity. Most actors, however, are probably more interested in their own survival (or, at least, the survival of their ideals or descendants) than in the survival of humanity more broadly. For such actors, the value of $\delta$ they will work with is necessarily larger than the background extinction rate for humanity, and they will therefore be willing to take larger risks than is optimal for humanity. For instance, a government threatened by nuclear war might take $\delta$ to be closer to 1% per year even in relatively peaceful times, and so (again taking $\lambda = 0.1$ per year) would in this model be willing to deploy a superintelligent system when $1 - p \approx 9\%$. Because individual actors may be willing to take larger gambles than is optimal for humanity, our model exhibits a "risk compensation effect", whereby publicly spreading a safety technique may actually increase existential risk: it may reduce $1 - p$ to below the value at which a selfish actor would rationally deploy the system, while leaving it above the value at which it is rational for humanity to deploy the system.
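To make the risk-compensation window explicit, the same threshold formula can be evaluated for both actors (the 1% per year background rate for the selfish actor is the illustrative value used above):

```python
# Compare deployment thresholds for humanity as a whole and for a selfish actor
# who faces (or perceives) a higher effective background rate.
def deployment_threshold(delta: float, lam: float) -> float:
    return delta / (delta + lam)

lam = 0.1                                    # safety improves by ~10% per year
humanity = deployment_threshold(0.001, lam)  # background rate 0.1%/yr -> ~1%
selfish = deployment_threshold(0.01, lam)    # background rate 1%/yr   -> ~9%

print(f"humanity: {humanity:.1%}, selfish actor: {selfish:.1%}")
# Any transition risk between these two values is one a selfish actor would accept
# even though humanity as a whole would prefer to wait; pushing 1 - p into this
# window can therefore increase, rather than decrease, overall existential risk.
```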

Conclusion

My aim here has been to develop a model to explore the optimal time to build a potentially dangerous superintelligent system. Act too quickly and the superintelligent system might be more dangerous than necessary; wait too long and we may instead expose ourselves to background risks that we could have used our superintelligent system to avoid. Hence, the optimal time to build a superintelligent system depends not only on the risk that such a system causes extinction, but also on the background extinction risk and on the rate at which we can improve the safety of our superintelligent systems. While any quantitative estimate of the optimal AI risk to accept is necessarily very speculative, I was surprised by how easily risks on the order of a few percent could turn out to be a rational gamble for humanity. Rather worryingly, selfish actors who value their own survival should be willing to take even riskier gambles, such that spreading AI safety techniques may not always reduce existential risk.

Acknowledgements

I'm grateful to Abigail Thomas, Owen Cotton-Barratt, and Fin Moorhouse, for encouragement, discussions, and feedback.


    1. This categorization of risks is introduced by Nick Bostrom in Superintelligence, although he uses the term step risk rather than the term transition risk, which is used by Toby Ord in Chapter 7 of The Precipice. ↩︎

    2. In essence this means that we are ignoring the possibility that some "win-states" might be much better than others, and are assuming that the value of the win-state is so great that additional pleasure or suffering which occurs before the win-state can be ignored. ↩︎

    3. This is only true locally. In some cases it is possible that, after waiting a sufficiently long time, $W(T)$ will increase again, leading to an overall increase in the probability that humanity survives. ↩︎
