How bad would human extinction be?

arvomm

22 min read, Oct 23, 2023

Comments 25

Dan_Keys

Why are these expected values finite even in the limit?

It looks like this model is assuming that there is some floor risk level that the risk never drops below, which creates an upper bound for survival probability through n time periods based on exponential decay at that floor risk level. With the time of perils model, there is a large jolt of extinction risk during the time of perils, and then exponential decay of survival probability from there at the rate given by this risk floor.

The Jupyter notebook has this value as r_low=0.0001 per time period. If a time period is a year, that means a 1/10,000 chance of extinction each year after the time of perils is over. This implies a 10^-43 chance of surviving an additional million years after the time of perils is over (and a 10^-434 chance of surviving 10 million years, and a 10^-4343 chance of surviving 100 million years, ...). This basically amounts to assuming that long-lived technologically advanced civilization is impossible. It's why you didn't have to run this model past the 140,000 year mark.

This constant r_low also gives implausible conditional probabilities. e.g. Intuitively, one might think that a technologically advanced civilization that has survived for 2 million years after making it through its time of perils has a pretty decent chance of making it to the 3 million year mark. But this model assumes that it still has a 1/10,000 chance of going extinct next year, and a 10^-43 chance of making it through another million years to the 3 million year mark.

This seems like a problem for any model which doesn't involve decaying risk. If per-time-period risk is 1/n, then the model becomes wildly implausible if you extend it too far beyond n time periods, and it may have subtler problems before that. Perhaps you could (e.g.) build a time of perils model on top of a decaying r_low.
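The survival figures quoted above can be reproduced in a few lines. This is a minimal sketch, assuming a constant and independent per-year risk equal to the notebook's default r_low = 0.0001; the function name `log10_survival` is illustrative and not taken from the notebook.

```python
import math

def log10_survival(risk_per_period: float, periods: int) -> float:
    """Base-10 log of the probability of surviving `periods` periods
    at a constant, independent per-period extinction risk."""
    # Work in logs (with log1p for small risks) to avoid floating-point underflow.
    return periods * math.log1p(-risk_per_period) / math.log(10)

R_LOW = 0.0001  # default floor risk per period, as quoted from the notebook

for years in (10**6, 10**7, 10**8):
    exponent = log10_survival(R_LOW, years)
    print(f"{years:>11,} years: survival probability ~ 10^{exponent:.1f}")
```

Swapping in a smaller floor, or a floor that itself decays over time, shows how sensitive these exponents are to the assumed long-run risk.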

Michael St Jules 🔸

(Commenting on mobile, so excuse the link formatting.)

Including his estimate (guess?) of 1 in a million risk per century in the long run:

https://forum.effectivealtruism.org/posts/zLZMsthcqfmv5J6Ev/the-discount-rate-is-not-zero?commentId=GzhapzRs7no3GAGF3

In general, even assigning a low but non-tiny probability to low long run risks can allow huge expected values.

See also Tarsney's The Epistemic Challenge to Longtermism https://philarchive.org/rec/TARTEC-2 which is basically the cubic model here, with consistent per period risk rate over time, but allowing uncertainty over the rate.

Thorstad has recently responded to Tarsney's model, by the way: https://ineffectivealtruismblog.com/2023/09/22/mistakes-in-the-moral-mathematics-of-existential-risk-part-4-optimistic-population-dynamics/

arvomm

Good to hear from you Michael! Some thoughts:

You're right that the Tarsney paper was an important driver in bringing cubic to this framework. That's why it's a key source in the value cases summary. Modelling uncertainty is an excellent next step for various scenarios.
Thanks very much for the link to David's response. I hadn't seen that!
Good to have the link to Carl's thread, it'll be valuable to run these models and get some visualisations with that 1 in a million estimate too!

Michael St Jules 🔸

It also seems worth mentioning grabby alien models, which, from my understanding, are consistent with a high probability of eventually encountering aliens. But again, we might not have near-certainty in such models or eventually encountering aliens. And I don't know what kind of timeline this would happen on according to grabby alien models; I haven't looked much into them.

Dan_Keys

One way to build risk decay into a model is to assume that the risk is unknown within some range, and to update on survival.

A very simple version of this is to assume an unknown constant per-century extinction risk, and to start with a uniform distribution on the size of that risk. Then the probability of going extinct in the first century is 1/2 (by symmetry), and the probability of going extinct in the second century conditional on surviving the first is smaller than that (since the higher-risk worlds have disproportionately already gone extinct) - with these assumptions it is exactly 1/3. In fact these very simple assumptions match Laplace's law of succession, and so the probability of going extinct in the nth century conditional on surviving the first n-1 is 1/(n+1), and the unconditional probability of surviving at least n centuries is also 1/(n+1).

More realistic versions could put more thought into the prior, instead of just picking something that's mathematically convenient.
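The 1/2, 1/3, ..., 1/(n+1) pattern can be checked exactly. The sketch below assumes the same setup as the comment (a constant but unknown per-century risk p with a uniform prior on [0, 1]) and integrates (1-p)^n term by term using exact fractions; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def survival_prob(n: int) -> Fraction:
    """P(surviving at least n centuries) = integral of (1 - p)^n over p in [0, 1].

    Expands (1 - p)^n with the binomial theorem and integrates each term exactly.
    """
    return sum(Fraction((-1) ** k * comb(n, k), k + 1) for k in range(n + 1))

def conditional_extinction(n: int) -> Fraction:
    """P(extinction in century n | survived the first n - 1 centuries)."""
    survived_so_far = survival_prob(n - 1)
    return (survived_so_far - survival_prob(n)) / survived_so_far

for n in range(1, 6):
    print(n, survival_prob(n), conditional_extinction(n))
```

A different prior would change these numbers, but any prior with spread over p gives the same qualitative effect: surviving longer shifts weight towards low-risk worlds.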

arvomm

Thank you very much Dan for your comments and for looking into the ins and outs of the work and highlighting various threads that could improve it.

There are two quite separate issues that you brought up here. First about infinite value, which can be recovered with new scenarios and, second, the specific parameter defaults used. The parameters the report used could be reasonable but also might seem over-optimistic or over-pessimistic, depending on your background views.

I totally agree that we should not anchor on any particular set of parameters, including the default ones. I think this is a good opportunity to emphasise one of the limitations in the concluding remarks saying that "we should be especially cautious about over-updating from specific quantitative conclusions". As you hinted, one important reason for this is that the chosen parameters do not have enough data behind them and are not puzzles-free.

Some thoughts sparked by the comments in this thread:

You're totally right to point out that the longer we survive in expectation the longer the simulation needs to be run for us to observe convergence.
I agree that risk is unlikely to be time-invariant for long eras, and I'm really excited about bringing in more realistic structures, like the one you suggest: an enriched Time of Perils with decaying risk. I'm hoping WIT or other interested researchers do more to spell out what these structures imply about the value of risk mitigation.
On the flip side of the default r_low seeming too high, if seen from the point of view of the start of a century, it'd imply a probability of roughly 99% of surviving each century.
A tiny r_low might be more realistic, though I confess lacking strong intuitions either way about how risk will behave in the coming centuries, let alone millennia. In my mind, risk could decay or increase, and I do hope the patterns so far, for example these last 500 years, are nothing to go by.
Your point about conditional probabilities is a good way to introduce and think about thought experiments on risk profiles. It made me think that a civilisation like the one you describe surviving different hurdles could be modelled under Great Filters where you indeed use an r_low orders of magnitude smaller than the current default and you'd get something that fits the picture you'd suggest much better, even without introducing any modifications like the decaying risk. Let me know if you play around with the code to visualise this.

Linch

(speaking for myself)

The conditional risk point seems like a very interesting crux between people; I've talked both to people who think the point is so obviously true that it's close to trivial and to people who think it's insane (I'm more in the "close to trivial" position myself).

Michael St Jules 🔸

Another way to get infinite EV in the time of perils model would be to have a nonzero lower bound on the per period risk rate across a rate sequence, but allow that lower bound to vary randomly and get arbitrarily close to 0 across rate sequences. You can basically get a St Petersburg game, with the right kind of distribution over the long-run lower bound per period risk rate. The outcome would have finite value with probability 1, but still infinite EV.

EDIT: To illustrate, if f(r), the expected value of the future conditional on a per period risk rate r in the limit, goes to infinity as r goes to 0, then the expected value of f(r) will be infinite over at least some distributions for r in an interval (0, b], which excludes 0.

Furthermore, if you assign any positive credence to subdistributions over the rates together that give infinite conditional EV, then the unconditional expected value will be infinite (or undefined). So, I think you need to be extremely confident (imo, overconfident) to avoid infinite or undefined expected values under risk neutral expectational total utilitarianism.
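One concrete instance of the illustration in the EDIT above, offered as a sketch rather than as the report's own derivation: assume a constant value $v_{c}$ per period and a constant per-period risk $r$ (both simplifying assumptions), and take expected value to be the survival-weighted sum of per-period values. Then

$$f(r) = \sum_{t=1}^{\infty} v_{c} (1 - r)^{t} = v_{c} \, \frac{1 - r}{r},$$

which is finite for every $r$ in $(0, b]$ but grows without bound as $r \to 0$. If $r$ is uniformly distributed on $(0, b]$ for some hypothetical $b \le 1$, then

$$E[f(r)] = \frac{v_{c}}{b} \int_{0}^{b} \frac{1 - r}{r} \, dr = \frac{v_{c}}{b} \Big[ \ln r - r \Big]_{0}^{b} = \infty,$$

because $\ln r \to -\infty$ as $r \to 0^{+}$. Every realised rate yields a finite value, yet the expectation over rates is infinite.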

JWS 🔸

This is absolutely fantastic work! One of the Forum posts of the year so far! It's a really good step towards getting robust estimates of xRisk work, and it would be great to see other work following up on this research agenda (both OAT and your own).^[1]

Some thoughts:

If I understand correctly, the value gained from action is always the same if the fractional reduction in xRisk is the same, ceteris paribus? That still means there seems to be a tradeoff between assuming a high rate of xRisk and believing in astronomical value, assuming that the cost of an intervention is linear in the size of the decrease (i.e. decreasing xRisk from 50% to 40% in a given t is 10 times as hard as reducing it from 50% to 49%; it would be interesting to see this worked out robustly). I think that's a robust finding, and one that seems unintuitive to both EAs and EA critics.
If you had to (and think it's appropriate to do so), what do you think the default assumptions of xRisk mitigation efforts in EA currently believe to be true? I'd guess it'd be 'time of perils' and maybe quadratic or cubic growth? But as you point out, the difference between quadratic/cubic is immense, and could easily flip whether it would be the best marginal option for altruistic funding.
I'd be interested to see what BOTEC EVs look like under this model and some assumptions. Thorstad has done something like this, but it'd be good to get a more robust sense of what parameter configurations would be needed to make xRisk reduction become competitive with top-rated GiveWell Charities
Your finding on convergence is, I think, very important, not least because it undercuts one of the most common criticisms of xRisk/longtermist work ("this assigns infinite value to future people which justifies arbitrary moral harm to current people"), which just turns out not to hold under your models here. Not going to hold my breath for these critics to update though.
Great work sharing the notebook <3 really love the transparency, I think something like this should become more standard (not just in EA, but everywhere) so wanted to give you big props for exposing your model/code/parameters for anyone to check.

So yeah, great work, love it! Would love to see and support more work along these lines.

^[1] The new acronym could be ATOM perhaps? ;)

arvomm

Thank you for all the comments JWS, I found your excitement contagious.

Some thoughts on your thoughts:

I couldn't agree more that there'd be a lot of value from laying out parameter configurations. We have some more work coming out as part of this sequence that aims to help fill this gap!
I think it'd be great to see some survey data on what the commonly assumed risk patterns and valued trajectories are in the EA community. I've made a push from my little corner to hopefully get some data on common views. Whichever they are, you're right to point out the immense differences in what they could imply.
I'm really happy that you found the notebook useful. I'll make sure to update the GitHub with any new features and code discussions.

Vasco Grilo🔸

Nice comments!

If you had to (and think it's appropriate to do so), what do you think the default assumptions of xRisk mitigation efforts in EA currently believe to be true?

My guess would be Time of Perils, but with a risk decaying exponentially to 0 after it (instead of a low constant risk).

Your finding on convergence is I think very important, not least because it undercuts one of the most common criticisms of xRisk/longtermist work "this assigns infinite value to future people which justifies arbitrary moral harm to current people" which just turns out to not hold under your models here.

Something similar to that critique (replacing infinite by astronomically large, and arbitrary by significant) could still hold if the risk decays to 0.

arvomm

It's true there are other scenarios that would recover infinite value. And the proof fails, as mentioned in the convergence section, with changes like , or when the logistic cap $c \to \infty$ and we end up in the exponential case.

All that said, it is plausible that the universe has a finite length after all, which would provide that finite upper bound. Heat death, proton decay or even just the amount of accessible matter could provide physical limits. It'd be great to see more discussions on this informed by updated astrophysical theories.

Vasco Grilo🔸

Thanks for following up!

Personally, I do not think allowing the risk to decay to 0 is problematic. For a sufficiently long timeframe, there will be evidential symmetry between the risk profiles of any 2 actions (e.g. maybe everything that is bound together will dissolve), so the expected value of mitigation will eventually reach 0. As a result, the expected cumulative value of mitigation always converges.

RomanHauksson

This is excellent research! The quality of Rethink Priorities’ output consistently impresses me.

A couple questions:

What software did you use to create figure 1?
What made you decide to use discrete periods in your model as opposed to a continuous risk probability distribution?

arvomm

Thank you very much Roman!

I used Blender to model and render the 3D spheres, and Photoshop for the text.
Discrete-time was inherited from the previous framework (OAT). It can be simpler, but continuous is sometimes more tractable and better suited for models emphasising other features. For example, when modelling economic growth directly, or when thinking about utility, or when we want to express a hazard rate that is micro-founded on some risk mechanism, those models would generally be better expressed in continuous time. This recent paper is a good example of the typical setups economics papers use in continuous time.

Siebe

I don't have the spare brain power to dig into this, but are you assuming that all possible trajectories have positive value?

arvomm

Hi Siebe, yes, all the scenarios of this report assume positive value at all times. I don’t think it’s certain that this will happen which is why the concluding remarks mention “investigating value trajectories that feature negative value” as a possible extension. So, yes, I completely agree this is something to look into in more depth.

Siebe

Right yeah, that makes sense.

I actually asked the same question as this research in my 2019 MA philosophy thesis and came to the informal conclusion that moral disagreement about what is valuable, combined with empirical uncertainty, makes it all very difficult: http://www.sieberozendal.com/wp-content/uploads/2020/01/Rozendal-S.T.-2019-Uncertainty-About-the-Expected-Moral-Value-of-the-Long-Term-Future.-MA-Thesis.pdf

You might find it interesting, though it's much less formally sophisticated than your work :)

Arepo

Great post - I'm embarrassed to have missed it til now! One key point I disagree with:

there might be interventions that reduce risk a lot for not very long or not very much but for a long time. But actions that drastically reduce risk and do so for a long time are rare.

I think there are two big possible exceptions to the latter claim: benign AI and becoming sustainably multiplanetary. EAs have discussed the former a lot, and I don't have much to add (though I'm highly sceptical of it as an arbitrary-value lock-in mechanism on cosmic timelines). I think the latter is more interestingly unexplored. Christopher Lankhof made a case for it here, but didn't get much engagement, and what criticism he did get seems quite short-term to me: basically that shelters are a cheaper option, and therefore we should prioritise them.

Such criticism might or might not be true in the next few decades. But beyond that, if AI neither kills us nor locks us in to a dystopic or utopic path, and if there are no lightcone-threatening technologies available (e.g. the potential ability to trigger a false vacuum decay), then it seems like by far our best defence against extinction will be simple numbers. The more intelligent life there is in the more places, the bigger and therefore more improbable an event would have to be to kill everyone.

A naive - but I think reasonable, given above caveats - calculation would be to treat the destruction of life around each planet as at least somewhat independent. That would give us some kind of exponential decay function of extinction risk, such that your credence in extinction might be a(1-b)^(p-1), where a is some constant or function representing the risk of a single-planet civilisation going extinct, b is some decay rate - of max(1/2) for total complete independence of extinction on each planet - and p is the number of planets in your civilisation. Absent universe-destroying mechanisms or unstoppable AI, this credence would quickly approach 0.
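To get a feel for how fast the credence a(1-b)^(p-1) falls with the number of planets, here is a small sketch of the formula exactly as stated above. The values a = 0.2 and b = 0.5 are purely illustrative assumptions (b = 0.5 being the full-independence maximum mentioned in the comment), and the function name is hypothetical.

```python
def extinction_credence(a: float, b: float, planets: int) -> float:
    """Credence in extinction a * (1 - b) ** (p - 1) for a civilisation on p planets.

    a: risk of a single-planet civilisation going extinct
    b: decay rate per additional planet (at most 0.5 in the comment's framing)
    """
    if planets < 1:
        raise ValueError("planets must be at least 1")
    return a * (1 - b) ** (planets - 1)

# Illustrative parameters only; neither value comes from the post.
A, B = 0.2, 0.5
for p in (1, 2, 5, 11):
    print(f"{p:>2} planets: credence in extinction = {extinction_credence(A, B, p):.7f}")
```

With weaker independence (a smaller b) the decline is slower, which is where correlated risks such as unstoppable AI or universe-destroying mechanisms would enter.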

Obviously 'creating a self-sustaining settlement on a new planet' isn't exactly an everyday occurrence, but with a century or two of continuous technological progress (less, given rapid economic acceleration via e.g. moderately benign AI) it seems likely to progress via 'doable' to 'actually pretty straightforward'. The same technologies that establish the first such colony will go a very long way towards establishing the next few.

In the shorter term, 'self-sustainingness' needn't be an all or nothing proposition. A colony that could e.g. effectively recycle its nutrients for a decade or two would still likely serve as a better defence against e.g. biopandemics than any refuge on Earth - and unlike those on Earth, would be constantly pressure tested even before the apocalypse, so might end up being easier to make reliably robust (vs on-Earth shelters) than simple cost-analyses would suggest.

arvomm

Thank you for adding various threads to the conversation Arepo! I don't disagree with what I take to be your main point: benign AI and interstellar travel are likely to have a big impact. I will say though, while their success might significantly reduce risk, and for a long time, any given intervention is unlikely to make major progress towards them. Hence, at the intervention level, I'm tempted to remain sceptical about the abundance of interventions that dramatically reduce risk for a long time.

Vasco Grilo🔸

Great post! Some nitpicks...

In the 2nd sum, t = 1 and 500 are out of format. Before the 4th sum, rlow should be r_{low}. In the 4th sum, 10100 should be 10^100, and should be on top of the summation symbol.

You say r_0 is the starting risk, but the above implies r(0) = r_0 + r_inf. So I think r_0 should be replaced by r_0 - r_inf above, such that r(0) = r_0. I do not think this is relevant because I guess r_0 >> r_inf, so r_0 - r_inf is roughly equal to r_0.

When discussing the value and eventually the cost effectiveness of risk mitigation, a useful and more realistic efficacy is one basis point

f refers to a relative reduction in risk (not absolute), so I think you mean 0.01 % above (not "one basis point"). 1 basis point refers to an absolute variation of 0.01 pp.
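The distinction can be made concrete with a quick calculation. In this sketch the baseline risk of 0.2 is a hypothetical value chosen only for illustration, and both function names are made up for the example.

```python
def apply_relative_reduction(risk: float, f: float) -> float:
    """New risk after a relative reduction f, i.e. risk * (1 - f)."""
    return risk * (1 - f)

def apply_absolute_reduction(risk: float, delta: float) -> float:
    """New risk after an absolute reduction of delta (0.0001 = 1 basis point)."""
    return risk - delta

BASELINE = 0.2  # hypothetical per-period risk, for illustration only

relative = apply_relative_reduction(BASELINE, 0.0001)  # f = 0.01 % relative
absolute = apply_absolute_reduction(BASELINE, 0.0001)  # 1 basis point absolute

print(f"relative 0.01 % reduction: {BASELINE} -> {relative:.5f}")
print(f"absolute 1 bp reduction:   {BASELINE} -> {absolute:.5f}")
```

At this baseline the absolute reduction removes five times as much risk as the relative one, and the two only coincide when the baseline risk is 1.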

arvomm

Thank you very much for your words Vasco! And thank you for catching those formatting typos, I've corrected them now.

In order:

Two underscores seemed to have got lost in translation to markdown! Should be there now.
You're right to point out that, in this context, $r(0) \approx r_{0}$ but it isn't exactly $r_{0}$. I was using that approximation for the exposition but should have made that clearer, especially in the code. I've made minor corrections to reflect this.
I'll also improve the phrasing to make the sentence you mentioned on $f = 0.0001$ clearer.

Thanks again!

JKM

Previous work has referred to such a risk as 'existential risk'. But this is a misnomer. Existential risk is technically broader and it encompasses another case: the risk of an event that drastically and permanently curtails the potential of humanity. For the rest of this report we characterise the risk as that of extinction where previous work has used 'existential'.

I was happy to see this endnote, but then I noticed several uses of "existential risk" in this abridged report when I think you should have said "extinction risk". I'd recommend going through to check this.

arvomm

It's good to hear that you agree extinction is the better term in this framework. Though I think it makes sense to talk about the more general 'existential' term in the exposition sometimes. In particular, for entirely pedagogical reasons, I decided to leave it with the original terminology in the summary since readers who are already familiar with the original models might skim this post or miss that endnote, and the definition of risk hasn't changed. I see this report, and the footnote, as asking researchers that, from here on, we use extinction when the maths are set up like they are here. All that said, I've indeed noticed instances after the summary where the conceptual accuracy would be improved by making that swap. Thank you again; I'll keep a closer eye on this, especially in future revised versions of the full report.

Vasco Grilo🔸

Hi Arvo,

Proposition 1. The expected value of the world is finite if existential risk does not converge to zero.

I just wanted to note the overall expected value of the world may be driven by cases in which existential risk converges to 0, because the future should be discounted at its minimum. I also have the impression supporters of existential risk mitigation find the convergence of existential risk to 0 quite plausible. In any case, I think there will still be convergence of the value of mitigation. After a sufficiently long time, the counterfactual value of mitigation will be 0 due to evidential symmetry, so the sum describing the value of mitigation will end in ... 0 + 0 + 0 + 0 ..., thus converging.

^{^}

Previous work has referred to such a risk as 'existential risk'. But this is a misnomer. Existential risk is technically broader and it encompasses another case: the risk of an event that drastically and permanently curtails the potential of humanity. For the rest of this report we characterise the risk as that of extinction where previous work has used 'existential'.

^{^}

The reasoning goes that if there is always a high level of background risk to humanity, then we should expect to go extinct soon anyway, which means the importance of avoiding any one particular risk is not as valuable as it may seem. For more details see the full report here.

^{^}

In particular, Thorstad explores how, in this model, extinction risk pessimism fails to support and sometimes hinders the thesis that extinction risk mitigation is of astronomical value.

^{^}

For example, Thorstad relaxes each of the A1, A4 and A5 assumptions.

^{^}

The models thus far centred around mitigating risk for one century only. Thorstad comments on one additional case: when risk is permanently mitigated, calling it 'global risk reduction'.

^{^}

We leave A4 untouched because it introduces diminishing returns in risk reduction (see the details Adamczewski discusses), which we find realistic.

^{^}

A3 is a core assumption in the extended and simplified versions of this model. Relaxing it would amount to changing the approach completely.

^{^}

That said, the risk and value trajectories usually need adjusting when considering a different time unit. For more details see the section on adjustments in the full report here.

^{^}

In its most general form, $r^{'}$ could be any new risk vector that $M$ has brought about. All there is left to evaluate the value of the action is to compute $E (w^{'}) - E (w)$ .

^{^}

Alternatively, an altruistic intervention could seek to improve the future by positively influencing the value trajectory; that is, by bringing about a better $v^{'}$ rather than a new $r^{'}$. Such actions deserve a separate analysis.

^{^}

So far we have been writing $E (w)$ to abbreviate $E (w (r, v, T))$, where $r, v$ and $T$ are, respectively, the risk vector (sometimes termed 'risk profile'), the value vector and the maximum number of periods in our universe, which could be infinite. Note that a different class of interventions might focus on increasing the value of the world from $v = (v_{1}, v_{2}, \ldots)$ to $v^{'} = (v^{'}_{1}, v^{'}_{2}, \ldots)$, which would also result in negative value according to $E (w) - E (w^{'})$. Exploring these is not within the scope of this report.

^{^}

Here: $v_{t}$ is the value at time $t$ , $c$ is the cap value the $v_{t}$ can reach and $s$ is the starting value at $t = 0$ . $v_{c}$ is a constant, normalised to 1 in all the simulations. More generally, we interpret $v_{c}$ as one year of value in $2023$ , which in human terms is roughly $8$ billion people enjoying life at an average of $0.85$ QALYs each.

^{^}

Other work has considered exponential without a cap. There seem to be good reasons to posit a cap, however high, like the physical limits on how much matter is accessible to humans in our expanding universe.

^{^}

The probability of dying each year that gives a 0.2 probability of dying over 100 years is approximately 0.00222894771 or 0.22%. To see why, consider the following binary outcomes model. Let $p$ be the probability of dying in a given year. The implied probability of surviving for one year is $1 - p$. The probability of surviving for 100 years consecutively would be $(1 - p)^{100}$. Given that there's a 0.2 probability of dying over 100 years, the probability of surviving the entire 100 years is $1 - 0.2 = 0.8$. Thus, $(1 - p)^{100} = 0.8$, and solving for $p$ gives $p = 1 - 0.8^{1/100} \approx 0.00223$.

^{^}

Which is congruent with a $(1 - 0.0001)^{100} \approx 0.99004933869$ probability of surviving each century.
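Both conversions in the two footnotes above follow from the same identity, (1 - p)^n = survival over n periods. A minimal sketch, assuming a constant and independent per-year risk; the function names are illustrative rather than taken from the notebook.

```python
def annual_risk_from_period_risk(period_risk: float, years: int) -> float:
    """Constant annual risk p that solves (1 - p) ** years = 1 - period_risk."""
    return 1 - (1 - period_risk) ** (1 / years)

def period_survival_from_annual_risk(annual_risk: float, years: int) -> float:
    """Probability of surviving `years` consecutive years at a constant annual risk."""
    return (1 - annual_risk) ** years

# A 0.2 risk per century expressed as an annual risk.
print(annual_risk_from_period_risk(0.2, 100))

# An annual risk of 0.0001 expressed as a per-century survival probability.
print(period_survival_from_annual_risk(0.0001, 100))
```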

^{^}

Numerical approximations of the expected value of $M$ converge in this setting for large $T$ so an infinite universe could be thought of as finite, without loss of generality. See the Convergence section for a discussion of convergence.

^{^}

An excellent informal introduction to great filters can be found here.

^{^}

Tentatively, ordering infinite cardinalities could be a good option in those cases.

^{^}

For example by $T \cdot \max_{t} \{v_{1}, v_{2}, \ldots, v_{T}\}$.

^{^}

On the latter point, calculating the actual difference that our efforts make to the effects of persistence will require future work. For example, imagine you do an action, $M$, at $t = 1$ that mitigates risk for the next 10 years. If you hadn't done $M$, someone else would have taken that same action at $t = 5$. How should we measure the persistence and value of $M$ in this case? The treatment of 'contingency' here can help guide our thoughts.

^{^}

Because of computational limits, the expected value calculation assumes a cap of 120 thousand years. This is more than long enough in most scenarios, where a $T$ this large achieves the same behaviour as $T \to \infty$ , but nuances arise in the exponential decay case, see the notebook for a thorough discussion of those.
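A rough way to see why a horizon of this size is usually enough is sketched below. It assumes the simplest case only: a constant value of 1 per period, a constant per-period risk equal to the default r_low = 0.0001, and expected value taken as the survival-weighted sum of per-period values. These are simplifying assumptions for illustration, not the notebook's actual scenarios, and the function names are hypothetical.

```python
import math

def truncated_value(risk: float, horizon: int, value_per_period: float = 1.0) -> float:
    """Sum of value_per_period * P(alive at t) for t = 1..horizon at constant risk."""
    survival, total = 1.0, 0.0
    for _ in range(horizon):
        survival *= 1 - risk            # probability of still being alive this period
        total += value_per_period * survival
    return total

def captured_fraction(risk: float, horizon: int) -> float:
    """Share of the infinite-horizon sum captured by the first `horizon` terms.

    For a geometric series this is 1 - (1 - risk) ** horizon.
    """
    return -math.expm1(horizon * math.log1p(-risk))

R_LOW, HORIZON = 0.0001, 120_000
limit = (1 - R_LOW) / R_LOW             # closed form of the infinite sum
print(truncated_value(R_LOW, HORIZON) / limit)
print(captured_fraction(R_LOW, HORIZON))
```

Under these assumptions the truncated sum misses only a few millionths of the limit. With a much smaller floor risk, or risk that decays towards 0, the same horizon would capture far less, which matches the caveat about the exponential decay case.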

^{^}

Recall the previous footnote defining $v_{c}$ .

^{^}

In particular, Figure 1's exponential decay values were approximated using the first 100,000 years.

^{^}

I'm happy to help with this.

	Constant	Linear	Quadratic	Cubic	Logistic
$v_{t}$	$v_{c}$	$t v_{c}$	$t^{2} v_{c}$	$t^{3} v_{c}$	$\frac{c}{1 + \frac{c - s}{s} e^{- \gamma t}}$
