
Preface: I'm quite uncertain about this model. It may be making too many simplifying assumptions to be useful, and I may not be applying Laplace's Law of Succession correctly. I haven't seen this approach anywhere, but I'm not confident it hasn't been done before. I'm really interested in any comments or feedback anyone may have!

Acknowledgements: Thanks to Alex HT and Will Payne for interesting discussion, and thanks to Aimee Watts for proof-reading. All mistakes my own.

Summary

  • Humans seem likely to make much more technological progress in the next century than has been made previously.
  • Under a model based on Bostrom's urn of creativity, having not discovered a devastating technology so far does not give much reason to think we won't in the future.
  • This model serves as a way of deriving a prior for technological existential risk, and there might be good reasons to lower the expected risk.
  • This model may be too naïve an application of the urn of creativity, or of Laplace's model, and may be flawed.

The Urn of Creativity Model

In the Vulnerable World Hypothesis, Bostrom describes technological progress as drawing balls blindly out of an urn. Some are white balls representing technologies that are relatively harmless, and some are black, representing technologies that upon discovery lead to human extinction.

So far humanity hasn't drawn a black ball.

If we think existential risk in the next century mostly comes from dangerous new technologies, then we can think of estimating existential risk in the next century as estimating the chance we draw a black ball.

Laplace's Law of Succession

Suppose we have no information about the ratio of black to white balls in the urn except our record so far. A mathematical rule called Laplace's Law of Succession then says that the best estimate of the chance of drawing a black ball on the next go is $\frac{s+1}{n+2}$, where n is the number of balls we've drawn so far, and s is the number of black balls we've drawn so far. So for us, s=0.

How do we calculate the probability of at least one black ball in the next m draws? We can't just multiply the single-draw probability by itself, because every time we don't draw a black ball, we should revise down our probability of drawing black. The probability of drawing at least one black ball in the next m draws, having drawn s black balls in the last n, is

$1 - \frac{(n-s+1)(n-s+2)\cdots(n-s+m)}{(n+2)(n+3)\cdots(n+m+1)}$

So assuming no black draws so far (s=0), this simplifies to $1 - \frac{n+1}{n+m+1} = \frac{m}{n+m+1}$.
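As a quick sanity check, here is a minimal Python sketch of this calculation (the function name and interface are my own illustrative choices, not from the original post); the sections below just call it with different choices of n and m:

```python
def prob_black(n, m, s=0):
    """Probability of drawing at least one black ball in the next m draws,
    given s black balls in the first n draws, under a uniform prior over
    the fraction of black balls in the urn (Laplace's rule)."""
    if s == 0:
        # Closed form when no black ball has been drawn so far: m / (n + m + 1).
        return m / (n + m + 1)
    # General case: the chance of zero black balls in the next m draws is the
    # product of (n - s + 1 + i) / (n + 2 + i) for i = 0, ..., m - 1.
    p_none = 1.0
    for i in range(int(m)):
        p_none *= (n - s + 1 + i) / (n + 2 + i)
    return 1 - p_none
```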

 

Applying the model

How do we choose n? We haven't literally been drawing balls from an urn, but we can try to approximate a 'draw' as the amount of time or resources typically needed to discover a new technology. This is of course also hard to estimate and highly variable, but I don't think the final answer is too sensitive to how we break time or resources into units.

Time

First, we can approximate draws using time alone. Suppose humans have been discovering new technology at a rate of roughly one 'draw' per year since the agricultural revolution, so we've drawn ~10,000 times. Then n=10000 and m=100.

We get $\frac{m}{n+m+1} = \frac{100}{10101} \approx 0.0099$.

So about a 1% chance of extinction in the next century.

But we haven't been discovering technology at a constant rate for the last 10,000 years. Suppose instead that the vast majority of technology was discovered in the last ~250 years, so n=250. Then we get

$\frac{m}{n+m+1} = \frac{100}{351} \approx 0.284$

So a 28.4% chance of extinction in the next 100 years.
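Using the prob_black sketch from above, these two time-based scenarios come out as:

```python
print(prob_black(n=10_000, m=100))  # ~0.0099, i.e. about 1%
print(prob_black(n=250, m=100))     # ~0.284, i.e. about 28.4%
```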

Person-years

However, even over the last 250 years the rate of progress has been increasing a lot, and so has the population! What if we used "person-years"? I.e., one draw from the urn is one year of life lived by a single person. Then we can use historical population estimates and future projections. The total number of person-years lived since 1750 is ~6.2 x 10^11, and the total number of person-years we can expect in the next century is ~1 x 10^12 [1]. So n=6.2 x 10^11 and m=10^12.

Then we get

$\frac{m}{n+m+1} = \frac{10^{12}}{1.62 \times 10^{12}} \approx 0.62$

So a 62% chance of extinction.
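With the same assumed prob_black helper, the person-years scenario is:

```python
print(prob_black(n=6.2e11, m=1e12))  # ~0.617
```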

GDP

We could also consider using GDP instead. Suppose one draw from the urn is equivalent to $1 billion of GDP (though this doesn't matter too much for the final answer). Then there has been $3.94 x 10^15 of GDP since 1750, and if we assume 2% annual growth we can expect ~$3.76 x 10^16 over the next century. So n=3.94 x 10^6 and m=3.76 x 10^7.

Then we get

$\frac{m}{n+m+1} = \frac{3.76 \times 10^{7}}{4.15 \times 10^{7}} \approx 0.90$

So a 90% chance of extinction in the next century. 
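And for the GDP-based numbers (again using the assumed helper, with one draw per $1 billion of GDP):

```python
print(prob_black(n=3.94e6, m=3.76e7))  # ~0.905
```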

Uncertainties

There are quite a few things I'm uncertain about in this approach.

  • How does the model change if we update on inside view information?
    • We don't just have the information that previous technology wasn't disastrous; we also know what that technology was, and have some sense of how hard it is for a technology to cause devastation. This seems to be extra information not included in the model. Laplace's law of succession assumes a uniform prior over the possible densities of black balls in the urn, and I'm not sure how the final answers would change if this prior changed (the sketch after this list shows one example).
  • Am I correct in thinking the final answer is not too sensitive to the choice of a unit of a draw?
    • From just experimenting with different "units", e.g. one draw per $1 billion of GDP, the final answer doesn't seem too sensitive to the choice of unit. However, I haven't shown mathematically why this is the case (the sketch after this list includes a quick numerical check).
  • Is the urn of creativity an over-simplification to the extent that this model is irrelevant?
    • We might, for example, expect that the chance of a technology being a black ball will increase over time as technology becomes more powerful. This might be analogous to the black balls being further down the urn. I'm also unsure how to incorporate this into the model, and whether it would lead us to think the risk is higher or lower than calculated above. On the one hand, it seems to clearly increase the risk if we think black balls will only become more likely to be drawn in the future. But on the other hand, if we would naturally not expect many black balls earlier in human history, then we can't infer much from not having found any.
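As a rough illustration of the last two uncertainties, here is a small sketch (my own construction, not from the original post). For s=0 the formula reduces to m/(n+m+1), which for large n depends essentially only on the ratio m/n, and that ratio doesn't change when draws are measured in different units; the code checks this numerically, and also shows how the answer moves if the uniform prior is replaced by a more general Beta(a, b) prior:

```python
from math import lgamma, exp

def prob_black_beta(n, m, a=1.0, b=1.0):
    """P(at least one black ball in the next m draws | 0 black in the first n),
    under a Beta(a, b) prior on the fraction of black balls in the urn.
    a = b = 1 recovers the uniform prior assumed by Laplace's rule."""
    # The posterior after 0 black balls in n draws is Beta(a, b + n); the chance
    # of zero black balls in m further draws is B(a, b + n + m) / B(a, b + n),
    # computed in log space via log-gamma.
    log_p_none = (lgamma(b + n + m) - lgamma(a + b + n + m)
                  - lgamma(b + n) + lgamma(a + b + n))
    return 1 - exp(log_p_none)

# (a) Unit sensitivity: the 250-year scenario measured in years vs. months.
print(prob_black_beta(250, 100))     # ~0.285 (one draw per year)
print(prob_black_beta(3000, 1200))   # ~0.286 (one draw per month): barely moves

# (b) Prior sensitivity: a prior concentrated on "very few black balls",
# e.g. Beta(0.1, 10), gives a much lower estimate than the uniform prior.
print(prob_black_beta(250, 100, a=0.1, b=10))  # ~0.03 instead of ~0.285
```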

 

Conclusion

Overall, I think this model shouldn't be taken too literally. But I think the following takeaway is interesting. Given how much technological advancement there is likely to be in the future compared to the past, we cannot be that confident that future technologies will not pose significant risks based only on the lack of harm done so far. 

I'm really interested in any feedback or critiques of this approach, and whether it's already been discussed somewhere else!

 

[1] See this spreadsheet for my data calculations; data obtained from Our World in Data.

Comments



So you'd in general be correct in applying Laplace's law to this kind of scenario, except that you run into selection effects (a keyword to Google is anthropic effect, or anthropic principle). I.e., suppose that the chance of human extinction was actually much higher, on the order of 10% per year. Then, after 250 years, Earth will probably not have any humans; but if it does, and the survivors use Laplace's rule to estimate their chances, they will overshoot them by a lot. That is, they can't actually update on extinction happening, because if it happens nobody will be there to update.

There is a magic trick where I give you a deck of cards, tell you to shuffle it, and choose a card however you want, and then I guess it correctly. Most of the time it doesn't work, but on the 1/52 chance that it does, it looks really impressive (or so I'm told, I didn't have the patience to do it enough times). There is also a scam based on a similar principle.

On the other hand, Laplace's law is empirically really quite brutal, and in my experience tends to output probabilities that are too high. In particular, I'd assign some chance to there being no black balls, and that would eventually bring my probability of extinction close to 0, whereas Laplace's law always predicts that an event will happen if given enough time (even if it has never happened before).

Overall, I guess I'd be more interested in trying to figure out the pathways to extinction and their probabilities. For technologies which already exist, that might involve looking at close calls, e.g., nuclear close calls.

Thanks for your comment!

I hadn't thought about selection effects, thanks for pointing that out. I suppose Bostrom actually describes black balls as technologies that cause catastrophe, but doesn't set the bar as high as extinction. In that case, drawing a black ball doesn't drastically affect future populations, so perhaps selection effects don't apply?

Also, I think in The Precipice Toby Ord makes some inferences about natural extinction risk from the length of time humanity has existed? Though I may not be remembering correctly. I think the logic was something like: "Assume we're randomly distributed amongst possible humans. If existential risk were very high, then there'd be a very small set of worlds in which humans have been around for this long, and it would be very unlikely that we'd be in such a world. Therefore it's more likely that our estimate of existential risk is too high." This seems quite similar to my model of making inferences based on not having previously drawn a black ball. I don't think I understand selection effects too well though, so I appreciate any comments on this!
