
People should stop falling back on the argument that working on AI safety research is a 'Pascal's Mugging', because it doesn't address the actual reasons people in the field think the work is worth doing today.

Most people who work on AI safety think the chances of affecting the outcome are not infinitesimal but entirely macroscopic, in the same way that voting in an election has a low but real chance of changing the result, an extra researcher has a low but real chance of bringing forward a cure for malaria, and an extra person working on Ebola containment makes a pandemic less likely.

For example someone involved might believe:

i) There's a 10% chance of humanity creating a 'superintelligence' within the next 100 years.

ii) There's a 30% chance that the problem can be solved if we work on it harder and earlier.

iii) A research team of five suitable people starting work on safety today and continuing through their working lives would raise the odds of solving the problem by 1% of that (0.3 percentage points). (This passes a sanity check, as they would represent a 20% increase in the effort being made today.)

iv) Collectively they therefore have a 0.03% chance of making an AI significantly more aligned with human values in the next 100 years, such that each individual involved has a 0.006 percentage point share.
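To make the arithmetic in i)–iv) easy to check, here is a minimal sketch of the calculation, using only the hypothetical numbers stated above (the variable names are mine, purely for illustration):

```python
# Illustrative expected-value arithmetic from points i)-iv) above.

p_superintelligence = 0.10  # i) chance of superintelligence within 100 years
p_solvable = 0.30           # ii) chance the problem is solvable with harder, earlier work
team_share = 0.01           # iii) a 5-person team raises the solve odds by 1% of that
team_size = 5

# iii) the team's absolute increase in the odds of solving the problem
team_delta = p_solvable * team_share                # 0.003, i.e. 0.3 percentage points

# iv) chance that superintelligence arrives AND the team's marginal
# contribution is what makes the difference
p_team_impact = p_superintelligence * team_delta    # 0.0003, i.e. 0.03%

# each individual's share
p_individual = p_team_impact / team_size            # 0.00006, i.e. 0.006 pp

print(f"Team: {p_team_impact:.2%}, per person: {p_individual:.3%}")
```

Nothing here depends on an astronomically large payoff; the probabilities multiply out to small but entirely macroscopic numbers.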

Note that the case presented here has nothing to do with there being some enormous and arbitrary value available if you succeed, which is central to the weirdness of the Pascal's Mugging case.

Do you think the numbers in this calculation are way over-optimistic? OK - that's completely reasonable!

Do you think we can't predict whether the sign of the work we do now is positive or negative? Is it better to wait and work on the problem later? There are strong arguments for that as well!

But those are the arguments that should be made and substantiated with evidence and analysis, not quick dismissals that people are falling for a 'Pascal's Mugging', which they mostly are not.

Given the beliefs of this person, this is no more a Pascal's Mugging than working on any basic science research, or campaigning for an outsider political campaign, or trying to reform a political institution. These all have unknown but probably very low chances of making a breakthrough, but could nevertheless be completely reasonable things to try to do.

Here's a similar thing I wrote years ago: If elections aren't a Pascal's Mugging, existential risk work shouldn't be either.

Postscript

As far as I can see all of these are open possibilities:

1) Solving the AI safety problem will turn out to be unnecessary, and our fears today are founded on misunderstandings about the problem.

2) Solving the AI safety problem will turn out to be relatively straightforward on the timeline available.

3) It will be a close call whether we manage to solve it in time - it will depend on how hard we work and when we start.

4) Solving the AI safety problem is almost impossible and we would have to be extremely lucky to do so before creating a super-intelligent machine. We are therefore probably screwed.

We collectively haven't put enough focussed work into the problem yet to have a good idea where we stand. But that's hardly a compelling reason to assume 1), 2) or 4) and not work on it now.

Comments (16)



It might not be a strong response to the whole cause area, but isn't it the only response to the Bostrom-style arguments linked below? Which in my experience covers the majority of the arguments I hear in favour of x-risk.

Very few one line arguments are strong responses to whole world views that smart people actually believe, so I sort of feel like there's nothing to see here.

I asked Bostrom about this and he said he never even made this argument in this way to the journalist. Given my experience of the media misrepresenting everything you say and wanting to put sexy ideas into their pieces, I believe him.

The New Yorker writer got it straight out of this paper of Bostrom's (paragraph starting "Even if we use the most conservative of these estimates"). I've seen a couple of people report that Bostrom made a similar argument at EA Global.

Look, no doubt the argument has been made by people in the past, including Bostrom who wrote it up for consideration as a counterargument. I do think the 'astronomical waste' argument should be considered, and it's far from obvious that 'this is a Pascal's Mugging' is enough to overcome its strength.

But it's also not the main, only, or best reason on which many people who work on these problems ground their choice to do so.

So if you dismiss this argument, then before you dismiss the work itself, move on and look at what you think is the strongest argument, not the weakest.

I actually think there's an appropriate sense in which it is the strongest argument -- not in that it's the most robust, but in that it has the strongest implications. I think this is why it gets brought up (and that it's appropriate to do so).

Agreed - despite being counterintuitive, it's not obviously a flawed argument.

If I were debating you on the topic, it would be wrong to say that you think it's a Pascal's mugging. But I read your post as being a commentary on the broader public debate over AI risk research, trying to shift it away from "tiny probability of gigantic benefit" in the way that you (and others) have tried to shift perceptions of EA as a whole or the focus of 80k. And in that broader debate, Bostrom gets cited repeatedly as the respectable, mainstream academic who puts the subject on a solid intellectual footing.

(This is in contrast to MIRI, which as SIAI was utterly woeful and which in its current incarnation still didn't look like a research institute worthy of the name when I last checked in during the great Tumblr debate of 2014; maybe they're better now, I don't know.)

In that context, you'll have to keep politely telling people that you think the case is stronger than the position your most prominent academic supporter argues from, because the "Pascal's mugging" thing isn't going to disappear from the public debate.

I have no opinion on what Bostrom did or didn't say, to be clear. I've never even spoken to him. Which is why I said 'Bostrom-style'. But I have heard this argument, in person, from many of the AI risk advocates I've spoken to.

Look, any group in any area can present a primary argument X, be met by (narrow) counterargument Y, and then say 'but Y doesn't answer our other arguments A, B, C!'. I can understand why that sequence might be frustrating if you believe A, B, C and don't personally put much weight on X, but I just feel like that's not an interesting interaction.

It seems like Rob is arguing against people using Y (the Pascal's Mugging analogy) as a general argument against working on AI safety, rather than as a narrow response to X.

Presumably we can all agree with him on that. But I'm just not sure I've seen people do this. Rob, I guess you have?

I get what you're saying, but, e.g., in the recent profile of Nick Bostrom in the New Yorker:

No matter how improbable extinction may be, Bostrom argues, its consequences are near-infinitely bad; thus, even the tiniest step toward reducing the chance that it will happen is near-­infinitely valuable. At times, he uses arithmetical sketches to illustrate this point. Imagining one of his utopian scenarios—trillions of digital minds thriving across the cosmos—he reasons that, if there is even a one-per-cent chance of this happening, the expected value of reducing an existential threat by a billionth of a billionth of one per cent would be worth a hundred billion times the value of a billion present-day lives. Put more simply: he believes that his work could dwarf the moral importance of anything else.

While the most prominent advocate in the respectable-academic part of that side of the debate is making Pascal-like arguments, there's going to be some pushback about Pascal's mugging.

I've also seen Eliezer (the person who came up with the term Pascal's mugging) give talks where he explicitly disavows this argument.

Three things:

i) I bet Bostrom thinks the odds of a collective AI safety effort achieving its goal are better than 1%, which would itself be enough to avoid the Pascal's Mugging situation.

ii) This is a fallback position from which you can defend the work if someone thinks it almost certainly won't work. I don't think we should do that, instead we should argue that we can likely solve the problem. But I see the temptation.

iii) I don't think it's clear you should always reject a Pascal's Mugging (or if you should, it may only be because there are more promising options for enormous returns than giving it to the mugger).

Yes! Thank you for this. Pascal's Muggings have to posit paranormal/supernatural mechanisms to work. But x-risk isn't like that. Big difference which people seem to overlook. And Pascal's Muggings involve many orders of magnitude smaller chances than even the most pessimistic x-risk outlooks.

I agree with your second point but not your first. Also it's possible you mean "optimistic" in your second point: if x-risks themselves are very small, that's one way for the change in probability as a result of our actions to be very small.

I mean pessimism about the importance of x-risk research, which is more or less equivalent to optimism about the future of humanity. Similar idea.

Btw., this article series of yours convinced me of the importance of AI safety work. Thank you and good work!
