- In his recent 80k podcast, Will argued that we don't have a decision theory that solves the Pascal's Mugging problem
- In a recent post from Greg Lewis, I added a comment on how I think about Pascal's Muggings
- Can someone explain why my comment doesn't count as a decision theory that solves the Pascal's Mugging problem?
I haven't spent a ton of time thinking about this, so I'm expecting someone will be able to easily clarify something that I don't know.
1. Will believes we don't have a decision theory to resolve the Pascal's Mugging issue
Here's the quote from the 80k podcast transcript:
Will MacAskill: Where I think, very intuitively, I can produce some guarantee of good — like saving a life, or a one in a trillion trillion trillion trillion trillion trillion trillion chance of producing a sufficiently large amount of value. That expected value — that is, the amount of value multiplied by the probability of achieving it — is even greater than the expected value of saving one life.
Will MacAskill: So what do we do? I’m like, “Eh, it really doesn’t seem you should do the incredibly-low-probability thing.” That seems very intuitive. And then the question is, can you design a decision theory that avoids that implication without having some other incredibly counterintuitive implications? It turns out the answer is no, actually.
Rob Wiblin: Right.
Will MacAskill: Now, this isn’t to say that you should do the thing that involves going after tiny probabilities of enormous amounts of value. It’s just within the state where we can formally prove that there’s a paradox here — that, I’m sorry, the philosophers have failed you: we have no good answer about what to do here.
My resolution is that I think you should do the extremely-low-probability thing, as long as it really is true that it has the one in a trillion ... trillion chance of producing the huge amount of value.
The problem is that, in practice, you would almost certainly never be confident that this is the case, as opposed to, e.g., the proposed action not really making the huge-value outcome any more likely at all, or the other outcomes (the one minus one in a trillion ... trillion chance) actually dominating.
2. I recently set out a heuristic that makes sense to me
Here's a link to the comment, and I copy and paste almost all of it below.
What I wrote below appears to provide a useful framework for thinking about Pascal's Mugging situations.
The Reversal/Inconsistency Test:
If the logic seems to lead to taking action X, and seems to equally validly lead to taking an action inconsistent with X, then I treat it as a Pascal's Mugging.
Examples:
- Original Pascal's Mugging:
- The original Pascal's Mugging suggests you should give the mugger your 10 livres in the hope that you get the promised 10 quadrillion Utils.
- The test: It seems equally valid that there's an "anti-mugger" out there who is thinking "if Pascal refuses to give the mugger the 10 livres, then I will grant him 100 quadrillion Utils". There is no reason to privilege the mugger who is talking to you, and ignore the anti-mugger whom you can't see.
- Conclusion: fails the Reversal/Inconsistency Test, so treat as a Pascal's Mugging and ignore.
- Extremely unlikely s-risk example:
- I claim that the fart goblins of Smiggledorf will appear on the winter solstice of the year 2027 and magically keep everyone alive for 1 googolplex years, but subject them to constant suffering from having to smell the worst farts you've ever imagined. The smells are so bad that the suffering each person experiences in one minute is equivalent to 1 million lifetimes of suffering.
- The only way to avoid this horrific outcome is to earn as much money as you can, and donate 90% of your income to a very nice guy with the EA Forum username "sanjay".
- The test: Is there any reason to believe that donating all this money will make the fart goblins less likely to appear, as opposed to more?
- Conclusion: fails the Reversal/Inconsistency Test, so treat as a Pascal's Mugging and ignore.
- Extremely likely x-risk example:
- In the distant land of Utopi-doogle, everyone has a wonderful, beautiful life, except for one lady called Cassie who runs around anxiously making predictions. Her first prediction is incredibly specific and falsifiable, and turns out to be correct. The same goes for her second, and her third; after 100 highly specific, falsifiable and incredibly varied predictions with a 100% success rate, she then predicts that Utopi-doogle will likely explode, killing everyone.
- The only way to save Utopi-doogle is for every able-bodied adult to stamp their foot while saying Abracadabra. Unfortunately, you have to get the correct foot -- if some people are stamping their right foot and some are stamping their left foot, it won't work. If everyone is stamping their left foot, this will either mean that Utopi-doogle is saved (if left is the correct foot), or that Utopi-doogle will be instantly destroyed (if it isn't).
- A politician sets up a Left Foot movement arguing that we should try to save Utopi-doogle by arranging a simultaneous left foot stamp.
- The test: The simultaneous left foot stamp has an equal chance of causing doom as of saving civilisation.
- Conclusion: fails the Reversal/Inconsistency Test, so treat the politician's suggestion as a Pascal's Mugging and ignore.
- Note, interestingly, that other actions -- such as further research -- are not necessarily a Pascal's Mugging. (Could we ask Cassie about simultaneous stamping of the right foot?)
- How some people perceive AI safety risk:
- Let's assume that, despite recent impressive successes by AI capabilities researchers, human-level AGI has a low (10^-12) chance of happening in the next 200 years
- Let's also concede that, if such AGI arose, humanity would have a <50% chance of survival unless we had solved alignment.
- Let's continue being charitable to the importance of AI safety and assume that, as long as we haven't wiped ourselves out before then, humanity will reach a state of utopia in just over 200 years which lasts for millennia. This implies that extinction in the next 200 years would mean 10^20 lives lost.
- The raw maths seems to suggest that work on AI safety is high impact (a rough version of the calculation is sketched just after this list).
- The test: If we really are that far from AGI, can any work we do really help? Are we sure that any AI safety research we do now will actually make safe AI more likely and not less likely? There are myriad ways we could make things worse: e.g. we could inadvertently further capabilities research; the research field could be path-dependent, and our early mistakes could damage the field more than just leaving it be until we understand it better; we might realise that we need to include some ethical thinking, but incorporate the ethics of 2022 and later realise the ethics of 2022 was flawed; etc.
- Conclusion: fails the Reversal/Inconsistency Test, so treat as a Pascal's Mugging and ignore.
- Note that in this example the AGI scenario is indeed highly unlikely, but the important thing is not that it's unlikely; it's that it's unactionable.
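Here is a minimal sketch of "the raw maths" in the AI safety example above, using the numbers from the bullets; the effect size of safety work is a purely illustrative assumption of mine, not part of the example.

```python
# Rough numbers behind "the raw maths" in the AI safety example above.
# The first three figures come from the bullets; the assumed effect of safety
# work is illustrative only.

p_agi_within_200y = 1e-12       # assumed chance of human-level AGI in the next 200 years
p_doom_given_agi = 0.5          # conceded chance of extinction if unaligned AGI arrives
lives_lost_if_extinct = 1e20    # future lives lost if we go extinct in the next 200 years

expected_lives_at_stake = p_agi_within_200y * p_doom_given_agi * lives_lost_if_extinct
print(f"{expected_lives_at_stake:.0e}")  # 5e+07, i.e. ~50 million lives in expectation

# Even a tiny (hypothetical) relative reduction in this risk looks big in raw EV terms:
assumed_risk_reduction = 1e-3
print(f"{expected_lives_at_stake * assumed_risk_reduction:.0e}")  # 5e+04 lives in expectation
```

The point of the test above is that this raw number tells you nothing about the sign of the effect, which is exactly what is in question.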
3. Can someone explain why my comment doesn't count as a decision theory that solves the Pascal's Mugging problem?
I'm open to the possibility that there are subtleties about exactly what is meant by a decision theory that I'm not aware of. Or perhaps there are reasons to believe that the heuristic I set out above is flawed (I haven't actually used it in anger much).
Generally I don't think your Reversal/Inconsistency Test is applicable.
If someone with a gun says they will kill me if I don't give them what is in my wallet, I should probably give them what is in my wallet. This is because I (rationally in my opinion) think that this will reduce the chances of them killing me. Of course they might actually be planning to kill me only if I give them the wallet, but given what I know about the world this possibility seems less likely than the alternative. So the best EV calculation I can do tells me to give the money.
I think the above argument also applies to the original Pascal's mugging.
Similarly, whilst both are possibilities, it just seems more likely that AI safety research will make AI more safe than less safe. Generally, from what we know about the world, if people work towards increasing a particular variable they are more likely to increase it than decrease it (all other things equal).
I don’t think your realistic-mugging example is a strong counter to the underlying idea of the response: of course in a realistic mugging it can make sense to hand over your wallet, given that it is widely accepted as plausible that the person will shoot you, and/or that there is no comparably plausible case to be made for the opposite outcome (that you get shot if and only if you hand over the wallet).
I will grant that some counter-possibilities I and others have given to Pascal’s mugging scenarios are not always clearly more plausible (e.g., “if you give your wallet to this crazy person wearing a tin colander and brass gloves, you might not have any money to give to the REAL world-destroying mugger later on”, “there’s a chance the crazy-looking person is testing humanity and will destroy the world if and only if I give him the money”). However, this misses a broader insight I (and maybe also Sanjay?) emphasize, which is that there are many ways in which doing this could instrumentally undermine your pursuit of more-plausible paths to Unfathomably Huge Impact, such as by costing you money/resources, undermining your social/intellectual reputation, making you feel really dumb afterwards, wasting your time and cognitive energy, etc. Given your constraints on time and cognition, you may not be able to identify any specific reason to offset the mugger’s threat, but that’s fine.
Sure there are costs to giving the wallet away, but if you're an EV-maximiser you probably still should.
You could tell the mugger that you don't want to give him the wallet because you'd feel like an idiot if you did, and that you've got an amazing alternative way to do good with the money. He could then just promise you that he'll create way more utility than originally promised until it makes up for any such costs or missed opportunities (in EV terms).
First, just to clarify, are you still pushing the realistic-mugger objection as a demonstration of your argument? That (combined with the part about AI safety, which I address later) seemed like the bulk of your original argument.
Second, you seem to bring up a separate idea:
Okay, so I immediately further discount the likelihood of that being true (e.g., "if that's true, why didn't you just say so to begin with...") which offsets a large portion of the claimed increase, and I just repeat the original reasoning process: "insofar as there is any credence to assign to what this person is saying, there are better reasons to think that sacrificing my resources, reputation, psychological wellbeing, time, etc. will have worse expected value: if it's actually possible to break the laws of physics and produce a googolplex utilons, it's unlikely that giving over the wallet is the way to achieve that... etc."
Ultimately, the intuition/answer is fairly obvious for me, and intuition is a valid input in a decision framework. (If someone wants to object here and say that decision theories have to eschew such forms of intuition and must strictly rely on fully-fleshed out arguments and/or other formal processes, that just seems to be universally impossible regardless of whether the situation is about Pascal's mugging, a simple prisoner's dilemma, or anything else.)
Also:
Why can't I just copy that reasoning and think "by choosing to not hand over my wallet to Trass, the Tin-and-Brass God of Mugging, I am working towards reducing the likelihood of existential catastrophe, which means I am more likely to be reducing it. QED"
(To be clear, I'm not disputing that it's reasonable to think that working on AI safety generally has positive expected value in terms of increasing AI safety vs. decreasing AI safety, but that's totally detached from Pascalian mugging.)
I used the realistic mugger scenario to illustrate a point I think also applies to the original Pascal's mugging. I'm now discussing the original Pascal's mugging.
In the original Pascal's mugging the key idea is that the mugger can promise more and more utility to overcome any barriers you might have to handing over the wallet. Yes, the probability you think the mugger is telling the truth might decrease too, but at some point the probability will stop decreasing, or at least won't decrease as fast as the mugger can promise more. Remember the mugger can promise ANY amount of utility, and numbers can get unimaginably big. Of course he can easily promise enough to overcome any reasonable decrease in probability.
Not sure what you're getting at. I'm saying that, on average, people trying to achieve a goal are more likely to achieve it than if they weren't trying to achieve it. This was simply to counter Sanjay's point about AI and is indeed detached from the discussion of Pascal's mugging.
You're only responding to the first (and admittedly less important) half of my response. The key issue is that you can just use that same logic against the new claims and still defeat the mugging: for any given increase in the mugger's claims, you can come up with an equally large (but still more probable) increase in opportunity cost or other objections. If the mugger says "no wait, actually I'll give you ten googolplex of utilons instead of just one googolplex, and I'll bet you don't think that's less than 10% as likely," I could just say "Hmm, yeah, maybe it's only 20% as likely, so that would double the expected value, except actually I now think the opportunity costs and other downsides are two to three times greater than what I thought before, so actually the costs still outweigh the gains."
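A toy numerical version of that exchange, with every figure invented purely for illustration, might look like this:

```python
# Toy numbers for the escalation exchange above; every figure is invented.

payoff = 1e100        # the mugger's promised utilons
p_true = 1e-97        # my credence that the promise is genuine
costs = 1e6           # opportunity costs, reputation, time, etc., in utilons

ev_hand_over = payoff * p_true - costs
print(ev_hand_over)   # roughly 1e3 - 1e6 < 0: don't hand over the wallet

# The mugger offers 10x the payoff; I judge that only 20% as likely, and I also
# now reckon the costs are ~3x what I first thought.
ev_hand_over_v2 = (10 * payoff) * (0.2 * p_true) - 3 * costs
print(ev_hand_over_v2)  # roughly 2e3 - 3e6: still negative, so the escalation gains nothing
```

Whether the costs can legitimately be revised upward like this is, of course, exactly what is in dispute.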
If we want to just end this back and forth, we'll just jump straight to claims of infinity: a 20% chance of achieving an infinitely good outcome has higher expected value (or, is a better option) than a 10% chance of achieving an infinitely good outcome, ceteris paribus.
I don't think I originally saw that part of Sanjay's comment/post, and I agree that there is probably something wrong with his arguments there. I partially thought you were making that point as an additional argument in favor of giving your wallet to the mugger, since you would be doing so in order to try to reduce x-risk.
But do you actually have a good reason to believe the costs and opportunity costs will rapidly increase to unimaginably high levels? Why would this be the case?
The costs and opportunity costs shouldn't vary much (if at all) depending on the amount the mugger is promising. They are quite independent.
A few responses:
1. One which I already noted:
In other words, there are reasons to believe:
1a. The opportunity costs[1] are correlated with outcome magnitudes that this person is claiming;
1b. The risk-costs[2] are correlated (as above...)
2. No, I don't have that many "good" reasons, but that's been a key point I've tried emphasizing: you don't need good, explicit reasoning to fight completely absurd reasoning, you just need better/less-bad reasoning, which can still be fairly dubious.
Such as "if I give my wallet over now, I won't have money to give to a REAL Trass or Smiggledorfian fart goblin later on," or "if I waste my time and money here, I have less time to ponder the design of the universe and find a flaw in the matrix that allows me to escape and live forever in paradise (or to figure out which supernatural/religious belief is valid and act on that to get infinite utilons)."
Such as, "Actually, this person is testing me, and will do the exact opposite of what they claim if and only if I do what they are telling me to do."
Again I just think the mugger being able to say any number means they can overcome any alternative (in EV terms).
For example, you can calculate the EV of the possibility of meeting a real wizard in a year who generates an insane amount of utility. Your EV calculation will spit out some sort of number, right? Well, the mugger in front of you can beat whatever number that is by naming whatever number they need to beat it.
Or maybe the EV is undefined because you start talking about infinite utility (as you allude to). In this case EV reasoning pretty much breaks down.
Which brings me to the point that rejecting EV reasoning in the first place can get you out of handing over the wallet. Maybe there's a problem with EV reasoning, for example when dealing with infinite utilities, or a problem with EV reasoning when probabilities get really small.
Indeed I'm not sure we should actually hand over the wallet because I'm open to the possibility EV reasoning is flawed, at least in this case. See the St. Petersburg Paradox and the Pasadena Game for potential issues with EV reasoning. I still think EV reasoning is pretty useful in more regular circumstances though.
My response to this is the same as the one I gave originally to Sanjay. I don't think this is a compelling argument.
EDIT: Also see Derek Shiller's comment which I endorse.
Hence why I wrote out "infinite utilons" with such emphasis and in a previous comment also wrote "If we want to just end this back and forth, we'll just jump straight to claims of infinity [...]". But you do continue:
I disagree (at least insofar as I conceptualize EV and if you are just saying "you can't compare fractional infinities"), as I already asserted:
Summarizing your thoughts, you offer the following conclusion:
I'm also somewhat open to the idea that EV reasoning is flawed, especially my interpretation of the concept. However:
You know, I actually think I've gone completely down the wrong lines with my argument and if I could erase my previous comments I would.
The point is that Pascal's mugging is a thought experiment to illustrate a point, and I can alter it as required. So I can remove the possibility of amazing alternatives for the money in the wallet if I have to.
E.g. let's say you have a large amount of money in your wallet and, for whatever reason, you can only use that money to buy yourself stuff you really like. E.g. let's say it's a currency that you know can only be taken by a store that sells fancy cars, clothes and houses. Assume you really do know you can't use the money for anything else (it's a thought experiment so I can make this restriction).
Now imagine that you run into a Pascal's mugging. Your "opportunity costs are correlated with outcome magnitudes that the person is claiming" argument no longer applies in this thought experiment. Do you now hand over the wallet and miss out on all of that amazing stuff you could have bought from the store?
I know this all sounds weird, but thought experiments are weird...
Well, with those extra assumptions I would no longer consider it a Pascalian Mugging, I would probably just consider it a prone-to-mislead thought experiment.
Would I take a 10^-10 chance of getting 10^15 utilons over the option of a 100% chance of 1 utilon? Well, if we assume that:
Then the answer is probably yes?
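For what it's worth, the bare expected-value arithmetic behind that question (setting aside the unlisted assumptions, which are where the real disagreement lies) is simply:

```python
# Naive expected-value comparison for the gamble discussed above.
ev_gamble = 1e-10 * 1e15   # a 10^-10 chance of 10^15 utilons
ev_sure_thing = 1.0 * 1    # a 100% chance of 1 utilon

print(ev_gamble, ev_sure_thing)   # 100000.0 vs 1: naive EV strongly favours the gamble
```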
I have a hard-to-articulate set of thoughts on what makes some thought experiments valuable vs. misleading, with one of them being something along the lines of "It's unhelpful to generate an unrealistic thought experiment and then use the answer/response produced by a framework/system as an argument against using that framework/system in the real world if the argument is (at least implicitly) 'Wow, look at what an unreasonable answer this could produce for real world problems,' especially given how people may be prone to misinterpret such evidence (especially due to framing effects) for frameworks/systems that they are unfamiliar with or are already biased against."
But I don't have time to get into that can of worms, unfortunately.
Pascal's mugging is only there to uncover if there might be a potential problem with using EV maximisation when probabilities get very small. It's a thought experiment in decision theory. For that reason I actually think my altered thought experiment is useful, as I think you were introducing complications that distract from this central message of the thought experiment. Pascal's mugging doesn't in itself say anything about the relevance of these issues to real life. It may be all a moot point at the end of the day.
It sounds to me as if you don't see any issues with EV maximisation when probabilities get very small. So in my altered thought experiment you would indeed give away your wallet to some random dude claiming to be a wizard, thereby giving up all those awesome things from the store. It's worth at least noting that many people wouldn't do the same, and who is right or wrong is where the interesting conundrum lies.
I don’t think many people are capable of actually internalizing all of the relevant assumptions that in real life would be totally unreasonable, nor do most people have a really good sense of why they have certain intuitions in the first place. So, it’s not particularly surprising/interesting that people would have very different views on this question.
Two thoughts:
1.) You really need the probabilities of the mugger and anti-mugger to be nearly exactly equal. If there is a slight edge to believing the mugger rather than the hypothetical anti-mugger, that is enough to get the problem off the ground. There is a case to be made for giving a slight edge to what the mugger says: some smart philosophers think testimony is a basic source of evidence, such that if someone actually says P, that is a (possibly very small) reason to believe P. Even if these philosophers are almost certainly wrong, you shouldn't be 100% confident that they are. The right way to respond to your uncertainty about the epistemic significance of testimony is to give some small edge to the truth of actual testimony you hear vs. hypothetical testimony you make up. That is enough to lead standard decision theory to tell you to hand over the money. (A toy numerical version of this is sketched after these two points.)
2.) Pascal's mugger issues seem most pressing in cases where our reasons don't look like they might be perfectly balanced. I've suggested some examples. You don't consider any cases where we clearly do have asymmetric reasons supporting very very small probabilities for very very very high expected utilities.
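A toy version of point (1), with made-up numbers, to show how even a tiny testimonial edge can dominate a naive expected-value calculation once the promised payoff is large enough:

```python
# All numbers are illustrative. "edge" is the small extra credence given to the
# mugger's actual testimony over the imagined anti-mugger.

baseline = 1e-15       # credence in either outlandish scenario absent any testimony
edge = 1e-18           # tiny extra credence for testimony actually heard
payoff = 1e20          # utilons promised in either scenario
cost = 10              # the contents of the wallet, in utilons

ev_pay = (baseline + edge) * payoff - cost   # the mugger rewards you if you pay
ev_refuse = baseline * payoff                # the anti-mugger rewards you if you refuse

print(ev_pay, ev_refuse)   # ~100090 vs 100000: the tiny edge tips naive EV towards paying
```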
Unless I'm misunderstanding your comment, I think it misses the point (sorry if I've misunderstood). You're pointing to unequal/asymmetric probabilities, but the point of the test is that I can create a payoff which is large enough to outweigh the asymmetry.
I (the person being mugged) am creating the anti-mugger, so I can determine the payoff to be large enough that the anti-mugger wins in expectation.
I'm sorry I only read your post quickly, but it seems that your examples are in fact subject to the reversal/inconsistency test, and also that you acknowledge those issues in your post.
You’re proving too much with your anti-mugger argument. This argument essentially invalidates EV reasoning in all practical cases.
For example, you could use EV reasoning to determine that you should give to an animal charity. But then you could imagine a demon whose sole purpose is to torture everyone on earth for the rest of time if you give to that animal charity. The probability of the demon is very small, but as you say you can make the negative payoff associated with the demon arbitrarily large, so that it becomes a very bad idea to give to the animal charity on EV grounds.
Being able to construct examples such as these means you can never justify doing anything through EV reasoning. So either your argument is wrong, or we give up on EV reasoning altogether.
A bit more detail on the examples from item (2)
Your first example (quantum/many worlds): I don't think it's clear that the quantum worlds example is more likely to be net positive than net negative. You talk about the Many Worlds hypothesis and say that our "power to produce quantum events <...> gives us the power to pretty trivially exponentially increase the total amount of value (for better or worse) in the world by astronomical numbers." (emphasis added). In this case I don't need to apply the reversal/inconsistency test, because the original statement already indicates that it could go either way. I.e. no case is made for the proposed action being net positive.
Your second example (evangelism/Pascal's wager): I think you again acknowledge the problem:
"There are significant complications to Pascal's argument: it isn't clear which religion is right, and any choice with infinite rewards on one view may incur infinite punishments on another which are hard to compare. "
To be more specific, if you decided that converting everyone to religion X was the best choice, I could concoct religion anti-X. Under the doctrine of anti-X, every time you convert someone to religion X, it creates a large infinity* of suffering, and this large infinity is very big.
Sure, you might think that there are asymmetric reasons to believe religion X over religion anti-X. E.g. maybe a billion people believe religion X, whereas I'm the only one supporting religion anti-X, but I've constructed the payoffs to be much larger in favour of religion anti-X to offset this.
* If you really want to get into the details about the large infinity, we could say that each time we convert one person to religion X, we create a large infinity of new humans, and there exists a bijective mapping between that new set of humans and the real number line. Each of the new humans is subjected to an infinity of suffering which is more gruesome than the suffering in the hell of religion X.
My answer is that the intuitions in favour of expected value require enough repetitions for things to even out. If you were facing enough Pascal’s mugging situations for that to happen, and paying wouldn’t encourage more mugging, then I would say that you should pay; otherwise, it depends on your values.
I'm not sure I understand. Does this mean that we should not do any work on human extinction because there have not been enough repetitions (or indeed any repetitions) of previous instances of humanity being made extinct? (Or replace extinction with existential risk and the same argument might apply?)
By "enough repetitions", I meant that it makes more sense to use a straight expected value calculation the higher the ratio of the frequency to the inverse probability. So let's suppose you're playing Russian Roulette with 12 chambers. The inverse probability of dying would be 12 and lets suppose you're going to be playing it 12 times (assuming you don't die first). That is a reasonable ratio so it makes a decent amount of sense to use the expected value. Not that the ratio did not involve the number of times you died, but the number of times the game was played.
Hi Sanjay,
I like the approach to Pascal's Mugging described in this post from GiveWell (search for "Pascal’s Mugging refers"). Based on this analysis from Dario Amodei, given a prior X1 and estimate X2, the expected posterior can be calculated in a Bayesian way via the inverse-variance approach:
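The inverse-variance formula being referred to (the standard result for combining two independent normally distributed estimates, written out here as I understand it) is:

$$E(X) = \frac{\dfrac{E(X_1)}{V(X_1)} + \dfrac{E(X_2)}{V(X_2)}}{\dfrac{1}{V(X_1)} + \dfrac{1}{V(X_2)}}$$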
The above implies that, if the estimate X2 is much more uncertain than the prior X1 (i.e. V(X2) >> V(X1)), E(X) is roughly equal to E(X1). As a result, if someone asks me on the street for 1 $, promising 1 T$ (= E(X2)) in return, it makes sense to ignore the promise: the uncertainty of my prior (e.g. E(X1) = -1 $ and V(X1) = 1 $) would be much smaller than the uncertainty of the promise (V(X2) >> 1 $).
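Plugging these numbers into the formula above, with V(X2) set to an illustrative 10^24 (my own assumption, corresponding to the promise being uncertain to within roughly a trillion dollars either way):

```python
# E(X1), V(X1) and E(X2) are the numbers used above; V(X2) = 1e24 is an
# illustrative assumption of my own.

e1, v1 = -1.0, 1.0     # prior: a street promise costs me about $1 on average
e2, v2 = 1e12, 1e24    # estimate: the promised $1T, with enormous variance

posterior_mean = (e1 / v1 + e2 / v2) / (1 / v1 + 1 / v2)
print(posterior_mean)  # ~ -0.999999999999: essentially the prior, so ignore the promise
```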
I think your "reversal/inconsistency test" is a way of assessing V(X2). "If the logic seems to lead to taking action X [i.e. being Pascal-mugged], and seems to equally validly lead to taking an action inconsistent with X", the variance of the estimate is probably much larger than that of the prior (V(X2)>>V(X1)).
I don't quite get your right foot/left foot example. Cassie says that Utopi-doogle will soon 'likely' (I'll assume this is somewhere around 70%-90%) explode, killing everyone, and that the only solution is for everyone to stamp the same foot at once; if they guess correctly as to which foot to stamp, they survive, otherwise, they die.
To me, it seems that the politician who starts the Left Foot Movement is attempting to bring them from 'likely' death (again, 70-90%), the case if nothing is done or if the foot stamps aren't coordinated, to a new equilibrium with 50% chance of death; either he is correct and the world is saved, or he is wrong and the world is destroyed.
How is this a Pascal's Mugging? If the politician's movement succeeds, x-risk is reduced significantly, right?
The problem with reposting things that I wrote several weeks ago is that I can no longer remember the rationale that I had in mind when I wrote it! I'll try to jog my memory and give a clearer explanation. (Hopefully the explanation isn't that I just didn't think carefully enough when putting the example together!)
Strong upvoted, since I would very much like to see an answer on this and largely agree with your line of reasoning (as I indicated in a response to your comment): I don’t see why it’s considered an unsolved problem or paradox, as the solutions I’ve devised and seen make sense to me.
One of the most important implications of Pascal's mugging is that, by and large, the "reality is normal" meme from LessWrong and parts of EA is incorrect; that is, extremes, not moderate outcomes, win out in the long run. This has implications for longtermism, one of the bigger ones being that longtermists need to at least admit the possibility of fanatical projects (hopefully kept secret).
However, we can blunt the force of the bullet we have to bite, and a comment from Harrison Durland shows how: