

Re 1, as Richard says: "Wenar scathingly criticized GiveWell—the most reliable and sophisticated charity evaluators around—for not sufficiently highlighting the rare downsides of their top charities on their front page. This is insane: like complaining that vaccine syringes don’t come with skull-and-crossbones stickers vividly representing each person who has previously died from complications. He is effectively complaining that GiveWell refrains from engaging in moral misdirection. It’s extraordinary, and really brings out why this concept matters."

Re 2: I just don't think this is true.  EAs often note the uncertainty.  

Re 3: This is true but constantly talked about by EAs.  Furthermore, I don't know what the alternative is supposed to be--just ignore all non-quantifiable harms?

This just seemed to be a list of false claims about things GiveWell forgot to consider, a series of ridiculous claims about philosophy, and no attempt to compare the benefits to the costs.  Yes, lots of EA charities have various small downsides, most of which are taken into account, but those are undetectable compared to the hundreds of thousands of lives saved.  He suggests empowering local people, which is a good applause line, but it's vague.  Most local people are not in a position to do high-quality comparisons between different interventions.

Thank you!  It was your speech at the OFTW meeting that largely inspired it. 

We all agree that you should get utility.  You are pointing out that FDT agents get more utility.  But once they are already in the situation where they've been created by the demon, FDT agents get less utility.  If you are the type of agent to follow FDT, you will get more utility, just as if you are the type of agent to follow CDT while being in a scenario that tortures FDTists, you'll get more utility.  The question of decision theory is, given the situation you are in, what gets you more utility--what is the rational thing to do.  Eliezer's theory turns you into the type of agent who often gets more utility, but that does not make it the right decision theory.  The fact that you want to be the type of agent who does X doesn't make doing X rational if doing X is bad for you and not doing X is rewarded artificially.

Again, there is no dispute about whether on average one boxers or two boxers get more utility or which kind of AI you should build. 

Wait, sorry, it’s hard to see the broader context of this comment because I'm on my phone and comment sections are hard to navigate on the EA Forum.  I don’t know if I said Eliezer had 100% credence, but if I did, that was wrong.

He didn't quote it--he linked to it.  I didn't quote the broader section because it was ambiguous and confusing.  The reason not accounting for interactionist dualism matters is that it means he misstates the zombie argument, and his version is utterly unpersuasive.

The demon case shows that there are cases where FDT loses, as is true of all decision theories.  If the question is which decision theory will generate the most utility when programmed into an AI, then that's an empirical question that depends on facts about the world.  If the question is which choice will get the most utility once you're already in a situation, well, that's causal decision theory.

Decision theories are intended as theories of what is rational for you to do.  So they describe which choices are wise and which choices are foolish.  I think Eliezer is confused about what a decision theory is, but that is a reason to trust his judgment less.

In the demon case, we can assume it's only almost infallible, so once in every million times it makes a mistake.  The demon case is a better example, because I have some credence in EVT, and EVT entails you should one box.  I am waaaaaaaaaaaay more confident FDT is crazy than I am that you should two box.
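
To make the one-box arithmetic concrete, here is a rough sketch of the evidential-style calculation for Newcomb's problem with a predictor that errs once in a million.  It assumes "EVT" refers to evidential expected-value reasoning, and it assumes the standard payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one); neither assumption is stated in the comment, and the function name is just illustrative.

```python
# Assumed payoffs: the standard Newcomb numbers, not taken from this thread.
BIG = 1_000_000    # opaque box, filled only if one-boxing was predicted
SMALL = 1_000      # transparent box, always available


def evidential_value(action: str, accuracy: float) -> float:
    """Expected payoff conditional on the action, for a given predictor accuracy."""
    if action == "one_box":
        # Correct prediction -> opaque box is full; wrong prediction -> it is empty.
        return accuracy * BIG
    if action == "two_box":
        # Correct prediction -> opaque box is empty; wrong prediction -> it is full.
        return accuracy * SMALL + (1 - accuracy) * (BIG + SMALL)
    raise ValueError(f"unknown action: {action}")


accuracy = 1 - 1e-6  # "almost infallible": one mistake per million predictions
one = evidential_value("one_box", accuracy)
two = evidential_value("two_box", accuracy)
print(one > two)  # conditioning on the action, one-boxing comes out ahead
```

This only shows what conditioning on your own action recommends; it says nothing about whether that is the right way to evaluate the choice.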

I would agree with the statement "if Eliezer followed his decision theory, and the world was such that one frequently encountered lots of Newcomb's problems and similar, he'd end up with more utility."  I think my position is fairly close to MacAskill's in the linked post, where he says that FDT is better as a theory of the agent you should want to be than as a theory of what's rational.

But I think that rationality won't always benefit you.  I think you'd agree with that.  If there's a demon who tortures everyone who believes FDT, then believing FDT, which you'd regard as rational, would make you worse off.  If there's another demon who will secretly torture you if you one box, then one boxing is bad for you!  It's possible to make up contrived scenarios that punish being rational--and Newcomb's problem is a good example of that.

Notably, if we're in the twin scenario or the scenario that tortures FDTists, CDT will dramatically beat FDT.  

I think the example that's most worth focusing on is the demon legs cut off case.  I think it's not crazy at all to one box, and have maybe 35% credence that one boxing is right.  I have maybe 95% credence that you shouldn't cut off your legs in the demon case, and 80% confidence that the position that you should is crazy, in the sense that if you spent years thinking about it while being relatively unbiased you'd almost certainly give it up.

I know you said you didn't want to repeatedly go back and forth, but . . . 

Yes, I agree that if you have some psychological mechanism by which you can guarantee that you'll follow through on future promises--like programming an AI--then that's worth it.  It's better to be the kind of agent who follows FDT (in many cases).  But the way I'd think about this is that this is an example of rational irrationality, where it's rational to try to get yourself to do something irrational in the future because you get rewarded for it.  But remember, decision theories are theories about what's rational, not theories about what kind of agent you should be.  

I think we agree on all of the following claims:

  1. If you have some way to commit in advance to follow FDT in cases like the demon case or the bomb case, you should do so.  
  2. Once you are in those cases, you have most reason to defect.  
  3. Given that you can predict that you'll have most reason to defect, you can sort of psychologically make a deal with your future self where you say "NO REALLY, DON'T DEFECT, I'M SERIOUS."  

My claim, though, is that decision theory is about 2, rather than 1 or 3.  No one disputes that the kinds of agents who two box do worse than the kinds of agents who one box--the question is about what you should do once you're in that situation.
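
The gap between "which kind of agent does better" and "what to do once you're in the situation" can be sketched with a toy Newcomb setup.  This is only an illustration under assumptions that are not in the thread: the standard $1,000,000 / $1,000 payoffs, a perfectly accurate predictor, and made-up function names.

```python
# Assumed payoffs: the standard Newcomb numbers, not taken from this thread.
BIG, SMALL = 1_000_000, 1_000


def payoff(action: str, box_full: bool) -> int:
    """Payoff for an action once the opaque box's contents are already fixed."""
    opaque = BIG if box_full else 0
    return opaque + (SMALL if action == "two_box" else 0)


# Once you're in the situation: whatever is in the box, two-boxing adds SMALL.
gains = [payoff("two_box", full) - payoff("one_box", full) for full in (True, False)]


def payoff_by_type(agent_type: str) -> int:
    """Payoff for a kind of agent, assuming a perfect predictor (hypothetical)."""
    # The predictor fills the opaque box exactly when the agent is a one-boxer.
    return payoff(agent_type, box_full=(agent_type == "one_box"))


print(gains)
print(payoff_by_type("one_box"), payoff_by_type("two_box"))
```

Both comparisons come from the same payoff table, so neither side is disputing the numbers; the disagreement is over which comparison a decision theory is supposed to answer.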

If an AI is going to encounter Newcomb's problem a lot, everyone agrees you should program it to one box.
