William_MacAskill

I, of course, agree

One additional point, as I'm sure you know, is that you can potentially also affect P(things go really well | AI takeover). And actions that increase ΔP(things go really well | AI takeover) might be quite similar to actions that increase ΔP(things go really well | no AI takeover). If so, that's an additional argument for those actions compared to actions that affect ΔP(no AI takeover).


Re the formal breakdown, people sometimes miss the BF supplement here which goes into this in a bit more depth. And here's an excerpt from a forthcoming paper, "Beyond Existential Risk", in the context of more precisely defining the "Maxipok" principle. What it gives is very similar to your breakdown, and you might find some of the terms in here useful (apologies that some of the formatting is messed up):

"An action x’s overall impact (ΔEV_x) is its increase in expected value relative to baseline. We’ll let C refer to the state of existential catastrophe, and b refer to the baseline action. We’ll define, for any action x: P_x = P[¬C | x] and K_x = E[V | ¬C, x]. We can then break overall impact down as follows:

ΔEV_x = (P_x − P_b) K_b + P_x (K_x − K_b)

We call (P_x − P_b) K_b the action’s existential impact and P_x (K_x − K_b) the action’s trajectory impact. An action’s existential impact is the portion of its expected value (relative to baseline) that comes from changing the probability of existential catastrophe; an action’s trajectory impact is the portion of its expected value that comes from changing the value of the world conditional on no existential catastrophe occurring.

We can illustrate this graphically, where the areas in the graph represent overall expected value, relative to a scenario with a guarantee of catastrophe:

[Figure omitted.]
With these in hand, we can then define:

Maxipok (precisified): In the decision situations that are highest-stakes with respect to the longterm future, if an action is near‑best on overall impact, then it is close-to-near‑best on existential impact.

 



[1] Here’s the derivation. Given the law of total expectation:

E[V | x] = P(¬C | x) E[V | ¬C, x] + P(C | x) E[V | C, x]

To simplify things (in a way that doesn’t affect our overall argument, and bearing in mind that the “0” is arbitrary), we assume that E[V | C, x] = 0 for all x, so:

E[V | x] = P(¬C | x) E[V | ¬C, x]

And, by our definition of the terms:

P(¬C | x) E[V | ¬C, x] = P_x K_x

So:

ΔEV_x = E[V | x] − E[V | b] = P_x K_x − P_b K_b

Then adding (P_x K_b − P_x K_b) to this and rearranging gives us:

ΔEV_x = (P_x − P_b) K_b + P_x (K_x − K_b)"
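To make the decomposition concrete, here's a quick numerical sketch (all values are hypothetical, chosen only to illustrate that the existential and trajectory terms sum to the overall impact):

```python
# Numerical check of the impact decomposition (all values hypothetical).
# P = probability of no existential catastrophe; K = expected value given no catastrophe.

def overall_impact(P_x, K_x, P_b, K_b):
    """Delta-EV of action x relative to baseline b, assuming E[V | C, x] = 0."""
    return P_x * K_x - P_b * K_b

def existential_impact(P_x, P_b, K_b):
    """Portion of impact from changing the probability of catastrophe."""
    return (P_x - P_b) * K_b

def trajectory_impact(P_x, K_x, K_b):
    """Portion of impact from changing value conditional on no catastrophe."""
    return P_x * (K_x - K_b)

# Hypothetical numbers: action x raises both survival probability and conditional value.
P_b, K_b = 0.8, 100.0
P_x, K_x = 0.9, 110.0

total = overall_impact(P_x, K_x, P_b, K_b)      # 0.9*110 - 0.8*100 = 19.0
parts = existential_impact(P_x, P_b, K_b) + trajectory_impact(P_x, K_x, K_b)
assert abs(total - parts) < 1e-9                # the two terms sum to the total
```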
 

(Also, thank you for doing this analysis, it's great stuff!)

Rutger Bregman isn’t on the Forum, but sent me this message and gave me permission to share:

Great piece! I strongly agree with your point about PR. EA should just be EA, like the Quakers just had to be Quakers and Peter Singer should just be Peter Singer.

Of course EA had to learn big lessons from the FTX saga. But those were moral and practical lessons so that the movement could be proud of itself again. Not PR-lessons. The best people are drawn to EA not because it’s the coolest thing on campus, but because it’s a magnet for the most morally serious + the smartest people.

As you know, I think EA is at its best when it’s really effective altruism (“I deeply care about all the bad stuff in the world, desperately want to make a difference, so I gotta think really fcking hard about how I can make the biggest possible difference”) and not altruistic rationalism (“I’m super smart, and I might as well do a lot of good with it”).

This ideal version of EA won’t appeal to all super talented people of course, but that’s fine. Other people can build other movements for that. (It’s what we’re trying to do at The School for Moral Ambition.)

Argh, thanks for catching that! Edited now.

If this perspective involves a strong belief that AI will not change the world much, then IMO that's just one of the (few?) things that are ~fully out of scope for Forethought.

 

I disagree with this. There would need to be some other reason why they should work at Forethought rather than elsewhere, but there are plausible answers to that — e.g. they work on space governance, or they want to write up why they think AI won't change the world much and engage with the counterarguments.

I can't speak to the "AI as a normal technology" people in particular, but a shortlist I created of people I'd be very excited about includes someone who just doesn't buy at all that AI will drive an intelligence explosion or explosive growth.

I think there are lots of types of people for whom it wouldn't be a great fit, though. E.g. continental philosophers; at least some of the "sociotechnical" AI folks; more mainstream academics who are focused on academic publishing. And if you're just focused on AI alignment, you'd probably get more out of a different org than you would out of Forethought.

More generally, I'm particularly keen on situations where V(X, Forethought team) is much greater than V(X) + V(Forethought team), either because there are synergies between X and the team, or because X is currently unable to do the most valuable work they could in any of the other jobs they could be in.

Thanks for writing this, Lizka! 

Some misc comments from me:

  • I have the worry that people will see Forethought as "the Will MacAskill org", at least to some extent, and therefore think you've got to share my worldview to join. So I want to discourage that impression! There's lots of healthy disagreement within the team, and we try to actively encourage disagreement. (Salient examples include disagreement around: AI takeover risk; whether the better futures perspective is totally off-base or not;  moral realism / antirealism; how much and what work can get punted until a later date; AI moratoria / pauses; whether deals with AIs make sense; rights for AIs; gradual disempowerment).
  • I think from the outside it's probably not transparent just how involved some research affiliates or other collaborators are, in particular Toby Ord, Owen Cotton-Barratt, and Lukas Finnveden.
  • I'd in particular be really excited for people who are deep in the empirical nitty-gritty — think AI2027 and the deepest criticisms of that; or gwern; or Carl Shulman; or Vaclav Smil. This is something I wish I had more skill and practice in, and I think it's generally a bit of a gap in the team.
  • While at Forethought, I've been happier in my work than I have been in any other job. That's a mix of: getting a lot of freedom to just focus on making intellectual progress rather than various forms of jumping through hoops; the (importance)*(intrinsic interestingness) of the subject matter; the quality of the team; the balance of work ethic and compassion among people — it really feels like everyone has each other's back; and things just working and generally being low-drama.

I'm not even sure your arguments would be weak in that scenario. 

Thanks - classic Toby point!  I agree entirely that you need additional assumptions.

I was imagining someone who thinks that, say, there's a 90% risk of unaligned AI takeover, and a 50% loss of EV of the future from other non-alignment issues that we can influence. So EV of the future is 5%.

If so, completely solving AI risk would increase the EV of the future to 50%; halving both would increase it only to 41%.

But, even so, it's probably easier to halve both than to completely eliminate AI takeover risk, and more generally the case for a mixed strategy seems strong. 
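A minimal sketch checking the arithmetic in the example above (90% takeover risk, 50% loss from other issues; values are fractions of the future's potential):

```python
# Checking the example's arithmetic.
# 90% risk of unaligned AI takeover; 50% of the future's EV lost to
# other, non-alignment issues that we can influence.

takeover_risk = 0.90
other_loss = 0.50

# Baseline EV: must both avoid takeover and retain the value at stake elsewhere.
ev_baseline = (1 - takeover_risk) * (1 - other_loss)            # 0.10 * 0.50 = 0.05 -> 5%

# Completely solving AI takeover risk:
ev_solve_takeover = 1.0 * (1 - other_loss)                      # 0.50 -> 50%

# Halving both problems instead:
ev_halve_both = (1 - takeover_risk / 2) * (1 - other_loss / 2)  # 0.55 * 0.75 = 0.4125 -> ~41%
```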
