
tobycrisford

404 karma

Comments
89

I'm at least finding it useful figuring out exactly where we disagree. Please stop replying if it's taking too much of your time, but not because of the downvotes!

I guess you are imagining that humans either go extinct or have a long future where they go on to realise lots of value.

This isn't quite what I'm saying, depending on what you mean by "lots" and "long". For your "impossible for an intervention to have counterfactual effects for more than a few centuries" claim to be false, we only need the future of humanity to have a non-tiny chance of being longer than a few centuries (not that long), and for there to be conceivable interventions which have a non-tiny chance of very quickly causing extinction. These interventions would then meaningfully affect counterfactual utility for more than a few centuries.

To be more concrete and less binary, suppose we are considering an intervention that has a risk p of almost immediately leading to extinction, and otherwise does nothing. Let U be the expected utility generated in a year, in 500 years' time, absent any intervention. If you decide to make this intervention, that has the effect of changing U to (1-p)U, and so the expected utility generated in that far future year has been reduced by pU.

For this to be tiny/non-meaningful, we either need p to be tiny, or U to be tiny (or both).
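To put rough numbers on this, here is a minimal sketch. The function name and the particular values of p and U are made up purely for illustration; the only thing taken from the argument above is the formula U - (1-p)U = pU.

```python
def utility_change(p: float, U: float) -> float:
    """Expected utility lost in the far-future year.

    Without the intervention the year yields U in expectation; with it,
    (1 - p) * U. The difference is p * U.
    """
    return U - (1.0 - p) * U


# Illustrative values only: U is normalised to 1, p is swept over a range.
for p in (1e-9, 1e-3, 0.1):
    print(f"p = {p:g}: expected loss = {utility_change(p, U=1.0):g} (in units of U)")
```

The loss scales linearly in both p and U, so it only becomes negligible if at least one of them is itself negligible.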

Are you saying:

  1. There are no conceivable interventions someone could make with p non-tiny.
  2. U, expected utility in a year in 500 years' time, is approximately 0.
  3. Something else... my setup of the situation is wrong, or unrealistic..?

Thanks for the detailed reply on that! You've clearly thought about this a lot, and I'm very happy to believe you're right on the impact of nuclear war, but it sounds like you are more or less opting for what I called option 1? In which case, just replace nuclear war with a threat that would literally cause extinction with high probability (say release of a carefully engineered pathogen with high fatality rate, long incubation period, and high infectiousness). Wouldn't that meaningfully affect utility for more than a few centuries? Because there would be literally no one left, and that effect is guaranteed to be persistent! Even if it "just" reduced the population by 99%, that seems like it would very plausibly have effects for thousands of years into the future.

It seems to me that to avoid this, you have to either say that causing extinction (or near extinction level catastrophe) is virtually impossible, through any means, (what I was describing as option 1) or go the other extreme and say that it is virtually guaranteed in the short term anyway, so that counterfactual impact disappears quickly (what I was describing as option 2). Just so I understand what you're saying, are you claiming one of these two things? Or is there another way out that I'm missing?

More broadly, I find it very implausible that an intervention today could meaningfully (counterfactually) increase/decrease (after adjusting for noise) expected total hedonistic utility more than a few centuries from now.

 

Causing extinction (or even some planet-scale catastrophe with thousand-year-plus consequences that falls short of extinction) would be an example of this, wouldn't it? Didn't Stanislav Petrov have the opportunity to meaningfully change expected utility for more than a few centuries?

I can only think of two ways of avoiding that conclusion:

  1. Global nuclear war wouldn't actually meaningfully reduce utility for more than a few centuries from when it happens.
  2. Nuclear war, or some similar scale catastrophe, is bound to happen within a few centuries anyway, so that after a few centuries the counterfactual impact disappears. Maybe the fact that stories like Petrov's exist is what allows you to be confident in this.

I think either of these would be interesting claims, although it would now feel to me like you were the one using theoretical considerations to make overconfident claims about empirical questions. Even if (1) is true for global nuclear war, I can just pick a different human-induced catastrophic risk as an example, unless it is true for all such examples, which is an even stronger claim.

It seems implausible to me that we should be confident enough in either of these options that all meaningful change in expected utility disappears after a few centuries.

Is there a third option..?

I do actually think option 2 might have something going for it, it's just that the 'few centuries' timescale maybe seems too short to me. But, if you did go down route 2, then Toby Ord's argument as far as you were concerned would no longer be relying on considerations thousands of years from now. That big negative utility hit he is predicting would be in the next few centuries anyway, so you'd be happy after all?

Your model says that instantly going from q_0 to q_1 is bad, but I do not think real world interventions allow for discontinuous changes in progress. So you would have to compare "value given q_0" with "value given q_1" + "value accumulated in the transition from q_0 to q_1". By neglecting this last term, I believe your model underestimates the value of accelerating progress.

Sure, but the question is what do we change by speeding progress up. I considered the extreme case where we reduce the area under the curve between q_0 and q_1 to 0, in which case we lose all the value we would have accumulated in passing between those points without the intervention.

If we just go faster, but not discontinuously, we lose less value, but we still lose it, as long as that area under the curve has been reduced. The quite interesting thing is that it's the shape of the curve right now that matters, even though the actual reduction in utility is happening far in the future.

The point of my comment was to show that a wide range of different possible models would all exhibit the property Toby Ord is talking about here, even if they involve lots of complexity and randomness. A lot of these models wouldn't predict the large negative utility change at a specific future time, that you find implausible, but would still lead to the exact same conclusion in expectation.

I'm a fan of empiricism, but deductive reasoning has its place too, and can sometimes allow you to establish the existence of effects which are impossible to measure. Note this argument is not claiming to establish a conclusion with no empirical data. It is saying if certain conditions hold (which must be determined empirically) then a certain conclusion follows.

Realised after posting that I'm implicitly assuming you will hit q_1, and not go extinct before. For interventions in progress, this probably has high probability, and the argument is roughly right. To make it more general, once you get to this line:

E(Total future value given q_0) = E(Value before first q_1) + E(Value after first q_1)

Next line should be, by Markov:

E(Total future value given q_0) = E(Value before first q_1) + P(hitting q_1) E(Total future value given q_1)

So:

E(Total future value given q_1) = (E(Total future value given q_0) - E(Value before first q_1)) / P(hitting q_1)

Still gives the same conclusion if P(hitting q_1) is close to 1, although can give a very different conclusion if P(hitting q_1) is small (would be relevant if e.g. progress was tending to decrease in time towards extinction, in which case clearly bumping q_0 up to q_1 is better!)
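As a sanity check on the corrected decomposition, here is a minimal sketch that solves a toy version of the model exactly. Everything specific in it is my own assumption for illustration: progress lives on a discrete ladder q = 0..Q_MAX rather than being continuous, each year yields value(q), extinction then happens with probability hazard(q) (a function of q only), and otherwise q either rises by one step with probability UP_PROB or stays put. The function names and parameter values are hypothetical.

```python
def solve_chain(q_max, hazard, value, up_prob, q_target):
    """Exact expectations for a toy Markov progress ladder.

    Returns three lists indexed by q:
      total[q]  = E(total future value given q)
      before[q] = E(value generated before first hitting q_target), q <= q_target
      hit[q]    = P(hitting q_target before extinction),            q <= q_target
    """
    def keep(q):   # survive the year and stay at the same q
        return (1 - hazard(q)) * (1 - up_prob)

    def climb(q):  # survive the year and move up to q + 1
        return (1 - hazard(q)) * up_prob

    # Backward recursion from the top of the ladder. total[q_max + 1] is a
    # dummy zero entry; it is never reached because hazard(q_max) == 1.
    total = [0.0] * (q_max + 2)
    for q in range(q_max, -1, -1):
        total[q] = (value(q) + climb(q) * total[q + 1]) / (1 - keep(q))

    before = [0.0] * (q_target + 1)  # before[q_target] = 0 by definition
    hit = [0.0] * (q_target + 1)
    hit[q_target] = 1.0
    for q in range(q_target - 1, -1, -1):
        before[q] = (value(q) + climb(q) * before[q + 1]) / (1 - keep(q))
        hit[q] = climb(q) * hit[q + 1] / (1 - keep(q))
    return total, before, hit


# Illustrative parameters only (assumptions, not estimates of anything real).
Q_MAX, UP_PROB = 50, 0.5
Q0, Q1 = 10, 12


def hazard(q):
    # Extinction risk rises with progress; certain at the top of the ladder.
    return 1.0 if q >= Q_MAX else 0.002 + 0.0005 * q


def value(q):
    # More progress means more value generated per year.
    return 1.0 + q


total, before, hit = solve_chain(Q_MAX, hazard, value, UP_PROB, Q1)

print("E(total future value given q_0):", total[Q0])
print("E(value before first q_1)      :", before[Q0])
print("P(hitting q_1)                 :", hit[Q0])
print("E(total future value given q_1):", total[Q1])
print("RHS of decomposition           :", before[Q0] + hit[Q0] * total[Q1])
```

In this toy model the two sides of the decomposition agree, and jumping from q_0 to q_1 only lowers expected total value when E(value before first q_1) exceeds (1 - P(hitting q_1)) times E(total future value given q_1), which is the caveat described above. Whether that holds depends on the hazard and value functions you plug in.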

I disagree with you on this one, Vasco Grilo, because I think the argument still works even when things are more stochastic.

To make things just slightly more realistic, suppose that progress, measured by q, is a continuous stochastic process, which at some point drops to zero and stays there (extinction). To capture the 'endogenous endpoint' assumption, suppose that the probability of extinction in any given year is a function of q only. And to simplify things, let's assume it's a Markov process (future behaviour depends only on current state, and is independent of past behaviour).

Suppose current level of progress is q_0, and we're considering an intervention that will cause it to jump to q_1. We have, in absence of any intervention:

Total future value given currently at q_0 = Value generated before we first hit q_1 + Value generated after first hitting q_1

By linearity of expectation:

E(Total future value given q_0) = E(Value before first q_1) + E(Value after first q_1)

By Markov property (small error in this step, corrected in reply, doesn't change conclusions):

E(Total future value given q_0) = E(Value before first q_1) + E(Total future value given q_1)

So as long as E(Value before first q_1) is positive, then we decrease expected total future value by making an intervention that increases q_0 to q_1.

It's just the "skipping a track on a record" argument from the post, but I think it is actually really robust. It's not just an artefact of making a simplistic "everything moves literally one year forward" assumption.

I'm not sure how deep the reliance on the Markov property in the above is. Or how dramatically this gets changed when you allow the probability of extinction to depend slightly on things other than q. It would be interesting to look at that more.

But I think this still shows that the intuition of "once loads of noisy unpredictable stuff has happened, the effect of your intervention must eventually get washed out" is wrong.

I'm not sure if I agree with this.

I think your characterization of EAs is spot on, but I don't think it's a bad thing.

I've been loosely involved in a few different social movements (student activism, vegan activism, volunteering for a political party) and what makes EA unique is exactly the attitude that you're describing here. Whenever I went to an EA meetup or discussion group, people spent most of their time discussing things that EA could be getting fundamentally wrong. In my admittedly limited experience, that is really weird! And it's also brilliant! Criticisms of EA, and its currently popular ideas, are taken extremely seriously, and in good faith.

I think a necessary consequence of this attitude is that the EA label becomes something people adopt only with a healthy degree of embarrassment and apologeticness. It is not a badge of pride. Because as soon as it becomes an identity to be proud of, it becomes much harder and emotionally draining to carefully consider the important criticisms of it.

I think you are right that there are probably downsides to this attitude as well. But I worry about what it would mean for the EA movement if it ever lost it.

I feel like I should also acknowledge the irony in the fact that in this particular context, it is you who are criticizing an aspect of the EA movement, and me who is jumping to defend it and sing its virtues! I'm not sure what this means but it's a bit too meta for my liking so I'll end my comment there!

Thanks for your interesting thoughts on this!

On the timelines question, I know Chollet argues AGI is further off than a lot of people think, and maybe his views do imply that in expectation, but it also seems to me like his views introduce higher variance into the prediction, and so would also allow for the possibility of much more rapid AGI advancement than the conventional narrative does.

If you think we just need to scale LLMs to get to AGI, then you expect things to happen fast, but probably not that fast. Progress is limited by compute and by data availability.

But if there is some crucial set of ideas yet to be discovered, then that's something that could change extremely quickly. We're potentially just waiting for someone to have a eureka moment. And we'd be much less certain what exactly was possible with current hardware and data once that moment happens. Maybe we could have superhuman AGI almost overnight?

This is a really interesting way of looking at the issue!

But is PASTA really equivalent to "a system that can automate the majority of economically valuable work"? If it specifically is supposed to mean the automation of innovation, then that sounds closer to Chollet's definition of AGI to me: "a system that can efficiently acquire new skills and solve open-ended problems"

Thanks for this interesting summary! These are clearly really powerful arguments for biting the bullet and accepting fanaticism. But does this mean that Hayden Wilkinson would literally hand over their wallet to a Pascal mugger, if someone attempted to Pascal mug them? Because Pascal's mugger doesn't have to be a thought experiment. It's a script you could literally say to someone in real life, and I'm assuming that if I tried it on a philosopher advocating for fanaticism, then I wouldn't actually get their wallet. Why is that? What's the argument that lets you not follow through on that in practice?
