Existential Choices Symposium with Will MacAskill and other special guests (3-5pm GMT Monday)

Toby Tremlett🔹

EA Forum Bot Site
EA Forum

Hide table of contents

Best of: Existential Choices Week

Existential Choices Symposium with Will MacAskill and other special guests (3-5pm GMT Monday)

by Toby Tremlett🔹

14th Mar 20252 min read 154

70

Cause prioritizationExistential riskBuilding effective altruismExistential Choices Debate WeekAnnouncements and updatesEffective Altruism ForumEvents on the EA Forum

Frontpage

Existential Choices Symposium with Will MacAskill and other special guests (3-5pm GMT Monday)

How it works:

Brief bios for participants (mistakes mine):

NB- To help conversations happen smoothly, I'd recommend sticking to one idea per top-level comment (even if that means posting multiple comments at once).

154 comments

For Existential Choices Debate Week, we’re trying out a new type of event: the Existential Choices Symposium. It'll be a written discussion between invited guests and any Forum user who'd like to join in.

How it works:

Any forum user can write a top-level comment that asks a question or introduces a consideration, the answer of which might affect people’s answer to the debate statement^[1]. For example: “Are there any interventions aimed at increasing the value of the future that are as widely morally supported as extinction-risk reduction?” You can start writing these comments now.
The symposium’s signed-up participants, Will MacAskill, Tyler John, Michael St Jules, Andreas Mogensen and Greg Colbourn, will respond to questions, and discuss them with each other and other forum users, in the comments.
To be 100% clear - you, the reader, are very welcome to join in any conversation on this post. You don't have to be a listed participant to take part.

This is an experiment. We’ll see how it goes and maybe run something similar next time. Feedback is welcome (message me with feedback here).

The symposium participants will be online between 3 - 5 pm GMT on Monday the 17th.

Brief bios for participants (mistakes mine):

Will MacAskill is an Associate Professor of moral philosophy at the University of Oxford, and Senior Research Fellow at Forethought. He wrote the books Doing Good Better, Moral Uncertainty, and What We Owe The Future. He is the cofounder of Giving What We Can, 80,000 Hours, Centre for Effective Altruism and the Global Priorities Institute.
Tyler John is an AI researcher, grantmaker, and philanthropic advisor. He is an incoming Visiting Scholar at the Cambridge Leverhulme Centre for the Future of Intelligence and an advisor to multiple philanthropists. He was previously the Programme Officer for emerging technology governance and Head of Research at Longview Philanthropy. Tyler holds a PhD in philosophy from Rutgers University—New Brunswick, where his dissertation focused on longtermist political philosophy and mechanism design, and the case for moral trajectory change.
Michael St Jules is an independent researcher, who has written on “philosophy of mind, moral weights, person-affecting views, preference-based views and subjectivism, moral uncertainty, decision theory, deep uncertainty/cluelessness and backfire risks, s-risks, and indirect effects on wild animals”.
Andreas Mogensen is a Senior Research Fellow in Philosophy at the Global Priorities Institute, part of the University of Oxford’s Faculty of Philosophy. His current research interests are primarily in normative and applied ethics. His previous publications have addressed topics in meta-ethics and moral epistemology, especially those associated with evolutionary debunking arguments.
Greg Colbourn is the founder of CEEALAR and is currently a donor and advocate for Pause AI, which promotes a global AI moratorium. He has also supported various other projects in the space over the last 2 years.

Thanks for reading! If you'd like to contribute to this discussion, write some questions below which could be discussed in the symposium.

NB- To help conversations happen smoothly, I'd recommend sticking to one idea per top-level comment (even if that means posting multiple comments at once).

^{^}
You can find the debate statement, and all its caveats, here.

70 Reactions

Discussion Thread: Existential Choices Debate Week

176 comments43 karma

Beyond Extinction: Revisiting the Question and Broadening Our View

3 comments36 karma

Mentioned in

92Moral error as an existential risk

56EA Forum update (June 2025)

Comments154

Sorted by

New & upvoted

Click to highlight new comments since: Today at 12:35 AM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

OscarD🔸Mar 1725

A broader coalition of actors will be motivated to pursue extinction prevention than longtermist trajectory changes.^[1] This means:

Extinction risk reduction work will be more tractable, by virtue of having broader buy-in and more allies.
Values change work will be more neglected.^[2]

Is this a reasonable framing - if so which effect dominates or how can we reason through this?

^{^}
For instance, see Scott Alexander on the benefits of extinction risk as a popular meme compared to longtermism.
^{^}
I argeud for something similar here.

William_MacAskillMar 1715

I agree with the framing.

Quantitatively, the willingness to pay to avoid extinction even just from the United States is truly enormous. The value of a statistical life in the US — used by the US government to estimate how much US citizens are willing to pay to reduce their risk of death — is around $10 million. The willingness to pay, therefore, from the US as a whole, to avoid a 0.1 percentage point of a catastrophe that would kill everyone in the US, is over $1 trillion. I don’t expect these amounts to be spent on global catastrophic risk reduction, but they show how much latent desire there is to reduce global catastrophic risk, which I’d expect to become progressively mobilised with increasing indications that various global catastrophic risks, such as biorisks, are real. [I think my predictions around this are pretty different than some others, who expect the world to be almost totally blindsided. Timelines and gradualness of AI takeoff is of course relevant here.]

In contrast, many areas of better futures work are likely to remain extraordinarily neglected. The amount of even latent interest in, for example, ensuring that resources outside of our solar system are put to ... (read more)

tylermjohn

Mar 19

A cynical and oversimplified — but hopefully illuminating — view (and roughly my view) is that trajectory changes are just longterm power grabs by people with a certain set of values (moral, epistemic, or otherwise). One argument in the other direction is that lots of people are trying to grab power — it's all powerful people do! And conflict with powerful people over resources is a significant kind of non-neglectedness. But very few people are trying to control the longterm future, due to (e.g.) hyperbolic discounting. So on this view, neglectedness provisionally favours trajectory changes that don't reallocate power until the future, so that they are not in competition with people seeking power today. A similar argument would apply to other domains where power can be accrued but where competitors are not seeking power.

Andreas_Mogensen

Mar 17

One reason you might believe in a difference in terms of tractability is the stickiness of extinction, and the lack of stickiness attaching to things like societal values. Here's very roughly what I have in mind, running roughshod over certain caveats and the like. The case where we go extinct seems highly stable, of course. Extinction is forever. If you believe some kind of 'time of perils' hypothesis, surviving through such a time should also result in a scenario where non-extinction is highly stable. And the case for longtermism arguably hinges considerably on such a time of perils hypothesis being true, as David argues. By contrast, I think it's natural to worry that efforts to alter values and institutions so as to beneficially effect the very long-run by nudging us closer to the very best possible outomces are far more vulnerable to wash-out. The key exception would be if you suppose that there will be some kind of lock-in event. So does the case for focusing on better futures work hinge crucially, in your view, on assigning significant confidence to lock-in events occuring within the near-term?

William_MacAskill

Mar 17

Yeah, I think that lock-in this century is quite a bit more likely than extinction this century. (Especially if we're talking about hitting a points of no return for total extinction.) That's via two pathways: - AGI-enforced institutions (including AGI-enabled immortality of rulers). - Defence-dominance of star systems I do think that "path dependence" (a broader idea than lock-in) is a big deal, but most of the long-term impact of that goes via a billiards dynamic: path-dependence on X, today, affects some lock-in event around X down the road. (Where e.g. digital rights and space governance are plausible here.)

Andreas_Mogensen

Mar 17

I think my gut reaction is to judge extinction this century as at least as likely as lock-in, though a lot might depend on what's meant by lock-in. But I also haven't thought about this much!

Christopher Clay

Mar 17

I see the argument about the US Government's statistical value of a life used a lot - and I'm not sure if I agree. I don't think it echoes public sentiment - rather a government's desire to remove itself of blame. Note how much more is spent per life on say, air transport than disease prevention.

Toby Tremlett🔹Mar 1811

Yeah I've always been a bit sceptical of this as well. Surely it's just a yardstick that a department uses to decide between which investments it should make, rather than a considered (or even descriptive) "value of a life" for the US Government.
Descriptively - the US government would spend far more for a few lives if those lives were hostages of a foreign adversary, and probably has far less willingness to pay for cheap ways the US govt could save lives (idk what these are, probably there are examples in public health).
Basically - I don't think it's a number that can be meaningfully extrapolated to figure out the value of avoiding extinction or catastrophe, because the number was designed with far smaller trade-offs in mind, and doesn't really make sense outside of its intended purpose.

David_Moss

Mar 17

This might vary between: * The level of the abstract memes: * I agree "reducing risk of extinction (potentially in the near term)" may be more appealing than "longtermist trajectory change" * The level of concrete interventions: * "Promoting democracy" (or whatever one decides promotes long term value) might be more appealing than "reducing risk from AI"[1] (though there is likely significant variation within concrete interventions). 1. ^ Though our initial work does not suggest this.

finmMar 1723

The main question of the debate week is: “On the margin, it is better to work on reducing the chance of our extinction than increasing the value of the future where we survive”.

Where “our” is defined in a footnote as “earth-originating intelligent life (i.e. we aren’t just talking about humans because most of the value in expected futures is probably in worlds where digital minds matter morally and are flourishing)”.

I'm interested to hear from the participants how likely they think extinction of “earth-originating intelligent life” really is this century. Note this is not the same as asking what your p(doom) is, or what likelihood you assign to existential catastrophe this century.

My own take is that literal extinction of intelligent life, as defined, is (much) less than 1% likely to happen this century, and this upper-bounds the overall scale of the “literal extinction” problem (in ITN terms). I think this partly because the definition counts AI survival as non-extinction, and I truly struggle to think of AI-induced catastrophes leaving only charred ruins, without even AI survivors. Other potential causes of extinction, like asteroid impacts, seem unlikely on their own terms. As s... (read more)

William_MacAskill

Mar 17

I think what we should be talking about is whether we hit the "point of no return" this century for extinction of Earth-originating intelligent life. Where that could mean: Homo sapiens and a most other mammals get killed off in an extinction event this century; then technologically-capable intelligence never evolves again on Earth; so all life dies off within a billion years or so. (In a draft post that you saw of mine, this is what I had in mind.) The probability of this might be reasonably high. There I'm at idk 1%-5%.

OscarD🔸

Mar 17

Notably, the extinction event in this scenario is non-AI related I assume? And needs to occur before we have created self-sufficient AIs.

Toby Tremlett🔹

Mar 17

I think the "earth-originating intelligent life" term should probably include something that indicates sentience/ moral value. Perhaps you could read that into "intelligent" but that feels like a stretch. But I didn't want to imply that a world with no humans but many non-conscious AI systems would count as anything but an extinction scenario - that's one of the key extinction scenarios.

tylermjohn

Mar 17

Yeah, do you have other proposed reconceptualisations of the debate? One shower thought I've had is that maybe we should think of the debate as about whether to focus on ensuring that humans have final control over AI systems or ensuring that humans do good things with that control. But this is far from perfect.

Andreas_Mogensen

Mar 17

If we assume that natural risks are negligible, I would guess that this probably reduces to something like the question of what probability you put on extinction or existential catastrophe due to anthropogenic biorisk? Since biorisk is likely to leave much of the rest of Earthly life unscathed, it also hinges on what probability you assign to something like "human level intelligence" evolving anew. I find it reasonably plausible that a cumulative technological culture of the kind that characterizes human beings is unlikely to be a convergent evolutionary outcome (for the reasons given in Powell, Contingency and Convergence), and thus if human beings are wiped out, there is very little probability of similar traits emerging in other lineages. So human extinction due to a bioengineered pandemic strikes me as maybe the key scenario for the extinction of earth-originating intelligent life. Does that seem plausible?

Davidmanheim

Mar 17

I would add the (in my view far more likely) possibility of Yudkowskian* paperclipping via non-sentient AI, which given our currently incredibly low level of control of AI systems, and the fact that we don't know how to create sentience, seems like the most likely default. *) Specifically, the view that paperclipping occurs by default from any complex non-satiable implicit utility function, rather than the Bostromian paperclipping risk of accidentally giving a smart AI a dumb goal.

Greg_Colbourn ⏸️

Mar 17

I think it hinges on whether our AI successors would be counted as "life", or whether they "matter morally". I think the answer is likely no to both[1]. Therefore the risk of extinction boils down to risk of misaligned ASI wiping out the biosphere. Which I think is ~90% likely this century on the default trajectory, absent a well enforced global moratorium on ASI development. 1. ^ Or at least "no" to the latter; if we consider viruses to be life that don't matter morally, or are in fact morally negative, we can consider (default rogue) ASI to be similar.

OscarD🔸Mar 1716

If the true/best/my subjective axiology is linear in resources (e.g. total utilitarianism), lots of 'good' futures will probably capture a very small fraction of how good the optimal future could have been. Conversely, if axiology is not linear in resources (e.g. intuitive morality, average utilitarianism), good futures seem more likely to be nearly optimal. Therefore whether axiology is linear in resources is one of the cruxes for the debate week question.

Discuss.

William_MacAskill

Mar 17

The easiest way, in my view, to make a near-optimal future very likely, conditional on non-extinction, is if value is bounded above. There's an argument that this is the common sense view. E.g. consider: Common-sense Eutopia: In the future, there is a very large population with very high well-being; those people are able to do almost anything they want as long as they don’t harm others. They have complete scientific and technological understanding. War and conflict are things of the past. Environmental destruction has been wholly reversed; Earth is now a natural paradise. However, society is limited only to the solar system, and will come to an end once the Sun has exited its red giant phase, in about five billion years. Does this seem to capture less than one 10^22th of all possible value? (Because there are ~10^22 affectable stars, so civilisation could be over 10^22 times as big). On my common-sense moral intuitions, no. Making this argument stronger: Normally, quantities of value are defined are in terms of the value of risky gambles. So what it means to say that Common-sense Eutopia is less than one 10^22th of all possible value is that a gamble with a one in 10^22 chance of producing an ideal-society-across-all-the-stars, and a (1 - 1/10^22) chance of near-term extinction, is better than producing Common-sense Eutopia for certain. But that seems wild. Of all the issues facing classical utilitarianism, this seems the most problematic to me.

tylermjohn

Mar 17

Yes. So you doubt fanaticism, the view that a tiny chance of an astronomically good outcome can be more valuable than a certainty of a decent outcome. What about in the case of certainty? Do you doubt the utilitarian's objection to Common-sense Eutopia? This kind of aggregation seems important for the case for longtermism. (See the pages of paper dolls in What We Owe the Future.)

William_MacAskill

Mar 17

Yeah, I think the issue (for me) is not just about fanaticism. Give me Common-sense Eutopia or a gamble with a 90% chance of extinction and a 10% chance of Common-sense Eutopia 20 times the size, and it seems problematic to choose the gamble. (To be clear - other views, on which value is diminishing, are also really problematic. We're in impossibility theorem territory, and I see the whole things as a mess; I don't have a positive view I'm excited about.) Re WWOTF: You can (and should) think that there's huge amounts of value at stake in the future, and even think that there's much more value at stake in the future than there is in the present century, without thinking that value is linear in number of happy people. It diminishes the case a bit, but nowhere near enough for longtermism to not go through.

Lukas_Gloor

Mar 17

As you say you can block the obligation to gamble and risk Common-sense Eutopia for something better in different ways/for different reasons. For me, Common-sense Eutopia sounds pretty appealing because it ensures continuity for existing people. Considering many people don't have particularly resource-hungry life goals, Common-sense Eutopia would score pretty high on a perspective where it matters what existing people want for the future of themselves and their loved ones. Even if we say that other considerations besides existing people also matter morally, we may not want those other considerations to just totally swamp/outweigh how good Common-sense Eutopia is from the perspective of existing people.

tylermjohn

Mar 17

Sure you could have a view that it's great to have 10^12 people, but no more than that, but that seems like a really weird thing to have written in the stars. Or that all that matters is creating the Machine God, so we haven't attained any value yet. But that doesn't seem great. Do you have a gloss on a kind of view that threads the needle nicely without being too crazy, even if it doesn't ultimately withstand scrutiny?

Davidmanheim

Mar 17

How much do you think that having lots of mostly or entirely identical future lives is differently valuable than having vastly different positive lives? (Because that would create a reasonable view on which a more limited number of future people can saturate the possible future value.)

OscarD🔸

Mar 17

Bostrom discusses things like this in Deep Utopia, under the label of 'interestingness' (where even if we edit post-humans to never be subjectively bored, maybe they run out of 'objectively interesting' things to do and this leads to value not being nearly as high as it could otherwise be). I don't think he takes a stance on whether or how much interestingness actually matters, but I am only ~half way through the book so far.

Seth Herd

Mar 19

This seems almost exactly like the repugnant conclusion. Taken to extremes, intuition disagrees with logic. When that happens, it's usually the worse for intuition. I'm not a utilitarian, but I find the repugnant conclusion impossible to reject if you are. If you want chose what is good for everyone, there's little argument what that is in those cases. And if we're talking about what's good for everyone, that's got to be a linear sum of what's good for each someone. If the sum is nonlinear, who exactly is worth less than the others? This leads to the repugnant conclusion and your conclusion here. Other definitions of "good for everyone" seem to always mean "what I idiosyncratically prefer for everyone else but me".

tylermjohn

Mar 17

There are funky axiologies where value is superlinear in resources — basically any moral worldview that embraces holism. If you think that the whole, arranged a particular way, is more valuable than the parts, then you will be even more precious about how precisely the world should be arranged than the total utilitarian.

Michael St Jules 🔸

Mar 17

Since others have discussed the implications, I want to push a bit on the assumptions. I worry that non-linear axiologies[1] end up endorsing egoism, helping only those whose moral patienthood you are most confident in or otherwise prioritizing them far too much over those of less certain moral patienthood. See Oesterheld, 2017 and Tarsney, 2023. (I also think average utilitarianism in particular is pretty bad, because it would imply that if the average welfare is negative (even torturous), adding bad lives can be good, as long as they're even slightly less bad than average.) Maybe you can get around this with non-aggregative or partially aggregative views. EDIT: Or, if you're worried about fanaticism, difference-making views. 1. ^ Assuming completeness, transitivity and the independence of irrelevant alternatives and each marginal moral patient matters less.

OscarD🔸

Mar 17

I also think average utilitarianism doesn't seem very plausible. I was just using it as an example of a non-linear theory (though as Will notes if any individual is linear in resources so is the world as a whole, just with a smaller derivative).

William_MacAskill

Mar 17

Unpacking this: on linear-in-resources (LIR) views, we could lose out on most value if we (i) capture only a small fraction of resources that we could have done, and/or (ii) use resources in a less efficient way than we could have done. (Where on a LIR view, there is some use of resources that has the highest value/unit of resources, and everything should be used in that way.) Plausibly at least, only a tiny % of possible ways of using resources are close to the value produced by the highest value/unit of resources use. So, the thinking goes, merely getting non-extinction isn't yet getting you close to a near-best future - instead you really need to get from a non-extinction future to that optimally-used-resources future, and if you don't then you lose out on almost all value.

William_MacAskill

Mar 17

Average utilitarianism is approx linear in resources as long as at least one possible individual's wellbeing is linear in resources. I.e. we create Mr Utility Monster, who has wellbeing that is linear in resources, and give all resources to benefiting Mr Monster. Total value is the same as it would be under total utilitarianism, just divided by a constant (namely, the number of people who've ever lived).

Andreas_Mogensen

Mar 17

I wasn't sure if it's really useful to think about value being linear in resources on some views. If you have a fixed population and imagine increasing the resources they have available, I assume that the value of the outcome is a strictly concave function of the resource base. Doubling the population might double the value of the outcome, although it's not clear that this constitutes a doubling of resources. And why should it matter if the relationship between value and resources is strictly concave? Isn't the key question something like whether there are potentially realizable futures that are many orders of magnitude more valuable than the default or where we are now? Answering yes seems compatible with thinking that the function relating resources to value is strictly concave and asymptotes, so long as it asymptotes somewhere suitably high up on the scale of value.

William_MacAskill

Mar 17

Certainly given current levels of technology, but perhaps not given future technology (e.g. indefinite life-extension technology), at least if individual wellbeing is proportional to number of happy years lived. "Doubling the population might double the value of the outcome, although it's not clear that this constitutes a doubling of resources." I was thinking you'd need twice as many resources to have twice as many people? "And why should it matter if the relationship between value and resources is strictly concave? Isn't the key question something like whether there are potentially realizable futures that are many orders of magnitude more valuable than the default or where we are now? Answering yes seems compatible with thinking that the function relating resources to value is strictly concave and asymptotes, so long as it asymptotes somewhere suitably high up on the scale of value. " Yes, in principle, but I think that if you have the upper-bound view, then you do so on the basis of common-sense intuition. But if so, then I think the upper bound is probably really low in cosmic scales - like, if we already have a Common Sense Eutopia within the solar system, I think we'd be more than 50% of the way from 0 to the upper bound.

Andreas_Mogensen

Mar 17

Another reason you might have an upper bound is that the axioms of expected utility theory require your utility function to be bounded given the most natural generalization to the case of countably infinite gambles.

William_MacAskill

Mar 17

Agreed!

Michael St Jules 🔸Mar 1714

Starting my own discussion thread.

My biggest doubt for the value of extinction risk reduction is my (asymmetric) person-affecting intuitions: I don't think it makes things better to ensure future people (or other moral patients) come to exist for their own sake or the sake of the value within their own lives. But if future people will exist, I want to make sure things go well for them. This is summarized by the slogan "Make people happy, not make happy people".

If this holds, then extinction risk reduction saves the lives of people who would otherwise die in an extinction event, which is presumably good for them, but this is only billions of humans.^[1] If we don't go extinct, then the number of our descendant moral patients could be astronomical. It therefore seems better to prioritize our descendant moral patients conditional on our survival because there are far far more of them.

Aliens (including alien artificial intelligence) complicate the picture. We (our descendants, whether human, AI or otherwise) could

use the resources aliens would have otherwise for our purposes instead of theirs, i.e. replace them,
help them, or
harm them or be harmed by them, e.g. through conflict.

&nbs... (read more)

William_MacAskill

Mar 17

Surely any of our actions changes who exists in the future? So we aren't in fact benefiting them? (Whereas we can benefit specific aliens, e.g. by leaving our resources for them - our actions today don't affect the identities of those aliens.)

Michael St Jules 🔸

Mar 17

Yes, we probably aren't benefitting future individuals in a strict and narrow person-affecting sense much or at all. However, there are some other person-affecting views that are concerned with differences for future moral patients: 1. On wide person-affecting views, if Alice would have a better life than Bob, then it's better for Alice to come to exist than for Bob to come to exist, all else equal (the Nonidentity problem). This doesn't imply it's better to ensure Alice or Bob exists than neither does, though. (See Thomas, 2019 or Meacham, 2012 for examples). 2. On asymmetric person-affecting views, it can still be good to prevent bad lives. (This needn't imply antinatalism, because it could be that good lives can offset bad lives (Thomas, 2019, Pummer, 2024).)

Andreas_Mogensen

Mar 17

I don't remember 100%, but I think that Thomas and Pummer might both not be arguing for or articulating an axiological theory that ranks outcomes as better or worse, but rather a non-consequentialist theory of moral obligations/oughts. For my own part, I think views like that are a lot more plausible, but the view that it doesn't make the outcome better to create additional happy lives seems to me very hard to defend.

Michael St Jules 🔸

Mar 17

I think Thomas did not take a stance on whether it was axiological or deontic in the GPI working paper, and instead just described the structure of a possible view. Pummer described his as specifically deontic and not axiological. I'm not sure what should be classified as axiological or how important the distinction is. I'm certainly giving up the independence of irrelevant alternatives, but I think I can still rank outcomes in a way that depends on the option set.

Andreas_Mogensen

Mar 17

I tend to be sceptical of appeals to value as option-set dependent as a means of defending person-affecting views, for the reason that we needn't imagine outcomes as things that someone is able to choose to bring about, as opposed to just something that happens to be the case. If you imagine the possible outcomes this way, then you can't appeal to option-set dependence to block the various arguments, since the outcomes are not options for anyone to realize. And if, say, it makes the outcome better if an additional happy person happens to exist without anyone making it so, then it is hard to see why it should be otherwise when someone brings about that the additional happy person exists. (Compare footnote 9 in this paper/report.)

Michael St Jules 🔸

Mar 17

Hmm, interesting. I think this bit from the footnote helped clarify, since I wasn't sure what you meant in your comment: I might be inclined to compare outcome distributions using the same person-affecting rules as I would for option sets, whether or not they're being chosen by anyone. I think this can make sense on actualist person-affecting views, illustrated with my "Best in the outcome argument"s here, which is framed in terms of betterness (between two outcome distributions) and not choice. (The "Deliberation path argument" is framed in terms of choice.) Then, I'd disagree with this:

William_MacAskill

Mar 17

Nice argument, I hadn't heard that before!

Andreas_Mogensen

Mar 17

I'm pretty sure that Broome gives an argument of this kind in Weighing Lives!

Andreas_Mogensen

Mar 17

I tend to think that the arguments against any theory of the good that encodes the intuition of neutrality are extremely strong. Here's one that I think I owe to Teru Thomas (who may have got it from Tomi Francis?). Imagine the following outcomes, A - D, where the columns are possible people, the numbers represent the welfare of each person when they exist, and # indicates non-existence. A 5 -2 # # B 5 # 2 2 C 5 2 2 # D 5 -2 6 # I claim that if you think it's neutral to make happy people, there's a strong case that you should think that B isn't better than A. In other words, it's not better to prevent someone from coming to exist and enduring a life that's not worth living if you simultaneously create two people with lives worth living. And that's absurd. I also think it's really hard to believe if you believe the other side of the asymmetry: that it's bad to create people whose lives are overwhelmed by suffering. Why is there pressure on you to accept that B isn't better than A? Well, first off, it seems plausible that B and C are equally good, since they have the same number of people at the same welfare levels. So let's assume this is so. Now, if you accept utilitarianism for a fixed population, you should think that D is better than C, since all the same people exist in these outcomes, and there's more total/average welfare. (I'm pretty sure you can support this kind of verdict on weaker assumptions if necessary.) So let's suppose, on this basis, that D is better than C. B and C are equally good. I assume it follows that D is better than B. Suppose that B were better than A. Since D is better than B, it would follow that D is better than A as well. But we know this can't be so, if it's neutral to make happy people, because D and A differ only in the existence of an extra person who has a life worth living. The neutrality principle entails that D isn't better than B. But it's absurd to think that B isn't better than A. Arguments like this mak

Michael St Jules 🔸

Mar 17

I might go back and forth on whether "the good" exists, as my subjective order over each set of outcomes (or set of outcome distributions). This example seems pretty compelling against it. However, I'm first concerned with "good/bad/better/worse to someone" or "good/bad/better/worse from a particular perspective". Then, ethics is about doing better by and managing tradeoffs between these perspectives, including as they change (e.g. with additional perspectives created through additional moral patients). This is what my sequence is about. Whether "the good" exists doesn't seem very important.

Lukas_Gloor

Mar 17

If we imagine that world C already exists, then yeah, we should try to change C into D.(Similarly, if world D already exists, we'd want to prevent changes from D to C.) So, if either of the two worlds already exists, D>C. Where the way you're setting up this argument turns controversial, though, is when you suggest that "D>C" is valid in some absolute sense, as opposed to just being valid (in virtue of how it better fulfills the preferences of existing people) under the stipulation of starting out in one of the worlds (that already contains all the relevant people). Let's think about the case where no one exists so far, where we're the population planners for a new planet that can either shape into C or D. (In that scenario, there's no relevant difference between B and C, btw.) I'd argue that both options are now equally defensible because the interests of possible people are under-defined* and there are defensible personal stances on population ethics for justifying either.** *The interests of possible people are underdefined not just because it's open how many people we might create. In addiiton, it's also open who we might create: Some human psychological profiles are such that when someone's born into a happy/priviledged life, they adopt a Buddhist stance towards existence and think of themselves as not having benefitted from being born. Other psychological profiles are such that people do think of themselves as grateful and lucky for having been born. (In fact, others yet even claim that they'd consider themselves lucky/grateful even if their lives consisted of nothing but torture). These varying intuitions towards existence can inspire people's population-ethical leanings. But there's no fact of the matter of "which intuitions are more true." These are just difference interpretations for the same sets of facts. There's no uniquely correct way to approach population ethics. **Namely, C is better on anti-natalist harm reduction grounds (at least depending

Michael St Jules 🔸

Mar 17

I'll get back to you on this, since I think this will take me longer to answer and can get pretty technical.

tylermjohn

Mar 17

How asymmetric do you think things are? I tend to deprioritise s-risks (both accidental and intentional) because it seems like accidental suffering and intentional suffering will be a very small portion of the things that our descendants choose to do with energy. In everyday cases I don't feel a pull to putting a lot of weight on suffering. But I feel more confused when we get to tail cases. Maximising pleasure intuitively feels meh to me, but maximising suffering sounds pretty awful. So I worry that (1) all of the value is in the tails, as per Power Laws of Value and (2) on my intuitive moral tastes the good tails are not that great and the bad tails are really bad.

Michael St Jules 🔸

Mar 17

I think I'm ~100% on no non-instrumental benefit from creating moral patients. Also pretty high on no non-instrumental benefit from creating new desires, preferences, values, etc. within existing moral patients. (I try to develop and explain my views in this sequence.) I haven't thought a lot about tradeoffs between suffering and other things, including pleasure, within moral patients that would exist anyway. I could see these tradeoffs going like they would for a classical utilitarian, if we hold an individual's dispositions fixed. To be clear, I'm a moral anti-realist (subjectivist), so I don't think there's any stance-independent fact about how asymmetric things should be. Also, I'm curious if we can explain why you react like this: Some ideas: Complexity of value but not disvalue or the urgency of suffering is explained by the intensity of desire, not unpleasantness? Do you have any ideas?

OscarD🔸

Mar 17

(I have not read all of your sequence.) I'm confused how being even close to 100% on something like this is appropriate, my sense is generally just that population ethics is hard, humans have somewhat weak minds in the space of possible minds, and our later post-human views on ethics might be far more subtle or quite different.

Michael St Jules 🔸

Mar 17

I'm a moral anti-realist (subjectivist), so I don't think there's an objective (stance-independent) fact of the matter. I'm just describing what I would expect to continue to endorse under (idealized) reflection, which depends on my own moral intuitions. The asymmetry is one of my strongest moral intuitions, so I expect not to give it up, and if it conflicts with other intuitions of mine, I'd sooner give those up instead.

Davidmanheim

Mar 17

Worth pointing out that extinction by almost any avenue we're discussing seriously would kill a lot of people who already exist.

-1

Greg_Colbourn ⏸️

Mar 17

I think in practical terms this isn't mutually exclusive with ensuring our survival. The immediate way to secure our survival, at least for the next decade or so, is a global moratorium on ASI. This also reduces s-risks from ASI, and keeps our options open for reducing human-caused s-risk (i.e. we can still avoid factory farming in space colonization).

Michael St Jules 🔸Mar 1712

That seems true, but I'm not convinced it's the best way to reduce s-risks on the margin. See, for example, Vinding, 2024.

I'd also want to see a fuller analysis of ways it could backfire. For example, a pause might make multipolar scenarios more likely by giving more groups time to build AGI, which could increase the risks of conflict-based s-risks.

Greg_Colbourn ⏸️

Mar 17

That wouldn't really be a pause! A proper Pause (or moratorium) would include a global taboo on AGI research to the point where as few people would be doing it as are working on eugenics now (and they would be relatively easy to stop).

Michael St Jules 🔸

Mar 17

A pause would still give more groups more time catch up on existing research and to build infrastructure for AGI (energy, datacenters), right? Then when the pause is lifted, we could have more players at the research frontier and ready to train frontier models.

Greg_Colbourn ⏸️

Mar 17

Any realistic Pause would not be lifted absent a global consensus on proceeding with whatever risk remains.

Michael St Jules 🔸

Mar 21

Couldn't a country just opt out unilaterally, and then others follow suit? And should we trust their assessment of s-risks even if proceeding by global consensus?

Greg_Colbourn ⏸️

Mar 21

Having the superpowers on board is the main thing. If others opt out, then enforcement against them can be effective in that case. No, but it's far better than what we have now.

Greg_Colbourn ⏸️

Mar 17

Vinding says: But he does not justify this equality. It seems highly likely to me that ASI-induced s-risks are on a much larger scale than human-induced ones (down to ASI being much more powerful than humanity), creating a (massive) asymmetry in favour of preventing ASI.

Jordan ArelMar 1611

Will MacAskill stated in a recent 80,000 hours podcast that he believes marginal work on trajectory change toward a best possible future rather than a mediocre future seems likely significantly more valuable than marginal work on extinction risk.

Could you explain what the key crucial considerations are for this claim to be true, and a basic argument for why think each of the crucial considerations resolves in favor of this claim?

Would also love to hear if others have any other crucial considerations they think weigh in one direction or the other.

tylermjohn

Mar 17

Will is thinking about this much more actively and will give the best answer, but here are some key crucial considerations: * How tractable is extinction risk reduction and trajectory change work? * As a part of that, are there ways that we can have a predictable and persistent effect on the value of the long-term future other than by reducing extinction risk? * How good is the future by default? * How good are the best attainable futures? These are basically Tractability and Importance from the INT framework. Some of the biggest disagreements in the field are over how likely we are to achieve eutopia by default (or what % of eutopia we will achieve) and what, if anything, can be done to predictably shape the far future. Populating and refining a list of answers to this last question has been a lot of the key work of the field over the past few years.

Jordan Arel

Mar 21

Thanks Tyler! I think this is spot on. I am nearing the end of writing a very long report on this type of work so I don’t have time at the moment to write a more detailed reply (and what I’m writing is attempting to answer these questions). One thing that really caught my eye was when you mentioned: I am deeply interested in this field, but not actually sure what is meant by “the field.” Could you point me to what search terms to use and perhaps the primary authors or research organizations who have published work on this type of thing?”

Greg_Colbourn ⏸️

Mar 19

I think another crucial consideration is how likely, and near, extinction is. If it is near, with high likelihood (and I think it is down to misaligned ASI being on the horizon), then it's unlikely there will be time for trajectory change work to bear fruit.

Greg_Colbourn ⏸️

Mar 17

I think Will MacAskill and Finn Morehouse's paper rests on the crucial consideration that aligning ASI is possible (by anyone at all). They haven't established this (EDIT: by this I mean they don't cite to any supporting arguments for this, rather than personally coming up with the arguments themselves. But as far as I know, there aren't any supporting arguments for the assumption, and in fact there are good arguments on the other side for why aligning ASI is fundamentally impossible).

Davidmanheim

Mar 17

This seems like a really critical issue, and I'd be very interested in hearing whether this is disputed by @tylermjohn / @William_MacAskill.

tylermjohn

Mar 17

I think there is a large minority chance that we will successfully align ASI this century, so I definitely think it is possible.

Davidmanheim

Mar 17

To clarify, do you think there's a large minority change that it is possible to align an arbitrarily powerful system, or do you think there is a large minority chance that it is going to happen with the first such arbitrarily powerful system, such that we're not locked in to a different future / killed by a misaligned singleton?

Greg_Colbourn ⏸️

Mar 17

Why do you think this? What make you think that it's possible at all?[1] And what do you mean by "large minority"? Can you give an approximate percentage? 1. ^ Or to paraphrase Yampolskiy: what makes it possible for a less intelligent species to indefinitely control a more intelligent species (when this has never happened before)?

Davidmanheim

Mar 17

To respond to Yampolskiy without disagreeing with the fundamental point, I think it's definitely possible for a less intelligent species to align or even indefinitely control a boundedly and only slightly more intelligent species, especially given greater resources, speed, and/or numbers, and sufficient effort. The problem is that humans aren't currently trying to limit the systems or trying much to monitor, much less robustly align or control them.

Greg_Colbourn ⏸️

Mar 17

Fair point. But AI is indeed unlikely to top out at merely "slighlty more" intelligent. And it has the potential for a massive speed/numbers advantage too.

Davidmanheim

Mar 17

Yes, by default self-improving AI goes very poorly, but this is a plausible case where would could have aligned AGI, if not ASI.

William_MacAskillMar 1710

(Crossposted from a quicktake I just did).

Clarifying "Extinction"

I expect this debate week to get tripped up a lot by the term “extinction”. So here I’m going to distinguish:

Human extinction — the population of Homo sapiens, or members of the human lineage (including descendant species, post-humans, and human uploads), goes to 0.
Total extinction — the population of Earth-originating intelligent life goes to 0.

Human extinction doesn’t entail total extinction. Human extinction is compatible with: (i) AI taking over and creating a civilisation for as long as it can; (ii) non-human biological life evolving higher intelligence and building a (say) Gorilla sapiens civilisation.

The debate week prompt refers to total extinction. I think this is conceptually cleanest. But it’ll trip people up as it means that most work on AI safety and alignment is about “increasing the value of futures where we survive” and not about “reducing the chance of our extinction” — which is very different than how AI takeover risk has been traditionally presented. I.e. you could be strongly in favour of "increasing value of futures in which we survive" and by that mean that the most important thi... (read more)

Greg_Colbourn ⏸️

Mar 17

So in the debate week statement (footnote 2) it says "earth-originating intelligent life". What if you disagree that AI counts as "life"? I expect that a singleton ASI will take over and will not be sentient or conscious, or value anything that humans value (i.e. the classic Yudkowskian scenario).

William_MacAskill

Mar 17

Why so confident that: - It'll be a singleton AI that takes over - That it will not be conscious? I'm at 80% or more that there will be a lot of conscious AIs, if AI takes over.

Greg_Colbourn ⏸️

Mar 17

Interesting. What makes you confident about AI consciousness?

Greg_Colbourn ⏸️

Mar 17

Not sure why this is downvoted, it isn't a rhetorical question - I genuinely want to know the answer.

Davidmanheim

Mar 17

I'm surprised you think future AI would be so likely to be conscious, given the likely advantages of creating non-conscious systems in terms of simplicity and usefulness. (If consciousness is required for much greater intelligence, I would feel differently, but that seems very non-obvious!)

Greg_Colbourn ⏸️

Mar 17

Not be conscious: shares no evolutionary history or biology with us (I guess it's possible it could find a way to upload itself into biology though..)

finmMar 1714

Do you think octopuses are conscious? I do — they seem smarter than chickens, for instance. But their most recent common ancestor with vertebrates was some kind of simple Precambrian worm with a very basic nervous systems.

Either that most recent ancestor was not phenomenally conscious in the sense we have in mind, in which case consciousness arose more than once in the tree of life. Or else it was conscious, in which case consciousness would seem easy to reproduce (wire together some ~1,000 nerves).

Greg_Colbourn ⏸️

Mar 17

I could believe consciousness arose more than once in the tree of life (convergent evolution has happened for other things like eyes and flight). But also, it's probably a sliding scale, and the simple ancestor may well be at least minimally conscious. Fair point. AI could well do this (and go as far as uploading into much larger biological structures, as I pointed to above).

Greg_Colbourn ⏸️

Mar 17

I don't think this is likely to happen though, absent something like moral realism being true, centred around sentient experiences, and the AI discovering this.

Greg_Colbourn ⏸️

Mar 17

Singleton takeover seems very likely simply down to the speed advantage of the first mover (at the sharp end of the intelligence explosion it will be able to do subjective decades of R&D before the second mover gets off the ground, even if the second mover is only hours behind).

finm

Mar 17

Where are you getting those numbers from? If by “subjective decades” you mean “decades of work by one smart human researcher”, then I don't think that's enough to secure it's position as a singleton. If you mean “decades of global progress at the global tech frontier” then imagining that the first-mover can fit ~100 million human research-years into a few hours shortly after (presumably) pulling away from the second-mover in a software intelligence explosion, then I'm skeptical (for reasons I'm happy to elaborate on).

Greg_Colbourn ⏸️

Mar 17

Thinking about it some more, I think I mean something more like "subjective decades of strategising and preparation at the level of intelligence of the second mover", so it would be able to counter anything the second mover does to try and gain power. But also there would be software intelligence explosion effects (I think the figures you have in your footnote 37 are overly conservative - human level is probably closer to "GPT-5").

Maxime Riché 🔸Mar 169

Thank you for organizing this debate!

Here are several questions. They are related to two hypotheses, that could, if both significantly true, make impartial longtermists update the value of Extinction-Risk reduction downward (potentially by 75% to 90%).

Civ-Saturation Hypothesis: Most resources will be claimed by Space-Faring Civilizations (SFCs) regardless of whether humanity creates an SFC.
Civ-Similarity Hypothesis: Humanity's Space-Faring Civilization would produce utility similar to other SFCs (per unit of resource controlled).

For context... (read more)

tylermjohn

Mar 17

Civ-Saturation seems plausible, though only if there are other agents in the affectable universe. I don't have a good view on this, and yours is probably better. Civ-Similarity seems implausible. I at least have some control over what humans do in the future, so I can steer things towards the futures I judge best. I don't have any control over what aliens do. And there are large differences between the best and middling futures as I argue in Power Laws of Value .

Maxime Riché 🔸

Mar 17

You can find a first evaluation of the Civ-Saturation hypothesis in Other Civilizations Would Recover 84+% of Our Cosmic Resources - A Challenge to Extinction Risk Prioritization. It seems pretty accurate as long as you assume EDT. > Civ-Similarity seems implausible. I at least have some control over what humans do in the future Maybe there is a misunderstanding here. The Civ-Similarity is not about having control; it is not about marginal utility. It is that the expected utility (not the marginal) produced by space-faring civilizations given either human ancestry or alien ancestry, are similar. The single strongest argument in favour of this hypothesis is that we are too uncertain about how conditioning on human ancestry or alien ancestry changes the utility produced in the far future by a space-faring civilization. We are too uncertain to say that U(far future | human ancestry) significantly differs from U(far future | alien ancestry).

tylermjohn

Mar 17

No, I don't think there's a misunderstanding. It's more that I think the future could go many different ways with wide variance in expected value, and I can shape the direction the human future goes but I cannot shape the direction that alien futures go. What do you think about just building and letting misaligned AGI loose? That seems fairly similar to letting other civilisations take over. (Apologies that I haven't read your evaluation.)

Maxime Riché 🔸

Mar 20

I think it's a better model to think about humanity and aliens ICs as randomly sampled from among Intelligent Civilizations (ICs) with the potential to create a space-faring civilization. Alien civilizations also have a chance at succeeding at aligning their ASI with positive moral values. Thus, by assuming the Mediocrity Principle, we can say that the expected value produced by both is similar (as long as we don't gain information that we are different). Misaligned AIs, are not sampled from this distribution. Thus, letting loose a misaligned AGI does not produce the same expected value. I.e. letting loose humanity is equivalent to letting loose an alien IC (if we can't predict their differences and the impact of such differences), but letting loose a misaligned AGI does not produce the same expected value. I hope that makes sense. You can also see the comment by MacAskill just below. For clarity, I think that letting loose a misaligned AGI is strongly negative, as argued in posts I published.

William_MacAskill

Mar 17

Thanks! I haven't read your stuff yet, but it seems like good work; and this has been a reason in my mind for being more in favour of trajectory change than totla extinction reduction for a while. It would only reduce the value of extinction risk reduction by an OOM at most, though? I'm sympathetic to something in Mediocrity direction (for AI-built civilisations as well as human-built civilisations), but I think it's very hard to have a full-blooded Mediocrity principle if you also think that you can take actions today to meaningfully increase or decrease the value of Earth-originating civilisation. Suppose that Earth-originating civilisation's value is V, and if we all worked on it we could increase that to V+ or to V-. If so, then which is the right value for the alien civilisation? Choosing V rather than V+ or V- (or V+++ or V--- etc) seems pretty arbitrary. Rather, we should think about how good our prospects are compared to a random draw civilisation. You might think we're doing better or worse, but if it's possible for us to move the value of the future around, then it seems we should be able to reasonably think that we're quite a bit better (or worse) than the random draw civ.

Maxime Riché 🔸

Mar 17

Right, at most, one OOM. Higher updates would require us to learn that the universe is more Civ-Saturated than our current best guess. This could be the case if: - humanity's extinction would not prevent another intelligent civilization from appearing quickly on Earth - OR that intelligent life in the universe is much more frequent (e.g., to learn that intelligent life can appear around red dwarfs whose lifespan is 100B to 1T years). I guess, as long as V ~ V+++ ~ V--- (like the relative difference is less than 1%), then it is likely not a big issue. However, the relative difference may become large only when we become significantly more certain about the impact of our actions, e.g., if we are the operators choosing the moral values of the first ASI.

LizkaMar 178

What actually changes about what you’d work on if you concluded that improving the future is more important on the current margin than trying to reduce the chance of (total) extinction (or vice versa)?

Curious for takes from anyone!

OscarD🔸

Mar 17

It felt surprisingly hard to come up with important examples of this, I think because there is some (suspicious?) convergence between both extinction prevention and trajectory changing via improving the caution and wisdom with which we transition to ASI. This both makes extinction less likely (through more focus on alignment and control work, and perhaps slowing capabilities progress or differential accelerating safety-oriented AI applications) and improves the value of surviving futures (by making human takeovers, suffering digital minds etc less likely). But maybe this is just focusing on the wrong resolution. Breaking down 'making the ASI transition wiser', if we are mainly focused on extinction, AI control looks especially promising but less so otherwise. Digital sentience and rights work looks better if trajectory changes dominate, though not entirely. Improving company and government (especially USG) understanding of relevant issues seems good for both. Obviously, asteroids, supervolcanoes, etc work looks worse if preventing extinction is less important. Biorisk I'm less sure about - non-AI mediated extinction from bio seems very unlikely, but what would a GCR pandemic do to future values? Probably ~neutral in expectation, but plausibly it could lead to the demise of liberal democratic institutions (bad), or to a post-recovery world that is more scared and committed to global cooperation to prevent that recurring (good).

DavidmanheimMar 168

How much of the argument for working towards positive futures rather than existential security rests on conditional value, as opposed to expected value?

One could argue for conditional value, that in worlds where strong AI is easy and AI safety is hard, we are doomed regardless of effort, so we should concentrate on worlds where we could plausibly have good outcomes.

Alternatively, one could be confident that the probability of safety is relatively high, and make the argument that we should spend more time focused on positive futures because it's likely alre... (read more)

Christopher Clay

Mar 17

Interesting argument - I don't know much about this argument, but my thoughts are that there's not much value in thinking in terms of conditional value. If AI Safety is doomed to fail, there's not much value focusing on good outcomes which won't happen, when there are great global health interventions today. Arguably, these global health interventions could also help at least some parts of humanity have a positive future.

Davidmanheim

Mar 17

I don't think that logic works - in the worlds where AI safety fails, humans go extinct, and you're not saving lives for very long, so the value of short term EA investments is also correspondingly lower, and you're choosing between "focusing on good outcomes which won't happen," as you said, and focusing on good outcomes which end almost immediately anyways. (But to illustrate this better, I'd need to work an example, and do the math, and then I'd need to argue about the conditionals and the exact values I'm using.)

Christopher Clay

Mar 17

great point - thanks you changed my view!

tylermjohn

Mar 17

I'll just share that for me personally the case rests on expected value. I actually think there is a lot that we can do to make AI existential safety go better (governance if nothing else), and this is what I spend most of my time on. But the expected value of better futures seems far higher given the difference in size between the default post-human future and the best possible future.

Davidmanheim

Mar 17

So it sounds like this might be a predictive / empirical dispute about probabilities conditional on slowing AI and avoiding extinction, and the likely futures in each case, and not primarily an ethical theory dispute?

tylermjohn

Mar 17

That is an excellent question. I think ethical theory matters a lot — see Power Laws of Value. But I also just think our superintelligent descendants are going to be pretty derpy and act on enlightened self-interest as they turn the stars into computers, not pursue very good things. And that might be somewhere where, e.g., @William_MacAskill and I disagree.

Greg_Colbourn ⏸️

Mar 17

I think it rests a lot on conditional value, and that is very unsatisfactory from a simple moral perspective of wanting to personally survive and have my friends and family survive. If extinction risk is high, and near (and I think it is!) we should be going all out to prevent it (i.e. pushing for a global moratorium on ASI). We can then work out the other issues once we have more time to think about them (rather than hastily punting on a long shot of surviving just because it appears higher EV now).

William_MacAskill

Mar 17

Fin and I talk a bit about the "punting" strategy here. I think it works often, but not in all cases. For example the AI capability level that poses a meaningful risk of human takeover comes earlier than the AI capability level that poses a meaningful risk of AI takeover. Because some humans are coming with loads of power, already, and the amount of strategic intelligence you need to take over, if you already have loads of power, is less than the strategic capability you need if you're starting off with almost none (which will be true of the ASI).

Davidmanheim

Mar 17

This seems like a predictive difference about AI trajectories and control, rather than an ethical debate. Does that seem correct to you (and/or to @Greg_Colbourn ⏸️ ?)

Greg_Colbourn ⏸️ Mar 1710

Yeah, I think a lot of the overall debate -- including what is most ethical to focus on(!) -- depends on AI trajectories and control.

Greg_Colbourn ⏸️

Mar 17

I don't think it comes meaningfully earlier. It might only be a few months (an AI capable of doing the work of a military superpower would be capable of doing most work involved in AI R&D, precipitating an intelligence explosion). And the humans wielding the power will lose it to the AI too, unless they halt all further development of AI (which seems unlikely, due to hubris/complacency, if nothing else). Any ASI worthy of the name would probably be able to go straight for an unstoppable nanotech computronium grey goo scenario.

JackMMar 178

Do you agree that the experience of digital minds likely dominates far future calculations?

This leads me to want to prioritize making sure that if we do create digital minds, we do so well. This could entail raising the moral status of digital minds, improving our ability to understand sentience and consciousness, and making sure AI goes well and can help us with these things.

Extinction risk becomes lower importance to me. If we go extinct we get 0 value from digital minds which seems bad, but it also means we avoid the futures where we create them and the... (read more)

William_MacAskillMar 178

Discussion topic: People vary a lot in the extent to which, and how likely it is, that post-AGI, different people will converge on the same moral views. I feel fairly sceptical about having a high likelihood of convergence; I certainly don't think we should bank on it.

[See my response to Andreas below. Here I meant "convergence" as shorthand to refer to "fully accurate, motivational convergence".]

Michael St Jules 🔸Mar 1714

I'm also pretty skeptical of convergence, largely because I'm a moral anti-realist. I don't see why we would converge to any view in particular, except by coincidence or mechanisms that don't track stance-independent moral truths (because there are none). It's just people weighing their own particular moral intuitions. Humans differ in our most basic moral intuitions and leanings.

Barring value lock-in, I suspect there would be convergence towards the recognition that unnecessary suffering is bad and worth preventing (when cheap enough) because this seems pretty widely held and something societies move towards, but I'd guess there will still be disagreement on some of these:

population ethics
hedonism vs preference views vs others
whether non-sentient/non-conscious things matter terminally, e.g. preserving nature
whether groups of moral patients have special status beyond their aggregates, e.g. ethnic groups, species
deontology vs consequentialism vs virtue ethics
what counts as conscious/sentient (I think this is partly normative, not just empirical)
decision theory, attitudes towards risk and ambiguity, fanaticism

Greg_Colbourn ⏸️

Mar 17

Or more basic things like religion, nationalism. People will want to shape their utopias in the image of their religious concept of heaven, and the idealised versions of their countries.

Andreas_Mogensen

Mar 17

Could you clarify what you mean by 'converge'? One thing that seems somewhat tricky to square is believing that convergence is unlikely, but that value lock-in is likely. Should we understand convergence as involving agreement in views facilitated by broadly rational processes, or something along those lines, to be contrasted with general agreement in values that might be facilitated by irrational or arational forces, of the kind that might ensure uniformity of views following a lock-in scenario?

William_MacAskill

Mar 17

Yeah, thanks for pushing me to be clearer: I meant "convergence" as shorthand to refer to "fully accurate, motivational convergence". So I mean a scenario where people have the correct moral views, on everything that matters significantly, and are motivated to act on those moral views. I'll try to say FAM-convergence from now on.

OscarD🔸

Mar 17

A bull case for convergence: * Factory farming, and to a lesser extent global poverty, persist because there are some costs to ending them, and the rich aren't altruistic enough (or the altruists aren't rich enough) to end them. Importantly, it will not just be that factory farming itself ends, but due to cognitive dissonance, people's moral views towards nonhumans will likely change a lot too once ~no-one is eating animals. So there will predictably be convergence on viewing c2025 treatment of animals as terrible. * There is an ongoing homogenization of global culture which will probably continue. As the educational and cultural inputs to people converge, it seems likely their beliefs (including moral beliefs) will also converge at least somewhat. * Some fraction of current disagreements about economic/political/moral questions are caused just by people not being sufficiently informed/rational. So those disagreements would go away when we have ~ideal post-human reasoners. * A more ambitious version of the above is that perhaps post-humans will take epistemic humility very seriously, and they will know that all their peers are also very rational, so they will treat their own moral intuitions as little evidence of what the true/best/idealised-upon-reflection moral beliefs are. Then, everyone just defers very heavily to the annual survey of all of (post)humanity's views on e.g. population axiology rather than backing their own intuition. * (Arguably this doesn't count as convergence if people's intuitions still differ, but I think if people's all-things-considered beliefs, and therefore their actions, converge that is enough.) But I agree we shouldn't bank on convergence!

Toby Tremlett🔹

Mar 17

I'm wondering whether we should expect worlds which converge on moral views to converge on bad moral views. From the space of world religions - we've seen a trend where we converge over time (at least from a high level of abstraction where we can refer to "Christianity" and "Islam" rather than "mega-church Christians" or whatever). Is this because the religions that succeed are exclusive and expansionary? Of all religions that have existed, I know that many of them don't much care if you also worship other gods. My empirical (ish) question is whether we should expect world in which a sizable fraction of the population follows the same religion to be one where the religion they follow is exclusive (you can't follow others) and expansionary (other people should also follow this religion). PS- I know that not all Christians or Muslims are exclusionary about other religions, this is over-simplified. This is relevant because, if this is a mechanism, we might expect the same thing of morality or political organisation - beliefs which demand you don't follow others, and that others follow the same beliefs as you, rather than tolerant beliefs. Perhaps this would make it more likely that futures which converge have converged on something extreme and closed, rather than exploratory and open. This is pretty vague - just wondering if others a) know more than me about the religion question and can speak to that or b) have had similar thoughts, or c) think that the existence of exclusive and expansionary (and wrong) ideologies might make convergence more likely.

Toby Tremlett🔹

Mar 17

Maybe another way to think about this (dropping the religion stuff - don't want to cast aspersions on any particular religions) is that we could think of black-ball and white-ball ideologies (like the Bostrom thought experiment where black-balls = technologies which can cause extinction). Perhaps certain ideologies are just much more exclusive and expansion focused than others - black-balls. You can pick out as many white-balls as you like, but picking out a black-ball means you have to get rid of your white-balls. Even if there are few black-balls in the bag, you'd always end up holding one.

OscarD🔸

Mar 17

Interesting, is this the sort of thing you have in mind? It at least seems similar to me, and I remember thinking that post got at something important.

tylermjohn

Mar 17

Hopefully they do not just converge on the same moral views, but also good ones!

Davidmanheim

Mar 17

A negotiated paretotopian future could create lots of moral value regardless of values not converging on their own.

Greg_Colbourn ⏸️

Mar 17

Yes, this is yet another reason for a moratorium on further-AGI development imo. If everyone has a genie with unlimited wishes, and are all pushing the world in different directions, the result will be chaos. Yampolskiy's solution to this is everyone having their own private solipsistic universe simulations...

tylermjohnMar 177

Position statement: I chose 36% disagreement. AMA!

My view is that Earth-originating civilisation, if we become spacefaring, will attain around 0.0001% of all value. This still makes extinction risk astronomically valuable (it's equivalent to optimising a millionth of the whole cosmos!), but if we could increase the chance of optimising 1% of the universe by 1%, this would be 100x more valuable than avoiding extinction. (You're not going to get an extremely well grounded explanation of these numbers from me, but I hope they make my position clearer.)

My view... (read more)

William_MacAskill

Mar 17

So you think: 1. People with your values control 1 in 1 millionth of future resources, or less? This seems pessimistic! 2. But maybe you think it's just you who has your values and everyone else would converge on something subtly different - different enough to result in the loss of essentially all value. Then the 1-in-1-million would no longer seem so pessimistic. But if so, then suppose I'm Galactic Emperor and about to turn everything into X, best by my lights... do you really take a 99.9% chance of extinction, and a 0.1% chance of stuff optimised by you, instead? 3. And if so, do you think that Tyler-now has different values than Tyler-2026? Or are you worried that he might have slightly different values, such that you should be trying to bind yourself to the mast in various ways? 4. Having such a low v(future) feels hard to maintain in light of model uncertainty and moral uncertainty. E.g. what's the probability you have that: i. People in general just converge on what's right? ii. People don't converge, but a significant enough fraction converge with you that you and others end up with more than 1milllionth of resources? iii. You are able to get most of what you want via trade with others?

tylermjohn

Mar 17

Thank you, Will, excellent questions. And thanks for drawing out all of the implications here. Yeah I'm a super duper bullet biter. Age hasn't dulled my moral senses like it has yours! xP Yes, I take (2) on the 1 vs 2 horn. I think I'm the only person who has my exact values. Maybe there's someone else in the world, but not more than a handful at most. This is because I think our descendants will have to make razor-thin choices in computational space about what matters and how much, and these choices will amount to Power Laws of Value. I generally like your values quite a bit, but you've just admitted that you're highly scope insensitive. So even if we valued the same matter equally as much, depending on the empirical facts it looks like I should value my own judgment potentially nonillions as much as yours, just on scope sensitivity grounds alone! Yup, I am worried about this and I am not doing much about it. I'm worried that the best thing that I could do would simply be to go into cryopreservation right now and hope that my brain is uploaded as a logically omniscient emulation with its values fully locked in and extrapolated. But I'm not super excited about making that sacrifice. Any tips on ways to tie myself to the mast? It would be something like: P(people converge on my exact tastes without me forcing them to) + [P(kind of moral or theistic realism I don't understand)*P(the initial conditions are such that this convergence happens)*P(it happens quickly enough before other values are locked in)*P(people are very motivated by these values)]. To hazard an-off-the cuff guess, maybe 10^-8 + 10^-4*0.2*0.3*0.4, or about 2.4*10^-6. I should be more humble about this. Maybe it turns out there just aren't that many free parameters on moral value once you're a certain kind of hedonistic consequentialist who knows the empirical facts and those people kind of converge to the same things. Suppose that's 1/30 odds vs my "it could be anything" modal view. Then suppose

William_MacAskill

Mar 17

Thanks! I appreciate you clarifying this, and for being clear about it. Views along these lines are what I always expect subjectivists to have, and they never do, and then I feel confused.

tylermjohn

Mar 17

Thank you! IMO the best argument for subjectivists not having these views would be thinking that (1) humans generally value reasoning processes, (2) there are not that many different reasoning processes you could adopt or as a matter of biological or social fact we all value roughly the same reasoning processes, and (3) these processes have clear and determinate implications. Or, in short, Kant was right: if we reason from the standpoint of "reason", which is some well-defined and unified thing that we all care about, we all end up in the same place. But I reject all of these premises. The other argument is that our values are only determinate over Earthly things we are familiar with in our ancestral environment, and among Earthly things we empirically all kinda care about the same things. (I discuss this a bit here.)

Greg_Colbourn ⏸️

Mar 17

How do these considerations affect what you are doing / spending resources on? Does it change the calculus if extinction is likely to happen sooner? (see also comment here).

tylermjohnMar 1711

Part of what it means that I try to support thinking on this issue, e.g. by seed-funding NYU MEP and doing this discussion, and doing my own thinking on it.

At this stage the thing I'm most excited about supporting is market-based mechanisms for democratic AI alignment like this. Also excited about trying to get more resources to work on AI welfare, utilitarianism, and to groups like Forethought: A new AI macrostrategy group.

In practice I spend more resources on extinction risk reduction. Part of this is just because I'd really prefer not to die in my 30s. When an EA cares for their family taking away time from extinction risk they're valuing their family as much as 10^N people. I see myself as doing something similar here.

Greg_Colbourn ⏸️ Mar 1710

In practice I spend more resources on extinction risk reduction. Part of this is just because I'd really prefer not to die in my 30s.

Thanks for saying this. I feel likewise (but s/30s/40s :))

Davidmanheim

Mar 17

No. I've said this before elsewhere, and it's not directly relevant to most of this discussion, but I think it's very worth reinforcing; EA is not utilitarianism, and the commitment to EA does not imply that you have any obligatory trade-off between yourself or your family's welfare and your EA commitment. If, as is the generally accepted standard, a "normal" EA commitment is of 10% of your income and/or resources, it seems bad to suggest that such an EA should not ideally spend the other 90% of their time/effort on personal things like their family. (Note that in addition to being a digression, this is a deontological rather than decision-theoretic point.)

Greg_Colbourn ⏸️

Mar 17

Not sure exactly what you mean here - do you mean attending to family matters (looking after family) taking away time from working on extinction risk reduction?

tylermjohn

Mar 17

Yes. Which, at least on optimistic assumptions, means sacrificing lots of lives.

Greg_Colbourn ⏸️

Mar 17

Fair point. But this applies to a lot of things in EA. We give what we can.

Greg_Colbourn ⏸️ Mar 177

My position is that Timelines are short, p(doom) is high: a global stop to frontier AI development until x-safety consensus is our only reasonable hope (this post needs updating, to factor in things like inference time compute scaling, but my conclusions remain the same).

The problem is that no one has even established whether aligning or controlling ASI is theoretically, let alone practically, possible. Everything else (whether there is a human future at all past the next few years) is downstream of that.

tylermjohn

Mar 17

Suppose p(doom) is 90%. Then preventing extinction multiplies the value of the world by 10 in expectation. But suppose that the best attainable futures are 1000 times better than the default non-extinction scenario. Then ensuring we are on track to get the best possible future multiplies the value of the world by 100 in expectation, even after factoring in the 90% chance of extinction. In this toy model, you should only allocate your resources to reducing extinction if it is 10 times more tractable than ensuring we are on track to get the best possible future, at the current margin. You might think that we can just defer this to the future. But I've assumed in the set-up that the default future is 1/1000th as good as the best future. So apparently our descendants are not going to be very good at optimising the future, and we can't trust them with this decision. Where do you think this goes wrong?

Greg_Colbourn ⏸️

Mar 17

>suppose that the best attainable futures are 1000 times better than the default non-extinction scenario This seems rather arbitrary. Why woiuld preventing extinction now guarantee that we (forever) lose that 1000x potential? >In this toy model, you should only allocate your resources to reducing extinction if it is 10 times more tractable than ensuring we are on track to get the best possible future, at the current margin. I think it is. Gaining the best possible future requires aligning an ASI, which has not been proven to be even theoretically possible afaik.

tylermjohn

Mar 17

It's super arbitrary! Just trying to pull out your own model. I give one argument in Power Laws of Value. One underrated argument for focusing on non-alignment issues like trajectory change is: * Either alignment is easy or hopeless * If it's hopeless, we shouldn't work on it * If it's easy, we shouldn't work on it It only makes sense to work on alignment if we happen to fall in the middle, where marginal efforts can make a difference before it's too late. If you think this space is small, then it doesn't look like a very tractable problem.

Greg_Colbourn ⏸️

Mar 17

If alignment is hopeless (and I think it is), we should work on preventing ASI from ever being built! That's what I'm doing.

tylermjohn

Mar 17

Oh no! But then we are likely to lose out on almost all value because we won't have the enormous digital workforce needed to settle the stars. It seems like we should bank on having some chance of solving alignment (at least for some architecture, even if not the current deep learning paradigm) and work towards that at least over the next couple hundred years.

Greg_Colbourn ⏸️

Mar 17

To bank on that we would need to have established at least some solid theoretical grounds for believing it's possible - do you know of any? I think in fact we are closer to having the opposite: solid theoretical grounds for believing it's impossible!

Davidmanheim

Mar 17

I think we can thread the needle by creating strongly non-superintelligent AI systems which can be robustly aligned or controlled. And I agree that we don't know how to do that at present, but we can very likely get there, even if the proofs of unalignable ASI hold up.

Greg_Colbourn ⏸️

Mar 17

What level of intelligence are you imagining such a system as being at? Some percentile on the scale of top performing humans? Somewhat above the most intelligent humans?

Davidmanheim

Mar 17

I think we could do what is required for colonizing the galaxy with systems that are at or under the level of 90th percentile humans, which is the issue raised for the concern that otherwise we "lose out on almost all value because we won't have the enormous digital workforce needed to settle the stars."

Greg_Colbourn ⏸️

Mar 17

Agree. But I'm sceptical that we could robustly align or control a large population of such AIs (and how would we cap the population?), especially considering the speed advantage they are likely to have.

Greg_Colbourn ⏸️ Mar 176

Question: what level of extinction risk are people personally willing to accept in order to realise higher expected value in the futures where we survive? How much would the extinction coming in the next 5 years effect this? Or the next 1 year? How is this reflected in terms of what you are working on / spending resources on?

tylermjohn

Mar 17

I hope my position statement makes my view at least sort of clear. Though as I said to you, my moral values and my practices do come apart!

Greg_Colbourn ⏸️

Mar 17

Personally, I think p(ASI in the next 5 years)>70%, and p(death|ASI)~90%. And this is wholly unacceptable just in terms of my own survival, let alone everyone else's. Philosophically justifying such a risk of death does not help when it's becoming so viscerally real. See also my comments on this post.

Jordan ArelMar 195

@William_MacAskill, what are the main characteristics we should aim for “Viatopia” to have?

Michael St Jules 🔸Mar 17*4

For a given individual, can they have a higher probability of averting extinction (i.e. making the difference) or for a different long-term trajectory change? If you discount small enough probabilities of making a difference or are otherwise difference-making risk averse (as an individual), would one come out ahead as a result?

Some thoughts: extinction is a binary event. But there's a continuum of possible values that future agents could have, including under value lock-in. A small tweak in locked-in values seems more achievable counterfactually than being... (read more)

JoA🔸Mar 173

I have a question, and then a consideration that motivates it, which is also framed as a question that you can answer if you like.

If an existential catastrophe occurs, how likely is it to wipe out all animal sentience on earth?

I've already asked that question here (and also, to some acquaintances working in AI Safety, but the answers have very much differed - it seems we're quite far from a consensus on this, so it would be interesting to see perspectives from the varied voices taking part in this symposium.

Less important question, but that may clari... (read more)

tylermjohn

Mar 17

Here are three toy existential catastrophe scenarios to think about: * A biological catastrophe (potentially from AI) which kills all humans and leaves animals mostly untouched, due to their biology * A paperclipping-style AI takeover scenario where AI turns everything into something else * A human disempowerment scenario where humans are left alive but substantively lose control over the future and its direction I think it would be pretty interesting to think about interventions one could take to make the world persistently better for wild animals in the event that humans go extinct from biological catastrophe. I'm not sure you could do much, but it could be very impactful if worst-case bio gets bad enough! My view is that bio x-risk is fairly low, so the scenarios where there are no humans but there are nonhuman animals (in the near future) are pretty unexpected.

William_MacAskill

Mar 17

In the first of these, I think most of the EV comes from whether technologically-capable intelligence evolves or not. I'm more likely or not on that (for say extinction via bio-catastrophe), but not above 90%.

tylermjohn

Mar 17

Have you thought about whether there any interventions that could transmit human values to this technologically capable intelligence? The complete works of Bentham and an LLM on a ruggedised solar powered laptop that helps them translate English into their language... Not very leveraged given the fraction within a fraction within a fraction of success, but maybe worth one marginal person.

lillyMar 142

This is a cool idea! Will this be recorded for people who can't attend live?

Edit: nevermind, I think I'm confused; I take it this is all happening in writing/in the comments.

Toby Tremlett🔹

Mar 14

Yep it'll all be in the comments, so if you aren't around you can read it later (and I'm sure a bunch of the conversations will continue, just potentially without the guests) this was a good flag btw - I've changed the first sentence to be clearer!