AGI ruin mostly rests on strong claims about alignment and deployment, not about society

RobBensinger

MichaelPlant · Apr 24 2023 · 5

To chime in, I think it would be helpful to distinguish between:

1. AI risks on a 'business as usual' model, where society continues as it was before, i.e. not doing much

and

2. AI risks given different levels of societal response.

This would then be analogous to familiar discussions about climate change, where people talk about different CO2 rise scenarios, how bad each would be, and also how much effort is required to achieve different levels of reduced emissions. I recognise it's not very easy to specify options for 2, but it seems worth a try. To decide how much effort to put in, we need to understand the risk in 1 and how much it can go down for versions of 2, and the costs involved.

To elaborate, someone could say

(A) we're almost certainly screwed, whatever we do

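(B) we might be screwed, but not if we get our act together, which we're not doing now

(C) we might be screwed, but not if we get our act together, which I'm confident will happen anyway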

(D) there's nothing to worry about in the first place.

Obviously, these aren't the only options. (A), (C), and (D) imply that few or no additional resources are useful, whereas (B) implies extra resources are worthwhile. My impression is Yudkowsky's line is (A).

RobBensinger · Apr 24 2023 · 9

To chime in, I think it would be helpful to distinguish between:
1. AI risks on a 'business as usual' model, where society continues as it was before, i.e. not doing much
and
2. AI risks given different levels of societal response.

I like this! Richard Ngo and Eliezer discuss this a bit in "Ngo's view on alignment difficulty":

[Ngo] (Sep. 25 [2021] Google Doc)

Perhaps the best way to pin down disagreements in our expectations about the effects of the strategic landscape is to identify some measures that could help to reduce AGI risk, and ask how seriously key decision-makers would need to take AGI risk for each measure to be plausible, and how powerful and competent they would need to be for that measure to make a significant difference. Actually, let’s lump these metrics together into a measure of “amount of competent power applied”. Some benchmarks, roughly in order (and focusing on the effort applied by the US):

Banning chemical/biological weapons
COVID
- Key points: mRNA vaccines, lockdowns, mask mandates
Nuclear non-proliferation
- Key points: Nunn-Lugar Act, Stuxnet, various treaties
The International Space Station
- Cost to US: ~$75 billion
Climate change
- US expenditure: >$154 billion (but not very effectively)
Project Apollo
- Wikipedia says that Project Apollo “was the largest commitment of resources ($156 billion in 2019 US dollars) ever made by any nation in peacetime. At its peak, the Apollo program employed 400,000 people and required the support of over 20,000 industrial firms and universities.”
WW1
WW2

[Yudkowsky][12:02] (Sep. 25 [2021] comment)

WW2

This level of effort starts to buy significant amounts of time. This level will not be reached, nor approached, before the world ends.

See the post for more discussion, including an update from Eliezer: "I've updated somewhat off of Carl Shulman's argument that there's only one chip supply chain which goes through eg a single manufacturer of lithography machines (ASML), which could maybe make a lock on AI chips possible with only WW1 levels of cooperation instead of WW2."

Eliezer's "Pausing AI Developments Isn't Enough. We Need to Shut it All Down" is also trying to do something similar, as is his (written-in-2017) post "Six Dimensions of Operational Adequacy in AGI Projects".

I interpret "Pausing AI Developments Isn't Enough" as saying "if governments did X, then we'd still probably be in enormous amounts of danger, but there would now be a non-tiny probability of things going well". (Maybe even a double-digit probability of things going well for humanity.)

Eliezer doesn't think governments are likely to do X, but he thinks we should make a desperate effort to somehow pull off getting governments to do X anyway on EV grounds: there aren't any markedly-more-hopeful alternatives, and we're all dead if we fail.

(Though there may be some other similarly-hopeless-but-worth-trying-anyway options, like moonshot attempts to solve the alignment problem, or a Manhattan Project to build nanotechnology, or what-have-you. My Eliezer-model wants highly competent and sane people pursuing all of these unlikely-to-work ideas in parallel, because then it's more likely that at least one succeeds.)

"Six Dimensions of Operational Adequacy in AGI Projects" divides amounts of effort into "token", "improving", "adequate", "excellent", and "unrealistic", but it doesn't say how high the risk level is under different buckets. I think this is mostly because Eliezer's model gives a macroscopic probability to success if an AGI project is "adequate" on all six dimensions at once, and a tiny probability to success if it falls short of adequacy on any dimension.

My Eliezer-model thinks that "token" and "improving" both mean you're dead, and he doesn't necessarily think he can give meaningful calibrated confidences that distinguish degrees of deadness when the situation looks that bad.

(A) we're almost certainly screwed, whatever we do
(B) we might be screwed, but not if we get our act together, which we're not doing now
(C) we might be screwed, but not if we get our act together, which I'm confident will happen anyway
(D) there's nothing to worry about in the first place.
Obviously, these aren't the only options. (A), (C), and (D) imply that few or no additional resources are useful, whereas (B) implies extra resources are worthwhile. My impression is Yudkowsky's line is (A).

Seems like a wrong framing to me. My model (and Eliezer's) is that A and B are both right: We're almost certainly screwed, whatever we do; but not if humanity gets its act together in a massive way (which we're currently not doing, but should try to do because otherwise we're dead).

"No additional resources are useful" makes it sound like Eliezer is advocating for humanity to give up, which he obviously isn't doing. Rather, my view and Eliezer's is that we should try to save the world (because the alternative is ruin), even though some things will have to go miraculously right in order for our efforts to succeed.

RobBensinger · Apr 24 2023 · 4

Dustin Moskovitz comments on Twitter:

The deployment problem is part of societal response to me, not separate.
[...] Eg race dynamics, regulation (including ability to cooperate with competitors), societal pressure on leaders, investment in watchdogs (human and machine), safety testing norms, whether things get open sourced, infohazards.

"The deployment problem is hard and weird" comes from a mix of claims about AI (AGI is extremely dangerous, you don't need a planet-sized computer to run it, software and hardware can and will improve and proliferate by default, etc.) and about society ("if you give a decent number of people the ability to wield dangerous AGI tech, at least one of them will choose to use it").

The social claims matter — two people who disagree about how readily Larry Page and/or Mark Zuckerberg would put the world at risk might as a result disagree about whether a Good AGI Project has median 8 months vs. 12 months to do a pivotal act.

When I say "AGI ruin rests on strong claims about the alignment problem and deployment problem, not about society", I mean that the claims you need to make about society in order to think the alignment and deployment problems are that hard and weird, are weak claims (e.g. "if fifty random large AI companies had the ability to use dangerous AGI, at least one would use it"), and that the other claims about society required for high p(doom) are weak too (e.g. "humanity isn't a super-agent that consistently scales up its rationality and effort in proportion to a problem's importance, difficulty, and weirdness").
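As a rough illustration of why a claim like "at least one of fifty would use it" can count as weak, here is a minimal sketch of the arithmetic. It assumes each actor decides independently and with the same probability, which is a simplification I'm adding for illustration; the function name and the per-actor probabilities are hypothetical and don't come from anyone's actual estimates.

```python
def prob_at_least_one(p_each: float, n: int) -> float:
    """Chance that at least one of n actors acts.

    Assumes (hypothetically) that every actor acts independently
    with the same probability p_each.
    """
    if not 0.0 <= p_each <= 1.0:
        raise ValueError("p_each must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement rule: 1 minus the chance that all n actors refrain.
    return 1.0 - (1.0 - p_each) ** n


# Illustrative per-actor probabilities only; not taken from the discussion.
for p in (0.01, 0.05, 0.10):
    print(f"p_each={p:.2f}, n=50 -> {prob_at_least_one(p, 50):.3f}")
```

Under these assumptions, a modest per-actor probability already makes "at least one of fifty" likely, so the claim doesn't require a strong view about any particular company. Real actors are not independent or identical, so this is only a way to see the shape of the argument.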

Arguably the difficulty of the alignment problem itself also depends in part on claims about society. E.g., the difficulty of alignment depends on the difficulty of the task we're aligning, which depends on "what sort of task is needed to end the acute x-risk period?", which depends again on things like "will random humans destroy the world if you hand them world-destroying AGI?".

The thing I was trying to communicate (probably poorly) isn't "Alignment, Deployment, and Society partitions the space of topics", but rather:

High p(doom) rests on strong claims about AI/compute/etc. and quite weak claims about humanity/society.
The most relevant claims (~all the strong ones, and an important subset of the weak ones) are mostly claims about the difficulty, novelty, and weirdness of the alignment and deployment problems.

RobBensinger · Apr 24 2023 · 2

Note that if it were costless to make the title way longer, I'd change this post's title from "AGI ruin mostly rests on strong claims about alignment and deployment, not about society" to the clearer:

The AGI ruin argument mostly rests on claims that the alignment and deployment problems are difficult and/or weird and novel, not on strong claims about society
