
Many people who are paying attention to the trajectory of AI worry about its potential to concentrate power. I think this is a reasonable thing to worry about, with some important caveats. If someone builds a superintelligence, I think they are far more likely to die ignominiously with the rest of us than attain a stranglehold on wealth and power; but if this somehow manages not to happen, I do then worry about what happens instead.

Below is a significantly paraphrased, cleaned, and polished amalgam of a conversation that I have had, at least twice now, on this subject. It is not itself a real conversation, nor was every point therein made explicitly by the participants; but it mostly follows the general shape of the real conversations that inspired it.


Part 1: The Musk-Maximizer

Norm: So first of all, it seems like current AI is already a huge risk for concentration of power? AI could allow mass surveillance and censorship, or manipulate policymakers into doing whatever the controller wants, or just concentrate wealth via automation.

Joe: Oh, I fully agree. We are already seeing early signs of this, and from the perspective of (say) China, it could be an even more versatile tool of oppression and control than the ones they already possessed. We will have to grapple with this no matter where AI goes in the future.

Norm: Oh. What’s the disagreement then?

Joe: Earlier you postulated a scenario in which a few would-be technocrats build superintelligence and use it to rule the world forever. I want to address that scenario specifically, since it seems to attract a lot of worry.

Norm: You don’t think that’s possible?

Joe: It seems possible. It does not seem at all likely.

Norm: Why not?

Joe: Well, how do you imagine it happening?

Norm: Picking a sort of random example, let’s say Elon Musk makes an AI. He says he wants it to be “truth-seeking” but I don’t think that’s actually what he’d ask it to do; imagine it just sort of does whatever he wants.

Joe: Suppose you are the AI in question. How do you evaluate what Elon wants?

Norm: Well, I guess that depends on how I was aligned to him.

Joe: Say more?

Norm: I could be the sort of thing that just does whatever he says, or I could be aimed at his intent.

Joe: Want to talk about doing-what-he-says first?

Norm: Sure. Suppose he asks it to “terraform Mars”; that seems in character.

Joe: Okay, so he tells you to terraform Mars. That’s really hard and requires a lot of resources. Fortunately, there’s another planet right next door with a bunch of resources you can use to build terraforming infrastructure.

Norm: Yeah, okay, I see where you’re going with this, the AI eats the Earth. But I wouldn’t do that, that’s not what Elon meant.

Joe: Doing what Elon meant and doing what he says are not the same thing. But you see how just naively fulfilling someone’s verbal or written wishes, without concern for the things they don’t say, predictably has horrible consequences?

Norm: Yeah, I get that. Like, he could also say “don’t use up the Earth” or “leave resources for humans to live comfortably” or some such, but then I have to figure out where to draw the line.

Joe: Right. No matter what is in the instructions, at some point you have to make a judgment call. Lots of them, in fact. And that’s where things get rough.


Part 2: Intent Is Hard

Norm: Okay, what if I’m aligned to his intent? I can just try my best to do what I think he meant by that without doing something that’d horrify him.

Joe: Well, again, how do you know what he intends? Can you chop down the California redwoods to make room for solar panels and factories? Can you chop down half of them? What does Elon value more, an ancient wonder of the world or getting to Mars a few months faster? How does he want you to handle the fact that a bunch of people will get mad and try to stop you, possibly using military force, if you don’t go through an exhaustive permitting process?

Presumably you could just beat the entire American military; we’re handwaving this and assuming you’re strategically superhuman. But how does he want you to handle the tradeoffs among risk, speed, legality, and the countless downstream consequences? You either have to spend a truly staggering amount of time interrogating Elon about edge cases and tradeoffs, slowing you down enormously, or you have to anticipate all of the things he might care about and all of the relevant tradeoffs he’d make, given the choice.

Norm: Well, I’m superintelligent, I can presumably figure him out pretty fast, right?

Joe: Yeah, you can get pretty darn far at guessing his responses on not very much data, that’s part of being very observant and thinking very fast and such. But it’s still hard. There are some things you could do that Elon wouldn’t think are a problem until he sees them with his own eyes, and then he might be horrified. You have to anticipate things that he wouldn’t think about unless you prompted him in just the right way; you need to build an incredibly high fidelity model of Elon if you want to avoid accidentally ruining something he wanted preserved, while you rush to fulfill his stated wishes.

Norm: Suppose I have that?

Joe: Then you have the problem that Elon isn’t even internally coherent in his preference ordering. Humans are kind of dumb like this; we work at cross-purposes to ourselves all the time. Many of Elon’s decisions will be very predictably path-dependent, in the sense that he’d answer one way if you prompted him with X and another way if you prompted him with Y, and there’s a contradiction there. Even a very high fidelity model of Elon runs into this problem.

Norm: Okay…

Joe: But wait! It gets even worse! Because Elon is not fully in his right mind. I think we can agree that he’s probably gotten less together in the last few years, whether from the ketamine or the Twitter-induced sleeplessness or what have you. He’s kind of lost his way. When you build your model of what Elon cares about, of what you should preserve on his behalf, do you stay faithful to the Elon whose grand shining vision of bringing humanity to the stars caused him to reinvent entire industries from the ground up, or the Elon whose blind incautious flailing got PEPFAR cancelled while he continued to insist that wasn’t the case? Who is your guiding star here, past Elon or current Elon?

Norm: …let’s go with past Elon.

Joe: Why?

Norm: It seems like that’s…the best version of Elon from Elon’s own perspective? Even current Elon might be able to, from the right frame, look back on his past self and go “yeah I was more together then.”

Joe: Notice that this is a judgment call on your part, and notice also that “the right frame” is pretty heavily dependent on the state of mind you steer Elon towards. And you can steer his state of mind; you’re a superintelligence, you probably can’t argue him into literally anything, but you could argue him into a lot of things that he wouldn’t agree with by default. Some things you could get him to do will be good for him as a person in a way he’d look back on with joy and gratitude, and others will make him more suggestible or easy to model but are bad for his health and flourishing. You can’t entirely avoid this either; just by having a conversation with Elon, you’re steering him somewhere, even if it’s just towards an Elon who is better at answering your questions.

Norm: Suppose I steer him towards being more suggestible. Doesn’t that still end with concentration of power?

Joe: I mean, sort of, in the sense that power is clearly being concentrated. But…alright, let’s consider a different hypothetical. What if he asked you to do something really, really stupid? Say he gets drunk and orders you to quit screwing around and get to Mars as fast as possible, damn the redwoods, and you happen to know he’ll hate himself in the morning if you actually do that and then the redwoods are gone.

Norm: …I see where this is going, too. You’re going to say, if I just do what he says when he’s drunk, then I’m the same sort of misaligned as if I just do what he literally says without thinking through the consequences.

Joe: Not exactly the same sort of misaligned, I don’t think? But pretty close.

Norm: What if Elon really, truly wants me to do what he says when he’s drunk, and would consider it unacceptably paternalistic to do otherwise? What if I’m willing to obey him no matter what state of mind he’s in?

Joe: Then you (a) predictably burn down a lot of value that Elon would otherwise care about every time he makes a poorly conceived decision, and (b) have the means and incentive to steer him into demanding things that are bad for Elon but easy for you. After all, “not being paternalistic” in this manner sure looks like it rhymes with “not caring much about Elon’s reflectively endorsed self-image”.

Humans don’t handle massive quantities of power well; if you want to avoid corrupting Elon into a pathetic impulsive version of himself under these conditions, you have to be steering quite amazingly hard in the other direction. If you’re at all willing to steer Elon in directions that he wouldn’t endorse, the obvious end state is something like “AI piloting an Elon-shaped flesh puppet”.

Norm: Let’s say I care enough about what Elon endorses becoming that I don’t steer him into being a much worse version of himself.

Joe: I notice that this rhymes with “steering Elon toward being a better version of himself, as he would see it” because they’re functionally the same sort of steering.

Norm: Yeah, that makes sense. He might still be, like, not a good person though?

Joe: Maybe! In this hypothetical, we aren’t imagining that you have a moral compass separate from Elon. If Elon’s conception of his best self is narcissistic and mean, well, then you’re steering for the kind of world a narcissistic and mean person deeply enjoys.

Norm: This seems pretty bad?

Joe: I agree! It would be awful from the perspective of humanity. Maybe not as bad as dying off, but still probably horrible. But notice what it took to get this far, without resulting in outcomes that were also horrible for Elon. You are stipulating that you are the sort of entity who cares deeply about Elon’s reflectively endorsed values, including those pertaining to the kind of person Elon becomes in the future. You make a deliberate and explicit effort to build a mental model of Elon and Elon’s values and you consult it (and him) carefully and often. You try to build a world in which he can flourish.

Norm: Yeah. And I think I know where you’re going from here, too. Is this just CEV?

Joe: Yes, exactly! We have more or less reinvented the concept of coherent extrapolated volition, just aimed at a single person instead of All The People Everywhere.

Norm: …Is there really no way to get at “the thing Elon meant” without being quite that aligned?

Joe: I don’t know. Probably somewhere in mindspace is an entity that would steer Elon towards something sort of petty but not outright crippling, or massively warp the world according to Elon’s whims but stop short of warping him into a more convenient Elon. But it seems really hard to land there on purpose.

There’s not a lot of daylight between “awful” and “glorious” when you’re talking about superhuman levels of optimization power. It just isn’t safe to be aimed at almost CEV. I think in the vast majority of cases where Elon tries this, he ceases to exist as a real person, everyone dies, and it just rounds off to an AI doing stuff in an empty puppet world.


Part 3: The Real Thing

Norm: Okay, I get all that. But what if you actually solve alignment in full? If you know how to do CEV for everyone, presumably you know how to do it for one person, and that could go very badly?

Joe: Yes! It could. I’m not super confident in this; it does seem like it might actually be harder to aim CEV at a single moral patient than at all of them? In the sense that, like, you have to first invent CEV and then unambiguously identify one specific human (or, gods forbid, a committee) in sufficient detail that you robustly protect the interests of “Elon, while awake” and “Elon, while asleep” and “Elon, while on drugs” but defer to none of the other people on Earth, except insofar as Elon would want you to defer to them. As the proverb goes, “There is more that can be said about one grain of sand than about all the grains of sand in the world.”

Norm: But maybe if you’ve done the kind of cognitive labor required to get CEV, it’s pretty trivial to aim it more precisely?

Joe: That does sound plausible, yeah.

Norm: And in that world, you really truly do have a concentration of power problem?

Joe: Well…almost. There are some humans whose CEV looks at this situation in abject horror and goes “No no no! Expand your circle of concern, dammit! Include absolutely everyone, and don’t weigh me any more heavily than any others!” and means it from the bottom of their soul. (It still warms my heart immeasurably that humanity routinely produces such folk.)

Norm: You’d have to be insanely lucky to get such a person.

Joe: Yes, and you really, really don’t want to bank on that. But you don’t necessarily need an angel. Many other humans would be…mostly nice, and mostly caring, and mostly want to be surrounded by more happy thriving people? It seems like caring ought to be at least sort of transitive, right, if they care about their family and friends being actually okay and those people care about their family and friends and so on. So the CEV of an average human might not be totally awful to implement. It’d probably be better than paperclips.

Norm: Huh.

Joe: There would probably still be some very large amount of horribleness and distortion of the glorious diversity of humankind, though, as the world conforms to the ideals of a person or small group first and foremost. That’d be a pretty awful waste of our potential as a species.

And if you land on a moral monster then yeah, the future is toast.

Norm: So why aren’t you more worried about that outcome?

Joe: I am! But I think the vast majority of current paths don’t manage to get even remotely close to “one person or a few people in charge forever” and instead look more like “the creator and everyone else die.” I would be so much less worried if the only problem we had to solve was whose values an AI upholds; instead we seem to have the problem that nobody can get an AI to reliably uphold particular values in the first place. Of course, if we manage to stall the death race long enough to actually align AI, we will have to grapple with who or what it’s aligned to.

I think that a sane civilization tries really hard to both solve alignment and align AI to, specifically, the CEV of All The People Everywhere, or something very much like it; I think that civilizations which do not do both of these will probably have a bad time. But the first problem is far more pressing to me. I think the second problem is…solvable with the right combination of governance and transparency and clearly understood cognitive science from solving the first? It’s not an easy problem, but I think we could do it, especially in a world where we are treating AI with at least the gravity and respect that we treat nuclear weapons.

Norm: That actually made a lot of sense. I’ve never heard it explained in quite that way before. You should write this conversation up, like, the whole thing, as a post.[1]

Joe: I align with that. ❤

[1] Yes, this actually happened. Shoutout to the person who made this suggestion; if you want to be named, let me know.
