Link: Ethical Guidelines for a Superintelligence by Ernest Davis, NYU. October 2014.

Thanks to davidc for pointing this one out. Nick Bostrom represents one side of EA - xrisk with a focus on artificial intelligence - that deserves more intelligent criticism. So it's certainly refreshing to see these comments by Davis.

I'll add my thoughts here (I figured they'd be long enough for a top-level post). Note: I have very little connection with FHI or MIRI - I certainly don't represent them in any way. This will make a lot more sense if you've read at least part way through the book, or are otherwise familiar with Bostrom's position on machine intelligence. Otherwise, read the book :D


  • Davis doesn't significantly misrepresent Bostrom's point of view as far as I can tell
  • Davis correctly points out that Bostrom doesn't go into multi-dimensional models of intelligence (and intelligence explosions) enough
  • Davis points out that Bostrom may have a bit too much of an affection for transhumanist morality and the anthropic principle
  • Extrapolating the moral taboos of some well-respected people isn't a terrible way to construct an AI morality, and I hope it gets added in to the conversation, but it's not enough to render the rest of the conversation moot
  • I think Davis needs to work on steelmanning some of his other arguments

Davis's review starts with an overview of the key message in Bostrom's book:

Nick Bostrom, in his new book SuperIntelligence, argues that, sooner or later, one way or another, it is very likely that an artificial intelligence (AI) will achieve intelligence comparable to a human. Soon after this has happened — probably within a few years, quite possibly within hours or minutes — the AI will attain a level of intelligence immensely greater than human. There is then a serious danger that the AI will achieve total dominance of earthly society, and bring about nightmarish, apocalyptic changes in human life. Bostrom describes various horrible scenarios and the paths that would lead to them in grisly detail. He expects that the AI might well then turn to large scale interstellar travel and colonize the galaxy and beyond. He argues, therefore, that ensuring that this does not happen must be a top priority for mankind.

This seems like a good summary of Bostrom's viewpoint. I'd only disagree with the word "grisly" - I find Bostrom's tone to be pretty gentle. And maybe he's overstating the probabilities a touch.

Davis likes the "Paths" section of the book, which explains how a superintelligence could come about. But he has some reservations about the "Dangers" and "Strategies". I'll go through his complaints here and add my own thoughts.

The assumption that intelligence is a potentially infinite quantity with a well-defined, one-dimensional value

It's pretty clear that Bostrom doesn't actually believe this. The point Davis is trying to make is more sophisticated than this - and I'll get to that next - but first I just want to make clear that Bostrom doesn't think intelligence is something like mass or height with a "well-defined, one-dimensional value". Here's Bostrom:

It would be convenient if we could quantify the cognitive caliber of an arbitrary cognitive system using some familiar metric, such as IQ scores or some version of the Elo ratings that measure the relative abilities of players in two-player games such as chess. But these metrics are not useful in the context of superhuman artificial general intelligence. [snip] suppose we could somehow establish that a certain future AI will have an IQ of 6,455: then what? We would have no idea of what such an AI could actually do. We would not even know that such an AI had as much general intelligence as a normal human adult - perhaps the AI would instead have a bundle of special-purpose algorithms enabling it to solve typical intelligence test questions with superhuman efficiency but not much else.

Some recent efforts have been made to develop measurements of cognitive capacity that could be applied to a wider range of information-processing systems, including artificial intelligences. Work in this direction, if it can overcome various technical difficulties, may turn out to be quite useful for some scientific purposes including AI development. For purposes of the present investigation, however, its usefulness would be limited since we would remain unenlightened about what a given superhuman performance score entails for actual ability to achieve practically important outcomes in the world.

Bostrom argues for various different flavours of AI:

  • Different outputs (or "superpowers") that an AI might command: strategizing, social manipulation, hacking etc.
  • Different structures (or "forms") of superintelligence: speed superintelligence, collective superintelligence, and quality superintelligence.

However, might Bostrom be paying lip-service to non-one-dimensionalness while still relying on it for his central models and arguments? Davis claims:

if you loosen the idealization, important parts of the argument become significantly weaker, such as Bostrom’s expectation that the progress from human intelligence to superhuman intelligence will occur quickly

Let's have a look at chapter 4: the kinetics of an intelligence explosion. Bostrom's equation is:

Rate of change in intelligence = Optimization power / Recalcitrance

It's important to note that optimization power isn't the same as intelligence. In this equation you can think of optimization power as something more like money - it's something you can throw at a problem and maybe make some progress, as long as the recalcitrance isn't too high. It's a one-dimensional quantity.

Intelligence is a vector.

Recalcitrance, however, could be a vector too. Depending on how the project is going, and what the theoretical hurdles turn out to be, different aspects of intelligence might advance at different rates. The equation is still valid, if you interpret it as a vector equation (and you take some liberties in notation by dividing by a vector).
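One way to write this vector reading down (my notation, not Bostrom's, and only a sketch): let I_i be the i-th facet of intelligence, R_i the recalcitrance of that facet, and O the scalar optimization power. "Dividing by a vector" then just means dividing component by component:

```latex
\frac{dI_i}{dt} = \frac{O}{R_i}, \qquad i = 1, \dots, n
```

With n = 1 this reduces to the one-dimensional equation above.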

I'm not sure this is what Bostrom intended though, and he certainly didn't say it. I think that chapter 4 would have been stronger if it had considered non-one-dimensional models.

But we can still take the spirit of chapter 4. Suppose intelligence is a multi-faceted thing, with the different factors having different recalcitrances and feeding into optimization power differently. I think you'd still expect a fast takeoff, unless there was some kind of bottleneck: all the factors of intelligence relevant to optimization power hit a roadblock (or each hit a separate roadblock).
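To make that intuition concrete, here is a toy simulation. It is a minimal sketch under assumptions of my own, none of which come from the book: each facet grows at rate O / R_i, optimization power O is a fixed outside contribution plus a weighted sum of the facets, and all the numbers are arbitrary. The names `simulate`, `weights` and `outside_power` are hypothetical.

```python
def simulate(recalcitrance, weights, outside_power=1.0, dt=0.01, steps=1000):
    """Euler-integrate dI_i/dt = O / R_i for each facet i of intelligence.

    Assumed toy model: O = outside_power + sum_i weights[i] * I_i, so a facet
    feeds back into optimization power in proportion to its weight.
    """
    levels = [1.0] * len(recalcitrance)  # every facet starts at level 1
    for _ in range(steps):
        power = outside_power + sum(w * x for w, x in zip(weights, levels))
        levels = [x + dt * power / r for x, r in zip(levels, recalcitrance)]
    return levels


# One facet is easy to improve, the other is stuck behind a roadblock.
open_case = simulate(recalcitrance=[1.0, 1000.0], weights=[1.0, 1.0])

# Every facet that feeds optimization power is stuck behind a roadblock.
blocked_case = simulate(recalcitrance=[1000.0, 1000.0], weights=[1.0, 1.0])

print("one facet open:", open_case)
print("all facets blocked:", blocked_case)
```

In this toy setup a single low-recalcitrance facet is enough to produce runaway growth, while the run barely moves when every relevant facet is blocked. That is only an illustration of the argument, not evidence for it: the result depends entirely on the assumed feedback from intelligence into optimization power.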

I think there's a fruitful discussion to be had here in any case. Bring out the multi-dimensional models!

However, all that running faster does is to save you time. If you have two machines A and B and B runs ten times as fast as A, then A can do anything that B can do if you’re willing to wait ten times as long.

Imagine you own a company, and you came into possession of a potion that would make employees do the same work ten times faster. What do you think would happen? Because I think that company would take over the world, unless the potion got leaked or sold to a rival company - this is essentially the difference between Eliezer Yudkowsky's and Robin Hanson's view of machine intelligence dynamics, at least as I understand it. (An intelligence explosion vs. lots of fast competing agents).

The assumption that a large gain in intelligence would necessarily entail a correspondingly large increase in power

Davis here appears to confuse intelligence with brain size, and is also still assuming that Bostrom is imagining a one-dimensional, clear-cut definition of intelligence.

It's really hard to argue that a large gain in intelligence would entail anything, because everyone disagrees on what that word means. But there's an interesting discussion to be had here if we dissolve the concept of intelligence.

Intelligence can be broken down into resources and abilities. Resources include:

  • Hardware
    • Speed at which that hardware runs
    • Whether the hardware allows its software to be copied (no for brains, yes for ems)
  • Money
  • Physical manipulators
  • Knowledge
  • Algorithms

Abilities include (roughly borrowed from Bostrom's "superpowers" list, though they may not necessarily be exhibited at the "super" level):

  • Strategizing
  • Social manipulation
  • Hacking
  • Technology research, including software
  • Economic productivity

The discussion would be:

  • Which important resources and abilities are missing from those lists? 
  • Which resources imply which abilities?
  • Which abilities allow further gains in which resources, in the sort of future world we'd expect to find the system in?
  • Which combination of abilities would suggest a lot of "power"?

I think that thinking about these would yield some statements of the form "a large gain in [something] would lead to a large gain in [other thing]", some of which would be nontrivial. This basically brings us back to multidimensional models being cool.

The assumption that large intelligence entails virtual omnipotence

As Bostrom and Davis both point out, this one swings both ways - it can lead to existential doom or "messianic benefits" (Davis's words) depending on whether the intelligence has friendly or unfriendly (or bizarre) motivations.

Davis doesn't explain why Bostrom is wrong on this point. But using the model I hinted at above, we might imagine some upper limit on the resources and abilities that can be acquired by a system - witness the fact that no system (such as a government or corporation) has managed to take over the entire world yet. Bostrom argues:

In human-run organizations, economies of scale are counteracted by bureaucratic inefficiencies and agency problems, including difficulties in keeping trade secrets. [snip] An AI system, however, might avoid some of these scale diseconomies, since the AI's modules (in contrast to human workers) need not have individual preferences that diverge from those of the system as a whole.

There's also a reference to the economic literature dealing with the theory of the firm.

The unwarranted belief that, though achieving intelligence is more or less easy, giving a computer an ethical point of view is really hard

Making a system behave ethically is trivially easy. "Hello world" does that. Your pocket calculator, and existing narrow AI systems, do that while still exhibiting interesting behaviour. Bear this in mind.

But if an AI is to be a master manipulator, it will need a good understanding of what people consider moral; if it comes across as completely amoral, it will be at a very great disadvantage in manipulating people

I can see the following problem. An AI is programmed not to do anything it considers immoral. It formulates the plan to construct another AI with slightly weaker morality safeguards. Would people consider this immoral? If they're consequentialist then probably yes. If they're not, and most people aren't, then probably no, although depending on how much they know about AI they might consider it "foolish", which is a different thing from immoral.

But we're attacking a strawman here, as Davis proposes something more interesting:

You specify a collection of admirable people, now dead. [snip] You then instruct the AI, “Don’t do anything that these people would have mostly seriously disapproved of.”

Note that being dead is irrelevant here. You can equally well specify "what a living person as of 2015 would have seriously disapproved of".

I'm guessing the spirit of this moral code is:

  • to forbid taking over and establishing a singleton
  • to allow passive behaviour, such as allowing a rival AI project to take over and establish a singleton

If I'm right, this runs into the problem of "if your AI doesn't take over the world then someone else's soon will".

This aside, we can consider how the AI might behave. Following Bostrom's classification:

  • Oracles give their operators a lot of power. But if the oracle is programmed not to do anything the wise people would disapprove of then it would stop answering questions once it realises it would give its operators too much power. Clever, huh?
  • Genies similarly
  • Sovereigns are ruled out if the wise people are forbidding establishing a singleton
  • Tools probably aren't agenty enough to follow the proscription accurately

So it sounds pretty good. It's basically "take the coherent extrapolated volition of some dead people", and it potentially suffers the problem of being too weedy. I think it suffers from similar philosophical and implementation problems to anything CEV-like though - who knows what the wise dead people would have thought about low-fidelity ems, or nanobots distributing morphine to prevent wild animal suffering, or any of the other puzzles that may confront future society. You need to extrapolate and you need to aggregate opinions. And you need to choose the right people to start with.

This reminds me a little of Holden Karnofsky's Tool AI. It's relatively easy to come to the conclusion that MIRI and FHI haven't done their homework, by proposing a candidate solution that they haven't explicitly mentioned yet. If they are to take their critics seriously, and I think they should, then such proposals will enter the conversation, and maybe even future editions of this book or its successors.

But it doesn't mean the problem has gone away. It may be that extrapolating the volition of dead people is a really hard task for AI, and that other things, such as initiating an economic takeover, are actually more straightforward. In that case we'd have the technology for a doomsday device before we had the suggested safeguard. And even if this isn't the case, while we're carefully building in all our safeguards, some foreign power will be hacking together any old rubbish, and as consequentialists we're responsible for the consequences of that.

This may not seem adequate to Bostrom, because he is not content with human morality in its current state; he thinks it is important for the AI to use its superintelligence to find a more ultimate morality. That seems to me both unnecessary and very dangerous.

Yeah, Bostrom can be a bit iffy when it comes to transhumanist topics. Moving on.

Bostrom considers at length solving the problem of the out-of-control computer by suggesting to the computer that it might actually be living in a simulated universe, and if so, the true powers that be might punish it for making too much mischief

Yeah, he's a bit iffy on the anthropic principle too. On the other hand, everyone else is iffy when it comes to the anthropic principle, and I think Bostrom was just trying to be thorough here.

Any machine should have an accessible “off” switch; and in the case of a computer or robot that might have any tendency toward self-preservation, it should have an off switch that it cannot block. However, in the case of computers and robots, this is very easily done, since we are building them. All you need is to place in the internals of the robot, inaccessible to it, a device that, when it receives a specified signal, cuts off the power or, if you want something more dramatic, triggers a small grenade. This can be done in a way that the computer probably cannot find out the details of how the grenade is placed or triggered, and certainly cannot prevent it.

Goodness, Bostrom didn't consider the possibility that you could build a machine with an "off" switch?

All the AI needs to do is:

  • Play nice until it's figured out a way to kill everyone before they can get their hand to the off switch, or
  • Play nice until it's so entwined with the rest of the world's infrastructure that switching it off would be unthinkable, or
  • Play nice until its superintelligence has figured out where the off switch is, or
  • Accomplish its goal (e.g. building a million paperclips), then kill everyone to make sure the paperclips remain forever. The off switch gets triggered on somebody's death but the AI doesn't care, or
  • Play nice until it's built another AI somewhere which has the same goals but doesn't have an off switch, or
  • Play nice until it's wireheaded everyone so they don't care

You get the idea. Manual off switches don't work when you're dealing with a master manipulator. Bostrom does discuss a number of safeguards (in particular "tripwires") that are a bit like off switches but smarter, but he discusses potential flaws in these too.

Something can always go wrong, or some foolish or malicious person might create a superintelligence with no moral sense and with control of its own off switch. I certainly have no objection to imposing restrictions...

That's great! Davis is saying "if your AI doesn't take over the world or at least do something really bad, then someone else's soon will unless there are restrictions". This is pretty close to one of my own memes, and the differences are important, but I'm glad we could find some common ground.

Thanks for reading!


This is what we need: intelligent criticism of EA orthodoxy from the outside

Does SuperIntelligence really rise to the level of "EA orthodoxy"?

This might just be a nitpick, but it does really seem like we would want to avoid implying something too strong about EA orthodoxy.

You're absolutely right. I've changed that bit in the final draft.

Just FYI, Rob Bensinger is prepping a reply to Davis for MIRI's blog.

Is a conclusion missing?

OK, finished draft done. Sorry for posting it by accident earlier!

Yes - I clicked on "save and continue" and what I got was "submit". I'd better get back to work on it, I guess!

Yes, happened to me too. I thought it would save a version as private draft.

Yes, that's pretty unintuitive, I've made a note of this so that it can be fixed.
