TLDR: We don’t know how to control a superintelligence, so we should probably figure that out before we create one. (And since we don’t know when somebody might create one, we should probably figure it out as soon as possible - even if it costs a lot of money).
The following is an argument written for a non-technical audience on what AI alignment is, and why I believe it should be highly prioritised. I use terms and make points with that audience in mind, leaving nuance and specifics to more technical discussions to preserve length and simplicity.
A superintelligence is an agent - like a human or a company or a dog - that can make decisions and do things in the world better than any human could. If it was trying to play chess, it would play better than any human. If it was trying to make money, it would do that better than any human. If it was trying to come up with a way of making itself smarter, it could also do that better than any human.
We already have superintelligent agents at some tasks - narrow A.I. - that can play games and things better than people can. The number of things that these narrow A.I.s can do is growing pretty quickly, and getting an A.I. to do something new is getting easier and easier. For example, the old chess A.I.s that first beat humans could only ever play chess, but the new ones can play chess, go, and shogi without major changes to their programming. The sort of superintelligence I am talking about is one that could do every task better than any human.
Suppose we were able to create a machine that could do everything a human could do just a bit better than any human. One of the things it can do better, by definition, is build a better machine, which could then build an even better machine, and so on. Where’s does it end? Well, eventually, at the theoretical limits of computation. These theoretical limits are very, very high - without even getting close to the limit, a 10kg computer could do more computation every hour than 10 billion human brains could do in a million years. (And a superintelligence wouldn’t be limited to just 10kg). At that point, we are talking about something that can essentially do anything that is allowed by the laws of physics - something so incredibly smart it’s comparable to a civilisation millions of years ahead of us.
The problem is that we have no idea how to control such a thing. Remember, this machine is only intelligent - giving it a sense of morality, or ethics, or a desire to do good looks like a totally separate problem. A superintelligence would of course be able to understand morality, but there’s no reason to think it will value morality the way we do (unless we deliberately program it in). We don’t yet know how to program any high-level human concept like morality, love, or happiness - the difficulty is in nailing down the concept to the kind of mathematical language a computer can understand before it becomes superintelligent.
But why make a moral machine, anyway? Why not just have a superpowerful tool that just does what we ask? Let’s suppose we give a superintelligence this goal: “Make as many paperclips as you can, as fast as you can.” (Maybe we run a paperclip factory). While it’s near-human level, it might figure the best way to make paperclips is to run the factory more efficiently, which is great. What else could we expect it to do? Well, it would probably understand that it could be even better at making paperclips if it were a bit smarter, so it would work on making itself smarter. What else? It would know that it could make more paperclips with more resources - factories, metal, machines - so it would also work towards getting more resources. It might understand that the humans that built it don’t actually want it to go build more factories, but it wouldn’t care - the only thing we programmed it to care about is making as many paperclips as possible, as fast as possible.
It also doesn’t want to be turned off. It doesn’t care about dying, of course, it only cares about paperclips - but it can’t make paperclips if it’s turned off. It also can’t make paperclips if we reprogram it, so it doesn’t want to be reprogrammed.
At some point, the superintelligence’s goal of making paperclips becomes a bit of a problem. It wants resources to turn into paperclips, and we want resources to turn into food and cars and hospitals. Being millions of times smarter than any human, and having access to all of humanity’s information and communication via the internet, it would win. Easily. So it goes, gradually converting all the matter on Earth into paperclips and von Neumann probes, which fly to other planets and turn them into paperclips too. Spreading out in all directions at the speed of light, the paperclip maximiser.
The problem is Instrumental Convergence. Would the superintelligence be better at achieving its goal if it had more resources? More intelligence? Would it be better at achieving its goals if it keeps itself turned on? If it stops it’s goal from being changed? If you are thinking of giving the superintelligence a goal for which the answer is ‘yes’ to any of those questions, something like the above story will happen. We might shout all we like “That’s not what we meant!”, and it might understand us, but it doesn’t care because we didn’t program it to do what we meant. We don’t know how to.
There is an entire field dedicated to trying to figure out how to make sure a superintelligence is aligned with our goals - to do what we mean, or to independently do ‘good’, or to limit its impact on the world so if it does go wrong at least we can try again - but funding, time, and talent is short, and the problem is proving to be significantly harder than we might have naively expected. Right now, we can’t guarantee a superintelligence would act in our interests, nor guarantee it would value our lives enough that it wouldn’t incidentally kill us in pursuit of some other goal, like a human incidentally kills ants while walking.
So a superintelligence could be super powerful and super dangerous if and when we are able build it. When might that be? Let’s use expert opinion as a bit of a guide here, rather than spending ages diving into the arguments ourselves. Well, it turns out they have no idea. Seriously, there’s a huge disagreement. Some surveys of experts predict it’s at least 25 years away (or impossible), others predict it’s less than 10 years away, most have a tonne of variation.
If nothing else, that much tells us we probably shouldn’t be too confident in our own pet predictions for when we might build a superintelligence. (And even twenty-five years is super soon). But what about predictions for how quickly a superintelligence will ‘takeoff’ - going from ‘slightly more intelligent than a human’ to ‘unthinkably intelligent’? If it takes off slow enough, we’ll have time to figure out how to make it safe after we create the first superintelligence, which would be very handy indeed. Unfortunately, it turns out nobody agrees on that either. Some people predict it will only take a few hours, others predict weeks or years, and still others decades.
To summarise - we don’t know when we might build a superintelligence, we don’t know how quickly a superintelligence will go from ‘genius human’ to ‘unstoppable’, and we don’t know how to control it if it does - and there’s a decent chance it’s coming pretty soon. A lot of people are working on building superintelligence as soon as possible, but far fewer people (and far less funding) is going into safety. The good news is that lots of people aren’t worried about this too much, because they believe that we will have solved the problem of how to make superintelligence safe (the alignment problem) before we manage to build one. I actually think they are probably right about that, but the reason I am so worried here is that ‘probably’ isn’t very reassuring to me.
It’s really a question of risk management. How certain are you that a superintelligence is more than, say, 50 years from being built? How certain are you that we will be able to solve alignment before then? Is it worth spending a bit more money, as a society, to increase that certainty?
We should also consider how helpful an aligned superintelligence would be. Something as powerful as the machine we’re considering here would be able to solve world problems in a heartbeat. Climate change, poverty, disease, death - would a civilisation a million years ahead of ours be able to solve these? If such a civilisation could, then a superintelligence that has ‘taken off’ would be able to as well.
When I first became aware of this two years ago, it seemed obvious to me that I should change my major to computer science and try to come up with a solution myself. Today, it looks like the best thing for me to do is try to generate money and influence to get multiple other people working on the problem too. The purpose of this post is to beg you to please think about this problem. The lack of social, political, and scientific discussion is super worrying - even if you only think there’s a 1% chance of a bad superintelligence being developed soon, that’s still a massive gamble when we are talking about extinction.
To find out more, WaitButWhy has a nice, gradual intro that’s a little bit more in depth than this. If you are technically minded, this talk/transcript from Eliezer Yudkowsky gives a very good overview of the research field. The book Superintelligence by Nick Bostrom goes much more in depth, but it a little out of date today. Also the websites LessWrong, Intelligence.org, and the Future of Life Institute all have more discussions and resources to dip your toes in. If you’re into videos, the panel discussion at (one of) the first superintelligence safety conferences nicely sums up the basic views and state from the current major players. I beg you to consider this problem yourself in deciding what the best thing you can do for the world is. The field is tiny, so single new researcher, policy maker, contributor, or voter can really have a massive difference.
If you are not yet convinced, I would love to hear your arguments. I would actually love to be convinced that it is not a danger -- it would take so much worry off.
Sorry for the delay on this reply. It’s been a very busy week.
Okay, so, to be clear -- I am making the argument that superintelligence safety is an important area that is underfunded today, and you are arguing that extinction caused by superintelligence is so unlikely that it shouldn’t be a concern. Is that accurate?
With that in mind, I’ll go through you points here one by one, and then attempt to address some of arguments in your blog posts (though the first post was unavailable!).
I agree with you here. My reason for bringing this up in the main post was to show that superintelligence is possible under today’s understanding of physics. Raw computation is not intelligent by itself, we agree, but rather one requirement for it. I was just pointing out the computation that could be done in a small amount of matter is much larger than the computation that is done in the brain. (And that the brain’s computation is in a pattern that we call general intelligence).
I didn’t mention a lot of good research relevant to safety, and progress is being made in many independant directions for sure. I do agree, I would also like to see more of a crossover, though I really don’t know how much the two areas are already working off the other’s progress. I’d be surprised if it were zero. Regardless, if it were zero, it would show poor communication, rather than say anything about the concerns being wrong.
I mean, there’s no rule that a superintelligence has to misunderstand you. And there’s no certainty instrumental convergence is correct. (I wouldn’t risk my life on either statement!) It’s just that we think being smarter would help achieve most goals, so we probably should expect a superintelligence to try and make itself smarter.
The other part is we just don’t know how to guarantee that a superintelligence will do what we mean. (If you do know how to do this, that would be a huge relief). Even in your example of trying to get an superintelligence just to make itself smarter, I certainly wouldn’t be confident it would do it in the way I expect -- I have enough trouble predicting how my programs today will run. Suppose I’d written a utility function for ‘smartness’ that actually just measured total bits flipped, for example, I might not realise until afterwards, which wouldn't be good.
I might be misunderstanding you here. Are you arguing that because superintelligence does not yet exist, it is not yet worthwhile to work on safety? Or are you arguing that we can’t be confident that a solution to alignment will work without a superintelligence to test it on?
If it’s the first, I would argue that there’s a major risk that we won’t find a solution in the period of time between creating a superintelligence, and the superintelligence having enough power to be a big problem. Unless I was super confident this time period would be very large, wouldn’t it make more sense to try and find a solution as early as possible?
I’d also argue that solving a solution early would mean it could be worked into the design of a superintelligence early, rather than just relying on the class of solutions that would fit something that’s already been built.
If it’s the second, I agree -- it would be a much easier problem to solve if we had a ‘mini’-superintelligence to practice on, for sure. Figuring out how to do this is a part of safety research! How can we limit a superintelligence’s capabilities so it stays in this state? How can we predict what will happen as we increase a weak superintelligence to a strong superintelligence? We still need to figure out how to do that as well, hence my call for research funding.
I am not sure this is true, I’ve always read takeoff speed estimates as counting from the moment of human-level general intelligence - though I know many people imagine a human-level AGI as having access to current narrow superintelligence (as in, max[human, current computer] abilities at each task). Maybe that’s it.
Regardless, as above, I hope we get that chance, though from the little research that has been done it looks like this might not be as safe as it sounds. We would have to be very very good at determining the capability of an AGI, be confident that no other project is moving forward faster than us, and be confident that the behaviour will remain the same as intelligence increases -- which might be the trickiest one. For example, a near-human AGI might be able to predict that doing what humans want early on would make it more likely to achieve its goal later on, no matter what the goal actually is. -- So we haven’t avoided catastrophe, only added an instrumental goal of ‘behaving the way humans want me to until I have enough power to disregard them without being shut down’. Still, this is an open area of research and I hope it gets more funding and attention.
Getting into your arguments for that figure below, though I want to clarify here my estimate of superintelligence being built this century is in the double digits percentage wise, and that if it's built before we solve alignment it is almost certain to be dangerous. I'm not relying on very low probabilities of drastic outcomes, so Pascal's Mugging doesn't apply.
_
Onwards to a some limited responses to your blog posts. I wasn’t entirely sure if I understood your argument properly, so I’m going to try and list the main points here and see if you agree.
1. You argue that if the probability of an AI-related extinction event were large, and if a single AI-related extinction event could affect any lifeform in the universe ever, one should have already happened somewhere and we shouldn’t exist.
2. You argue that current safety research is ineffective -- we’d be able to work more effectively and cheaply if we waited until we were closer to developing superintelligence.
3. You believe that if a superintelligence was going to be built in the near future, and if it was going to be dangerous, it would probably result in a smaller scale catastrophe that would give us plenty of warning that a bigger catastrophe was coming.
4. You believe that there are numerous psychological reasons people are inclined to believe superintelligence is likely and dangerous, and so increase your skepticism of the claims because of that.
5. You argue that left to its own devices, regular commercial or academic research will be able to solve the problem.
If there’s a major point I’ve missed here, or if I’ve phrased these badly, do correct me! Anyway, let’s go through them.
If the probability of broadcasting radio into space were large, we should have already detected alien radio. (Since radio would also spread at the speed of light in all directions, and be distinct from natural events). I don’t believe this is strong evidence against the hypothesis that superintelligence (or radio) is possible and dangerous, though I suppose it’s evidence that there are no other advanced civilisations within our past light cone.
It is hard to say how effective current safety research is, for sure. If anything, the limited progress should make us think this problem is very hard and make us way less confident about being able to solve it in a short period of time in the future. Particularly since some aspects of safety get harder to implement the longer we wait -- building culture and institutions that consider the issue when setting up their AGI projects, for instance.
If the time period between a small scale catastrophe and a large one is small, we shouldn’t be confident that we can solve safety in time -- especially if you are right about a small scale catastrophe being evidence we are nearing superintelligence.
Additionally, if there exist large scale failure modes that are wholly different to any small scale failure mode, we shouldn’t expect learning from small scale catastrophes to help us prevent larger ones.
Alternatively, we might even make large scale failures harder to detect by patching small failures -- for example, we might think we’ve prevented a superintelligence from trying to escape onto the internet, but we’ve really just made escaping so hard that only a strong superintelligence could manage it.
Humanities general lack of concern generally about climate change or nuclear weapons (prior to them being created / caused) would indicate to me the psychological trends go in the other direction, at least for most people. Regardless, I would certainly agree with being really skeptical about extraordinary claims.
I would argue that it’s an extraordinary claim both ways. Either superintelligence is not that hard to build, or there is something so incredibly complicated and special about biological general intelligence that even with billions of dollars of funding per year for a hundred years, we won’t manage to replicate it - even as we replicate other aspects of biological intelligence (like vision, or motor control).
You might argue, fairly, that this is more likely, but do you really believe it is billions of times as likely?
I’m not sure if you’re main disagreement is with superintelligence being built at all, or with it being dangerous, so let’s look at that quickly too. If we are skeptical of superintelligence being dangerous because it seems extraordinary, we should also be skeptical of the extraordinary claim that a superintelligence would be be safe and good by default. (If it is not by default, we already have discovered how difficult it is specify safe behaviour).
I really hope so.
Commercially, building a superintelligence (or rather, every step towards superintelligence) would be extremely profitable. But since safety research would take some of your best minds away from building it, the incentives are in the wrong direction. Whoever spends the least on safety has the largest proportion of their resources to spend on development.
As far as regular academic research goes, it’s more hopeful, but the number of people working on safety in traditional academia is very very low. How confident can we be that this low output would be enough to solve the problem prior to building a superintelligence -- especially given how difficult we’ve found it to be so far -- and considering how many ambitious researchers are working on building a superintelligence as soon as possible? Perhaps money could be best spent persuading those researchers to consider safety, I don’t know.
To conclude, I want to lay out what would change my mind:
If progress on computer hardware and software seemed very likely to halt (or slow dramatically) in the near future.
If our current understanding of neuroscience turned out to be wrong, and we could show that simulating general purpose computation required far more computation than the brain’s cells do -- perhaps the brain uses hard-to-compute actions on the level of atoms or smaller, rather than something that could be done in abstract models of cells.
If somebody was able to disprove (or provide very strong evidence against) the orthogonality thesis and instrumental convergence thesis.
If no project was working on building superintelligence.
Otherwise, it seems very much like we could have the capability of simulating and optimising a general intelligence in the near future, and that this could be very dangerous.