There's No Fire Alarm for Artificial General Intelligence

This is a linkpost for https://www.lesswrong.com/posts/BEtzRE2M5m9YEAQpX/there-s-no-fire-alarm-for-artificial-general-intelligence

Shared on the Forum's front page as a classic repost. We recommend reading it alongside Katja Grace's recent response. (Also, don't miss the comments on LessWrong!)

What is the function of a fire alarm?

One might think that the function of a fire alarm is to provide you with important evidence about a fire existing, allowing you to change your policy accordingly and exit the building.

In the classic experiment by Latané and Darley in 1968, eight groups of three students each were asked to fill out a questionnaire in a room that shortly after began filling up with smoke. Five out of the eight groups didn't react or report the smoke, even as it became dense enough to make them start coughing. Subsequent manipulations showed that a lone student will respond 75% of the time, while a student accompanied by two actors told to feign apathy will respond only 10% of the time. This and other experiments seemed to pin down that what's happening is pluralistic ignorance. We don't want to look panicky by being afraid of what isn't an emergency, so we try to look calm while glancing out of the corners of our eyes to see how others are reacting, but of course they are also trying to look calm.

(I've read a number of replications and variations on this research, and the effect size is blatant. I would not expect this to be one of the results that dies to the replication crisis, and I haven't yet heard about the replication crisis touching it. But we have to put a maybe-not marker on everything now.)

A fire alarm creates common knowledge, in the you-know-I-know sense, that there is a fire; after which it is socially safe to react. When the fire alarm goes off, you know that everyone else knows there is a fire, and you know you won't lose face if you proceed to exit the building.

The fire alarm doesn't tell us with certainty that a fire is there. In fact, I can't recall one time in my life when, exiting a building on a fire alarm, there was an actual fire. Really, a fire alarm is weaker evidence of fire than smoke coming from under a door.

But the fire alarm tells us that it's socially okay to react to the fire. It promises us with certainty that we won't be embarrassed if we now proceed to exit in an orderly fashion.

It seems to me that this is one of the cases where people have mistaken beliefs about what they believe, like when somebody loudly endorsing their city's team to win the big game will back down as soon as asked to bet. They haven't consciously distinguished the rewarding exhilaration of shouting that the team will win, from the feeling of anticipating the team will win.

When people look at the smoke coming from under the door, I think they think their uncertain wobbling feeling comes from not assigning the fire a high-enough probability of really being there, and that they're reluctant to act for fear of wasting effort and time. If so, I think they're interpreting their own feelings mistakenly. If that were so, they'd get the same wobbly feeling on hearing the fire alarm, or even more so, because fire alarms correlate to fire less than does smoke coming from under a door. The uncertain wobbling feeling comes from the worry that others believe differently, not the worry that the fire isn't there. The reluctance to act is the reluctance to be seen looking foolish, not the reluctance to waste effort. That's why the student alone in the room does something about the fire 75% of the time, and why people have no trouble reacting to the much weaker evidence presented by fire alarms.
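
To make the comparison of evidence strength concrete, here is a minimal sketch of the Bayesian arithmetic. Every number in it is an invented placeholder rather than a measurement, and the names `posterior`, `ALARM`, and `SMOKE` are hypothetical. The only point is that a signal with frequent false positives moves the probability of fire far less than a signal that rarely shows up without one.

```python
def posterior(prior: float, p_signal_given_fire: float, p_signal_given_no_fire: float) -> float:
    """Bayes' rule: probability of fire after observing the signal."""
    numerator = p_signal_given_fire * prior
    denominator = numerator + p_signal_given_no_fire * (1.0 - prior)
    return numerator / denominator


# ASSUMPTION: all values below are made up for illustration only.
PRIOR_FIRE = 0.001  # chance of a real fire at a random moment

# Alarms also go off for drills, burnt toast, and faults: many false positives.
ALARM = {"p_signal_given_fire": 0.9, "p_signal_given_no_fire": 0.02}

# Smoke under a door rarely appears when nothing is burning.
SMOKE = {"p_signal_given_fire": 0.5, "p_signal_given_no_fire": 0.0005}

p_fire_given_alarm = posterior(PRIOR_FIRE, **ALARM)
p_fire_given_smoke = posterior(PRIOR_FIRE, **SMOKE)

print(f"P(fire | alarm) = {p_fire_given_alarm:.3f}")
print(f"P(fire | smoke) = {p_fire_given_smoke:.3f}")
```

Under these assumed numbers the alarm leaves the probability of fire at a few percent while the smoke puts it near even odds, and yet it is the alarm that people act on.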

* * *

It's now and then proposed that we ought to start reacting later to the issues of Artificial General Intelligence (background here), because, it is said, we are so far away from it that it just isn't possible to do productive work on it today.

(For direct argument about there being things doable today, see: Soares and Fallenstein (2014/2017); Amodei, Olah, Steinhardt, Christiano, Schulman, and Mané (2016); or Taylor, Yudkowsky, LaVictoire, and Critch (2016).)

(If none of those papers existed or if you were an AI researcher who'd read them but thought they were all garbage, and you wished you could work on alignment but knew of nothing you could do, the wise next step would be to sit down and spend two hours by the clock sincerely trying to think of possible approaches. Preferably without self-sabotage that makes sure you don't come up with anything plausible; as might happen if, hypothetically speaking, you would actually find it much more comfortable to believe there was nothing you ought to be working on today, because e.g. then you could work on other things that interested you more.)

(But never mind.)

So if AGI seems far-ish away, and you think the conclusion licensed by this is that you can't do any productive work on AGI alignment yet, then the implicit alternative strategy on offer is: Wait for some unspecified future event that tells us AGI is coming near; and then we'll all know that it's okay to start working on AGI alignment.

This seems to me to be wrong on a number of grounds. Here are some of them.

One: As Stuart Russell observed, if you get radio signals from space and spot a spaceship there with your telescopes and you know the aliens are landing in thirty years, you still start thinking about that today.

You're not like, "Meh, that's thirty years off, whatever." You certainly don't casually say "Well, there's nothing we can do until they're closer." Not without spending two hours, or at least five minutes by the clock, brainstorming about whether there is anything you ought to be starting now.

If you said the aliens were coming in thirty years and you were therefore going to do nothing today... well, if these were more effective times, somebody would ask for a schedule of what you thought ought to be done, starting when, how long before the aliens arrive. If you didn't have that schedule ready, they'd know that you weren't operating according to a worked table of timed responses, but just procrastinating and doing nothing; and they'd correctly infer that you probably hadn't searched very hard for things that could be done today.

In Bryan Caplan's terms, anyone who seems quite casual about the fact that "nothing can be done now to prepare" about the aliens is missing a mood; they should be much more alarmed at not being able to think of any way to prepare. And maybe ask if somebody else has come up with any ideas? But never mind.

Two: History shows that for the general public, and even for scientists not in a key inner circle, and even for scientists in that key circle, it is very often the case that key technological developments still seem decades away, five years before they show up.

In 1901, two years before helping build the first heavier-than-air flyer, Wilbur Wright told his brother that powered flight was fifty years away.

In 1939, three years before he personally oversaw the first critical chain reaction in a pile of uranium bricks, Enrico Fermi voiced 90% confidence that it was impossible to use uranium to sustain a fission chain reaction. I believe Fermi also said a year after that, aka two years before the denouement, that if net power from fission was even possible (as he then granted some greater plausibility) then it would be fifty years off; but for this I neglected to keep the citation.

And of course if you're not the Wright Brothers or Enrico Fermi, you will be even more surprised. Most of the world learned that atomic weapons were now a thing when they woke up to the headlines about Hiroshima. There were esteemed intellectuals saying four years after the Wright Flyer that heavier-than-air flight was impossible, because knowledge propagated more slowly back then.

Were there events that, in hindsight, today, we can see as signs that heavier-than-air flight or nuclear energy were nearing? Sure, but if you go back and read the actual newspapers from that time and see what people actually said about it then, you'll see that they did not know that these were signs, or that they were very uncertain that these might be signs. Some playing the part of Excited Futurists proclaimed that big changes were imminent, I expect, and others playing the part of Sober Scientists tried to pour cold water on all that childish enthusiasm; I expect that part was more or less exactly the same decades earlier. If somewhere in that din was a superforecaster who said "decades" when it was decades and "5 years" when it was five, good luck noticing them amid all the noise. More likely, the superforecasters were the ones who said "Could be tomorrow, could be decades" both when the big development was a day away and when it was decades away.

One of the major modes by which hindsight bias makes us feel that the past was more predictable than anyone was actually able to predict at the time, is that in hindsight we know what we ought to notice, and we fixate on only one thought as to what each piece of evidence indicates. If you look at what people actually say at the time, historically, they've usually got no clue what's about to happen three months before it happens, because they don't know which signs are which.

I mean, you could say the words “AGI is 50 years away” and have those words happen to be true. People were also saying that powered flight was decades away when it was in fact decades away, and those people happened to be right. The problem is that everything looks the same to you either way, if you are actually living history instead of reading about it afterwards.

It's not that whenever somebody says "fifty years" the thing always happens in two years. It's that this confident prediction of things being far away corresponds to an epistemic state about the technology that feels the same way internally until you are very very close to the big development. It's the epistemic state of "Well, I don't see how to do the thing" and sometimes you say that fifty years off from the big development, and sometimes you say it two years away, and sometimes you say it while the Wright Flyer is flying somewhere out of your sight.

Three: Progress is driven by peak knowledge, not average knowledge.

If Fermi and the Wrights couldn't see it coming three years out, imagine how hard it must be for anyone else to see it.

If you're not at the global peak of knowledge of how to do the thing, and looped in on all the progress being made at what will turn out to be the leading project, you aren't going to be able to see of your own knowledge at all that the big development is imminent. Unless you are very good at perspective-taking in a way that wasn't necessary in a hunter-gatherer tribe, and very good at realizing that other people may know techniques and ideas of which you have no inkling even that you do not know them. If you don't consciously compensate for the lessons of history in this regard, then you will promptly say the decades-off thing. Fermi wasn't still thinking that net nuclear energy was impossible or decades away by the time he got to 3 months before he built the first pile, because at that point Fermi was looped in on everything and saw how to do it. But anyone not looped in probably still felt like it was fifty years away while the actual pile was fizzing away in a squash court at the University of Chicago.

People don't seem to automatically compensate for the fact that the timing of the big development is a function of the peak knowledge in the field, a threshold touched by the people who know the most and have the best ideas; while they themselves have average knowledge; and therefore what they themselves know is not strong evidence about when the big development happens. I think they aren't thinking about that at all, and they just eyeball it using their own sense of difficulty. If they are thinking anything more deliberate and reflective than that, and incorporating real work into correcting for the factors that might bias their lenses, they haven't bothered writing down their reasoning anywhere I can read it.

To know that AGI is decades away, we would need enough understanding of AGI to know what pieces of the puzzle are missing, and how hard these pieces are to obtain; and that kind of insight is unlikely to be available until the puzzle is complete. Which is also to say that to anyone outside the leading edge, the puzzle will look more incomplete than it looks on the edge. That project may publish their theories in advance of proving them, although I hope not. But there are unproven theories now too.

And again, that's not to say that people saying "fifty years" is a certain sign that something is happening in a squash court; they were saying “fifty years” sixty years ago too. It's saying that anyone who thinks technological timelines are actually forecastable, in advance, by people who are not looped in to the leading project's progress reports and who don't share all the best ideas about exactly how to do the thing and how much effort is required for that, is learning the wrong lesson from history. In particular, from reading history books that neatly lay out lines of progress and their visible signs that we all know now were important and evidential. It's sometimes possible to say useful conditional things about the consequences of the big development whenever it happens, but it’s rarely possible to make confident predictions about the timing of those developments, beyond a one- or two-year horizon. And if you are one of the rare people who can call the timing, if people like that even exist, nobody else knows to pay attention to you and not to the Excited Futurists or Sober Skeptics.

Four: The future uses different tools, and can therefore easily do things that are very hard now, or do with difficulty things that are impossible now.

Why do we know that AGI is decades away? In popular articles penned by heads of AI research labs and the like, there are typically three prominent reasons given:

(A) The author does not know how to build AGI using present technology. The author does not know where to start.

(B) The author thinks it is really very hard to do the impressive things that modern AI technology does; they have to slave long hours over a hot GPU farm tweaking hyperparameters to get it done. They think that the public does not appreciate how hard it is to get anything done right now, and is panicking prematurely because the public thinks anyone can just fire up Tensorflow and build a robotic car.

(C) The author spends a lot of time interacting with AI systems and therefore is able to personally appreciate all the ways in which they are still stupid and lack common sense.

We've now considered some aspects of argument A. Let's consider argument B for a moment.

Suppose I say: "It is now possible for one comp-sci grad to do in a week anything that N+ years ago the research community could do with neural networks at all." How large is N?

I got some answers to this on Twitter from people whose credentials I don't know, but the most common answer was five, which sounds about right to me based on my own acquaintance with machine learning. (Though obviously not as a literal universal, because reality is never that neat.) If you could do something at all in 2012, you can probably do it fairly straightforwardly with modern GPUs, Tensorflow, Xavier initialization, batch normalization, ReLUs, and Adam or RMSprop or just stochastic gradient descent with momentum. The modern techniques are just that much better. To be sure, there are things we can't do now with just those simple methods, things that require tons more work, but those things were not possible at all in 2012.

In machine learning, when you can do something at all, you are probably at most a few years away from being able to do it easily using the future's much superior tools. From this standpoint, argument B, "You don't understand how hard it is to do what we do," is something of a non-sequitur when it comes to timing.

Statement B sounds to me like the same sentiment voiced by Rutherford in 1933 when he called net energy from atomic fission "moonshine". If you were a nuclear physicist in 1933 then you had to split all your atoms by hand, by bombarding them with other particles, and it was a laborious business. If somebody talked about getting net energy from atoms, maybe it made you feel that you were unappreciated, that people thought your job was easy.

But of course this will always be the lived experience for AI engineers on serious frontier projects. You don't get paid big bucks to do what a grad student can do in a week (unless you're working for a bureaucracy with no clue about AI; but that's not Google or FB). Your personal experience will always be that what you are paid to spend months doing is difficult. A change in this personal experience is therefore not something you can use as a fire alarm.

Those playing the part of wiser sober skeptical scientists would obviously agree in the abstract that our tools will improve; but in the popular articles they pen, they just talk about the painstaking difficulty of this year's tools. I think that when they're in that mode they are not even trying to forecast what the tools will be like in 5 years; they haven't written down any such arguments as part of the articles I've read. I think that when they tell you that AGI is decades off, they are literally giving an estimate of how long it feels to them like it would take to build AGI using their current tools and knowledge. Which is why they emphasize how hard it is to stir the heap of linear algebra until it spits out good answers; I think they are not imagining, at all, how this experience may change over considerably less than fifty years. If they've explicitly considered the bias of estimating future tech timelines based on their present subjective sense of difficulty, and tried to compensate for that bias, they haven't written that reasoning down anywhere I've read it. Nor have I ever heard of that forecasting method giving good results historically.

Five: Okay, let's be blunt here. I don't think most of the discourse about AGI being far away (or that it's near) is being generated by models of future progress in machine learning. I don't think we're looking at wrong models; I think we're looking at no models.

I was once at a conference where there was a panel full of famous AI luminaries, and most of the luminaries were nodding and agreeing with each other that of course AGI was very far off, except for two famous AI luminaries who stayed quiet and let others take the microphone.

I got up in Q&A and said, "Okay, you've all told us that progress won't be all that fast. But let's be more concrete and specific. I'd like to know what's the least impressive accomplishment that you are very confident cannot be done in the next two years."

There was a silence.

Eventually, two people on the panel ventured replies, spoken in a rather more tentative tone than they'd been using to pronounce that AGI was decades out. They named "A robot puts away the dishes from a dishwasher without breaking them", and Winograd schemas. Specifically, "I feel quite confident that the Winograd schemas--where we recently had a result that was in the 50, 60% range--in the next two years, we will not get 80, 90% on that regardless of the techniques people use."

A few months after that panel, there was unexpectedly a big breakthrough on Winograd schemas. The breakthrough didn't crack 80%, so three cheers for wide credibility intervals with error margin, but I expect the predictor might be feeling slightly more nervous now with one year left to go. (I don't think it was the breakthrough I remember reading about, but Rob turned up this paper as an example of one that could have been submitted at most 44 days after the above conference and gets up to 70%.)

But that's not the point. The point is the silence that fell after my question, and that eventually I only got two replies, spoken in tentative tones. When I asked for concrete feats that were impossible in the next two years, I think that that's when the luminaries on that panel switched to trying to build a mental model of future progress in machine learning, asking themselves what they could or couldn't predict, what they knew or didn't know. And to their credit, most of them did know their profession well enough to realize that forecasting future boundaries around a rapidly moving field is actually really hard, that nobody knows what will appear on arXiv next month, and that they needed to put wide credibility intervals with very generous upper bounds on how much progress might take place twenty-four months' worth of arXiv papers later.

(Also, Demis Hassabis was present, so they all knew that if they named something insufficiently impossible, Demis would have DeepMind go and do it.)

The question I asked was in a completely different genre from the panel discussion, requiring a mental context switch: the assembled luminaries actually had to try to consult their rough, scarce-formed intuitive models of progress in machine learning and figure out what future experiences, if any, their model of the field definitely prohibited within a two-year time horizon. Instead of, well, emitting socially desirable verbal behavior meant to kill that darned hype about AGI and get some predictable applause from the audience.

I'll be blunt: I don't think the confident long-termism has been thought out at all. If your model has the extraordinary power to say what will be impossible in ten years after another one hundred and twenty months of arXiv papers, then you ought to be able to say much weaker things that are impossible in two years, and you should have those predictions queued up and ready to go rather than falling into nervous silence after being asked.

In reality, the two-year problem is hard and the ten-year problem is laughably hard. The future is hard to predict in general, our predictive grasp on a rapidly changing and advancing field of science and engineering is very weak indeed, and it doesn't permit narrow credible intervals on what can't be done.

Grace et al. (2017) surveyed the predictions of 352 presenters at ICML and NIPS 2015. Respondents’ aggregate forecast was that the proposition “all occupations are fully automatable” (in the sense that “for any occupation, machines could be built to carry out the task better and more cheaply than human workers”) will not reach 50% probability until 121 years hence. Except that a randomized subset of respondents were instead asked the slightly different question of “when unaided machines can accomplish every task better and more cheaply than human workers”, and in this case held that this was 50% likely to occur within 44 years.

That's what happens when you ask people to produce an estimate they can't estimate, and there's a social sense of what the desirable verbal behavior is supposed to be.

* * *

When I observe that there's no fire alarm for AGI, I'm not saying that there's no possible equivalent of smoke appearing from under a door.

What I'm saying rather is that the smoke under the door is always going to be arguable; it is not going to be a clear and undeniable and absolute sign of fire; and so there is never going to be a fire alarm producing common knowledge that action is now due and socially acceptable.

There's an old trope saying that as soon as something is actually done, it ceases to be called AI. People who work in AI and are in a broad sense pro-accelerationist and techno-enthusiast, what you might call the Kurzweilian camp (of which I am not a member), will sometimes rail against this as unfairness in judgment, as moving goalposts.

This overlooks a real and important phenomenon of adverse selection against AI accomplishments: If you can do something impressive-sounding with AI in 1974, then that is because that thing turned out to be doable in some cheap cheaty way, not because 1974 was so amazingly great at AI. We are uncertain about how much cognitive effort it takes to perform tasks, and how easy it is to cheat at them, and the first "impressive" tasks to be accomplished will be those where we were most wrong about how much effort was required. There was a time when some people thought that a computer winning the world chess championship would require progress in the direction of AGI, and that this would count as a sign that AGI was getting closer. When Deep Blue beat Kasparov in 1997, in a Bayesian sense we did learn something about progress in AI, but we also learned something about chess being easy. Considering the techniques used to construct Deep Blue, most of what we learned was "It is surprisingly possible to play chess without easy-to-generalize techniques" and not much "A surprising amount of progress has been made toward AGI."
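
The adverse-selection effect can be sketched as a toy simulation. This is an illustration under loudly stated assumptions, not a model of real AI history: task difficulty and our estimation noise are both drawn from standard normal distributions, and the names `simulate`, `impressive_cutoff`, and `early_budget` are hypothetical. A task counts as "impressive" if we estimated it to be hard, and as "done early" if its true difficulty turned out to be low.

```python
import random
import statistics


def simulate(n_tasks=20_000, seed=0, impressive_cutoff=1.0, early_budget=0.0):
    """Return (mean error over all tasks, mean error over early impressive tasks, count)."""
    rng = random.Random(seed)
    all_errors = []
    early_impressive_errors = []
    for _ in range(n_tasks):
        # ASSUMPTION: standardized true difficulty, unknown to us in advance.
        true_effort = rng.gauss(0.0, 1.0)
        # ASSUMPTION: our guess is the truth plus independent unbiased noise.
        estimate = true_effort + rng.gauss(0.0, 1.0)
        error = estimate - true_effort  # positive means we overestimated
        all_errors.append(error)
        # Tasks that sounded hard but were cheap enough to be done first.
        if estimate > impressive_cutoff and true_effort < early_budget:
            early_impressive_errors.append(error)
    return (
        statistics.mean(all_errors),
        statistics.mean(early_impressive_errors),
        len(early_impressive_errors),
    )


overall_error, selected_error, n_selected = simulate()
print(f"mean overestimate, all tasks:             {overall_error:+.2f}")
print(f"mean overestimate, early impressive wins: {selected_error:+.2f} (n={n_selected})")
```

Our estimates are unbiased across all tasks in this toy setup, but the subset that gets accomplished first while still sounding impressive is, by construction, the subset where we overestimated the difficulty the most.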

Was AlphaGo smoke under the door, a sign of AGI in 10 years or less? People had previously given Go as an example of What You See Before The End.

Looking over the paper describing AlphaGo's architecture, it seemed to me that we were mostly learning that available AI techniques were likely to go further towards generality than expected, rather than about Go being surprisingly easy to achieve with fairly narrow and ad-hoc approaches. Not that the method scales to AGI, obviously; but AlphaGo did look like a product of relatively general insights and techniques being turned on the special case of Go, in a way that Deep Blue wasn’t. I also updated significantly on "The general learning capabilities of the human cortical algorithm are less impressive, less difficult to capture with a ton of gradient descent and a zillion GPUs, than I thought," because if there were anywhere we expected an impressive hard-to-match highly-natural-selected but-still-general cortical algorithm to come into play, it would be in humans playing Go.

Maybe if we'd seen a thousand Earths undergoing similar events, we'd gather the statistics and find that a computer winning the planetary Go championship is a reliable ten-year-harbinger of AGI. But I don't actually know that. Neither do you. Certainly, anyone can publicly argue that we just learned Go was easier to achieve with strictly narrow techniques than expected, as was true many times in the past. There's no possible sign short of actual AGI, no case of smoke from under the door, for which we know that this is definitely serious fire and now AGI is 10, 5, or 2 years away. Let alone a sign where we know everyone else will believe it.

And in any case, multiple leading scientists in machine learning have already published articles telling us their criterion for a fire alarm. They will believe Artificial General Intelligence is imminent:

(A) When they personally see how to construct AGI using their current tools. This is what they are always saying is not currently true in order to castigate the folly of those who think AGI might be near.

(B) When their personal jobs do not give them a sense of everything being difficult. This, they are at pains to say, is a key piece of knowledge not possessed by the ignorant layfolk who think AGI might be near, who only believe that because they have never stayed up until 2AM trying to get a generative adversarial network to stabilize.

(C) When they are very impressed by how smart their AI is relative to a human being in respects that still feel magical to them; as opposed to the parts they do know how to engineer, which no longer seem magical to them; aka the AI seeming pretty smart in interaction and conversation; aka the AI actually being an AGI already.

So there isn't going to be a fire alarm. Period.

There is never going to be a time before the end when you can look around nervously, and see that it is now clearly common knowledge that you can talk about AGI being imminent, and take action and exit the building in an orderly fashion, without fear of looking stupid or frightened.

* * *

So far as I can presently estimate, now that we've had AlphaGo and a couple of other maybe/maybe-not shots across the bow, and seen a huge explosion of effort invested into machine learning and an enormous flood of papers, we are probably going to occupy our present epistemic state until very near the end.

By saying we're probably going to be in roughly this epistemic state until almost the end, I don't mean to say we know that AGI is imminent, or that there won't be important new breakthroughs in AI in the intervening time. I mean that it's hard to guess how many further insights are needed for AGI, or how long it will take to reach those insights. After the next breakthrough, we still won't know how many more breakthroughs are needed, leaving us in pretty much the same epistemic state as before. Whatever discoveries and milestones come next, it will probably continue to be hard to guess how many further insights are needed, and timelines will continue to be similarly murky. Maybe researcher enthusiasm and funding will rise further, and we'll be able to say that timelines are shortening; or maybe we’ll hit another AI winter, and we'll know that's a sign indicating that things will take longer than they would otherwise; but we still won't know how long.

At some point we might see a sudden flood of arXiv papers in which really interesting and fundamental and scary cognitive challenges seem to be getting done at an increasing pace. Whereupon, as this flood accelerates, even some who imagine themselves sober and skeptical will be unnerved to the point that they venture that perhaps AGI is only 15 years away now, maybe, possibly. The signs might become so blatant, very soon before the end, that people start thinking it is socially acceptable to say that maybe AGI is 10 years off. Though the signs would have to be pretty darned blatant, if they’re to overcome the social barrier posed by luminaries who are estimating arrival times to AGI using their personal knowledge and personal difficulties, as well as all the historical bad feelings about AI winters caused by hype.

But even if it becomes socially acceptable to say that AGI is 15 years out, in those last couple of years or months, I would still expect there to be disagreement. There will still be others protesting that, as much as associative memory and human-equivalent cerebellar coordination (or whatever) are now solved problems, they still don't know how to construct AGI. They will note that there are no AIs writing computer science papers, or holding a truly sensible conversation with a human, and castigate the senseless alarmism of those who talk as if we already knew how to do that. They will explain that foolish laypeople don't realize how much pain and tweaking it takes to get the current systems to work. (Although those modern methods can easily do almost anything that was possible in 2017, and any grad student knows how to roll a stable GAN on the first try using the tf.unsupervised module in Tensorflow 5.3.1.)

When all the pieces are ready and in place, lacking only the last piece to be assembled by the very peak of knowledge and creativity across the whole world, it will still seem to the average ML person that AGI is an enormous challenge looming in the distance, because they still won’t personally know how to construct an AGI system. Prestigious heads of major AI research groups will still be writing articles decrying the folly of fretting about the total destruction of all Earthly life and all future value it could have achieved, and saying that we should not let this distract us from real, respectable concerns like loan-approval systems accidentally absorbing human biases.

Of course, the future is very hard to predict in detail. It's so hard that not only do I confess my own inability, I make the far stronger positive statement that nobody else can do it either. The “flood of groundbreaking arXiv papers” scenario is one way things could maybe possibly go, but it's an implausibly specific scenario that I made up for the sake of concreteness. It's certainly not based on my extensive experience watching other Earthlike civilizations develop AGI. I do put a significant chunk of probability mass on "There's not much sign visible outside a Manhattan Project until Hiroshima," because that scenario is simple. Anything more complex is just one more story full of burdensome details that aren't likely to all be true.

But no matter how the details play out, I do predict in a very general sense that there will be no fire alarm that is not an actual running AGI--no unmistakable sign before then that everyone knows and agrees on, that lets people act without feeling nervous about whether they're worrying too early. That's just not how the history of technology has usually played out in much simpler cases like flight and nuclear engineering, let alone a case like this one where all the signs and models are disputed. We already know enough about the uncertainty and low quality of discussion surrounding this topic to be able to say with confidence that there will be no unarguable socially accepted sign of AGI arriving 10 years, 5 years, or 2 years beforehand. If there’s any general social panic it will be by coincidence, based on terrible reasoning, uncorrelated with real timelines except by total coincidence, set off by a Hollywood movie, and focused on relatively trivial dangers.

It's no coincidence that nobody has given any actual account of such a fire alarm, and argued convincingly about how much time it means we have left, and what projects we should only then start. If anyone does write that proposal, the next person to write one will say something completely different. And probably neither of them will succeed at convincing me that they know anything prophetic about timelines, or that they've identified any sensible angle of attack that is (a) worth pursuing at all and (b) not worth starting to work on right now.

* * *

It seems to me that the decision to delay all action until a nebulous totally unspecified future alarm goes off, implies an order of recklessness great enough that the law of continued failure comes into play.

The law of continued failure is the rule that says that if your country is incompetent enough to use a plaintext 9-numeric-digit password on all of your bank accounts and credit applications, your country is not competent enough to correct course after the next disaster in which a hundred million passwords are revealed. A civilization competent enough to correct course in response to that prod, to react to it the way you'd want them to react, is competent enough not to make the mistake in the first place. When a system fails massively and obviously, rather than subtly and at the very edges of competence, the next prod is not going to cause the system to suddenly snap into doing things intelligently.

The law of continued failure is especially important to keep in mind when you are dealing with big powerful systems or high-status people that you might feel nervous about derogating, because you may be tempted to say, "Well, it's flawed now, but as soon as a future prod comes along, everything will snap into place and everything will be all right." The systems about which this fond hope is actually warranted look like they are mostly doing all the important things right already, and only failing in one or two steps of cognition. The fond hope is almost never warranted when a person or organization or government or social subsystem is currently falling massively short.

The folly required to ignore the prospect of aliens landing in thirty years is already great enough that the other flawed elements of the debate should come as no surprise.

And with all of that going wrong simultaneously today, we should predict that the same system and incentives won't produce correct outputs after receiving an uncertain sign that maybe the aliens are landing in five years instead. The law of continued failure suggests that if existing authorities failed in enough different ways at once to think that it makes sense to try to derail a conversation about existential risk by saying the real problem is the security on self-driving cars, the default expectation is that they will still be saying silly things later.

People who make large numbers of simultaneous mistakes don’t generally have all of the incorrect thoughts subconsciously labeled as "incorrect" in their heads. Even when motivated, they can't suddenly flip to skillfully executing all-correct reasoning steps instead. Yes, we have various experiments showing that monetary incentives can reduce overconfidence and political bias, but (a) that's reduction rather than elimination, (b) it's with extremely clear short-term direct incentives, not the nebulous and politicizable incentive of "a lot being at stake", and (c) that doesn't mean a switch is flipping all the way to "carry out complicated correct reasoning". If someone's brain contains a switch that can flip to enable complicated correct reasoning at all, it's got enough internal precision and skill to think mostly-correct thoughts now instead of later--at least to the degree that some conservatism and double-checking gets built into examining the conclusions that people know will get them killed if they’re wrong about them.

There is no sign and portent, no threshold crossed, that suddenly causes people to wake up and start doing things systematically correctly. People who can react that competently to any sign at all, let alone a less-than-perfectly-certain not-totally-agreed item of evidence that is likely a wakeup call, have probably already done the timebinding thing. They've already imagined the future sign coming, and gone ahead and thought sensible thoughts earlier, like Stuart Russell saying, "If you know the aliens are landing in thirty years, it's still a big deal now."

* * *

Back in the funding-starved early days of what is now MIRI, I learned that people who donated last year were likely to donate this year, and people who last year were planning to donate "next year" would quite often this year be planning to donate "next year". Of course there were genuine transitions from zero to one; everything that happens needs to happen for a first time. There were college students who said "later" and gave nothing for a long time in a genuinely strategically wise way, and went on to get nice jobs and start donating. But I also learned well that, like many cheap and easy solaces, saying the word "later" is addictive; and that this luxury is available to the rich as well as the poor.

I don't expect it to be any different with AGI alignment work. People who are trying to get what grasp they can on the alignment problem will, in the next year, be doing a little (or a lot) better with whatever they grasped in the previous year (plus, yes, any general-field advances that have taken place in the meantime). People who want to defer that until after there's a better understanding of AI and AGI will, after the next year's worth of advancements in AI and AGI, want to defer work until a better future understanding of AI and AGI.

Some people really want alignment to get done and are therefore now trying to rack their brains about how to get something like a reinforcement learner to reliably identify a utility function over particular elements in a model of the causal environment instead of a sensory reward term, or defeat the seeming tautologicalness of updated (non-)deference. Others would rather be working on other things, and will therefore declare that there is no work that can possibly be done today, not spending two hours quietly thinking about it first before making that declaration. And this will not change tomorrow, unless perhaps tomorrow is when we wake up to some interesting newspaper headlines, and probably not even then. The luxury of saying "later" is not available only to the truly poor-in-available-options.

After a while, I started telling effective altruists in college: "If you're planning to earn-to-give later, then for now, give around $5 every three months. And never give exactly the same amount twice in a row, or give to the same organization twice in a row, so that you practice the mental habit of re-evaluating causes and re-evaluating your donation amounts on a regular basis. Don't learn the mental habit of just always saying 'later'."

Similarly, if somebody was actually going to work on AGI alignment "later", I'd tell them to, every six months, spend a couple of hours coming up with the best current scheme they can devise for aligning AGI and doing useful work on that scheme. Assuming, if they must, that AGI were somehow done with technology resembling current technology. And publishing their best-current-scheme-that-isn't-good-enough, at least in the sense of posting it to Facebook; so that they will have a sense of embarrassment about naming a scheme that does not look like somebody actually spent two hours trying to think of the best bad approach.

There are things we’ll better understand about AI in the future, and things we’ll learn that might give us more confidence that particular research approaches will be relevant to AGI. There may be more future sociological developments akin to Nick Bostrom publishing Superintelligence, Elon Musk tweeting about it and thereby heaving a rock through the Overton Window, or more respectable luminaries like Stuart Russell openly coming on board. The future will hold more AlphaGo-like events to publicly and privately highlight new ground-level advances in ML technique; and it may somehow be that this does not leave us in the same epistemic state as having already seen AlphaGo and GANs and the like. It could happen! I can't see exactly how, but the future does have the capacity to pull surprises in that regard.

But before waiting on that surprise, you should ask whether your uncertainty about AGI timelines is really uncertainty at all. If it feels to you that guessing AGI might have a 50% probability in N years is not enough knowledge to act upon, if that feels scarily uncertain and you want to wait for more evidence before making any decisions... then ask yourself how you'd feel if you believed the probability was 50% in N years, and everyone else on Earth also believed it was 50% in N years, and everyone believed it was right and proper to carry out policy P when AGI has a 50% probability of arriving in N years. If that visualization feels very different, then any nervous "uncertainty" you feel about doing P is not really about whether AGI takes much longer than N years to arrive.

And you are almost surely going to be stuck with that feeling of "uncertainty" no matter how close AGI gets; because no matter how close AGI gets, whatever signs appear will almost surely not produce common, shared, agreed-on public knowledge that AGI has a 50% chance of arriving in N years, nor any agreement that it is therefore right and proper to react by doing P.

And if all that did become common knowledge, then P is unlikely to still be a neglected intervention, or AI alignment a neglected issue; so you will have waited until sadly late to help.

But far more likely is that the common knowledge just isn't going to be there, and so it will always feel nervously "uncertain" to consider acting.

You can either act despite that, or not act. Not act until it's too late to help much, in the best case; not act at all until after it's essentially over, in the average case.

I don't think it's wise to wait on an unspecified epistemic miracle to change how we feel. In all probability, you're going to be in this mental state for a while - including any nervous-feeling "uncertainty". If you handle this mental state by saying "later", that general policy is not likely to have good results for Earth.

* * *

Further resources:

MIRI’s research guide (https://intelligence.org/research-guide/) and forum (https://agentfoundations.org)
FLI’s collection of introductory resources
CHAI’s alignment bibliography at http://humancompatible.ai/bibliography
80,000 Hours’ AI job postings on https://80000hours.org/job-board/
The Open Philanthropy Project’s AI fellowship and general call for research proposals
My brain-dumps on AI alignment
If you're arriving here for the first time, my long-standing work on rationality, and CFAR’s workshops
And some general tips from Ray Arnold for effective altruists considering AI alignment as a cause area.

Jackson Wagner, Jan 11 2022

Review for the Decade Review

This is going to be a quick review since there has been plenty of discussion of this post and people understand it well. But this post was very influential for me personally, and helped communicate yet another aspect of the key problem with AI risk -- the fact that it's so unprecedented, which makes it hard to test and iterate on solutions, hard to raise awareness and get agreement about the nature of the problem, and hard to know how much time we have left to prepare.

AI is simply one of the biggest worries among longtermist EAs, and this essay does a good job describing a social dynamic unique to the space of AI risk that makes dealing with the risk harder. For this reason it would be a fine inclusion in the decadal review.
