
(Cross-posted from Twitter.)

 

My take on Leopold Aschenbrenner's new report: I think Leopold gets it right on a bunch of important counts.

Three that I especially care about:

  1. Full AGI and ASI soon. (I think his arguments for this have a lot of holes, but he gets the basic point that superintelligence looks 5 or 15 years off rather than 50+.)
  2. This technology is an overwhelmingly huge deal, and if we play our cards wrong we're all dead.
  3. Current developers are indeed fundamentally unserious about the core risks, and need to make IP security and closure a top priority.

I especially appreciate that the report seems to get it when it comes to our basic strategic situation: it gets that we may only be a few years away from a truly world-threatening technology, and it speaks very candidly about the implications of this, rather than soft-pedaling it to the degree that public writings on this topic almost always do. I think that's a valuable contribution all on its own.

Crucially, however, I think Leopold gets the wrong answer on the question "is alignment tractable?". That is: OK, we're on track to build vastly smarter-than-human AI systems in the next decade or two. How realistic is it to think that we can control such systems?

Leopold acknowledges that we currently only have guesswork and half-baked ideas on the technical side, that this field is extremely young, that many aspects of the problem look impossibly difficult (see attached image), and that there's a strong chance of this research operation getting us all killed. "To be clear, given the stakes, I think 'muddling through' is in some sense a terrible plan. But it might be all we’ve got." Controllable superintelligent AI is a far more speculative idea at this point than superintelligent AI itself.

[Image]

I think this report is drastically mischaracterizing the situation. ‘This is an awesome exciting technology, let's race to build it so we can reap the benefits and triumph over our enemies’ is an appealing narrative, but it requires the facts on the ground to shake out very differently than how the field's trajectory currently looks.

The more normal outcome, if the field continues as it has been, is: if anyone builds it, everyone dies.

This is not a national security issue of the form ‘exciting new tech that can give a country an economic or military advantage’; it's a national security issue of the form ‘we've found a way to build a doomsday device, and as soon as anyone starts building it the clock is ticking on how long before they make a fatal error and take themselves out, and take the rest of the world out with them’.

Someday superintelligence could indeed become more than a doomsday device, but that's the sort of thing that looks like a realistic prospect if ASI is 50 or 150 years away and we fundamentally know what we're doing on a technical level — not if it's more like 5 or 15 years away, as Leopold and I agree.

The field is not ready, and it's not going to suddenly become ready tomorrow. We need urgent and decisive action, but to indefinitely globally halt progress toward this technology that threatens our lives and our children's lives, not to accelerate ourselves straight off a cliff.

Concretely, the kinds of steps we need to see ASAP from the USG are:

- Spearhead an international alliance to prohibit the development of smarter-than-human AI until we’re in a radically different position. The three top-cited scientists in AI (Hinton, Bengio, and Sutskever) and the three leading labs (Anthropic, OpenAI, and DeepMind) have all publicly stated that this technology's trajectory poses a serious risk of causing human extinction (in the CAIS statement). It is absurd on its face to let any private company or nation unilaterally impose such a risk on the world; rather than twiddling our thumbs, we should act.

- Insofar as some key stakeholders aren’t convinced that we need to shut this down at the international level immediately, a sane first step would be to restrict frontier AI development to a limited number of compute clusters, and place those clusters under a uniform monitoring regime to forbid catastrophically dangerous uses. Offer symmetrical treatment to signatory countries, and do not permit exceptions for any governments. The idea here isn’t to centralize AGI development at the national or international level, but rather to make it possible at all to shut down development at the international level once enough stakeholders recognize that moving forward would result in self-destruction. In advance of a decision to shut down, it may be that anyone is able to rent H100s from one of the few central clusters, and then freely set up a local instance of a free model and fine-tune it; but we retain the ability to change course, rather than just resigning ourselves to death in any scenario where ASI alignment isn’t feasible.

Rapid action is called for, but it needs to be based on the realities of our situation, rather than trying to force AGI into the old playbook of far less dangerous technologies. The fact that we can build something doesn't mean that we ought to, nor does it mean that the international order is helpless to intervene.

Comments



Rob - excellent post. Wholeheartedly agree. 

This is the time for EAs to radically rethink our whole AI safety strategy. Working on 'technical AI alignment' is not going to work in the time that we probably have, given the speed of AI capabilities development.

I think it's still good for some people to work on alignment research. The future is hard to predict, and we can't totally rule out a string of technical breakthroughs, and the overall option space looks gloomy enough (at least from my perspective) that we should be pursuing multiple options in parallel rather than putting all our eggs in one basket.

That said, I think "alignment research pans out to the level of letting us safely wield vastly superhuman AGI in the near future" is sufficiently unlikely that we definitely shouldn't be predicating our plans on that working out. AFAICT Leopold's proposal is that we just lie down and die in the worlds where we can't align vastly superhuman AI, in exchange for doing better in the worlds where we can align it; that seems extremely reckless and backwards to me, throwing away higher-probability success worlds in exchange for more niche and unlikely success worlds.

I also think alignment researchers thus far, as a group, have mainly had the effect of shortening timelines. I want alignment research to happen, but not at the cost of reducing our hope in the worlds where alignment doesn't pan out, and thus far a lot of work labeled "alignment" has either seemed to accelerate the field toward AGI, or seemed to provide justification/cover for increasing the heat and competitiveness of the field, which seems pretty counterproductive to me.

Yep. 100% agree!

Leopold's implicit response as I see it:

  1. Convincing all stakeholders of high p(doom) such that they take decisive, coordinated action is wildly improbable ("step 1: get everyone to agree with me" is the foundation of many terrible plans and almost no good ones)
  2. Still improbable, but less wildly, is the idea that we can steer institutions towards sensitivity to risk on the margin and that those institutions can position themselves to solve the technical and other challenges ahead

Maybe the key insight is that both strategies walk on a knife's edge. While Moore's law, algorithmic improvement, and chip design hum along at some level, even a little breakdown in international willpower to enforce a pause/stop can rapidly convert to catastrophe. Spending a lot of effort to get that consensus also has high opportunity cost in terms of steering institutions in the world where the effort fails (and it is very likely to fail).

Leopold's view more straightforwardly makes a high-risk bet that, by a fast-approaching critical moment, leaders will have learned things they don't know now and developed tools they can't currently foresee.

I think it's accordingly unsurprising that confidence in background doom is the crux here. In Leopold's 5% world, the first plan seems like the bigger risk. In MIRI's 90% world, the second does. Unfortunately, the error bars are wide here and the arguments on both sides seem so inextricably priors-driven that I don't have much hope they'll narrow any time soon.   
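A minimal expected-value sketch of that crux, with purely illustrative numbers (the halt-success probability and the "muddle-through" discount below are assumptions for illustration, not figures from Leopold, MIRI, or the commenters):

```python
# Toy model: which plan looks riskier flips with the background probability
# that building ASI without a solved alignment problem is fatal (p_doom).
# All numbers are illustrative assumptions, not estimates from the post.

def p_good_pause(p_doom: float, p_halt_holds: float, muddle_discount: float) -> float:
    # Push for an international halt: good outcome if the halt holds; if it
    # fails, survival still requires alignment to pan out, discounted for
    # having spent influence on the failed halt effort.
    return p_halt_holds + (1 - p_halt_holds) * (1 - p_doom) * muddle_discount

def p_good_race(p_doom: float) -> float:
    # Race ahead while trying to steer institutions toward safety: survival
    # hinges on alignment panning out in time.
    return 1 - p_doom

for p_doom in (0.05, 0.90):
    pause = p_good_pause(p_doom, p_halt_holds=0.2, muddle_discount=0.8)
    race = p_good_race(p_doom)
    print(f"p_doom={p_doom:.2f}: halt-first plan {pause:.2f} vs race-and-align plan {race:.2f}")

# Output:
# p_doom=0.05: halt-first plan 0.81 vs race-and-align plan 0.95  (racing looks safer)
# p_doom=0.90: halt-first plan 0.26 vs race-and-align plan 0.10  (halting looks safer)
```

With these toy numbers the preferred plan flips with p_doom, which is the sense in which the disagreement is priors-driven.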

Three high-level reasons I think Leopold's plan looks a lot less workable:

  1. It requires major scientific breakthroughs to occur on a very short time horizon, including unknown breakthroughs that will manifest to solve problems we don't understand or know about today.
  2. These breakthroughs need to come in a field that has not been particularly productive or fast in the past. (Indeed, forecasters have been surprised by how slowly safety/robustness/etc. have progressed in recent years, and simultaneously surprised by the breakneck speed of capabilities.)
  3. It requires extremely precise and correct behavior by a giant government bureaucracy that includes many staff who won't be the best and brightest in the field -- inevitably, many technical and nontechnical people in the bureaucracy will have wrong beliefs about AGI and about alignment.

The "extremely precise and correct behavior" part means that we're effectively hoping to be handed an excellent bureaucracy that will rapidly and competently solve a thirty-digit combination lock requiring the invention of multiple new fields and the solving of a variety of thorny and poorly-understood technical problems -- all in a handful of years.

(It also requires that various empirical predictions all pan out. E.g., Leopold could do everything right and get the USG fully on board and get the USG doing literally everything right by his lights -- and then the plan ends up destroying the world rather than saving it because it turned out ASI was a lot more compute-efficient to train than he expected, resulting in the USG being unable to monopolize the tech and unable to achieve a sufficiently long lead time.)

My proposal doesn't require qualitatively that kind of success. It requires governments to coordinate on banning things. Plausibly, it requires governments to overreact to a weird, scary, and publicly controversial new tech to some degree, since it's unlikely that governments will exactly hit the target we want. This is not a particularly weird ask; governments ban things (and coordinate or copy-paste each other's laws) all the time, in far less dangerous and fraught areas than AGI. This is "trying to get the international order to lean hard in a particular direction on a yes-or-no question where there's already a lot of energy behind choosing 'no'", not "solving a long list of hard science and engineering problems in a matter of months and weeks and getting a bureaucracy to output the correct long string of digits to nail down all the correct technical solutions and all the correct processes to find those solutions".

The CCP's current appetite for AGI seems remarkably small, and I expect them to be more worried that an AGI race would leave them in the dust (and/or put their regime at risk, and/or put their lives at risk), than excited about the opportunity such a race provides. Governments around the world currently, to the best of my knowledge, are nowhere near the cutting edge in ML. From my perspective, Leopold is imagining a future problem into being ("all of this changes") and then trying to find a galaxy-brained, incredibly complex, and assumption-laden way to wriggle out of this imagined future dilemma. The far easier and less risky path would be to not have the world powers race in the first place: have them recognize that this technology is lethally dangerous (something the USG chain of command, at least, would need to fully internalize on Leopold's plan too), have them block private labs from sending us over the precipice (again, something Leopold assumes will happen), and have them decline to take on the risk of destroying themselves (or to let other world powers unilaterally impose that risk).

> The CCP's current appetite for AGI seems remarkably small, and I expect them to be more worried that an AGI race would leave them in the dust (and/or put their regime at risk, and/or put their lives at risk), than excited about the opportunity such a race provides.

Yeah, I also tried to point this out to Leopold on LW and via Twitter DM, but no response so far. It confuses me that he seems to completely ignore the possibility of international coordination, as that's the obvious alternative to what he proposes, and one that others must also have brought up to him in private discussions.

I think his answer is here:

> Some hope for some sort of international treaty on safety. This seems fanciful to me. The world where both the CCP and USG are AGI-pilled enough to take safety risk seriously is also the world in which both realize that international economic and military predominance is at stake, that being months behind on AGI could mean being permanently left behind. If the race is tight, any arms control equilibrium, at least in the early phase around superintelligence, seems extremely unstable. In short, “breakout” is too easy: the incentive (and the fear that others will act on this incentive) to race ahead with an intelligence explosion, to reach superintelligence and the decisive advantage, too great.

> At the very least, the odds we get something good-enough here seem slim. (How have those climate treaties gone? That seems like a dramatically easier problem compared to this.)

There are several AGI pills one can swallow. I think the prospects for a treaty would be very bright if the CCP and USG were both uncontrollability-pilled. If uncontrollability is true, strong cases for it are valuable.

On the other hand, if uncontrollability is false, Aschenbrenner's position seems stronger (I don't mean that it necessarily becomes correct, just that it gets stronger).

It seems alignment folk have a libertarian bent.

"Liberty: Prioritizes individual freedom and autonomy, resisting excessive governmental control and supporting the right to personal wealth. Lower scores may be more accepting of government intervention, while higher scores champion personal freedom and autonomy..."

"alignment researchers are found to score significantly higher in liberty (U=16035, p≈0)"

https://forum.effectivealtruism.org/posts/eToqPAyB4GxDBrrrf/key-takeaways-from-our-ea-and-alignment-research-surveys?commentId=HYpqRTzrz2G6CH5Xx

Leopold's scenario requires that the USG come to deeply understand all the perils and details of AGI and ASI (since they otherwise don't have a hope of building and aligning a superintelligence), but then choose to gamble its hegemony, its very existence, and the lives of all its citizens on a half-baked mad-science initiative, when it could simply work with its allies to block the tech's development and maintain the status quo at minimal risk.

Success in this scenario requires a weird combination of USG prescience and self-destructiveness: enough foresight to see what's coming, paired with a compulsion to race to build the very thing that puts its existence at risk, when it would potentially be vastly easier to spearhead an international alliance to prohibit this technology.


> The field is not ready, and it's not going to suddenly become ready tomorrow. We need urgent and decisive action, but to indefinitely globally halt progress toward this technology that threatens our lives and our children's lives, not to accelerate ourselves straight off a cliff.

I think most advocacy around international coordination (that I've seen, at least) has this sort of vibe to it. The claim is "unless we can make this work, everyone will die."

I think this is an important point to be raising, and in particular I think that efforts to raise awareness about misalignment and loss-of-control failure modes would be very useful. Many policymakers have only or primarily heard about misuse risks and CBRN threats, and the "policymaker prior" is usually to think "if there is a dangerous tech, the most important thing to do is to make sure the US gets it first."

But in addition to this, I'd like to see more "international coordination advocates" come up with concrete proposals for what international coordination would actually look like. If the USG "wakes up", I think we will very quickly see that a lot of policymakers + natsec folks will be willing to entertain ambitious proposals.

By default, I expect a lot of people will agree that international coordination in principle would be safer, but they will fear that in practice it is not going to work. As a rough analogy, I don't think most serious natsec people were like "yes, of course the thing we should do is enter into an arms race with the Soviet Union. This is the safest thing for humanity."

Rather, I think it was much more a vibe of "it would be ideal if we could all avoid an arms race, but there's no way we can trust the Soviets to follow through on this." (There was also stuff that's more vibes-based and less rational than this, but insofar as logic and explicit reasoning were influential, I think this was likely one of the core cruxes.)

In my opinion, one of the most important products for "international coordination advocates" to produce is some sort of concrete plan for The International Project. And importantly, it would need to somehow find institutional designs and governance mechanisms that would appeal to both the US and China. Answering questions like "how do the international institutions work", "who runs them", "how are they financed", and "what happens if the US and China disagree" will be essential here.

The Baruch Plan and the Acheson-Lilienthal Report (see full report here) might be useful sources of inspiration.

P.S. I might personally spend some time on this and find others who might be interested. Feel free to reach out if you're interested and feel like you have the skillset for this kind of thing.

We should definitely talk more about what a possible Baruch Plan for AI could look like!

Leopold Aschenbrenner founded an investment firm for AGI and on its homepage he says: "My aspiration is to secure the blessings of liberty for our posterity." Might that influence what he writes about AGI? (Source: https://www.forourposterity.com)

Executive summary: The author agrees with Leopold Aschenbrenner's report on the imminence and risks of artificial superintelligence (ASI), but argues that alignment is not tractable and urgent action is needed to halt or restrict ASI development to avoid catastrophic outcomes.

Key points:

  1. The author agrees with Aschenbrenner that full AGI and ASI are likely only 5-15 years away, and that this technology poses an existential risk if mishandled.
  2. Current AI developers are not taking the risks seriously enough and need to prioritize security and limited access to intellectual property.
  3. Aschenbrenner's report mischaracterizes the situation by suggesting that controllable ASI is feasible; the author argues that if anyone builds ASI with our current level of understanding, it will likely lead to human extinction.
  4. The author calls for urgent action from the US government, including leading an international alliance to prohibit smarter-than-human AI development and restricting frontier AI development to monitored compute clusters.
  5. Rapid action is needed based on the realities of the situation, rather than treating ASI like less dangerous technologies.

 

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

A recent survey of AI alignment researchers found that the most common opinion on the statement "Current alignment research is on track to solve alignment before we get to AGI" was "Somewhat disagree". The same survey found that most AI alignment researchers also support pausing or slowing down AI progress.

Slowing down AI progress might be net-positive if you take ideas like longtermism seriously but it seems challenging to do given the strong economic incentives to increase AI capabilities. Maybe government policies to limit AI progress will eventually enter the Overton window when AI reaches a certain level of dangerous capability.
