
The debates this week have clarified a number of things in my mind, and I think that has been useful. At the same time, I think there’s a lack of clarity about what was proposed, and what the objections are. Given that, I wanted to summarize my position, and then, despite being convinced that there were some pitfalls which I had not considered, explain why the arguments made have not changed my mind overall.

What is being proposed?

The overall proposal is premised on considering future AI systems potentially dangerous. Because of the danger they can pose, we should not allow such systems to be trained unless and until each such system is evaluated for likely risks, and then should not allow it to be released until it’s been shown to be safe via testing. (As an aside, this isn’t treating these future AI systems like nuclear weapons, it’s treating them like buildings, where builders submit plans for approval before building, the plans are checked to ensure the building is safe, up to code, and not breaking laws, and after all that occurs, inspections are required before it can be used.)

The proposed mechanism for not allowing such models to be created is an international treaty, because unilateral action is not helpful. Despite this, as outlined below, the treaty could fail. It might be rejected, might be accepted but fail to stop some dangerous systems, or might restrict AI more than is ideal. But if it succeeds, there would be an enforceable set of guidelines that will be developed, mitigating or eliminating the risk of such systems.

In addition, such a treaty needs to be backed up by a compliance and verification regime, which would also need to be negotiated. This regime should include restrictions that make it unlikely that potentially dangerous AI is being developed outside the review framework.

There are legitimate concerns people might have about the details or general idea of such a treaty, but the arguments I have seen don’t convince me that pushing for a moratorium like the one outlined isn’t the most critical avenue for the world to pursue in order to mitigate what many agree is an existential risk.

Why do people disagree?

Opponents of a moratorium, including Nora Belrose, have offered many reasons that efforts around negotiating a global treaty to prevent dangerously powerful and insufficiently aligned systems would fail - both in their essays, and in the comments responding to both their own and other essays. Given that, I want to break down some of the objections and clarify why I don’t find any of these objections compelling. I think there are three general points being made: that many plans are likely to fail, that failure makes things worse, or that success would be bad. Before that, there’s an often-repeated supposition among those opposed to any such measures that maybe alignment is easy. I think this is possible, especially given sufficient attention from researchers, AI companies, and governments. I’m still skeptical, but certainly don’t think it’s fair to make plans that depend on the assumption that it’s impossible, or at least can’t be done given current plans, without defending that position - as Rob Bensinger would, and has done elsewhere.

Many plans are likely to fail

It seems implausible to opponents of a moratorium that the world coordinates well enough to stop future malevolent AI. I think this presumes that any efforts are all-or-nothing, and in some cases, explicitly claims that we need a global dictatorship to make it work. But we’re not talking about a thought experiment, we’re talking about an actual process - so we need to look at what actual processes will lead to in order to understand what would happen, and what is worth pursuing. I think the claim that it’s impossible to stop AI misunderstands both how global treaties and regulation work, and how such negotiations fail. Countries generally negotiate treaties based on input from the public, experts, and diplomats. They may have mechanisms for verification that succeed or fail, they may have provisions that are implicitly ignored or explicitly rejected, or they may not be accepted or ratified. In each case, this isn’t the end of the story. Nuclear arms reduction negotiations were extensive and often stalled, but they kept getting restarted because both sides had an incentive to reduce risk. Whatever treaties get negotiated, if any, may not be sufficient to stop dangerous AI development. Moreover, I agree that treaties are nearly certain to fail at their most extreme goals, such as fully stopping progress or completely preventing advances by non-complying nations. At the same time that I agree those failure modes are likely, I think there is essentially no way they overshoot and lock everyone into a position where everyone wants to build mutually-beneficial AI, but cannot because the treaty prevents it, short of a global government that’s legally or bureaucratically incapable of making changes. And the reasons that won’t happen include the fact that every country wants some degree of sovereignty, there’s no path for it to happen, and so on.
It’s bizarre to hypothesize that any concrete governance building process will metastasize into a global dictatorship without anyone noticing and stopping it. But it’s entirely possible that the world fails to make a treaty.

Failure makes things worse

There are a few ways that trying to get a treaty and having it fail could be bad. The first set of failures is if a treaty is created, but it doesn’t achieve its objectives. The second set is if no treaty is created. Perhaps a failed international treaty differentially advantages the bad guys, those who refuse to cooperate or comply with international consensus. Canada doesn’t have nuclear weapons, because they followed the rules, but North Korea does. This is a real problem - but I’ll note that nuclear programs that existed in the 1970s, including South Africa, which renounced its nuclear program, mostly didn’t continue. The nuclear non-proliferation treaty undoubtedly slowed the process of technology diffusion. Analogously, it seems implausible that any treaty on AI would cause signatory states to renounce all AI research, and those states and the world’s researchers would explicitly be trying to develop safety measures and continue to progress on building safe AI - and any treaty would, implicitly or explicitly, allow them to continue if it becomes clear that non-signatory states are racing to build AI. On the other hand, perhaps a treaty bans building powerful processors without controls, and those evading the controls have an advantage, and again, no-one bothers to relax restrictions for other nations. Supposing a failure that precludes these responses is assuming, preposterously, that no-one has bothered considering this, and that treaties would be created without any means to address it. Another risk is that if no treaty is created, the attempts to create one could lead to a focus on international rules that might distract from useful work elsewhere. I admit this, and think it requires care. There is definitely room for a combination of industry initiatives to ensure safety and agree not to scale to dangerous levels or allow or pursue dangerous applications, as well as national responses on regulating safety and unsafe large models. 
They don’t replace an international agreement, but it’s not an either-or question. So I certainly agree that there’s good reason for companies to build industry standards via self-governance, and countries can and should regulate dangerous models internally, whether or not any treaty is pursued or agreed to. These are complementary approaches, not alternative ones.

Success may be bad

The idea of a temporary pause without any further action is a bad idea. There may be those who disagree, but it seems that even those who support a pause, in the most literal, short term, and naive sense, have said that it’s not sufficient. I would go further, because I think that any simple pause with a fixed term would be useless and damaging. It could create hardware or algorithmic overhang, it wouldn’t actually make companies stop, and so on. But a more permanent moratorium doesn’t mean stopping forever, for several reasons. First, as mentioned above, any treaty will have actual provisions governing what can happen, and what cannot. No-one seems to be advocating a treaty that doesn’t allow some mechanism for review. I’m very concerned that the criteria are going to be far too weak, rather than too strong - but pushing not to have a treaty certainly doesn’t fix that. In any case, if a model is safe according to criteria specified, and/or it passes safety review mechanisms, it would be allowed. And if AI labs think that the criteria are too strict, they have every ability to push for changes. Second, there are concerns that a treaty with enforcement mechanisms would trigger a nuclear war. This, again, seems to ignore what actually happens in treaty negotiations. Even if a treaty explicitly permits individual member states to act militarily to enforce it, countries need to choose to go to war - treaties aren’t smart contracts in blockchains with automated responses. And even treaties that are widely agreed on, but later become irrelevant, don’t actually stay in force. Countries acting to enforce a treaty do so by consensus, or they cause international incidents or wars - but that can happen in an escalating arms race anyways, and treaties often provide otherwise missing mechanisms to resolve disputes - for example, as mentioned above, review of models. 
Third, if a treaty is so successful that it actually stops progress in AI, opponents seem to stipulate a dichotomy in which there are only two ways for AI progress to continue: either we have unfettered AI development and safety is solved because safety is easy, or we have global dictatorship via omnisurveillance and an ML-powered boot stomping humanity forever. (I, in contrast, think that developing AGI under the control of governments is a far riskier proposition in terms of enabling or creating dictatorships!) To address the concern about dictatorships being the result of not pursuing AGI, I’ll again note that governments would need to agree to this. And looking at other domains, even the strongest proposed versions of nuclear arms deals put nuclear weapons under control of international bodies, and even the strongest proposals for controlling AI don’t include much more than stopping production of high-end GPUs, monitoring manufacturing, and requiring that a very small number of people not do what they want in building AI. A global CERN for AI isn’t Emperor Palpatine, or Kang the Conqueror. Even banning chips that allow better video on computers, better animation in movies, and better games on next-gen consoles is not a big inconvenience for most people - and if it is tragic for gamers and AI researchers, the reduced access to compute is still eminently survivable even for them. Next, to address the question of whether alignment is easy, I think we need to not assume it will be. Some opponents are incredibly optimistic that we can solve the problems of AI safety on the fly, but others strongly disagree. And if the optimists assume they are correct, they are betting all our lives on their prediction, and I think we want to agree not to allow unilateralist risk taking on this scale. And given that, contra Matthew Barnett, the treaties [being discussed] aren’t permanent.
Once the systems are aligned with our values, and agreed to be beneficial, they should and would be deployed - the critical issue, which Holly Elmore points out, is that the burden of proof must be on those building the systems, not on those opposed to them. But once that burden of proof is addressed, with suitable reviews for safety, these systems would be trained and deployed. To return to the final part of this objection, slowing progress is incredibly costly - I disagree with Paul Graham’s lack of caution, among other things, but agree that AI promises tremendous benefits, and delaying those benefits is not something to do lightly. The counter argument is that if and when we know it’s safe, and there are governance mechanisms in place, the benefits can be realized, and there is nothing forcing us to rush forward and take excess risks. This is a real tradeoff, but given the stakes - uncertain timelines, uncertainty about whether alignment is solvable, irreversible proliferation of models, and all of our lives on the line - my view is that AI requires significant caution.

What I changed my mind about

My biggest surprise was how misleading the terms being used were, and I think that many opponents were opposed to something different than what supporters were interested in suggesting. Second, I was very surprised to find opposition to the claim that AI might not be safe, and could pose serious future risks, largely because the systems would be aligned by default - i.e. without any enforced mechanisms for safety. I also found out that there was a non-trivial group that wants to roll back AI progress to before GPT-4 for safety reasons, as opposed to job displacement and copyright reasons. I was convinced by Gerald Monroe that getting a full moratorium was harder than I had previously argued based on an analogy to nuclear weapons. (I was not convinced that it “isn't going to happen without a series of extremely improbable events happening simultaneously” - largely because I think that countries will be motivated to preserve the status quo.) I am mostly convinced by Matthew Barnett’s claim that advanced AI could be delayed by a decade, if restrictions are put in place - I was less optimistic, or what he would claim is pessimistic. As explained above, I was very much not convinced that a policy which was agreed to be irrelevant would remain in place indefinitely. I also didn’t think that there’s any reason to expect a naive pause for a fixed period, but he convinced me that this is more plausible than I had previously thought - and I agree with him, and disagree with Rob Bensinger, about how bad this might be. Lastly, I have been convinced by Nora that the vast majority of the differences in positions are predictive, rather than about values. Those optimistic about alignment are against pausing, and in most cases, I think those pessimistic about alignment are open to evidence that specific systems are safe.
This is greatly heartening, because I think that over time, we’ll continue to see evidence in one direction or another about what is likely, and if we can stay in a scout-mindset, we will (eventually) agree on the path forward.


To conclude on a slightly less hopeful note, I want to re-emphasize another dimension to the discussion, which is timing. Waiting for the evidence to be completely convincing even to skeptics, as the world seems to have done with global warming, in order to put plans in place is abandoning hope for a solution prematurely. The negotiations needed for a treaty will take time, and it’s far easier to step back or abandon a treaty if really robustly safe systems are being built, or there is clear progress on safety, and we conclude the risk was overblown. But “robustly safe systems” do not describe what is happening now, and I don’t think we should bet all of our lives on getting this right on the first try. We also can’t afford a foolhardy plan to wait until everyone is more confident that there is a danger, then really quickly get a global moratorium discussed, debated, negotiated, put into effect, and enforced. Opposing enforceable treaties which have the ability to ban dangerous projects is pushing to prevent democratic governance from evaluating and possibly stopping risky models. While we’re uncertain, blanket opposition seems unjustifiable. The details of those mechanisms are critical, but they are things which should be debated in detail, not dismissed or opposed a priori because of some specific contingent detail.


This post is part of AI Pause Debate Week. Please see this sequence for other posts in the debate.






I fully support a pause, however that is enacted, until we find a way to ensure safety. 

I think part of the reason so many people do not consider a pause not only reasonable but actually self-evidently the right thing to do is related to the specific experience of some of the people on the forum.

A lot of people engaging in this debate naturally come from an AI or tech background. Or they've followed the fortunes of Facebook and Amazon and Microsoft from a distance and seen how they've been allowed to do pretty much whatever they want. Any proposal to limit tech innovation may seem shocking, because tech has had an almost regulation-free ride until now. And other groups in the public eye, such as banks and investment firms, have paid off enough people in Congress to eliminate most of their regulations too.

But this is very much NOT the norm. 

If you look at, say, the S&P 500, you'll see maybe 30 tech companies or banks, and a few others, which face very little regulation, but many more companies that are very used to being very strictly regulated.

  • Pharma companies are used to discovering miracle drugs but still having to go through decades (literally!) of safety testing before they can make them available to the public, and even then they still need FDA audits to prove that they are producing exactly what they said, how they said they would. Any change can take another few years to get approved. 
  • Engineers and Architects know that every major design they create needs to be reviewed by countless bodies who effectively have a right to deny approval - and the burden of proof is always on the side of those who want to go ahead. 
  • If you try to get a new chemical approved for use in food, it is such a long and costly process that most companies just don't bother even trying. 

This is how the world works. There is this mentality among tech people that they somehow have the right to innovate and put out products with no restrictions, as if this is everyone's god-given right. But it's not.

So maybe people within tech have a can't do attitude (as Katja Grace called it) towards a pause, thinking it cannot work. But the world knows how to do pauses, how to define safety criteria and ways to make sure they are met before a product is released. Sure, the details for AI will be different than for Pharma, but is AI fundamentally more complex than the interactions of a new, complex chemical with a human body? It isn't obviously so. 

The FDA and others have found ways to keep drugs safe, while still allowing phenomenal progress. It is frustrating as hell in the short term, but in the long run it works best for everyone - when you buy a drug, it is highly unlikely to do you harm in unexpected ways, and typically any harm it might do has been analysed and communicated to the medical community, so that you and your doctor know what the risks are.

It feels wrong for the AI community to believe that they deserve to be free of regulation when the risks are even greater than those from Pharma. And it feels like a can't do attitude for us to believe that a pause cannot happen or cannot be effective. 


Executive summary: The author argues in favor of an international moratorium on developing artificially intelligent systems until they can be proven safe, responding to common objections.

Key points:

  1. A moratorium would require AI systems to undergo safety reviews before release, not ban AI entirely. It could fail in various ways but would likely still slow dangerous AI proliferation.
  2. Failure may not make things much worse - existing initiatives could continue and treaties can be amended. Doing nothing risks an AI arms race.
  3. Success will not necessarily lead to dictatorship or permanently halt progress. Safe systems would be allowed and treaties can evolve if no longer relevant.
  4. The benefits of AI do not justify rushing development without appropriate safeguards against existential risks.
  5. The evidence for AI risk is not yet definitive but negotiating safety mechanisms takes time, so discussions should begin before it is too late.
  6. Differences are largely predictive, not values-based - optimism versus pessimism about easy alignment. Evidence may lead to agreement over time with open-mindedness.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

contra Matthew Barnett, treaties aren’t permanent

I don't really know what you mean by this, but I never said treaties are permanent. Can you please not strawman me?

I apologize that my intent here was unclear - I edited it to say "the treaties [being discussed] aren't permanent," which I thought was clear from context.
