This post discusses a promising model for verifying compliance with international regulations on AI development. It is written by Damin Curtis & Alexander M. Wyckoff, and discusses a proposal by Yonadav Shavit. Crossposted on the AI Alignment Forum.

 

The Verification Problem

To cope with emerging security challenges, states will have to create new regulatory frameworks to rein in the development of dangerous AI models/capabilities. To be effective, any laws or agreements will need a credible verification mechanism behind them, yet how to create such a mechanism is an open technical/policy question. 

Frameworks to limit the proliferation of powerful weapons systems have been developed before, such as the International Atomic Energy Agency (IAEA) and the Nuclear Non-Proliferation Treaty (NPT). Through internationally agreed-upon rules, inspections, and tracking of hazardous materials, the IAEA and NPT have successfully limited the development of nuclear WMD while allowing parties to develop useful technologies such as nuclear energy and biochemical research labs.

Developing a similar framework for verifying the peaceful development of AI models is, of course, difficult. The infrastructure and process of training safe versus unsafe models can be nearly identical, and unsafe protocols can be difficult to identify in training modules. What’s more, a method of verification must not endanger the privacy/intellectual property rights of the proving party (“the prover”). However, there is growing consensus that compute governance may be the keystone of AI governance, and there are strategic bottlenecks in the supply chain of compute-providing semiconductors that may allow for effective monitoring. 

 

Proposal Overview

A recent (2023) proposal for such a framework by Yonadav Shavit seems promising on all of these fronts, as outlined in his paper, “What Does it Take to Catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training Via Compute Monitoring.” (Note that the following proposal is not our own; it is the work of Yonadav Shavit. The wording in this post, and the discussion surrounding the proposal, is our own.)

Shavit proposes that all advanced chips be manufactured with hardware which saves occasional “snapshots” of the training transcripts running on the chip, which can then be viewed by inspectors. An international regulatory agency would inspect chips to ensure they are not being used to train illegal/dangerous models, and would simultaneously monitor chip supply chains and deliveries to stay informed about which parties are stockpiling enough chips to train large, new AI models. We believe this proposal holds promise as a potential underpinning of an (inter)national verification system for powerful AI model developments.

 

The proposed system works at three levels:

1. Chip level

In order to be approved for use in AI training infrastructure, new chips will be hard-wired to log occasional snapshots of the model weights they hold during training. These snapshots will be reported to regulators along with (hashes of) training transcripts, allowing regulators to verify whether the chips have been used for a rule-violating training run. Regulators can physically inspect random samples of chips to ensure that their snapshots were reported, and to ensure that no tampering has occurred that would undermine the validity of remote snapshot taking & reporting.

2. Data Center level

The data center (prover) provides its training process transcript, allowing the prover & verifier to confirm that retraining with that transcript yields the same logged weight snapshot, showing that the training is being done as claimed. The verifier then ensures that the training transcript’s properties are in line with agreed regulations; for example, making sure none of the algorithm code is relevant to achieving breaches of data security or other unlawful use cases.
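
To make the interplay between the chip-level snapshots and the data-center-level replay more concrete, here is a toy sketch in Python. It is not Shavit's implementation: the transcript fields, the `snapshot_hash` format, and the stand-in "training" loop are all hypothetical, and real training runs are far harder to reproduce exactly than this deterministic toy. The sketch only illustrates the idea that a verifier can replay a claimed transcript and check that it reproduces the hashes of the weight snapshots the chips logged.

```python
import hashlib
import random


def snapshot_hash(weights):
    """Hash a weight vector into a short commitment (hypothetical format)."""
    payload = ",".join(f"{w:.12f}" for w in weights).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train(transcript, snapshot_every):
    """Toy deterministic 'training run' driven entirely by the transcript.

    Returns the hashes of the weight snapshots taken along the way,
    standing in for what a chip would log.
    """
    rng = random.Random(transcript["seed"])
    weights = [0.0] * transcript["n_weights"]
    hashes = []
    for step in range(1, transcript["steps"] + 1):
        # Stand-in for a real gradient update (assumption, not real training).
        weights = [w + transcript["lr"] * rng.uniform(-1, 1) for w in weights]
        if step % snapshot_every == 0:
            hashes.append(snapshot_hash(weights))
    return hashes


def verify(transcript, logged_hashes, snapshot_every):
    """Replay the claimed transcript and compare with the chip-logged hashes."""
    return train(transcript, snapshot_every) == logged_hashes


# What actually ran on the chips, and the hashes the chips recorded.
actual = {"seed": 7, "n_weights": 4, "steps": 100, "lr": 0.01}
logged = train(actual, snapshot_every=25)

# An honest transcript reproduces the logged hashes; a misreported one does not.
honest_ok = verify(actual, logged, snapshot_every=25)
shorter_run_ok = verify(dict(actual, steps=50), logged, snapshot_every=25)
other_lr_ok = verify(dict(actual, lr=0.02), logged, snapshot_every=25)
```

In this toy, a prover who understates the length of the run or misreports a hyperparameter produces hashes that do not match what the chips logged, which is the property the verifier relies on. Note that the verifier only needs hashes from the chips, not the weights themselves, until a replay is requested.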

3. Supply Chain level

Cutting-edge chip supply chains have highly concentrated bottlenecks/choke points. These concentrated providers will record where chips are being sold to, and report to the verifier. When doing checks, the verifier can randomly sample chips at training centers to ensure chips were acquired through legally reported sources/transactions, ensuring that none have been covertly acquired. Ensuring chips are not illegally sold would help prevent training runs from happening outside of this monitoring system. 
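
How much random sampling is enough depends on how many chips are inspected. As a rough illustration (our own back-of-the-envelope sketch, not an analysis taken from Shavit's paper), suppose an inspector draws chips uniformly at random without replacement and always recognizes an unreported chip once it is sampled. The chance of catching at least one unreported chip then follows the hypergeometric distribution. The function name and the example numbers below are hypothetical.

```python
from math import comb


def detection_probability(total_chips, unreported_chips, sampled_chips):
    """Chance that a uniform random sample contains at least one unreported chip.

    Assumes sampling without replacement and perfect detection once a chip
    is inspected (both are simplifying assumptions).
    """
    if not 0 <= unreported_chips <= total_chips:
        raise ValueError("unreported_chips must be between 0 and total_chips")
    if not 0 <= sampled_chips <= total_chips:
        raise ValueError("sampled_chips must be between 0 and total_chips")
    reported = total_chips - unreported_chips
    # comb(reported, sampled_chips) is 0 when the sample is larger than the
    # number of reported chips, so detection becomes certain in that case.
    return 1 - comb(reported, sampled_chips) / comb(total_chips, sampled_chips)


# Hypothetical facility: 10,000 chips, 500 of them covertly acquired.
p_small_sample = detection_probability(10_000, 500, 20)
p_large_sample = detection_probability(10_000, 500, 100)
```

Under these assumptions, the detection probability rises quickly with sample size, so an inspector does not need to examine every chip for covert acquisitions to carry a substantial risk of discovery.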

 

Benefits

Reduces the Necessity of Unilateral Actions

In the absence of a verification mechanism, governments will increasingly deploy costly unilateral export restrictions to try to limit one another’s development of harmful models. For example, in 2022, the US enacted export controls to deny the People’s Republic of China access to advanced computer chips, in part to restrict development of AI models that might be put to harmful purposes (such as weapons development or human rights violations). This measure may not have been necessary if a mechanism had existed for the PRC to credibly commit to pursuing only models with agreed-upon capabilities, in which case China could continue with unobjectionable imports and the US would not have to restrict its own companies’ investments and sales to the PRC. This dynamic is similar to how the existence of nuclear verification systems allows more nations to build peaceful nuclear power plants without causing fear of weapons proliferation.  

Empowers safety/regulatory minimums, increasing security for all

This framework could also empower governments to enforce safety minimums. Without government-enforced safety precautions, companies are incentivized to skirt safety lest they fall behind their competitors. This also applies at the international level; as nations race to acquire advanced AI capabilities and compete for high-end research investments, governments are incentivized to “race to the bottom”, reducing regulations in order to speed up development and attract research lab investments. A verification framework could empower international agreements on safety minimums, setting a common floor for all and averting unsafe development norms. (Notably, such a development could slow the “race to the bottom” even if only a small number of countries agreed to join this verification framework. Today, countries have an incentive to sacrifice safety for speed due to lack of knowledge of whether their rivals have done the same. If even a few important countries joined the framework, those outside the framework could feel more confident in the safety behavior of those within, reducing their uncertainty and their incentive to preemptively cut down on safety.)

Respects privacy & sovereignty

This framework does not require AI developers to disclose sensitive, proprietary information about their training models, nor does it require monitoring of individuals’ private computing devices (it only monitors large training centers). Participant countries must give their continued consent by allowing inspectors to access their chips’ snapshots and training process transcripts. A country or data center could refuse compliance at any point, which ensures this process respects national sovereignty. If the prover follows the verifier's steps, rule-violation is unlikely to go undetected. If the prover does not comply with verifiers, this will itself be cause for suspicion, as is the case with nuclear facility verification processes. Participation should be driven by a universal interest in ensuring the international community upholds safety norms in advanced AI development. 

 

Limitations & Weaknesses

We conclude by noting some potential weaknesses of this proposal. These may also serve as ideas for further work on the subject of developing credible verification systems. We divide the framework’s weaknesses into three major areas of concern: 

1. Only Monitors Larger Compute Centers. 

This framework only calls for monitoring of large-scale compute clusters. It therefore does not prevent the training of smaller models, which don’t require large quantities of compute to train. These may still have concerning capabilities such as facial recognition, weapons targeting, or misinformation risk. 

2. Pre-existing models and chips. 

This framework only prevents the training of new large models going forward; it does not prevent application of models that have already been created, nor does it necessarily monitor chips and data centers that were already on the market prior to the implementation of this framework. 

What’s more, this system only seeks to prevent the development, not the propagation, of illegal models. If a model were somehow created outside of this monitoring system, it could be easily copied and used. 

3. Dual capability. 

We are also concerned about the dual capability of some microchips, especially in the future. As AI computation becomes cheaper, microchips meant for other applications, such as medical technology, could be repurposed for training harmful AI models. This framework would need to evaluate not only the advertised training capabilities of chips, but also potential dual capability, ideally ensuring these chips are not illegally transferred for use at an unauthorized training center. 

The framework does partially account for this, as verifiers can randomly sample chips at known training centers to ensure that all chips were obtained from verified sources in recorded transactions.

 

Conclusion

We believe that this framework has potential as a means for verifying compliance with international regulations on AI development, empowering governments to pursue beneficial agreements on AI development and substantially improving the AI safety/governance landscape. We hope that this post increases the visibility of this proposal and sparks further discussion of its feasibility and improvement. 

Written by Damin Curtis & Alexander M. Wyckoff

Comments


Cool post; researching these issues seems like one of the most important things in AI governance to me!

Some questions I have (for future research) are:

  1. How hard is it to distinguish approved from unapproved training runs with these snapshots that the chips would provide? Is this just about establishing that the length of the training run is below a certain threshold, or does it assess whether the training run follows a previously submitted recipe that was approved to be safe by an authorizing body? 
  2. How long would it take to implement these mechanisms at the hardware level, and who would have to be on board to make this happen? (E.g., if the US govt simply passed legislation that prohibited future chip innovation unless these mechanisms are installed, would that be enough to get it done?)

Thanks for your comment & questions! These are great questions for further research. I don't know enough to comment on the first question. But as for the second, we're lucky that right now, the advanced chip supply chain has multiple tight bottlenecks, and is largely controlled by US-allied advanced democracies (Taiwan, Korea, Japan, Netherlands, UK, US, etc). This is part of why the US was able to effectively cut off China's access to the most advanced chips. So there is a window of opportunity, where the most important countries could agree to require their companies to implement this framework, and require certain buyers to comply with the framework as well. Countries generally can require their companies to manufacture a certain way, and can also set import/export restrictions on chips to ensure transactions are compliant. 