This note was written as part of a research avenue that I don’t currently plan to pursue further. It’s more like work-in-progress than Forethought’s usual publications, but I’m sharing it as I think some people may find it useful.
There have been many proposals for an international AI project to manage risks from advanced AI. Something that’s missing from these proposals to date (including my own) is the idea of differential AI development, and in particular, the importance of differentially accelerating helpful AI capabilities.[1]
Some AI capabilities pose major risks (e.g. agentic superintelligence with wide real-world powers; LLMs that can provide instructions to manufacture bioweapons). But others are generally societally beneficial (e.g. AI for cancer screening), and others are actively helpful for addressing the risks posed by other AI capabilities (e.g. AI for forecasting). We want to limit those that pose major risks, permit those that are generally beneficial, and encourage those that are actively helpful for addressing other risks.
However, existing proposals for an international AI project usually propose that any frontier AI development (often defined as development above a certain compute threshold) takes place under the auspices of the international project. This would prohibit anyone outside the project from developing the most-helpful AI capabilities as well as the most-dangerous ones.
Instead, I think proposals should probably: (i) be more surgical, focused on limiting the most-dangerous capabilities and (ii) try to actively encourage the most-helpful capabilities.
I think that the ability to do AI R&D (such as ML research and engineering, and chip design) is the most worrying capability.
I think the primary challenge from AI arises because, once AI can fully automate AI R&D, progress in AI capabilities likely becomes extremely fast, following a super-exponential curve. It’s only because of this that you quickly move from pre-AGI that is easy to control to superintelligence that might be very difficult to control. It’s only because of this that you have extremely rapid technological change that results in a large number of non-alignment-related challenges, with little time to respond. And it’s primarily because of this that AI could lead to intense concentration of power, where a small lead in capabilities can turn into a decisive advantage over rivals.
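To make “super-exponential” concrete, here is one toy illustration. It is not a model taken from this note: the capability level C(t), the rate constant k, and the feedback exponent ε are all assumptions introduced purely for illustration. The idea is that if the growth rate of capabilities itself rises with the capability level, as it might once AI does the AI R&D, growth is faster than any exponential.

```latex
% Toy model (illustrative assumption, not a forecast):
% C(t) = capability level, k > 0 a rate constant, \varepsilon \ge 0 the feedback strength.
\frac{dC}{dt} = k\,C^{1+\varepsilon}, \qquad C(0) = C_0 > 0

% With no feedback (\varepsilon = 0), growth is ordinary exponential:
C(t) = C_0\,e^{kt}

% With feedback (\varepsilon > 0), substituting u = C^{-\varepsilon} gives du/dt = -\varepsilon k, so
C(t) = \frac{C_0}{\left(1 - \varepsilon k\,C_0^{\varepsilon}\,t\right)^{1/\varepsilon}},
\qquad t < t^{*} = \frac{1}{\varepsilon k\,C_0^{\varepsilon}}
```

In this toy model the curve diverges as t approaches t*, which no exponential does. Real systems would hit physical and economic limits well before that point, so the sketch only illustrates why the transition could be abrupt, not when or how far it would go.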
For this reason, I’d suggest that any international project should have a monopoly only on the training of AI that is both above a certain FLOP threshold AND aimed at producing AI that can meaningfully automate ML research and engineering or chip design, or that can produce other potentially catastrophic technologies like engineered pathogens.
There are different ways of implementing this proposal. On one model, companies would need permission to do training runs over the FLOP threshold, with permission granted if the company is trusted, agrees to oversight, and agrees not to try to meaningfully improve the automation of AI R&D. An alternative is that there is no process for granting permission, but that it’s illegal to make an AI system, trained with more compute than the FLOP threshold, that meaningfully improves the automation of AI R&D. The former is more restrictive; the latter is riskier.
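The scope rule and the two implementation models described above can be sketched as a small decision procedure. This is a minimal illustration, not a proposed legal test: the threshold value, the class and field names, and the idea that a run’s target can be captured in a single boolean are all hypothetical simplifications introduced here.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical placeholder: the note does not specify a threshold value.
FLOP_THRESHOLD = 1e26


class Regime(Enum):
    PERMISSION = "permission"    # runs over the threshold need prior approval
    PROHIBITION = "prohibition"  # no approval process; only restricted outcomes are illegal


@dataclass(frozen=True)
class TrainingRun:
    training_flop: float
    # True if the run aims at meaningfully automating AI R&D (ML research and
    # engineering, chip design) or at other potentially catastrophic technologies.
    targets_restricted_capability: bool
    # The three conditions for permission under the first model.
    company_trusted: bool = False
    accepts_oversight: bool = False
    pledges_no_ai_rnd_automation: bool = False


def reserved_for_project(run: TrainingRun) -> bool:
    """The monopoly covers only runs meeting BOTH conditions."""
    return run.training_flop > FLOP_THRESHOLD and run.targets_restricted_capability


def may_proceed_privately(run: TrainingRun, regime: Regime) -> bool:
    """Whether a company may do this run outside the international project."""
    if reserved_for_project(run):
        return False  # reserved for the international project under either model
    if run.training_flop <= FLOP_THRESHOLD:
        return True   # below the threshold, neither model applies
    if regime is Regime.PERMISSION:
        return (run.company_trusted
                and run.accepts_oversight
                and run.pledges_no_ai_rnd_automation)
    return True       # prohibition model: large runs without a restricted target are legal


if __name__ == "__main__":
    # A large run aimed at a beneficial capability, from a company that has
    # not gone through any approval process.
    screening = TrainingRun(training_flop=5e26, targets_restricted_capability=False)
    for regime in Regime:
        print(regime.value, may_proceed_privately(screening, regime))
```

The example run shows the trade-off between the two models: the same large, beneficially targeted run is blocked under the permission model until the company meets all three conditions, but is allowed by default under the prohibition model, where enforcement relies on detecting restricted capabilities rather than on gatekeeping in advance.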
This might seem hard to enforce. But, because the FLOP threshold is high and such large training runs are so costly, these restrictions would only apply to a handful of actors. This means that fairly intense oversight would be feasible: for example, requiring capability audits from the international project at regular intervals throughout training, or requiring some international project supervisors to be employed at the company.
What’s more, the incentives for companies to violate this agreement would be weak: any attempt to escape oversight would be very risky and likely to fail (such as via detection or whistleblowers); and the penalties for violating the agreement could be severe (such as no longer being able to train further AI models, or even jail time); and such companies could still make major profits via AI that cannot do AI R&D. For this reason, I think this looks enforceable even if one cannot precisely specify which capabilities are prohibited (in just the same way that financial fraud is illegal even though the law cannot precisely specify all the conditions under which fraud occurs).
“Helpful” here refers in particular to helpfulness for governments, companies, and broader society to respond to risks posed by rapid AI tech progress.
Some helpful AI capabilities include:
We could call the set of such capabilities “artificial wisdom” rather than “artificial intelligence”.
It would be highly desirable if we could get narrow superintelligence in these domains (just as we have narrow superintelligence in Chess and Go) before the point at which we have more generally capable, or more dangerous, AI systems.
Governments could choose to deliberately incentivise work on helpful AI capabilities by (i) giving grants or subsidies to companies that are producing helpful AI capabilities; (ii) awarding prizes to companies that produce helpful AI capabilities; (iii) creating Advance Market Commitments, agreeing in advance to pay a certain amount for access to AI with certain capabilities (perhaps as measured on technical benchmarks); (iv) directly building such capabilities as part of an international project.
The approach I’ve suggested both limits only the most-dangerous capabilities, and actively encourages the most-helpful capabilities.
I think that this approach has a number of advantages over the blanket-ban approach:
This approach isn’t without its challenges, including:
Depending on how AI progresses, it might turn out that these challenges are too difficult to get around. But given the magnitude of the potential benefits from helpful AI capabilities (including for existential risk reduction), it could well be worth increasing the risk from dangerous capabilities a little, to increase the benefits from helpful ones.
Thanks to many people for comments and discussion, and to Rose Hadshar for help with editing.
This article was created by Forethought. See the original on our website.
[1] This applies the idea of “differential technological development” (the idea of trying to reduce the risks, and capitalise on the benefits, of new technology by influencing the sequence in which new technologies are developed) to AI capabilities.