Policy ideas for mitigating AI risk

Thomas Larsen

12 min read · Sep 16, 2023

Comments 15

tlevin

Thanks for writing this up!

I hope to write a post about this at some point, but since you raise some of these arguments, I think the most important cruxes for a pause are:

1. It seems like in many people's models, the reason the "snap back" is problematic is that the productivity of safety research is much higher when capabilities are close to the danger zone, both because the AIs that we're using to do safety research are better and because the AIs that we're doing the safety research on are more similar to the ones in the danger zone. If the "snap back" reduces the amount of calendar time during which we think AI safety research will be most productive in exchange for giving us more time overall, this could easily be net negative. On the other hand, a pause might just "snap back" to somewhere on the capabilities graph that's still outside the danger zone, and lower than it would've been without the pause for the reasons you describe.
2. A huge empirical uncertainty I have is: how elastic is the long-term supply curve of compute? If, on one extreme end, the production of computing hardware for the next 20 years is set in stone, then at the end of the pause there would be a huge jump in how much compute a developer could use to train a model, which seems pretty likely to produce a destabilizing/costly jump. At the other end, if compute supply were very responsive to expected AI progress, and a pause meant a big cut to e.g. Nvidia's R&D budget and TSMC shelving plans for a leading-node fab or two as a result, the jump would be much less worrying in expectation. I've heard that the industry plans pretty far in advance because of how much time and money it takes to build a fab (and how much coordination is required between the different parts of the supply chain), but it seems like at this point a lot of the future expected revenue to be won from designing the next generations of GPUs comes from their usefulness for training huge AI systems, so it seems like there should at least be some marginal reduction in long-term capacity if there were a big regulatory response.

Greg_Colbourn ⏸️

Any realistic pause would only be lifted once there is a consensus on a potential solution to x-safety (or at least, say, full solutions to all jailbreaks, mechanistic interpretability and alignment up to the (frozen) frontier). If compute limits are in place during the pause, they can gradually be ratcheted up, with evals performed on models trained at each step, to avoid any such sudden snap back.

Zach Stein-Perlman

Good post.

You want to give a regulator the power to decide which large training runs are safe. I think this policy's effects depend tremendously on the regulator—if it's great at distinguishing safe stuff from dangerous stuff and it makes great choices, the policy is great; if not, it's not. I feel pretty uncertain about how good it would be, and I suspect some disagreements about this policy are actually disagreements about how good the regulator would be. It feels hard to evaluate a proposal that leaves so much up to the regulator.

(Maybe it would help to have a concrete illustrative line to help readers get a sense of what you think the regulator would ideally do, like "LLMs and bio stuff with training compute > 1e24 FLOP are banned, everything else is not." Ideally the regulator would be more sophisticated than that, of course.)

Tom McGrath

This is a good insight - I definitely feel like lack of trust (due partly to uncertainty) in the proposed regulator is a big blocker for me feeling at all on board with pause/regulation more broadly. Especially relevant given that I think the original CAIP proposals missed the mark by some margin. I acknowledge that Thomas is writing in his personal capacity, but I think that the link is still relevant.

Zach Stein-Perlman

"I think the original CAIP proposals missed the mark by some margin"

Their original criteria for "frontier AI" were very broad, but an expansive definition makes sense if you think the regulator will be great—you give it lots of discretion to reject unsafe stuff but it can quickly approve safe stuff. I think disagreements about CAIP's central proposal come down to different intuitions about how good the regulator would be—I think Thomas thinks the regulator would quickly approve almost all clearly-safe stuff, so an expansive scope does little harm.

Tom McGrath

Yeah, this sounds right to me. At present I feel like a regulator would end up massively overrepresenting at least one of (a) the EA community and (b) large tech corporations with pretty obviously bad incentives.

Zach Stein-Perlman

Hmm, I don't see what goes wrong if the regulator overrepresents EA. And overrepresenting the major labs is suboptimal but I'd guess it's better than no regulation—it decreases multipolarity among labs and (insofar as major labs are relatively safe and want to require others to be safe) improves safety directly.

Tom McGrath

A regulator overrepresenting EA seems bad to me (not an EA) because:

1. I don't agree with a lot of the beliefs of the EA community on this subject and so I'd expect an EA-dominated regulator to take actions I don't approve of.
2. Dominance by a specific group makes legitimacy much harder.
3. The EA community is pretty strongly intertwined with the big labs so most of the concerns from there carry over.

I don't expect (1) to be particularly persuasive for you but maybe (2) and (3) are. I find some of the points in Ways I Expect AI Regulation To Increase X-Risk relevant to issues with overrepresentation of big labs. I think the overrepresentation of big labs would lead to a squashing of open-source, for instance, which I think is currently beneficial and would remain beneficial on the margin for a while.

More generally, I don't particularly like the flattening of specific disagreements on matters of fact (and thus subsequent actions) to "wants people to be safe"/"doesn't want people to be safe". I expect that most people who disagree about the right course of action aren't doing so out of some weird desire to see people harmed/replaced by AI (I'm certainly not) and it seems a pretty unfair dismissal.

Zach Stein-Perlman

OK.

Re "want to require others to be safe"—that was poorly worded, I meant wants to require everyone to follow specific safety practices they already follow, possibly to slow competitors in addition to safety reasons.

Tom McGrath

Cool, apologies if that came across a bit snarky (on rereading it does to me). I think this was instance N+1 of this phrasing and I'd gotten a bit annoyed by instances 1 through N which you obviously bear no responsibility for! I'm happy to have pushed back on the phrasing but hope I didn't cause offence.

A more principled version of (1) would be to appeal to moral uncertainty, or to the idea that a regulator should represent all the stakeholders, and I worry that an EA-dominated regulator would fail to do so.

Aleksi Maunu

Naively I would trade a lot of clearly-safe stuff being delayed or temporarily prohibited for even a minor decrease in chance of safe-seeming-but-actually-dangerous stuff going through, which pushes me towards favoring a more expansive scope of regulation.

(In my mind, the potential loss of decades of life improvements currently pales next to the potential non-existence of all lives in the long-term future.)

I don't know how to think about it when accounting for public opinion, though. I expect a larger scope will gather more opposition to regulation, which could be detrimental in various ways, the most obvious being a decreased likelihood of such regulation being passed/upheld/disseminated to other places.

Larks

Thanks for sharing this Thomas!

I would like to hear your thoughts on the centralisation of power this would cause. Historically it seems like multiple parties owning the means of production, and having wide latitude with how to employ this capital, has been a key driver of human progress and protector of liberty. In the event of TAI I worry that a powerful regulator like this would effectively centralise a huge degree of control over society and the economy in the hands of the regulator. For example, the regulator might adopt a definition of safety that incorporates political notions (e.g. what makes an output 'toxic'), resulting in only ideologically compliant firms being allowed to run large models, and hence giving these firms allied to the administration a major advantage over competitors. Note that this is not regulatory capture, and hence I don't expect conflict of interest rules to resolve it.

Of course if the alternative is literally extinction then perhaps some people's answer is (to some degree) 'so be it'.

Zach Stein-Perlman

Small stuff:

(1) Watermarking

"Required watermarking and traceability on advanced models, so that we can match AI outputs to specific AI models and developers."

Watermarking is mostly an open technical problem—there's no great existing best-practice that government can just require labs to implement. (I know you mean that government should require the limited stuff we know how to do.)

(2) Incident reporting

"Some ideas for increasing the government’s visibility into AI development are"

Government should also facilitate or require incident reporting.

(3) Overhang

If the pause threshold is in terms of FLOP rather than E-FLOP, the dotted green line and horizontal blue line should actually be slightly upward sloping.

(If the pause threshold is in terms of E-FLOP, then (a) the FLOP threshold needs to decrease over time and (b) capabilities still increase during the pause because of inference-time improvements in algorithms and increases in compute.)

(Also I want to flag that assuming progress is linear in log(E-FLOP) over time by default is a reasonable simplification for your purposes, but it is a simplification.)
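
A rough sketch of the FLOP versus E-FLOP point, for readers who want to see the arithmetic. It assumes E-FLOP is physical FLOP multiplied by an algorithmic-efficiency factor that starts at 1 and doubles at a fixed rate. That model, the function names, the 1e24 threshold, and the one-year doubling time are all illustrative assumptions, not figures from the post or from this comment.

```python
# Illustrative only: the efficiency model, threshold, and doubling time are
# made-up assumptions, not numbers taken from the post or the comments.

def efficiency(years_elapsed: float, doubling_years: float) -> float:
    """Algorithmic-efficiency multiplier, assumed to start at 1 and double
    every `doubling_years`."""
    return 2.0 ** (years_elapsed / doubling_years)


def flop_cap_for_fixed_eflop(eflop_threshold: float, years_elapsed: float,
                             doubling_years: float) -> float:
    """Physical-FLOP cap implied by a threshold fixed in E-FLOP.

    The cap shrinks over time as algorithms improve (point (a) above).
    """
    return eflop_threshold / efficiency(years_elapsed, doubling_years)


def eflop_allowed_for_fixed_flop(flop_threshold: float, years_elapsed: float,
                                 doubling_years: float) -> float:
    """Effective compute still permitted under a threshold fixed in FLOP.

    This grows over time, which is why the paused lines would slope upward.
    """
    return flop_threshold * efficiency(years_elapsed, doubling_years)


if __name__ == "__main__":
    threshold = 1e24       # hypothetical threshold
    doubling_years = 1.0   # hypothetical efficiency doubling time
    for year in range(4):
        cap = flop_cap_for_fixed_eflop(threshold, year, doubling_years)
        allowed = eflop_allowed_for_fixed_flop(threshold, year, doubling_years)
        print(f"year {year}: FLOP cap {cap:.2e} | E-FLOP allowed {allowed:.2e}")
```

Under these assumptions, a threshold fixed in E-FLOP halves the permitted physical FLOP each year, while a threshold fixed in FLOP lets the permitted effective compute double each year.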

Greg_Colbourn ⏸️

Great stuff Thomas.

"I think a pause on AI progress wouldn’t be very helpful unless used in concert with other effective governance interventions, such as the ones that I have outlined above."

Agree.

"A longer pause that lasts until we are confident that we have robust AI safety measures in place that allow for safe deployment would be helpful. I’m currently in favor of building the capacity of the world to create a long pause on AI."
"As a result, I’m only excited about versions of a pause that don’t return to “AI progress as usual”, after the pause is over."

Yes, I don't think anyone is seriously proposing a fixed-expiry pause at this point (FLI's "6 month" letter was really just a foot-in-the-d̶o̶o̶r̶ Overton-Window, I think). Pause in my thinking is basically shorthand for "global indefinite pause of frontier AI development, until global consensus is reached on an x-safety solution (including solving the alignment problem, preventing misuse, and ensuring multi-agent coordination); including accepting that this may not be possible such that the pause becomes effectively permanent [1]".

[1] But fear not, we can still have a good future including all the nice things; it might just take a bit longer.

utilistrutil

I'll be looking forward to hearing more about your work on whistleblowing! I've heard some promising takes about this direction. Strikes me as broadly good and currently neglected.
