The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns

simeon_c

3 min read · Apr 22, 2023

Comments

Paul_Christiano

I've seen this a few times but I'm skeptical about taking this rhetorical approach.

I think a large fraction of AI risk comes from worlds where the ex ante probability of catastrophe is more like 50% than 100%. And in many of those worlds, the counterfactual impact of an individual developer moving faster is several times smaller (since someone else is likely to kill us all in the bad 50% of worlds). On top of that, reasonable people might disagree about probabilities and think 10% in a case where I think 50%.

So putting that together they may conclude that racing faster increases the risk of doom by 0.03% for every 1% that it increases your share of the future (whether measured in profit, or reduced opportunity for misuse of frontier systems). And that's just not going to be compelling.
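One way to reproduce the order of magnitude of that 0.03% figure is sketched below. This is a toy calculation, not the commenter's stated method: the function name `marginal_doom_increase`, the multiplicative form, and the value 0.3 standing in for "several times smaller" are all assumptions made for illustration. Only the 10% and 50% probabilities and the 1% share increase come from the comment.

```python
def marginal_doom_increase(p_doom, counterfactual_factor, share_gain):
    """Toy estimate of the added catastrophe probability from racing.

    p_doom: ex ante probability of catastrophe (10% or 50% in the comment).
    counterfactual_factor: assumed discount because someone else would
        likely cause the catastrophe anyway (0.3 is a hypothetical value
        for "several times smaller").
    share_gain: fractional increase in one's share of the future (1%).
    """
    return p_doom * counterfactual_factor * share_gain


# A skeptic who puts the probability of catastrophe at 10%.
skeptic = marginal_doom_increase(p_doom=0.10, counterfactual_factor=0.3, share_gain=0.01)

# Someone who puts it at 50%, with the same assumed discount.
pessimist = marginal_doom_increase(p_doom=0.50, counterfactual_factor=0.3, share_gain=0.01)

print(f"skeptic:   {skeptic:.2%} added risk per 1% of share")
print(f"pessimist: {pessimist:.2%} added risk per 1% of share")
```

Under these assumed inputs the skeptic's number comes out at roughly 0.03%, which is why the trade can look acceptable from their point of view even when it looks much worse to someone with a higher probability estimate.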

I think you will have an extremely hard time convincing people that the race is obviously suicidal. I know some folks are confident about this, but I don't really find that position credible today and I've spent a very long time thinking about the problem and engaging with pessimistic people. Maybe it will become obvious tomorrow, and maybe it's OK for some people to be betting their chips on that, but I don't want to get lumped in with them (because I think their political position is going to become increasingly untenable over time).

On the flip side, I don't think it's controversial to say: "If the probability of AI takeover is 10%, AI developers need to stop racing."

It's a tiny bit unclear what that means, so to be a bit more precise: "If people didn't stop AI development until things look significantly more dangerous than they do today, then the probability of takeover would be more than 10%." I don't think that's true today, but it will likely become true.

Introduction

In this post, I will discuss the cruel trade-off between misuse concerns and X-risks (accidental risks) regarding racing. It is important to note that this post does not advocate for one risk being more plausible than the other, nor does it make any normative statements. The purpose is to analyze the outcome of a particular worldview.

In one sentence, the key claim is: “If an entity with the power to develop AGI is mostly worried about misuse AND thinks that it is the best entity (morally speaking) in its reference class, it is good for it to race. This is bad for accidental risks.”

Epistemic status: I’ve written up this post pretty quickly, after a conversation where it didn’t seem clear to someone. I'm confident about the general claim, less about specific claims.

Why Should One Race If One Is Worried about Misuse?

a. AGI is the most powerful thing ever created. Whoever controls it will have an unprecedented level of control over everyone else. So if someone with bad intentions ends up controlling it, that’s probably the end for everyone else. It also leaves plenty of room for scenarios like stable authoritarianism.
b. Thus, if you’re at the head of an AGI lab and you’re genuinely worried about the ethics of another AGI lab’s CEO, you would want to ensure that they don't develop AGI before you do. You could be worried about individuals getting power, or about certain cultures or governments getting power (the US fearing China, or one lab fearing another).

a. In a world where weak AGIs accessible to a wide range of people enable the creation of bioweapons or facilitate massive cyberattacks, there is a danger of reaching a point where everyone can kill everyone else but there is not yet a powerful "defensive AGI" to prevent this via global deterrence & surveillance.
b. If you view your lab as responsible and you’re primarily concerned about misuse, it makes sense to race towards the development of a defensive AGI powerful enough to avoid the dangerous scenario mentioned above.

a. Preventing jailbreaks is hard. So you may want to reach AGI as early as possible with as few deployments as possible. On this view, the earlier you get to AGI, the better off the world is.
b. Hence you should race as fast as possible^[1] internally and deploy externally only as much as is needed to raise the capital required to reach the goal as early as possible.

And to be clear, here the cause of the racing is beliefs about the world, not secretly evil intentions.

Some Reasons Why Racing is Bad for Accidents

On the other hand, for those who are worried about accidental risks and who are pessimistic about our chances of solving those issues in a short amount of time (“alignment is hard”), racing is one of the worst things to do, if not the worst:

Racing leaves very little time to address problems as they arise. It incentivizes labs to patch problems in the cheapest way that works, e.g. fine-tuning. This is differentially more worrying for accidents than for misuse because while you can control misuse with access restrictions, you can’t control accidents the same way.

Racing doesn’t leave time to understand what’s going on inside the models. It also creates an incentive to build the simplest AGI that works and is not easily misused, rather than something with strong foundations for making sure it doesn’t cause accidents. Not understanding what’s going on is most worrying for those concerned about deception scenarios.

Racing leads labs to cut corners internally on red teaming for accidents, on making sure the model is not deceptive, etc.

It’s important to note that, except for that point (whether we should race really fast), most other measures are the same for misuse problems and accidental risk problems: auditing, licensing, developing models closed source, getting compute governance right, working to make models robust to jailbreaks, etc.

^[1] "As fast as possible" includes constraints like "make your model non-trivial to break to prevent misuse". The main problem is just that preventing misuse a priori requires much less engineering and intervention on the model itself than preventing accidents.