ARC Evals: Responsible Scaling Policies

Zach Stein-Perlman

3 min read · Sep 28, 2023

Comments 1

[anonymous]

1. I like the idea of concrete, publicly stated, pre-defined measures, since they lower the risk of safety standards/targets being moved later. It would be a substantial improvement over what we have today, especially if there's coordination between top labs.

2. The graph shows jumps where y increases at a greater rate than x. Has this ever happened before? What we've seen so far looks more like a mirrored L: first we move along the x-axis, and only later (and to a smaller degree) along the y-axis.

3. The line between the red and blue areas should be heavily blurred/striped. This might seem like an aesthetic nitpick, but we can't map the edges of what we've never seen. Our current perceptions are thought up by human minds that are innately tuned to empathize with and predict human behavior, which unwittingly leads to thinking along the lines of: "If I were an AI and thought like a psychopathic human, what would I do?" We don't do this explicitly, but that's what we're actually doing. The real danger lies in the unknown unknowns, which cannot be plotted on a graph a priori. At the moment, we're assuming that dangers/capabilities progress in a "logical order", i.e. the order in which humans gain abilities and learn things. If that order is scrambled, so are the warning signs.
