Some biases and selection effects in AI risk discourse

Tamsin Leake

This is a linkpost for https://carado.moe/biases-selection-effects-ai.html

These are some selection effects impacting what ideas people tend to get exposed to and what they'll end up believing, in ways that make the overall epistemics worse. These have mostly occurred to me about AI discourse (alignment research, governance, etc.), mostly on LessWrong. (They might not be exclusive to discourse on AI risk.)

(EDIT: I've reordered the sections in this post so that fewer people get stuck on what was the first section and so they have a better chance of reading the other two sections.)

Outside-view is overrated

In AI discourse, outside-view (basing one's opinion on other people's and on (things that seem like) precedents), as opposed to inside-view (having an actual gears-level understanding of how things work), is being quite overrated for a variety of reasons.

There's the issue of outside-view double-counting, as in this comic I drew. When building an outside-view, people don't particularly check whether 10 people say the same thing because they each came up with it independently, or because 9 of them heard it from the 1 person who came up with it, and they themselves mostly stuck to outside-view.
I suspect that outside-view is being over-valued because it feels safe — if you just follow what you believe to be consensus and/or an authority, then it can feel less like it's "your fault" if you're wrong. You can't really just rely on someone else's opinion on something, because they might be wrong, and to know if they're wrong you need an inside-view yourself. And there's a fundamental sense in which developing your own inside-view of AI risk is contributing to the research, whereas just reusing what exists is neutral, and {reusing what exists + amplifying it based on what has status or memetic virulence} is doing damage to the epistemic commons, due to things like outside-view double-counting.
There's occasionally a tendency to try to adopt the positions that are held by authority figures/organizations in order to appeal to them, to get resources/status, and/or generally to fit in. (Similarly, be wary of the opposite as well: having a wacky opinion in order to get quirkiness/interestingness points.)
"Precedents"-based ideas are pretty limited — there isn't much that looks similar to {us building things that are smarter than us and as-flexible-as-software}. The comparison with {humans as mesa-optimizers relative to evolution} has been taken way outside of its epistemic range.
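
To make the double-counting point above concrete, here is a minimal sketch in Python. It is my own illustration, not something from the comic: the prior of 0.5 and the likelihood ratio of 2 per independent report are made-up numbers, and `posterior` is a hypothetical helper. It compares treating 10 voices as 10 independent pieces of evidence against treating them as 1 originator plus 9 repeaters.

```python
def posterior(prior, likelihood_ratio, n_independent_reports):
    """Posterior probability after n independent reports.

    Each report is assumed to carry the same likelihood ratio
    (an illustrative simplification, not a claim about real discourse).
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** n_independent_reports
    return posterior_odds / (1 + posterior_odds)


PRIOR = 0.5             # assumed starting credence
LIKELIHOOD_RATIO = 2.0  # assumed strength of one independent report

# Naive outside-view: all 10 people count as independent evidence.
naive = posterior(PRIOR, LIKELIHOOD_RATIO, 10)

# Careful count: 9 of them heard it from the 1 person who came up with it,
# so there is only one independent report.
careful = posterior(PRIOR, LIKELIHOOD_RATIO, 1)

print(f"naive: {naive:.4f}, careful: {careful:.4f}")
```

Under these assumed numbers, the naive count lands at roughly 0.999 while the careful count lands at roughly 0.667, which is the gap that double-counting hides.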

Arguments about P(doom) are filtered for nonhazardousness

Some of the best arguments for high P(doom) / short timelines that someone could make would look like this:

It's not that hard to build an AI that kills everyone: you just need to solve [some problems] and combine the solutions. Considering how easy it is compared to what you thought, you should increase your P(doom) / shorten your timelines.

But obviously, if people had arguments of this shape, they wouldn't mention them, because they make it easier for someone to build an AI that kills everyone. This is great! Carefulness about exfohazards is better than the alternative here.

But people who strongly rely on outside-view for their P(doom) / timelines should be aware that their arguments are being filtered for nonhazardousness. Note that this plausibly applies to other topics than P(doom) / timelines.

Note that beyond not-being-mentioned, such arguments are also anthropically filtered against: in worlds where such arguments have been out there for longer, we died a lot quicker, so we're not there to observe those arguments having been made.
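
As a rough illustration of the filtering described in this section, here is a toy sketch in Python. Everything in it is hypothetical: the list of arguments, their "strength" scores, and the assumption that the strongest arguments are also the hazardous ones are invented for the example. It only shows how averaging over what gets published can understate what exists.

```python
from statistics import mean

# Hypothetical arguments: (label, strength as evidence for high P(doom), hazardous to share?)
# The numbers are invented; the only assumption that matters is that
# stronger arguments are more likely to be hazardous.
arguments = [
    ("A", 0.2, False),
    ("B", 0.3, False),
    ("C", 0.4, False),
    ("D", 0.8, True),
    ("E", 0.9, True),
]

# What exists: every argument, shared or not.
all_strengths = [strength for _, strength, _ in arguments]

# What an outside-view observer sees: only the nonhazardous ones get mentioned.
visible_strengths = [strength for _, strength, hazardous in arguments if not hazardous]

true_mean = mean(all_strengths)
visible_mean = mean(visible_strengths)

print(f"mean strength of all arguments:     {true_mean:.2f}")
print(f"mean strength of visible arguments: {visible_mean:.2f}")
```

With these made-up values, the visible arguments average about 0.30 while the full set averages about 0.52, so someone who only aggregates what they can see ends up systematically lower.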

Confusion about the problem often leads to useless research

People enter AI risk discourse with various confusions, such as:

What are human values?
Aligned to whom?
What does it mean for something to be an optimizer?
Okay, unaligned ASI would kill everyone, but how?
What about multipolar scenarios?
What counts as AGI, and when do we achieve that?

Those questions about the problem do not particularly need fancy research to be resolved; they're either already solved or there's a good reason why thinking about them is not useful to the solution. For these examples:

What are human values?

We don't need to figure out this problem, we can just implement CEV without ever having a good model of what "human values" are.

Aligned to whom?

The vast majority of the utility you have to gain is from {getting a utopia rather than everyone-dying-forever}, rather than {making sure you get the right utopia}.

What does it mean for something to be an optimizer?

Expected utility maximization seems to fully cover this. More general models aren't particularly useful to saving the world.

Okay, unaligned ASI would kill everyone, but how?

This does not particularly matter. If there is unaligned ASI, we just die, the way AI now just wins at chess; this is the only part that particularly matters.

What about multipolar scenarios?

They do a value-handshake and kill everyone together.

What counts as AGI, and when do we achieve that?

People keep mentioning definitions of AGI such as "when 99% of currently fully remote jobs will be automatable" or "for almost all economically relevant cognitive tasks, at least matches any human's ability at the task".

I do not think such definitions are useful, because I don't think these things are particularly related to how-likely/when AI will kill everyone. I think AI kills everyone before we get to observe the event in either of those quotes, and even if it didn't, having passed those events doesn't particularly impact when AI will kill everyone. I usually talk about timelines until decisive strategic advantage (aka AI takes over the world), because that's what matters.

"AGI" should probably just be tabooed at this point.

These answers (or reasons-why-answering-is-not-useful) usually make sense if you're familiar with rationality and alignment, but some people are still missing a lot of the basics of rationality and alignment, and by repeatedly voicing these confusions they cause people to think that those confusions are relevant and should be researched, causing lots of wasted time.

It should also be noted that some things are correct to be confused about. If you're researching a correlation or concept-generalization which doesn't actually exist in the territory, you're bound to get pretty confused! If you notice you're confused, ask yourself whether the question is even coherent/true, and ask yourself whether figuring it out helps save the world.

Matthew_Barnett, Dec 12 2023

What are human values?
We don't need to figure out this problem, we can just implement CEV without ever having a good model of what "human values" are.
Aligned to whom?
The vast majority of the utility you have to gain is from {getting a utopia rather than everyone-dying-forever}, rather than {making sure you get the right utopia}.
What does it mean for something to be an optimizer?
Expected utility maximization seems to fully cover this. More general models aren't particularly useful to saving the world.

For what it's worth, I have significant disagreements with basically all of your short replies to these basic questions, and I've been heavily engaged in AI alignment discussions for several years. So, I strongly disagree with your claim that these questions are "either already solved or there's a good reason why thinking about them is not useful to the solution", at least in the way you seem to think they have been solved.

Tamsin Leake, Dec 12 2023

I feel like they're at least solved-enough that they're not particularly what should be getting focused on. I predict that in worlds where we survive, spending time on those questions doesn't end up having cashed out to much value.

SummaryBot, Dec 13 2023

Executive summary: The post discusses three selection effects biasing AI risk discourse: overvaluing outside views, filtering arguments for safety, and pursuing useless research based on confusion.

Key points:

Overreliance on outside views like consensus opinions double counts evidence and feels safer than developing independent expertise.
Strong arguments for high extinction risk often look unsafe to share, so discourse misses hazardous insights.
Confusions about core issues lead researchers down useless paths instead of focusing on decisive factors.
Checking whether a question is coherent or helps save the world can avoid wasted effort.
Tabooing terms like AGI may help avoid distraction on irrelevant definitional debates.
Recognizing these selection effects can improve individual and collective epistemics.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
