Effective Persuasion For AI Alignment Risk

Brian Lui

In this post, I propose the idea that the communication around current AI alignment risk could be improved. I then suggest a possible improvement which is easy for people to adopt individually, and could also be adopted as a whole. This is not meant as an authoritative recommendation, but more of a "putting it out there" discussion topic.

Outline

My experience of learning about AI alignment, and why it didn’t persuade me
The current approach used to evangelize AI alignment awareness, and why it is ineffective
Why we arrived at this ineffective approach
An alternative approach that’s more effective
Proposed action: use the effective approach to spread awareness of AI risk, not the current ineffective approach

Epistemic statuses:

Certain for my personal experience

Very high for the average person's experience

High for the ineffectiveness of current communication method

Medium-high for the effectiveness of proposed communication method

My experience of learning about AI alignment

As a regular person who is new to EA, I would like to share my experience of learning about AI alignment risk. The first thing I came across was the “paperclip maximizer” idea. Then, I read about superintelligence explosion “foom”. These are the top results when searching for “AI existential risk”, so I think it’s a representative sample.

After reading about these two topics, I rejected AI risk. Not in the sense of “I read the pros and cons and decided that the risk was overrated”, but in the sense of “this is stupid, and I’m not even going to bother thinking about it”.

I did this because these scenarios are too far from my model of the world and did not seem credible. Since I had no reason to believe that this line of enquiry would lead to useful learning, I abandoned it.

The current persuasion approach is ineffective

There are two types of people we want to persuade. Firstly, people with power and influence, and secondly, the public. The first category is important because they have leverage to create outcomes, and the second category is important because it is the main determinant of how favorable the “playing field” is.

I should be much easier to persuade than the average member of the public, because I was interested in learning more about AI risk! But even I bounced off. It will surely be an insurmountable climb for regular people who are disinterested in the topic. This is a disaster of messaging if the goal is to persuade the public.

Is it a good approach for persuading the first category of people, those with power and influence? We might expect it to be more effective, because they have stronger analytical capabilities and would be more receptive to logical analysis. Unfortunately, this misunderstands power and influence. This might work if we’re targeting rich people, because analytical power is correlated with wealth generation. But people with power and influence generally do NOT respond well to analytic arguments. Instead, they respond to arguments focusing on power and influence, which is completely unaddressed by the paperclip foom approach. This line of persuasion may even backfire, by reducing its perceived credibility.

The current persuasion approach is correct!

What’s vexing is that persuading people by talking about paperclip foom is clearly correct. As Carl Sagan said, “extraordinary claims require extraordinary evidence”. Taking the outside view, AI alignment risk is an extraordinary claim, because it’s so alien to our general understanding of the world. So if that’s the case, we need to provide extraordinary evidence.

Unfortunately, extraordinary evidence consists of claiming catastrophic outcomes and then showing how they could come about. This is ineffective because the reader is required to agree to a huge inferential distance. In most cases, the reader will not even read the evidence or evaluate the evidence before dismissing it.

This is an “Always Be Closing” approach. Make a big ask, if you encounter resistance, you can then retreat slightly and make a sale. A small ask such as “AI alignment is potentially dangerous” is feeble, instead, let’s alarm people by announcing a huge danger, getting them to pay attention. Attention, Interest, Desire, Action!

It’s a pity that this approach crashes and burns at the Interest stage.

An alternative, effective approach

Another sales maxim is “don’t sell past the close”. Just do enough to get the sale - in this case, the “sale” is to get people thinking that AI alignment is a serious problem that we should care about. What would this approach look like?

Recently Nintil posted his thoughts about AGI. His approach basically claims that the “Always Be Closing” maximalist approach is unnecessary. This is encapsulated by a quote within the blogpost:

“Specifically, with the claim that bringing up MNT [i.e. nanotech] is unnecessary, both in the "burdensome detail" sense and "needlessly science-fictional and likely to trigger absurdity heuristics" sense. (leplen at LessWrong, 2013)”

Nintil makes a case that AGI is a concern even without the “needlessly science-fictional” stuff. I found this interesting and worth serious consideration, because it didn’t require a large inferential leap and so it was credible. “Here are some assumptions that you probably already have. I’ll show how this leads to bad outcomes” is a much more effective approach, and it caused me to re-evaluate the importance of AI alignment.

We should use the effective approach, not the ineffective approach

AI alignment persuasion does not require paperclip foom discussions, and these are likely counterproductive. Since an effective explanation exists, I think it would be helpful to focus on this line when evangelizing AI risk.

You may be tempted to discount my opinion because I am an average person, but I think my viewpoint is useful because the extremely high intelligence of the median effective altruist leads to some blindspots. A recent example in the news is the failure to elect Carrick Flynn, an effective altruist running for a congressional seat in Oregon. It wasted $10 million and backfired as “a big optical blunder, one that threatened to make not just Bankman-Fried but all of EA look like a craven cover for crypto interests.”

Being highly intelligent makes it harder to visualize how a regular person perceives things. I claim that my perspective here is valuable for EA, since we’re considering the question of how to persuade regular people. Please give this serious consideration!

Finally, this would also help other EA cause areas. All effective altruists contribute to the EA commons by working hard on various causes. Maximalist AI alignment discussion is burning the commons because it makes EAs look “weird”, which reduces the influence of EA as a whole. Therefore, not just AI alignment EAs, but all EAs have a stake in moving to a more effective communications method for AI risk.

5 Reactions

More posts like this

Comments7

Sorted by

New & upvoted

Click to highlight new comments since: Today at 8:27 PM

Miranda_ZhangAug 10 20223

I'm always keen to think about how to more effectively message EA ideas, but I'm not totally sure what the alternative, effective approach is. To clarify, do you think Nintil's argument is basically the right approach? If so, could you pick out some specific quotes and explain why/how they are less inferentially distant?

Jose Luis RiconAug 11 202211

Hi, I'm the author of Nintil.com (We met at Future Forum :)

Essentially, an essential rule in argumentation is that the premises have to be more plausible than the conclusion. For many people, foom scenarios, nanotech, etc makes them switch off.

I have this quote

Here I want to add that the lack of criticism is likely because really engaging with these arguments requires an amount of work that makes it irrational for someone who disagrees to engage. I make a similar analogy here with homeopathy: Have you read all the relevant homeopathic literature and peer-reviewed journals before dismissing it as a field? Probably not. You would need some reason to believe that you are going to find evidence that will change your mind in that literature. In the case of AI risk, the materials required to get someone to engage with the nanotech/recursive self-improvement cases should include sci-fi free cases for AI risk (Like the ones I gesture at in this post) and perhaps tangible roadmaps from our current understanding to systems closer to Drexlerian nanotech (Like Adam Marblestone's roadmap).

Basically, you can't tell people "Nanotech is super guaranteed to happen, check out this book from Drexler". If they don't agree with that, they won't read it, there is too much commitment. Instead, one should just start from premises the average person agrees with (speed, memory, strategic planning) and get to the "AI risk is worth taking seriously". That is a massive step up from them laughing at it. Then one can argue about timelines and how big of a risk it is, but first one has to bring them into the conversation, and my arguments accomplishes (I hope) that.

Miranda_ZhangAug 11 20223

This makes a lot of sense, thanks so much!

I think I agree with this point, but in my experience I don't see many AI safety people using these inferentially-distant/extreme arguments in outreach. That's just my very limited anecdata though.

Brian LuiAug 11 20221

Excellent! This is the core of the point that I wanted to communicate. Thanks for laying it out so clearly.

Brian LuiAug 11 20221

Great! Yes. The key part I think is this:

Advanced nanotechnology, often self-replicating
Recursive self-improvement (e.g. an intelligence explosion)
Superhuman manipulation skills (e.g. it can convince anyone of anything)
There are exceptions to this, like the example I discuss in Appendix C.
I found that trying to reason about AGI risk scenarios that rely on these is hard because I keep thinking that these possibly run into physical limitations that deserve more thought before thinking they are plausible enough to substantially affect my thinking. It occurred to me it would be fruitful to reason about AGI risk taking these options off the table to focus on other reasons one might suspect AGIs would have overwhelming power:
Speed¹ (The system has fast reaction times)
Memory (The system could start with knowledge of all public data at the time of its creation, and any data subsequently acquired would be remembered perfectly)
Superior strategic planning (There are courses of actions that might be too complex for humans to plan in a reasonable amount of time, let alone execute)

My view is that normal people are unreceptive to arguments that focus on the first three (advanced nanotechnology, recursive self-improvement, superhuman manipulation skills). Leave aside whether these are probable or not. Just talking about it is not going to work, because the "ask" is too big. It would be like going to rural Louisiana and talking at them about intersectionality.

Normal people are receptive to arguments based on the last three (speed, memory, superior strategic planning). Nintil then goes on to make an argument based only on these ideas. This is persuasive. The reason is that it's easy for people to accept all three premises:

Computers are very fast. This accords with people's experience.
Computers can store a lot of data. People can understand this, too.
Superior strategic planning might be slightly trickier, but it's still easy to grasp, because people know that computers can beat the strongest humans at chess and go.

Miranda_ZhangAug 11 20223

Thanks, this makes sense! Yeah, this is why many arguments I see start at a more abstract level, e.g.

We are building machines that will become vastly more intelligent than us (c.f. superior strategic planning), and it seems reasonable that then we won't be able to predict/control them
Any rational agent will strategically develop instrumental goals that could make it hard for us to ensure alignment (e.g., self-preservation -> can't turn them off)

Brian LuiAug 12 20221

I might have entered at a different vector (all online) so I experienced a different introduction to the idea! If my experience is atypical, and most people get the "gentle" introduction you described, that is great news.