AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI

Center for AI Safety; Dan H

AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI

Comments 3

Sorted by

New & upvoted

Unless I'm misunderstanding something important (which is very possible!) I think Bengio's risk model is missing some key steps.

In particular, if I understand the core argument correctly, it goes like this:

1. (Individually) Human-level AI is possible.
2. At the point where individual AIs are human-level intelligence, they will collectively be superhuman in ability, due to various intrinsic advantages of being digital.
3. It's possible to build such AIs with autonomous goals that are catastrophically or existentially detrimental to humanity. (Bengio calls them "rogue AIs")
4. Some people may choose to actually build rogue AIs.
5. Thus, (some chance of) doom.

As stated, I think this argument is unconvincing. Because for superhuman rogue AIs to be catastrophic for humanity, they need to not only be catastrophic for 2023_Humanity but also for humanity even after we also have the assistance of superhuman or near-superhuman AIs.

If I was trying to argue for Bengio's position, I would probably go down one (or more) of the following paths:

Alignment being very hard/practically impossible: If alignment is very hard and nobody can reliably build a superhuman AI that's sufficiently aligned that we trust it to stop rogue AI, then the rogue AI can cause a catastrophe unimpeded
1. Note that this is not just an argument for the possibility of rogue AIs, but an argument against non-rogue AIs.
Offense-defense imbalance: Perhaps it's easier in practice to create rogue AIs to destroy the world than to create non-rogue AIs to prevent the world's destruction.
1. Vulnerable world: Perhaps it's much easier to destroy the world than prevent its destruction
  1. Toy example: Suppose AIs with a collective intelligence of 200 IQ is enough to destroy the world, but AIs with a collective intelligence of 300 IQ is needed to prevent the world's destruction. Then the "bad guys" will have a large head start on the "good guys."
2. Asymmetric carefulness: Perhaps humanity will not want to create non-rogue AIs because most people are too careful about the risks. Eg maybe we have an agreement among the top AI labs to not develop AI beyond capabilities level X without alignment level Y, or something similar in law. Suppose, further, in this world that normal companies mostly follow the law and at least one group building rogue AIs don't.
  1. In a sense, you can view this as a more general case of (1). In this story, we don't need AI alignment to be very hard, just for humanity to believe it is.

It's possible Bengio already believes 1) or 2), or something else similar, and just thought it was obvious enough to not be worth noting. But at least among my conversations with AI risk skeptics, among the ones who think AGI itself is possible/likely, the most common objection is why rogue AIs will be able to overpower not just humans but also other AIs as well.

aog

That’s a good point! Joe Carlsmith makes a similar step by step argument, but includes a specific step about whether the existence of rogue AI would lead to catastrophic harm. Would have been nice to include in Bengio’s.

Carlsmith: https://arxiv.org/abs/2206.13353

Denis

"for superhuman rogue AIs to be catastrophic for humanity, they need to not only be catastrophic for 2023_Humanity but also for humanity even after we also have the assistance of superhuman or near-superhuman AIs."

This is a very interesting argument, and definitely worthy of discussion. I realise you have only sketched your argument here, so I won't try to poke holes in it.

Briefly, I see two objections that need to be addressed:

1. One fear is that the rogue AIs may well be released on 2023_Humanity or a version very close to that due to the exponential capability growth we could see if we create an AI that is able to develop better AI itself. Net, it may be enough that it would be catastrophic for 2023_Humanity.

2. The challenge of developing aligned superhuman AIs which would defend us against rogue AIs while themselves offering no threat is not trivial, and I'm not sure how many major labs are working on that right now, or if they can even write a clear problem-statement about what such an AI system should be.
From first principles, the concern is that this AI would necessarily be more limited (it needs to be aligned and safe) than a potential rogue AI, so why should we believe we could develop such an AI faster and enable it to stay ahead of potential rogue AIs?

Far from disagreeing with your comment, I'm just thinking about how it would work and what tangible steps need to be taken to create the kind of well-aligned AIs which could protect humanity.

Comments

More from the author

431

Statement on AI Extinction - Signed by AGI Labs, Top Academics, and Many Other Notable Figures

Center for AI Safety·3y ago·1m read

Modeling the impact of AI safety field-building programs

Center for AI Safety·3y ago·8m read

$250K in Prizes: SafeBench Competition Announcement

Center for AI Safety·2y ago·2m read

Curated and popular this week

Hard-to-reverse decisions destroy option value

Stefan_Schubert·9y ago·Curated 1d ago·14m read

This post is co-authored with Ben Garfinkel. It is cross-posted from the CEA blog. A PDF version can be found here. Summary: Some strategic decisions available to the effective altruism m...

Introducing Impact List: a ranking of philanthropists by expected lives saved

Elliot Olds·2d ago·6m read

TL;DR: I'm releasing a website that ranks philanthropists according to EA principles and research, and allows users to re-rank the list using their own assumptions. I'd like feedback and help making it better. I'd especially like ideas for how to make the results more trustworthy. Funding may be available. Crossposted to LessWrong. ...

If you're agentic, work in biosecurity

sharmaayushmaan🔸·6d ago·7m read

Disclaimer: Although I work on the Groups Team at CEA, I’m writing this in a personal capacity, and this post does not constitute an endorsement by CEA. Agency - the realisation that you really can just do things. TL;DR Biosecurity needs people (of any background) who are agentic and have a high execution velocity and track record....

Recent opportunities to take action

Marginal Victories: career advising and opportunities for U.S. democracy preservation & political work

Annika Burman 🔸·4d ago·2m read

I'm stepping down as Hive's Executive Director, and we're hiring my successor

SofiaBalderson, Hive·5d ago·3m read

Starting an EA group @ SUNY Binghamton

micahzarin·3d ago·1m read

Linch

1. (Individually) Human-level AI is possible.
2. At the point where individual AIs are human-level intelligence, they will collectively be superhuman in ability, due to various intrinsic advantages of being digital.
3. It's possible to build such AIs with autonomous goals that are catastrophically or existentially detrimental to humanity. (Bengio calls them "rogue AIs")
4. Some people may choose to actually build rogue AIs.
5. Thus, (some chance of) doom.

Alignment being very hard/practically impossible: If alignment is very hard and nobody can reliably build a superhuman AI that's sufficiently aligned that we trust it to stop rogue AI, then the rogue AI can cause a catastrophe unimpeded
1. Note that this is not just an argument for the possibility of rogue AIs, but an argument against non-rogue AIs.
Offense-defense imbalance: Perhaps it's easier in practice to create rogue AIs to destroy the world than to create non-rogue AIs to prevent the world's destruction.
1. Vulnerable world: Perhaps it's much easier to destroy the world than prevent its destruction
  1. Toy example: Suppose AIs with a collective intelligence of 200 IQ is enough to destroy the world, but AIs with a collective intelligence of 300 IQ is needed to prevent the world's destruction. Then the "bad guys" will have a large head start on the "good guys."
2. Asymmetric carefulness: Perhaps humanity will not want to create non-rogue AIs because most people are too careful about the risks. Eg maybe we have an agreement among the top AI labs to not develop AI beyond capabilities level X without alignment level Y, or something similar in law. Suppose, further, in this world that normal companies mostly follow the law and at least one group building rogue AIs don't.
  1. In a sense, you can view this as a more general case of (1). In this story, we don't need AI alignment to be very hard, just for humanity to believe it is.

AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI

AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI

Yoshua Bengio makes the case for rogue AI

How to screen AIs for extreme risks

Funding for Work on Democratic Inputs to AI

Links