Assessing the Dangerousness of Malevolent Actors in AGI Governance: A Preliminary Exploration

Callum Hinchcliffe; Richard Annilo

Comments

Callum Hinchcliffe

This was a great experience and I learnt a lot:

Choosing a research topic is a whole research project in itself
The writing phase takes much longer than the earlier phase of working out what to write (more than twice as long)
Co-working on research with one other person is great. It's very motivating and you learn a lot from each other. You have faster feedback loops, so you can reach a better outcome sooner.
This kind of research-and-writing co-working is very mentally tiring (at first I couldn't do more than 4 hours per day)

We really wanted to complete the project in a tight timeframe. I actually posted this 2 weeks after we finished because it was the first chance I had.

Some reflections:

I think that the amount of time we set aside was too short for us, and we could still have made worthwhile improvements with more time to reflect, such as:

Choosing an easier topic for our first research project
Doing more background reading
Trimming the section on risk-conducive preferences (RCPs), which I don't think is important enough to warrant the number of words it takes up
Rewriting many sentences to improve the wording; I also don't think 'Factors of malevolent actors' is a very good heading.

(I'll come back and reply to this comment with more of my own reflections if I think of more and get more time in the next day or two) (edit: formatting)

Jim Buhler

Interesting, thanks for sharing your thoughts on the process and stuff! (And happy to see the post published!) :)

Vasco Grilo🔸

Nice post!

“In order to decrease and prevent suffering”

Not sure whether you intended to give examples of this in the table.

“The following figures (Figure 1 and Figure 2) show how our work fits together with governance proposals.”

This is a nitpick, but I would find the diagrams easier to read if the "bad outcome" were at the bottom, so that causality flowed from top to bottom.

Callum Hinchcliffe

“Nice post!”

Thank you! Since it's (my) first post, it's helpful to have some positive encouragement.

We deliberately didn't give examples for that row.
That's useful feedback on the diagram, thanks.


[1]

By “malevolent actor” in this context, we mean someone with risk-conducive preferences, not necessarily someone who actually wishes to do evil.

[2]

Jim Buhler gives some reasons for focussing on this area

[3]

For simplicity, we give only one method per actor, but in reality actors could use a portfolio of approaches to try to gain access.

[4]

Jim Buhler frames the issue around preventing the existence of any AGI that has one of these “x-risk conducive preferences”, such as intrinsically valuing punishment, destruction or death, and describes several ways such an AGI could come about. But since each of these routes first requires an actor with some risk-conducive preference, we focus on actors.

[5]

If the AGI is oracle-like (e.g. a generally intelligent simulator such as a multi-modal GPT-6), then the actor could use it to pursue the actor's own preferences, and the extent to which the AGI itself shares those preferences matters only for what it is likely to refuse to do. But if the AGI is highly agentic (like Auto-GPT or some RL model), then it will itself have those preferences.

[6]

As described in What We Owe The Future, Chapter 6: Collapse (Will MacAskill 2022)

Example	Actor	Preference	Likely method of accessing AGI^[3]	Outcome
1	Chinese state government	Socialism with Chinese characteristics	Builds it	Value lock-in (nationally)
2	Russia	Win a war	Espionage, blackmail and corruption	Global destruction
3	Doomsday cult	Extinction	Hacking	Extinction
4	Nationalist terrorist group	Destruction of a nation	Infiltration	Large scale destruction, increased global destabilisation

RCP	Reason	Examples
Extinction: Preference for humanity to go extinct	To stop humanity’s destruction of other species and the environment	Because if humans do not go extinct soon, they will make Earth uninhabitable and cause all life to go extinct; because they do not believe humanity as a whole can live sustainably in perpetuity; because they do not see human life as any more valuable than the lives of other species; because they are in a doomsday cult
Extinction: Preference for humanity to go extinct	In order to decrease and prevent suffering
Value Lock-in: Preference for certain values to be upheld indefinitely	Because the values held are assumed to be perfect	Religious extremists; some authoritarian governments
Irrecoverable civilisation collapse^[6]: Preference for removal of civilisation in a way that is irrecoverable (even if preferences change in the future)	Because civilisation causes social and environmental problems	(No well-known examples that don’t include value lock-in)

RCP	Conducive to	Reasons	Examples
Preference to radically reduce human population	Irrecoverable civilisation collapse	Worries about environmental effects; worries about overpopulation	Anti-capitalist terrorist groups
Preference to structure civilisation in a certain way	Value lock-in	Religious dogma	Religious extremists; some governments
Preferences for conflict (CSPs)	Extinction by global war	Anger	Warmongering nation-states; other terrorist groups
Preferences for removal of civilisation	Irrecoverable civilisation collapse	Worries about the environment; belief that civilisation causes more social issues (e.g. anarcho-primitivism)	Some anti-globalisation movements advocating self-sufficiency at the local community level; anti-capitalist terrorist groups; groups that prefer a return to tribal or agricultural societies; some forms of extreme religious fundamentalism; any religious traditions with strong connections to nature and Earth


Key Takeaways

Executive summary

Background

What kind of scenarios involving malevolent actors are we talking about here?

Factors of malevolent actors

Risk-conducive preferences (RCPs)

Neglected malevolent actor scenarios

Conclusion

Further Reading

Acknowledgements