
Crossposted to LessWrong.

TL;DR: We developed an interactive guide to AI safety arguments, based on researcher interviews. Go check it out! Please leave a comment and let us know what you think.

Introduction

Vael Gates interviewed 97 AI researchers on their beliefs about the future of AI. These interviews were quite broad, covering researchers’ hopes and concerns for the field in general, as well as advanced AI safety specifically. Full transcripts are now available.

Lukas Trötzmüller interviewed 22 EAs about their opinions on AI safety. They were pre-selected to be skeptical either about the classical “existential risk from AI” arguments, or about the importance of work on AI safety. The focus of this research was on distilling specific arguments and organizing them.

This guide builds mostly on Vael’s conversations, and it aims to replicate the interview experience. The goal is not necessarily to convince, but to outline the most common arguments and counterarguments, and help readers gain a deeper understanding of their own perspective.

Design Goals

Our previous research has uncovered a wide range of arguments that people hold about AI safety. We wanted to build a resource that addresses the most frequently mentioned of these arguments.

Instead of a linear article (which would be quite long), we wanted to create an interactive format. As someone goes through the guide, they should be presented with the content that is most relevant to them.

Even though we had a clear target audience of AI researchers in mind, the text turned out to be surprisingly accessible to a general audience. This is because most of the classical AI risk arguments do not require in-depth knowledge of present-day AI research.

Format

Our guide consists of a collection of short articles that are linked together. There are five main chapters:

  1. When will Generally Capable AI Systems be developed?
  2. The Alignment Problem (“AI systems don’t always do what we want”)
  3. Instrumental Incentives
  4. Threat Models (“how might AI be dangerous?”)
  5. Pursuing Safety Work

Each chapter begins with an argument, after which the reader is asked for their agreement or disagreement. If they disagree, they can select between several objections that they may have.

Each of these objections links to a separate article presenting possible counterarguments, which they can optionally read. Most of the objections and counterarguments are taken directly from Vael’s interviews with AI researchers.

After reading a counterargument, the reader can indicate whether they find it plausible and is then guided back to the introduction of the chapter. The reader may advance to the next main chapter at any time.

The following diagram illustrates this structure:

Agreement and disagreement are shown visually in the table of contents.

The “Threat Models” chapter is meant as a short interlude and does not present any counterarguments - we might expand upon that in the future.
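To make the branching structure described above concrete, here is a minimal sketch in Python of how chapters, objections, and counterargument pages could be linked. The names and types are hypothetical illustrations, not taken from our actual implementation.

```python
# Hypothetical sketch of the guide's branching structure (not the actual implementation).
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Objection:
    label: str             # shown to readers who disagree with the main argument
    counterargument: str   # the separate article this objection links to


@dataclass
class Chapter:
    title: str
    argument: str          # the argument that opens the chapter
    objections: List[Objection] = field(default_factory=list)


def next_page(chapter: Chapter, agrees: bool, objection_index: Optional[int] = None) -> str:
    """Decide what the reader sees next: agreeing (or choosing to move on) advances
    to the next chapter, while disagreeing routes to the counterargument for the
    chosen objection before returning to the chapter introduction."""
    if agrees or objection_index is None:
        return "next chapter"
    return chapter.objections[objection_index].counterargument
```

In the actual guide, each counterargument page additionally asks whether the reader found it plausible and then links back to the chapter introduction.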

Polling and Commenting

It is also possible to leave comments on individual pages. These are displayed publicly at the end of the guide.

On the last page, you can also see a visual summary of your responses and how they compare to those of the average visitor:

Requesting Feedback

We are releasing this within the EA and alignment communities. We would like to gather additional feedback before presenting it to a wider audience. If you have feedback or suggestions, please leave a comment below. We welcome feedback on the structure as well as the language and argumentation.

Creating Interactive Guides for Other EA Cause Areas

Our goal was to enable anyone to put complex arguments into an interactive format - without requiring experience in web development. The guide is written in a Google Doc, which contains all the pages separated by headings, plus some special code for defining the structure. Our system converts this document into an interactive website, and updates can be made through the document.
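As a rough illustration of the idea (the markup below is invented for this sketch and differs from the codes we actually use), a converter could split a plain-text export of the document into pages at headings and read link markers from the body text:

```python
# Illustrative sketch only: assumes "# Title" marks a new page and a made-up
# "[[goto: Page Title]]" marker defines a link; the real system's codes differ.
import re


def parse_guide(doc_text: str) -> dict:
    """Split an exported document into pages keyed by heading, collecting links."""
    pages = {}
    current = None
    for line in doc_text.splitlines():
        heading = re.match(r"^#\s+(.*)", line)
        if heading:
            current = heading.group(1).strip()
            pages[current] = {"body": [], "links": []}
        elif current is not None:
            pages[current]["links"].extend(re.findall(r"\[\[goto:\s*(.+?)\]\]", line))
            pages[current]["body"].append(line)
    return pages


if __name__ == "__main__":
    demo = "# Intro\nDo you agree? [[goto: Objection A]]\n# Objection A\nA counterargument."
    print(list(parse_guide(demo)))  # ['Intro', 'Objection A']
```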

Nothing about the interactive system we developed is specific to AI safety. It could be used for other purposes - for example, an introduction to longtermism, the case for biosecurity, or an explanation of ethical arguments. If you would like to use this for your project, please get in touch with Lukas.

Related Projects

The Stampy project aims to provide a comprehensive AI safety FAQ. We have given the Stampy team permission to re-use our material as they see fit.

Conclusion & Downside Risk

If you haven’t opened the guide yet, go ahead and check it out. We are really interested in your comments. How are the language and the argumentation? Are we missing important arguments? Could we make this easier to use or improve the design? Would you actually recommend this as a resource to people - and if not, why?

Looking at the result of our work, we notice positives and negatives.

Vael likes that the content is pretty clear and comprehensive.

Lukas likes the visual presentation and the overall look & feel. However, he has some reservations about the level of rigour in the argumentation - there are definitely parts that could be made more solid.

We both like the interactive format. However, we are unsure whether this is the best way to talk to people, from a field-building perspective. The reason is this: even though the guide is interactive, it is not a replacement for a real conversation. People only have a limited number of options to choose from, and then they get lots of text trying to counter their arguments. Indeed, we wonder if this might create resistance in some readers, and if the downside risks might be worse than the upsides.

Contributions

The guide was written by Lukas Trötzmüller, with guidance and additional writing from Vael Gates.

Technical implementation by Michael Keenan and Lukas Trötzmüller.

Copy Editing: David Spearman, Stephen Thomas.

We would like to thank everyone who gave feedback.

This work was funded by the AI Safety Field Building Hub.

Comments



Thanks for this! I liked it and found it helpful for understanding the key arguments for AI risk.

It also felt more engaging than other presentations of those arguments because it is interactive and comparative.

I think that the user experience could be improved a little but that it's probably not worth making those improvements until you have a larger number of users.

One change you could make now is to mention the number of people who have completed the tool (maybe on the first page) and also change the outputs on the conclusion page to percentages.

How do you imagine using this tool in the future? Like what are some user stories (e.g., person x wants to do y, so they use this)?

Here are some quick (possibly bad) ideas I have for potential uses (ideally after more testing):

  • As something that advocates like Robert Miles can refer relevant people to
  • As part of a longitudinal study where a panel of, say, 100 randomly selected AI safety researchers does this annually, and you report on changes in their responses over time.
  • Using a similar approach/structure, with new sections and arguments, to assess levels of agreement and disagreement with different AI safety research agendas within the AI Safety community and to identify the cruxes
  • As a program that new AI Safety researchers, engineers and movement builders do to understand the relevant arguments and counterarguments.

I also like the idea of people making something like this for other cause areas and appreciate the effort invested to make that easy to do.

I tried to comment on the page https://ai-risk-discussions.org/perspectives/test-before-deploying, but instead got an error message telling me to use the contact email.

Thanks for the bug report, checking into it now. 

Update: Michael Keenan reports it is now fixed!
