Lessons learned from talking to >100 academics about AI safety

mariushobbhahn

Marius -- very helpful post; thank you.

Your observations ring true -- I've talked about AI safety issues over the last 6 years with about 30-50 academic faculty, and taught a course for 60+ undergraduate students (mostly psych majors) that includes two weeks of discussion on AI safety. I think almost everything you said sounds similar to my experiences.

Additional observations:

A) The moral, social, and political framing of AI safety issues matters for getting people interested, and it often needs adjusting to the other person's or group's ideology. Academics who lean left politically often seem more responsive to arguments about algorithmic bias, technological unemployment, concentration of power, disparate impact, etc. Academics who lean libertarian tend to be more responsive to arguments about algorithmic censorship, authoritarian lock-in, and misuse of AI by the military-industrial complex. Conservative academics are often surprisingly interested in multi-generational, longtermist perspectives, and seem quite responsive to X-risk arguments (insofar as conservatives tend to view civilization as rather fragile, transient, and in need of protection from technological disruptions). So, it helps to have a smorgasbord of different AI safety concerns that different kinds of people with different values can relate to. There's no one-size-fits-all way to get people interested in AI safety.

B) Faculty and students outside computer science often don't know what they're supposed to do about AI safety, or how they can contribute. I interact mostly with behavioral and biological scientists in psychology, anthropology, economics, evolutionary theory, behavior genetics, etc. The brighter ones are often very interested in AI issues, and get excited when they hear that 'AI should be aligned with human values' -- because many of them study human values. Yet, when they ask 'OK, will AI safety insiders respect my expertise about the biological/psychological/economic basis of human values, and want to collaborate with me about alignment?', I have to answer 'Probably not, given the current culture of AI safety research, and the premium it places on technical machine learning knowledge as the price of admission'.

C) Most people -- including most academics -- come to the AI safety issue through the lens of the science fiction movies and TV series they've watched. Rather than dismissing these media sources as silly, misleading, and irrelevant to the 'serious work' of AI alignment, I've found it helpful to be very familiar with these media sources, to find the ones that really resonate with the person I'm talking with (whether it's Terminator 2, or Black Mirror, or Ex Machina, or Age of Ultron), and to kind of steer the conversation from that shared enthusiasm about sci fi pop culture towards current AI alignment issues.

Dušan D. Nešić (Dushan)

Thank you Geoffrey for an insightful contribution!

Regarding B: PIBBSS ran its last fellowship with exactly this goal in mind (disclosure: I now work there as Ops Director), and we are keen to connect with non-AI researchers interested in doing AI safety research that draws on their diverse professions. Do point them our way and tell them that the interdisciplinary field is in development. The fellowship is not open yet, and we are considering how to go forward, but there will likely be a speaker series that would be relevant to these people.

Geoffrey Miller

Dušan - thanks for this pointer to PIBBSS, which I hadn't heard of before. I've signed up for the newsletter!

Linch

"Many people were interested in how they could contribute. However, often they were more interested in reframing their specific topic to sound more like AI safety rather than making substantial changes to their research."

As stated, this doesn't sound like wanting to contribute to me.

mariushobbhahn

I think it's a process and just takes a bit of time. What I mean is roughly "People at some point agreed that there is a problem and asked what could be done to solve it. Then, often they followed up with 'I work on problem X, is there something I could do?'. And then some of them tried to frame their existing research to make it sound more like AI safety. However, if you point that out, they might consider other paths of contributing more seriously. I expect most people to not make substantial changes to their research though. Habits and incentives are really strong drivers".

David Mathers🔸

Did you manage to get anyone/many people to eventually agree that there is a non-negligible X-risk from A.I.?

mariushobbhahn

Probably not in the first conversation. I think there were multiple cases in which a person thought something like "Interesting argument, I should look at this more" after hearing the X-risk argument and then over time considered it more and more plausible.

But like I state in the post, I think it's not reasonable to start from X-risks and thus it wasn't the primary focus of most conversations.

mariushobbhahn

I don't think these conversations had as much impact as you suggest and I think most of the stuff funded by EA funders has decent EV, i.e. I have more trust in the funding process than you seem to have.

I think one nice side-effect of this is that I'm now widely known as "the AI safety guy" in parts of the European AIS community and some people have just randomly dropped me a message or started a conversation about it because they were curious.

I was working on different grants in the past but this particular work was not funded.

ChanaMessinger

Agree there's a bunch of value in just your presence bringing AI safety to the space / conversation where it makes it more salient to people.

Patrick Gruban 🔸

Thank you for the write-up. This was very helpful in getting a better understanding of the reactions from the academic field.

"Don’t start with X-risk or alignment, start with a technical problem statement such as “uncontrollability” or “interpretability” and work from there."

Karl von Wendt makes a similar point in "Let’s talk about uncontrollable AI", where he argues "that we talk about the risks of “uncontrollable AI” instead of AGI or superintelligence". His aim is "to raise awareness of the problem and encourage further research, in particular in Germany and the EU". Do you think this could be a better framing? Do you think some framings might be better suited to particular cultural contexts, like Germany, or does that seem negligible?

mariushobbhahn

I have talked to Karl about this and we both had similar observations.

I'm not sure if this is a cultural thing or not, but most of the PhDs I talked to came from Europe. I think it also depends on the actor in the government, e.g. I could imagine defense people being more open to existential risk as a serious threat. I have no experience in governance, so this is highly speculative and I would defer to people with more experience.

ChanaMessinger

Fwiw, from talking to my dad, who works adjacent to ML people, I think "c) they have worked with ML systems for many years and are generally more skeptical of everyone claiming highly capable AI. Their lived experience is just that hype cycles die and AI is usually much worse than promised." is doing a huge percentage (maybe most?) of the work.

Jmd

Thanks for writing this Marius :), I am feeling a little bit motivated to see how these things would apply to having conversations with academics about biorisks. I suspect that many of your points will hold true - not being alarmist, focusing on technical aspects, higher-ups are more dismissive, and that I will learn lots of new things or at least get better at talking about this :)

Peter

This is really useful and makes sense - thanks for sharing your findings!

In my experience, talking about an existing example of a problem (like recommendation systems prioritizing unintended data points) plus an example of a meaningful AI capability usually gets people interested: those two combined would probably be bad if we're not careful. Jumping to the strongest/worst scenarios usually makes people recoil, because the scenario is bad and unexpected and it isn't clear why you're jumping to such an extreme outcome.

Do you have any examples of resources you were unaware of before? That could be useful to include as a section both for the actual resources and thinking about how to find such sources in the future.

mariushobbhahn

Reflects my experience!

The resources I was unaware of were usually highly specific technical papers (e.g. on some aspect of interpretability), so nothing helpful for a general audience.

ChanaMessinger

"Therefore, I think it is important to point this out (in a nice way!) whenever you spot this pattern"

Would be interested in how you do this / scripts you use, etc.

mariushobbhahn

Usually just asking a bunch of simple questions like "What problem is your research addressing?", "Why is this a good approach to the problem?", "Why is this problem relevant to AI safety?", "How does your approach attack the problem?", etc.

Just in a normal conversation that doesn't feel like an interrogation.

[anonymous]

I'd be interested to hear the best arguments you've heard in your conversations for why the AI security folks are wrong and AGI is not, in principle, such a risk. I am looking for the best case against AGI X-risk, since many professional AI researchers seem to hold this view, mostly without writing down their reasons, which might be really relevant to the discussion.

mariushobbhahn

I'm obviously heavily biased here because I think AI does pose a relevant risk.

I think the arguments that people made were usually along the lines of "AI will stay controllable; it's just a tool", "We have fixed big problems in the past, we'll fix this one too", "AI just won't be capable enough; it's just hype at the moment and transformer-based systems still have many failure modes", "Improvements in AI are not that fast, so we have enough time to fix them".

However, I think that most of the dismissive answers are based on vibes rather than sophisticated responses to the arguments made by AI safety folks.
