How I Formed My Own Views About AI Safety

Neel Nanda

17 min read · Feb 27, 2022

Comments 11

Miranda_Zhang

Thank you for this. I am a community-builder and I've definitely started emphasizing the importance of developing inside views to my group members. However, it seems like there may be domains where developing an inside view is relatively less important (e.g., algebraic geometers vs moral philosophers), because experts in that field appear to have better feedback loops. Given this, I'm curious whether you think community-builders might want to form inside views* on which areas to emphasize inside view formation for, to help communicate more accurately to our members?

*I'm not confident I'm describing an 'inside view.' Maybe this is something like, 'getting a sense of outside views across an array of domains?'

I found your post doubly useful because I've recently been exploring how I can form inside views, which I've found both practically and emotionally difficult. Not being familiar with the rationality or AI safety community, I was surprised by how much emphasis was placed on inside views and started feeling a bit like an imposter in the EA community. I definitely felt like it was "low-status" to not have inside views on the causes I prioritized, though I expect at least some of this was due to my own anxiety.

Being able to see how you tackled this is really useful, as it gives me another model for how I could develop inside views (particularly on AI risk, which is the first thing I'm working on). It also reinforces that a lot of people have more career flexibility than they think - and so, perhaps, it's okay if I haven't figured out whether I should switch from community building into AI safety research in the three months before I graduate!

Jamie B

Hey! I have been thinking about this a lot from the perspective of a confused community builder / pipeline strategist, too. I didn't get as far as Neel, so it's been great to read this post before getting anywhere near finishing my thoughts. It captures a lot of the same things better than I had. Thanks for your comment too - definitely a lot of overlap here!

I have got as far as some ideas, here, and would love any initial thoughts before I try to write them up with more certainty.

First a distinction which I think you're pointing at - an inside view on what? The thing I can actually have an excellent inside view about as a (full-time) community builder is how community building works. Like, how to design a programme, how people respond to certain initiatives, how likely certain things are to work, etc.

Next, programmes that lead to working in industry, academic field building, independent research, etc, look different. How do I decide which to prioritise? This might require some inside view on how each direction changes the world (and interacts with the others), and lead to an answer on which I’m most optimistic about supporting. There is nobody to defer to here, as practitioners are all (rightly) quite bullish about their choice. Having an inside view on which approach I find most valuable will lead to quite concrete differences in the ultimate strategy I’m working towards or direction I’m pointing people in, I think.

When it comes to what to think about object-level work (i.e. how does alignment happen, technically), I get more hazy on what I should aim for. By statistical arguments, I reckon most inside views that exist on what work is going to be valuable are probably wrong. Why would mine be different? Alternatively, they might all be valuable, so why support just one? Or something in between. Either way, if I am doing meta work, it will probably be wrong to be bullish about my single inside view on 'what will go wrong'. I think I should aim to support a number of research agendas if I don't have strong reasons to believe some are wrong. I think this is where I will be doing most of my deferral, ultimately (and as the field shifts from where I left it).

However, understanding how valuable the object-level work is does seem important for deciding which directions to support (e.g. academia vs industry), so I’m a bit stuck on where to draw a line. As Neel says, I might hope to get as far as understanding what other people believe about their agenda and why - I always took this as "can I model the response person X would give, when considering an unseen question", rather than memorising person X's response to a number of questions.

I think where I am landing on this is that it might be possible to assume a uniform prior over the directions I could take, and adjust my posterior by 'learning things' and understanding their models on both the direction-level and object-level, properly. Another thought I want to explore - is this something like a worldview diversification over directions? It feels similar, as we’re in a world where it ‘might turn out’ some agenda or direction was correct, but there’s no way of knowing that right now.
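
To make the uniform-prior idea concrete, here is a minimal sketch of a discrete Bayesian update over directions. The direction names and the likelihood numbers are purely hypothetical placeholders, not estimates anyone in this thread has made; the only point is the mechanics of starting uniform and shifting weight as evidence comes in.

```python
def bayes_update(prior, likelihoods):
    """Apply Bayes' rule over a discrete set of hypotheses.

    prior: dict mapping hypothesis -> prior probability
    likelihoods: dict mapping hypothesis -> P(evidence | hypothesis)
    """
    unnormalised = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: weight / total for h, weight in unnormalised.items()}


# Hypothetical directions a community builder might support.
directions = ["industry", "academia", "independent research"]

# Uniform prior: no direction is favoured before learning anything.
prior = {d: 1 / len(directions) for d in directions}

# Hypothetical evidence: how strongly one thing I learned fits each direction
# being the most valuable one. These numbers are made up for illustration.
likelihoods = {"industry": 0.6, "academia": 0.3, "independent research": 0.3}

posterior = bayes_update(prior, likelihoods)
for direction, probability in posterior.items():
    print(f"{direction}: {probability:.2f}")
```

Under this framing, supporting several directions in proportion to the posterior, rather than only the top one, is roughly what diversifying over directions would look like.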

To confirm - I believe people doing the object-level work (i.e. alignment research) should be bullish about their inside view. Let them fight it out, and let expert discourse decide what is “right” or “most promising”. I think this amounts to Neel’s “truth-seeking” point.

Miranda_Zhang

Hey Jamie, thanks for this! Seems like you've thought about it quite a bit - probably more than I have - but here are my initial thoughts. Hope this is helpful to you; if so, maybe we should chat more!

First a distinction which I think you're pointing at - an inside view on what? [...] How do I decide which to prioritise? This might require some inside view on how each direction changes the world (and interacts with the others), and lead to an answer on which I’m most optimistic about supporting. There is nobody to defer to here, as practitioners are all (rightly) quite bullish about their choice. Having an inside view on which approach I find most valuable will lead to quite concrete differences in the ultimate strategy I’m working towards or direction I’m pointing people in, I think.

Agree! When I first wrote my comment, I labelled this a 'meta-inside view:' an inside view on what somebody (probably you, but possibly others like your group members) needs to form inside views on. But this might be too confusing compared to less jargon-y phrases like, 'prioritizing what you form an inside view on first' or something.

Regardless, I think we are capturing the same issue here - although I don't use 'issue' in a negative sense. In my ideal world, community-builders would form pretty different views on causes to prioritize because this would help increase intellectual diversity and the discovery of the 'next-best' thing to work on. That doesn't mean, however, that there couldn't be some sort of guidance for how community-builders might go about figuring out what to prioritize.

I think this is where I will be doing most of my deferral, ultimately (and as the field shifts from where I left it).

Yeah, I think this is the status quo for any field that one isn't an expert in. Community-builders may be experts on community-building, but that doesn't extend to other domains, hence the need for deferral. Perhaps the key difference here is that community-builders need to be extra aware of the ever-shifting landscape and stay plugged-in, since their advice may directly impact the 'next generation' of EAs.

However, understanding how valuable the object-level work is does seem important for deciding which directions to support (e.g. academia vs industry), so I’m a bit stuck on where to draw a line. As Neel says, I might hope to get as far as understanding what other people believe about their agenda and why - I always took this as "can I model the response person X would give, when considering an unseen question", rather than memorising person X's response to a number of questions.

Hmm, I think you're right that developing an inside view for a specific cause would influence the levers that you think are most important (which has effects on your CB efforts, etc.) - but I'm not sure this has many implications for what CBs should do. My prior is that it is very unlikely that there are any causes where only a handful of levers and skillsets would be relevant, such that I would feel comfortable suggesting that people rely more on personal fit to figure out their careers once they've chosen a cause area. However, I acknowledge that there is definitely more need in certain causes (e.g., software engineers for AI safety): I just don't think that the CB level is the right level to apply this knowledge. I would feel more comfortable having cause-specific recruiters (cf. University community building seems like the wrong model for AI safety).

I definitely agree on the latter point. I see community-builders as both building and embodying pipelines to the EA community! As the 'point-of-entry' for many potential EAs, I think it is sufficient for CBs to be able to model the mainstream views for core cause areas. I expect that the most talented CBs will probably have developed inside views for a specific cause outside of CB, but that doesn't seem necessary to me for good CB work.

I think where I am landing on this is that it might be possible to assume a uniform prior over the directions I could take, and adjust my posterior by 'learning things' and understanding their models on both the direction-level and object-level, properly. Another thought I want to explore - is this something like a worldview diversification over directions? It feels similar, as we’re in a world where it ‘might turn out’ some agenda or direction was correct, but there’s no way of knowing that right now.

Oh, I'm a huge fan of worldview diversification! I don't currently have thoughts on starting with a non-/uniform prior ... I am, honestly, somewhat inclined to suggest that CBs 'adapt' a bit to the communities in which they are working. That is, perhaps what should partly affect a CB's prioritization re: inside view development is the existing interests of their group. For example, considering the Bay Area's current status as a tech hub, it seems pretty important for CBs in the Bay Area to develop inside views on, say, AI safety - even if AI safety may not be what they consider the most pressing issue in the entire world. What do you think?

To confirm - I believe people doing the object-level work (i.e. alignment research) should be bullish about their inside view. Let them fight it out, and let expert discourse decide what is “right” or “most promising”.

Also completely agree here. : )

Sam Clarke

Nice post! I agree with ~everything here. Parts that felt particularly helpful:

There are even more reasons why paraphrasing is great than I thought - good reminder to be doing this more often
The way you put this point was v crisp and helpful: "Empirically, there’s a lot of smart people who believe different and contradictory things! It’s impossible for all of them to be right, so you must disagree with some of them. Internalising that you can do this is really important for being able to think clearly"
The importance of "how much feedback do they get from the world" in deferring intelligently

One thing I disagree with: the importance of forming inside views for community epistemic health. I think it's pretty important. E.g. I think that ~2 years ago, the arguments for the longterm importance of AGI safety were pretty underdeveloped; that since then lots more people have come out with their inside views about it; and that now the arguments are in much better shape.

Sam Clarke

Also, nitpick, but I find the "inside view" a more confusing and jargony way of just saying "independent impressions" (okay, also jargon to some extent, but closer to plain English), which also avoids the problem you point out: inside view is not the opposite of the Tetlockian sense of outside view (and the other ambiguities with outside view that another commenter pointed out).

Neel Nanda

The complaint that it's confusing jargon is fair. Though I do think the Tetlock sense + phrase inside view captures something important - my inside view is what feels true to me, according to my personal best guess and internal impressions. Deferring doesn't feel true in the same way; it feels like I'm overriding my beliefs, not like how the world is.

This mostly comes under the motivation point - maybe, for motivation, inside views matter but independent impressions don't? And people differ on how they feel about the two?

Sam Clarke

I'm still confused about the distinction you have in mind between inside view and independent impression (which also has the property that it feels true to me)?

Or do you have no distinction in mind, but just think that the phrase "inside view" captures the sentiment better?

Neel Nanda

Inside view feels deeply emotional and tied to how I feel the world to be; independent impression feels cold and abstract.

Neel Nanda

One thing I disagree with: the importance of forming inside views for community epistemic health. I think it's pretty important. E.g. I think that ~2 years ago, the arguments for the longterm importance of AGI safety were pretty underdeveloped; that since then lots more people have come out with their inside views about it; and that now the arguments are in much better shape.

I want to push back against this. The aggregate benefit may have been high, but when you divide it by all the people trying, I'm not convinced it's all that high.

Further, that's an overestimate - the actual question is more like 'if the people who are least enthusiastic about it stop trying to form inside views, how bad is that?'. And I'd both guess that impact is fairly heavy tailed, and that the people most willing to give up are the least likely to have a major positive impact.

I'm not confident in the above, but it's definitely not obvious.

Sam Clarke

Thanks - good points, I'm not very confident either way now

Yonatan Cale

Linking: Taboo Outside View [LessWrong post, 292 karma]
