
Crossposted to LessWrong.

In October 2022, 91 EA Forum/LessWrong users answered the AI timelines deference survey. This post summarises the results.

Context

The survey was advertised in this forum post, and anyone could respond. Respondents were asked to whom they defer most, second-most and third-most, on AI timelines. You can see the survey here.

Results

This spreadsheet has the raw anonymised survey results. Here are some plots which try to summarise them.[1]

Simply tallying up the number of times that each person is deferred to:

The plot only features people who were deferred to by at least two respondents.[2]
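
(For anyone who wants to reproduce this tally from the raw spreadsheet, here’s a minimal sketch in Python. The filename and column names are placeholders for illustration, not the actual headers of the results sheet.)

```python
from collections import Counter
import csv

# Placeholder filename and column names -- adjust to match the actual
# spreadsheet headers.
with open("survey_results.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Count every time a person appears in any of the three deference slots.
unweighted = Counter()
for row in rows:
    for col in ("first", "second", "third"):
        if row.get(col):
            unweighted[row[col]] += 1

# The plot only features people deferred to by at least two respondents.
print({name: n for name, n in unweighted.items() if n >= 2})
```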


Some basic observations:

  • Overall, respondents defer most frequently to themselves—i.e. their “inside view” or independent impression—and Ajeya Cotra. These two responses were each at least twice as frequent as any other response.
  • Then there’s a kind of “middle cluster”—featuring Daniel Kokotajlo, Paul Christiano, Eliezer Yudkowsky and Holden Karnofsky—where, again, each of these responses was at least (roughly) twice as frequent as any response outside these first two groups.
  • Then comes everyone else…[3] There’s probably something more fine-grained to be said here, but it doesn’t seem crucial to understanding the overall picture.

What happens if you redo the plot with a different metric? How sensitive are the results to that?

One thing we tried was computing a “weighted” score for each person (see the sketch after this list), by giving them:

  • 3 points for each respondent who defers to them the most
  • 2 points for each respondent who defers to them second-most
  • 1 point for each respondent who defers to them third-most.
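
Here’s the sketch referred to above, reusing the placeholder filename and column names from the earlier example (they’re assumptions, not the actual spreadsheet headers):

```python
from collections import Counter
import csv

# Same placeholder filename and column names as in the earlier sketch.
with open("survey_results.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# 3 / 2 / 1 points for the person deferred to most / second-most / third-most.
points = {"first": 3, "second": 2, "third": 1}

weighted = Counter()
for row in rows:
    for col, score in points.items():
        if row.get(col):
            weighted[row[col]] += score

# Highest weighted score first.
print(weighted.most_common())
```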

If you redo the plot with that score, you get this plot. The ordering changes a bit, but I don’t think it really changes the high-level picture. In particular, the basic observations in the previous section still hold.

We think the weighted score (described in this section) and unweighted score (described in the previous section) are the two most natural metrics, so we didn’t try out any others.

Don’t some people have highly correlated views? What happens if you cluster those together?

Yeah, we do think some people have highly correlated views, in the sense that their views depend on similar assumptions or arguments. We tried plotting the results using the following basic clusters (see the sketch after this list):

  • Open Philanthropy[4] cluster = {Ajeya Cotra, Holden Karnofsky, Paul Christiano, Bioanchors}
  • MIRI cluster = {MIRI, Eliezer Yudkowsky}
  • Daniel Kokotajlo gets his own cluster
  • Inside view = deferring to yourself, i.e. your independent impression
  • Everyone else = all responses not in one of the above categories
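
Here’s the sketch referred to above: one way the grouping could be applied before tallying. The cluster map and response strings are illustrative guesses at how the responses are spelled; anything not in the map falls into “Everyone else”.

```python
from collections import Counter
import csv

# Illustrative cluster map; the real response strings may be spelled
# differently (e.g. "Bio Anchors" vs "Bioanchors").
CLUSTERS = {
    "Ajeya Cotra": "Open Philanthropy",
    "Holden Karnofsky": "Open Philanthropy",
    "Paul Christiano": "Open Philanthropy",
    "Bioanchors": "Open Philanthropy",
    "MIRI": "MIRI",
    "Eliezer Yudkowsky": "MIRI",
    "Daniel Kokotajlo": "Daniel Kokotajlo",
    "Inside view": "Inside view",
}

# Same placeholder filename and column names as in the earlier sketches.
with open("survey_results.csv", newline="") as f:
    rows = list(csv.DictReader(f))

cluster_counts = Counter()
for row in rows:
    for col in ("first", "second", "third"):
        if row.get(col):
            cluster_counts[CLUSTERS.get(row[col], "Everyone else")] += 1

print(cluster_counts.most_common())
```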

Here’s what you get if you simply tally up the number of times each cluster is deferred to:

This plot gives a breakdown of two of the clusters (it contains no information beyond the two plots above; it just gives a different view).

This is just one way of clustering the responses, which seemed reasonable to us. There are other clusters you could make.

Limitations of the survey

  • Selection effects. This probably isn’t a representative sample of forum users, let alone of people who engage in discourse about AI timelines, or make decisions influenced by AI timelines.
  • The survey didn’t elicit much detail about the weight that respondents gave to different views. We simply asked who respondents deferred most, second-most and third-most to. This misses a lot of information.
  • The boundary between [deferring] and [having an independent impression] is vague. Consider: how much effort do you need to spend examining some assumption/argument for yourself, before considering it an independent impression, rather than deference? This is a limitation of the survey, because different respondents may have been using different boundaries.

Acknowledgements

Sam and McCaffary decided what summary plots to make. McCaffary did the data cleaning, and wrote the code to compute summary statistics/make plots. Sam wrote the post.

Daniel Kokotajlo suggested running the survey. Thanks to Noemi Dreksler, Rose Hadshar and Guive Assadi for feedback on the post.

  1. ^

    You can see the code for these plots here, along with a bunch of other plots which didn’t make the post.

  2. ^

    Here’s a list of people who were deferred to by exactly one respondent.

  3. ^

    Arguably, Metaculus doesn’t quite fit with “everyone else”, but I think it's good enough as a first approximation, especially when you also consider the plots which result from the weighted score (see next section).

  4. ^

    This cluster could have many other names. I’m not trying to make any substantive claim by calling it the Open Philanthropy cluster.

Comments (3)



Things that surprised me about the results

  • There’s more variety than I expected in the group of people who are deferred to
    • I suspect that some of the people in the “everyone else” cluster defer to people in one of the other clusters—in which case there is more deference happening than these results suggest.
  • There were more “inside view” responses than I expected (maybe partly because people who have inside views were incentivised to respond, because it’s cool to say you have inside views or something). Might be interesting to think about whether it’s good (on the community level) for this number of people to have inside views on this topic.
  • Metaculus was given less weight than I expected (but as per Eli (see footnote 2), I think that’s a good thing).
  • The Grace et al. AI expert surveys (1, 2) were deferred to less than I expected (but again, I think that’s good—many respondents to those surveys seem to have inconsistent views; see here for more details. Also, there’s not much reason to expect AI experts to be excellent at forecasting things like AGI—it’s not their job, and it’s probably not a skill they spend time training).
  • It seems that if you go around talking to lots of people about AI timelines, you could move the needle on community beliefs more than I expected.

Thanks for doing this survey and sharing the results, super interesting!

Regarding

maybe partly because people who have inside views were incentivised to respond, because it’s cool to say you have inside views or something

Yes, I definitely think that there's a lot of potential for social desirability bias here! And I think this can happen even if the responses are anonymous, as people might avoid the cognitive dissonance that comes with admitting to "not having an inside view." One might even go so far as to frame the results as "Who do people claim to defer to?"

Interesting, thanks! Relative to the results you plot, Epoch here gave more weight to Samotsvety and Metaculus, and less to Ajeya.
