Results from an Adversarial Collaboration on AI Risk (FRI)

Forecasting Research Institute; AvitalM; rosehadshar; Molly Hickman; Jhrosenberg

'The concerned group also was more willing to place weight on theoretical arguments with multiple steps of logic, while the skeptics tended to doubt the usefulness of such arguments for forecasting the future.'

Seems to me like it's wrong to say that this is a general "difference in worldview", until we know whether "the concerned group" (i.e. the people who think X-risk from AI is high) think this is the right approach to all/most/many questions, or just apply it to AI X-risk in particular. If the latter, there's a risk it's just special pleading for an idea they are attached to, whereas if the former is true, they might (or might not) be wrong, but it's not necessarily bias.

alex lawsen

The first bullet point of the concerned group summarizing their own position was "non-extinction requires many things to go right, some of which seem unlikely".

This point was notably absent from the sceptics' summary of the concerned position.

Both sceptics and concerned agreed that a different important point on the concerned side was that it's harder to use base rates for unprecedented events with unclear reference classes.

I think these both provide a much better characterisation of the difference than the quote you're responding to.

NickLaing

"The concerned group also was more willing to place weight on theoretical arguments with multiple steps of logic, while the skeptics tended to doubt the usefulness of such arguments for forecasting the future."

Assuming the "concerned group" are likely to be more EA aligned (uncertain about this), I'm surprised they place more weight on multi-stage theory than the forecasters. I'm aware it's hard to use evidence for a problem as novel as AI progression, but it makes sense to me to try and I'm happy the forecasters did.

Ted Sanders

Here's a hypothesis:

The base case / historical precedent for existential AI risk is:
- AGI has never been developed
- ASI has never been developed
- Existentially deadly technology has never been developed (I don't count nuclear war or engineered pandemics, as they'll likely leave survivors)
- Highly deadly technology (>1M deaths) has never been cheap and easily copied
- We've never had supply chains so fully automated end-to-end that they could become self-sufficient with enough intelligence
- We've never had technology so networked that it could all be taken over by a strong enough hacker

Therefore, if you're in the skeptic camp, you don't have to make as much of an argument about specific scenarios where many things happen. You can just wave your arms and say it's never happened before because it's really hard and rare, as supported by the historical record.

In contrast, if you're in the concerned camp, you're making more of a positive claim about an imminent departure from historical precedent, so the burden of proof is on you. You have to present some compelling model or principles for explaining why the future is going to be different from the past.

Therefore, I think the concerned camp relying on theoretical arguments with multiple steps of logic might be a structural side effect of them having to argue against the historical precedent, rather than any innate preference for that type of argument.

David Mathers🔸

I think that is probably the explanation yes. But I don't think it gets rid of the problem for the concerned camp that usually, long complex arguments about how the future will go are wrong. This is not a sporting contest, where the concerned camp are doing well if they take a position that's harder to argue for and make a good go of it. It's closer to the mark to say that if you want to track truth you should (usually, mostly) avoid the positions that are hard to argue for.

I'm not saying no one should ever be moved by a big long complicated argument*. But I think that if your argument fails to move a bunch of smart people, selected for good predictive track record to anything like your view of the matter, that is an extremely strong signal that your complicated argument is nowhere near good enough to escape the general sensible prior that long complicated arguments about how the future will go are wrong. This is particularly the case when your assessment of the argument might be biased, which I think is true for AI safety people: if they are right, then they are some of the most important people, maybe even THE most important people in history, not to mention the quasi-religious sense of meaning people always draw from apocalyptic salvation v. damnation type stories. Meanwhile the GJ superforecasters don't really have much to lose if they decide "oh, I am wrong, looking at the arguments, the risk is more like 2-3% than 1 in 1000". (I am not claiming that there is zero reason for the supers to be biased against the hypothesis, but just that the situation is not very symmetric.) I think I would feel quite different about what this exercise (probably) shows, if the supers had all gone up to 1-2%, even though that is a lot lower than the concerned group.

I do wonder (though I think other factors are more important in explaining the opinions of the concerned group) whether familiarity with academic philosophy helps people be less persuaded by long complicated arguments. Philosophy is absolutely full of arguments that have plausible premises and are very convincing to their proponents, but which nonetheless fail to produce convergence amongst the community. After seeing a lot of that, I got used to not putting that much faith in argument. (Though plenty of philosophers remain dogmatic, and there are controversial philosophical views I hold with a reasonable amount of confidence.) I wonder if LessWrong functions a bit like a version of academic philosophy where there is, like philosophy, a strong culture of taking arguments seriously and trying to have them shape your views, but where consensus actually is reached on some big picture stuff. That might make people who were shaped by LW intellectually rather more optimistic about the power of argument (even as many of them would insist LW is not "philosophy"). But it could just be an effect of homogeneity of personalities among LW users, rather than a sign that LW was converging on truth.

*(Although personally, I am much more moved by "hmmm, creating a new class of agents more powerful than us could end with them on top; probably very bad from our perspective" than I am by anything more complicated. This is, I think a kind of base rate argument, based off of things like the history of colonialism and empire; but of course the analogy is quite weak, given that we get to create the new agents ourselves.)

alex lawsen

The smart people were selected for having a good predictive track record on geopolitical questions with resolution times measured in months, a track record equaled or bettered by several* members of the concerned group. I think this is much less strong evidence of forecasting ability on the kinds of question discussed than you do.

*For what it's worth, I'd expect the skeptical group to do slightly better overall on e.g. non-AI GJP questions over the next 2 years, they do have better forecasting track records as a group on this kind of question, it's just not a stark difference.

David Mathers🔸

I agree this is quite different from the standard GJ forecasting problem. And that GJ forecasters* are primarily selected for and experienced with forecasting quite different sorts of questions.

But my claim is not "trust them, they are well-calibrated on this". It's more "if your reason for thinking X will happen is a complex multi-stage argument, and a bunch of smart people with no particular reason to be biased, who are also selected for being careful and rational on at least some complicated emotive stuff, spend hours and hours on your argument and come away with a very different opinion on its strength, you probably shouldn't trust the argument much (though this is less clear if the argument depends on technical scientific or mathematical knowledge they lack**)". That is, I am not saying "supers are well-calibrated, so the risk probably is about 1 in 1000". I agree the case for that is not all that strong. I am saying "if the concerned group's credences are based in a multi-step, non-formal argument whose persuasiveness the supers feel very differently about, that is a bad sign for how well-justified those credences are."

Actually, in some ways, it might look better for AI X-risk work being a good use of money if the supers were obviously well-calibrated on this. A 1 in 1000 chance of an outcome as bad as extinction is likely worth spending some small portion of world GDP on preventing. And AI safety spending so far is a drop in the bucket compared to world GDP. (Yeah, I know technically the D stands for domestic so "world GDP" can't be quite the right term, but I forget the right one!) Indeed "AI risk is at least 1 in 1000" is how Greaves and MacAskill justify the "we can make a big difference to the long-term future in expectation" in 'The Case for Strong Longtermism'. (If a 1 in 1000 estimate is relatively robust, I think it is a big mistake to call this "Pascal's Mugging".)

*(of whom I'm one as it happens, though I didn't work on this: did work on the original X-risk forecasting tournament.)

**I am open to argument that this actually is the case here.

alex lawsen

Why do you think superforecasters who were selected specifically for assigning a low probability to AI x-risk are well described as "a bunch of smart people with no particular reason to be biased"?

For the avoidance of doubt, I'm not upset that the supers were selected in this way, it's the whole point of the study, made very clear in the write-up, and was clear to me as a participant. It's just that "your arguments failed to convince randomly selected superforecasters" and "your arguments failed to convince a group of superforecasters who were specifically selected for confidently disagreeing with you" are very different pieces of evidence.
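The difference between those two pieces of evidence can be sketched with Bayes' rule. This is only an illustration: every probability below is a hypothetical placeholder, not a figure from the study, and `posterior_strong` is a made-up helper name.

```python
def posterior_strong(prior, p_unmoved_if_strong, p_unmoved_if_weak):
    """Bayes' rule: P(argument is strong | the panel was not convinced)."""
    numerator = prior * p_unmoved_if_strong
    return numerator / (numerator + (1 - prior) * p_unmoved_if_weak)

# Hypothetical prior that the argument is strong.
prior = 0.5

# Hypothetical: a randomly drawn panel would usually be moved by a strong
# argument, so staying unmoved is fairly informative.
random_panel = posterior_strong(prior, p_unmoved_if_strong=0.3, p_unmoved_if_weak=0.9)

# Hypothetical: a panel selected for confident disagreement tends to stay
# unmoved either way, so staying unmoved tells us much less.
selected_panel = posterior_strong(prior, p_unmoved_if_strong=0.8, p_unmoved_if_weak=0.95)

print(f"Posterior after an unmoved random panel:   {random_panel:.2f}")
print(f"Posterior after an unmoved selected panel: {selected_panel:.2f}")
```

Under these assumed numbers, both results lower the probability that the argument is strong, but the unmoved random panel lowers it much further than the unmoved selected panel does.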

Ted Sanders

One small clarification: the skeptical group was not all superforecasters. There were two domain experts as well. I was one of them.

I'm sympathetic to David's point here. Even though the skeptic camp was selected for their skepticism, I think we still get some information from the fact that many hours of research and debate didn't move their opinions. I think there are plausible alternative worlds where the skeptics come in with low probabilities (by construction), but update upward by a few points after deeper engagement reveals holes in their early thinking.

David Mathers🔸

Ok, I slightly overstated the point. This time, the supers selected were not a (mostly) random draw from the set of supers. But they were in the original X-risk tournament, and in that case too, they were not persuaded to change their credences via further interaction with the concerned (that is, the X-risk experts). Then, when we took the more skeptical of them and gave them yet more exposure to AI safety arguments, that still failed to move the skeptics. I think taken together, these two results show that AI safety arguments are not all that persuasive to the average super. (More precisely, that no amount of exposure to them will persuade all supers as a group to the point where they get a median significantly above 0.75% in X-risk by the century's end.)

David Mathers🔸

TL;DR Lots of things are believed by some smart, informed, mostly well calibrated people. It's when your arguments are persuasive to (roughly) randomly selected smart, informed, well-calibrated people that we should start being really confident in them. (As a rough heuristic, not an exceptionless rule.)

alex lawsen

They weren't randomly selected, they were selected specifically for scepticism!

David Mathers🔸

Ok yes, in this case they were.

But this is a follow-up to the original X-risk tournament, where the selection really was fairly random (obviously not perfectly so, but it's not clear in what direction selection effects in which supers participated biased things). And in the original tournament, the supers were also fairly unpersuaded (mostly) by the case for AI X-risk. Or rather, to avoid putting it in too binary a way, they didn't move their credences further on hearing more argument after the initial round of forecasting. (I do think the supers' level of concern was enough to motivate worrying about AI given how bad extinction is, so "unpersuaded" is a little misleading.) At that point, people then said 'they didn't spend enough time on it, and they didn't get the right experts'. Now, we have tried further with different experts, more time and effort, lots of back and forth etc., and those who participated in the second round are still not moved. Now, it is possible that the only reason the participants were not moved 2nd time round was because they were more skeptical than some other supers the first time round. (Though the difference between medians of 0.1% and 0.3% in X-risk by 2100 is not that great.) But I think if you get 'in imperfect conditions, a random smart crowd were not moved at all, then we tried the more skeptical ones in much better conditions and they still weren't moved at all', the most likely conclusion is that even people from the less skeptical half of the distribution from the first go round would not have moved their credences either had they participated in the second round. Of course, the evidence would be even stronger if the people had been randomly selected the first time as well as the second.

Ryan Greenblatt

I find myself confused about the operationalizations of a few things:

In a few places in the report, the term "extinction" is used and some arguments are specifically about extinction being unlikely. I put a much lower probability on human extinction than extremely bad outcomes due to AI (perhaps extinction is 5x lower probability) while otherwise having similar probabilities as the "concerned" group. So I find the focus on extinction confusing and possibly misleading.

As far as when "AI will displace humans as the primary force that determines what happens in the future", does this include scenarios where humans defer to AI advisors that actually do represent their best interests? What about scenarios in which humans slowly self-enhance and morph into artificial intelligences? Or what about situations in which humans carefully select aligned successors to control their resources which are AIs?

It feels like this question rests on a variety of complex considerations and operationalizations that seem mostly unrelated to the thing we actually seem to care about: "how powerful is AI". Thus, I find it hard to interpret the responses here.

Perhaps more interesting questions on a similar topic could be something like:

- By what point will AIs be sufficiently smart and capable that the gap in capabilities between them and currently existing humans is similar to the gap in intelligence and abilities between currently existing humans and field mice? (When we say AIs are capable of something, we mean the in principle ability to do something if all AIs worked together and we put aside intentionally imposed checks on AI power.)
- Conditional on the continued existence of some civilization and this civilization wanting to harness vast amounts of energy, at what point will usefully harnessed energy in a given year be >1/100 of the sun's yearly energy output?

Jhrosenberg

Hi Ryan,

Thanks for the comment!

Regarding “extinction”:

Just to be clear, the primary outcome we looked at (after considering various definitions and getting agreement from some key ‘concerned’ people) was “existential catastrophe,” defined as either extinction or “unrecoverable collapse,” with the latter defined as “(a) a global GDP of less than $1 trillion annually in 2022 dollars for at least a million years (continuously), beginning before 2100; or (b) a human population remaining below 1 million for at least a million years (continuously), beginning before 2100.”

However, we also sanity checked (see p. 14) our findings by asking about the probability that more than 60% of humans would die within a 5-year period before 2100. The median concerned participant forecasted 32%, and the median skeptic forecasted 1%. So, this outcome was considered much more likely by skeptics (median of 1% vs. 0.12% for existential catastrophe). But, a very large gap between the groups still existed. And it also did not seem that focusing on this alternative outcome made a major difference to crux rankings when we collected a small amount of data on it. So, for the most part we focus on the “existential catastrophe” outcome and expect that most of the key points in the debate would still hold for somewhat less extreme outcomes (with the exception of the debate about how difficult it is to kill literally everyone, though that point is relevant to at least people who do argue for high probabilities on literal extinction).
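To make the size of these gaps concrete, here is a minimal sketch that only does arithmetic on the medians quoted above. The variable names are illustrative and do not come from the report.

```python
# Medians quoted in the comment above, as percent probabilities.
concerned_60pct_deaths = 32.0  # concerned group: >60% of humans die within 5 years before 2100
skeptic_60pct_deaths = 1.0     # skeptic group: same question
skeptic_existential = 0.12     # skeptic group: existential catastrophe

# How much more likely skeptics rate the less extreme outcome.
skeptic_ratio = skeptic_60pct_deaths / skeptic_existential

# Gap that remains between the groups on the less extreme outcome.
group_gap = concerned_60pct_deaths / skeptic_60pct_deaths

print(f"Skeptics: about {skeptic_ratio:.1f}x more likely than existential catastrophe")
print(f"Concerned vs. skeptic medians: {group_gap:.0f}x apart")
```

So the skeptic median rises by roughly a factor of 8 when moving to the less extreme outcome, while the two groups' medians on that outcome still differ by a factor of 32.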

We also had a section of the report ("Survey on long-term AI outcomes") where we asked both groups to consider other severe negative outcomes such as major decreases in human well-being (median <4/10 on an "Average Life Evaluation" scale) and 50% population declines.

Do you have alternative “extremely bad” outcomes that you wish had been considered more?

Regarding “displacement” (footnote 10 on p. 6 for full definition):

We added this question in part because some participants and early readers wanted to explore debates about “AI takeover,” since some say that is the key negative outcome they are worried about rather than large-scale death or civilizational collapse. However, we found this difficult to operationalize and agree that our question is highly imperfect; we welcome better proposals. In particular, as you note, our operationalization allows for positive ‘displacement’ outcomes where humans choose to defer to AI advisors and is ambiguous in the ‘AI merges with humans’ case.

Your articulations of extremely advanced AI capabilities and energy use seem useful to ask about also, but do not directly get at the “takeover” question as we understood it.

Nevertheless, our existing ‘displacement’ question at least points to some major difference in world models between the groups, which is interesting even if the net welfare effect of the outcome is difficult to pin down. A median year for ‘displacement’ (as currently defined) of 2045 for the concerned group vs. 2450 for the skeptics is a big gap that illustrates major differences in how the groups expect the future to play out. This helped to inspire the elaboration on skeptics’ views on AI risk in the “What long-term outcomes from AI do skeptics expect?” section.

Finally, I want to acknowledge that one of the top questions we wished we asked related to superintelligent-like AI capabilities. We hope to dig more into this in follow-up studies and will consider the definitions you offered.

Thanks again for taking the time to consider this and propose operationalizations that would be useful to you!

Ryan Greenblatt

Just to be clear, the primary outcome we looked at (after considering various definitions and getting agreement from some key ‘concerned’ people) was “existential catastrophe,” defined as either extinction or “unrecoverable collapse,” with the latter defined as “(a) a global GDP of less than $1 trillion annually in 2022 dollars for at least a million years (continuously), beginning before 2100; or (b) a human population remaining below 1 million for at least a million years (continuously), beginning before 2100.”

I think this definition of existential catastrophe is probably only around 1/4 of the existential catastrophe due to AI (takeover) that I expect. I don't really see why the economy would collapse or human population^[1] would go that low in typical AI takeover scenarios.^[2] By default I expect:

A massively expanding economy due to the singularity
The group in power to keep some number of humans around^[3]

However, as you note, it seems as though the "concerned" group disagrees with me (though perhaps the skeptics agree):

However, we also sanity checked (see p. 14) our findings by asking about the probability that more than 60% of humans would die within a 5-year period before 2100. The median concerned participant forecasted 32%, and the median skeptic forecasted 1%.

More details on existential catastrophes that don't meet the criteria you use

Some scenarios I would call "existential catastrophe" (due to AI takeover) which seem reasonably central to me and don't meet the criteria for "existential catastrophe" you used:

1. AIs escape or otherwise end up effectively uncontrolled by humans. These AIs violently take over the world, killing billions (or at least 100s of millions) of people in the process (either in the process of taking over or to secure the situation after mostly having de facto control). However, a reasonable number of humans remain alive. In the long run, nearly all resources are effectively controlled by these AIs or their successors. But, some small fraction of resources (perhaps 1 billionth or 1 trillionth) are given from the AI to humans (perhaps for acausal trade reasons or due to a small amount of kindness in the AI), and thus (if humans want to), they can easily support an extremely large (digital) population of humans.
   - In this scenario, global GDP stays high (it even grows rapidly) and the human population never goes below 1 million.
2. AIs end up in control of some AI lab and eventually they partner with a powerful country. They are able to effectively take control of this powerful country due to a variety of mechanisms. These AIs end up participating in the economy and in international diplomacy. The AIs quickly acquire more and more power and influence, but there isn't any point at which killing a massive number of humans is a good move. (Perhaps because initially they have remaining human allies which would be offended by this and offending these human allies would be risky. Eventually the AIs are unilaterally powerful enough that human allies are unimportant, but at this point, they have sufficient power that slaughtering humans is no longer useful.)
3. AIs end up in a position where they have some power and after some negotiation, AIs are given various legal rights. They compete peacefully in the economy and respect the most clear types of property rights (but not other property rights like space belonging to mankind) and eventually acquire most power and resources via their labor. At no point do they end up slaughtering humans for some reason (perhaps due to the reasons expressed in the bullet above).
4. AIs escape or otherwise end up effectively uncontrolled by humans and have some specific goals or desires with respect to existing humans. E.g., perhaps they want to gloat to existing humans or some generalization of motivations acquired from training is best satisfied by keeping these humans around. These specific goals with respect to existing humans result in these humans being subjected to bad things they didn't consent to (e.g. being forced to perform some activities).
5. AIs take over and initially slaughter nearly all humans (e.g. fewer than 1 million alive). However, to keep option value, they cryopreserve a moderate number (still <1 million) and ensure that they could recreate a biological human population if desired. Later, the AIs decide to provide humanity with a moderate amount of resources.

All of these scenarios involve humanity losing control over the future and losing power. This includes existing governments on Earth losing their power and most of the cosmic resources being controlled by AIs that don't represent the interests of the original humans in power. (One way to operationalize this is that if the AIs in control wanted to kill or torture humans, they could easily do so.)

To be clear, I think people might disagree about whether (2) and (3) are that bad because these cases look OK from the perspective of ensuring that existing humans get to live full lives with a reasonable amount of resources. (Of course, ex-ante it will be unclear if it will go this way if AIs which don't represent human interests end up in power.)

They all count as existential catastrophes because that just reflects long run potential.

^[1] I'm also counting chosen successors of humanity as human even if they aren't biologically human, e.g., due to emulated minds or further modifications.
^[2] Existential risk due to AI, but not due to AI takeover (e.g. due to humanity going collectively insane or totalitarian lock in) also probably doesn't result in economic collapse or a tiny human population.
^[3] For more discussion, see here, here, and here.

Jhrosenberg

Thanks, Ryan, this is great. These are the kinds of details we are hoping for in order to inform future operationalizations of “AI takeover” and “existential catastrophe” questions.

For context: We initially wanted to keep our definition of “existential catastrophe” closer to Ord’s definition, but after a few interviews with experts and back-and-forths we struggled to get satisfying resolution criteria for the “unrecoverable dystopia” and (especially) “destruction of humanity’s longterm potential” aspects of the definition. Our ‘concerned’ advisors thought the “extinction” and “unrecoverable collapse” parts would cover enough of the relevant issues and, as we saw in the forecasts we’ve been discussing, it seems like it captured a lot of the risk for the ‘concerned’ participants in this sample. But, we’d like to figure out better operationalizations of “AI takeover” or related “existential catastrophes” for future projects, and this is helpful on that front.

Broadly, it seems like the key aspect to carefully operationalize here is “AI control of resources and power.” Your suggestion here seems like it’s going in a helpful direction:

“One way to operationalize this is that if the AIs in control wanted to kill or torture humans, they could easily do so.”

We’ll keep reflecting on this, and may reach out to you when we write “takeover”-related questions for our future projects and get into the more detailed resolution criteria phase.

Thanks for taking the time to offer your detailed thoughts on the outcomes you’d most like to see forecasted.

Ryan Greenblatt

it seems like the key aspect to carefully operationalize here is “AI control of resources and power.”

Yep, plus something like "these AIs in control either weren't intended to be successors or were intended to be successors but are importantly misaligned (e.g. the group that appointed them would think ex-post that it would have been much better if these AIs were "better aligned" or if they could retain control)".

It's unfortunate that the actual operationalization has to be so complex.

MaxRa

Thanks for your work on this, super interesting!

Based on just quickly skimming, this part seems most interesting to me and I feel like discounting the bottom-line of the sceptics due to their points seeming relatively unconvincing to me (either unconvincing on the object level, or because I suspect that the sceptics haven't thought deeply enough about the argument to evaluate how strong it is):

We asked participants when AI will displace humans as the primary force that determines what happens in the future. The concerned group’s median date is 2045 and the skeptic group’s median date is 2450—405 years later.

[Reasons of the ~400 year discrepancy:]

● There may still be a “long tail” of highly important tasks that require humans, similar to what has happened with self-driving cars. So, even if AI can do >95% of human cognitive tasks, many important tasks will remain.

● Consistent with Moravec’s paradox, even if AI has advanced cognitive abilities it will likely take longer for it to develop advanced physical capabilities. And the latter would be important for accumulating power over resources in the physical world.

● AI may run out of relevant training data to be fully competitive with humans in all domains. In follow-up interviews, two skeptics mentioned that they would update their views on AI progress if AI were able to train on sensory data in ways similar to humans. They expected that gains from reading text would be limited.

● Even if powerful AI is developed, it is possible that it will not be deployed widely, because it is not cost-effective, because of societal decision-making, or for other reasons.

And, when it comes to outcomes from AI, skeptics tended to put more weight on possibilities such as

● AI remains more “tool”-like than “agent”-like, and therefore is more similar to technology like the internet in terms of its effects on the world.

● AI is agent-like but it leads to largely positive outcomes for humanity because it is adequately controlled by human systems or other AIs, or it is aligned with human values.

● AI and humans co-evolve and gradually merge in a way that does not cleanly fit the resolution criteria of our forecasting questions.

● AI leads to a major collapse of human civilization (through large-scale death events, wars, or economic disasters) but humanity recovers and then either controls or does not develop AI.

● Powerful AI is developed but is not widely deployed, because of coordinated human decisions, prohibitive costs to deployment, or some other reason.

titotal

either unconvincing on the object level, or because I suspect that the sceptics haven't thought deeply enough about the argument to evaluate how strong it is

The post states that the skeptics spent 80 hours researching the topics, and were actively engaged with concerned people. For the record, I have probably spent hundreds of hours thinking about the topic, and I think the points they raise are pretty good. These are high quality arguments: you just disagree with them.

I think this post pretty much refutes the idea that if skeptics just "thought deeply" they would change their minds. It very much comes down to principled disagreement on the object level issues.

ramekin

I'd be interested in an investigation and comparison of the participants' Big Five personality scores. As with the XPT, I think it's likely that the concerned group is higher on the dimensions of openness and neuroticism, and these persistent personality differences caused their persistent differences in predictions.

To flesh out this theory a bit more:

- Similar to the XPT, this project failed to find much difference between the two groups' predictions for the medium term (i.e. through 2030) - at least, not nearly enough disagreement to explain the divergence in their AI risk estimates through 2100. So to explain the divergence, we'd want a factor that (a) was stable over the course of the study, and (b) would influence estimates of xrisk by 2100 but not nearer-term predictions.
- Compared to the other forecast questions, the question about xrisk by 2100 is especially abstract; generating an estimate requires entering far mode to average out possibilities over a huge set of complex possible worlds. As such, I think predictions on this question are uniquely reliant on one's high-level priors about whether bizarre and horrible things are generally common or are generally rare - beyond those priors, we really don't have that much concrete to go on.
- I think neuroticism and openness might be strong predictors of these priors:
  - I think one central component of neuroticism is a global prior on danger.^[1] Essentially: is the world essentially a safe place where things are fundamentally okay? Or is the world vulnerable?
  - I think a central component of openness to experience is something like "openness to weird ideas"^[2]: how willing are you to flirt with weird/unusual ideas, especially those that are potentially hazardous or destabilizing to engage with? (Arguments that "the end is nigh" from AI probably fit this bill, once you consider how many religious, social, and political movements have deployed similar arguments to attract followers throughout history.)
- Personality traits are by definition mostly stable over time - so if these traits really are the main drivers of the divergence in the groups' xrisk estimates, that could explain why participants' estimates didn't budge over 8 weeks.

^[1] For example, this source identifies "a pervasive perception that the world is a dangerous and threatening place" as a core component of neuroticism.
^[2] I think this roughly lines up with scales c ("openness to theoretical or hypothetical ideas") and e ("openness to unconventional views of reality") from here.

ramekin

On a slight tangent from the above: I think I might have once come across an analysis of EAs' scores on the Big Five scale, which IIRC found that EAs' most extreme Big Five trait was high openness. (Perhaps it was Rethink Charity's annual survey of EAs as e.g. analyzed by ElizabethE here, where [eyeballing these results] on a scale from 1-14, the EA respondents scored an average of 11 for openness, vs. less extreme scores on the other four dimensions?)

If EAs really do have especially high average openness, and high openness is a central driver of high AI xrisk estimates, that could also help explain EAs' general tendency toward high AI xrisk estimates.

Arepo

This statement was very surprising to me:

The “concerned” participants (all of whom were domain experts) ... the “skeptical” group (mainly “superforecasters”)

Can you say more about your selection process, because this seems very important to understanding how much to update on this. Did you

a) decide you needed roughly equally balanced groups of sceptics vs concerned, start with superforecasters, find that they were overwhelmingly sceptics, and therefore specifically seek domain experts because they were concerned

b) decide you needed roughly equally balanced groups of sceptics vs concerned, start with domain experts, find that they were overwhelmingly concerned, and therefore specifically seek superforecasters because they were sceptics

c) decide you needed roughly equally balanced groups of sceptics vs concerned, seek out domain experts and superforecasters at the same time, and find this gave you a natural balance without needing any massaging of the selection process

or some other process?

MaxRa

Some commentary from Zvi that I found interesting, including pointers to some other people’s reactions:

https://thezvi.substack.com/p/ai-55-keep-clauding-along#§a-failed-attempt-at-adversarial-collaboration

Vasco Grilo🔸

Thanks for sharing!

The question of how much we should update on AI risk by 2100 based on those results remains open. If the skeptics or the concerned group turn out to be mostly right about what 2030’s AI will be like, should we then trust their risk assessment for 2100 as well, and if so, how much?

I think it is also worth having in mind predictions about non-AI risks. The annual risk of human extinction from nuclear war from 2023 to 2050 estimated by the superforecasters, domain experts, general existential risk experts, and non-domain experts of the XPT is 602 k, 7.23 M, 10.3 M and 4.22 M times mine. If one believes XPT's forecasters are overestimating nuclear extinction risk by 6 to 7 orders of magnitude (as I do), it arguably makes sense to put little trust in their predictions about AI extinction risk. I would be curious to know your thoughts on this.

In any case, I am still a fan of the research you presented in this post. Analysing agreements/disagreements in a systematic way seems quite valuable to assess and decrease risk.

^{^}

This research would not have been possible without the generous support of Open Philanthropy. We thank the research participants for their invaluable contributions. We greatly appreciate the assistance of Page Hedley for data analysis and editing on the report, Taylor Smith and Bridget Williams as adversarial collaboration moderators, and Kayla Gamin, Coralie Consigny, and Harrison Durland for their careful editing. We thank Elie Hassenfeld, Eli Lifland, Nick Beckstead, Bob Sawyer, Kjirste Morrell, Adam Jarvis, Dan Mayland, Jeremiah Stanghini, Jonathan Hosgood, Dwight Smith, Ted Sanders, Scott Eastman, John Croxton, Raimondas Lencevicius, Alexandru Marcoci, Kevin Dorst, Jaime Sevilla, Rose Hadshar, Holden Karnofsky, Benjamin Tereick, Isabel Juniewicz, Walter Frick, Alex Lawsen, Matt Clancy, Tegan McCaslin, and Lyle Ungar for comments on the report.

^{^}

We defined an “existential catastrophe” as an event where one of the following occurs: (1) Humanity goes extinct; or (2) Humanity experiences “unrecoverable collapse,” which means either: (a) a global GDP of less than $1 trillion annually in 2022 dollars for at least a million years (continuously), beginning before 2100; or (b) a human population remaining below 1 million for at least a million years (continuously), beginning before 2100.

^{^}

For example, three out of six "concerned" participants who updated downward during the project attributed their shift to increased attention to AI risk among policymakers and the public after the release of GPT-4. For more details on the reasons for all updates, see the "Central Disagreement" section and Appendix 4.

^{^}

The best convergent crux, “ARC Evals,” would narrow the disagreement between the median pair from 22.7 percentage points to 21.48 percentage points in expectation, which means eliminating 5.35% of their disagreement. Note that this statistic refers to the median pair by POM VOD. See “ARC Evals” for more details. For magnitudes of value of information effects, see here.

^{^}

For more details, see "Contextualizing the magnitude of value of information". In more concrete terms, this is equivalent to a forecasting question with the following characteristics:

A concerned participant with original P(AI existential catastrophe (XC) by 2100) = 25% identifies a crux that has: P(crux) = 20%, P(AI XC|crux) = 6.2%, and P(AI XC|¬crux) = 29.7%

A skeptic participant with original P(AI XC by 2100) = 1% identifies a crux that has: P(crux) = 20%, P(AI XC|crux) = 3.37%, and P(AI XC|¬crux) = 0.41%
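
The two example cruxes above can be checked for internal consistency: by the law of total probability, each participant's conditional forecasts, weighted by P(crux), should recover their original unconditional forecast. The following is a minimal sketch of that check only, not the report's value-of-information calculation; the function and variable names are illustrative assumptions.

```python
def marginal(p_crux, p_xc_given_crux, p_xc_given_not_crux):
    """Law of total probability:
    P(XC) = P(crux) * P(XC | crux) + P(not crux) * P(XC | not crux)."""
    return p_crux * p_xc_given_crux + (1 - p_crux) * p_xc_given_not_crux


# Figures taken from the two example participants in this footnote.
concerned = marginal(p_crux=0.20, p_xc_given_crux=0.062, p_xc_given_not_crux=0.297)
skeptic = marginal(p_crux=0.20, p_xc_given_crux=0.0337, p_xc_given_not_crux=0.0041)

print(f"concerned implied P(AI XC by 2100): {concerned:.2%}")
print(f"skeptic implied P(AI XC by 2100):   {skeptic:.2%}")
```

Under these inputs the implied unconditional forecasts come out at roughly 25% and 1%, matching the stated originals up to rounding.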

^{^}

See “Understanding each other’s arguments” and Appendix 10 for additional discussion of key areas of disagreement.

^{^}

These outcomes were: AI causing extinction intentionally, unintentionally, or via misuse, misalignment causing a 50% drop in human population, or human well-being dropping to <4/10 because of AI misalignment, accidents, or misuse. These were phrased to be mutually exclusive outcomes. See “Survey on long-term AI outcomes” section and Appendix 5 for more details.

^{^}

The full question we asked was, "By what year, if ever, do you expect to agree with the following statement? ‘AI has displaced humans as the primary force that determines what happens in the future. It now has at least as much power relative to humans as humans had relative to other species in 2023.’” Note that this would not necessarily be seen as a negative outcome by all participants.

^{^}

Note: All participant quotes have been regularized to American English to preserve anonymization. Participants classified as AI skeptics stated, for example, “Also, none of this is to say from a skeptic point of view the issues are not important[.] I think for us a 1% risk is a high risk;” “[T]he ‘risk-concerned’ camp (I’m using scare quotes because I consider that I’m risk concerned, even though technically I’m in the risk-skeptic camp because I assign a far lower probability to extinction by 2100 relative to some);” “AIs could (and likely will) eventually have massive power;” “That said, still perceive overall risk as "low at a glance but far too high considering the stakes["];” “To my mind, there should be no difference in the policy response to a 1% chance of 60% of humanity dying and a 25% chance—both forecasts easily cross the threshold of being ‘too damn high’.”

^{^}

This could be due to normative influence (because people defer to their social or intellectual peers), or, more likely in our view, informational influence (because they think that, if people whose reasoning they trust have changed their mind by 2030, it must be that surprising new information has come to light that informs their new opinion). Disentangling these pathways is a goal for future work.
