State Space of X-Risk Trajectories

David_Kristoffersson; JustinShovelain

Comments 7


Siebe

I think this article very nicely undercuts the following common-sense research ethic:

If your research advances the field more towards a positive outcome than it moves the field towards a negative outcome, then your research is net-positive

Whether research is net-positive depends on the field's current position relative to both outcomes (assuming that once either outcome is reached, the other can no longer be reached). The article replaces the heuristic above with another:

To make a net-positive impact with research, move the field toward the positive outcome, relative to the negative outcome, by a ratio at least as large as distance-to-positive : distance-to-negative.

If we add uncertainty to the mix, we could calculate how risk-averse we should be (where risk aversion should be larger when the research step is larger, as small projects probably carry much less risk of accidentally making a big step towards FAI).

The ratio and risk aversion could lead to some semi-concrete technology policy. For example, if the distances to FAI and UAI are (100, 10), technology policy could prevent funding any project that either has a distance-ratio (for lack of a better term) lower than 10 or has a 1% or higher probability of taking a 10d step towards UAI.
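The funding rule in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything proposed in the article: the `Project` fields, the function names, and the idea that each project comes with an estimated step towards FAI, an estimated step towards UAI, and a probability of a large step towards UAI are all assumptions made for the sketch. The thresholds (a required ratio of 100 / 10 = 10 and a 1% probability cap) are taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical summary of a research project (all fields are assumed estimates)."""
    step_to_fai: float        # expected progress towards FAI, in distance units
    step_to_uai: float        # expected progress towards UAI, in distance units
    p_large_uai_step: float   # estimated probability of a large step towards UAI


def required_ratio(dist_to_fai: float, dist_to_uai: float) -> float:
    """Ratio the field must match: distance-to-positive : distance-to-negative."""
    return dist_to_fai / dist_to_uai


def distance_ratio(project: Project) -> float:
    """Progress towards FAI per unit of progress towards UAI."""
    if project.step_to_uai <= 0:
        return float("inf")  # no movement towards UAI at all
    return project.step_to_fai / project.step_to_uai


def is_fundable(project: Project,
                dist_to_fai: float = 100.0,
                dist_to_uai: float = 10.0,
                max_p_large_step: float = 0.01) -> bool:
    """Apply both conditions from the example policy."""
    ratio_ok = distance_ratio(project) >= required_ratio(dist_to_fai, dist_to_uai)
    risk_ok = project.p_large_uai_step < max_p_large_step  # 1% or higher is blocked
    return ratio_ok and risk_ok
```

Under these assumptions, a project that moves 5 units towards FAI and 0.2 units towards UAI has a distance-ratio of 25 and passes the first condition, while one that moves 5 units towards FAI and 1 unit towards UAI has a ratio of 5 and does not.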

Of course, the real issue is whether such a policy can be plausibly and cost-effectively enforced or not, especially given that there is competition with other regulatory areas (China/US/EU).

Without policy, the concepts can still be used for self-assessment. And when a researcher/inventor/sponsor assesses the risk-benefit profile of a technology themselves, they should discount for their own bias as well, because they are likely to have an overly optimistic view of their own project.

MichaelA🔸

Good points.

Also, this comment reminded me of somewhat similar arguments in this older post by Justin (and Ozzie Gooen).

adamShimi

The geometric intuition underlying this post already proves useful for me!

Yesterday, while discussing with a friend why I want to change my research topic to AI Safety instead of what I currently do (distributed computing), my first intuition was that AI safety aims at shaping the future, while distributed computing is relatively agnostic about it. But a far better intuition comes when considering the vector along the current trajectory in state space, starting at the current position of the world, and whose direction and length capture the trajectory and the speed at which we follow it.

From this perspective, the difference between distributed computing/hardware/cloud computing research and AI safety research is obvious in terms of vector operations:

The former amounts to positive scaling of the vector, and thus makes us go along our current trajectory faster.
The latter amounts to rotation (and maybe scaling, though that is less relevant), which allows us to change our trajectory.

And since I am not sure we are heading in the right direction, I prefer to be able to change the trajectory (at least potentially).
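The two vector operations can be made concrete with a small Python sketch. It assumes, purely for illustration, that the world's trajectory is a 2D vector `(x, y)` in the state space; the function names and the sample numbers are hypothetical and not from the post.

```python
import math


def scale(v, factor):
    """Positive scaling: same direction, different speed."""
    return (v[0] * factor, v[1] * factor)


def rotate(v, angle_rad):
    """Rotation: same speed, different direction."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def speed(v):
    """Length of the vector, i.e. how fast we move along the trajectory."""
    return math.hypot(v[0], v[1])


def heading(v):
    """Direction of the vector, as an angle in radians."""
    return math.atan2(v[1], v[0])


current = (3.0, 4.0)                     # illustrative trajectory vector
faster = scale(current, 2.0)             # general technological progress
turned = rotate(current, math.pi / 2)    # research that changes direction
```

Here `faster` has twice the speed of `current` and the same heading, while `turned` has the same speed and a heading shifted by a quarter turn, which is the distinction the comment draws between the two kinds of research.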

David_Kristoffersson

Happy to see you found it useful, Adam! Yes, general technological development corresponding to scaling of the vector is exactly the kind of intuition it's meant to carry.

adamShimi

As a tool for existential risk research, I feel like the graphical representation will indeed be useful in crystallizing the differences in hypotheses between researchers. It might even serve as a self-assessment tool, for quickly checking some of the consequences of one's own view.

But beyond the trajectories (and maybe specific distances), are you planning on representing the other elements you mention? Like the uncertainty or the speed along trajectories? I feel like the more details about an approach can be integrated into a simple graphical representation, the more this tool will serve to disentangle disagreement between researchers.

David_Kristoffersson

But beyond the trajectories (and maybe specific distances), are you planning on representing the other elements you mention? Like the uncertainty or the speed along trajectories?

Thanks for your comment. Yes; the other elements, like uncertainty, would definitely be part of further work on the trajectories model.

nathanhb

For what it's worth, my model of a path to safe AI looks like a narrow winding path along a ridge with deadly falls to either side.

Unfortunately, the deadly falls to either side have illusions projected onto them of shortcuts to power, wealth, and utility. I don't think there is any path to safety that avoids a long stretch with immediate danger nearby. In this model, deliberately and consistently optimizing for safety above all else during the dangerous stretch is the only way to make it through.

The danger zone is where the model is sufficiently powerful and agentic that a greedy, shortsighted person could say to it, "Here is access to the internet. Make me lots of money." and this would result in a large stream of money pouring into their account. I think we're only a few years away from that point, and that the actions that safety researchers take in the meantime aren't going to change that. So, we need both safety research and governance, and carefully selecting disproportionately safety-accelerating research would be entirely irrelevant to the strategic landscape.

This is just my view, and I may be wrong, but I think it's worth pointing out that there's a chance that the idea of trying to do disproportionately safety-accelerating research is a distraction from strategically relevant action.

How does the state space of x-risk trajectories model compare to the trajectory visualizations in [9]? The axes are almost completely different. Their trajectory graphs have an axis for time; the state space doesn’t. Their graphs have an axis for population size; the state space doesn’t. In the state space, each axis represents a stably bad or a stably good future. Though, in the visualizations in [9], hitting the x-axis represents extinction, which maps somewhat to hitting the axis of a stably bad future in the trajectories model. The visualizations in [9] illustrate valuable ideas but they seem to be less about choices or interventions than the state space model is.
