Red teaming a model for estimating the value of longtermist interventions - A critique of Tarsney's "The Epistemic Challenge to Longtermism"

AF; Bryce Woodworth; Chris Lonsberry

^{^}

See "Apply for Red Team Challenge [May 7 - June 4]" for the post introducing the challenge.

^{^}

Another aspect of persistence that this paper does not seem to explore is the possibility that the impact of the intervention could fade out on its own over time. This matters because the methods Tarsney uses to estimate r (the rate of exogenous nullifying events, or ENEs) do not seem to account for such fade-out effects.
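A minimal sketch of why this matters, assuming (as we read Tarsney's model) that ENEs arrive at a constant rate r, so the intervention's effect survives t years with probability e^(-rt). The fade-out rate `f` is our own hypothetical addition and is not a parameter of Tarsney's model; treating fade-out as exponential is likewise an assumption made only for illustration.

```python
import math

def surviving_effect(t_years: float, r: float, f: float = 0.0) -> float:
    """Fraction of the intervention's effect still present after t_years.

    r: rate of ENEs per year (constant-rate arrivals, as we read Tarsney's setup).
    f: hypothetical fade-out rate per year; NOT a parameter of Tarsney's model.
    Two independent exponential decays multiply, so their rates add.
    """
    return math.exp(-(r + f) * t_years)

def expected_persistence_years(r: float, f: float = 0.0) -> float:
    """Mean lifetime of the effect under the combined exponential decay."""
    return 1.0 / (r + f)

# Hypothetical rates, chosen only for illustration.
R_ENE = 1e-4   # ENEs per year
F_FADE = 1e-2  # fade-out per year, e.g. lessons forgotten as leaders turn over

for label, fade in [("ENEs only", 0.0), ("ENEs + fade-out", F_FADE)]:
    print(f"{label}: P(effect survives 1,000 years) = "
          f"{surviving_effect(1000, R_ENE, fade):.3g}, "
          f"mean persistence = {expected_persistence_years(R_ENE, fade):,.0f} years")
```

Because the rates simply add, fade-out behaves like a larger r; the footnote's concern is that an estimate of r built only from ENE-like events would miss the `f` term entirely.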

^{^}

To be clear, although we use the name Larry Longtermist and the example of working on x-risk reduction, one does not have to hold longtermist views to prioritize existential risk reduction.

^{^}

To illustrate fade-out (see footnote above) with the same example: perhaps Larry's work to reduce nuclear risk tends to fade away over time. Imagine he did a lot of work to educate leaders. As a result, that particular generation was especially good at de-escalating conflicts. But as those leaders retire or are replaced, the new leadership has effectively "forgotten" the lessons Larry taught. To maintain ongoing vigilance within the leadership, he would have to repeat his educational work every few years. This feels qualitatively different from ENEs as Tarsney describes them. [What type of skepticism is this? Epistemic persistence]

^{^}

Chris: By maximum predictability horizon, I mean the point at which forecasters no longer do better than chance. For the purposes of the model analyzed here, we could just as easily say 100 years, because the key timescale is the 1,000-year period between the intervention and the beginning of the long-term future.

Further, to avoid stretching the evidence on forecasting too far, note that the forecasting literature focuses on events and world-states of a different nature from those we are dealing with here. Predicting whether human civilization will still exist in 2122 is very different (and not just in timescale) from forecasting the price of a given commodity in a decade or whether an armed conflict will take place in a given region in the next ten years.
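One way to operationalize "no longer do better than chance", offered as a sketch rather than the authors' method: score forecasts at each horizon with the Brier score (mean squared error between probability forecasts and 0/1 outcomes) and compare against an uninformed forecaster who always predicts the observed base rate. The function names, the input format, and the toy data are our own hypothetical choices.

```python
from typing import Dict, List, Optional, Tuple

def brier(forecasts: List[float], outcomes: List[int]) -> float:
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

def predictability_horizon(
    by_horizon: Dict[int, Tuple[List[float], List[int]]]
) -> Optional[int]:
    """Shortest horizon (years) at which forecasts are no better than always
    predicting the base rate, or None if they beat that baseline everywhere."""
    for horizon in sorted(by_horizon):
        forecasts, outcomes = by_horizon[horizon]
        base_rate = sum(outcomes) / len(outcomes)
        chance = brier([base_rate] * len(outcomes), outcomes)
        if brier(forecasts, outcomes) >= chance:
            return horizon
    return None

# Toy, made-up forecasts keyed by horizon in years.
data = {
    1: ([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0]),
    10: ([0.4, 0.6, 0.5, 0.5], [1, 0, 0, 1]),
}
print(predictability_horizon(data))  # prints 10
```

Real forecasting datasets would need many more questions per horizon before such a comparison meant anything; the sketch only pins down the definition.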

^{^}

Chris: That is, if successful predictable influence is always contingent on successful prediction, then the chance of predictable influence cannot, by definition, exceed the chance of prediction. My epistemic status on this point is exploratory. In particular, I have trouble envisioning how my chances of predictable influence could ever be better than my chances of successful prediction. Since my objective is to select the intervention with the maximum expected value, the influence must be both predictable and attributable to my intervention.
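The inequality behind this claim, in our own notation: let I be the event of achieving predictable influence and F the event of successfully predicting the relevant outcome. If I can only occur together with F, then

$$
P(I) = P(I \cap F) = P(I \mid F)\,P(F) \le P(F),
$$

with equality only when every successful prediction also yields successful influence.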

^{^}

Chris: This is a more straightforward and practical application of Bayesian reasoning than the work of assigning credibility bounds to various futures that may come to pass in thousands or millions of years.

^{^}

That is, it is difficult to imagine a world in which 100% of human resources go toward a specific intervention, for the basic reason that maintaining ourselves and our society consumes a considerable share of total resources.

^{^}

One can imagine such scenarios. There have allegedly been outbreaks of deadly disease caused by agents escaping from a lab whose mission is to save humans from those very agents. Alternatively, in advertising the risk to humanity of bioengineered threats (or of advanced AI, or military use of drone swarms), one could inadvertently alert a malicious actor to the possibility.

^{^}

Technically, V_s represents the difference in expected value per star per unit time between worlds where we are in S and worlds in the complement state NS.

^{^}

More precisely, $\min \mathrm{EV}(L) = X\% \cdot \mathrm{EV}(L \mid \text{favorable values, point estimates for others}) + (100 - X)\% \cdot \mathrm{EV}(L \mid \text{conservative steady-state model in 4.2})$.

^{^}

Orders of magnitude. Mathematically: $\mathrm{EV}(L) > 10^{11} \cdot \mathrm{EV}(N)$.

^{^}

This assumption ensures that the calculated minimum EV is in fact a minimum. That minimum is the sum of two appropriately weighted terms: the expected value of L when V_s > 10²⁰ (V/yr)/star, and the expected value of L when V_s = 0, i.e., under the conservative steady-state scenario in which we remain on Earth indefinitely. This sum is strictly less than the true expected value only if the expected value of V_s, conditional on V_s < 10²⁰ (V/yr)/star, is positive.
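A toy numerical check of this argument, assuming that EV(L) is linear in V_s with a positive coefficient. The linear form, the coefficients `a` and `b`, and every number below are hypothetical placeholders, not values from Tarsney's paper. Replacing the below-threshold branch with V_s = 0 lowers the total exactly when the conditional mean of V_s in that branch is positive.

```python
from fractions import Fraction

def ev_of_l(v_s: Fraction, v_e: Fraction, a: Fraction, b: Fraction) -> Fraction:
    """Hypothetical linear stand-in for EV(L): a * V_e + b * V_s, with b > 0."""
    return a * v_e + b * v_s

def true_ev(x: Fraction, mean_vs_high: Fraction, mean_vs_low: Fraction,
            v_e: Fraction, a: Fraction, b: Fraction) -> Fraction:
    """Mixture over the two branches; linearity lets us plug in conditional means of V_s."""
    return x * ev_of_l(mean_vs_high, v_e, a, b) + (1 - x) * ev_of_l(mean_vs_low, v_e, a, b)

def min_ev(x: Fraction, mean_vs_high: Fraction,
           v_e: Fraction, a: Fraction, b: Fraction) -> Fraction:
    """The footnote's 'minimum': the below-threshold branch is replaced by V_s = 0."""
    return x * ev_of_l(mean_vs_high, v_e, a, b) + (1 - x) * ev_of_l(Fraction(0), v_e, a, b)

# Hypothetical placeholders (arbitrary units); x plays the role of X%.
x, hi, v_e = Fraction(1, 10), Fraction(10**21), Fraction(10**6)
a, b = Fraction(1), Fraction(1, 10**9)

for mean_low in (Fraction(10**5), Fraction(0), Fraction(-10**5)):
    bound_holds = min_ev(x, hi, v_e, a, b) <= true_ev(x, hi, mean_low, v_e, a, b)
    print(f"E[V_s | V_s < threshold] = {mean_low}: minimum is a lower bound? {bound_holds}")
```

Exact fractions keep the comparison free of rounding error despite the very different magnitudes; the gap between the two quantities is (1 - x) * b * E[V_s | V_s < threshold].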

^{^}

This refers to a scenario in which we remain a physical species and live on Earth.

^{^}

Another scenario is the spread of uncontrolled ecosystems to other planets in which wild animal suffering outweighs other flourishing. It also seems possible for humanity to spread into the universe but have net-negative lives on average due to societal reasons like totalitarian regimes, widespread poverty, etc.

^{^}

Of course, this is only one of several arguments that could be made for why Tarsney’s assumption is unjustified.

^{^}

It is possible for the conditional EV of V_s to be negative while the EV of L in the non-optimistic scenarios is still positive, if the steady-state EV (we remain on Earth indefinitely) outweighs the negative EV of cubic space colonization (i.e., the V_e contribution outweighs the V_s contribution).
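A rough illustration of how a constant Earth-bound value can outweigh a negative, cubically growing settlement term. This is a deliberately simplified toy, not Tarsney's full model: we assume a fixed stellar density, a settled sphere whose radius grows at constant speed from t = 0, discounting by e^(-rt) for ENEs, and a finite horizon. All parameter values are hypothetical placeholders.

```python
import math

def toy_ev(v_e: float, v_s: float, r: float, speed: float,
           density: float, horizon: float, steps: int = 20_000) -> float:
    """Midpoint-rule integral of exp(-r t) * (V_e + V_s * stars(t)) over [0, horizon].

    v_e: value per year of the Earth-bound steady state.
    v_s: value per star per year of space settlement (negative in this footnote's case).
    stars(t) = density * (4/3) * pi * (speed * t)**3, a settled sphere whose
    radius grows linearly, so the settlement term grows cubically.
    """
    dt = horizon / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        stars = density * (4.0 / 3.0) * math.pi * (speed * t) ** 3
        total += math.exp(-r * t) * (v_e + v_s * stars) * dt
    return total

# Hypothetical placeholders: speed in light-years/year, density in stars per cubic light-year.
params = dict(r=1e-3, speed=0.1, density=0.004, horizon=1e4)
for v_s in (-1e-6, -1e-4):
    print(f"V_s = {v_s:g}: toy EV = {toy_ev(1.0, v_s, **params):,.1f}")
```

With these placeholders the first total is positive and the second negative: whether V_e outweighs a negative V_s depends on how negative V_s is relative to the number of stars settled within the effective horizon set by r.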

^{^}

Tarsney: "I will call any challenge to longtermism that does not require rejecting expectational utilitarianism an empirical challenge, since it does not rely on normative claims unfavorable to longtermism" (p. 4).

^{^}

Tarsney: "The case for longtermism may depend either on plausible but non-obvious empirical claims or on a tolerance for Pascalian fanaticism" (abstract).

^{^}

Note that the relevant probability is actually the probability that they are telling the truth, minus the probability that they will actually swap the rewards and pay out only if you deny them. It still seems like the probability that they are truthful should be higher, even if only by a very small amount.
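The arithmetic behind this note, as a minimal sketch with hypothetical numbers: complying yields the payoff with probability p_truth, refusing yields it with probability p_swap, so the expected gain from complying is (p_truth - p_swap) times the payoff, minus what you hand over. Exact fractions avoid underflow and overflow; 10**100 is only a stand-in, since Graham's number is far too large to represent.

```python
from fractions import Fraction

def gain_from_complying(p_truth: Fraction, p_swap: Fraction,
                        payoff: int, cost: int) -> Fraction:
    """EV of handing over the wallet minus EV of refusing.

    Complying yields `payoff` with probability p_truth; refusing yields it with
    probability p_swap (the mugger who pays only those who deny them).
    Complying also costs `cost`.
    """
    return (p_truth - p_swap) * payoff - cost

# Hypothetical numbers: honesty is only one-in-10**30 more likely than the swap.
gain = gain_from_complying(Fraction(2, 10**30), Fraction(1, 10**30), 10**100, 10)
print(gain > 0)  # prints True
```

Even a minuscule edge for honesty dominates once the payoff is large enough, which is the point of the note.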

^{^}

One could disagree on the grounds that our credence should decrease superlinearly in the proposed payoff. But conditional on the mugger actually paying out Graham's number of utils, it seems implausible to assign less than a 10% chance to their paying out 10 times Graham's number.

One could further believe that having a decision procedure that leads to paying the mugger would in itself cause enough muggings to be net negative in expectation. If one actually believed that mugging scenarios could provide more EV than the entire future of humanity would otherwise generate, then this view is implausible as well.
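To make "superlinearly" concrete, here is a sketch with our own hypothetical parameterization: suppose credence that a promised payoff U is paid scales as c / U^k. The expected value is then c·U^(1-k), which shrinks as U grows only when k > 1. Keeping at least 10% of the credence when the promise grows tenfold means 10^(-k) ≥ 1/10, i.e. k ≤ 1, under which the expected value does not shrink. Integer k keeps the arithmetic exact.

```python
from fractions import Fraction

def credence(payoff: int, c: Fraction, k: int) -> Fraction:
    """Hypothetical credence that a promised payoff is paid: c / payoff**k."""
    return c / Fraction(payoff) ** k

def expected_value(payoff: int, c: Fraction, k: int) -> Fraction:
    """Credence times payoff, i.e. c * payoff**(1 - k)."""
    return credence(payoff, c, k) * payoff

c = Fraction(1, 10**6)
for k in (1, 2):  # k = 1: credence falls linearly; k = 2: superlinearly
    evs = [expected_value(10**n, c, k) for n in (10, 11, 12)]
    print(f"k = {k}: EVs for payoffs 1e10, 1e11, 1e12 -> {[float(ev) for ev in evs]}")
```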

^{^}

In this example, the probability would have to be higher than the inverse of Graham's number times the EV of the rest of the model. Given the magnitude of Graham's number, we contend that this is effectively 0 probability, and that it would be unreasonably overconfident to assert the non-existence of Operators of the Seventh Dimension with such probability.

^{^}

Tarsney was kind enough to provide some additional comments on this: “My own tentative view here is described in my paper "Exceeding Expectations" -- in short, and very roughly, I think that the only real requirement of ethical decision-making under risk is first-order stochastic dominance; that in virtue of background risk, this requires us to be de facto EV-maximizers, more or less, when the relevant probabilities aren't too small”.
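For readers unfamiliar with the term, a minimal sketch of checking first-order stochastic dominance between two lotteries with finitely many outcomes: A dominates B if, for every threshold, A is at least as likely as B to yield an outcome at least that good, and strictly more likely for some threshold. This illustrates the standard definition only; it does not implement the background-risk argument of "Exceeding Expectations", and the example lotteries are hypothetical.

```python
from fractions import Fraction
from typing import Dict

Lottery = Dict[Fraction, Fraction]  # outcome value -> probability

def prob_at_least(lottery: Lottery, threshold: Fraction) -> Fraction:
    """P(outcome >= threshold)."""
    return sum((p for v, p in lottery.items() if v >= threshold), Fraction(0))

def first_order_dominates(a: Lottery, b: Lottery) -> bool:
    """True if `a` first-order stochastically dominates `b`.

    Checking thresholds at the outcomes of either lottery suffices, because
    P(X >= t) is a step function that only changes at those points.
    """
    thresholds = set(a) | set(b)
    weakly = all(prob_at_least(a, t) >= prob_at_least(b, t) for t in thresholds)
    strictly = any(prob_at_least(a, t) > prob_at_least(b, t) for t in thresholds)
    return weakly and strictly

safe = {Fraction(1): Fraction(1)}
risky = {Fraction(0): Fraction(1, 2), Fraction(3): Fraction(1, 2)}
better = {Fraction(1): Fraction(1, 2), Fraction(3): Fraction(1, 2)}
print(first_order_dominates(better, safe))  # prints True
print(first_order_dominates(risky, safe), first_order_dominates(safe, risky))  # prints False False
```

The safe/risky pair shows that dominance alone is silent between some lotteries, even when one has higher expected value.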

^{^}

Anjay: I personally equate ‘significantly’ with a greater-than-10% chance of extinction, but others likely have different intuitions. This is because if the likelihood is very high, the expected value of reducing it can outweigh that of interventions focused on the near term, even when considering only the people alive today.
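A back-of-the-envelope sketch of this reasoning, with every number hypothetical: if an intervention removes a fixed fraction of extinction risk, the expected number of currently living people it saves scales with the baseline risk. Comparing against a specific near-term intervention would further require costs for both, which are not estimated here.

```python
def expected_current_lives_saved(p_extinction: float,
                                 relative_reduction: float,
                                 population: float) -> float:
    """Expected deaths averted among people alive today.

    Assumes (hypothetically) that extinction kills everyone alive and that the
    intervention cuts the extinction probability by `relative_reduction`.
    """
    absolute_reduction = p_extinction * relative_reduction
    return absolute_reduction * population

POPULATION = 8e9  # roughly today's world population
for p in (0.01, 0.1, 0.3):
    saved = expected_current_lives_saved(p, relative_reduction=1e-6, population=POPULATION)
    print(f"baseline risk {p:.0%}: about {saved:,.0f} expected lives saved today")
```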

^{^}

Anjay: This is based on the claim that a truly misaligned, superintelligent, power-seeking AI would likely be very persistent, so the rate of ENEs that would remove the world from this state would likely be extremely low.

^{^}

Anjay: One example here could be something like the “You get what you measure” scenario from Paul Christiano’s “What Failure Looks Like”, in which alignment efforts can make our measures slightly better and more correlated with what we actually care about, even though the result still falls short of a flourishing future.

Summary (2 mins)

Introduction

A brief introduction to Tarsney's paper and model

Anecdotal illustration

Assumptions

Critiques inside the model

A priori, the skeptical position seems strong

Is forecasting required for predictable influence?

The current form of the model assumes 1,000 years of persistence

Biased incorporation of uncertainty

Summary of section 6.1

An unjustified assumption in the author’s minimum expected value reasoning

Critiques outside the model

Challenging the usage of expectational utilitarianism

The given model implicitly assumes bounded fanaticism

The conclusions require a potentially-contentious degree of fanaticism

Suspicion of negative side-effects in first-order expected value calculations

Relation to AI risk and the longtermist movement

Conclusion