
TL;DR: This post explains why I think we should be more explicit about our values when making impact estimates/value predictions, and it provides some example suggestions for how to do this.

There has been a string of recent posts discussing and predicting characteristics (EV, variance, etc.) of future value (How Binary is Longterm Value?, The Future Might Not Be So Great, Parfit + Singer + Aliens = ?, shameless plug, etc.).

Moreover, estimating the "impact" of interventions is a central theme in this community. It is perhaps the core mission of Effective Altruism. 

Most of the time I see posts discussing impact/value, I don't see a definition of value.[1] What we define value to mean (sometimes called ethics, morality, etc.) is the function that converts material outcomes (the is) into a number for which bigger = better (the ought).
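
To make that definition concrete, here is a minimal sketch (the outcome fields and both value functions below are made-up placeholders, not anyone's actual ethics): value is just a function from a description of the world to a single number, and two different ethics are two different functions.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    """A hugely simplified description of a material outcome (the 'is')."""
    total_wellbeing: float      # e.g. rescaled QALYs summed across all minds
    worst_off_wellbeing: float  # wellbeing of the single worst-off mind

# A value function maps an outcome to a real number (the 'ought'): bigger = better.
def total_view(o: Outcome) -> float:
    """One possible value function: just add everything up."""
    return o.total_wellbeing

def worst_off_weighted(o: Outcome) -> float:
    """A different value function: heavy extra weight on the worst-off."""
    return o.total_wellbeing + 10 * o.worst_off_wellbeing

world = Outcome(total_wellbeing=100.0, worst_off_wellbeing=-5.0)
print(total_view(world))          # 100.0
print(worst_off_weighted(world))  # 50.0
```

The same world gets two very different numbers depending on which function you apply, which is exactly why "how valuable is X?" is underspecified until the function is named.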

If someone makes a post that engages in value estimation and doesn't define value, there seem to be two likely possibilities.

  • Most people engaging with the post will use their own internal notion of value.
  • Most people will engage with the post using what they perceive to be the modal value in the community, so probably total utilitarianism.

I believe these are both sub-optimal outcomes. Most people engaging with these posts are not trying to actively grapple with meta-ethics, so in the first case they might not care to talk through the fact that they have different internal notions of value. More importantly, the ability to identify and isolate cruxes is central to rationality. We should always aim to modularize our discussions, as this clarifies disagreements in the moment and makes the conclusions of the conversation much more plug-and-play in the future. On some questions of impact, it could be that the answer does not depend on the value system we use, but I think this is incredibly unlikely,[2] and in any case we should explicitly come to that conclusion rather than assume it.

In the second case, at least most of us would be on the same page (though of course not everyone would be). Also, it isn't as though total utilitarianism is clearly defined: you still need to give utility a usable definition, you need a weighting rule or map for sentient beings, you need to decide whether there is such a thing as a negative life (and if so, where the line is), etc.[3] So you would still have a lesser version of the above problem. Plus, we would then have created an environment with a de facto ethic, which doesn't seem like a good vibe to me.

Suggestions

Primary suggestion: Write your definition of value in your bio, and if you don't clarify otherwise in a given comment/post, people should default to using that definition of value. I'm not sure there is an easily generalizable blueprint for all ethical systems, but here is an example of what a utilitarian version might look like (not my actual values). Note that this could probably be fleshed out more and/or better, but I don't think that matters for the purposes of this post.

BIO

Ethical Framework: Total Utilitarianism

Definition of Utility: QALYs, but rescaled so that quality of life can dip negative

Weighting function: Number of neurons

Additional Clarifications: I believe this is implicit in my weighting function but I consider future and digital minds to be morally valuable. My definition of a neuron is (....). I would prefer to use my Coherent Extrapolated Volition over my current value system. 

 

Other suggestions I like less:

Suggestion: Define value in your question/comment/post.[4]

Suggestion: Make a certain form of total utilitarianism the de jure meaning of value on the forum when people don't clearly define value or don't set a default value in their bio.[5]

Suggestion: Don't do impact estimates in one go; do output/outcome estimates, then extrapolate to value separately. I.e., ask questions like "How many QALYs will there be in the future?" or "How many human rights will be violated?" etc.
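
As a toy sketch of what that separation could look like (all numbers and both value functions here are placeholders I'm inventing for illustration): the outcome estimates get debated on empirical grounds, and readers plug in their own value function afterwards.

```python
# Step 1: value-neutral outcome estimates (placeholder numbers, not real forecasts).
outcome_estimates = {
    "expected_future_QALYs": 1e12,
    "expected_rights_violations": 1e9,
}

# Step 2: each reader extrapolates with their own value function.
def total_utilitarian(outcomes):
    # Cares only about aggregate QALYs.
    return outcomes["expected_future_QALYs"]

def rights_weighted(outcomes):
    # Hypothetical view that also penalizes rights violations at some exchange rate.
    return outcomes["expected_future_QALYs"] - 100 * outcomes["expected_rights_violations"]

for value_fn in (total_utilitarian, rights_weighted):
    print(value_fn.__name__, value_fn(outcome_estimates))
```

Disagreements about step 1 are empirical; disagreements about step 2 are the ethical crux, and keeping them separate makes it obvious which one a given thread is actually about.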

  1. ^

    Sometimes I will see something like "my ethics are suffering-focused, so this leads me to x instead of y".

  2. ^

    If we think of morality as being an arbitrary map that takes the world as an input and spits out a (real) number, then it is an arbitrary map from $F^n$ or $F^\infty$ to $\mathbb{R}$, where $F$ is some set (technically, the dimensions of the universe are not necessarily comprised of the same sets, so this notation is wrong; plus I don't actually have any idea what I'm talking about). If this is the case, we can basically make the "morality map" do whatever we want. So when asking questions about how the value of the world will end up looking, we can almost certainly create two maps (moralities) that will spit out very different answers for the same world.
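
    To spell out that last claim a bit (this formalization is my own toy gloss): for any fixed world $w$ and any real numbers $a \neq b$, even the constant maps $V_1(x) = a$ and $V_2(x) = b$ already do the job, since $V_1(w) - V_2(w) = a - b$ can be made as large as we like.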

  3. ^

    I understand how high a bar clarifying these things in every post would be, and I don't think we need to be strict about it, but we should keep these things in mind and push towards a world where we are communicating this information.

  4. ^

    This seems laborious.

  5. ^

    We can of course make it explicit that we don't endorse this, and it is just a discussion norm. I would still understand if people feel this opens us up to reputational harms and thus is a bad idea. 
