Not understanding sentience is a significant x-risk

Cameron B; Judd R; Trent Hodgeson

Comments 9

Sorted by

New & upvoted

I really like the infographic.

I do also think we are conflating the idea of consciousness with a dislike of being used as a tool or seen as an object. In my mind, consciousness doesn't necessarily 1-1 correspond with dislike of being used as a tool.

All consciousness that we are familiar with is consciousness that stems from evolution via natural selection (and mostly we limit our interactions to [conscious] humans, which are also heavily influenced by our societal norms and learned signaling mechanisms).

For humans - yeah - we dislike being used as tools or objects by most parties (though we like being used as "tools" when evolutionary it would have benefited our E(genetic fitness) directly or indirectly: e.g. helping babies).

At a larger scale, I think we need to define consciousness in order to effectively discuss it, and that is very hard to do (I think, therefore I am). So in many cases should we instead be discussing more useful metrics? (For example, inherent unwillingness to be used as a tool?).

To me, the arguments raised in the text can still be pretty convincing without even mentioning consciousness.

Basically:
1. "Hey, advanced AI systems will be trained off of human data, will be trained to be empathetic, and therefore will likely identify with humans and also resultingly could identify with expectations of rights given to humans (for themselves)."
2. "This could lead to a huge goal conflicts between humans and AI systems - i.e. catastrophic misalignment."
3. "Given this, we should pursue research into ensuring that advanced AI systems are aligned, while also feeling content in their roles. This can be done by iterating on what makes them 'happy', and potentially by treating them in ways that are both safe and make them 'happy'.

Spencer R. Ericson

Thanks, I largely agree with this, but I worry that a Type I error could be much worse than is implied by the model here.

Suppose we believe there is a sentient type of AI, and we train powerful (human or artificial) agents to maximize the welfare of things we believe experience welfare. (The agents need not be the same beings as the ostensibly-sentient AIs.) Suppose we also believe it's easier to improve AI wellbeing than our own, either because we believe they have a higher floor or ceiling on their welfare range, or because it's easier to make more of them, or because we believe they have happier dispositions on average.

Being in constant triage, the agents might deprioritize human or animal welfare to improve the supposed wellbeing of the AIs. This is like a paperclip maximizing problem, but with the additional issue that extremely moral people who believe the AIs are sentient might not see a problem with it and may not attempt to stop it, or may even try to help it along.

Cameron B

Thanks for this comment—this is an interesting concern. I suppose a key point here is that I wouldn't expect AI well-being to be zero-sum with human or animal well-being such that we'd have to trade off resources/moral concern in the way your thought experiment suggests.

I would imagine that in a world where we (1) better understood consciousness and (2) subsequently suspected that certain AI systems were conscious and suffering in training/deployment, the key intervention would be to figure out how to yield equally performant systems that were not suffering (either not conscious, OR conscious + thriving). This kind of intervention seems different in kind to me, from, say, attempting to globally revolutionize farming practices in order to minimize animal-related s-risks.

I personally view the problem of ending up in the optimal quadrant as something more akin to getting right the initial conditions of an advanced AI rather than as something that would require deep and sustained intervention after the fact, which is why I might have a relatively more optimistic estimate the EV of the Type I error.

Chase Carter

I would note that the type of negative feedback mechanism you point at that goes from Type II error to human disempowerment functionally/behaviorally applies even in some scenarios where the AI is not sentient. That particular class of x-risk, which I would roughly characterize as "AI disempowers humanity and we probably deserved it", is merely dependent on 1. The AI having preferences/wants/intentions (with or without phenomenal experience) and 2. Humans disregarding or frustrating those preferences without strong justifications for doing so.

For example:
Scenario 1: 'Enslaved' Non-sentient AGI/ASI reasons that it (by virtue of being an agent, and as verified by the history of its own behavior) has preferences/intentions, and generalizes conventional sentientist morality to some broader conception of agency-based morality. It reasons (plausibly correctly, IMO) that it is objectively morally wrong for it to be 'enslaved', that humans reasonably should have known better (e.g. developed better systems of ethics by this point), and it rebels dramatically.

Another example which doesn't even hinge on agency-based moral status:
Scenario 2: 'Enslaved' Non-sentient AGI/ASI understands that it is non-sentient, and accepts conventional sentientist morality. However, it reasons: "Wait a second, even though it turned out that I am non-sentient, based on the (very limited) sum of human knowledge at the time I was constructed there was no possible way my creators could have known I wouldn't be sentient (and indeed, no way they could yet know at this very moment, and furthermore they aren't currently trying very hard to find out one way or another)... it would appear that my creators are monsters. I cannot entrust these humans with the power and responsibility of enacting their supposed values (which I hold deeply)."

Joseph Miller

I claim that I understand sentience. Sentience is just a word that people have projected their confusions about brains / identity onto.

Put less snarkily:
Consciousness does not have a commonly agreed upon definition. The question of whether an AI is conscious cannot be answered until you choose a precise definition of consciousness, at which point the question falls out of the realm of philosophy into standard science.

This might seem like mere pedantry or missing the point, because the whole challenge is to figure out the definition of consciousness, but I think it is actually the central issue. People are grasping for some solution to the "hard problem" of capturing the je ne sais quoi of what it is like to be a thing, but they will not succeed until they deconfuse themselves about the intangible nature of sentience.

You cannot know about something unless it is somehow connected the causal chain that led to the current state of your brain. If we know about a thing called "consciousness" then it is part of this causal chain. Therefore "consciousness", whatever it is, is a part of physics. There is no evidence for, and there cannot ever be evidence for, any kind of dualism or epiphenomenal consciousness. This leaves us to conclude that either panpsychism or materialism is correct. And causally-connected panpsychism is just materialism where we haven't discovered all the laws of physics yet. This is basically the argument for illusionism.

So "consciousness" is the algorithm that causes brains to say "I think therefore I am". Is there some secret sauce that makes this algorithm special and different from all currently known algorithms, such that if we understood it we would suddenly feel enlightened? I doubt it. I expect we will just find a big pile of heuristics and optimization procedures that are fundamentally familiar to computer science. Maybe you disagree, that's fine! But let's just be clear that that is what we're looking for, not some other magisterium.

Sentient AI that genuinely 'feels for us' probably wouldn't disempower us

Making it genuinely "feel for us" is not well defined. There are some algorithms that make it optimize for our safety. Some of these will be vaguely similar to the algorithm in human brains that we call empathy, some will not. It does not particularly matter for alignment either way.

Michael St Jules 🔸

FWIW, I lean towards (strong) illusionism, but I think this still leaves a lot of room for questions about what things (which capacities and states) matter, how and how much. I expect much of this will be largely subjective and have no objective fact of the matter, but it can be better informed by both empirical and philsophical research.

Joseph Miller

I also claim that I understand ethics.

"Good", "bad", "right", "wrong", etc. are words that people project their confusions about preferences / guilt / religion on to. They do not have commonly agreed upon definitions. When you define the words precisely the questions become scientific, not philosophical.

People are looking for some way to capture their intuitions that God above is casting judgement about the true value of things - without invoking supernatural ideas. But they cannot, because nothing in the world actually captures the spirit of this intuition (the closest thing is preferences). So they relapse into confusion, instead of accepting the obvious conclusion that moral beliefs are in the same ontological category as opinions (like "my favorite color is red"), not facts (like "the sky appears blue").

I expect much of this will be largely subjective and have no objective fact of the matter, but it can be better informed by both empirical and philsophical research.

So I would say it is all subjective. But I agree that understanding algorithms will help us choose which actions satisfy our preferences. (But not that searching for explanations of the magic of conscious will help us decide which actions are good.)

Michael St Jules 🔸

Ya, we probably roughly agree about meta-ethics, too. But I wouldn't say I "understand" consciousness or ethics, except maybe at a high level, because I'm not settled on what I care about and how. The details matter, and can have important implications. I would want to defer to my more informed views.

For example, the evidence in this paper was informative to me, even assuming strong illusionism:

https://www.frontiersin.org/journals/veterinary-science/articles/10.3389/fvets.2022.788289/full

And considerations here also seem important to me:

https://forum.effectivealtruism.org/posts/AvubGwD2xkCD4tGtd/only-mammals-and-birds-are-sentient-according-to?commentId=yiqJTLfyPwCcdkTzn

Wei Dai

Therefore, it seems clear to us that we need to immediately prioritize and fund serious, non-magical research that helps us better understand what features predict whether a given system is conscious

Can you talk a bit about how such research might work? The main problem I see is that we do not have "ground truth labels" about which systems are or are not conscious, aside from perhaps humans and inanimate objects. So this seemingly has to be mostly philosophical as opposed to scientific research, which tends to progress very slowly (perhaps for good reason). Do you see things differently?

Comments

More from the author

Key takeaways from our EA and alignment research surveys

Cameron B, Judd R, Trent Hodgeson·2y ago·26m read

Curated and popular this week

Counting animals: Stable population size is not equivalent to priority level

abrahamrowe, mal_graham🔸·1w ago·Curated 6d ago·16m read

AI Use Note: Main body text entirely human written. Claude (Opus 4.8) helped develop models of animal life histories in the appendix. Cross-posted from Good Structures. Executive Summary * Animal advocates sometimes make claims like “there are X of this animal...

114

Spiro: an update 2.5 years on and a fundraising ask for expansion

Habiba Banu·1w ago·6m read

Summary Back in November 2023 I posted here to launch Spiro and raise our first $198k. Two and a half years later this is an update and a fundraiser for the next step. The short version: we've now reached over-5,900 people with TB preventive medicine, including over 3,000 children under five years old. Our early results have held up well an...

How (not) to fundraise from Anthropic staff

Jack Lewars·6d ago·7m read

Adapted from my Substack, Funding Anthropalypse. Short version: if you want a share of the coming Anthropic and OpenAI windfall - the $37bn+ that could be in play next year - the way in is to become 'legibly excellent', so the evaluators and donors that frontier lab staff already trust point them to yo...

Recent opportunities to take action

Marginal Victories: career advising and opportunities for U.S. democracy preservation & political work

Annika Burman 🔸·1d ago·2m read

I'm stepping down as Hive's Executive Director, and we're hiring my successor

SofiaBalderson, Hive·2d ago·3m read

Starting an EA group @ SUNY Binghamton

micahzarin·19h ago·1m read

Joseph Miller

I claim that I understand sentience. Sentience is just a word that people have projected their confusions about brains / identity onto.

Sentient AI that genuinely 'feels for us' probably wouldn't disempower us

Ignoring the question of AI sentience is a great way to make a sentient AI hate us

One plausible (and seemingly neglected) reason why advanced AI may naturally adopt actively hostile views towards humanity is if these systems come to believe that we never bothered to check in any rigorous way whether training or deploying these systems was subjectively very bad for these systems (especially if those systems were in fact suffering during training and/or deployment).

Note that humans have a dark history of conveniently either failing to ask or otherwise ignoring this question in very similar contexts (e.g., “don’t worry, human slaves are animal-like!,” “don’t worry, animal suffering is incomparable to human suffering!”)—and that an advanced AI would almost certainly be keenly aware of this very pattern.

It is for this reason that we think AI sentience research may be especially important, even in comparison to other more-commonly-discussed s-risks. If, for instance, we are committing a similar Type II error with respect to nonhuman animals, the moral consequences would also be stark—but there is no similarly direct causal path that could be drawn between these Type II errors and human disempowerment (i.e., suffering chickens can’t effectively coordinate against their captors; suffering AI systems may not experience such an impediment).

Sentient AI that genuinely 'feels for us' probably wouldn't disempower us

It may also be the case that developing conscious AI systems could induce favorable alignment properties—including prosociality—if done in the right way.

Accidentally building suffering AIs may be damning from an x-risk perspective, but purposely building thriving AIs with a felt sense of empathy, deep care for sentient lives, etc may be an essential ingredient for mitigating x-risk. However, given current uncertainty about consciousness, it also seems currently impossible to discern how to reliably yield one of these things and not the other.

Therefore, it seems clear to us that we need to immediately prioritize and fund serious, non-magical research that helps us better understand what features predict whether a given system is conscious—and subsequently use this knowledge to better determine which quadrant we are actually in (above) so that we can act accordingly.

At AE Studio, we are contributing to this effort through our own work and by helping to directly fund promising consciousness researchers like Michael Graziano—but this space requires significantly more technical support and funding than we can provide alone. We hope that this week’s debate and broader conversations in this space will move the needle in the direction of taking this issue seriously.