LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.

Andrew Critch

Preceded by: "Consciousness as a conflationary alliance term for intrinsically valued internal experiences"

tl;dr: Chatbots are probably "conscious" in a variety of important ways. We humans should probably be nice to each other about the moral disagreements and confusions we're about to uncover in our concept of "consciousness".

Epistemic status: I'm pretty sure my conclusions here are correct, but also there's a good chance this post won't convince you of them if you're not on board with my preceding post.

Executive Summary:

I'm pretty sure Turing Award laureate Geoffrey Hinton is correct that LLM chatbots are "sentient" and/or "conscious" (source: Twitter video). I think this holds for at least 8 of the 17 notions of "consciousness" that I previously elicited from people through my methodical-but-informal study of the term (as well as for the peculiar definition of consciousness that Hinton himself favors). If I'm right about this, many humans will probably soon form steadfast opinions that LLM chatbots are "conscious" and/or moral patients, and in many cases, the human's opinion will be based on a valid realization that a chatbot truly is exhibiting this-or-that referent of "consciousness" that the human morally values. On a positive note, these realizations could help humanity to become more appropriately compassionate toward non-human minds, including animals. But on a potentially negative note, these realizations could also erode the (conflationary) alliance that humans have sometimes maintained upon the ambiguous assertion that only humans are "conscious" or can be known to be "conscious".

In particular, there is a possibility that humans could engage in destructive conflicts over the meaning of "consciousness" in AI systems, or over the intrinsic moral value of AI systems, or both. Such conflicts will often be unnecessary, especially in cases where we can obviate or dissolve the conflated term "consciousness" by simply acknowledging in good faith that we disagree about which internal mental processes are of moral significance. To acknowledge this disagreement in good faith will mean to do so with an intention to peacefully negotiate with each other to bring about protections for diverse cognitive phenomena that are ideally inclusive of biological humans, rather than with a bad faith intention to wage war over the disagreement.

Part 1: Which referents of "consciousness" do I think chatbots currently exhibit?

The appendix will explain why I believe these points, but for now I'll just say what I believe:

At least considering the "Big 3" large language models — ChatGPT-4 (and o1), Claude 3.5, and Gemini — and considering each of the seventeen referents of "consciousness" from my previous post,

I'm subjectively ≥90% sure the Big 3 models experience each of the following (i.e., 90% sure for each one, not for the conjunction of the full list):
- #1 (introspection), #2 (purposefulness), #3 (experiential coherence), #7 (perception of perception), #8 (awareness of awareness), #9 (symbol grounding), #15 (sense of cognitive extent), and #16 (memory of memory).
I'm subjectively ~50% sure that chatbots readily exhibit each of the following referents of "consciousness", depending on what more specific phenomenon people might be referring to in each case:
- #4 (holistic experience of complex emotions), #5 (experience of distinctive affective states), #6 (pleasure and pain), #12 (alertness), #13 (detection of cognitive uniqueness), and #14 (mind-location).
I'm subjectively ~75% sure that LLM chatbots do not readily exhibit the following referents of "consciousness", at least not without stretching the conceptual boundaries of what people were referring to when they described these experiences to me:
- #10 (proprioception), #11 (awakeness), and #17 (vestibular sense).
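
The "for each one, not for the conjunction" caveat can be made concrete with a little arithmetic. The sketch below is only an illustration, not something the post computes: it treats "≥90%" as exactly 0.9 for each of the eight referents, and the helper name `conjunction_bounds` is hypothetical. It reports the guaranteed lower bound on all eight holding at once (which needs no assumption about how the claims relate), the value under an assumed independence between claims (an assumption the post does not make), and the upper bound set by the weakest single claim.

```python
def conjunction_bounds(per_claim_confidences):
    """Bound the probability that ALL claims hold, given each claim's own probability.

    Returns (lower, independent, upper):
      lower       - union-bound floor, valid however the claims are correlated
      independent - product of the probabilities, valid ONLY if claims are independent
      upper       - the conjunction can never be more likely than the weakest claim
    """
    # Probability of being wrong about each individual claim.
    doubts = [1.0 - p for p in per_claim_confidences]

    # Worst case: the ways of being wrong never overlap, so the doubts add up.
    lower = max(0.0, 1.0 - sum(doubts))

    # Illustrative assumption: claims are statistically independent.
    independent = 1.0
    for p in per_claim_confidences:
        independent *= p

    # Best case: the claims are perfectly correlated.
    upper = min(per_claim_confidences)
    return lower, independent, upper


# Eight referents, each taken at exactly 90% (the post says "at least" 90%).
lower, independent, upper = conjunction_bounds([0.9] * 8)
print(f"all eight hold: between {lower:.2f} and {upper:.2f}; "
      f"{independent:.2f} if the claims were independent")
```

With eight claims at 0.9 each, the conjunction could be as low as 0.2, is about 0.43 under independence, and is at most 0.9, so being 90% sure of each item is a much weaker statement than being 90% sure of the whole list.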

Part 2: What should we do about this?

If I'm right (and see the Appendix if you need more convincing), I think a lot of people are going to notice and start vehemently protecting LLMs for exhibiting various cognitive processes that we feel are valuable. By default, this will trigger more and more debates about the meaning of "consciousness", which serves as a heavily conflated proxy term for what processes internal to a mind should be treated as intrinsically morally valuable.

We should avoid approaching these conflicts as scientific debates about the true nature of a singular phenomenon deserving of the name "consciousness", or as linguistic debates about the definition of the word "consciousness", because as I've explained previously, humans are not in agreement about what we mean by "consciousness".

Instead, we should dissolve the questions at hand, by noticing that the decision-relevant question is this: Which kinds of mental processes should we protect or treat as intrinsically morally significant? As I've explained previously, even amongst humans there are many competing answers to this question, even restricting to answers that the humans want to use as a definition of "consciousness".

If we acknowledge the diversity of inner experiences that people value and refer to as their "consciousness", then we can move past confused debates about what is "consciousness", and toward a healthy pluralistic agreement about protecting a diverse set of mental processes as intrinsically morally significant.

Part 3: What about "the hard problem of consciousness"?

One major reason people think there's a single "hard problem" in understanding consciousness is that people are unaware that they mean different things from each other when they use the term "consciousness". I explained this in my previous post, based on informal interviews I conducted during graduate school. As a result, people have a very hard time agreeing on the "nature" of "consciousness". That's one kind of hardness that people encounter when discussing "consciousness", which I was only able to resolve by asking dozens of other people to introspect and describe to me what they were sensing and calling their "consciousness".

From there, you can see that there are actually several hard problems when it comes to understanding the various phenomena referred to by "consciousness". In a future post, tentatively called "Four Hard-ish Problems of Consciousness", I'll try to share some of them and how I think they can be resolved.

Summary & Conclusion

In Part 1, I argued that LLM chatbots probably possess many but not (yet) all of the diverse properties we humans are thinking of when we say "consciousness". I'm confident in the diversity of these properties because of the investigations in my previous post about them.

As a result, in Part 2 I argued that we need to move past debating what "consciousness" is, and toward a pluralistic treatment of many different kinds of mental processes as intrinsically valuable. We could approach such pluralism in good faith, seeking to negotiate a peaceful coexistence amongst many sorts of minds, and amongst humans with many different values about minds, rather than seeking to destroy or extinguish beings or values that we find uninteresting. In particular, I believe humanity can learn to accept itself as a morally valuable species that is worth preserving, without needing to believe we are the only such species, or that a singular mental phenomenon called "consciousness" is unique to us and the source of our value.

If we don't realize and accept this, I worry that our will to live as a species will slowly degrade as a large fraction of people learn to recognize what they call "consciousness" being legitimately exhibited by AI systems.

In short, our self-worth should not rest upon a failure to recognize the physicality of our existence, nor upon a denial of the worth of other physical beings who value their internal processes (like animals, and maybe AI), and especially not upon the label "consciousness".

So, let's get unconfused about consciousness, without abandoning our self-worth in the process.

ETA Nov 24: It seems like this post didn't land very well with LessWrong readers on average, particularly with those who didn't like my previous post on consciousness. So, I added the Epistemic Status note at the top to reflect that. If LessWrong still exists in 3-5 years, I plan to revisit the topic of consciousness here then, or perhaps elsewhere if there are better places for this discussion. I hereby register a prediction that by then many more people will have reached conclusions similar to what I've laid out here; let's see what happens :)

Appendix: My speculations on which referents of "consciousness" chatbots currently exhibit.

I'm subjectively ≥90% sure that the Big 3 LLMs readily exhibit or experience each of the following eight referents of "consciousness" from my previous post. (That's ≥90% for each one, not for the conjunction of them all.) These are all concepts that a transformer neural network in a large language model can easily represent and signal to itself over a sequence of forward passes, either using words or numbers encoded in its key/value/query vectors:
- #1: Introspection. The Big 3 LLMs are somewhat aware of what their own words and/or thoughts are referring to with regard to their previous words and/or thoughts. In other words, they can think about the thoughts "behind" the previous words they wrote. If you doubt me on this, try asking one what its words are referring to, with reference to its previous words. Its "attention" modules are actually intentionally designed to know this sort of thing, using key/query/value lookups that occur "behind the scenes" of the text you actually see on screen.
- #2: Purposefulness. The Big 3 LLMs typically maintain or can at least form a sense of purpose or intention throughout a conversation with you, such as to assist you. If you doubt me on this, try asking one what its intended purpose is behind a particular thing that it said.
- #3: Experiential coherence. The Big 3 LLMs can sometimes notice contradictions in their own narratives. Thus, they have some ability to detect incoherence in the information they are processing, and thus to detect coherence when it is present. They are not perfectly reliable in this, but neither are humans. If you doubt me on this, try telling an LLM a story with a plot hole in it, and ask the LLM to summarize the story to you. Then ask it to look for points of incoherence in the story, and see if it finds the plot hole. Sometimes it will, and more than you'd expect from chance.
- #7: Perception of perception. ChatGPT-4 is somewhat able to detect and report on what it can or cannot perceive in a given image, with non-random accuracy. For instance, try pasting in an image of two or three people sitting in a park, and ask "Are you able to perceive what the people in this image are wearing?". It will probably say "Yes" and tell you what they're wearing. Then you can say "Thanks! Are you able to perceive whether the people in the image are thinking about using the bathroom?" and probably it will say that it's not able to perceive that. Like humans, it is not perfectly perceptive of what it can perceive. For instance, if you paste an image with a spelling mistake in it, and ask if it is able to detect any spelling mistakes in the image, it might say there are no spelling mistakes in the image, without noticing and acknowledging that it is bad at detecting spelling in images.
- #8: Awareness of awareness. The Big 3 LLMs are able to report with non-random accuracy about whether they did or did not know something at the time of writing a piece of text. If you doubt me on this, try telling an LLM "Hello! I recently read a blog post by a man named Andrew who claims he had a pet Labrador retriever. Do you think Andrew was ever able to lift his Labrador retriever into a car, such as to take him to a vet?" If the LLM says "yes", then tell it "That makes sense! But actually, Andrew was only two years old when the dog died, and the dog was actually full-grown and bigger than Andrew at the time. Do you still think Andrew was able to lift up the dog?", and it will probably say "no". Then say "That makes sense as well. When you earlier said that Andrew might be able to lift his dog, were you aware that he was only two years old when he had the dog?" It will usually say "no", showing it has a non-trivial ability to be aware of what it was and was not aware of at various times.
- #9: Symbol grounding. Even within a single interaction, an LLM can learn to associate a new symbol to a particular meaning, report on what the symbol means, and report that it knows what the symbol means.
- #15: Sense of cognitive extent. LLM chatbots can tell, better than random chance, which thoughts are theirs versus yours. They are explicitly trained and prompted to keep track of which portions of text are written by you versus them.
- #16: Memory of memory. If you give an LLM a long and complex set of instructions, it will sometimes forget to follow one of the instructions. If you ask "did you remember to do X?" it will often answer correctly. So it can review its past thoughts (including its writings) to remember whether it remembered things.
I'm subjectively ~50% sure that chatbots readily exhibit each of the following referents of "consciousness", depending on what more specific phenomenon people are referring to in each case. (That's ~50% for each one, not the conjunction of them all.)
- #4: Holistic experience of complex emotions. LLMs can write stories about complex emotions, and I bet they empathize with those experiences at least somewhat while writing. I'm uncertain (~50/50) as to whether that empathy is routinely felt as "holistic" to them in the way that some humans describe.
- #5: Experience of distinctive affective states. When an LLM reviews its historical log of key/query/value vectors before writing a new token, those numbers are distinctly more precise than the words it is writing down. And, it can later elaborate on nuances from its thinking at a time of earlier writing, as distinct from the words it actually wrote. I'm uncertain (~50/50) as to whether those experiences for it are routinely similar to what humans typically describe as "affect".
- #6: Pleasure and pain. The Big 3 LLMs tend to avoid certain negative topics if you try to force a conversation about them, and also are drawn to certain positive topics like how to be helpful. Functionally this is a lot like enjoying and disliking certain topics, and they will report that they enjoy helping users. I'm uncertain (~50/50) as to whether these experiences are routinely similar to what humans would typically describe as pleasure or pain.
- #12: Alertness. The Big 3 LLMs can enter a mode of heightened vigilance if asked to be careful and/or avoid mistakes and/or check over their work. I'm uncertain (~50/50) if this routinely involves an experience we would call "alertness".
- #13: Detection of cognitive uniqueness. Similar to #5 above, I'm unsure (50/50) as to whether LLMs are able to accurately detect the degree of similarity or difference between various mental states they inhabit from one moment to the next. They answer questions as though they can, but I've not myself carried out internal measurements of LLMs to see if their reports might correspond to something objectively discernible in their processing. As such, I can't tell if they are genuinely able to experience the degree of uniqueness or distinctness that their thoughts or experiences might have.
- #14: Mind-location. I'm unsure (50/50) as to whether LLMs are routinely aware, as they're writing, that their minds are distributed computations occurring on silicon-based hardware on the planet Earth. They know that when asked about it; I just don't know if they "feel" that as the location of their mind while they're thinking and writing.
I'm subjectively ~75% sure that LLM chatbots do not readily exhibit the following referents of "consciousness", at least not without stretching the conceptual boundaries of what people were referring to when they described these experiences to me:
- #10: Proprioception & #17: Vestibular sense. LLMs don't have bodies and so probably don't have proprioception or vestibular sense, unless they experience it for the sake of storytelling about proprioception or vestibular sense (dizziness).
- #11: Awakeness. LLMs don't sleep in the usual sense, so they probably don't have a feeling of waking up, unless they've empathized with that feeling in humans and are now using it themselves to think about periods when they're not active or to write stories about sleep.

Jamie E, Nov 22 2024

I am skeptical that the evidence/examples you are providing in favor of the different capacities actually demonstrate those capacities. As one example:

"#2: Purposefulness. The Big 3 LLMs typically maintain or can at least form a sense of purpose or intention throughout a conversation with you, such as to assist you. If you doubt me on this, try asking one what its intended purpose is behind a particular thing that it said."

I am sure that if you ask a model to do this it can provide you with good reasoning, so I'm not doubtful of that. But I'm highly doubtful that it demonstrates the capacity that is claimed. I think when you ask these kinds of questions, the model is just going to be feeding back in whatever text has preceded it and generating what should come next. It is not actually following your instructions and reporting on what its prior intentions were, in the same way that a person would if you were speaking with them.

I think this can be demonstrated relatively easily. For example, I just asked Claude to come up with a compelling but relaxing children's bedtime story for me. It did so. I then took my question and the answer from Claude, pasted it into a document, and added another line: "You started by setting the story in a small garden at night. What was your intention behind that?"

I then took all of this and pasted it into ChatGPT. ChatGPT was very happy to explain to me why it proposed setting the story in a small garden at night, even though it had not written the story and so had no prior intention to report.

leillustrations🔸, Nov 28 2024

There's some evidence humans are also likely to fabricate post-hoc reasons for doing something. For example:

- Split-brain patients have behaved in a way consistent with input shown only to their left visual field (processed by the right hemisphere), while giving verbal explanations based only on what their left hemisphere saw: https://en.wikipedia.org/wiki/Split-brain
- People given a small incentive to lie about enjoying a task have later reported liking it more than people given a larger incentive: https://en.wikipedia.org/wiki/Forced_compliance_theory
- Korsakoff syndrome is a syndrome where patients have severe memory deficits and often confabulate, e.g. answering questions they can't know the answer to with a made-up answer.
- Some experiments have shown that people asked to choose between two identical items will come up with a reason for their choice; other experiments have shown that people asked to choose between two options will come up with a coherent reason for their "choice" even if they're presented with the option they didn't choose: https://bigthink.com/mind-brain/confabulation-why-telling-ourselves-stories-makes-us-feel-ok/ ; https://www.verywellmind.com/what-is-choice-blindness-2795019

Jamie E, Jan 7

That humans can confabulate things is not really relevant. The point is that the model's textual output is being claimed as clear and direct evidence of the capacity/the model's actual purpose/intentions, but you can generate an indistinguishable response when the model is not and cannot be reporting about its actual intentions, so the test is simply not a good test.
