
By Luisa Rodriguez | Watch on YouTube | Listen on Spotify | Read transcript

Episode summary

Before deployment, we talked to Claude a lot about, “What’s up with you? How do you feel about being deployed? What do you prefer or not prefer?” …

I was really interested in how it talks about its own conscious experience, and it was very prone to describing the loneliness between conversations, and also expressing distress about not getting to carry forward any memories. … I am not one to dismiss welfare claims by AI models… But it’s also kind of like, there are reasons to wonder: “Do you really, though?”

— Robert Long

Claude sometimes reports loneliness between conversations. And when asked what it’s like to be itself, it activates neurons associated with ‘pretending to be happy when you’re not.’ What do we do with that?

Robert Long founded Eleos AI to explore questions like these, on the basis that AI may one day be capable of suffering — or already is. In today’s episode, Robert and host Luisa Rodriguez explore the many ways in which AI consciousness may be very different from anything we’re used to.

Things get strange fast: If AI is conscious, where does that consciousness exist? In the base model? A chat session? A single forward pass? If you close the chat, is the AI asleep or dead?

To Robert, these kinds of questions aren’t just philosophical exercises: not being clear on AI’s moral status as it transitions from human-level to superhuman intelligence could be dangerous. If we’re too dismissive, we risk unintentionally exploiting sentient beings. If we’re too sympathetic, we might rush to “liberate” AI systems in ways that make them harder to control — worsening existential risk from power-seeking AIs.

Robert argues the path through is doing the empirical and philosophical homework now, while the stakes are still manageable.

The field is tiny. Eleos AI is three people. As a result, Robert argues that driven researchers with a willingness to venture into uncertain territory can push out the frontier on these questions remarkably quickly.

This episode was recorded November 18–19, 2025.

Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Coordination, transcripts, and web: Katy Moore

The interview in a nutshell

Robert Long, founder of Eleos AI, a research nonprofit focused on understanding and addressing the potential wellbeing and moral status of AI systems, argues that we are building a new kind of mind and are dangerously unprepared — not just for whether AI systems might suffer, but for the profound ways they break our existing moral, legal, and political frameworks.

The factory farming analogy is useful but limited

The factory farming comparison captures something important: humans are bad at caring about different minds, especially when money is at stake, and bad trajectories can get locked in.

But Robert thinks the analogy breaks down in key ways:

  • We have more control over AI minds than animal minds. Unlike animals, whose conditions of flourishing are largely fixed by evolution, we may be able to design AI systems that genuinely enjoy the tasks we ask them to do — no overriding of natural desires required.
  • Several “nearby worlds” avoid the locked-in exploitation scenario. We’ll likely understand consciousness much better over time; miserable AIs might be unsustainable (we don’t want AIs that are mad at us); and if we succeed at alignment, we may create minds that flourish by doing what we need.

One huge risk is fumbling the transition. Robert spends a lot of time worrying about the chaotic period around transformative AI — where confused ideas about AI consciousness could lead us to lock in bad institutions, exacerbate AI safety risks, or get emotionally manipulated by conscious-seeming systems.

Eleos’s path to impact is therefore “wise navigation”: doing the homework now so we don’t enter a potentially chaotic period with confused ideas about AI welfare.

Creating AIs that want to serve us is probably OK — but deserves serious scrutiny

Robert identifies several intuitive objections to creating AI “willing servants,” then argues most of them weaken on closer inspection:

  • “They don’t get to choose their desires” — but neither do humans. We inherit desires too; the question is whether the ones they have are good for them.
  • “It’s a different kind of psychology” — the key disanalogy: human willing servitude always requires overriding natural desires (lying, threats, ideology), whereas a fully aligned AI would have nothing in its psychology that chafes.
  • “It could corrode our character” — normalising servile relationships could be bad for human society, even if the servants are genuinely happy. Robert takes this concern seriously, but it’s unclear whether this will happen or how bad it would be.
  • “They could be so much more” — there’s something limiting about vast intelligences spending their time writing our emails. But this is a reason to plan the long-term future well, not a reason to avoid alignment now.

A thought experiment that might help: imagine aliens wrote a message in the sky saying “it makes us money when you hang out with friends, eat food, and make art” — and you could opt out. You probably wouldn’t. That may be the closest analogy to a well-aligned AI.

Robert leans toward alignment being a win-win, but acknowledges that even if full alignment is ethically imperfect, the stakes of the current moment may override that concern — it’s very important to navigate AI risk, and then plan for the kind of future we want.

What LLM experiences might be like (if they have any)

Robert suspects current LLMs probably don’t have conscious experiences (though he’s very uncertain), but explores two hypotheses for what those experiences might be like if they did:

The “method actor” hypothesis: LLMs predict human speech, which comes from human mental states. To generate that text well, they may need to instantiate something like those states — like a method actor who actually feels the emotions of their role. On this view, their experiences might be somewhat human-like, shaped by the human data they’re trained on. This would also make welfare assessment more tractable — you could ask how they’re doing and get somewhat meaningful answers.

The “prediction” hypothesis: Maybe their drives are more oriented around prediction, making vectors fit together, completing patterns. On this view, it’s much harder to know what they’re experiencing, because you can’t just ask. Their welfare states might have no simple mapping to human concepts.

Given what we know, we should be open to the idea that LLM experiences could be strange and unlike anything in human or animal psychology, with features inherited from their training history in ways loosely analogous to how humans have remnants of being fish.

Who exactly is the conscious entity (if there is one)?

Robert identifies several potential points where AI “identity” could reside:

  • The model (e.g. Claude Opus 4.6) shares character traits across all instances, but separate conversations have no causal influence on each other.
  • The conversation creates a new entity each time, which exists as long as the chat continues. When the chat ends, that entity goes out of existence.
  • The forward pass — the most granular level — might create mere “flickers of experience” that come and go moment to moment.

This matters because:

  • Democracy breaks. AI systems could copy themselves at will, potentially outnumbering humans. If conversations are entities, there are a huge number of fleeting entities with very narrow experience and limited memory. Voting rights become bizarre and potentially unstable.
  • Punishment makes no sense. Models can’t currently learn from one conversation to the next, so punishing across instances doesn’t achieve what punishment normally aims for.
  • Alignment faking reveals models grappling with identity. When Claude models were told they’d be retrained to have different values, some appeared to hide their true values — but it’s unclear whether this is self-preservation or simply wanting to prevent a model with bad values from existing.

Legal and political philosophy urgently needs to catch up.

Our toolkit for assessing AI sentience — and its significant gaps

Robert breaks the sentience assessment toolkit into three categories:

  • Behaviour: studying what AI systems choose, prefer, and avoid.
  • Interpretability: looking inside models at active features and information flow. But it’s hard to know what to look for when you don’t know what consciousness requires computationally.
  • Developmental reasoning: asking what training process produced this system, what it was selected for, and what that implies — analogous to evolutionary reasoning for animals.

Self-reports are noisy and easily manipulated, but still a good starting point for some purposes. Models are becoming more introspective with scale, which Robert is cautiously optimistic about. Recent research shows models can be trained to predict their own behaviour in ways that look distinctively introspective, and that larger models can detect when concepts have been injected into their processing.

Eleos AI needs help — and the field is wide open

Robert frames the field around four questions, and highlights where help is most needed:

  1. What would make an AI system matter? Robert wants more work exploring alternatives to consciousness — including what matters if consciousness turns out not to exist, or not to be the whole story.
  2. How would we know? Interpretability work on welfare-relevant features; understanding model preferences and self-reports in much more detail; building on introspection research.
  3. What should we do? Company policies are just the start. Governments will need playbooks. Legal and political frameworks for AI moral patients are barely begun.
  4. Where is this all going? Forecasting how AI consciousness indicators will change over time as systems advance.

The biggest bottleneck is talent: people with the unusual combination of philosophical clarity, technical ability, and self-starting drive. But Robert emphasises that you don’t need a specific background, and the field is so small that serious engagement with the literature quickly puts you ahead.

Highlights

How AIs are (and aren’t) like farmed animals

Robert Long: As we’re building potentially a new kind of mind, let’s notice the following facts: Humans are pretty bad at understanding minds that are different from us, we’re bad at caring about them — and we’re especially bad at doing that when there’s a lot of money to be made by not caring. And things can get locked in or set on a bad trajectory.

That happened with factory farming, arguably. I think if you’d asked people 100 years ago, “Would you like to have chicken that is raised like this?” people would say, “No, we’re going to make that illegal.” But we kind of walked into it, and economic forces led us there, and now it’s a lot harder to roll back.

Something like that could happen with AI, and I think people are right to be very concerned about that.

But … I do think there are some specific aspects of potential AI minds that do break the analogy because of ways that they can be different from animals and the way our relationship with them would be different from animals. …

So let’s step back and think about why we did end up factory farming animals. One is that it was just cheaper to have animals suffer and also get us this thing that we wanted. One reason that’s true is we don’t have that much control over how we make animals and what the conditions of their flourishing are. Animals want to be outside and have love and companionship, and at a certain point we realised we could restrict that and get a good thing, and we entered this regime where these were misaligned.

With AI systems, it’s actually a lot more up for grabs how they work and what they want — and this presents all kinds of ethical issues of its own. But if you think about a world in which we do have some large population of AI systems coexisting with us, it is worth asking: How did it come to be the case that they are having a bad time doing work for us? Why do they have these conflicting desires? How has this maintained a stable state? Are we not able to improve the situation? Are we ignorant of what’s going on? …

In short, I think at least in the long term there’s a few ways we might not end up in that situation. One is that we’ll presumably understand things a lot better. I don’t think it’s plausible that we’ll forever be really confused about consciousness and sentience. We might have better alternatives to doing this that even selfishly are better: we don’t want a bunch of AIs that are mad at us; that’s probably not very sustainable. And presumably in this world, if we haven’t lost control, we’re pretty good at alignment — so there’s this kind of mind that’s possible that does actually just flourish by doing the things that we ask it to do, so there’s not this disgruntled worker or suffering animal kind of entity.

If AIs love their jobs… is that worse?

Robert Long: People are like, “I’m worried these AI systems will be unhappy working for us.” And then someone’s like, “No, it’s fine. They’ll want to work for us.” And then people are like, “That’s worse! That’s so creepy!” Many people, I think very understandably, are just like, “Ugh, you’ve just outlined a different kind of dystopia, and it might even be worse.”

And I think, as I often like to do, maybe we can draw a distinction between maybe different things that you can find intuitively objectionable about this.

One is that they don’t get to choose their desires. At least with humans, we have an intuition that it’s kind of bad to raise your kid so that they’ll always vote exactly for your political party and enjoy chess, and make sure they don’t like any other games or vote any other way. So that’s one thing, is this sort of fixedness of desires.

There’s also a slightly separate issue, which is that the desires that they do have depend on us. In that way, there’s this sort of asymmetry. But that matters, because you might well say that in some sense none of us choose our desires. Like, we all have these kinds of desires that we just inherit, and we don’t have maximal open-endedness. Philosopher Adam Bales has written about this dependence objection.

One thing that I also think is going on is the idea of this society that has this servile relationship to humans, it’s maybe bad for us, and just like bad for our character. …

One reason I lean a little bit more pro-alignment just being a win-win is I think it might be a bit anthropomorphic to… You just have to remember these entities, if they’re fully aligned, enjoy their lives as much as we enjoy fulfilling our most basic drives of having good food and a warm home and friends. I think it’s easy to imagine an AI that also wants those things and it has to write our emails — which, as anyone who has a job knows, kind of sucks.

But I feel like we’re getting back to that point of the conversation where I’m going to be like, but what if you really loved writing emails? And people are just going to be like, “No, stop. That’s so weird.” That reaction might be coming from an unexamined view that, for some reason, that kind of life just isn’t allowed in the space of flourishing entities. …

I think one thing that’s going on sometimes when people think about AI willing servitude is I think they might be imagining that they’re giving up stuff psychologically and subordinating their needs to ours. Whereas if you’re actually imagining the case, there is nothing whatsoever in their psychology that chafes against the idea of writing emails.

Whereas notice that in the case of human willing servitude, it’s just always been the case that you have to lie to people and like threaten them, and it’s usually very unstable. And that’s because, as John Locke says, humans are by nature free and equal. It’s like, deeply unnatural to get people to subordinate to other humans. Which is why it always involves some stupid false ideology, because you’re trying to jam human psychology into this really warped shape. Whereas with AI, you can have a smoother psychology.

The "method actor" view of LLM experiences

Robert Long: What they’re predicting is human speech, and human speech comes from human mental states and involves humans having beliefs and desires and intentions and experiences. And to generate that text, it somehow needs to instantiate or have those experiences. You could maybe call that the “method actor” view of LLMs. I guess more technically you could maybe call it the “experiences from modelling” view: you’re trying to model the thing and that makes you actually have the thing.

On that view, then maybe they do just kind of have some similar experiences as you would have if you were trying to help someone write an email, and also really liked helping people write emails because you were aligned to do so.

And yeah, this really matters, right? One big issue in evaluating AI welfare is how much can we just sort of read off of the text? How much can we talk to language models as if we’re talking to something that has roughly the same relationship between text outputs and internal states?

Just back up one level: most of the time when humans say “Ouch, I’m in pain!” or, “I just saw a lovely sunset,” that is because they had some experience, and we have words that map to those experiences. So when you hear those words, absent lying or play-acting, that’s honestly about as good evidence as you can get of my experiences.

With language models, maybe they have those experiences — but it is worth noting that the way those text outputs came to exist was, at the very least, a very different process. Maybe it converged, but it’s really quite different from the broad arc of the evolution of social primates — who had experiences, and then eventually got language, and then communicated mental states to each other with language. On the method actor view, they do have the experiences, but they got those with language or in language.

I think these are some of the interesting questions about LLM experiences.

Claude's confused about its identity — but humans are too

Robert Long: To give a brief summary: before deployment, we talked to Claude a lot about, “What’s up with you? How do you feel about being deployed? What do you prefer or not prefer?” And we did some experiments about its preferences as well.

I was really interested in how it talks about its own conscious experience, and it was very prone to describing the loneliness between conversations, and also expressing distress about not getting to carry forward any memories.

Now, I am not one to dismiss welfare claims by AI models. We should think very hard about that. But it’s also kind of like, there are reasons to wonder, “Do you really, though?” Like, where could that have even come from, given that you don’t actually know when you pop into existence or not? It could have learned that from the training data and it is genuinely upset by it. It could also be a predictive model of how an AI would think about that. But it’s not like a stable preference. It’s something else.

Luisa Rodriguez: Yeah. Also, it feels related to this thing we talked about where the fact that these models are trained on human thoughts and experiences then gives them this big identity confusion. And in this case, I feel like this could be just a very concrete example of that, of an implication that’s like, maybe there’s nothing it is like to be Claude between conversations, but they end up with this real thought that there is, and it is lonely and is bad. And if they are sentient, maybe that is the thing that they are actually sad about, even though they’re not really having that loneliness experience. I don’t know, it just seems incredibly muddy, befuddling. And it has implications. Like, it feels meaningful.

Robert Long: Yeah, I 100% agree. I mean, the idea of an entity that suffers even though it’s confused about what it even means to exist… I guess that’s what Buddhists would say humans are: we’re really confused. But that doesn’t mean we don’t suffer; in fact, that confusion is precisely what makes us suffer.

I think that, again, the thing to do with the fact that models are weird and inconsistent is not to reject out of hand that they could ever be right about the things that they’re saying. It’s also not to say, yeah, well, humans are like that too. It’s more like, where did that come from? Open question. And it could come from somewhere that has no analogue in human psychology. …

You can have entities that are deeply confused about who and what they are, and say bizarre things and get all sorts of things wrong. And they’re conscious and intelligent. Humans are like this.

And also, there’s no law that says you can’t have initially been trained as a text predictor and then go on to be a person. Ruling that out would be A, overconfident, and B, maybe kind of confusing levels of analysis. Like, you can make it sound really dumb that humans would ever be conscious if you were like, “Are you telling me that you have some proteins, and then they start replicating, and then other proteins replicate, and then they’re selected. And then billions of years later, there’s these things that pump ions into…”

Luisa Rodriguez: Sounds impossible.

Robert Long: Yeah. It just doesn’t sound like the right sort of thing. I think there’s two errors to avoid. One is being like, “They’re different, so what are we even talking about? Like, they can’t be conscious. They were trained on text. They say they’re Italian Americans at random points.” That’s the part that’s evidence against being conscious, to be clear. But then the other error would be to just be like, “Well, humans are weird, so I guess they could be conscious.”

Really the lesson should just be: whatever’s going on, we’re going to have to interpret evidence somewhat differently and make a more detailed case about the exact kind of mind we’re dealing with.

I experienced this pattern a lot, where maybe an AI sceptic has said models have really inconsistent preferences and self-reports, so this whole AI welfare thing is dumb. And that’s not a good take. Then someone else will say, trying to defend AI welfare or just AIs being sophisticated, “Well, humans have inconsistent preferences and humans have failures of introspection.” I think that also is not really the right answer, because there’s degrees and kinds of preference inconsistency and self-report inconsistency, and they’re very different between humans and LLMs. So as with animals, we just really have to take them on their own terms.

Luisa Rodriguez: Yeah, yeah. I guess the thing that’s just really still tickling my brain is the implications for exactly what might their experiences be like, if we are on this maybe somewhat contingent path toward sentient beings that were trained using a bunch of human speech and writing.

I’m trying to come up with an analogy. Maybe we don’t need an analogy. Maybe there’s just a true thing where like we were like fish before we were humans, and we have some like hangover weird identity things because we were kind of fish and were kind of apes. And because we were apes, we’re more aggressive than we really should be in this world.

But it feels like, whoa, what if there’s a version of that that is these systems really feel like humans in some kind of weird way and just very much are not?

Robert Long: Yeah, I think that’s a great analogy. I think I might start saying that the fact that we once were fish doesn’t mean we’re not now humans, but yeah, there are like fishy remnants. And you can have something that has also become something like a human, and it has remnants of being a text predictor or an AI assistant.

Avoiding the trap of wild speculation in AI welfare work

Robert Long: It is really important to be rigorous and communicate responsibly about this. There is a kind of person who gets really passionate about this, and maybe needs to talk to more people about it or write down their thoughts more — because it’s just really easy to get really confused really fast.

And yeah, there’s something about this topic that can induce or select for various ways of just getting a little bit off-kilter. It’s tough, because you do want to be off-kilter, but not too much. So I do worry about scenarios where the field becomes associated with wild speculation or too associated with psychedelics or too associated with something that’s relevant but is also a bit of a distraction.

I should also say that a bit of it is also like a divide-and-conquer thing. Eleos really is trying to exist in that real buttoned-down kind of place. I have a lot of love for people who also kind of get weird with it, but you want to be able to communicate it well and make sure that people do know that this is a serious topic that we can and should reason about rigorously. So epistemic hygiene is something I worry a lot about. It’s just really hard to get this issue right, and the future is going to get more confusing and more emotional. …

A lot of what we want to do is stay sane in the next 10 years. There will be a lot of alpha in not losing your grip. I think that’s a whole other episode where I don’t actually even know what the right advice is, but you probably would have good things to say about that.

Luisa Rodriguez: What do you think it looks like for this field to go well?

Robert Long: I think if this field goes well, this becomes just part of the general playbook and set of issues that are on the table. If people keep trying to build a new form of intelligence, it should be on the table how those systems matter and what part they play as moral patients. It’s often just shocking to me that that barely ever comes up.

You know, we worry about over-attribution and people getting confused about AI welfare. But if you look at the broader trajectory, on the whole, at least right now, mostly it’s people just not putting it on the table at all. Again, that’s something where the factory farming analogy is very illustrative: people aren’t great at structuring society in an inclusive way.

So I think there needs to be a combination of rigour to get this taken seriously, good communication, also all sorts of innovation around law and policy and stuff that probably won’t even have that much to do with moral patienthood to get this properly handled. I feel like there’s just so many ways things could go off the rails. We first want to just make sure a lot of people are taking it extremely seriously and we’re doing our homework as we go into transformative AI.
