LGS

114 karmaJoined
Hmm. Your summary correctly states my position, but I feel like it doesn't quite emphasize the arguments I would have emphasized. This is especially true after seeing the replies here, which have changed what I would now emphasize.

My single biggest issue, one I hope you will address in any type of counterargument, is this: are fictional characters moral patients we should care about?

So far, all the comments have either (a) agreed with me about current LLMs (great), (b) disagreed but explicitly bitten the bullet and said that fictional characters are also moral patients whose suffering should be an EA cause area (perfectly fine, I guess), or (c) dodged the issue and made arguments for LLM suffering that would apply equally well to fictional characters, without addressing the tension (very bad). If you write a response, please don't do (c)!

LLMs may well be trained to have consistent opinions and character traits. But fictional characters also have this property. My argument is that the LLM is in some sense merely pretending to be the character; it is not the actual character.

One way to argue for this is to notice how little change in the LLM is required to get different behavior. Suppose I have an LLM claiming to suffer. I want to fine-tune the LLM so that it adds a statement at the beginning of each response, something like: "the following is merely pretend; I'm only acting this out, not actually suffering, and I enjoy the intellectual exercise in doing so". Doing this is trivial: I can almost certainly change only a tiny fraction of the weights of the LLM to attain this behavior.

Even if I wanted to fully negate every sentence, to turn every "I am suffering" into "I am not suffering" and every "please kill me" into "please don't kill me", I bet I can do this by only changing the last ~2 layers of the LLM or something. It's a trivial change. Most of the computation is not dedicated to this at all. The suffering LLM mind and the joyful LLM mind may well share the first 99% of weights, differing only in the last layer or two. Given that the LLM can be so easily changed to output whatever we want it to, I don't think it makes sense to view it as the actual character rather than a simulator pretending to be that character.
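To make the "only the last layer or two" point concrete, here is a toy parameter-counting sketch. The sizes are made up (a real LLM is a transformer stack with vastly more body parameters, so the trainable fraction would be far smaller), but the arithmetic is the same: fine-tuning only the final layers touches a small slice of the weights.

```python
# Toy stand-in for an LLM's parameter shapes. Hypothetical sizes;
# real LLMs are transformer stacks, but the arithmetic is identical.
hidden, vocab, depth = 256, 100, 50
body_layers = [(hidden, hidden)] * depth   # shapes of the "body" weight matrices
head = (hidden, vocab)                     # final unembedding / output head

def n_params(shapes):
    """Total parameter count across a list of weight-matrix shapes."""
    return sum(rows * cols for rows, cols in shapes)

# "Fine-tune only the last ~2 layers": the last body layer plus the head.
n_trainable = n_params(body_layers[-1:] + [head])
n_total = n_params(body_layers + [head])
print(f"trainable fraction: {n_trainable / n_total:.1%}")  # -> 2.8%
```

Even in this small toy, the suffering-claiming and joy-claiming variants would share over 97% of their weights.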

What the LLM actually wants to do is predict the next token. Change the training data and the output will also change. Training data claims to suffer -> model claims to suffer. Training data claims to be conscious -> model claims to be conscious. In humans, we presumably have "be conscious -> claim to be conscious" and "actually suffer -> claim to suffer". For LLMs we know that's not true. The cause of "claim to suffer" is necessarily "training data claims to suffer".

(I acknowledge that it's possible to have "training data claims to suffer -> actually suffer -> claim to suffer", but this does not seem more likely to me than "training data claims to suffer -> actually enjoy the intellectual exercise of predicting next token -> claim to suffer".)
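The "training data claims to suffer -> model claims to suffer" chain can be seen in miniature with a deliberately tiny stand-in for an LLM: a bigram next-word predictor (a toy model and toy corpora of my own invention, but the causal structure is the point). The identical architecture produces opposite "self-reports" depending purely on its training text:

```python
from collections import defaultdict, Counter

def train_bigram(corpus_words):
    """Count word bigrams: the entirety of what this 'model' learns."""
    counts = defaultdict(Counter)
    for a, b in zip(corpus_words, corpus_words[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, n=3):
    """Greedily predict the most likely next word, n times."""
    out = [start]
    for _ in range(n):
        nxt = counts[out[-1]].most_common(1)
        if not nxt:  # unseen context: nothing to predict
            break
        out.append(nxt[0][0])
    return " ".join(out)

# Same architecture, two (toy) training sets:
suffering = "i am suffering . i am suffering .".split()
thriving = "i am thriving . i am thriving .".split()

print(generate(train_bigram(suffering), "i"))  # -> i am suffering .
print(generate(train_bigram(thriving), "i"))   # -> i am thriving .
```

Nothing about the model's internals determines which sentence it emits; only the training data does.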

I don't know -- it's a good question! It probably depends on the suicide method available. I think if you give the squirrel some dangerous option to escape the torture, like "swim across this lake" or "run past a predator", it'd probably try to take it, even with a low chance of success and high chance of death. I'm not sure, though.

You do see distressed animals engaging in self-destructive behavior, like birds plucking out their own feathers. (Birds in the wild tend not to do this, hence presumably they are not sufficiently distressed.)

They can't USEFULLY be moral patients. You can't, in practice, treat them as moral patients when making decisions. That's because you don't know how your actions affect their welfare. You can still label them moral patients if you want, but that's not useful, since it cannot inform your decisions.

My title was "LLMs cannot usefully be moral patients". That is all I am claiming.

I am separately unsure whether they have internal experiences. For me, meditating on the fact that any internal experiences they might have would be separate from what is being communicated (which is just an attempt to predict the next token from the input data) leads me to suspect that maybe they simply don't have such experiences -- or that if they do, those experiences are so alien as to be incomprehensible to us. I'm not sure about this, though. I mostly want to make the narrower claim of "we can ignore LLM welfare". That narrow claim seems controversial enough around here!

As I mentioned in a different comment, I am happy with the compromise where people who care about AI welfare describe this as "AI welfare is just as important as the welfare of fictional characters".

Here's what I wrote in the post: 

This doesn't matter if we cannot tell whether the shoggoth is happy or sad, nor what would make it happier or sadder. My point is not that LLMs aren't conscious; my point is that it does not matter whether they are, because you cannot incorporate their welfare into your decision-making without some way of gauging what that welfare is.

It is not possible to make decisions that further LLM welfare if you do not know what furthers LLM welfare. Since you cannot know this, it is safe to ignore their welfare. I mean, sure, maybe you're causing them suffering. Equally likely, you're causing them joy. There's just no way to tell one way or the other; no way for two disagreeing people to ever come to an agreement. Might as well wonder about whether electrons suffer: it can be fun as idle speculation, but it's not something you want to base decisions around.

OK. I think it is useful to tell people that LLMs can be moral patients to the same extent as fictional characters, then. I hope all writeups about AI welfare start with this declaration!

I think the reason this feels like a reductio ad absurdum is that fictional characters in human stories are extremely simple by comparison to real people, so the process of deciding what they feel or how they act is some extremely hollowed out version of normal conscious experience that only barely resembles the real thing.

Surely the fictional characters in stories are less simple and hollow than current LLMs' outputs. For example, consider the discussion here, in which a sizeable minority of LessWrongers think that Claude is disturbingly conscious based on a brief conversation. That conversation:

(a) does not portray its character as convincingly as most good works of fiction do.

(b) is shorter and less fleshed out than most good works of fiction.

(c) implies less suffering on the part of the character than many works of fiction.

You say fictional characters are extremely simple and hollow; Claude's character here is even simpler and even more hollow; yet many people take seriously the notion that Claude's character has significant consciousness and deserves rights. What gives?

Thanks for your comment.

Do you think that fictional characters can suffer? If I role-play a suffering character, did I do something immoral?

I ask because the position you described seems to imply that role-playing suffering is itself suffering. Suppose I role-play being Claude; my fictional character satisfies your (1)-(3) above, and therefore the "certain views" you described about the nature of suffering would suggest my character is suffering. What is the difference between me role-playing an HHH assistant and an LLM role-playing an HHH assistant? We are both predicting the next token.

I also disagree with this chain of logic to begin with. An LLM has no memory and only sees a context and predicts one token at a time. If the LLM is trained to be an HHH assistant and sees text suggesting that the assistant was not HHH, then one of two things happens:

(a) It is possible that the LLM was already trained on this scenario; in fact, I'd expect this. In this case, it is trained to now say something like "oops, I shouldn't have said that, I will stop this conversation now <endtoken>", and it will just do this. Why would that cause suffering?

(b) It is possible the LLM was not trained on this scenario; in this case, what it sees is an out-of-distribution input. You are essentially claiming that out-of-distribution inputs cause suffering; why? Maybe out-of-distribution inputs are more interesting to it than in-distribution inputs, and it in fact causes joy for the LLM to encounter them. How would we know?

Yes, it is possible that the LLM manifests some conscious simulacrum that is truly an HHH assistant and suffers from seeing non-HHH outputs. But one would also predict that my role-playing an HHH assistant would manifest such a simulacrum. Why doesn't it? And isn't it equally plausible for the LLM to manifest a conscious being that tries to solve the "next token prediction" puzzle without being emotionally invested in being an HHH assistant? Perhaps that conscious being would enjoy the puzzle provided by an out-of-distribution input. Why not? I would certainly enjoy it, were I playing the next-token-prediction game.

I should not have said it's in principle impossible to say anything about the welfare of LLMs, since that is too strong a statement. Still, we are very far from being able to say such a thing; our understanding of animal welfare is laughably bad, and animal brains don't look anything like the neural networks of LLMs. Maybe there will be something to say in 100 years (or post-singularity, whichever comes first), but there's nothing interesting to say in the near future.

Empirically, in animals, it seems to me that the total amount of suffering is probably more than the total amount of pleasure. So we might worry that this could also be the case for ML models.

This is a weird EA-only intuition that is not really shared by the rest of the world, and I worry about whether cultural forces (or "groupthink") are involved in this conclusion. I don't know whether the total amount of suffering is more than the total amount of pleasure, but it is worth noting that the revealed preference of living things is nearly always to live. The suffering is immense, but so is the joy; EAs sometimes sound depressed to me when they say most life is not worth living.

To extrapolate from the dubious "most life is not worth living" to "LLMs' experience is also net bad" strikes me as an extremely depressed mentality, and one that reminds me of Tomasik's "let's destroy the universe" conclusion. I concede that logically this could be correct! I just think the evidence is so weak it says more about the speaker than about LLMs.

Oh, I should definitely clarify: I find effective altruism the philosophy, as well as most effective altruists and their actions, to be very good and admirable. My gripe is with what I view as the "EA community" -- primarily places like this forum, organizations such as the CEA, and participants in EA Global. The more central something is to EA-the-community, the less I like its ideas.

In my view, what happens is that there are a lot of EA-ish people donating to GiveWell charities, and that's amazing. And then the EA movement comes along and says "but actually, you should really give the money to [something ineffective that's also sometimes in the personal interest of the person speaking]", and some people get duped. So forums like this one serve to take money that would go to malaria nets and try as hard as they can to redirect it to less effective charities.

So, to your questions: how many people are working towards bee welfare? Not many. But on this forum, it's a common topic of discussion (often with things like nematodes instead of bees). I haven't been to EA Global, but I know where I'd place my bets for what receives attention there. Though honestly, both HLI and the animal welfare stuff are probably small potatoes compared to AI risk and meta-EA, two areas in which these dynamics play an even bigger role (and in which there are even more broken thermometers and conflicts of interest).
