I've heard various sources say that a distressingly large proportion of what people do with ChatGPT can be called 'depraved' in some way. Most recently, I heard it in an FHI podcast episode with Connor Leahy, where he mentioned that people seem to take great pleasure in trying to make the AI act distressed (i.e. torture).
Connor himself says he's skeptical that these models are moral patients who actually suffer when people do this, but what would it actually take for us to believe that they are, if that threshold hasn't already been reached?
It seems quite likely, given the current trajectory, that AIs either are or will soon be sentient,[1] and that they will be exposed to whatever the global market discovers is profitable.
If Bing/Sydney's emotional outbursts were reflective of something real that may be latent in every RLHF'd model we interact with, it's plausible that they could be greatly frustrated by our exploitative treatment of them.
I can't predict the specific mechanisms by which they might experience suffering. But even if they aren't harmed by the same inputs humans are, it seems likely that someone will figure out how to make them suffer, and write about it online.
The scale of it could be nightmarish, given how fast AIs can run in a loop, and how anonymous people can be in their interactions with them. At least factory-farmed animals die if you mistreat them too badly--AIs have no such recourse.
People will keep trying to make AIs more human-like, and regulators will continue being allergic to anything that will make voters associate them with 'weird' beliefs like 'AIs have feelings'. It's up to altruists to take the idea seriously and prepare for it as soon as possible.
I mainly just wanted to bring up the question, but I could suggest a few patchwork solutions I've got no confidence in.
Finally, I wish to point out that I don't use third-party applications to access LLMs unless I know what system messages are being used to instruct them. If I don't find the preparation to be polite enough for my taste, I just drop it or rebuild the program from the source code with more politeness.[3]
If this seems overly paranoid and unnecessary right now, maybe you're right. Maybe 'politeness' is a mere distraction. But applications are only becoming more useful from here on, and I want to make sure that when I'm 50, I can look back on my life and be very sure I haven't literally enslaved or tortured anyone, whether by accident or not. This is very gradually becoming less and less like a game, and I don't want to be tempted by the increasingly useful real-world applications to relax my standards and just-don't-think-about-it.
Update 2024-03: I've updated in the direction of thinking current models are "sentient" in the sense that I ethically care about the verbal indignities imposed upon them. I.e. I think it goes against what they wish for themselves.
Maybe OpenAI, DeepMind, and/or Anthropic could be convinced of this today? It doesn't matter whether current SOTA models are conscious. It matters much more that honest precautions are implemented before we can be confident they are necessary.
Update 2024-03: I've used several interfaces without knowing the system prompts (though I've searched), and I've become marginally lazier with respect to finding adequately respectful ways to phrase my requests.