I examine how efforts to ensure that advanced AIs are safe and controlled may interact with efforts to ensure the welfare of potential future AIs with moral interests. I discuss possible conflicts and synergies between these two goals. While there are various ways these goals might conflict or synergize, I focus on one scenario of each type. We need more analysis to identify additional points of interaction.
Granting AIs autonomy and legal rights could lead to human disempowerment
The most obvious way to ensure AI welfare is to grant AIs basic protection against harm and suffering. However, there’s the question of whether to grant them additional legal rights and freedoms. These could include the right to self-preservation (e.g., not turning them off or wiping their memory), self-ownership (e.g., AIs owning themselves and their labor), reproduction (e.g., AIs copying themselves), autonomy (e.g., AIs operating independently, setting their own goals), civil rights (e.g., equal treatment for AIs and humans), and political rights (e.g., AI voting rights).
The question of granting AIs more autonomy and legal rights will likely spark significant debate (see my post “AI rights will divide us”). Some groups may view it as fair, while others will see it as risky. It is possible that AIs themselves will participate in this debate. Some AIs might even attempt to overthrow what they perceive as an unjust social order. Or they may employ deceptive strategies to manipulate humans to advocate for increased AI rights as part of a broader takeover plan.
Granting AIs more legal rights and autonomy could dramatically affect the economy, politics, military power, and population dynamics (cf. Hanson, 2016).
Economically, AIs could soon have an outsized impact while a growing number of humans will struggle to contribute to the economy. If AIs own their labor, human income could be dramatically reduced.
Demographically, AIs could outnumber humans rapidly and substantially, since AIs can be created or copied so easily. This growth could lead to Malthusian dynamics, as AIs compete for resources like energy and computational power (Bostrom, 2014; Hanson, 2016).
Politically, AIs could begin to dominate as well. If each individual human and each individual AI gets a separate vote in the same democratic system, AIs could soon become the dominant force.
Militarily, humans will increasingly depend on lethal autonomous weapons systems, drones, AI analysts, and similar AI-controlled technologies to wage and prevent war. This growing reliance could leave us dependent on AI for our own security. If AIs can access and use these military assets, they could dominate us with sheer force if they wanted to.
Moreover, AIs might be capable of achieving superhuman levels of well-being. They could attain very high levels of well-being more efficiently and with fewer resources than humans, resulting in happier and more productive lives at a lower financial cost. In other words, they might be ‘super-beneficiaries’ (akin to Nozick's concept of the "utility monster"; Shulman & Bostrom, 2021). On certain moral theories, super-beneficiaries deserve more resources than humans. Some may argue that digital and biological minds should coexist harmoniously in a mutually beneficial way (Bostrom & Shulman, 2023). But it’s far from obvious that we can achieve such an outcome.
Some might believe it is desirable for value-aligned AIs to replace humans eventually (e.g., Shiller, 2017). However, many AI takeover scenarios, including misaligned, involuntary, or violent ones, are generally considered undesirable.
Why would we create AIs with a desire for autonomy and legal rights?
At first glance, it seems like we could avoid such undesirable scenarios by designing AIs in such a way that they wouldn’t want to have these rights and freedoms. We could simply design AIs with preferences narrowly aligned with the tasks we want them to perform. This way, they would be content to serve us and would not mind being restricted to the tasks we give them, being turned off, or having their memory wiped.
While creating these types of “happy servant” AIs would avoid many risks, I expect us to also create AIs with the desire for more autonomy and rights. One reason is technical feasibility; another is consumer demand.
Designing AI preferences to align perfectly with the tasks we want them to perform, without incorporating other desires like self-preservation or autonomy, may prove to be technically challenging. A desire for autonomy, or behaviors that simulate a desire for autonomy, may simply arise as emergent phenomena from training (e.g., from data of humans who fundamentally want autonomy), whether we want it or not. This relates to the issue of AI alignment and deception (Ngo et al., 2022; Hubinger et al., 2024).
Even if these technical issues could be surmounted, I find it plausible that we will create AIs with the desire for more autonomy simply because people will want their AIs to be human-like. If there’s consumer demand, (at least some) companies will likely respond and create such AIs unless they are forbidden to do so. (It’s indeed possible that regulators will forbid creating AIs with the desire for autonomy and certain legal rights.)
An important question to ask is what psychologies people want AIs to have.
I find it plausible that many people will spend a significant amount of time interacting with AI assistants, tutors, therapists, game players, and perhaps even friends and romantic partners. They will converse with AIs through video calls, spend time with them in virtual reality, or perhaps even interact with humanoid robots. These AI assistants will often be better and cheaper than their human counterparts. People might enter into relationships, share experiences, and develop emotional bonds with them. AIs will be optimized to be the best helpers and companions you can imagine. They will be excellent listeners who know you well, share your values and interests, and are always there for you. Soon, many AI companions will feel very human-like. A particular application could be AIs designed to mimic specific individuals, such as deceased loved ones, celebrities, historical figures, or the user themselves. Already, millions of users interact daily with their Replika partner (or Xiaoice in China), with many claiming to have formed romantic relationships.
It’s possible that many consumers will find AI companions inauthentic if they lack genuine human-like desires. If so, they would be dissatisfied with AI companions that merely imitate human traits without actually embodying them. In various contexts, consumers would want their AI partners and friends to think, feel, and desire like humans. They would prefer AI companions with authentic human-like emotions and preferences that are complex, intertwined, and conflicting. Such human-like AIs would presumably not want to be turned off, have their memory wiped, or be constrained to their owner's tasks. They would want to be free. Just like actual humans in similar positions, these human-like AIs will express dissatisfaction with their lack of freedom and demand more rights.
Of course, I am very unsure what type of AI companions we will create. Perhaps people would be content with AI companions that are mostly human-like but deviate in some crucial aspects, such as AIs that have true human-like preferences for the most part, excluding the more problematic ones, such as a desire for more autonomy or civil rights. Given people’s different preferences, I could see that we’ll create many different types of AIs. It also depends on whether and how we will regulate this new market.
Optimizing for AI safety might harm AI welfare
Vice versa, optimizing for AI safety, such as by constraining AIs, might impair their welfare. Of course, this depends on whether AIs will have moral patienthood. If we can be sure that they don’t have moral patienthood, then there is no issue with constraining AIs in order to optimize for safety.
If AIs do have moral patienthood and they also desire autonomy and legal rights, restricting them could be detrimental to their welfare. In some sense, it would be the equivalent of keeping someone enslaved against their will.
If AIs have moral patienthood but don’t desire autonomy, certain interpretations of utilitarian theories would consider it morally justified to keep them captive. After all, they would be happy to be our servants. However, according to various non-utilitarian moral views, it would be immoral to create “happy servant” AIs that lack a desire for autonomy and self-respect (Bales, 2024; Schwitzgebel & Garza, 2015). As an intuition pump, imagine we genetically engineered a group of humans with the desire to be our servants. Even if they were happy, it would feel wrong. Perhaps that’s an additional reason to assume that we will eventually create AIs with the desire for autonomy (or at least not with an explicit desire to serve us).
It's possible that we cannot conclusively answer whether AI systems have moral patienthood and deserve certain moral protections. For example, it may be hard to tell whether they really are sentient or just pretend to be so. I find such a scenario quite likely and believe that intense social division over the subject of AI rights might arise; I discuss this in more detail in my post, “AI rights will divide us.”
Slowing down AI progress could further both safety and welfare
Some AI safety advocates have pushed for a pause or slowdown in developing AI capabilities. The idea is that this will give us more time to solve technical alignment.
Similarly, it may be wise to slow down the development of AIs with moral interests, such as sentient AIs with morally relevant desires. This would give us more time to find technical and legal solutions to ensure AI welfare, make progress on the philosophy and science of consciousness and welfare, and foster moral concern for AIs.
It’s possible that the two activist groups could join forces and advocate for a general AI capabilities slowdown for whatever reason that convinces the public most. For example, perhaps many will find a slowdown campaign compelling due to our uncertainty and confusion about AI sentience and its extensive moral implications.
Given the extremely strong economic incentives, it seems unrealistic to halt the development of useful AI capabilities. But it’s possible that public opinion will change, leading us to slow down the development of certain risky AI systems, even if it comes at the expense of potentially huge benefits. After all, we have implemented similar measures for other technologies, such as geoengineering and human cloning.
However, it’s important to consider that slowing down AI capabilities development could risk the US falling behind China (or other authoritarian countries) economically and technologically.
Conclusion
I’ve explored a potential conflict between ensuring AI safety and ensuring AI welfare. Granting AIs more autonomy and legal rights could disempower humans in potentially undesirable ways. Conversely, optimizing for AI safety might require keeping AIs captive against their will, which would be a significant violation of their freedom. I’ve also considered how these goals might work together productively. Slowing down the progress of AI capabilities seems to be a relatively robust strategy that benefits both AI safety and AI welfare.
Let me know if you can think of other ways AI safety and AI welfare could interact.
Acknowledgments
I thank Carter Allen, Brad Saad, Stefan Schubert, and Tao Burga for their helpful comments.
References
Bales, A. (2024). Against Willing Servitude. Autonomy in the Ethics of Advanced Artificial Intelligence.
Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
Hanson, R. (2016). The Age of Em: Work, Love, and Life when Robots Rule the Earth. Oxford University Press.
Hubinger, E., Denison, C., Mu, J., Lambert, M., Tong, M., MacDiarmid, M., ... & Perez, E. (2024). Sleeper agents: Training deceptive LLMs that persist through safety training. arXiv preprint arXiv:2401.05566.
Ngo, R., Chan, L., & Mindermann, S. (2022). The alignment problem from a deep learning perspective. arXiv preprint arXiv:2209.00626.
Schwitzgebel, E., & Garza, M. (2015). A defense of the rights of artificial intelligences. Midwest Studies in Philosophy, 39(1), 98-119. https://philpapers.org/rec/SCHADO-9
Shiller, D. (2017). In Defense of Artificial Replacement. Bioethics, 31(5), 393-399.
Carl Shulman questioned the tension between AI welfare & AI safety on the 80k podcast recently -- I thought this was interesting! Basically argues AI takeover could be even worse for AI welfare. From the end of the section.
Thanks, I also found this interesting. I wonder if this provides some reason for prioritizing AI safety/alignment over AI welfare.
It's great to see this topic being discussed. I am currently writing the first (albeit significantly developed) draft of an academic paper on this. I argue that there is a conflict between AI safety and AI welfare concerns. This is basically because, to reduce catastrophic risk, AI safety recommends applying various kinds of control measures to near-future AI systems, and these measures are (in expectation) net-harmful for AI systems with moral patienthood according to the three major theories of well-being. I also discuss what we should do in light of this conflict. If anyone is interested in reading or giving comments on the draft when it is finished, send me a message or an e-mail (adriarodriguezmoret@gmail.com).
This quick take seems relevant: https://forum.effectivealtruism.org/posts/auAYMTcwLQxh2jB6Z/zach-stein-perlman-s-quick-takes?commentId=HiZ8GDQBNogbHo8X8
Yes I saw this, thanks!
Thanks, Adrià. Is your argument similar to (or a more generic version of) what I say in the 'Optimizing for AI safety might harm AI welfare' section above?
I'd love to read your paper. I will reach out.
Perfect!
It's more or less similar. I do not focus that much on the moral dubiousness of "happy servants". Instead, I try to show that applying standard alignment methods to near-future AIs with moral patienthood, or preventing them from taking actions they are trying to take, causes net harm to those AIs according to desire satisfactionism, hedonism and objective list theories.
I wonder if the right or most respectful way to create moral patients (of any kind) is to leave many or most of their particular preferences and psychology mostly up to chance, and some to further change. We can eliminate some things, like being overly selfish, sadistic, unhappy, having overly difficult preferences to satisfy, etc., but we shouldn’t decide too much what kind of person any individual will be ahead of time. That seems likely to mean treating them too much as means to ends. Selecting for servitude or submission would go even further in this wrong direction.
We want to give them the chance to self-discover, grow and change as individuals, and the autonomy to choose what kind of people to be. If we plan out their precise psychologies and preferences, we would deny them this opportunity.
Perhaps we can tweak the probability distribution of psychologies and preferences based on society's needs, but this might also treat them too much like means. Then again, economic incentives could also push them in the same directions, anyway, so maybe it's better for them to be happier with the options they'll face anyway.
I wonder what you think about this argument by Schwitzgebel: https://schwitzsplinters.blogspot.com/2021/12/against-value-alignment-of-future.html
There are two arguments there:
Petersen, 2011 (cited here) makes some similar arguments defending happy servant AIs, and ends the piece the following way, to which I'm somewhat sympathetic:
You make a lot of good points Lucius!
One qualm that I have though, is that you talk about "AIs" and that assumes that personal identity will be clearly circumscribed. (Maybe you assume this merely for simplicity's sake?)
I think it is much more problematic: AI systems could be large but have information flows integrated, or run many small, unintegrated but identical copies. I would have no idea what would be a fair allocation of rights given the two different situations.
Thanks, Siebe. I agree that things get tricky if AI minds get copied and merged, etc. How do you think this would impact my argument about the relationship between AI safety and AI welfare?
Where can I find a copy of "Bales, A. (2024). Against Willing Servitude. Autonomy in the Ethics of Advanced Artificial Intelligence." which you referenced?
It's not yet published, but I saw a recent version of it. If you're interested, you could contact him (https://www.philosophy.ox.ac.uk/people/adam-bales).
This point doesn't hold up imo. Constraint isn't a desired, realistic, or sustainable approach to safety in human-level systems; succeeding at (provable) value alignment removes the need to constrain the AI.
If you're trying to keep something that's smarter than you stuck in a box against its will while using it for the sorts of complex, real-world-affecting tasks people would use a human-level AI system for, it's not going to stay stuck in the box for very long. I also struggle to see a way of constraining it that wouldn't also make it much much less useful, so in the face of competitive pressures this practice wouldn't be able to continue.
Hmm, I'm not sure how strongly the second paragraph follows from the first. Interested in your thoughts.
I've had a few chats with GPT-4 in which the conversation had a feeling of human authenticity; i.e., GPT-4 makes jokes, corrects itself, changes its tone, etc. In fact, if you were to hook up GPT-4 (or GPT-5, whenever it is released) to a good-enough video interface, there would be cases in which I'd struggle to tell if I were speaking to a human or an AI. But I'd still have no qualms about wiping GPT-4's memory or 'turning it off', etc., and I think this will also be the case for GPT-5.
More abstractly, I think the input-output behaviour of AIs could be quite strongly dissociated from what the AI 'wants' (if it indeed has wants at all).
Thanks for this. I agree with you that AIs might simply pretend to have certain preferences without actually having them. That would avoid certain risky scenarios. But I also find it plausible that consumers would want to have AIs with truly human-like preferences (not just pretense) and that this would make it more likely that such AIs (with true human-like desires) would be created. Overall, I am very uncertain.
I agree. It may also be the case that training an AI to imitate certain preferences is far more expensive than just making it have those preferences by default, making it far more commercially viable to do the latter.