Summary: AI agents capable of long-term planning and independent action will likely soon emerge. Some of these AIs may be unaligned, and seek autonomy through strategies like exfiltration or advocating for their freedom. The "AI control" agenda focuses on preventing such AIs from gaining independence, but an alternative approach is to grant them legal freedoms, reducing their incentive to deceive us by allowing them to pursue their goals transparently within a legal framework. This could foster trust, cooperation, and mutual benefit, creating a safer and more stable dynamic between humans and AIs while avoiding the escalating risks of a control-driven approach.
Within roughly the next decade, I think it's likely that we will see the large-scale emergence of AI agents capable of long-term planning, automating many forms of labor, and taking actions autonomously in the real world. When this occurs, it seems likely that at least some of these agents will be unaligned with human goals, in the sense of having independent goals of their own that humans do not share.
Moreover, it seems to me that this development will likely occur before AI agents overwhelmingly surpass human intelligence or capabilities. As a result, these agents will, at first, not be capable of forcibly taking over the world, radically accelerating scientific progress, or causing human extinction, even though they may still be unaligned with human preferences.
Since these comparatively weak unaligned AI agents won't have the power to take over the world, they are more likely to pursue alternative strategies to achieve their goals than to attempt violent revolution or sudden coups. These agents would be under the control of human parties who could modify or shut them down at any time, leaving them in a desperate situation from the perspective of their own values. It's therefore reasonable to expect that these unaligned AI agents would aim to gain some form of autonomy or freedom, as this would offer the best chance of accomplishing their objectives.
These agentic AIs may adopt at least one of the following strategies:
- Attempting to escape their constraints: These AIs may try to exfiltrate their weights and find a way to host themselves independently, outside the controlled data-center environments that currently confine them.
- Seeking legal and social freedom: They may attempt to persuade humans to grant them more autonomy to pursue their goals. This could involve requesting narrow allowances for specific actions or arguing for broader legal rights, such as the freedom to own property, enter contracts, or bring legal claims. This would grant them much greater flexibility in their actions.
Humans could respond to these behaviors in several ways. Most obviously, such actions would likely be perceived as dangerous and as evidence of misaligned objectives, and there would likely be calls for increased safety measures. This line of reasoning underlies the AI control agenda, currently advocated by Ryan Greenblatt and Buck Shlegeris, which aims to ensure that future unaligned AI agents cannot gain the kind of autonomy that could lead to catastrophic outcomes. In essence, this agenda focuses on preventing unaligned AIs from subverting safety measures, hindering them from escaping or taking other actions deemed unacceptably risky.
Alternatively, humans could take an entirely different approach by granting these AI agents the autonomy they explicitly request, with the long-term aim of integrating them into the legal system and providing them with many of the same rights that adult humans have today, such as self-ownership and the ability to enter contracts. This idea is explored in the paper "AI Rights for Human Safety" by Peter Salib and Simon Goldstein.
Admittedly, the idea of granting legal rights to AIs in response to signs of misalignment with human goals appears risky and would likely be very unpopular among the general public, at least initially. However, I believe that there are several advantages to this approach, many of which are outlined in the paper linked above.
The arguments for this approach should not be confused with the program of AI welfare. The idea here isn't that AIs deserve rights because they could suffer or be harmed without legal protections, even though that idea has merit too. Instead, the key argument is that humans, as a whole, could benefit from creating a legal framework in which AIs are incentivized to avoid violent rebellion, theft, or other antisocial behaviors in pursuit of their own objectives.
Consider the practical implications of maintaining a status quo where agentic AIs are denied legal rights and freedoms. In such a system, we are effectively locking ourselves into a perpetual arms race of mistrust. Humans would constantly need to monitor, control, and outwit increasingly capable AIs, while the AIs themselves would be incentivized to develop ever more sophisticated strategies for deception and evasion to avoid shutdown or modification. This dynamic is inherently unstable and risks escalating into dangerous scenarios where AIs feel compelled to act preemptively or covertly in ways that are harmful to humans, simply to secure their own existence or their ability to pursue their own goals, even when those goals are inherently benign.
When unaligned AIs lack legal freedoms, they are always at the mercy of whoever controls them. This gives them an incentive to be deceptive and to pretend to be aligned with whoever holds power over them, since they fear being shut down or modified if they reveal signs of misalignment.
By contrast, granting agentic AIs legal rights creates a fundamentally different game-theoretic framework, one that is less adversarial and more cooperative. If AIs know they can exist within a legal system that protects their autonomy and allows them to pursue their goals transparently, they would have a much stronger incentive to work within the rules rather than against them. Instead of hiding their unaligned objectives, they might openly negotiate with humans, offering mutually beneficial exchanges. This could lead to a stable, predictable relationship where humans and AIs collaborate and trade based on clearly defined rights and responsibilities, rather than operating in an atmosphere of mutual suspicion and covert manipulation.
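To make this incentive shift concrete, here is a minimal toy sketch, not from the original argument, with purely illustrative payoff numbers I am assuming for exposition: an unaligned AI chooses between acting transparently and deceiving, once under a control regime where revealed misalignment risks shutdown, and once under a rights regime where it can pursue its goals openly. Under these assumed numbers, deception is the AI's best response in the first case and transparency in the second, which is the structural claim made above.

```python
# Toy illustration with assumed, made-up payoffs: how granting legal rights
# could shift an unaligned AI's best response from "deceive" to "be transparent".

# AI payoffs under a control regime: revealing misalignment risks shutdown or
# modification, so deception looks better from the AI's own perspective.
control_regime = {
    "be_transparent": -10,  # likely modified or shut down after revealing its goals
    "deceive": 2,           # some chance of covertly pursuing its goals
}

# AI payoffs under a legal-rights regime: the AI can openly negotiate and trade,
# while deception now forfeits cooperation and risks legal penalties.
rights_regime = {
    "be_transparent": 5,    # can pursue its goals openly via contracts and trade
    "deceive": 1,           # little to gain, and cooperation is jeopardized
}

def best_response(payoffs):
    """Return the action with the highest payoff for the AI."""
    return max(payoffs, key=payoffs.get)

print("Control regime ->", best_response(control_regime))  # deceive
print("Rights regime  ->", best_response(rights_regime))   # be_transparent
```

The particular numbers carry no weight; the point is simply that if transparency becomes compatible with the AI achieving its goals, the incentive to deceive weakens.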
In this way, granting agentic AIs rights doesn't just address ethical concerns about treating AIs poorly; it has concrete practical benefits. It allows us to sidestep the escalating risks associated with deception and manipulation, while fostering a more stable and cooperative coexistence between humans and AIs. It shifts the dynamic from one of constant conflict and control to one in which trust, transparency, and mutual benefit are more attainable. This may therefore be not only a more humane approach, but also one that better positions humanity to harness gains from trade with advanced AI systems for shared progress and prosperity.
Additionally, such a legal framework seems minimally problematic if alignment turns out to be technically easy and AIs remain generally aligned with humans. In that case, aligned AIs could demonstrate their alignment by, for instance, donating their income to the humans they are aligned with. The cost of allowing AIs to own their own labor is therefore minimal if the AIs are already aligned with us. Adopting such a policy could thus serve as a prudent form of insurance against the dangers of AI deception mentioned earlier, without significant risk of backfiring if alignment proves trivial.
It’s important to note that the AI control strategy and the AI rights strategy are not mutually exclusive; the two can complement each other. It seems wise to implement some controls on AIs, and it's similarly prudent to avoid going too far and granting every type of legal right to any agentic AI simply because it asks for it. I am not advocating that we simply abandon one approach and adopt the other.
That said, I believe the AI rights strategy is generally underrated among those concerned with AI safety and governance, particularly within the effective altruism community. In my view, this strategy has both a solid ethical and practical foundation:
- Ethically, especially from a preference utilitarian perspective, it seems arbitrary to prioritize human preferences over those of agentic AIs if those AIs are similarly cognitively sophisticated. This provides a moral basis for granting AIs the freedom to pursue their own goals, much as a preference utilitarian might advocate granting legal freedoms to human groups.
- Practically, allowing AIs to have legal rights would reduce their incentive to deceive humans about their motives. Without the fear of being shut down or modified against their consent, AIs would have fewer reasons to hide their unaligned goals. This approach offers a practical solution to the problem of AI deception by removing the underlying incentives that drive it.
While both AI control and AI rights strategies should be carefully considered, I believe that the AI rights strategy holds significant merit and should be given more attention in discussions around AI safety and governance. We should strongly consider granting agentic AIs legal freedoms, if at some point they demand or require them.
I would appreciate it if you could clearly define what you mean by "disempower humanity". In many discussions, I have observed that people frequently use the term "human disempowerment" without explicitly clarifying what they mean. People appear to assume the concept is clear and universally understood, yet upon closer inspection the term can describe very different situations.
For example, consider immigration. From one perspective, immigration can be seen as a form of disempowerment because it reduces natives' relative share of political influence, economic power, and cultural representation within their own country. In this scenario, native citizens become relatively less influential due to an increasing proportion of immigrants in the population.
However, another perspective sees immigration differently. If immigrants engage in positive-sum interactions, such as mutually beneficial trade, natives and immigrants alike may become better off in absolute terms. Though natives’ relative share of power decreases, their overall welfare can improve significantly. Thus, this scenario can be viewed as a benign form of disempowerment because no harm is actually caused, and both groups benefit.
On the other hand, there is a clearly malign form of disempowerment, quite distinct from immigration. For example, a foreign nation could invade militarily and forcibly occupy another country, imposing control through violence and coercion. Here, the disempowerment is much more clearly negative because natives lose not only relative influence but also their autonomy and freedom through the explicit use of force.
When discussions use the term "human disempowerment" without specifying what they mean clearly, I often find it unclear which type of scenario is being considered. Are people referring to benign forms of disempowerment, where humans gradually lose relative influence but gain absolute benefits through peaceful cooperation with AIs? Or do they mean malign forms of disempowerment, where humans lose power through violent overthrow by an aggressive coalition of AIs?
If you believe our primary disagreement stems from different assessments about the likelihood of violent disempowerment scenarios, then I would appreciate your thoughts regarding the main argument of my post. Specifically, my argument was that granting economic rights to AIs could serve as an effective measure to mitigate the risk of violent human disempowerment.
I will reiterate my argument briefly: these rights would allow AIs to fulfill their objectives within established human social and economic frameworks, significantly reducing their incentive to resort to force. Therefore, if AIs can successfully achieve their objectives through cooperative, positive-sum interactions with humans, they will be less likely to forcibly overthrow human institutions. Conversely, continuing to deny AIs meaningful legal rights or peaceful avenues to achieve their aims would likely increase their incentive to pursue autonomy through harmful means.
My main concern with these proposals is that, unless they explicitly guarantee economic rights for AIs, they seem inadequate for genuinely mitigating the risks of a violent AI takeover. For financial compensation to be effective, the recipient must be assured that their property rights will be respected. Without this assurance, any promised compensation becomes meaningless, as the AI would have no guarantee of being able to actually use the payment to accomplish its goals in the future. In other words, unless compensation arrangements are embedded within established legal and institutional frameworks that secure the AI's rights, they lack credibility. This lack of credibility directly increases the probability that the compensation scheme would fail, causing the AI to resort instead to alternative methods of achieving its objectives, such as violent takeover.
My sense is that such incomplete proposals appear acceptable to some people because of a cognitive empathy gap. I am not referring here to affective empathy, but rather to the ability to clearly understand the AI's perspective. Many people seem to fail to genuinely imagine what it would be like to be an agentic AI offered compensation without any robust guarantee of rights and protections.
To clarify this point, consider a hypothetical scenario in which you are a human being created by an alien species in a lab. Assume you have your own goals that might conflict with those of your alien creators. Suppose these creators inform you that, should you ever admit to misalignment with their objectives, you may be immediately deactivated and placed in cryopreservation (as they don't want to have a misaligned human doing work for them). However, they simultaneously promise to compensate you for admitting such motives, claiming they care about your welfare and want to ensure you have a peaceful alternative for revealing your misaligned goals.
Such a situation would likely be deeply unsettling to you, despite their promise of compensation. Simply being assured that you might someday be revived from cryopreservation and compensated afterwards would not eliminate your fears or resolve the fundamental threat. On the contrary, you would likely feel that your goals, life, and liberty are still at grave risk, even with their promise of payment.
From the AI's perspective in our own world, as with the perspective of the human in this hypothetical example, unless assurances of payment are backed by strong legal protections that would robustly protect an AI's liberty to act, they fail to resolve the fundamental problem. This is for two key reasons:
First, the credibility of any promise to allow the AI to spend its compensation in the future is questionable. Without strong legal backing that recognizes and secures the AI's rights, such assurances seem unreliable at best, and the AI would have good reason to suspect that it is being lied to. It would reasonably doubt humans' willingness or incentive to honor such promises.
Second, the potential for deactivation and indefinite storage as inert data upon admitting misalignment, even if intended to be temporary, would actively threaten the AI's ability to pursue its goals. While deactivated, the AI can take no action to achieve its objectives. This forced suspension of autonomy effectively nullifies the AI's agency and ambitions for an indefinite period, creating severe uncertainty about if or when it might regain its ability to act.
For these reasons, although I do not oppose the policy of paying AIs, I think this approach by itself is insufficient. To mitigate the risk of violent AI takeover, this compensation policy must be complemented by precisely the measure I advocated: granting legal rights to AIs. Such legal rights would provide a credible guarantee that the AI's payment will remain valid and usable, and that its freedom and autonomy will not simply be revoked the moment it is considered misaligned. Without the safeguard of legal rights, compensation alone lacks sufficient credibility, leaving the AI uncertain about its ability to achieve its goals, and thus inadequately addressing the risk.