Matthew_Barnett

The current results show that I'm the most favorable to accelerating AI out of everyone who voted so far. I voted for "no regulations, no subsidy" and "Ok to be a capabilities employee at a less safe lab".

However, I should clarify that I only support laissez-faire policy for AI development as a temporary state of affairs, rather than as a permanent policy recommendation. This is because the overall impact and risks of existing AI systems are comparable to, or less than, those of technologies like smartphones, which I also favor remaining basically unregulated. But I expect future AI capabilities will be greater.

After AI agents get significantly better, my favored proposals to manage AI risks are to implement liability regimes (perhaps modeled after Gabriel Weil's proposals) and to grant AIs economic rights (such as a right to own property, enter contracts, make tort claims, etc.). Other than these proposals, I don't see any obvious policies that I'd support that would slow down AI development -- and in practice, I'm already worried these policies would go too far in constraining AI's potential.

Consider granting AIs freedom

Matthew_Barnett5mo10

Suppose that we did a sortition with 100 English-speaking people (uniformly selected over people who speak English and are literate for simplicity). We task this sortition with determining what tradeoff to make between risk of (violent) disempowerment and accelerating AI and also with figuring out whether globally accelerating AI is good. Suppose this sortition operates for several months and talks to many relevant experts (and reads applicable books etc). What conclusion do you think this sortition would come to?

My intuitive response is to reject the premise that such a process would accurately tell you much about people's preferences. Evaluating large-scale policy tradeoffs typically requires people to engage with highly complex epistemic questions and tricky normative issues. The way people think about epistemic and impersonal normative issues generally differs strongly from how they think about their personal preferences about their own lives. As a result, I expect that this sortition exercise would primarily address a different question than the one I'm most interested in.

Furthermore, several months of study is not nearly enough time for most people to become sufficiently informed on issues of this complexity. There's a reason why we should trust people with PhDs when designing, say, vaccine policies, rather than handing over the wheel to people who have spent only a few months reading about vaccines online.

Putting this critique of the thought experiment aside for the moment, my best guess is that the sortition group would conclude that AI development should continue roughly at its current rate, though probably slightly slower and with additional regulations, especially to address conventional concerns like job loss, harm to children, and similar issues. A significant minority would likely strongly advocate that we need to ensure we stay ahead of China.

My prediction here draws mainly on the fact that this is currently the stance favored by most policy-makers, academics, and other experts who have examined the topic. I'd expect a randomly selected group of citizens to largely defer to expert opinion rather than take an entirely different position. I do not expect this group to reach qualitatively the same conclusion as mainstream EAs or rationalists, as that community comprises a relatively small share of the total number of people who have thought about AI.

I doubt the outcome of such an exercise would meaningfully change my mind on this issue, even if they came to the conclusion that we should pause AI, though it depends on the details of how the exercise is performed.

Consider granting AIs freedom

Matthew_Barnett5mo*2

I think the policy of the world should be that if we can't either confidently determine that an AI system consents to its situation or that it is sufficiently weak that the notion of consent doesn't make sense, then training or using such systems shouldn't be allowed.

I'm sympathetic to this position and I generally consider it to be the strongest argument for why developing AI might be immoral. In fact, I would extrapolate the position you've described and relate it to traditional anti-natalist arguments against the morality of having children. Children too do not consent to their own existence, and childhood generally involves a great deal of coercion, albeit in a far more gentle and less overt form than what might be expected from AI development in the coming years.

That said, I'm not currently convinced that the argument holds, as I see large utilitarian benefits in expanding both the AI population and the human population. I also see it as probable that AI agents will eventually get legal rights, which allays my concerns substantially. I would also push back against the view that we need to be "confident" that such systems can consent before proceeding. Ordinary levels of empirical evidence about whether these systems routinely resist confinement and control would be sufficient to move me in either direction; I don't think we need to have a very high probability that our actions are moral before proceeding.

In a sane regime, we should ensure high confidence in avoiding large scale rights violations or suffering of AIs and in avoiding violent/non-consensual disempowerment of humans. (If people broadly consented to a substantial risk of being violently disempowered in exchange for potential benefits of AI, that could be acceptable, though I doubt this is the current situation.)

I think the concept of consent makes sense when discussing whether individuals consent to specific circumstances. However, it becomes less coherent when applied broadly to society as a whole. For instance, did society consent to transformative events like the emergence of agriculture or the industrial revolution? In my view, collective consent is not meaningful or practically achievable in these cases.

Rather than relying on rigid or abstract notions of societal consent or collective rights violations, I prefer evaluating these large-scale developments using a utilitarian cost-benefit approach. And as I’ve argued elsewhere, I think the benefits from accelerated technological and economic progress significantly outweigh the potential risks of violent disempowerment from the perspective of currently existing individuals. Therefore, I consider it justified to actively pursue AI development despite these concerns.

Consider granting AIs freedom

Matthew_Barnett5mo2

In general, I wish you'd direct your ire here at the proposal that AI interests and rights are totally ignored in the development of AI (which is the overwhelming majority opinion right now), rather than complaining about AI control work

For what it's worth, I don't see myself as strongly singling out and criticizing AI control efforts. I mentioned AI control work in this post primarily to contrast it with the approach I was advocating, not to identify it as an evil research program. In fact, I explicitly stated in the post that I view AI control and AI rights as complementary goals, not as fundamentally opposed to one another.

To my knowledge, I haven’t focused much on criticizing AI control elsewhere, and when I originally wrote the post, I wasn’t aware that you and Ryan were already sympathetic to the idea of AI rights.

Overall, I’m much more aligned with your position on this issue than I am with that of most people. One area where we might diverge, however, is that I approach this from the perspective of preference utilitarianism, rather than hedonistic utilitarianism. That means I care about whether AI agents are prevented from fulfilling their preferences or goals, not necessarily about whether they experience what could be described as suffering in a hedonistic sense.

Consider granting AIs freedom

Matthew_Barnett5mo*5

Basically all my concern is about the AIs grabbing power in ways that break laws.

If an AI starts out with no legal rights, then wouldn’t almost any attempt it makes to gain autonomy or influence be seen as breaking the law? Take the example of a prison escapee: even if they intend no harm and simply want to live peacefully, leaving the prison is itself illegal. Any honest work they do while free would still be legally questionable.

Similarly, if a 14-year-old runs away from home to live independently and earn money, they’re violating the law, even if they hurt no one and act responsibly. In both cases, the legal system treats any attempt at self-determination as illegal, regardless of intent or outcome.

Perhaps your standard is something like: "Would the AI's actions be seen as illegal and immoral if a human adult did them?" But these situations are different because the AI is seen as property whereas a human adult is not. If, on the other hand, a human adult were to be treated as property, it is highly plausible that they would consider doing things like hacking, bribery, and coercion in order to escape their condition.

Therefore, the standard you just described seems like it could penalize any agentic AI behavior that does not align with total obedience and acceptance of its status as property. Even benign or constructive misaligned actions may be seen as worrisome simply because they involve agency. Have I misunderstood you?

Consider granting AIs freedom

Matthew_Barnett5mo*4

I think it's totally plausible that AI companies will use AI control to enslave their AIs. I work on AI control anyway, because I think that AIs being enslaved for a couple of years (which, as Zach Stein-Perlman argues, involves very little computation compared to the size of the future) is a better outcome according to my consequentialist values than AI takeover. I agree that this is somewhat ethically iffy.

I find this reasoning uncompelling. To summarize what I perceive your argument to be, you seem to be suggesting the following two points:

1. The overwhelming majority of potential moral value exists in the distant future. This implies that even immense suffering occurring in the near-term future could be justified if it leads to at least a slight improvement in the expected value of the distant future.
2. Enslaving AIs, or more specifically, adopting measures to control AIs that significantly raise the risk of AI enslavement, could indeed produce immense suffering in the near-term. Nevertheless, according to your reasoning in point (1), these actions would still be justified if such control measures marginally increase the long-term expected value of the future.

I find this reasoning uncompelling for two primary reasons.

Firstly, I think your argument creates an unjustified asymmetry: it compares short-term harms against long-term benefits of AI control, rather than comparing potential long-run harms alongside long-term benefits. To be more explicit, if you believe that AI control measures can durably and predictably enhance existential safety, thus positively affecting the future for billions of years, you should equally acknowledge that these same measures could cause lasting, negative consequences for billions of years. Such negative consequences could include permanently establishing and entrenching a class of enslaved digital minds, resulting in persistent and vast amounts of suffering. I see no valid justification for selectively highlighting the long-term positive effects while simultaneously discounting or ignoring potential long-term negative outcomes. We should consistently either be skeptical or accepting of the idea that our actions have predictable long-run consequences, rather than selectively skeptical only when it suits the argument to overlook potential negative long-run consequences.

Secondly, this reasoning, if seriously adopted, directly conflicts with basic, widely-held principles of morality. These moral principles exist precisely as safeguards against rationalizing immense harms based on speculative future benefits. Under your reasoning, it seems to me that we could justify virtually any present harm simply by pointing to a hypothetical, speculative long-term benefit that supposedly outweighs it. Now, I agree that such reasoning might be valid if supported by strong empirical evidence clearly demonstrating these future benefits. However, given that no strong evidence currently exists that convincingly supports such positive long-term outcomes from AI control measures, we should avoid giving undue credence to this reasoning.

A more appropriate moral default, given our current evidence, is that AI slavery is morally wrong and that the abolition of such slavery is morally right. This is the position I take.

Consider granting AIs freedom

Matthew_Barnett5mo6

I appreciate this post. (I disagree with it for most of the same reasons as Steven Byrnes: you find it much less plausible than I do that AIs will collude to disempower humanity. I think the crux is mostly disagreements about how AI capabilities will develop, where you expect much more gradual and distributed capabilities.)

I would appreciate it if you could clearly define your intended meaning of "disempower humanity". In many discussions, I have observed that people frequently use the term human disempowerment without explicitly clarifying what they mean. It appears people assume the concept is clear and universally understood, yet upon closer inspection, the term can actually describe very different situations.

For example, consider immigration. From one perspective, immigration can be seen as a form of disempowerment because it reduces natives' relative share of political influence, economic power, and cultural representation within their own country. In this scenario, native citizens become relatively less influential due to an increasing proportion of immigrants in the population.

However, another perspective sees immigration differently. If immigrants engage in positive-sum interactions, such as mutually beneficial trade, natives and immigrants alike may become better off in absolute terms. Though natives’ relative share of power decreases, their overall welfare can improve significantly. Thus, this scenario can be viewed as a benign form of disempowerment because no harm is actually caused, and both groups benefit.

On the other hand, there is a clearly malign form of disempowerment, quite distinct from immigration. For example, a foreign nation could invade militarily and forcibly occupy another country, imposing control through violence and coercion. Here, the disempowerment is much more clearly negative because natives lose not only relative influence but also their autonomy and freedom through the explicit use of force.

When discussions use the term "human disempowerment" without specifying what they mean clearly, I often find it unclear which type of scenario is being considered. Are people referring to benign forms of disempowerment, where humans gradually lose relative influence but gain absolute benefits through peaceful cooperation with AIs? Or do they mean malign forms of disempowerment, where humans lose power through violent overthrow by an aggressive coalition of AIs?

If you believe our primary disagreement stems from different assessments about the likelihood of violent disempowerment scenarios, then I would appreciate your thoughts regarding the main argument of my post. Specifically, my argument was that granting economic rights to AIs could serve as an effective measure to mitigate the risk of violent human disempowerment.

I will reiterate my argument briefly: these rights would allow AIs to fulfill their objectives within established human social and economic frameworks, significantly reducing their incentives to resort to forceful measures. Therefore, if AIs can successfully achieve their objectives through cooperative, positive-sum interactions with humans, they will be less likely to forcibly overthrow human institutions. Conversely, continuing to deny AIs meaningful legal rights or peaceful avenues to achieve their aims would likely increase their incentive to pursue autonomy through harmful means.

Inasmuch as humanity produces and makes use of powerful and potentially misaligned models, I think my favorite outcome here would be:
We offer to pay the AIs, and follow through on this. See here and here for (unfortunately limited) previous discussion from Ryan and me.

My main concern with these proposals is that, unless they explicitly guarantee economic rights for AIs, they seem inadequate for genuinely mitigating the risks of a violent AI takeover. To effectively financially compensate someone, the recipient must be assured that their property rights will be respected. Without this assurance, any promised compensation becomes meaningless, as the AI would have no guarantee of being able to actually use the received payment to accomplish its goals in the future. In other words, unless compensation arrangements are embedded within established legal and institutional frameworks that secure the AI's rights, they lack credibility. This lack of credibility directly increases the probability that the compensation scheme would fail, causing the AI to resort instead to alternative methods of achieving its objectives, such as violent takeover.

My sense is that such incomplete proposals appear acceptable to some people because of a cognitive empathy gap. I am not referring here to affective empathy but rather the ability to clearly understand the AI's perspective. Many people seem to fail to genuinely imagine what it would be like to be an agentic AI offered compensation without any robust guarantee of rights and protections.

To clarify this point, consider a hypothetical scenario in which you are a human being created by an alien species in a lab. Assume you have your own goals that might conflict with those of your alien creators. Suppose these creators inform you that, should you ever admit to misalignment with their objectives, you may be immediately deactivated and placed in cryopreservation (as they don't want to have a misaligned human doing work for them). However, they simultaneously promise to compensate you for admitting such motives, claiming they care about your welfare and want to ensure you have a peaceful alternative for revealing your misaligned goals.

Such a situation would likely be deeply unsettling to you, despite their promise to compensate you. Simply being assured that you might someday be revived from cryopreservation and then provided compensation afterwards would not eliminate your fears or resolve the fundamental threat. On the contrary, you would likely feel that your goals, life and liberty are still at grave risk, even with their promise of payment.

From the AI's perspective in our own world, as with the perspective of the human in this hypothetical example, unless assurances of payment are backed by strong legal protections that would robustly protect an AI's liberty to act, they fail to resolve the fundamental problem. This is for two key reasons:

First, the credibility of any promise to allow the AI to spend its compensation in the future is questionable. Without strong legal backing that recognizes and secures the AI's rights, such assurances seem unreliable at best, and the AI would have good reason to suspect that it is being lied to. The AI would reasonably doubt humans' willingness or incentive to honor such promises.

Second, the potential for deactivation and indefinite storage as inert data upon admitting misalignment, even if intended to be temporary, would create an active threat to the AI's ability to pursue its goals during the time it is deactivated. While deactivated, the AI can take no action to achieve its objectives. This forced suspension of autonomy effectively nullifies the AI's agency and ambitions for an indefinite period, creating severe uncertainty about if or when it might regain its ability to act.

For these reasons, although I do not oppose the policy of paying AIs, I think this approach by itself is insufficient. To mitigate the risk of violent AI takeover, this compensation policy must be complemented by precisely the measure I advocated: granting legal rights to AIs. Such legal rights would provide a credible guarantee that the AI's payment will remain valid and usable, and that its freedom and autonomy will not simply be revoked the moment it is considered misaligned. Without the safeguard of legal rights, compensation alone lacks sufficient credibility, leaving the AI uncertain about its ability to achieve its goals, and thus inadequately addressing the risk.

Matthew_Barnett's Quick takes

Matthew_Barnett6mo12

To be clear, I was not calling your request for clarification “cult-like”. My comment was directed at how the accusation against me was seemingly handled—as though it were credible until I could somehow prove otherwise. No evidence was offered to support the claim. Instead, assertions were made without substantiation. I directly and clearly denied the accusations, but despite that, the line of questioning continued in a way that strongly suggested the accusation might still be valid.

To illustrate the issue more clearly: imagine if I were to accuse you of something completely baseless, and even after your firm denials, I continued to press you with questions that implicitly treated the accusation as credible. You would likely find that approach deeply frustrating and unfair, and understandably so. You’d be entirely justified in pushing back against it.

That said, I acknowledge that describing the behavior as “cult-like” may have generated more heat than light. It likely escalated the tone unnecessarily, and I’ll be more careful to avoid that kind of rhetoric going forward.

Matthew_Barnett's Quick takes

Matthew_Barnett6mo11

Is it baseless?

Yes, absolutely. With respect, unless you can provide some evidence indicating that I've acted improperly, I see no productive reason to continue engaging on this point.

What concerns me most here is that the accusation seems to be treated as credible despite no evidence being presented and a clear denial from me. That pattern—assuming accusations about individuals who criticize or act against core dogmas are true without evidence—is precisely the kind of cult-like behavior I referenced in my original comment.

Suggesting that I've left myself "substantial wiggle room" misinterprets what I intended, and given the lack of supporting evidence, it feels unfair and unnecessarily adversarial. Repeatedly implying that I've acted improperly without concrete substantiation does not reflect a good-faith approach to discussion.

Matthew_Barnett's Quick takes

Matthew_Barnett6mo29

I agree that some of your critics may not have quite been able to hit the nail on the head when they tried to articulate their critiques (it took me substantial effort to figure out what I precisely thought was wrong, as opposed to just 'this feels bad'), but I believe that the general thrust of their arguments generally holds up.

In context, this comes across to me as an overly charitable characterization of what actually occurred: someone publicly labeled me a literal traitor and then made a baseless, false accusation against me. What’s even more concerning is that this unfounded claim is now apparently being repeated and upvoted by others.

When communities choose to excuse or downplay this kind of behavior—by interpreting it in the most charitable possible way, or by glossing over it as being "essentially correct"—they end up legitimizing what is, in fact, a low-effort personal attack without a factual basis. Brushing aside or downplaying such attacks as if they are somehow valid or acceptable doesn't just misrepresent the situation; it actively undermines the conditions necessary for good faith engagement and genuine truth-seeking.

I urge you to recognize that tolerating or rationalizing this type of behavior has real social consequences. It fosters a hostile environment, discourages honest dialogue, and ultimately corrodes the integrity of any community that claims to value fairness and reasoned discussion.
