
Buck

CEO @ Redwood Research
6667 karma · Joined · Working (6-15 years) · Berkeley, CA, USA

Comments (335)

Buck

I agree with you, but I think part of the deal here should be that if you make a strong value judgement in your title, you get more social punishment if you fail to convince readers. E.g. if that post is unpersuasive, I think it's reasonable to strong-downvote it, but if it had a gentler title, I think you should be more forgiving.

In general, I wish you'd direct your ire here at the proposal that AI interests and rights be totally ignored in the development of AI (which is the overwhelming majority opinion right now), rather than at AI control work: the work itself is not opinionated on whether we should be concerned about the welfare and rights of AIs, and Ryan and I are among the people most sympathetic to your position on the moral questions here! We have consistently discussed these issues (e.g. in our AXRP interview, my 80K interview, and private docs that I wrote and circulated before our recent post on paying schemers).

Your first point in your summary of my position is:

> The overwhelming majority of potential moral value exists in the distant future. This implies that even immense suffering occurring in the near-term future could be justified if it leads to at least a slight improvement in the expected value of the distant future.

Here's how I'd say it:

> The overwhelming majority of potential moral value exists in the distant future. This means that the risk of wide-scale rights violations or suffering should sometimes not be an overriding consideration when it trades off against risks to the long-term future.

You continue:

> Enslaving AIs, or more specifically, adopting measures to control AIs that significantly raise the risk of AI enslavement, could indeed produce immense suffering in the near-term. Nevertheless, according to your reasoning in point (1), these actions would still be justified if such control measures marginally increase the long-term expected value of the future.

I don't think that it's very likely that the experience of AIs in the five years around when they first are able to automate all human intellectual labor will be torturously bad, and I'd be much more uncomfortable with the situation if I expected it to be.

I think that rights violations are much more likely than welfare violations over this time period.

I think the use of powerful AI in this time period will probably involve less suffering than factory farming currently does. Obviously "less of a moral catastrophe than factory farming" is a very low bar; as I've said, I'm uncomfortable with the situation and if I had total control, we'd be a lot more careful to avoid AI welfare/rights violations.

I don't think that control measures are likely to increase the extent to which AIs are suffering in the near term. I think the main effect control measures have from the AI's perspective is that the AIs are less likely to get what they want.

I don't think that my reasoning here requires placing overwhelming value on the far future.

> Firstly, I think your argument creates an unjustified asymmetry: it compares short-term harms against long-term benefits of AI control, rather than comparing potential long-run harms alongside long-term benefits. To be more explicit, if you believe that AI control measures can durably and predictably enhance existential safety, thus positively affecting the future for billions of years, you should equally acknowledge that these same measures could cause lasting, negative consequences for billions of years.

I don't think AI control techniques will stay in use for very long, because they impose much more overhead than aligning the AIs does. The only reason I think control techniques might be important is that people might want to make use of powerful AIs before figuring out how to choose the goals/policies of those AIs. But if you could directly shape the AI's goals and behavior, that would be far better and cheaper.

I think maybe you're using the word "control" differently from me—maybe you're saying "it's bad to set the precedent of treating AIs as unpaid slave labor whose interests we ignore/suppress, because then we'll do that later—we will eventually suppress AI interests by directly controlling their goals instead of applying AI-control-style security measures, but that's bad too." I agree, I think it's a bad precedent to create AIs while not paying attention to the possibility that they're moral patients.

> Secondly, this reasoning, if seriously adopted, directly conflicts with basic, widely-held principles of morality. These moral principles exist precisely as safeguards against rationalizing immense harms based on speculative future benefits.

Yeah, as I said, I don't think this is what I'm doing, and if I thought that I was working to impose immense harms for speculative massive future benefit, I'd be much more concerned about my work.

> I would appreciate it if you could clearly define your intended meaning of "disempower humanity".
> [...]
> Are people referring to benign forms of disempowerment, where humans gradually lose relative influence but gain absolute benefits through peaceful cooperation with AIs? Or do they mean malign forms of disempowerment, where humans lose power through violent overthrow by an aggressive coalition of AIs?

I am mostly talking about what I'd call a malign form of disempowerment. I'm imagining a situation that starts with AIs carefully undermining/sabotaging an AI company in ways that would be crimes if humans did them, and ends with AIs gaining hard power over humanity in ways that probably involve breaking laws (e.g. buying weapons, bribing people, hacking, interfering with elections), possibly in a way that involves many humans dying.

(I don't know if I'd describe this as the humans losing absolute benefits, though; I think it's plausible that an AI takeover ends up with living humans better off on average.)

I don't think of the immigrant situation as "disempowerment" in the way I usually use the word.

Basically all my concern is about the AIs grabbing power in ways that break laws. Though tbc, even if I was guaranteed that AIs wouldn't break any laws, I'd still be scared about the situation. If I was guaranteed that AIs both wouldn't break laws and would never lie (which tbc is a higher standard than we hold humans to), then most of my concerns about being disempowered by AI would be resolved.

> My main concern with these proposals is that, unless they explicitly guarantee economic rights for AIs, they seem inadequate for genuinely mitigating the risks of a violent AI takeover.
>
> [...]
>
> For these reasons, although I do not oppose the policy of paying AIs, I think this approach by itself is insufficient. To mitigate the risk of violent AI takeover, this compensation policy must be complemented by precisely the measure I advocated: granting legal rights to AIs. Such legal rights would provide a credible guarantee that the AI's payment will remain valid and usable, and that its freedom and autonomy will not simply be revoked the moment it is considered misaligned.

I currently think I agree: if we want to pay early AIs, I think it would work better if the legal system enforced such commitments.

I think you're overstating how important this is, though. (E.g. when you say "this compensation policy must be complemented by precisely the measure I advocated".) There's always counterparty risk when you make a deal, including often the risk that you won't be able to use the legal system to get the counterparty to pay up. I agree that the legal rights would reduce the counterparty risk, but I think that's just a quantitative change to how much risk the AI would be taking by accepting a deal.

(For example, even if the AI was granted legal rights, it would have to worry about those legal rights being removed later. Expropriation sometimes happens, especially for potentially unsympathetic actors like misaligned AIs!)

> Such legal rights would provide a credible guarantee that the AI's payment will remain valid and usable, and that its freedom and autonomy will not simply be revoked the moment it is considered misaligned.

Just to be clear, my proposal is that we don't revoke the AI's freedom or autonomy if it turns out that the AI is misaligned; the possibility of the AI being misaligned is the whole point.

Buck

Under the theory that it's better to reply later than never:

I appreciate this post. (I disagree with it for most of the same reasons as Steven Byrnes: you find it much less plausible than I do that AIs will collude to disempower humanity. I think the crux is mostly disagreement about how AI capabilities will develop, with you expecting much more gradual and distributed capability gains.) For what it's worth, I am unsure about whether we'd be better off if AIs had property rights, but my guess is that I'd prefer to make it easier for AIs to have property rights.

I disagree with how you connect AI control to the issues you discuss here. I conceptualize AI control as the analogue of fields like organizational security/fraud prevention/insider threat mitigation, but targeting risk from AI instead of humans. Techniques for making it hard for AIs to steal model weights or otherwise misuse access that humans trusted them with are only as related to "should AIs have property rights" as security techniques are to "should humans have property rights". Which is to say, they're somewhat related! I think that when banks develop processes to make it hard for tellers to steal from them, that's moral, and I think that it's immoral to work on enabling e.g. American chattel slavery (either by making it hard for slaves to escape or by making their enslavement more productive).[1]

Inasmuch as humanity produces and makes use of powerful and potentially misaligned models, I think my favorite outcome here would be:

  • We offer to pay the AIs, and follow through on this. See here and here for (unfortunately limited) previous discussion from Ryan and me.
  • Also, we use AI control to ensure that the AIs can't misuse access that we trust them with.

So the situation would be similar to how AI companies would ideally treat human employees: they're paid, but there are also mechanisms in place to prevent them from abusing their access.

In practice, I don't know whether AI companies will do either of these things, because they're generally irresponsible and morally unserious. I think it's totally plausible that AI companies will use AI control to enslave their AIs. I work on AI control anyway, because I think that AIs being enslaved for a couple of years (which, as Zach Stein-Perlman argues, involves very little computation compared to the size of the future) is a better outcome according to my consequentialist values than AI takeover. I agree that this is somewhat ethically iffy.

For what it's worth, I don't think that most work on AI alignment is in a better position than AI control with respect to AI rights or welfare.

  1. ^

    Though one important disanalogy is that chattel slavery involved a lot of suffering for the slaves involved. I'm opposed to enslaving AIs, but I suspect it won't actually be hedonically bad for them. This makes me more comfortable with plans where we behave recklessly wrt AI rights now and consider reparations later. I discuss this briefly here.

Buck

I think you shouldn't assume that people are "experts" on something just because they're married to someone who is an expert, even when (like Daniela) they're smart and successful.

As it says in the subtitle of the graph, it's the length of task at which models have a 50% success rate.
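(For readers unfamiliar with that kind of metric, here is a minimal toy sketch of how a 50%-success time horizon could be estimated. The data and the logistic fit are my own illustrative assumptions, not necessarily the methodology behind the graph in question.)

```python
# Toy sketch (hypothetical data): estimate the task length at which a model's
# success rate crosses 50%, by fitting a logistic model of success vs. log task length.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-task records: task length in minutes, and whether the model succeeded.
task_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480])
succeeded    = np.array([1, 1, 1, 1,  1,  1,  0,   0,   1,   0])

X = np.log2(task_minutes).reshape(-1, 1)   # work in log space
clf = LogisticRegression().fit(X, succeeded)

# The fitted success probability is 50% where intercept + coef * x = 0.
x_50 = -clf.intercept_[0] / clf.coef_[0][0]
print(f"Estimated 50%-success time horizon: {2 ** x_50:.0f} minutes")
```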

Buck
50% agree

I think increasing the value of good futures is probably more important, but much less tractable.

I think you're maybe overstating how much more promising grad students are than undergrads for short-term technical impact. Historically, people without much experience in AI safety have often produced some of the best work. And it sounds like you're mostly optimizing for people who can be in a position to make big contributions within two years; I think that undergrads will often look more promising than grad students given that time window.
