A brief list of ways AI safety efforts could be net negative

Elias Schmied

2 min read · Jun 19

Comments 15

Jamie_Harris

Future life may turn out to be net negative (e.g. s-risks might become real), and work that preserves the chances of humanity spreading may essentially be enabling that.

In almost the opposite direction, AI might just be very useful and beneficial and a bunch of the risks people in AI safety worry about might not end up becoming real, or might be quite easy to overcome. And in the meantime, AI safety work may have:

Slowed down the rate at which we achieve those things
Decreased the chance we ever achieve them (e.g. if a pause leads to a ban etc)
Distracted talent and money from working on more valuable things

Elias Schmied

On your first sentence, I'm not sure if you read footnote 3 - I believe the "value of the future" bullet point there covers why I didn't include this (combined with AIs probably being moral patients).

And your three bullet points, very briefly:

From a longtermist perspective, not important.
I consider permanent stagnation pretty unlikely for evolutionary reasons: eventually, factions that want to grow will be selected for.
Yeah it's a factor, but AI just seems important enough that I highly doubt that it's a major one.

Jamie_Harris

I hadn't read footnote 3! And I agree that the 2nd list looks less important from a solely longtermist perspective. (They may still be reasons that AI safety ends up doing significant harm though; I hadn't realised you were taking a 'strong longtermist' stance in the post.)

Elias Schmied

Ah yeah, maybe I should have made that clear, sorry. The original post that the list came from did, and I didn't think of that when I just copied the list over.

titotal

I can add four more to the list of possible dangers:

Effective altruists tend to be more utilitarian, more "maximizing" with regard to ethics, and more willing to bite philosophical bullets, compared to the general population. EA's involvement in AI safety could end up spreading these beliefs to future AI. It's well established how this sort of utilitarian reasoning can go wrong (utility monsters, etc), so this could increase the chance of disaster.
Effective altruists also have an implicit epistemology that differs from typical subject matter experts. If there are flaws in parts of the epistemology of EA or other groups involved in AI safety (which I believe to be true), then these flaws could be absorbed by future AI systems, and give rise to negative outcomes due to poor decision-making, etc.
People that are informed by AI safety arguments could decide that extreme actions are necessary to avoid AI doom, resulting in violence or rash actions that cause extreme harm.
The focus on AI safety in particular could result in less attention being paid to other threats to humanity that turn out to be more pressing, but are neglected due to the overrepresentation of AI safety people in the existential threat field.

Elias Schmied

Interesting stuff! I probably don't buy these as major worries but yeah, interesting.

Geoffrey Miller

Elias -- good post, and it's worth being aware of these potential downsides.

To me, the most significant of these is the risk of legitimate AI safety efforts being co-opted into 'AI safety-washing' by companies such as Anthropic -- which systematically recruit naive young EAs to do their 'safety work', but that work ends up being little more than corporate virtue-signaling.

Anthony DiGiovanni 🔸

"Most of our impact comes from acausal effects, and effects on the base reality if we are in a simulation: I’m confused here like everyone else, but I currently don’t buy this as a major factor because we only know our reality, and therefore the same things that are good here should naively also have good acausal effects in expectation."

If I understand correctly, this is the "Extrapolation" response to unawareness I discuss here. What do you think of my response?

Elias Schmied

I think it's not quite your "Extrapolation", because it's specifically about the acausal mechanism - by definition, the only (EDIT: direct) acausal effect possible is to make other agents take similar actions to us.
(and then the simulation thing I kind of sweep under the rug because the footnote was quickly written, but the argument is somewhat similar (although very vague and I'd like something better): Whatever purpose our simulators have for simulating us, it's probably good for their reality too if we reach a good outcome in the simulation.)

Anthony DiGiovanni 🔸

Hmm, these arguments seem too anchored on what we happen to currently be aware of.

I think we're very confused about the theory of acausal control / what "similarity" is, and should expect lots of crucial considerations to come up there. (Some relevant writings on this here and here.) As one example: If you try to do something cooperative because this gives evidence (or logically causes) that other agents try to do good things for your values, their attempts to help your values might backfire.
I don't see what the basis is for inferring "it's probably good for their reality too if we reach a good outcome in the simulation". Why would we think we can understand their motivations? They'd be alien intelligences.

Elias Schmied

OK I read the LW comment you linked and skimmed the post, but I don't see how they show that we should expect lots of crucial considerations to come up specifically - they seem to argue more for "we're clueless about how much we should do ECL"? (but correct me if I'm wrong, I may have missed something). On your example, but why should I expect their attempts to do so to backfire if I don't already expect our own attempts to backfire? That seems like it just grounds out in the original debate.
(btw, I also would like to get less confused about the similarity thing)

On the simulators, it just seems like it's hard to think of possible simulator-motivations where us reaching good outcomes in the simulation would be bad for the base reality, and easy to think of ones where it would be neutral or good.

In general, I think the unawareness angle is genuinely interesting, I'm just not moved by it as much as you, probably for a few different reasons that would take some time to articulate.

Anthony DiGiovanni 🔸

"they seem to argue more for 'we're clueless about how much we should do ECL'?"

I think they suggest that there's just a lot of subtlety in working out the implications of acausal decision theories in practice. Which is reason to expect more crucial considerations in this domain generally / reason to doubt your "by definition" argument.

"but why should I expect their attempts to do so to backfire"

Why should you expect them to be positive in expectation either? (The broader point of the unawareness sequence is that there's an ambiguous pile of positive and negative effects to weigh up.)

"On the simulators, it just seems like it's hard to think of possible simulator-motivations where us reaching good outcomes in the simulation would be bad for the base reality, and easy to think of ones where it would be neutral or good."

But then we're back to the Extrapolation argument, which you claimed you weren't committed to. I'm saying, even if the balance of effects we can think of looks good, we're looking at a super tiny sliver of the set of effects our fully aware selves would be weighing up — and it's a biased sample of such effects, so extrapolating from that sample is dubious.

Elias Schmied

"But then we're back to the Extrapolation argument, which you claimed you weren't committed to."

No, I didn't claim that - I said the snippet you quoted wasn't the Extrapolation argument, and I stand by that. I'm definitely sympathetic to something like Extrapolation in general.

Anthony DiGiovanni 🔸

(Edited for tone)

Sorry, I don't understand. The snippet I quoted — about acausal stuff and simulations — is what's at issue in this discussion.

Regardless, I'm still interested in where you object to my response to Extrapolation. Could you please say more on that?

Elias Schmied

The original snippet was more about the acausal stuff, and wasn't Extrapolation, and the distinct argument I subsequently mentioned about simulators was Extrapolation.

There's no quick objection I can give to your response specifically (except to simplistically say "no, I don't buy it, we know more than that, we can have some reasonable guesses about simulators' intentions") - properly laying out my disagreements would take a little bit of time and effort, as is usually the case for deep worldview differences.

Footnotes

1.

The closest thing I’m aware of is Safeguarding the Safeguards, but even that is more narrow.

2.

To be clear, I don’t personally think AI safety has been net negative so far, like some do. I wouldn’t even say that I have a properly considered view about it - maybe 60% that it’s been net positive, with very low credal resilience.

But I do feel a vibe of overconfidence in the discourse here sometimes, and I think this can have downstream consequences, e.g. an action bias.

3.

Quickly, here are others that I excluded because I don’t personally see them as potentially major factors, and didn’t want to water down the main list by including a bunch of implausible galaxy-brained stuff:

Differential slowdown of safety-minded actors: This feels somewhat falsified and “out of fashion” now that Anthropic has taken the lead and concern about China passing the US is a bit lower than 1-2 years ago? And the AI safety community also has less relative power now that more and larger forces have gotten involved.
Putting AI doom stories in the training data: I don’t buy that this could be a major factor. There’s a lot of stuff in the training data, and post-training applies a lot of optimization away from a Simulators-style reproduction of the training data.
Theoretical concerns about the value of the future, most commonly associated with suffering-focused people: Since AI would most likely expand through the universe too, I don’t see this as an argument that AI safety might be net negative specifically (over and above what’s already in the list) (although I do think there are important considerations in general there).
“Crying wolf” dynamics if doom predictions don’t pan out: I don’t buy this as a major factor, since many safety people are not that overtly/confidently doomy nowadays, and so wouldn’t lose credibility.
Most of our impact comes from acausal effects, and effects on the base reality if we are in a simulation: I’m confused here like everyone else, but I currently don’t buy this as a major factor because we only know our reality, and therefore the same things that are good here should naively also have good acausal effects in expectation (except that it maybe pushes for somewhat more cooperation and virtue ethics).

4.

Holden Karnofsky: “Most things that touch policy at all in any way will move us along that spectrum in one direction or another, so therefore have a high chance of being negative [...]

And then most things that you can do in AI at all will have some impact on policy. Even just alignment research: policy will be shaped by what we’re seeing from alignment research, how tractable it looks, what the interventions look like.” (h/t Anthony DiGiovanni)

5.

Holden Karnofsky: “there’s also a lot of micro ways in which you could do harm. Just literally working in safety and being annoying, you might do net harm. You might just talk to the wrong person at the wrong time, get on their nerves. I’ve heard lots of stories of this. Just like, this person does great safety work, but they really annoyed this one person, and that might be the reason we all go extinct” (h/t Anthony DiGiovanni)

6.

Among other things.

7.

I associate these with people like Richard Ngo (and here) and Oliver Habryka.