I believe AI safety is a major problem for the future, and more people working on it would likely increase the chance that it gets solved, but I think the third component of ITN (neglectedness) might need to be reevaluated.
I mainly formed my base ideas around 2015, when the AI revolution was portrayed as a fight against killer robots. Nowadays, more details are communicated, like bias problems, optimizing for different-than-human values (e.g. ad clicks), and killer drones.
It is possible that the field only went from very neglected to somewhat neglected, or that the news I received from my echo chamber was itself biased. In any case, I would like to know more.
It depends on what you mean by "neglected", since neglect is a spectrum. It's a lot less neglected than it was in the past, but it's still neglected compared to, say, cancer research or climate change. In terms of public opinion, the average person probably has little understanding of AI safety. I've encountered plenty of people saying things like "AI will never be a threat because AI can only do what it's programmed to do" and variants thereof.
What is neglected within AI safety is suffering-focused AI safety aimed at preventing S-risks. Most AI safety research, and existential risk research in general, seems to focus on reducing extinction risks and on colonizing space, rather than on reducing the risk of worse-than-death scenarios. There is also a risk that some AI alignment research could be actively harmful, for instance through a "near miss" in AI alignment. In other words, risk from AI alignment roughly follows a Laffer curve: an AI that is slightly misaligned is riskier than both a perfectly aligned AI and a paperclip maximizer. For example, suppose there is an AI aligned to reflect human values. Yet "human values" could include religious hells: plenty of religious people believe that an omnibenevolent God subjects certain people to eternal damnation, which makes one wonder whether these sorts of individuals would implement a Hell if they had the power. Thus, an AI designed to reflect human values in this way could end up subjecting certain individuals to something equivalent to a Biblical Hell.
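To make the "Laffer curve" intuition concrete, here is a minimal toy sketch (my own illustration, not a model from Tomasik or any published analysis): treat expected disvalue as the product of the probability that the AI's goals involve humans at all (assumed to rise with alignment quality) and the probability that what it locks in for them is badly wrong (assumed to fall with alignment quality). Under those assumed shapes, the product peaks at intermediate alignment.

```python
import numpy as np

# Toy, illustrative model only. "alignment" is a hypothetical 0-1 score:
# 0 = paperclip maximizer (indifferent to humans), 1 = perfectly aligned.
alignment = np.linspace(0.0, 1.0, 101)

# Assumption: the chance the AI's optimization target involves humans/sentient
# life at all rises with alignment quality.
p_cares_about_humans = alignment

# Assumption: conditional on caring about humans, the chance the locked-in
# outcome is badly wrong (a "near miss") falls with alignment quality.
p_badly_wrong = 1.0 - alignment

# Expected s-risk-style disvalue is highest at intermediate alignment,
# mirroring the Laffer-curve-shaped risk claim above.
expected_disvalue = p_cares_about_humans * p_badly_wrong

peak = alignment[np.argmax(expected_disvalue)]
print(f"Disvalue peaks at alignment ~ {peak:.2f}")  # ~0.50 in this toy model
```

The exact peak location is an artifact of the linear shapes chosen here; the only point the sketch is meant to carry is the non-monotonicity.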
Regarding specific AI safety organizations, Brian Tomasik wrote an evaluation of various AI/EA/longtermist organizations, in which he estimated that MIRI has a ~38% chance of being actively harmful. Eliezer Yudkowsky has also harshly criticized OpenAI, arguing that open access to their research poses a significant existential risk. Open access to AI research may increase the risk of malevolent actors creating, or influencing, the first superintelligence, which poses a potential S-risk.
Sure, but people are still researching narrow alignment/corrigibility as a prerequisite for ambitious value learning/CEV. If you buy the argument that safety with respect to s-risks is non-monotonic in proximity ...