Author: Leonard Dung
Abstract: Many researchers and intellectuals warn about extreme risks from artificial intelligence. However, these warnings typically came without systematic arguments in support. This paper provides an argument that AI will lead to the permanent disempowerment of humanity, e.g. human extinction, by 2100. It rests on four substantive premises which it motivates and defends: first, the speed of advances in AI capability, as well as the capability level current systems have already reached, suggest that it is practically possible to build AI systems capable of disempowering humanity by 2100. Second, due to incentives and coordination problems, if it is possible to build such AI, it will be built. Third, since it appears to be a hard technical problem to build AI which is aligned with the goals of its designers, and many actors might build powerful AI, misaligned powerful AI will be built. Fourth, because disempowering humanity is useful for a large range of misaligned goals, such AI will try to disempower humanity. If AI is capable of disempowering humanity and tries to disempower humanity by 2100, then humanity will be disempowered by 2100. This conclusion has immense moral and prudential significance.
My thoughts: I read through it rather quickly, so take what I say with a grain of salt. That said, it seemed persuasive and well-written, and the way the author split up the argument was quite nice. I'm very happy to see an attempt to make this argument more philosophically rigorous, and I hope to see more work in this vein.
This generally makes sense to me. I also think human irrationality could prompt a war with AIs. I don't disagree with the claim insofar as you're claiming that such a war is merely plausible (say >10% chance), rather than a default outcome. (Although to be clear, I don't think such a war would likely cut cleanly along human vs. AI lines.)
On the other hand, humans are already irrational, and yet human vs. human wars are not the default: they happen frequently, but at any given time the vast majority of humans on Earth are not in a warzone or fighting in an active war. It's not clear to me why the human vs. AI case would make war more likely than the human vs. human case, if by assumption the main difference is that one side is more rational.
In other words, if we're moving from a situation of irrational parties vs. other irrational parties to one of irrational parties vs. rational parties, I'm not sure why we'd expect this change to make things more warlike and less peaceful. You mention one potential reason:
I don't think this follows. Humans presumably also had empathy in, say, 1500, back when war was more common, so how could empathy explain our current relative peace?
Perhaps you mean that cultural changes caused our present time period to be relatively peaceful. But I'm not sure about that; or at least, the claim should probably be made more specific. Many features of the environment have changed since the days of our more warlike ancestors, and (from my current perspective) it seems plausible that any one of them could be the reason for our current relative peace. That is, I don't see a good reason to single out human values or empathy as the main cause.
For example, humans are now a lot richer per capita, which might mean that people have "more to lose" by going to war and are thus less likely to engage in it. We're also a more globalized culture, and our economic system relies more on long-distance trade than it did in the past, making war more costly. We're also older, in the sense that the median age is higher (and older people are less likely to commit violence), and women, who are perhaps less likely to support hawkish politicians, have gained the right to vote.
To be clear, I don't put much confidence in any of these explanations. As of now, I'm very uncertain about why the 21st century seems relatively peaceful compared to the distant past. However, I do think that: