Author: Leonard Dung
Abstract: Many researchers and intellectuals warn about extreme risks from artificial intelligence. However, these warnings typically came without systematic arguments in support. This paper provides an argument that AI will lead to the permanent disempowerment of humanity, e.g. human extinction, by 2100. It rests on four substantive premises which it motivates and defends: first, the speed of advances in AI capability, as well as the capability level current systems have already reached, suggest that it is practically possible to build AI systems capable of disempowering humanity by 2100. Second, due to incentives and coordination problems, if it is possible to build such AI, it will be built. Third, since it appears to be a hard technical problem to build AI which is aligned with the goals of its designers, and many actors might build powerful AI, misaligned powerful AI will be built. Fourth, because disempowering humanity is useful for a large range of misaligned goals, such AI will try to disempower humanity. If AI is capable of disempowering humanity and tries to disempower humanity by 2100, then humanity will be disempowered by 2100. This conclusion has immense moral and prudential significance.
My thoughts: I read through it rather quickly, so take what I say with a grain of salt. That said, it seemed persuasive and well-written. Additionally, the way the author split up the argument was quite nice. I'm very happy to see an attempt to make this argument more philosophically rigorous, and I hope to see more work in this vein.
We have more empirical evidence to draw on when it comes to human-human wars, making it easier to hold well-calibrated beliefs about the chances of winning. When it comes to human-AI wars, we're more likely to have wildly irrational beliefs.
This is just one reason war could occur, though. Perhaps a more likely reason is that there won't be a way to maintain the peace that both sides can be convinced will work and that is cheap enough that its cost doesn't eat up all of the gains from avoiding war. For example, how would the human faction know that, if it agrees to peace, the AI faction won't fully dispossess the humans at some future date when it's even more powerful? Even if AIs are able to come up with some workable mechanisms, how would the humans know they're not just a trick?
Without credible assurances (which seem hard to come by), I think that if humans do agree to peace, the most likely outcome is that they get dispossessed in the not-too-distant future, either gradually (for example, by being scammed, persuaded, blackmailed, or stolen from in various ways) or all at once. I think society as a whole won't have a strong incentive to protect humans because they'll be almost pure consumers (not producing much relative to what they consume), and such classes of people have often been killed or dispossessed in human history (e.g., landlords after communist takeovers).
I mainly mean that without empathy/altruism, we'd probably have even more wars, both now and then.
Well, yes, I'm also pretty scared of this. See this post where I talked about something similar. I guess overall I'm still inclined to push for a future where "AI alignment" and "human safety" are both solved, instead of settling for one in which neither is solved (which is how I'm tempted to summarize your position, though I'm not sure if that's fair).