Summary
I believe that advanced AI systems will likely be aligned with the goals of their human operators, at least in a narrow sense. I’ll give three main reasons for this:
- The transition to AI may happen in a way that does not give rise to the alignment problem as it’s usually conceived of.
- While work on the alignment problem appears neglected at this point, it's likely that substantial resources will be devoted to tackling it if and when it becomes apparent that alignment is a serious problem.
- Even if the previous two points do not hold, we have already come up with a couple of smart approaches that seem fairly likely to lead to successful alignment.
This argument lends some support to work on non-technical interventions like moral circle expansion or improving AI-related policy, as well as work on specific aspects of AI safety like decision theory or worst-case AI safety measures.
"Between 1 and 10%" also feels surprisingly low to me for general AI-related catastrophes. I at least would have thought that experts are less optimistic than that.
But pending clarification, I wouldn't put much weight on this estimate, given that the interviews mentioned in the 80k problem area profile you link to seem to have been aimed at informing the entire problem profile rather than this estimate specifically. For example, it's not clear whether the interviews included a question about the all-things-considered risk of AI-related catastrophe that was put to Nick Bostrom, an anonymous leading professor of computer science, Jaan Tallinn, Jan Leike, Miles Brundage, Nate Soares, and Daniel Dewey.