There seem to be two main framings emerging from recent AGI x-risk discussion: default doom, given AGI; and default we're fine, given AGI.
I'm interested in what people who have low p(doom|AGI) think are the reasons things will basically be fine once we have AGI (or TAI, PASTA, ASI). What mechanisms are at play? How does alignment get solved so thoroughly that there are zero failure modes? Can we survive despite imperfect alignment? How? Is alignment moot? Will physical limits be reached before there is too much danger?
If you have high enough p(doom|AGI) to be very concerned, but you're still only at ~1-10%, what is happening in the other 90-99%?
Added 22Apr: I'm also interested in detailed scenarios and stories spelling out how things go right post-AGI. There are plenty of stories and scenarios illustrating doom; where are the similar stories illustrating how things go right? There is the FLI World Building Contest, but that took place in the pre-GPT-4+AutoGPT era. The winning entry has everyone acting far too sensibly in terms of self-regulation and restraint; given the fervour over AutoGPT, I think we can now say that such restraint is highly unlikely.
I can think of a few scenarios where AGI doesn't kill us.
- Species aren't lazy (those who are, or would be, are outcompeted by those who aren't).
- The pets scenario is basically an existential catastrophe by other means (who wants to be a pet that is to a human what a pug is to a wolf?). And the torture/dystopia scenario obviously is too (i.e. not an "OK outcome"). What mechanism would allow us to get alignment right on the first try?
- This seems like a very unstable equilibrium. All it takes is for one of the experts to be as good as Ilya Sutskever at AI engineering to get past that bottleneck in short order.