As will be very clear from my post, I'm not a computer scientist. However, I am reasonably intelligent and would like to improve my understanding of AI risk.
As I understand it (please do let me know if I've got this wrong), the risk is that:
- an AGI could rapidly become many times more intelligent and capable than a human: so intelligent that its relation to us would be analogous to our own relation to ants.
- such an AGI would not necessarily prioritise human wellbeing, and could, for example, decide that its objectives were best served by the extermination of humanity.
And the mitigation is:
- working to ensure that any such AGI is "aligned," that is, is functioning within parameters that prioritise human safety and flourishing.
What I don't understand is why we (the ants in this scenario) think our efforts have any hope of being successful. If the AGI is so intelligent and powerful that it represents an existential risk to humanity, surely it is definitionally impossible for us to rein it in? And therefore surely the best approach would be either to prevent work on developing AGI in the first place (honestly this seems like a nonstarter to me; I can't see e.g. Meta or Google agreeing to it), or to accept that our limited resources would be better applied to more tractable problems?
Any thoughts very welcome, I am highly open to the possibility that I'm simply getting this wrong in a fundamental way.
Epistemic status: bewitched, bothered and bewildered.
'...the thought of a superpowerful AI that shares the value system of e.g. LessWrong is slightly terrifying to me.'
Old post, but I've meant to say this for several months: Whilst I am not a fan of Yudkowsky, I do think his writing on this showed a fair amount of sensitivity to the idea that it would be unfair if a particular group of people just programmed their values into the AI, taking no heed of the fact that humans disagree. (Not that this means there is no reason to worry about the proposal to build a "good" AI that runs everything.)
His original proposal (since abandoned, I think) was that we would give the AI a goal like 'maximize things all or pretty much all fully informed humans would agree are good; minimize things all or almost all fully informed humans would agree are bad; and where humans would disagree on whether something is good or bad even after being fully informed of all relevant facts, try to minimize your impact on that thing and leave it up to humans to sort out amongst themselves.' (Not an exact rendition, but close enough for present purposes.) Of course, there's a sense in which that still embodies liberal democratic values about what is fair, but I'm guessing that if you're a contemporary person with a humanities degree, you probably share those very broad and abstract values.