In 2012, Holden Karnofsky[1] critiqued MIRI (then SI), saying that "SI appears to neglect the potentially important distinction between 'tool' and 'agent' AI." In particular, he claimed:
> Is a tool-AGI possible? I believe that it is, and furthermore that it ought to be our default picture of how AGI will work.
I understand this to be the first introduction of the "tool versus agent" ontology, and it is a helpful, relatively concrete prediction. Eliezer replied here, making (among others) the following points, which I summarize:
- Tool AI is nontrivial
- Tool AI is not obviously the way AGI should or will be developed
Gwern replied more directly, saying:
> AIs limited to pure computation (Tool AIs) supporting humans, will be less intelligent, efficient, and economically valuable than more autonomous reinforcement-learning AIs (Agent AIs) who act on their own and meta-learn, because all problems are reinforcement-learning problems.
11 years later, can we evaluate the accuracy of these predictions?
[1] Some Bayes points go to LW commenter shminux for saying that this Holden kid seems like he's going places.
Relevant, I think, is Gwern's later writing on Tool AIs.
Personally, I think the distinction is basically irrelevant in terms of safety concerns, mostly for reasons outlined in the second bullet point above. The danger lies in the fact that the "useful answers" you might get out of a Tool AI are precisely those which let you steer the future to hit narrow targets (approximately what Eliezer and others describe as "applying optimization power").
If you manage to construct a training regime for something we'd call a Tool AI, which nevertheless gives us something smart enough that it does better than humans at creating plans which affect reality in specific ways[1], then it approximately doesn't matter whether or not we give it actuators to act in the world[2]. It has to be aiming at something; whether or not that something is friendly to human interests won't depend on what name we give the AI.
I'm not sure how to evaluate the predictions themselves. I continue to think that the distinction is basically confused and doesn't carve reality at the relevant joints, and I think progress to date supports this view.
[1] Which I claim is a reasonable non-technical summary of OpenAI's plan.
[2] Though note that even if whatever lab develops it doesn't do so, the internet has helpfully demonstrated that people will do it themselves, and quickly, too.