[[THIRD EDIT: Thanks so much for all of the questions and comments! There are still a few more I'd like to respond to, so I may circle back to them a bit later, but, due to time constraints, I'm otherwise finished up for now. Any further comments or replies to anything I've written are also still appreciated!]]
Hi!
I'm Ben Garfinkel, a researcher at the Future of Humanity Institute. I've worked on a mixture of topics in AI governance and in the somewhat nebulous area FHI calls "macrostrategy", including: the long-termist case for prioritizing work on AI, plausible near-term security issues associated with AI, surveillance and privacy issues, the balance between offense and defense, and the obvious impossibility of building machines that are larger than humans.
80,000 Hours recently released a long interview I recorded with Howie Lempel, about a year ago, where we walked through various long-termist arguments for prioritizing work on AI safety and AI governance relative to other cause areas. The longest and probably most interesting stretch explains why I no longer find the central argument in Superintelligence, and in related writing, very compelling. At the same time, I do continue to regard AI safety and AI governance as high-priority research areas.
(These two slide decks, which were linked in the show notes, give more condensed versions of my views: "Potential Existential Risks from Artificial Intelligence" and "Unpacking Classic Arguments for AI Risk." This piece of draft writing instead gives a less condensed version of my views on classic "fast takeoff" arguments.)
Although I'm most interested in questions related to AI risk and cause prioritization, feel free to ask me anything. I'm likely to eventually answer most questions that people post this week, on an as-yet-unspecified schedule. You should also feel free just to use this post as a place to talk about the podcast episode: there was a thread a few days ago suggesting this might be useful.
In "Unpacking Classic Arguments for AI Risk", you defined The Process Orthogonality Thesis as: The process of imbuing a system with capabilities and the process of imbuing a system with goals are orthogonal.
You then gave several examples of cases where this does not hold: a thermostat, Deep Blue, OpenAI Five, and the human brain. Could you elaborate a bit on these examples?
I am a bit confused by them. In Deep Blue's case, I think most of the progress came from general computational advances, with the evaluation function supplied separately afterwards. And the human brain's value system can change quite a lot without any apparent change in the capacity to achieve one's goals (consider psychopaths as an extreme example).
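To make the Deep Blue point a bit more concrete, here is a rough, hypothetical sketch (obviously not Deep Blue's actual code) of the kind of separation I have in mind: a generic search procedure supplies the "capability", and a swappable evaluation function supplies the "goal".

```python
# A minimal, hypothetical sketch of the search/evaluation split in classic game
# engines (not Deep Blue's real architecture). The generic minimax search is the
# "capability"; the evaluation function plugged into it is the "goal".

from typing import Callable, List


def minimax(state, depth: int, maximizing: bool,
            moves: Callable[[int], List[int]],
            apply_move: Callable[[int, int], int],
            evaluate: Callable[[int], float]) -> float:
    """Fixed-depth minimax; `evaluate` is the only goal-laden component."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)
    child_values = [minimax(apply_move(state, m), depth - 1, not maximizing,
                            moves, apply_move, evaluate) for m in legal]
    return max(child_values) if maximizing else min(child_values)


# Toy "game": a pile of tokens; a move removes 1 or 2 tokens.
def moves(pile: int) -> List[int]:
    return [1, 2] if pile >= 2 else ([1] if pile == 1 else [])


def apply_move(pile: int, take: int) -> int:
    return pile - take


# Two different "goals" dropped into the same search capability:
prefer_empty_pile = lambda pile: -pile   # values states with fewer tokens
prefer_full_pile = lambda pile: pile     # values states with more tokens

# Value of the root state under each goal, computed by the same search code:
print(minimax(5, 3, True, moves, apply_move, prefer_empty_pile))
print(minimax(5, 3, True, moves, apply_move, prefer_full_pile))
```

In this sketch, improving the search (greater depth, better pruning) improves capability regardless of which evaluation function is plugged in, which is the kind of separation I had in mind for Deep Blue.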
Also, general RL systems have had success when applied to many different settings, such as DeepMind's work on Atari. Doesn't that point in favor of the Process Orthogonality Thesis?
I'm actually not very optimistic about a more complex or formal definition of goals. In my mind, the concept of a "goal" is often useful, but it's sort of an intrinsically
...