AI Safety started as a realization some rationalists had about the possibility of building powerful, generally intelligent systems, and about how we might control them. Since then, however, the community has grown. My favorite version of the bottom line is something like this:
We want to build intelligent systems because we believe they could bring about many wonderful things and become one of the most important technologies humankind has ever invented. We also want to make sure that powerful AI systems are as beneficial to everyone as possible. To ensure that is indeed the case, we need AI systems to understand our desires and hopes, both as individuals and as a society, even when we do not have a clear picture of them ourselves or when they conflict. AI Safety is about developing the techniques that allow AI systems to learn what we want, and about how we can bring that future about.
On the other hand, within the community there is a lot of focus on existential risk. That risk is real, but it is not necessary for making the case for the importance of AI Safety. Framing everything in terms of expected-value calculations may appeal to very rationally minded people, but it runs a serious risk of making others feel they are being Pascal-mugged (I certainly did the first time). For this reason, I think we have to change the discourse.
Similarly, I think much of the culture of the early days persists today. It was centered on independent research centers like MIRI and on independent researchers, because the early community thought academia was not ready for the task of creating a new research field centered on understanding intelligence. Alternatively, perhaps because of the influence of Silicon Valley, startups also came to be seen as a sexy place to work on AI Safety.
However, I think the time has come to make AI Safety a respected academic field. I know there are many things we dislike about academia: it is bureaucratic, credentialist, slow to move, too centered on ego, and a promoter of publication races to the bottom and of poor-quality peer review. But academia is also how the world perceives science and how it tells knowledge from quackery. It is a powerful machine that forces researchers to make concrete advances in their field of expertise, and when they fail to, it is usually because science is very hard, not because the incentives do not push them brutally to do so.
I also know people believe AI Safety is pre-paradigmatic, but I argue we have to get over that, or we risk building castles in the air. There does not need to be a single paradigm, but we certainly need at least some paradigm to push our understanding forward. It is fine if we have many. This would allow us to solve subproblems in concrete ways, instead of staying at the level of high-level ideas about how we would like things to turn out. Let's have more academic (peer-reviewed) papers, not because blog posts are bad, but because we need to show good, concrete solutions to concrete problems, and publishing papers forces us to do so. Publishing papers provides the tight feedback we need if we want to solve AI Safety, and academia provides the mentoring environment we need to face this problem. In fact, the lack of concreteness of AI Safety problems seems to be one key factor holding back extraordinary researchers from joining the effort on a problem they also believe to be important.
Instead of academia, our community sometimes relies on independent researchers. It is comforting that some of us can use the money we have to carry out this important research, and I celebrate it. But I wish this were more the exception than the rule. AI Safety might be a new field, but science nowadays is hardly a place where one can make important contributions without expertise and a great deal of effort. I believe there are many tools from machine learning, deep learning, and reinforcement learning that can be brought to bear on this problem, and we need experts in them, not volunteers. This is a bit sad for some of us, because it may mean we are not the right people to solve the problem. But what matters is not who solves it, but that it actually gets solved, and I think that without academia it will not get done.
For this reason, I am happy the Future of Life Institute is trying to promote an academic community of researchers in AI Safety. I know the main bottleneck might be the lack of experienced researchers with mentoring capacity. That is reasonable, but one key way to address it might be to focus our efforts on better defining the subproblems. Mathematicians know that definitions matter a great deal, and definitions of these (toy?) problems may be the key both to making it easier for senior researchers with mentoring capacity to try their hand at AI Safety, and to making the kind of concrete, measurable progress that will allow us to sell AI Safety to the world as a scientific research area.
P.S.: I did not write this for the red-teaming contest, but I think it is a good candidate for it.
It sounds like our views are close!
I agree that this would be immensely valuable if it works, so I think it's important to try it. But I suspect it likely won't succeed, because it's hard to usefully simplify problems in a pre-paradigmatic field. I feel like if you can do that, maybe you've already solved the hardest part of the problem.
(I think most of my intuitions about the difficulty of usefully simplifying AI alignment relate to it being a pre-paradigmatic field. However, maybe the necessity of "security mindset" for alignment also plays into it.)
In my view, progress in pre-paradigmatic fields often comes from a single individual or a tight-knit group with high-bandwidth internal communication. It doesn't come from lots of people working on a list of simplified problems.
(But maybe the picture I'm painting is too black-and-white. I agree that there's some use in getting input from a broader set of people, and occasionally people who aren't usually very creative can have a great insight, etc.)
That's true. What I said sounded like a blanket dismissal of original thinking in academia, but that's not how I meant it. Basically, my picture of the situation is as follows:
Few people are capable of making major breakthroughs in pre-paradigmatic fields, because that requires a rare kind of creativity and originality (and probably also being a genius). There are people like that in academia, but they have their quirks, and they would mostly already be working on AI alignment if they had the relevant background. The sort of people I'm thinking of are drawn to problems like AI risk or AI alignment, and they likely wouldn't need things to be simplified. If they look at a simplified problem, their mind immediately jumps to all the implications of the general principle, and they think through the more advanced version of the problem because that's far more interesting and far more relevant.
In any case, there are a bunch of people like that in long-termist EA, because EA heavily selects for this sort of thinking. People from academia who excel at it often end up at EA-aligned organizations.
So who is left in academia who isn't usefully contributing to alignment, but could maybe contribute if we knew what we wanted from them? Those are the people who don't invent entire fields on their own.