Do you want to talk about why you're relatively optimistic? I've tried to explain my own concerns/pessimism at https://www.lesswrong.com/posts/EByDsY9S3EDhhfFzC/some-thoughts-on-metaphilosophy and https://forum.effectivealtruism.org/posts/axSfJXriBWEixsHGR/ai-doing-philosophy-ai-generating-hands.
Thanks! I hope this means you'll spend some more time on this type of work, and/or tell other philosophers about this argument. It seems apparent that we need more philosophers to work on philosophical problems related to AI x-safety (many of which do not seem to be legible to most non-philosophers). Not necessarily by attacking them directly (this is very hard and probably not the best use of time, as we previously discussed) but instead by making them more legible to AI researchers, decisionmakers, and the general public.
In my view, there isn't much desire for work like this from people in the field and they probably wouldn't use it to inform deployment unless a lot of effort is also added from the author to meet the right people, convince them to spend the time to take it seriously etc.
Any thoughts on Legible vs. Illegible AI Safety Problems, which is in part a response to this?
I wrote a post that I think was partly inspired by this discussion. The implication of it here is that I don't necessarily want philosophers to directly try to solve the many hard philosophical problems relevant to AI alignment/safety (especially given how few of them are in this space or concerned about x-safety), but initially just to try to make them "more legible" to others, including AI researchers, key decision makers, and the public. Hopefully you agree that this is a more sensible position.
I agree that many of the problems on my list are very hard and probably not the highest marginal value work to be doing from an individual perspective. Keep in mind that the list was written 6 years ago, when it was less clear when the AI takeoff would start in earnest, or how many philosophers would become motivated to work on AI safety once AGI became visibly closer. I still had some hope that when the time came, a significant fraction of all philosophers would become self-motivated or would be "called to arms" by a civilization-wide AI safety effort, and would be given sufficient resources including time, so the list was trying to be comprehensive (listing every philosophical problem that I thought relevant to AI safety) rather than prioritizing. Unfortunately, the reality is nearly the complete opposite of this.
Currently, one of my main puzzles is why philosophers with public AI x-risk estimates still have numbers in the 10% range, despite reality being near the most pessimistic end of my range of expectations: it looks like the AI takeoff/transition will occur while most of these philosophical problems remain wide open or in a totally confused state, and AI researchers seem almost completely oblivious to or uncaring about this. Why are they not making the same kind of argument that I've been making, that philosophical difficulty is a reason that AI alignment/x-safety is harder than many think, and an additional reason to pause/stop AI?
Right, I know about Will MacAskill, Joe Carlsmith, and your work in this area, but none of you are working on alignment per se full time or even close to full time AFAIK, and the total effort is clearly far from adequate to the task at hand.
I think some have given up philosophy to work on other things such as AI alignment.
Any other names you can cite?
In my view, there isn't much desire for work like this from people in the field and they probably wouldn't use it to inform deployment unless a lot of effort is also added from the author to meet the right people, convince them to spend the time to take it seriously etc.
Thanks, this makes sense to me, and my follow-up is: how concerning do you think this situation is?
One perspective I have is that at this point, several years into a potential AI takeoff, with AI companies now worth trillions in aggregate, alignment teams at AI companies still have virtually no professional philosophical oversight (or outside consultants that they rely on), and are kind of winging it based on their own philosophical beliefs/knowledge. It seems rather like trying to build a particle collider or fusion reactor with no physicists on staff, only engineers.
(Or worse, unlike the physics knowledge expected of such engineers, I doubt that a systematic education in fields like ethics and metaethics is a hard requirement for working as an alignment researcher. And even worse, unlike the situation in physics, we don't even have settled ethics/metaethics/metaphilosophy/etc. that alignment researchers can just learn and apply.)
Maybe the AI companies are reluctant to get professional philosophers involved, because in the fields that do have "professional philosophical oversight", e.g., bioethics, things haven't worked out that well. (E.g. human challenge trials being banned during COVID.) But to me, this would be a signal to yell loudly that our civilization is far from ready to attempt or undergo an AI transition, rather than a license to wing it based on one's own philosophical beliefs/knowledge.
As an outsider, the situation seems crazy alarming to me, and I'm confused that nobody else is talking about it, including philosophers like you who are in the same overall space and looking at roughly the same things. I wonder if you have a perspective that makes the situation not quite as alarming as it appears to me.
Do you have any insights into why there are so few philosophers working in AI alignment, or closely with alignment researchers? (Amanda Askell is the only one I know.) Do you think this is actually a reasonable state of affairs (i.e., it's right or fine that almost no professional philosophers work directly as or with alignment researchers), or is this wrong/suboptimal, caused by some kind of cultural or structural problem? It's been 6 years since I wrote Problems in AI Alignment that philosophers could potentially contribute to, and I've gotten a few comments from philosophers saying they found the list helpful or that they'll think about working on some of the problems, but I'm not aware of any concrete follow-ups.
If it is some kind of cultural or structural problem, it might be even higher leverage to work on solving that, instead of object-level philosophical problems. I'd try to do this myself, but as an outsider to academic philosophy and also very far from any organizations that might potentially hire philosophers to work on AI alignment, it's hard for me to even observe what the problem might be.
Interesting re belief in hell being a key factor, I wasn't thinking about that.
It seems like the whole AI x-risk community has latched onto "align AI with human values/intent" as the solution, with few people thinking even a few steps ahead to "what if we succeeded"? I have a post related to this if you're interested.
possibly the future economy will be so much more complicated that it will still make sense to have some distributed information processing in the market rather than have all optimisation centrally planned
I think there will be distributed information processing, but each distributed node/agent will be a copy of the central AGI (or otherwise aligned with it or sharing its values), because that is what's most economically efficient, minimizes waste from misaligned incentives, and so on. So there won't be the kind of value pluralism that we see today.
I assume we won't be able to know with high confidence in advance what economic model will be most efficient post-ASI.
There are probably a lot of other surprises that we can't foresee today. I'm mostly claiming that post-AGI economics and governance probably won't look very similar to today's.
Why do you think this work has less value than solving philosophical problems in AI safety?
From the perspective of comparative advantage and counterfactual impact, this work does not seem to require philosophical training. It seems to be straightforward empirical research that many people could do, besides the very few professionally trained, AI-risk-concerned philosophers that humanity has.
To put it another way, I'm not sure that Toby was wrong to work on this, but if he was, it's because, had he not, someone else with more comparative advantage for this problem (due to lacking training or talent for philosophy) would have done so shortly afterwards.
The argument tree (arguments, counterarguments, counter-counterarguments, and so on) is exponentially sized, and we don't know how deep or wide we need to expand it before some problem can be solved. We do know that different humans looking at the same partial tree (i.e., philosophers who have read the same literature on some problem) can have very different judgments as to what the correct conclusion is. There's also a huge amount of intuition/judgment involved in choosing which part of the tree to focus on or expand further. With AIs helping to expand the tree for us, there are potential advantages like you mentioned, but also potential disadvantages, like AIs not having good intuition/judgment about which lines of argument to pursue, or the argument tree (or AI-generated philosophical literature) becoming too large for any human to read and think about in a relevant time frame. Many will be very tempted to just let AIs answer the questions / make the final conclusions for us, especially if AIs also accelerate technological progress, creating many urgent philosophical problems about how to use them safely and beneficially. And even if humans try to draw the final conclusions themselves, they can easily get them wrong despite AI help with expanding the argument tree.
So I think undergoing the AI transition without solving metaphilosophy, or without making AIs autonomously competent at philosophy (good at reaching correct conclusions by themselves), is enormously risky, even if we have corrigible AIs helping us.