Back in February, I attended the Bay Area EA Global as I have every year since they started having them. I didn't have a solid plan for what to do there this year, though, so I decided to volunteer. That means I only attended the sessions where I was on room duty and spent the rest of the day having a few 1:1s.
That's okay because, as everyone always says, the 1:1s are the best part of EA Global, and once again they were proven right.
Among the many great folks I met and friends I caught up with, I got the chance to meet Ronen Bar and learn about his idea of AI moral alignment. And when he told me about it, I was embarrassed I hadn't thought of it myself.
Simply put, moral alignment says that, rather than trying to align AI with human values, we should explicitly align it to be a positive force for all sentient beings.
In all my years of thinking about AI alignment, I've not exactly ignored animals and other creatures both known and unknown, but I figured they'd get brought along because humans care about them. I have to admit, though, that while it might come to the same outcome, it feels more authentic to say I want AI aligned to all beings rather than just humans. Though I may be human, I do in fact care about the wellbeing of all life, and I wish for all of it to flourish as best it can with the aid of future AI technology.
I think I missed articulating an idea like moral alignment because I was too close to the idea. That is, I understood intuitively that if we succeeded in building AI aligned with human flourishing, that would necessarily mean alignment with the flourishing of all life. In fact, I've said that the goal of building aligned AI is to help life flourish, but never that AI should be aligned to all life directly. Now that we are much closer to building artificial superintelligence and need to figure out how to align it, aligning AI to non-human life stands out to me as a near-term priority.
For example, I can imagine us building human-aligned AI that ignores the plight of factory-farmed animals, the suffering of shrimp, and the pain of bugs because lots of humans don't seem to care that much about their conditions. Such an AI would perhaps not be perfectly aligned in the ideal way we originally imagined, but it would represent a kind of alignment with human goals, and it would be a travesty for the non-human beings it left out.
So let's not do that. Let's figure out how to align AI so that it's not just good for a few people or even all people, but so that it's good for all beings everywhere.
That post on deliberative alignment seems to be just about one method by which we might build aligned AIs, not about the idea of moral alignment in general.
I'm probably less skeptical than you are because I take as evidence the fact that we align humans to moral value systems all the time. And although we don't do it perfectly, there are some very virtuous folks out there who take their morals seriously. So I think alignment to some system of morality is certainly possible.
Whether or not we can figure out which moral judgements are "right" is another matter, although perhaps we can at least build AI that is aligned with universally recognized norms like "don't murder" and "save lives".