Back in February, I attended the Bay Area EA Global, as I have every year since they started having them. I didn't have a solid plan for what to do there this year, though, so I decided to volunteer. That meant I only attended the sessions where I was on room duty, and otherwise spent the day having a few 1:1s between shifts.
That's okay because, as everyone always says, the 1:1s are the best part of EA Global, and once again they were proven right.
Among the many great folks I met and friends I caught up with, I got the chance to meet Ronen Bar and learn about his idea of AI moral alignment. And when he told me about it, I was embarrassed I hadn't thought of it myself.
Simply put, moral alignment says that, rather than trying to align AI with human values, we should explicitly align it to be a positive force for all sentient beings.
In all my years of thinking about AI alignment, I've not exactly ignored animals and other creatures both known and unknown, but I figured they'd get brought along because humans care about them. I have to admit, though, that while it might come to the same outcome, it feels more authentic to say I want AI that is aligned to all beings rather than just humans. Though I may be human, I do in fact care about the wellbeing of all life and wish for all of it to flourish as best it can with the aid of future AI technology.
I think I missed articulating an idea like moral alignment because I was too close to these ideas. That is, I understood intuitively that if we succeeded in building AI aligned with human flourishing, that would necessarily mean alignment with the flourishing of all life, and in fact I've said that the goal of building aligned AI is to help life flourish. But I never said that AI should be aligned to all life directly. Now that we are much closer to building artificial superintelligence and need to figure out how to align it, aligning to non-human life stands out to me as a near-term priority.
For example, I can imagine us building human-aligned AI that ignores the plight of factory-farmed animals, the suffering of shrimp, and the pain of bugs because lots of humans don't seem to care that much about their conditions. Such an AI would perhaps not be perfectly aligned in the ideal way we originally imagined, but it would certainly be a kind of alignment with human goals, and it would be a travesty for the non-human beings it left out.
So let's not do that. Let's figure out how to align AI so that it's not just good for a few people or even all people, but so that it's good for all beings everywhere.
I don't think we need to solve ethics in order to work on improving the ethics of models. Ethics may never be fully solvable, yet AI models are and will be instilled with some values, or there will be some system for deciding on the value selection problem. I think more people need to work on that.
Just now, a great post relating to the value selection problem was published:
Beyond Short-Termism: How δ and w Can Realign AI with Our Values