Back in February, I attended the Bay Area EA Global as I have every year since they started having them. I didn't have a solid plan for what to do there this year, though, so I decided to volunteer. That means I only attended the sessions where I was on room duty, and otherwise spent the day having a few 1:1s when I wasn't on shift.

That's okay because, as everyone always says, the 1:1s are the best part of EA Global, and once again they were proven right.

Among the many great folks I met and friends I caught up with, I got the chance to meet Ronen Bar and learn about his idea of AI moral alignment. And when he told me about it, I was embarrassed I hadn't thought of it myself.

Simply put, moral alignment says that, rather than trying to align AI with human values, we try to explicitly align it to be a positive force for all sentient beings.

In all my years of thinking about AI alignment, I've not exactly ignored animals and other creatures both known and unknown, but I figured they'd get brought along because humans care about them. I have to admit, though, that while it might lead to the same outcome, it feels more authentic to say I want AI aligned to all beings rather than just humans. Though I may be human, I do in fact care about the wellbeing of all life and wish for all of it to flourish as best it can with the aid of future AI technology.

I think I missed articulating an idea like moral alignment because I was too close to the ideas. That is, I understood intuitively that if we succeeded in building AI aligned with human flourishing, that would necessarily mean alignment with the flourishing of all life. In fact, I've said that the goal of building aligned AI is to help life flourish, but not that AI should be aligned to all life directly. Now that we are much closer to building artificial superintelligence and need to figure out how to align it, the importance of aligning to non-human life stands out to me as a near-term priority.

For example, I can imagine us building human-aligned AI that ignores the plight of factory-farmed animals, the suffering of shrimp, and the pain of bugs because lots of humans don't seem to care that much about their conditions. Such an AI would perhaps not be perfectly aligned in the ideal way we originally imagined aligned AI would be, but it would certainly be a kind of alignment with human goals, and it would be a travesty for the non-human beings it left out.

So let's not do that. Let's figure out how to align AI so that it's not just good for a few people or even all people, but so that it's good for all beings everywhere.

Cross-posted from my blog, Uncertain Updates.

Comments (5)

Really enjoyed your post. The idea of aligning AI to all sentient beings, not just humans, feels like a crucial shift. Like you said, it’s not enough to just follow human values because we often overlook a lot of suffering.

Your thoughts made me think of this sci-fi story called Kindness to Kin. It’s about an alien leader who can’t understand why humans would help others outside their family. But then a human points to her grandson (who feels empathy for everyone) and says he’s family too. The line that stuck with me was “We’ve been searching for our family for so, so long.”

That really connects with what you said about moral alignment being about all life, not just humans. Thanks for putting this out there!

Thinking this through: what's novel is not so much the idea that the path AI takes affects non-human welfare, but that it's worth developing this as its own subfield.

And the argument for this is much stronger in the current context: the arguments for rapid AI progress, for AI companies not being responsible by default, and for AI not being aligned by default are much more legible these days.

And that makes it much easier to build energy around this, as there seem to be folks in the EA animal welfare crowd who were skeptical about AI/AI risk before but now see that it is going to be a big deal. Compared to standard AI alignment/governance, the explicit inclusion of animals makes it resonate more with their current interests, in addition to being an area where their existing skills and knowledge are likely to be more applicable.

So I suspect what matters is not just having the idea, but deciding to promote the idea in the right context.

Scott Alexander discusses this in his post here. I'm skeptical that humans will be able to align AI with morality anytime soon. Humans have been disagreeing about what morality consists of for a few thousand years. It's unlikely we'll solve the issue in the next 10.

I don't think we need to solve ethics in order to work on improving the ethics of models. Ethics may be unsolvable, yet some AI models are and will be instilled with some values, or there will be some system for deciding on the value selection problem. I think more people need to work on that.
A great post relating to the value selection problem was just published:
Beyond Short-Termism: How δ and w Can Realign AI with Our Values 
 

That post on deliberative alignment seems to be just about one method by which we might build aligned AIs, not about the idea of moral alignment in general.

I'm probably less skeptical than you are because I take as evidence the fact that we align humans to moral value systems all the time. And although we don't do it perfectly, there are some very virtuous folks out there who take their morals seriously. So I think alignment to some system of morality is certainly possible.

Whether or not we can figure out which moral judgements are "right" is another matter, although perhaps we can at least build AI that is aligned with universally recognized norms like "don't murder" and "save lives".
