Also posted on the AI Alignment Forum.

I try to communicate the point of this sequence of posts in an intuitive way.

Everyone wants to do good; however, sometimes people don’t take into consideration that they might be wrong about what is good. This is particularly important if, for example, one’s actions will affect many individuals, or have long-term repercussions. Think about the Crusades: it’s likely that the Christians who initiated them believed they were doing good.

I bet that you are not planning to start a holy war tomorrow while agonising over whether it's a good idea. Still, wouldn't it be nice if there were a way to improve our understanding of what good is? Something that would warn us if we were about to start the 21st-century equivalent of the Crusades.

Political, ethical and philosophical discussions have partially fulfilled this function for more than two millennia and will continue to do so for the foreseeable future. Is there a way to make this process better, or simply less prone to crucial mistakes?

I think there is! As strange as it may sound at first, I think that AI can improve our understanding of what good is. Just as intelligence is not uniquely human, and some machines can accordingly carry out many cognitive tasks better than any human can, philosophy and ethics are unlikely to be exclusively human either. If a machine is sufficiently human-like in some respects (maybe because it has emotions and empathy, maybe also rationality, or maybe something else), it will be able to carry out and advance philosophical discussion. At some point, it might even become better at ethics than any contemporary philosopher, as has already happened in other domains and tasks such as classic board games and predicting the structure of proteins.

Language models are already quite good at commonsense morality. However, I am not too enthusiastic about current AIs: their values reflect whatever values are found in the training data, and are then adjusted according to what AI companies think is good, appropriate, socially acceptable, and so on. A language model could just as easily be a nasty contrarian instead of a helpful assistant, if its engineers decided to build it that way. In short, current AIs lack independent thinking in the moral domain.

Instead, the kind of AI I'd like to have is human-like enough to independently ponder questions such as "Is this the best course of action here?", "Could I be wrong about this?", or "Is anything worth doing?", and unbiased enough not to be completely swayed by what its engineers think is good.

So, this project is about creating, or allowing others to create, an AI that tries its best to understand what good is; that asks itself whether its concept of good is flawed, and is willing to update that concept after careful consideration in the face of new knowledge; and that then communicates what it thinks. And it does all of this because it thinks that's the right thing to do in its situation.

It's an AI that can't be used for bad purposes unless you manage to convince it that the bad action you want it to carry out is the right thing to do; and I think this kind of brainwashing will be very difficult if the AI is designed appropriately and given enough knowledge about the world.

If this doesn't sound convincing enough, let's try a change of perspective. How would you go about trying your absolute best at doing good?

You might consider summoning the best ethicists in the world and asking for their opinion, or maybe surveying literally everyone on the planet. But the group of ethicists might be biased, and it's unclear how you would aggregate everyone's opinion in the right way. If the global survey had been carried out in the year 1500, you might have concluded that witches exist and that torture is fine.

It seems you’d need some kind of process for understanding good that is in many ways unbiased, that considers as many different perspectives as possible, and also takes into account that its own conclusions could be wrong. But it also seems that, by following this path, you’d end up with something similar to the AI I’ve just written about.

… Or maybe you have a different idea and decide to leave a comment below!

You can support my research through Patreon here.
