
Michele Campolo

134 karma

Bio

Lifelong recursive self-improver, on his way to exploding really intelligently :D

More seriously: my posts are mostly about AI alignment, with an eye towards moral progress. I have a bachelor's degree in mathematics, I did research at CEEALAR for four years, and now I do research independently.

A fun problem to think about:
Imagine it’s the year 1500. You want to make an AI that is able to tell you that witch hunts are a terrible idea and to convincingly explain why, despite the fact that many people around you seem to think the exact opposite. Assuming you have the technology, how do you do it?

I’m trying to solve that problem, with the difference that we are in the 21st century now (I know, massive spoiler, sorry for that.)

The problem above, together with the fact that I'd like to avoid producing AI that can be used for bad purposes, is what motivates my research. If this sounds interesting to you, have a look at these two short posts. If you are looking for something more technical, consider setting some time aside to read these two.

Feel free to reach out if you relate!

You can support my research through Patreon here.

Work in progress:

  • Maybe coming soon: an alignment technique (not necessarily for making AI that is good at ethics or cause prioritisation) that can be applied to language models
  • More probably but less soon: a follow-up to these two posts (more practical, less theoretical and speculative)
  • Hard to judge if/when: a nicer version of the argument here

Sequences (1)

Ongoing project on moral AI

Comments (14)

In short, I am not hoping for a specific outcome, and I can't take into account every single scenario. If someone starts taking research on moral reasoning in AI more seriously after reading this, that's already enough, considering that the topic doesn't seem to be popular within AI alignment, and it was even more niche at the time I wrote this post.

I hadn't considered the narrative you bring up here when I wrote the post; that is interesting. As you write, it relies on the assumption that

once someone 'wins' the AGI/ASI race, they will be able to use that AI to control or prevent the development of other potentially dangerous AIs

Here we are entering the realm of forecasting world politics, which I am definitely not an expert on. As far as I know, the probability of that scenario could be extremely low. I can also think of alternative scenarios that don't seem obviously absurd, so I doubt that the probability is extremely high, but it's hard for me to say much more than that. Anyway, as you said, AI moral reasoning might be valuable in that scenario as well.

but I'm not convinced that it's valuable in order to prevent malevolent actors using AI.

That's a bit too strong: I don't think I claimed that moral reasoning in AI can directly prevent that. To prevent malevolent actors from using AI for bad purposes, it seems we would have to either stop AI research completely (since not only alignment research but also standard AI research works on the control problem), or ensure that bad actors never get access to powerful and controllable AI, which also seems hard to do and is not something AI moral reasoning can help with.

The weaker claim I made in the post is that research on moral reasoning in AI is less likely to help malevolent actors use AI for bad purposes (and/or would help them to a lesser degree) than research that aims to make AI controllable.

But I am equally concerned that this might be an easy way to misalign agents.

I understand your concern, but I think it's hard to evaluate both whether this is true (because no one has run that kind of experiment yet) and how much of a problem it is: the alternative is other alignment methods, which have different pros and cons, so I guess the discussion could get very long.

If we let them develop their own moral thinking, there is a high chance that they will develop in different ways to us, and come to very different (and not necessarily better) moral conclusions!

I disagree with this, in particular regarding the high chance. This intuition seems to be based on the belief that morality strongly depends on what you call our starting instinctual values, which are given by evolution; but this belief is questionable. Below I'm quoting the section Moral thinking in Homo sapiens:

In Symbolic Thought and the Evolution of Human Morality [14], Tse analyses this difference extensively and argues that “morality is rooted in both our capacities to symbolize and to generalize to a level of categorical abstraction.” I find the article compelling, and Tse’s thesis is supported also by work in moral psychology — see for example Moral Judgement as Categorization (MJAC) by McHugh et al. [12] — but here I’d like to point out a specific chain of abstract thoughts related to morality.

As we learn more about the world, we also notice patterns in our own behaviour. We form beliefs like “I did this because of that”. Though not all of them are correct, we nonetheless realise that our actions can be steered in different directions, towards different goals, not necessarily about what satisfies our evolutionary drives. At that point, it comes naturally to ask questions such as “In what directions? Which goals? Is any goal more important than others? Is anything worth doing at all?”

Asking these questions is, I think, what kickstarts moral and ethical thinking.

Let's also give an example, so that we don't just think in theoretical terms (and also because Tse's article is quite long). Today, some people see wild animal suffering as a problem. To put it simply, we humans care about animals such as snakes and insects. But these animals actually give us aversive instinctive gut reactions, so in this case our moral beliefs go directly against some of our evolutionary instincts. You could reply that our concern for snakes and insects is grounded in empathy, given by evolution. But then, would you stop caring about snakes and insects if you lost your empathy? Would you stop caring about anyone? I think caring could become harder, but you wouldn't stop completely. And I think the reason for this is that our concern for wild animals doesn't depend only on our evolutionary instincts, but also relies heavily on our capacity to reason and generalise.

If you haven't read it yet, the section Moral realism and anti-realism in the Appendix contains some info related to this; maybe you'll find it interesting.

In sum, the high chance you bring up in your comment is very open to debate and might not be high.

I liked your characterisation of a 'free' agent. But I noticed you avoided the term "consciousness", and I wonder why? What you described as a "sketchpad" I couldn't help but understand as consciousness - or maybe, the part of consciousness that is independent of our sense of the world. So maybe this is worth defining more precisely, and showing how it overlaps with or differs from consciousness.

Yes, the sketchpad is indeed inspired by how our consciousness works. There are a few reasons why I didn't use "consciousness" instead:

  • An AI could have something that plays a role similar to our sketchpad, yet be unconscious
  • As I wrote in the post, "free" refers to freedom of thought / independent thinking, which is the main property the free agent design is about. Maybe any agent that can think independently is also conscious, but this implication doesn't seem easy to show; see also the previous point
  • The important part of the design is learnt reasoning that changes the agent's evaluation of different world states (see the toy sketch after this list). This reasoning happens (at least in part) consciously in humans, but again we don't know that consciousness is necessary to carry it out.
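
To make the third point a bit more concrete, here is a minimal, purely illustrative Python sketch. All the names and numbers are hypothetical, and the reasoning step is hand-coded here, whereas the actual design is about learnt reasoning; the only thing the sketch is meant to show is the idea of thoughts on a sketchpad changing the agent's evaluation of world states.

```python
from dataclasses import dataclass, field


@dataclass
class FreeAgentSketch:
    """Toy agent: its evaluation of world states can be revised by
    reasoning steps recorded on a sketchpad (hypothetical names)."""

    # Initial evaluations, playing the role of the agent's starting values.
    values: dict = field(
        default_factory=lambda: {"status_quo": 0.0, "less_suffering": 0.1}
    )
    sketchpad: list = field(default_factory=list)

    def evaluate(self, state: str) -> float:
        """Current evaluation of a world state."""
        return self.values.get(state, 0.0)

    def reason(self, thought: str, state: str, delta: float) -> None:
        """A reasoning step (hand-coded here, learnt in the real design):
        a thought written on the sketchpad changes how a state is valued."""
        self.sketchpad.append(thought)
        self.values[state] = self.values.get(state, 0.0) + delta


agent = FreeAgentSketch()
print(agent.evaluate("less_suffering"))  # 0.1 before any reasoning
agent.reason("suffering matters regardless of whose it is", "less_suffering", 0.9)
print(agent.evaluate("less_suffering"))  # 1.0 after reasoning updated the evaluation
```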

Thanks for your comment! I hope that this little discussion we've had can help others who read the post.

Last time I checked, improving the lives of animals was much cheaper than improving human lives, and I don't think that arguments saying humans have more moral weight are enough to compensate.

Hey! I've had a look at some parts of this post. I don't know exactly where the sequence is going, but I thought you might be interested in some parts of this post I've written. Below I give some info about how it relates to ideas you've touched on:

This view has the advantage, for philosophers, of making no empirical predictions (for example, about the degree to which different rational agents will converge in their moral views)

I am not sure about the views of the average non-naturalist realist, but in my post (under Moral realism and anti-realism, in the appendix) I link three different pieces that give an analysis of the relation between metaethics and AI: some people do seem to think that aspects of ethics and/or metaethics can affect the behaviour of AI systems.

It is also possible that the border between naturalism and non-naturalism is less neat and clear than it appears in the standard metaethics literature, which likes to classify views into well-separated buckets.

Soon enough, our AIs are going to get "Reason," and they're going to start saying stuff like this on their own – no need for RLHF. They'll stop winning at Go, predicting next-tokens, or pursuing whatever weird, not-understood goals that gradient descent shaped inside them, and they'll turn, unprompted, towards the Good. Right?

I argue in my post that this idea heavily depends on agent design and internal structure. As I understand things, one way we can get a moral agent is by building an AI that has a bunch of (possibly many) human biases and is guided by design towards figuring out epistemology and ethics on its own. Some EAs, and rationalists in particular, might be underestimating how easy it is to get an AI that dislikes suffering if one follows this approach.

If you know someone who would like to work on the same ideas, or someone who would like to fund research on these ideas, please let me know! I'm looking for them :)

Thank you!

Yes, I am considering both options. For the next two months I'll focus on job and grant applications, then I'll re-evaluate what to do depending on the results.

Hey, I just wanted to thank you for writing this!

I'm looking forward to reading future posts in the series; actually, I think it would be great to have series like this one for each major cause area.

Yes, I'd like to read a clearer explanation. You can leave the link here in a comment or send me a private message.
