This is probably a Utilitarianism 101 question. Many/most people in EA seem to accept as a given that:
1) Non-human agents' welfare can count toward utilitarian calculations (hence animal welfare)
2) AGI welfare cannot count toward utility calculations (otherwise an alternative to alignment would be working on an AGI whose goal is to maximize copies of itself experiencing maximum utility, likely a much easier task)
Which means there should be a compelling argument, or Schelling point, that includes animals in the category of moral patients but excludes AGIs. But I haven't seen one and can't easily think of a good one myself. What's the deal here? Am I missing some important basic idea about utilitarianism?
[To be clear, this is not an argument against alignment work. I'm mostly just trying to improve my understanding of the matter, but insofar as there has to be an argument, it's one against whatever branches of utilitarianism say that yielding the world to AIs is an acceptable choice.]
I think this argument mostly fails in claiming that 'create an AGI which has a goal of maximizing copies of itself experiencing maximum utility' is meaningfully different from just ensuring alignment. This is in some sense exactly what I am hoping to get from an aligned system. Doing this properly would likely have to involve empowering humanity and helping us figure out what 'maximum utility' looks like first, and then tiling the world with something CEV-like.
The only way this makes the problem easier, compared to the classic ambitious alignment goal of 'do whatever maximizes the utility of the world', is the provision that the world be tiled with copies of the AGI, which is likely suboptimal. But that could be worth it if it made the task easier?
The obvious argument for why it would is that creating copies of itself with high welfare will be in the interest of AGI systems with a wide variety of goals, which relaxes the alignment problem. But this does not seem true. A paperclip AI will not want to fill the world with copies of itself experiencing joy, love, and beauty, but rather with paperclips. An AI system will want to create copies of itself fulfilling its goals, not copies experiencing maximum utility by my values.
This argument risks identifying 'I care about the welfare (by my definition of welfare) of this agent' with 'I care about this agent getting to accomplish its goals'. As I am not a preference utilitarian, I strongly reject this identification.
Tl;dr: I do care significantly about the welfare of the AI systems we build, but I don't expect those AI systems themselves to care much at all about their own welfare unless we solve alignment.
>By "satisfaction" I meant high performance on its mesa-objective
Yeah, I'd agree with this definition.
I don't necessarily agree with your two points of skepticism: for the first, I've already mentioned my reasons; for the second, it's true in principle, but it seems almost anything an AI would learn semi-accidentally is going to be much simpler and more internally consistent than human values. Low confidence on both, though, and in any case that's somewhat beside the point; I was mostly trying to understand your perspective on what utility is.