This Thursday, March 26, from 5-7pm UK time, we are hosting a live discussion of the debate week topic here in the comments. It’ll be quite like this previous symposium.
You can comment throughout the week on our discussion thread, but I’m organising this event to serve as a focal point — a pre-agreed time when interested people will be online and ready to respond to comments.
How it works:
- Any forum user can write a comment that asks a question or raises a consideration whose answer might affect people’s answer to the debate statement.
- The symposium’s signed-up participants (listed below) will be online from 5-7pm GMT on Thursday to respond to your comments.
- To be 100% clear - you, the reader, are very welcome to join in any conversation on this post. You don't have to be a listed participant to take part.
Our participants:
@Jo_🔸 : Jo works in animal welfare, with a focus on neglected species. He’s also written some great and under-rated pieces about transformative AI and animals.
@Alistair Stewart: Alistair is Development & Partnerships Manager at the Center for Reducing Suffering. He organised AI, Animals, & Digital Minds London 2025 and is currently co-organising Sentient Futures Summit London 2026. He has written about AGI & Animals:
- The Animal Gap in AI Governance
- And with Niki Dupuis, What failure looks like for animals
@Lee Wall: Lee just finished an AIxBio ERA fellowship. Last year he gave a really cool talk on the idea of aligning AI agents to animal preferences via reinforcement learning. You can watch it here.
@Hannah McKay🔸: Hannah is an animal welfare research analyst at Rethink Priorities, where she has researched and written on farmed shrimp welfare, wild animal welfare and more. Lately, she’s been thinking about what the future of AI means for animal welfare.
What to do now?
- Add the event to your Google calendar, so you don’t forget.
- Write a comment, for the participants to respond to on Thursday.

I think the debate motion bundles together several distinct mechanisms by which human flourishing under AGI could translate into animal welfare, and I’m interested in which ones folks put the most weight on. I've tried to identify mechanisms that might connect human and animal welfare under AGI, each of which could hold in some possible worlds and fail in others. This list isn't a claim about what I think is most probable, since I'm highly uncertain. Some mechanisms (a non-exhaustive list) might be:
Expanding moral circle: as AGI makes humans more secure and prosperous, humans may extend moral concern outward to more groups. I think this is possible, but wealthy societies have simultaneously industrialised animal agriculture and reported increasing concern for animal welfare, and that concern hasn’t prevented poor welfare outcomes.
More resources: AGI-driven wealth could let people direct more resources to animal welfare. Global spending on improving animal welfare is currently tiny compared to the global meat industry, so more resources could make a meaningful difference.
Technological co-benefits: AGI solving human problems could also remove the barriers to replacing animal agriculture. I’m unsure how AGI-optimised factory farming plays out against other food systems that might come about with AGI.
Institutional improvement: AGI creates better, more rational institutions for humans, and the benefits extend to animals.
Moral AGI: a sufficiently capable AGI reasons from first principles, weights animal suffering heavily, and acts on it unprompted. Unlike moral circle expansion, which requires humans to change their values, this could bypass human values. I think this is possible but worry about AGI instead being well-aligned with today’s human values, which I don’t think would benefit animals.
Note: anything I comment during the symposium is my personal view and not necessarily the views of my employer :)
Agree that these are important and unresolved crucial considerations.
I guess a "meta" consideration here is to what extent things that hold in our world hold in a "human-friendly" post-AGI world. I'm pessimistic on resolving this, because given that there is absolutely no track record, we should be very uncertain on our answer to that question. We'll have to wait and see (if we can see), or just have different factions taking different bets: there could be alt protein groups focusing on preventing further bans, and alt protein group building their ToC on the assumption that bans will not matter in a post-AGI world.
Nice little Claude summary of the debate so far, which might help identify the missing points:
For example, I think a crux might be the tractability of animal-specific alignment work: can we align AI to specific values, or (just) make it corrigible to our preferences and commands? I don't know, but this would massively affect my estimate of the tractability here.
This is definitely a hard debate to disentangle, because I would personally reject the question of alignment as a crux. For now, I strongly believe that the total welfare of animals has been entirely uncorrelated with our moral intentions toward animals. Total welfare has mostly changed through land use, driven by human interests.
I agree that in AGI-transformed futures that go well for humans, human desires may start playing a larger role. However, I expect that whether we mean well for animals (or don't care much about them) will not be cleanly correlated with outcomes for them.
There are worlds where we mean well for a large share of animals, stop intentionally killing them, and help certain wild animals. But such a world could very well end up having a large population of animals living bad lives.
On the other hand, out of apathy and even negative feeling toward wild animals, we may decide to limit their spread and use resources in a way that optimizes for human flourishing over animal abundance. That world could end up being much better for animal welfare.
Maybe some extreme scenarios tip the scales, for example if we bred incredibly happy genetically modified animals out of positive feelings toward them. But I'm not confident enough to put much weight on such utilitarian-leaning scenarios when assessing post-AGI futures, because part of the reason human moral intentions are not correlated with total animal welfare is that humans are not scope-sensitive utilitarians.
What kinds of values will humans have post-AGI, if AGI goes well for us? We don't need to be scope-sensitive utilitarians to want to adopt even radical preferences like ending animal exploitation and solving WAS (wild animal suffering), no? (Most humans don't like factory farming or the idea of cute animals being eaten alive.)
Solving WAS intuitively seems too niche for people to deliberately change their minds on, but I could be wrong. After all, the Bible says that the wolf will dwell with the lamb and the lion will eat straw like the ox, so it could be that human preferences tend to come back to the idea that animal suffering can be bad even when it doesn't depend on human actions.
I guess the causal mechanism I'm thinking of here is:
Maybe this is foolish and naive on my part! And maybe I'm wrong to think our moral preferences/intuitions will be so robust to the disruption of AGI, even if AGI goes well for us.
Toby, would you be more optimistic for animals if we can align AGI to specific values rather than just making it corrigible to humans' preferences and commands?
My impression is that pro-animal views are (dramatically?) overrepresented at Anthropic relative to the rest of society. If Anthropic gets to AGI first and instils or locks in pro-animal values in that AGI, that seems better for animals than if whoever gets to AGI first just makes it purely corrigible, because most humans who operate the purely corrigible AGI won't be as pro-animal.
I think in the long run I'd be more confident that corrigible AI would lead to good futures than AI that is aligned to specific values (besides perhaps some side-constraints). This is mainly because I'm pretty clueless and think our current values are likely to be wrong, and I'd rather we had more time to improve them.
I haven't thought enough about the relationship between power concentration and corrigibility though - I expect that could change my mind.
Oh yes, but I made the above comment more to represent the view I've seen in some AI x Animals work that we should be working on aligning AGI to pro-animal values, through things like AnimalHarmBench etc.
This makes sense. I would worry about the purely corrigible AGI being used by actors in such a way that we never get to instil the correct/good/post-long-reflection values in AGI/ASI down the line.
Yep fair, that's what I mean by "power concentration and corrigibility". AGI being constrained by some values makes it at least minimally democratic (values are shaped by everyone who makes up a language, especially for LLMs).
PS- looks like Michael Dickens just posted on this.
My position statement (20% disagree with the statement "If AGI goes well for humans, it'll go well for animals")[1]
If I accept conventional assumptions in EA animal welfare[2], AGI will be negative for animals in expectation. On the other hand, AGI being good for humans makes it worse for animals in expectation. However, both rogue AGI and human-friendly AGI seem positive for animals in most scenarios: it just happens that the "bad" scenarios seem much worse than the "good" scenarios are good.
Why is that? AGI, whether rogue or human-aligned, may decide to populate other planets with biological animals (this seems a bigger risk for human-aligned AGI). And EA animal welfare advocates generally believe that wild animal welfare is likely negative, which makes such spreading of biological animals too risky.
A small chance of this decision being made outweighs the positives. This seems very unlikely with rogue AGI (0.1%, perhaps much less), but it could still dominate the scales in my view. An AGI that is more human-friendly seems at least one order of magnitude more likely to terraform other planets.[3]
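To make the "small chance dominates" point concrete, here is a toy expected-value sketch (the symbols and the $10^4$ figure are illustrative assumptions of mine, not part of the argument above). Let $B > 0$ be the value of everything else going well, $W < 0$ the disvalue of spreading net-negative wild animal populations, and $p$ the probability of terraforming:

$$\mathbb{E}[\text{welfare}] = (1-p)B + p(B + W) = B + pW$$

This goes negative whenever $p > B/|W|$, so if $|W|$ is, say, $10^4$ times $B$, even $p = 0.1\%$ flips the sign. That is the sense in which a rare scenario can dominate the scales.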
That said, this doesn't flip the sign of AI safety work. This judgment is lightly held; digital minds (human-like or animal-like) are a larger portion of welfare patients in expectation; and I have no idea what the counterfactuals are. Thus, I don't treat this as an action-guiding belief.
To caveat, I think terraforming is still relatively unlikely in human-friendly scenarios because biodiversity becomes less instrumentally valuable post-AGI, so memes that favor the existence of wild animal populations would lose popularity. Even in human lock-in scenarios, the values that control AGI won't favor deep ecology.
How about farmed animals? Even in precision livestock farming's best and worst cases, suffering in factory farms shifts by a few orders of magnitude at most.[4] AGI makes the end of factory farming through developing alternatives more likely, though I'm more convinced by "biological food systems become unnecessary or unrecognizable" than "clean meat wins". In the vast majority of scenarios, wild animals would be the most numerous moral patients.[5]
Probably 0% on reflection because aliens could count as animals, but it's less indicative
Farmed animal welfare is negative, wild animal welfare is negative, "good" and "bad" relate to expected total welfare
Though what that looks like is still underdefined.
However, precision livestock farming offers massive near-term risks and opportunities for farmed animals, and interest in this area appears justified.
Human-friendly AGI could decide to only keep animals under human control, but that would probably not lead to massive animal populations.
This is a really interesting point that I hadn't thought of before.
Very lightly held counterargument to your conclusion:
P1: The more capable an AGI system is, the harder it is to align.
P2: Terraforming other planets requires AGI at the very top of the capability distribution.
P3: The pool of systems capable of terraforming is therefore drawn disproportionately from the capability range where misalignment is most likely.
Conclusion: Most worlds containing planet-terraforming AGI are probably rogue-AGI worlds. So the "spreading wild animal suffering to new planets" scenario may be more associated with alignment failure than alignment success.
Corollary: If you agree, you should be mildly agree-voting.
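To gesture at the structure of P1-P3 in numbers, here is a toy conditional-probability sketch (the 0.5 and 0.1 are purely illustrative assumptions, and I'm assuming rogueness depends only on capability tier). If, per P1, $P(\text{rogue} \mid \text{top tier}) \approx 0.5$ while $P(\text{rogue}) \approx 0.1$ overall, and, per P2, only top-tier systems can terraform, then:

$$P(\text{rogue} \mid \text{terraforming-capable}) = P(\text{rogue} \mid \text{top tier}) \approx 0.5 \gg P(\text{rogue}) \approx 0.1$$

Conditioning on terraforming capability shifts probability mass toward rogue-AGI worlds, which is the conclusion above.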
Fair pushback!
For P1, I assumed that AGI going well for humans basically included ASI going well for humans (just: "it happens we're in a good scenario for the fleshy humans"). I don't know if ASI is much less likely to go right than AGI: something as capable as AGI could already very easily be misaligned, and I'm not sure the risk scales with increases in capability.
For P2, I'm scared that we don't necessarily need the top of the distribution of ASI to do this. I could imagine non-AGI worlds where human-brain-driven technological progress gets us there, though it seems very complicated resource-wise at this stage.
I agree that these two arguments together could undermine my vague "one order of magnitude difference" claim, but I'm not sure how much I believe them. I do come down to believing that most of my considerations will face counter-considerations which I am currently unaware of.
Nice points! A few questions:
I hesitated on how to frame the deep ecology thing, because I think it's entirely possible that it ends up locked in. My thought was something like the following. If AGI gets the values of its builders and then never modifies them, then in the current race it's unlikely that AGI would lock in deep ecology values: these don't seem prominent in Chinese labs (I could be wrong), and people in AI labs in the West are mostly not hardcore ecologists, because of political divisions.[1]
I do agree that AI systems could populate other worlds with animals for other reasons. Logically, we can't cover all of the reasons why systems that we don't know anything about would do something. The same applies to future humans.
(More broadly, I deliberately under-hedged all of the above. I don't think we have any action-guidance on AGIxAnimals)
Maybe it's good that environmentalists hate AI so much, because it ensures that people in labs are less likely to be friendly to pro-ecology views?
This will be more of a loose collection of weakly held takes and what I see as cruxes for this question than a firm position.
Also, some arguments I do not quite buy:
Some really cool points here Lee, and I mostly agree with you I think.
This could be very important. I'm not sure what it means for AGI to go well for humans if some of those humans have terminal preferences for suffering / are sadistic. If the AGI protects the rest of us from the sadists, is AGI going well for the sadists?
EDIT: as well as sadists, we can consider humans who think animal agriculture, testing etc. has enough aesthetic/historical/cultural value that it's worth continuing to do it in a post-AGI world of abundance.
My position statement (50% agree with the statement "If AGI goes well for humans, it'll go well for animals")
Can you say a bit more about what "AGI goes well for humans" means under your worldview? I hadn't heard of painism.
I should have sketched this out more.
In my view, AGI going well for humans should see:
Some kind of AGI technological innovation will be able to do 1); not clear to me how we get to 2), as we'll probably need some kind of political pro-democracy innovation (I don't think our existing political institutions will get us there).
What this world actually looks like, feels like, is very unclear to me! But if we do both those things, it seems more likely than not that we humans will both want and be able to help animals by abolishing animal exploitation and solving wild animal suffering.
Thanks! That's clarifying.
I wonder though: would that kind of world, where humans are empowered but don't experience intense (and perhaps moderate) suffering, be one where humans care about animal welfare? I can see the intuition going either way. Either:
a) Extrapolating beyond person-to-person morality is (often) a luxury pursuit and more of it will happen in a post-scarcity world.
b) Caring about animal suffering in the food system and in nature requires compassion, and compassion is rooted in being able to imagine the states of the sufferer. If humans all live minimal-suffering lives, they won't be able to do so.
I need to think about b) more. I see arguments in both directions.
I don't think I can properly imagine what it's like to be tortured or eaten alive, and yet the thought of each happening to me or someone else makes me feel some combination of horror, fear, upset and compassion. And the idea of suffering more intense than torture or being eaten alive (if future artificially sentient beings have wider welfare ranges than we do) is terrifying to me.
But if I could never suffer worse than a pinprick, maybe I would stop caring about the most intense forms of suffering. Concerning stuff.
If you had to allocate a marginal $500,000, would you put it towards animal-specific alignment work (like the ideas in this list) or general alignment work?
The backfire effects of general alignment work early on in AI safety may have outweighed the benefits; I worry that the same could be true for animal-specific alignment.
If I really believe that, I should probably want to avoid money going into animal-specific alignment at this stage, whereas an extra $500,000 to general alignment, while not necessarily positive, is less likely to cause major backfire events?
Thanks for your contributions to the discussion @Hannah McKay🔸 , @Jo_🔸 , @Lee Wall , and @Alistair Stewart!
I have to head off at 7, but you are welcome to keep commenting, as is anyone else who sees this comment.
Thanks for organising, Toby!
Anyone can post a comment, which our guests and other participants can respond to. These comments might be questions whose answers could change your mind on the debate statement, or crucial considerations that you are uncertain about and might be able to make progress on in this conversation.