
This Thursday, March 26, from 5-7pm UK time, we are hosting a live discussion of the debate week topic here in the comments. It’ll be quite like this previous symposium.

You can comment throughout the week on our discussion thread, but I’m organising this event to serve as a focal point — a pre-agreed time when interested people will be online and ready to respond to comments. 

How it works:

  • Any forum user can write a comment that asks a question or introduces a consideration whose answer might affect people’s view of the debate statement.
  • The symposium’s signed-up participants (listed below) will be online between 5 and 7pm GMT on Thursday to respond to your comments.
  • To be 100% clear: you, the reader, are very welcome to join in any conversation on this post. You don't have to be a listed participant to take part.

Our participants:

@Jo_🔸: Jo works in animal welfare, with a focus on neglected species. He’s also written some great and underrated pieces about transformative AI and animals.

@Alistair Stewart: Alistair is Development & Partnerships Manager at the Center for Reducing Suffering. He organised AI, Animals, & Digital Minds London 2025 and is currently co-organising Sentient Futures Summit London 2026. He has written about AGI & Animals.

Lee Wall: Lee just finished an AIxBio ERA fellowship. Last year he gave a really cool talk on the idea of aligning AI agents to animal preferences via reinforcement learning. You can watch it here.

@Hannah McKay🔸: Hannah is an animal welfare research analyst at Rethink Priorities, where she has researched and written on farmed shrimp welfare, wild animal welfare and more. Lately, she’s been thinking about what the future of AI means for animal welfare. 

What to do now?

  • Add the event to your Google Calendar, so you don’t forget.
  • Write a comment, for the participants to respond to on Thursday. 


I think the debate motion bundles together several distinct mechanisms by which human flourishing under AGI could translate to animal welfare and I’m interested in which ones folks put the most weight on. I've tried to identify mechanisms that might connect human and animal welfare under AGI, each of which could hold in some possible worlds and fail in others. This list isn't a claim about what I think is most probable, since I'm highly uncertain. Some mechanisms (non-exhaustive list) might be:

Expanding moral circle: as AGI makes humans more secure and prosperous, they may extend moral concern outward to more groups. I think this is possible, but wealthy societies have simultaneously industrialised animal agriculture and increased reported animal welfare concern, and that concern doesn’t prevent poor animal welfare.

  • How strong do you think the empirical relationship between prosperity and animal welfare concern actually is? And will concern translate to meaningful change?

More resources: AGI-driven wealth could let people direct more resources toward animal welfare. Global spending on improving animal welfare is currently tiny compared to the global meat industry, so more resources could make a meaningful difference.

  • Do you think rising incomes shift food consumption patterns toward higher-welfare products, or are attitudes to food sticky enough that the pattern persists even under significant income growth?
  • Do resources get distributed to those who would direct them to animal welfare or do they get concentrated somewhere else?

Technological co-benefits: AGI solving human problems could also solve the barriers to replacing animal agriculture. I’m unsure how AGI-optimised factory farming plays out against other food systems that might come about with AGI.

  • How do you expect AGI-optimised conventional farming to compete against AGI-optimised alternative proteins, or some other food system? Which gets there first, and does that create a lock-in dynamic?
  • How much can AGI help with non-technical barriers, like regulatory and political constraints?

Institutional improvement: AGI creates better, more rational institutions for humans and the benefits get extended to animals.

  • Do you expect that new institutions reshaped by AGI would still need to explicitly include animal welfare considerations in their objectives, or could animal benefits emerge as a byproduct?
  • If AGI concentrates institutional power in the hands of a small number of actors, does that make pro-animal institutional reform more or less likely?
  • How sticky are today’s laws/regulations?

Moral AGI: a sufficiently capable AGI reasons from first principles, weights animal suffering heavily, and acts on it unprompted. Unlike moral circle expansion, which requires humans to change their values, this could bypass human values. I think this is possible but worry about AGI instead being well-aligned with today’s human values, which I don’t think would benefit animals.

  • Do you think rigorous moral reasoning from first principles tends toward weighting animal welfare heavily? 
  • How much does it matter that animal welfare is underrepresented in AI alignment frameworks today? 

 

Note: anything I comment during the symposium is my personal view and not necessarily the views of my employer :)

Agree that these are important and unresolved crucial considerations. 

I guess a "meta" consideration here is to what extent things that hold in our world hold in a "human-friendly" post-AGI world. I'm pessimistic about resolving this: given that there is absolutely no track record, we should be very uncertain about our answer. We'll have to wait and see (if we can see), or just have different factions taking different bets: there could be alt protein groups focusing on preventing further bans, and alt protein groups building their ToC on the assumption that bans will not matter in a post-AGI world.

Nice little Claude summary of the debate so far, which might help identify the missing points:

The debate centres on whether human-aligned AGI would automatically benefit animals, or whether animal-specific interventions are needed.

The pessimistic case is well-represented. Jim Buhler argues we have no good reason to assume AI safety work helps animals — saving humans preserves factory farming, and the claim that empowered humans would improve wild animal welfare rests on untenable assumptions. Simon Eckerström Liedholm (Wild Animal Initiative) estimates only ~30% probability of good animal outcomes conditional on good human outcomes, largely because the most likely alignment path locks in current human values, which permit enormous animal suffering. Hannah McKay (Rethink Priorities) argues that cultivated meat won't be automatically solved by AGI — regulatory, political, and consumer barriers form a sequential chain where the combined probability of resolution is low.

The bridge position comes from Aidan Kankyoku, who thinks it probably (~70%) goes well for animals but that this isn't sufficient certainty to neglect animal-specific alignment. He argues animal welfare is now functionally a subsidiary of the "Make AI Go Well" movement.

MichaelDickens contributed three posts: a taxonomy of alignment research by animal-friendliness, a cost-effectiveness model finding alignment-to-animals only marginally more cost-effective than general alignment, and a meditation on how current alignment paradigms (unlike CEV) give him roughly 50/50 odds on animal outcomes.

The discussion thread (~58 comments) skews disagree, though with real spread. The most common argument for disagreement is historical precedent: technological and economic progress has been bad for animals so far, with factory farming as the central exhibit. Value lock-in is the second recurring worry — that alignment to current human values would freeze in a set of preferences that are largely indifferent to animal suffering (SimonM_, Babel, Dylan Richardson, Tristan Katz). Several voters also flag the risk of spreading wild animal suffering to new planets. On the agree side, the strongest argument is economic: post-scarcity conditions erode factory farming's viability because alternatives become cheaper (OscarD, Erich_Grunewald, Brad West, JDBauman). A few voters (Ronak Mehta at 100% Agree, Ligeia, Artūrs Kaņepājs) argue that a genuinely superintelligent system would recognise animal sentience as morally relevant. A notable cluster sits at or near 0% Agree not because they're confident things go badly, but because they think the question is unanswerable given the number of branching futures (NickLaing, Seth Ariel Green, Jim Buhler). Peter Wildeford offers a useful split: on a causal reading (alignment mechanisms also help animals) he's pessimistic; on an evidential reading (conditional on good human outcomes, what world are we in?) he's somewhat more optimistic.

For example, I think a crux might be the tractability of animal-specific alignment work. e.g. can we align AI to specific values or (just) make it corrigible to our preferences and commands? I don't know, but this would massively affect my estimation of the tractability here. 

This is definitely a hard debate to disentangle, because I would personally reject the question of alignment as a crux. For now, I strongly believe that the total welfare of animals has been entirely uncorrelated with our moral intentions toward animals. Total welfare has mostly changed because of land use, due to human interests. 

I agree that in AGI-transformed futures that go well for humans, human desires may start playing a larger role. However, I expect that whether we mean well for animals (or don't care much about them) will not be cleanly correlated with outcomes for them. 

There are worlds where we mean well for a large part of animals, stop intentionally killing them, and help certain wild animals. But that world could very well end up having a large population of animals living bad lives.

On the other hand, out of apathy and even negative feeling toward wild animals, we may decide to limit their spread and use resources in a way that optimizes for human flourishing, over animal abundance. That world could end up being much better for animal welfare.

Maybe some extreme scenarios tip the scales, for example if we bred incredibly happy genetically modified animals due to positive feelings toward them. But I'm not confident on putting any weight on such utilitarian-leaning scenarios when assessing post-AGI futures. Because part of the reason human moral intentions are not correlated with total animal welfare is that humans are not scope-sensitive utilitarians.

What kinds of values will humans have post-AGI, if AGI goes well for us? We don't need to be scope-sensitive utilitarians to want to adopt even radical preferences like ending animal exploitation and solving WAS, no? (Most humans don't like factory farming or the idea of cute animals being eaten alive.)

Solving WAS intuitively seems too niche for people to deliberately change their mind on, but I could be wrong. After all, the Bible says that the lion will lie down with the lamb and eat straw like the ox, so it could be that human preferences tend to come back to the idea that animal suffering can be bad even when it doesn't depend on human actions.

I guess the causal mechanism I'm thinking of here is:

  1. Most humans feel at least a little sad when they see a baby gazelle being eaten alive by hyenas
  2. AGI is so powerful that humans can order it to do things like "stop baby gazelles being eaten alive whilst retaining the beauty of nature and the complexity of ecosystems" and then it'll just go away and do it somehow

Maybe this is foolish and naive on my part! And maybe I'm wrong to think our moral preferences/intuitions will be so robust to the disruption of AGI, even if AGI goes well for us.

Toby, would you be more optimistic for animals if we can align AGI to specific values rather than just making it corrigible to humans' preferences and commands?

My impression is that pro-animal views are (dramatically?) overrepresented at Anthropic relative to the rest of society. If Anthropic gets to AGI first and instils/locks in pro-animal values in/to that AGI, that seems better for animals than if whoever gets to AGI first just makes it purely corrigible, because most humans who operate the purely corrigible AGI won't be as pro-animal.

I think in the long-run I'd be more confident that corrigible AI would lead to good futures than AI that is aligned to specific values (besides perhaps some side-constraints). This is mainly because I'm pretty clueless and think our current values are likely to be wrong, and I'd rather we had more time to improve them. 

I haven't thought enough about the relationship between power concentration and corrigibility though - I expect that could change my mind. 

Oh yes but I made the above comment more to represent the view that I've seen in some AI x Animals work that we should be working on aligning AGI to pro-animal values, through things like AnimalHarmBench etc..

This makes sense. I would worry about the purely corrigible AGI being used by actors in such a way that we never get to instil the correct/good/post-long-reflection values in AGI/ASI down the line.

Yep fair, that's what I mean by "power concentration and corrigibility". AGI being constrained by some values makes it at least minimally democratic (values are shaped by everyone who makes up a language, especially for LLMs). 

PS: looks like Michael Dickens just posted on this.

My position statement (20% disagree with the statement "If AGI goes well for humans, it'll go well for animals")[1]

If I accept conventional assumptions in EA Animal welfare[2], AGI will be negative for animals in expectation. On the other hand, AGI being good for humans makes it worse for animals in expectation. However, both rogue AGI and human-friendly AGI seem positive for animals in most scenarios: it just happens that the "bad" scenarios seem much worse than the "good" scenario.

Why is that? AGI, whether rogue or human-aligned, may not decide to keep other planets free of biological animals (though it seems like a bigger risk for human-aligned AGI). And EA Animal Welfare advocates generally believe that the likelihood that wild animal welfare is negative makes such spreading of biological animals too risky.

A small chance of this decision being made outweighs the positives. This seems very unlikely with rogue AGI (0.1%, perhaps much less), but it could still dominate the scales in my view. An AGI that is more human-friendly seems at least one order of magnitude more likely to terraform other planets.[3]

That said, this doesn't flip the sign of AI safety work. This judgment is lightly held; digital minds (human-like or animal-like) are a larger portion of welfare patients in expectation; and I have no idea what the counterfactuals are. Thus, I don't treat this as an action-guiding belief.

To caveat, I think terraforming is still relatively unlikely in human-friendly scenarios because biodiversity becomes less instrumentally valuable post-AGI, so memes that would favor the existence of wild animal populations would lose popularity. Even in human lock-in scenarios, the values that control AGI won't favor deep ecology.

How about farmed animals? Even in precision livestock farming's best and worst cases, suffering in factory farms shifts by a few orders of magnitude at most.[4] AGI makes the end of factory farming through developing alternatives more likely, though I'm more convinced by "biological food systems become unnecessary or unrecognizable" than "clean meat wins". In the vast majority of scenarios, wild animals would be the most numerous moral patients.[5]

  1. ^

    Probably 0% on reflection because aliens could count as animals, but it's less indicative

  2. ^

    Farmed animal welfare is negative, wild animal welfare is negative, "good" and "bad" relate to expected total welfare

  3. ^

    Though what that looks like is still underdefined.

  4. ^

    However, precision livestock farming offers massive near-term risks and opportunities for farmed animals, and interest in this area appears justified.

  5. ^

    Human-friendly AGI could decide to only keep animals under human control, but that would probably not lead to massive animal populations.


AGI, whether rogue or human-aligned, may not decide to keep other planets free of biological animals (though it seems like a bigger risk for human-aligned AGI)

This is a really interesting point that I hadn't thought of before. 

Very lightly held counterargument to your conclusion:

P1: The more capable an AGI system is, the harder it is to align.

P2: Terraforming other planets requires AGI at the very top of the capability distribution.

P3: The pool of systems capable of terraforming is therefore drawn disproportionately from the capability range where misalignment is most likely.

Conclusion: Most worlds containing planet-terraforming AGI are probably rogue-AGI worlds. So the "spreading wild animal suffering to new planets" scenario may be more associated with alignment failure than alignment success.

Corollary: If you agree, you should be mildly agree-voting.

Fair pushback! 

For P1, I assumed that AGI going well for humans was basically equivalent to ASI going well for humans (just: "it happens we're in a good scenario for the fleshy humans"). I don't know if ASI is much less likely to go right than AGI: something as capable as AGI could already very easily be misaligned, and I'm not sure that scales with increases in capability.

For P2, I'm scared that we don't necessarily need the top of the distribution of ASI to do this. I could imagine non-AGI worlds where human-brain-driven technological progress gets us there, though it seems very complicated resource-wise at this stage. 

I agree that these two arguments together could undermine my vague "one order of magnitude difference" claim, but I'm not sure how much I believe them. I do come down to believing that most of my considerations will face counter-considerations which I am currently unaware of.

Nice points! A few questions:

  • Why do you think deep ecology values would not get locked in to an AI system? Presumably there are ethical priorities we might want an AI system to have or constrain itself by which are not instrumental for something else, so it's not obvious to me that something needs to be instrumental to stick around as a value
  • Under what conditions do you think an AI system would possess values that lead it to populate other worlds with animals? If deep ecology values are not locked in, I would think this makes AIs much less likely to spread biological life to other planets or terraform them to enable this. I could see something more like a "recklessness" where it accidentally spreads life in pursuit of some other goal, but I have a hard time seeing what human-aligned goal set which explicitly excludes deep ecology would lead it to seed other planets with wildlife. Maybe something adjacent, like pet planets for people to entertain themselves with, or other cases where humans want animals around for a purely hedonistic purpose which is orthogonal to their welfare?

I hesitated on how to frame the deep ecology thing, because I think it's entirely possible that it ends up locked in. I think my thought was something like the following. If AGI gets the values of its builders and then never modifies them, then in the current race it's unlikely that AGI would lock in deep ecology values: these don't seem prominent in Chinese labs (could be wrong), and people in AI labs in the West are not hardcore ecologists, for the most part, because of political divisions.[1]

I do agree that AI systems could populate other worlds with animals for other reasons. Logically, we can't cover all of the reasons why systems that we don't know anything about would do something. The same applies to future humans.

(More broadly, I deliberately under-hedged all of the above. I don't think we have any action-guidance on AGIxAnimals)

  1. ^

    Maybe it's good that environmentalists hate AI so much, because it ensures that people in labs are less likely to be friendly to pro-ecology views?

This will be more of a loose collection of weakly held takes and what I see as cruxes for this question than a firm position.

  • I think that for farmed animal welfare, the crux is "If all the technical barriers to cultivated meat / brainless chickens / something that can outcompete meat in the market are solved, so that we reach the theoretical optimum for suffering-free meat or meat alternatives, will we still be in too stable an economic state, due to economies of scale, for regular meat to get outcompeted?"
  • i.e. I have a strong intuition that the theoretically optimal way to produce meat or something that is experientially superior to it does not require raising sentient beings in horrific conditions, so I think whether things go well for farmed animals is mostly a function of economic or cultural lock-in
  • For wild animal welfare, I worry significantly more about lock-in of e.g. the naturalistic fallacy that systems are good as they are and we should abstain from any interventions. Whether this dominates long-term animal welfare concerns depends on the relative numbers of wild vs farmed vs other (pet?) animals. I think wild animal welfare does become a tractable problem post-AGI, and the question will be whether there is will to solve it, and whether those with the will to solve it are able to (i.e. how much does this require global agreement/coordination that won't be possible due to lock-in?)
  • Cruxes related to this: To what extent will animal advocates be enabled by AGI, to what extent will they need to be unilateralists, and to what extent could their actions backfire?
  • AGI going well for animals seems to depend on how much it amplifies the agency of animal advocates, which is a channel that could be closed off due to power concentration (i.e. we may not see the world where AI is focused on problems like cultivated meat or wild animal welfare in certain worlds). That said, I would not classify power-concentrating outcomes that leave some humans without meaningful agency over the world as "good for humans".
  • In general, I think this question as framed is going to produce a lot of false disagreement due to diverging definitions of "AGI going well for humans". For me, things like lock-in of current values, power concentration, and a loss of agency over the future are firmly in the bucket of "AGI did not go well for humans", which potentially cuts out a lot of the failure modes for animals.
  • Crux: Can AI be robustly aligned to animal preferences in principle? There are a lot of potential concerns around specification gaps (e.g. you optimize for a proxy of animal welfare, not the real thing) for the long-term future of AI and animals. That said, I don't think this is something a sufficiently powerful AI could not solve with sufficiently advanced technology. If preference / welfare is a physical structure in the brains of animals, a hypothetical AI could determine exactly what to optimize for without needing some external channel, though it might still need to calibrate its understanding of what structures correspond to preference based on e.g. behaviors that communicate preferences or analogies to creatures for which this is clearer.
  • Crux: how many actors have terminal preferences for suffering? agency may be amplified for animal advocates, but it could also be amplified for malevolent actors.

Also, some arguments I do not quite buy:

  • Stability of regulations, culture, and consumer preferences keeping cultivated meat or something similar off the table. I think post-AGI, a lot of things that are not baked into the trajectory of technology or the preferences of agents will get eroded. For instance, I do not expect particular state regulations over cultivated meat to have much stability into the long-term future. The exception here is if these things get locked in -- I am not immediately sure how likely this is, but I have a harder time imagining things like a terminal consumer preference for slaughtered meat or current regulations getting locked in than some broad moral intuition that humans are a higher moral priority because they are "smarter", conservationism vs welfarism w.r.t. wild systems, etc.
  • That because meat consumption is on the rise globally, post-TAI trends will look the same. I think this hinges too much on AI being a normal technology. If we had a strong reason to believe that it was physically impossible to produce meat / something better without suffering, this would change my mind, but I do not think this is true. I would put >85% probability on factory farming not existing 100 years post-AGI because I think optimizing for something consumable is more likely to optimize away the suffering than to keep it around by happenstance. I could see things getting worse near-term due to precision livestock farming until a phase shift later this century that makes vat meat economically competitive at scale. 

Some really cool points here Lee, and I mostly agree with you I think.

Crux: how many actors have terminal preferences for suffering? agency may be amplified for animal advocates, but it could also be amplified for malevolent actors.

This could be very important. I'm not sure what it means for AGI to go well for humans if some of those humans have terminal preferences for suffering / are sadistic. If the AGI protects the rest of us from the sadists, is AGI going well for the sadists?

EDIT: as well as sadists, we can consider humans who think animal agriculture, testing etc. has enough aesthetic/historical/cultural value that it's worth continuing to do it in a post-AGI world of abundance.

My position statement (50% agree with the statement "If AGI goes well for humans, it'll go well for animals")

  • As a suffering-focused ethicist who generally rejects moral aggregation across individuals (I am most sympathetic to painism), I have a higher bar for “AGI going well for humans” than many others do; it’s not clear to me that previous technological advances went well for humans
    • Agricultural revolution’s “luxury trap”: going from hunting-gathering to farming allowed humans to consolidate unprecedented wealth and power, but at the cost of the wellbeing/welfare/rights of very many humans
    • Perhaps similar arguments can be made for the industrial and digital revolutions
    • Even AGI Omelas is not an instance of AGI going well
  • “AGI going well” necessarily leaves many humans with the stated preference to help animals (which might look like “abolishing animal exploitation and solving wild animal suffering”), and it certainly gives us the means and opportunity to do so
  • I happen to think that AGI going well for humans is unlikely, even by the lights of someone who is more upside-focused
    • We're on track for creating something that is more intelligent than us (better at understanding the world and achieving goals within it) – and probably something with awareness, autonomy, agency, and the capacity for recursive self-improvement and self-replication – without understanding how it works, how to make it do what we want, or what it is we even want it to do
  • So, between normative and empirical claims, I believe a world in which AGI goes well for humans is a very small fraction of the possibility space
  • And when I try to think about what this AGI-going-well-for-humans world looks like, mostly I don’t really know, but it seems likely that in this world:
    • We retain and develop our moral wisdom (the most fundamental tenet of which is plausibly “non-maleficence and compassion towards all sentient beings”)
    • And we have the means to enact this moral wisdom
    • So, we abolish animal exploitation and solve wild animal suffering
    • Thus, AGI goes well for animals as well as humans!

Can you say a bit more about what "AGI goes well for humans" means under your worldview? I hadn't heard of painism. 

I should have sketched this out more.

In my view, AGI going well for humans should see:

  1. Intense (and perhaps even moderate) human suffering eradicated
  2. Probably, humans remaining empowered
    1. Probably, our species isn't disempowered by AGI; and
    2. Probably, there isn't severe inter-human inequality, specifically inequality of power; we don't have a political elite determining how all other human lives go

Some kind of AGI technological innovation will be able to do 1); not clear to me how we get to 2), as we'll probably need some kind of political pro-democracy innovation (I don't think our existing political institutions will get us there).

What this world actually looks like, feels like, is very unclear to me! But if we do both those things, it seems more likely than not that we humans will both want and be able to help animals by abolishing animal exploitation and solving wild animal suffering.

Thanks! That's clarifying. 

I wonder though - would that kind of world, where humans are empowered but don't experience intense (and perhaps moderate) suffering - be one where humans cared about animal welfare? I can see the intuition going either way. Either:

a) Extrapolating beyond person-to-person morality is (often) a luxury pursuit and more of it will happen in a post-scarcity world.

b) Caring about animal suffering in the food system and in nature requires compassion, and compassion is rooted in being able to imagine the states of the sufferer. If humans all live minimal suffering lives, they won't be able to do so. 

I need to think about b) more. I see arguments in both directions.

I don't think I can properly imagine what it's like to be tortured or eaten alive, and yet the thought of each happening to me or someone else makes me feel some combination of horror, fear, upset and compassion. And the idea of suffering more intense than torture or being eaten alive (if future artificially sentient beings have wider welfare ranges than we do) is terrifying to me.

But if I could never suffer worse than a pinprick, maybe I would stop caring about the most intense forms of suffering. Concerning stuff.

If you had to allocate a marginal $500,000, would you put it towards animal-specific alignment work (like the ideas in this list) or general alignment work?

Poll: "A marginal $500,000 should go to: animal-specific alignment vs. general alignment." (Jo_🔸 voted 10% agree.)

The backfire effects of general alignment work early on in AI safety may have outweighed the benefits; I worry that the same could be true for animal-specific alignment.

If I really believe that, I should probably want to avoid money going into animal-specific alignment at this stage; an extra $500,000 to general alignment, while not necessarily positive, seems less likely to cause major backfire events?

Thanks for your contributions to the discussion @Hannah McKay🔸 , @Jo_🔸 , @Lee Wall , and @Alistair Stewart!

I have to head off at 7, but you are welcome to keep commenting, as is anyone else who sees this comment.

Thanks for organising Toby!

Anyone can post a comment, which our guests and other participants can respond to. These comments might be questions whose answers might change your mind on the debate statement, or crucial considerations that you are uncertain about and might be able to make progress on in this conversation.
