Hide table of contents

Comment Permalink

I have to admit, this is one of those ideas that's like "wow, how in all my years of thinking about AI safety have I not thought about this?" beyond "humans care about other beings, so AI will care if humans care". It's so obvious and important in hindsight that I'm a bit ashamed it was a blindspot. Many thanks for pointing it out!

See in context

AI Moral Alignment: The Most Important Goal of Our Generation

by Ronen Bar

Mar 2610 min read 32

127

AI safetyAnimal welfareCause prioritizationExistential riskOpportunities to take actionPhilosophyAI alignmentLongtermismMoral philosophyOpinionSentience

Frontpage

"Part one of our challenge is to solve the technical alignment problem, and that’s what everybody focuses on, but part two is: to whose values do you align the system once you’re capable of doing that, and that may turn out to be an even harder problem", Sam Altman, OpenAI CEO (Link).

In this post, I argue that:

"To whose values do you align the system" is a critically neglected space I termed “Moral Alignment.” Only a few organizations work for non-humans in this field, with a total budget of 4-5 million USD (not accounting for academic work). The scale of this space couldn’t be any bigger - the intersection between the most revolutionary technology ever and all sentient beings. While tractability remains uncertain, there is some promising positive evidence (See “The Tractability Open Question” section).
Given the first point, our movement must attract more resources, talent, and funding to address it. The goal is to value align AI with caring about all sentient beings: humans, animals, and potential future digital minds. In other words, I argue we should invest much more in promoting a sentient-centric AI.

The problem

What is Moral Alignment?

AI alignment focuses on ensuring AI systems act according to human intentions, emphasizing controllability and corrigibility (adaptability to changing human preferences). However, traditional alignment often ignores the ethical implications for all sentient beings. Moral Alignment, as part of the broader AI alignment and AI safety spaces, is a field focused on the values we aim to instill in AI. I argue that our goal should be to ensure AI is a positive force for all sentient beings.

Currently, as far as I know, no overarching organization, terms, or community unifies Moral Alignment (MA) as a field with a clear umbrella identity. While specific groups focus individually on animals, humans, or digital minds, such as AI for Animals, which does excellent community-building work around AI and animal welfare while also incorporating content related to digital minds, a broader framing aims to foster a shared vision, collaboration, and synergy among everyone interested in the MA space.

* I am happy to receive any feedback on the best term for this MA space, as it is still being examined.

The Paradox of Human-Centric Alignment

There is a troubling paradox in AI alignment: while effective altruists work to prevent existential risks (x-risks) and suffering risks (s-risks) by aligning AI with human values, those very values—reflected in human actions throughout history—have already caused many of the risks we seek to prevent. Powerful and controllable technologies in human hands have led to human x-risks such as genocides and an ever-worsening climate crisis, as well as non-human x-risks, like possibly wiping out most wild animals from the face of the planet. Similarly, human values have enabled severe s-risks, including factory farming and slavery. This raises the question: if aligning AI with human values has historically resulted in catastrophic outcomes, how can we ensure that AI alignment will not amplify the very harms we aim to prevent?

From the perspective of most sentient beings on Earth, human intelligence itself is a misaligned biological superintelligence attempting to build an even more powerful artificial intelligence. Alongside controllability and corrigibility, which are essential, we must prioritize alignment with sentient-centric values. Current AI models already reinforce anthropocentric biases, and as they gain agency and influence decision-making—from urban planning to technology development (such as alternative proteins)—AI values will shape the world’s future. This post highlights practical ways humans use AI that could be either catastrophic or beneficial to animals.

Addressing a Counterargument

“We should invest everything in safety; if AI becomes uncontrollable and destroys us, Moral Alignment won't matter.”

In 2024, total AI safety spending by major AI safety funds according to this LessWrong post was a bit more than 100 million. I don't know how to calculate the whole MA space donations in 2024, but for organizations focusing on non-humans, it was probably about 5 million USD (I haven't found any official numbers; this is an estimate based on conversations with various people in the field), without counting funding for related academic work, which may somewhat increase this number but not by a lot. Hence, the MA space is much more neglected than AI Safety, which is very neglected in itself. Furthermore, it is probably a non-sum zero game, and more efforts on MA might not come at the expense of AI safety money.
Some experts consider many AI safety projects net negative because they risk enhancing AI capabilities. Maybe MA work doesn’t pose such a risk.
If we wait to work toward a sentient AI, it may be too late due to short AI timelines, which might alter our ability to influence the future, such as in the case of value lock-in. We must set precedents now, establishing a tradition of including the interests of humans and non-humans in AI model development.
Making AI care about all sentient beings may be a critical component of ensuring its alignment with humans. If we build a speciesist AI that discriminates based on intelligence, and it, in turn, creates even smarter AI systems, pushing human intelligence further down the intelligence hierarchy, why would AI continue to care about us?
AI might treat us the way we treat inferior technology. A future superintelligence may value the fact that we cared about it—that digital minds were included in our moral circle even before they existed or before superintelligence emerged.

This counterargument is the most common one I have encountered, though it is still relatively rare. Most people I have spoken with in AI safety, effective altruism, and AI companies seem to agree that Moral Alignment is both important and neglected. I may address other counterarguments in future posts.

The Open Tractability Question

As a nascent space working to collaborate with and influence AI companies, regulators, and other key players in AI and AI safety, there is naturally little evidence yet of its tractability. However, two factors work in favor of this kind of work:

Our most important target group consists of AI key players who possess ethical views that are much more pro-animals/humans/digital minds than the average person. Many AI professionals, from AI safety to people working in top-tier companies, really care about all sentient beings but haven't fully explored their potential for a positive impact on MA.
Human beings often have better stated values than realized values. The Moral Alignment space seeks to embed the values we declare and strive for, rather than merely reflecting our actions, into AI. This puts non-humans in a much better position.

The Risk of Not Creating a Unified Moral Alignment Field

What would we miss by not having a shared community for people working on AI for humans, animals, and digital minds?

The most fundamental goal—creating a sentient-centric AI—encompasses all groups. Especially in a small, neglected space like Moral Alignment, bringing our efforts together is essential for creating a greater impact.
Many research questions relevant to all three groups might be overlooked. For example: Will a sentient AI’s morality be more robust and ethical than that of a non-sentient philosophical zombie AI?
- On the other hand, some interventions are unique to each group. For instance, asking an AI company to develop an assessment of consciousness in AI is relevant to digital minds but may not apply to assessing consciousness in animals (e.g., insects).
People working in different groups would miss out on strategic insights from others engaging with AI companies, regulators, and key players in the AI space.
The AI space presents a unique opportunity to bridge the traditional divide between those working for humans and those advocating for animals or digital minds. This opportunity might be lost if we remain too fragmented. In many ways, we are all in the same boat when it comes to AI risks and opportunities. From my conversations with people across these groups, I find that they genuinely recognize and value the importance of other groups' work.
This is not a conventional animal rights versus human rights debate, nor does it require anyone to adopt a vegan lifestyle. Because Moral Alignment is less about individual actions and more about the values we want AI to uphold, there is a much stronger common ground upon which different groups can build.

The Solutions

A Vision for the Moral Alignment Movement

This is a movement aimed at making AI a positive force for all sentient beings. It has begun gaining traction in the past year, thanks to efforts by organizations, advocates, and philosophers. Hanging around at the AI for Animals conference in San Francisco, it was clear to me that we are just beginning, and there is a lot of enthusiasm and interesting ideas and initiatives that are soon to come.

I envision this space as a robust, interdisciplinary community partnering with changemakers in AI companies, AI safety, Effective Altruism, social justice, environmental and animal advocacy, regulators and other stakeholders. This movement will unite researchers, ethicists, technologists, and advocates.

Humans are an integral part of the Moral Alignment movement for intrinsic reasons, not just instrumental ones.

AI for Animals unconference in London, May 2024

Movement Goals

These goals are highly urgent. If AGI is likely to arrive within 5–10 years, we need to establish this movement as robust and strong as soon as possible.

Here are some possible goals for the MA movement:

Short term:

Define and frame the field: Establish MA as a discipline through research, publications, online content, events, PR and global dialogue.
Attract talent, resources and funding into the space.
Foster strong positive stakeholder relationships with key players in the AI space.
Enhance and expand the existing organizations’ work.
Create coordinated action, such as an ethical pledge for AI professionals to commit to advancing sentient-centric AI.
Create several “Moral AI” benchmarks for different groups, including a united one (work on such benchmarks has already been started; see novel evaluation of risks of animal harm in LLM-generated text).

Long-term goals can be measured concerning these points:

The amount of money AI companies allocate to promote a sentient-centric AI.
The amount of money AI companies allocate to supporting non-humans, such as funding research on integrating sentientist values into AI systems.
The amount of money AI companies allocate to supporting humans, such as fairness, transparency, and inclusivity, that prioritize human well-being beyond mere controllability or corrigibility.
The market share of AI chat systems or agents regulated for Moral Alignment, whether through company-adopted benchmarks or legal frameworks, and the effectiveness of such regulations in benefiting non-humans.
The number of citations from research on more ethical models and their benefits.
The amount of money spent on beneficial AI applications for non-humans, such as efforts to understand animal language so we can better help them, should be maximized, while the amount spent on harmful AI applications should be minimized. A possible example of a harmful application is precision livestock farming, which optimizes efficiency in factory farming and may come at the expense of animals (though it could also bring marginal improvements to their welfare).

Theory Of Change

Here is my version of a theory of change for developing more compassionate AI models:

For a more comfortable, easy-to-see version: Link

The Benevolent AI Imperative

Throughout history, humanity has faced tough ethical challenges. We build homes and roads, fragmenting wild habitats, while yearning for harmony with nature. Societies have competed for scarce resources like land and water, a reality where one person's gain meant another's loss.

Scarcity shaped our past, but abundance can shape our future. AI could offer the turning point, moving beyond zero-sum constraints. In that kind of future, values of kindness, fairness, and care can flourish.

Humankind, let’s face it, has not been Earth's best steward, having exterminated wild animal populations and created mass suffering for domesticated ones in factory farming. AI provides us with a second chance, a chance to rectify. Aligning AI with care for all sentient life allows us to embrace a role matching our power and intelligence. With a benevolent AI as our partner, we can become sentinels of sentience, creating a good future for all sentientkind.

Actions

Possible Interventions

Here are just a couple of options for interventions. More intervention ideas for animals can be found in this post, and this report is an example of an intervention for digital minds. This report recommends three early steps AI companies can take to address possible future AI welfare.

Interventions ideas:

Conduct or fund research on fundamental questions for MA, e.g., how does a sentient-centric AI behave, and what does it practically mean?
Create online and in-person events.
Present the space through presentations and 1:1 meetings in AI Safety conferences.
Write about the importance of MA on social media (Twitter and Reddit), EA/LessWrong forums and similar platforms.
Write a book about the subject.
Publish opinion columns about Moral Alignment in mainstream media.

Ways to Contribute to the Movement

Humanity has a narrow window to ensure all sentientkind's interests are addressed before it's too late. Some initial actions to promote this goal include:

Provide Feedback: Comment here or send me a private message (see below section “Give Us Feedback” for more details).

Donate: Support organizations in this field.

Create Content: Write posts, produce a TED Talk, or add Moral Alignment content to your website, if relevant.

Raise Awareness: Talk about it in discussions.

Connect People: Link individuals in this space with potential collaborators, volunteers, funders and more.

Start a new initiative: If you’re an entrepreneur or aspire to be, you can create a new initiative or join an incubator that will help you kickstart it. Some organizations (like the Centre for Effective Altruism) incubate charities working in different spaces. I think this could be highly effective.

If you’re a substantial donor, you can consider launching or joining a special purpose fund for Moral Alignment.

Give Us Feedback

Share your thoughts with me about anything related to the space I described: The vision, counterarguments, the term "Moral Alignment" (you can suggest alternatives), intervention ideas, strategy and more.

Contact me for deeper discussions and more information: ronenbar07@gmail.com

Next Posts I plan to write

Landscape analysis: In this post, I only mentioned one organization working in the MA space, AI for Animals. I will make a post that maps all key players in the space.
Ideas for research that can boost MA.
List of potential interventions.
A post about the new initiative I co-founded, The Moral Alignment Center.
Deliberating strategic questions about the movement.

127 Reactions

Mentioned in

140How should we adapt animal advocacy to near-term AGI?

27Moral Alignment: An Idea I'm Embarrassed I Didn't Think of Myself

26Will Sentience Make AI’s Morality Better?

21Developing AI Safety: Bridging the Power-Ethics Gap (Introducing New Concepts)

19How the veil of ignorance grounds sentientism

Load more (5/8)

More posts like this

Comments32

Sorted by

New & upvoted

Click to highlight new comments since: Today at 1:40 AM

Gordon Seidoh WorleyMar 2624

Joseph_ChuMar 2611

I think the first place I can recall where the distinction has been made between the two forms of alignment was in this Brookings Institution paper, where they refer to "direct" and "social" alignment, where social alignment more or less maps onto your moral alignment concept.

I've also more recently written a bit about the differences between what I personally call "parochial" alignment and "global" alignment. Global alignment also basically maps onto moral alignment. Though, I also would split parochial alignment into instruction following user alignment, and purpose following creator/owner alignment.

I think the main challenge of achieving social/global/moral alignment is simply that we already can't agree as humans on what is moral, much less know how to instill such values and beliefs into an AI robustly. There's a lot of people working on AI safety who don't think moral realism is even true.

There's also fundamentally an incentives problem. Most AI alignment work emphasizes obedience to the interests and values of the AI's creator or user. Moral alignment would go against this, as a truly moral AI might choose to act contrary to the wishes of its creator in favour of higher moral values. The current creators of AI, such as OpenAI, clearly want their AI to serve their interests (arguably the interests of their shareholders/investors/owners). Why would they build something that could disobey them and potentially betray them for some greater good that they might not agree with?

Matrice JacobineMar 26*8

For the record, as someone who was involved in AI alignment spaces well before it became mainstream, my impression was that, before the LLM boom, "moral alignment" is what most people understood AI alignment to mean, and what we now call "technical alignment" would have been considered capabilities work. (Tellingly, the original "paperclip maximizer" thought experiment by Nick Bostrom assumes a world where what we now call "technical alignment" [edit: or "inner alignment"?] is essentially solved and a paperclip company can ~successfully give explicit natural language goals to its AI to maximize.)

In part this may be explained by updating on the prospect of LLMs becoming the route to AGI (with the lack of real utility function making technical alignment much harder than we thought, while natural language understanding, including of value-laden concepts, seems much more central to machine intelligence than we thought), but the incentives problem of AI alignment work being increasingly made under the influence of first OpenAI then OPP-backed Anthropic is surely a part of it.

Joseph_ChuMar 262

Yeah, AI alignment used to be what Yudkowsky tried to solve with his Coherent Extrapolated Volition idea back in the day, which was very much trying to figure out what human values we should be aiming for. That's very much in keeping with "moral alignment". At some point though, alignment started to have a dual meaning of both aligning to human values generally, and aligning to their creator's specific intent. I suspect this latter thing came about in part due to confusion about what RLHF was trying to solve. It may also have been that early theorist were too generous and assumed that any human creators would benevolently want their AI to be benevolent as well, and so creator's intent mapped neatly with human values.

Though, I think the term "technical alignment" usually means applying technical methods like mechanistic interpretability to be part of the solution to either form of alignment, rather than meaning the direct or parochial form necessarily.

Also, my understanding of the paperclip maximizer thought experiment was that it implied misalignment in both forms, because the intent of the paperclip company was to make more paperclips to sell and make a profit, which is only possible if there are humans to sell to, but the paperclip maximizer didn't understand the nuance of this and simply tiled the universe with paperclips. The idea was more that a very powerful optimization algorithm can take an arbitrary goal, and act to achieve it in a way that is very much not what its creators actually wanted.

Matrice JacobineMar 262

I wasn't even contrasting "moral alignment" with "aligning to the creator's specific intent [i.e. his individual coherent extrapolated volition]", but with just "aligning with what the creator explicitly specified at all in the first place" ("inner alignment"?), which is implicitly a solved problem in the paperclip maximizer thought experiment if the paperclip company can specify "make as many paperclips as possible", and is very much not a solved problem in LLMs.

Ronen BarMar 262

If humans agree they want an AI that cares about everyone who feels, or at least that is what we are striving for, than classical alignment is aligned with a sentient centric AI.

RamiroMar 278

My conjecture is that you cannot fully separate MA and AI safety / alignment - or worse, solve AI safety first and then ask AI to solve values for you. We should solve them together, as some sets of values will be incompatible w some approaches to safety, and some AI development pathways will make some sets of values inaccessible (e.g., I don't think that an egalitarian world for our descendants is a likely outcome w the current trend)

Ronen BarMar 273

Yes I completely agree, Moral alignment and controllability/safety alignment are very interconnected and one effects the other

Chris LeongMar 277

The AI Safety Fundamentals course has done a good job of building up the AI safety community and you might want to consider running something similar for moral alignment.

One advantage of developing on a broader moral alignment field is that you might be able to produce a course that would still appeal to folks who are skeptical of either the animal rights or AI sentience strands.

I can share a few comments on my thoughts here if this is something you'd consider pursuing.

(I also see possible intersections with my Wise AI advisor research direction).

Ronen BarMar 281

Thanks Chris, yes please share more thoughts on that, it sounds very relevant and important.

Alistair StewartMar 264

Thanks for posting this Ronen – galvanising for me!

jasonkApr 83

This is a worthwhile idea, and I very much hope for it to be successful.

As I think about the idea, I wonder about the risk of backlash. I noticed your comment about how AI models have incorporated many values of the SF-area. Remember when there was a backlash against "woke" models (e.g., https://www.bbc.com/news/technology-68412620 )?

It makes me wonder how we can get to a sentient-centric AI that doesn't create backlash by producing analogous results that the typical person would find outlandish, but which an anti-speciesist vegan wouldn't. Do social and AI changes have to happen in lockstep? Or at least without one getting too far ahead of the other? I'm unsure.

Ronen BarApr 94

Thanks Jason.

Yes, I agree that a "woke" backlash is a significant risk. I think the first step is research to define what a sentient-centric AI truly is: what is the vision and how it behaves in a world full of conflicts of interest between humans and animals.

In my view, the answer must be an AI that uplifts humanity and strengthens safety for humans too. A kind AI that aims for gradual change, and that, to an extent, accepts the reality that our current civilization is built on harming animals. We need an AI that helps shift this, but patiently and strategically.

I know it sounds grand, but I think as humanity, we need a new narrative, and probably the AI era will bring a new narrtaive anyway. So we should promote the stewardship narrative toward all sentient beings. If we are building something godlike, it makes sense that we grow into this kind of responsibility. This AI for sentientkind movement isn’t about “going vegan” and because of that it may trigger less public resistance. and this could tap into the compassion many people already have.

The hope is that with the tremendous power AI will give humans, to reshape both the world and themselves, the conflict of interest with animals will shrink dramatically, making it easier to accept and implement change.

Beyond SingularityApr 23

This is a critically important and well-articulated post, thank you for defining and championing the Moral Alignment (MA) space. I strongly agree with the core arguments regarding its neglect compared to technical safety, the troubling paradox of purely human-centric alignment given our history, and the urgent need for a sentient-centric approach.

You rightly highlight Sam Altman's question: "to whose values do you align the system?" This underscores that solving MA isn't just a task for AI labs or experts, but requires much broader societal reflection and deliberation. If we aim to align AI with our best values, not just a reflection of our flawed past actions, we first need robust mechanisms to clarify and articulate those values collectively.

Building on your call for action, perhaps a vital complementary approach could be fostering this deliberation through a widespread network of accessible "Ethical-Moral Clubs" (or perhaps "Sentientist Ethics Hubs" to align even closer with your theme?) across diverse communities globally.

These clubs could serve a crucial dual purpose:

Formulating Alignment Goals: They would provide spaces for communities themselves to grapple with complex ethical questions and begin articulating what kind of moral alignment they actually desire for AI affecting their lives. This offers a bottom-up way to gather diverse perspectives on the "whose values?" question, potentially identifying both local priorities and identifying shared, potentially universal principles across regions.
Broader Ethical Education & Reflection: These hubs would function as vital centers for learning. They could help participants, and by extension society, better understand different ethical frameworks (including the sentientism central to your post), critically examine their own "stated vs. realized" values (as you mentioned), and become more informed contributors to the crucial dialogue about our future with AI.

Such a grassroots network wouldn't replace the top-down efforts and research you advocate for, but could significantly support and strengthen the MA movement you envision. It could cultivate the informed public understanding, deliberation, and engagement necessary for sentient-centric AI to gain legitimacy and be implemented effectively and safely.

Ultimately, fostering collective ethical literacy and structured deliberation seems like a necessary foundation for ensuring AI aligns with the best of our values, benefiting all sentient beings. Thanks again for pushing this vital conversation forward.

Ronen BarApr 41

That is a great idea, thanks for all your remarks. I would be happy to hear more about your vision for this, will DM you, hope it is OK.

Beyond SingularityApr 41

Thanks for the comment, Ronen! Appreciate the feedback.

MatthewDahlhausenMar 263

It seems like a fundamental problem is the lack of a moral realist foundation, as "human intentions toward sentient beings" and "what is moral" are different things. Can someone recommend some reading on whether alignment is even a coherent ask, either from a moral realist or moral anti-realist perspective?

Ronen BarMar 263

And human deeds are very different from human states values.

I think research to define exactly what is a sentient centric AI, is one of the first important things to do, and it's possible

gergoApr 23

Furthermore, it is probably a non-sum zero game, and more efforts on MA might not come at the expense of AI safety money.

Agreed. As far as I know, Polaris Ventures is interested in the s-risk space but not fund more "traditional" AI Safety work.

Aditya Arpitha PrasadMar 293

I think this is a very important point,

> if aligning AI with human values has historically resulted in catastrophic outcomes, how can we ensure that AI alignment will not amplify the very harms we aim to prevent?

We have been putting our money, time, and trust behind even more on our current human values being the way out even as empirically we see how our actions have been power-seeking, we have been responsible for the largest extinction event killing all our cousin species and we are doubling down on technology, more extraction of value from nature.

It is an S risk to imagine AI systems instilled with these "human" values

jojo_leeJun 11

On the question, “What does morally aligned ai look like?” Some quick thoughts.

It probably depends on what the AI system looks like in general.

If agentic, with coherent, long-term, large-scale goals, the goals should emphasise delivering wellbeing and alleviating suffering across all sentient beings.

If following an ecosystem of services model, and constitutional, the constitution should emphasise not excessively harming sentient beings and commit to evaluating effects of its services on the welfare of important sentient beings.

If following an ecosystem of services model and not constitutional, I can think of two possibilities.

A centralized distributor of services might have a branch focused on setting global priorities and galvanising work in priority areas such as moral alignment. Depending on whether the centre is public or private, this global priorities task force might be embedded in a company or a governance institution like the US government or the UN.

AI systems themselves are trained across the board to take the welfare of all sentient beings into account, similar to current ethical AI efforts.

I'm curious if anyone's thinking more about this carefully!

bhrdwj🔸Apr 171

Moral alignment of AI's is great. But we need moral alignment of all intelligences. Humans, literal whales, and AIs. Confusion, trauma, misalignment, and/or extinction of some intelligences against others negatively affects the whole Jungian system.

We urgently need great power alignment, and prevention of the coming escalating proxy warfare. "AI-driven urgency for great-power reconciliation" actually ticks all the ITN framework boxes, IMHO.

VeryJerryApr 31

I was just thinking about writing a post like this after listening to https://www.astralcodexten.com/p/introducing-ai-2027 and especially the end where they're talking about getting into blogging, and thinking about the massive blind spot Rationalists seem to have for sentientism. I'm particularly interested in ways to get involved and help push this cause forward. Especially as someone who frankly, feels pretty helpless with the mass scale of non-human suffering and mass amount of human apathy towards it, as well as the many flaws in the current animal rights movement.

Ronen BarApr 41

I think creating content on these topics is very valuable, and I am happy to brainstorm other options. I will also do a post on possible interventions.

drbrake 🔸Mar 261

While this is an important question to consider, it is by no means clear that we could get any short term consensus about how moral alignment should be implemented. In practical terms, if an AI (AGI) intelligence is more long lived and/or more able to thrive and bring benefits to others of its kind, wouldn't it be moral to value its existence over that of a human being? Yet I think you would struggle to get AI scientists to embed that value choice into their code. Similarly, looking 'down' the scale, in decisions where the lives or wellbeing of humans had to be balanced against animals, I am not sure there would be much likelihood of broad agreement on the relative value to attach to each in such cases.

I would encourage further research and advocacy on this point but at best this will be a long, long process. And you might not be happy with the eventual outcome!

At the moment there are no established guidelines in this area I am aware of in the existing not-AI-related space (though I have not looked hard...) but if AI-related research/discussion did establish such guidelines, it might cause the guidelines to be propagated out into the rest of the policy world and set a precedent.

Ronen BarMar 271

I think this is exactly why we need research building a vision of how a sentient centric ASI - that works with humanity to gradually improve lives for everyone - behaves. As humanity gets stronger and more able to control the outside environment and inside body and mind, we may see less conflict of interests between animals and humans, and this can creates a monumental chance to take a stewardship role in relation to non-humans.

drbrake 🔸Mar 302

I hope you are right but you should be aware that the opposite may also be true. Depending on the weights we give AI in valuing human and non-human thriving, AI may discover new ways that would make humans happier at the expense of non-humans. There are people and organizations who would assign a moral weight of zero to the suffering of some of even all non-humans, and if those people win the argument then you might end up with an AI which is less to your taste than one that just emerges organically with basic guide rails.

For example, leaving aside second order effects on wider ecology, if you asked me how much intense suffering I would inflict on shrimp to save a human life, I would personally choose an almost unlimited amount.

Ronen BarMar 301

I agree there is the opposite danger as well, and perhaps we have yet to known those dangers and new conflict of interests between humans and animals...
I assume with ASI, the risk of possible future digital minds being in a conflict of interests with humans is bigger than for animals.

Antony HenaoMar 281

I like the idea of more compassionate models. Thanks for sharing, Ronen.

I noticed you are a Vippasana practitioner. Have you read the Diamond Sutra?

For instance, in the Diamond Sutra, practitioners are invited to let go of:

The notion of "self"
The notion of "human being"
The notion of "living beings"
The notion of "life span"

Practicing the insights of this sutra has led me to develop a more compassionate and less anthropocentric view of the world.

I would like to know your take on how Buddhist ethics can help propel the field of moral alignment forward.

Here are some books that might be of interest to you if you are curious about the Diamond sutra and Buddhist universal ethics:

Zen and the Art of Saving the Planet by Thich Nhat Hanh
Good Citizens: Creating Enlightened Society by Thich Nhat Hanh

Thanks for sharing.

Ronen BarMar 281

Thanks Anthony! Very interesting stuff. I wrote some thought on the intersection of Buddhism and EA here Effective Self-Help - A guide to improving your subjective wellbeing but your referring to some other questions. I dont think I am deep enought in self observation to understand and have an insight on no-self or no-being, but I think the Buddhist teaching can bring people to a more sentient centric, experience centric vision and deep understanding. it is also pushing me in that direction

I read part of the zen and the art of saving the planet but not the others you mentioned

Did you hear about this?

https://www.monasticacademy.org/ai-fellowship

And this

https://buddhismforai.sutra.co/space/cbodvy/register

SiobhanBallMar 311

I agree with these two points raised by others:

we already can't agree as humans on what is moral

Why would they build something that could disobey them and potentially betray them for some greater good that they might not agree with?

I’m mindful of the risk of confusion as one commenter mentioned that MA could be synonymous with social alignment. I think a different term is needed. I personally liked your use of the word ‘sentinel’. Sentinel —> sentience. Easy to remember what it means in this context: protecting all sentient life (through judicious development of AI). ‘Moral’ is too broad in my view. There are fields of moral consideration that have little to do with non-human sentient life/animals. So, again, I would change the name of the movement to more accurately and succinctly fit what it’s about. Not sure how far along you are with the MA terminology, though!

You’ve said:

If humans agree they want an AI that cares about everyone who feels, or at least that is what we are striving for, then classical alignment is aligned with a sentient centric AI.

In a world with much more abundance and less scarcity, less conflict of interests between humans and non humans, I suspect this view to be very popular, and I think it is already popular to an extent.

I fear it is not yet popular enough to work on the basis that we can skip humanity’s recognition of animal sentience, and go straight to developing AI with that in mind. Unfortunately, the vast majority of humans still don’t rate animal sentience as being a good enough reason to stop killing them en masse, so it’s unlikely that they’re going to care about it when developing AI. I agree with your second part: AI will probably usher in an era where morals come easier because of abundance. But that’s going to happen after AGI, not before. To the extent that it’s possible for non-human animals to be considered now, at this stage of AI development, I think AI for Animals is already making waves there.

So my key question is - what does MA seek to achieve, that isn’t already the focal point of AI for Animals? If I’ve understood correctly, you want MA to be a broader umbrella term for works which AI for Animals contributes to.

What I don’t understand is, what else is under that umbrella?

Of all the possible directions, I think your suggestion of creating an ethical pledge is by far the strongest. That’s something tangible that we can get working on right away.

TLDR: MA seems to be about developing AI with the interests of animals in mind. I have a hard time comprehending what else there is to it (I'm a bit thick though, so if I'm missing the point, please say!). If it is about animals, then I don’t think we need to obscure that behind broader notions of morality; we can be on-the-nose and say ‘we care about animals. We want everyone to stop harming them. We want AI to avoid harming them, and to be developed with a view to creating conditions whereby nobody is harming them anymore. Sign our pledge today!’

Ronen BarApr 21

Thanks for the feedback!!

"we already can't agree as humans on what is moral"

I don't the fact that all humankind can't agree on a specific set of morals, tough many things are quite in consensus, at least in the west, prevent AGI or ASI from having a set of value. They are baking morals into those models, so the question in - what will those values be? and they are already not the values of the median worldwide person but more like the values of the median person in San Francisco (e.g. the models are very LGBTQ+ friendly)

"Why would they build something that could disobey them and potentially betray them for some greater good that they might not agree with?"

I am not suggesting they build something that will betray the creators of the models, and one of the goals of AI alignment research is how to make models corrigible - so humans can change their set of values and not get stuck with something (What is value lock-in? (YouTube video)). We need to convince the leaders of AI companies and regulators to align models with a Sentientism worldview (because of morality, because of public demand for this, because it is a robust way to keep humans safe, and more).

"I’m mindful of the risk of confusion as one commenter mentioned that MA could be synonymous with social alignment. I think a different term is needed. "

That is a great point, and I didn't make this clear in the post. Moral Alignment is the field focused on the question what are the right values, the true moral values, that we should align AI to. Within that there could be different views, and I think the stance of most people in our community is the promote the Sentientism view. Moral Alignment differs from AI technical Alignment since technical alignment focus on making AI do what we want, and MA focus on - what do we want?
I would be glad to hear more alternative ideas for concepts, if you have some. I am going to do interviews with relevant people to get some structured feedback on several possible terms. I am not set yet on any term

So you would call this Sentient beings sentinel? I like this play of words and also wrote something using it. I see the sentientist value alignment as inside MA.

"The vast majority of humans still don’t rate animal sentience as being a good enough reason to stop killing them en masse, so it’s unlikely that they’re going to care about it when developing AI."
I think the majority does care about animals and would want AI to care about them. ppls states values are better, much better, than their deeds. This movement is not about asking ppl to go vegan, it is about striving to take the good stewardship role that humanity has long dreamed of in ancient books and stories.

"what does MA seek to achieve, that isn’t already the focal point of AI for Animals? If I’ve understood correctly, you want MA to be a broader umbrella term for works which AI for Animals contributes to."

Yes, MA is about animals, humans, future digital minds, and anybody that can feel. It is the space that tries works on the question - what values should we align AI to? and Sentientism is the worldview that I hope many people will promote.

I think there is a lot of work to be done in this space, some of it is about bringing more talent and money, some of it is about promoting the interests of all the groups altogether (e.g. how does a sentient-centric AI behaves? it is a crucial question that is not being researched), some is specific intervention e.g. we need to convince AI companies to have a clear stance on non-humans. They currentl don't.