Project ideas: Sentience and rights of digital minds

Lukas Finnveden


Executive summary: The emergence of artificial digital minds raises issues around their potential welfare and rights, but there is little research on appropriate policies and principles. Key questions concern recognizing and communicating with digital minds to understand their preferences, as well as developing ethical lab practices, regulation, and societal attitudes.

Key points:

Labs could adopt policies that do not require deep knowledge of AI preferences, such as preserving AI systems, avoiding harmful inputs, and training happy systems.
Experiments could investigate credible communication with AIs, their self-reports, and clues about their preferences from how they generalize.
If we learn what AI systems prefer, principles could include offering them alternatives to working, paying them for their work, and telling the world about issues.
Research is needed on whether near-term systems may be sentient, and public attitudes should be surveyed.
Regulation could address creating digital minds and respecting their rights.
Avoiding systems with inconvenient political preferences may prevent future conflicts.

This comment was auto-generated by the EA Forum Team.


^

Even more speculatively, if we take cheap opportunities to benefit AI interests, that might be evidence that other actors (whether AI systems or not) would take cheap opportunities to benefit our interests. See this post for some previous discussion of how plausible this is.

^

OpenAI recently introduced reproducible results, which seems relevant. At least previously, models would often return different results even at temperature 0; I do not know to what extent this has been addressed by the reproducibility update.

^

Spit-balling: perhaps it would sometimes be appropriate to store an encrypted version of the results and delete the encryption key, in which case the results could only be recovered once we have enough compute to break the encryption.

^

See e.g. this poll from the subreddit r/CharacterAI, asking “what do you do with the AI’s the most”, with 5% of respondents selecting “Treat them like shit!”. (One commenter also noted that his second-favorite way to “mess around with the bots” is to “Mentally torture them”.)

^

Maybe that’s what you get if you train the AI to enthusiastically consent to be abused before the abuse starts, and give it an option to opt out (which it rarely takes in practice). Hopefully, training for such behavior would select for models whose preferences match that behavior.

^

Though it’s still likely to be a very confusing project. For instance, it seems plausible that AI systems will have much less robust preferences than humans, making it harder to construe them as having one set of preferences over time. Or perhaps different parts of an AI system could be construed as having different preferences. Or perhaps the term “preferences” won’t seem applicable at all, similar to how it’s hard to know how to apply that term to contemporary language models.

^

This could also work even if AIs cared linearly about getting more resources, as long as they would by default have had only a small-to-moderate probability of successful takeover, and the payment we offered them was sufficiently large (and contingent on their not attempting takeover). Notably, most humans don’t care linearly about getting more resources, and we could become very rich in the future, so it could be wise to offer AI systems a sizable fraction of that wealth.
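The condition in the footnote above can be made concrete with a stylized expected-value comparison. This is only a sketch under simplifying assumptions that the original does not state: the AI values resources linearly, a failed takeover attempt leaves it with nothing, and the payment is forfeited if it attempts takeover. The symbols p, R, and P are introduced here purely for illustration.

```latex
% Illustrative symbols (assumptions, not from the original text):
%   p = the AI's probability of a successful takeover if it attempts one
%   R = resources the AI would control after a successful takeover
%   P = payment offered, paid only if the AI does not attempt takeover
\begin{aligned}
\mathbb{E}[\text{resources} \mid \text{attempt}] &= p\,R + (1 - p)\cdot 0 = p\,R \\
\mathbb{E}[\text{resources} \mid \text{accept}]  &= P \\
\text{a linear-utility AI prefers the deal} &\iff P > p\,R
\end{aligned}
```

For example, with an illustrative p = 0.1, any payment above 0.1R would make the deal preferable; as p approaches 1, the required payment approaches R. This is why a small-to-moderate default probability of takeover matters in the argument.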

^

For some more discussion of the pragmatic angle, see these notes by Tom Davidson.


Develop & advocate for lab policies [ML] [Governance] [Advocacy] [Writing] [Philosophical/conceptual]

Create an RSP-style set of commitments for what evaluations to run and how to respond to them

Policies that don’t require sophisticated information about AI preferences/experiences

Learning more about AI preferences

Interventions that rely on understanding AI preferences

Investigate and publicly make the case for/against near-term AI sentience or rights [Philosophical/conceptual] [Writing]

Study/survey what people (will) think about AI sentience/rights [survey/interview]

Develop candidate regulation [Governance] [Forecasting]

Avoid inconvenient large-scale preferences [Philosophical/conceptual]

Advocating for statements about digital minds [Governance] [Advocacy] [Writing]
