Introducing the Principles of Intelligent Behaviour in Biological and Social Systems (PIBBSS) Fellowship

adamShimi

12 min read · Dec 18, 2021

Comments 5

Peter Slattery 🔸

Thanks for this interesting post! I am probably not well-suited to apply for the fellowship. However, I was interested in the ideas you mentioned, so I wanted to share some ideas I had regardless. They might not be useful, but it was helpful for me to get them out of my head!

Behaviour science
I work in this space, and much of the theory seems very relevant to understanding non-human agents. For instance, I wonder if there would be value in exploring whether models of human behaviour such as COM-B and the FBM could be useful in modelling the actions of AI agents. If it is useful to theorise that a human agent's behaviour only occurs when they have sufficient motivation and ability and a trigger to act (as per the FBM), it might also be useful to do so for a non-human agent.

Persuasion
I used to be interested in this (it is basically attitude and behaviour change).

I wonder if the idea of persuasion and underlying theory is useful for understanding how AI agents should respond to information and choose which information to share with other agents to achieve goals (i.e., to persuade). If so, then communications/processing models such as McGuire, Shannon-Weaver, or Lasswell may be useful.

Related to that, I wrote a (not very good) paper outlining the concept of persuasion a long time ago, which finished with:
"From a philosophical perspective, we recommend that future research should consider if non-human agents can not only persuade but can also be persuaded. Research already explores how emerging technologies, such as artificial intelligences, may be human-like to varying extents (see Bostrom, 2014; Kurzweil, 2005; Searle, 1980). If we can believe that non-biological beings might be conscious and human-like (Calverley, 2008; Hofstadter & Dennett, 1988) then maybe we should also consider whether these beings will have beliefs, attitudes and behaviours and thus be subject to persuasion?"

Systems thinking
I am still a novice in this area and what I know is probably outdated. I wonder if there could be value in drawing on concepts in systems thinking when attempting to manage AI. As an example, this model suggests 12 leverage points for systems change (based on this work). Could we model/manage an agent's behavioural outcomes in the same way?

I am interested to know what you think, if you have time. Do any of these areas seem fruitful? Are they irrelevant, or are there better approaches already in use?

I am very aware that I don't have a good understanding of how AI agents' behaviour is modelled within the AI safety/governance literature. I also don't understand exactly i) what differences there are between those approaches and the approaches used in behavioural science/social science, or ii) what justifications would be needed for each of the different approaches.

Can you (or anyone else) recommend things that I should read/watch to improve my understanding?

adamShimi

Thanks for the thoughtful comment!

Behaviour science
I work in this space, and much of the theory seems very relevant to understanding non-human agents. For instance, I wonder if there would be value in exploring whether models of human behaviour such as COM-B and the FBM could be useful in modelling the actions of AI agents. If it is useful to theorise that a human agent's behaviour only occurs when they have sufficient motivation and ability and a trigger to act (as per the FBM), it might also be useful to do so for a non-human agent.

This sounds like a potentially good analogy, but one has to be careful that it doesn't rely on assumptions that only apply to humans, or to quite bounded agents.

I used to be interested in this (it is basically attitude and behaviour change).
I wonder if the idea of persuasion and underlying theory is useful for understanding how AI agents should respond to information and choose which information to share with other agents to achieve goals (i.e., to persuade). If so, then communications/processing models such as McGuire, Shannon-Weaver, or Lasswell may be useful.
Related to that, I wrote a (not very good) paper outlining the concept of persuasion a long time ago, which finished with:
"From a philosophical perspective, we recommend that future research should consider if non-human agents can not only persuade but can also be persuaded. Research already explores how emerging technologies, such as artificial intelligences, may be human-like to varying extents (see Bostrom, 2014; Kurzweil, 2005; Searle, 1980). If we can believe that non-biological beings might be conscious and human-like (Calverley, 2008; Hofstadter & Dennett, 1988) then maybe we should also consider whether these beings will have beliefs, attitudes and behaviours and thus be subject to persuasion?"

The topic of persuasion (both by AIs and of AIs) is indeed important in alignment. There's a general risk that optimization is very easily spent on manipulating humans, whether intentionally (training an AI which actually ends up wanting to do something else, and so has reason to manipulate us) or unintentionally (training an AI such that it's incentivized to answer what we would prefer rather than give the most accurate and appropriate answer).

For the persuasion of AIs by AIs, there are some initial thoughts around memetics for AIs, but they are not fully formed yet.

Systems thinking
I am still a novice in this area and what I know is probably outdated. I wonder if there could be value in drawing on concepts in systems thinking when attempting to manage AI. As an example, this model suggests 12 leverage points for systems change (based on this work). Could we model/manage an agent's behavioural outcomes in the same way?

I don't know much about this literature, but it makes me think of more structural takes on the alignment problem, which emphasize the importance of the structure of society in funneling and pushing optimization, rather than the individual power of agents to alter it.

I am interested to know what you think, if you have time. Do any of these areas seem fruitful? Are they irrelevant, or are there better approaches already in use?

So, as can be seen above, none of these ideas sounds bad or impossible to make work, but judging them correctly would require far more effort put into analyzing them. Maybe you should apply for the fellowship, especially for behavioral work on which you're more of an expert? ;)

I am very aware that I don't have a good understanding of how AI agents' behaviour is modelled within the AI safety/governance literature. I also don't understand exactly i) what differences there are between those approaches and the approaches used in behavioural science/social science, or ii) what justifications would be needed for each of the different approaches.

Can you (or anyone else) recommend things that I should read/watch to improve my understanding?

It's a very good question, and shamefully I don't have any answer that's completely satisfying. But here are the next best things, some resources that will give you a more rounded perspective of alignment:

Richard Ngo's AGI safety from first principles, a condensed starter that presents the main line of arguments in a modern (post ML revolution) way.
Rob Miles's YouTube channel on alignment, with great videos on many different topics.
Andrew Critch and David Krueger's ARCHES, a survey of alignment problems and perspectives that puts more emphasis than most on structural approaches.

Peter Slattery 🔸

Thanks, Adam, this was very helpful! I really appreciate that you took the time to respond in such detail.

I will see what I can do for the fellowship. I might be able to convince someone else to do it and then I can collaborate with them :)

nora

PIBBSS Summer Research Fellowship -- Q&A event

What? Q&A session with the fellowship organizers about the program and application process. You can submit your questions here.
For whom? For everyone curious about the fellowship and for those uncertain whether they should apply.
When? Wednesday 12th January, 7 pm GMT
Where? On Google Meet, add to your calendar

Dušan D. Nešić (Dushan)

PIBBSS Fellowship 2023 is officially open!

Application deadline: Sunday, Feb 5th, 2023

Learn more and apply here.

Information sessions:

1st information session: 28th of January, 17:00 UTC (09:00 PST, 12:00 EST, 18:00 CET, 01:00 [29th of Jan] Singapore) Zoom Link
2nd information session: 29th of January, 11:00 UTC (03:00 PST, 06:00 EST, 12:00 CET, 19:00 Singapore) Zoom Link

Introduction

Analogies as General Epistemic Strategies for Alignment

Examples of Successful Analogies in Alignment

The Problem: Difficulty of Epistemic Translation

Proposed Solution: Creating Institutional Context for Collaborations around Analogies

Pre-mortem: what could go wrong?

Details of the Fellowship Program

Appendix: Sample of project proposals

Biodiversity and Heterogeneity in Energy Flows

[h/t Jan Kulveit]

Basins of Robustness in Search Spaces

Institutional foundations of Linguistic Innovation

Social learning and the limitations of the RL framework