2021 AI Alignment Literature Review and Charity Comparison

Larks

87 min read · Dec 23, 2021

Comments 18

Misha_Yagudin

I am confused about the relevance of Ought's work to AI alignment. They got solid early endorsements for testing Christiano's ideas around factored cognition but have since pivoted to work on Elicit, which seems really cool but doesn't feel very alignment-related to me. I would appreciate someone making a case clarifying the relevance/importance of their work.

stuhlmueller

New post explaining the connection: Ought's theory of change.

stuhlmueller

Ought co-founder here. There are two ways Elicit relates to alignment broadly construed:

1 - Elicit informs how to train powerful AI through decomposition

Roughly speaking, there are two ways of training AI systems:

End-to-end training
Decomposition of tasks into human-understandable subtasks

We think decomposition may be a safer way to train powerful AI if it can scale as well as end-to-end training.

Elicit is our bet on the compositional approach. We’re testing how feasible it is to decompose large tasks like “figure out the answer to this science question by reading the literature” by breaking them into subtasks like:

Brainstorm subquestions that inform the overall question
Find the most relevant papers for a (sub-)question
Answer a (sub-)question given an abstract for a paper
Summarize answers into a single answer
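
The four subtasks above can be sketched as a small pipeline. This is only an illustrative sketch, not Elicit's actual code: every name here (`Paper`, `answer_by_decomposition`, and the `toy_*` stand-ins) is hypothetical, and the toy functions replace what would in practice be language model calls or human feedback.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Paper:
    title: str
    abstract: str


def answer_by_decomposition(
    question: str,
    brainstorm: Callable[[str], List[str]],
    find_papers: Callable[[str], List[Paper]],
    answer_from_abstract: Callable[[str, Paper], str],
    summarize: Callable[[str, List[str]], str],
) -> str:
    """Answer a question by composing four human-understandable subtasks."""
    partial_answers: List[str] = []
    # Step 1: brainstorm subquestions that inform the overall question.
    for subquestion in brainstorm(question):
        # Step 2: find the most relevant papers for the subquestion.
        for paper in find_papers(subquestion):
            # Step 3: answer the subquestion given one paper's abstract.
            partial_answers.append(answer_from_abstract(subquestion, paper))
    # Step 4: summarize the partial answers into a single answer.
    return summarize(question, partial_answers)


# Toy stand-ins (hypothetical); each could be swapped for a model or a human.
def toy_brainstorm(question: str) -> List[str]:
    return [f"What is known about: {question}",
            f"What is uncertain about: {question}"]


def toy_find_papers(subquestion: str) -> List[Paper]:
    return [Paper(title="Paper A", abstract="Abstract A")]


def toy_answer(subquestion: str, paper: Paper) -> str:
    return f"{paper.title} on '{subquestion}'"


def toy_summarize(question: str, answers: List[str]) -> str:
    return f"{len(answers)} partial answers for '{question}'"


result = answer_by_decomposition(
    "Does X cause Y?", toy_brainstorm, toy_find_papers, toy_answer, toy_summarize
)
print(result)
```

Because each step is passed in as a separate function, any one of them can be inspected, evaluated, or replaced independently, which is the property that distinguishes this from an end-to-end system.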

Over time, more of this decomposition will be done by AI assistants.

At each point in time, we want to push the compositional approach to the limits of current language models, and keep up with (or exceed) what’s possible through end-to-end training. This requires that we overcome engineering barriers in gathering human feedback and orchestrating calls to models in a way that doesn’t depend much on current architectures.

I view this as the natural continuation of our past work where we studied decomposition using human participants. Unlike then, it’s now possible to do this work using language models, and the more applied setting has helped us a lot in reducing the gap between research assumptions and deployment.

2 - Elicit makes AI differentially useful for AI & tech policy, and other high-impact applications

In a world where AI capabilities scale rapidly, I think it’s important that these capabilities can support research aimed at guiding AI development and policy, and more generally help us figure out what’s true and make good plans as much as they help persuade and optimize goals with fast feedback or easy specification.

Ajeya mentions this point in The case for aligning narrowly superhuman models:

"Better AI situation in the run-up to superintelligence: If at each stage of ML capabilities progress we have made sure to realize models’ full potential to be helpful to us in fuzzy domains, we will be going into the next stage with maximally-capable assistants to help us navigate a potentially increasingly crazy world. We’ll be more likely to get trustworthy forecasts, policy advice, research assistance, and so on from our AI assistants. Medium-term AI challenges like supercharged fake news / clickbait or AI embezzlement seem like they would be less severe. People who are pursuing more easily-measurable goals like clicks or money seem like they would have less of an advantage over people pursuing hard-to-measure goals like scientific research (including AI alignment research itself). All this seems like it would make the world safer on the eve of transformative AI or AGI, and give humans more powerful and reliable tools for dealing with the TAI / AGI transition."

Beth mentions the more general point in Risks from AI persuasion under possible interventions:

“Instead, try to advance applications of AI that help people understand the world, and advance the development of truthful and genuinely trustworthy AI. For example, support API customers like Ought who are working on products with these goals, and support projects inside OpenAI to improve model truthfulness.”

I'll write more about how we view our role in the space in Q1 2022.

technicalities

Not at Ought, but I can try:

In engineering, there are many horrendous conceptual issues that just don't come up in practice. (I have in mind stuff like finite element analysis, a method which works really well despite its assumptions being constantly violated.)

Similarly, there are things which are conceptually fine but practically intractable once you try to do them.

The idea with Elicit seems to be to try a difficult but tractable alignment problem, and so work out what problems we're overblowing and what we're overlooking.

NunoSempere

Likewise.

Linch

Great work as usual. Here's a minor comment before I dig more substantively:

In the past I have had very demanding standards around Conflicts of Interest, including being critical of others for their lax treatment of the issue. Historically this was not an issue because I had very few conflicts. However, this year I have accumulated a large number of such conflicts, and worse, conflicts that cannot all be individually publicly disclosed due to another ethical constraint.

As such the reader should assume I could be conflicted on any and all reviewed organisations. [Emphasis mine]

I think the issue with the last line is that if everything is seen as a conflict of interest, then nothing is. I obviously don't know the details of your ethical constraints, but I think readers who care about COIs might still benefit from lower-granularity announcement tags of the following form:

I have mild conflicts of interest with this organization.
I have moderate or strong conflicts of interest with this organization.

If orgs are only split into 3 categories (no, mild, and moderate/strong), this may preserve your desired privacy/other ethical constraints while still leaking enough bits that donors who care a lot about COIs can productively use that information.

Larks

You are correct that this would be much more useful - indeed this is essentially what I wrote into an earlier draft. Unfortunately the specific nature of the other ethical constraint makes it difficult to share even the existence of the conflict with any specific group/individual.

sawyer🔸

SFF (website) is a donor advised fund, advised by the people who make up BERI’s Board of Directors

This is not strictly true. The two fund advisors listed on SFF's website are Andrew and Eric. BERI's board is Andrew, Sawyer, and Jess (who replaced Eric earlier this year). I have personally never been involved with SFF's operations or grant evaluations (and in fact, doing so would be a major conflict of interest since BERI receives a lot of funding through those rounds). You're not the only person making this mistake, and it seems like an easy one to make given BERI and SFF's history. I don't know if this is just a minor gripe from a fussy insider, or if this conflation makes other people worry about conflicts of interest between the two orgs. But I figured I'd bring it up either way.

Amazing post as always, I'm so glad you do this!

Larks

You're right, that was out of date; fixed in both copies.

sawyer🔸

Thank you! I think the new text is more accurate and I appreciate your quick response.

NunoSempere

I think that the following would be valuable additions:

- Recommending which organizations to work for in addition to which to donate to.
- Giving your opinion as to which organizations meet a "bar for funding", or are "net positive"
  - Alternatively, ranking organizations,
  - or trying to compare them using this utility function extractor (though that would take ~<84 comparisons for the 23 non-meta organizations, which could be something of a slog)
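
For what it's worth, the comment does not say where the figure of 84 comes from. One reading that reproduces it, offered here purely as an assumption, is that the comparisons are ordered by a merge sort, whose worst case for 23 items is exactly 84 comparisons. The function names below are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def merge_sort_worst_case(n: int) -> int:
    """Worst-case number of pairwise comparisons top-down merge sort needs for n items."""
    if n <= 1:
        return 0
    left = n // 2
    right = n - left
    # Sorting each half, then merging two sorted halves takes at most n - 1 comparisons.
    return merge_sort_worst_case(left) + merge_sort_worst_case(right) + (n - 1)


def all_pairs(n: int) -> int:
    """Number of comparisons if every pair of items were compared directly."""
    return n * (n - 1) // 2


n_orgs = 23  # the 23 non-meta organizations mentioned above
print(merge_sort_worst_case(n_orgs))  # 84
print(all_pairs(n_orgs))              # 253
```

Under that assumption, 84 is an upper bound (hence the "<"), and it is roughly a third of the 253 comparisons that comparing every pair directly would take.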

Also, rot13, V jbhyq unir yvxrq gb frr n svany erpbzzraqngvba sbe gur bowrpg-yriry betnavmngvbaf vs gur YGSS/OREV/zrgn-betf jrer abg na bcgvba. Vg srryf yvxr qbvat fb jbhyq or vasbezngvir gb ynetre shaqref nf jryy nf gb gur YGSS vgfrys

Misha_Yagudin

I think the review of MIRI should drop AIRCS as, according to their website, the most recent workshop was in February 2020.

Habryka [Deactivated]

I think there is another AIRCS-like workshop planned this January. But not fully sure how MIRI-affiliated that one is.

Rohin Shah

Nitpicks:

CHAI researchers contributed to the following research led by other organisations:
Lindner et al.'s Learning What To Do by Simulating the Past

David was a CHAI intern while working on that paper (this is noted in a footnote on the front page, a common practice for papers). So that one is entirely a CHAI paper.

An increasingly large amount of the best work is being done in places that are inside companies: Deepmind, OpenAI, Redwood, Anthropic etc.

Redwood isn't inside a company afaik?

Larks

Thanks, fixed. The Redwood comment was an artifact of an earlier version of the sentence that referred to 'well funded groups' more generally.

HaydnBelfield

Great stuff, as per usual, and thanks for the kind words about our paper. Our next paper will address whether "2) rely on EU enforcement being so slow it is simply irrelevant" is indeed plausible across a range of scenarios (sneak preview: in some maybe; in others not!), and the one after that argue against "3) pushing for reforms to weaken antitrust laws".

FYI first sentence is "As in,,, and I have attempted".

MichaelA🔸

Thanks for writing this!

I think the link at the end of this passage is incorrect - it goes to Truthful AI. I think maybe you meant to link to this?

GAA's Nuclear Espionage and AI Governance provides an overview of the impact of communist spies on the Manhattan project, and some potential lessons for AI safety. It suggests that spying is more important if the scaling hypothesis is false and if AI projects are nationalised (as then nationalism could be a motivator, and groups might need to steal hardware if they can't buy it). It seems that generally spying is bad, but he does note that secrecy tends to beget secrecy, and could be hard to combine with interpretability, which might be important for alignment. See also the discussion here. #Strategy

jacquesthibs

Avast is telling me that the following link is malicious:

Ding's China's Growing Influence over the Rules of the Digital Road describes China's approach to influencing technology standards, and suggests some policies the US might adopt. #Policy

Introduction

How to read this document

New to Artificial Intelligence as an existential risk?

Conflict of Interest

Research Organisations

FHI: The Future of Humanity Institute

GovAI: The Center for the Governance of AI

CHAI: The Center for Human-Compatible AI

MIRI: The Machine Intelligence Research Institute

GCRI: The Global Catastrophic Risks Institute

CSER: The Center for the Study of Existential Risk

OpenAI

Google Deepmind

Anthropic

ARC: Alignment Research Center

Redwood Research

Ought

AI Impacts

GPI: The Global Priorities Institute

CLR: The Center on Long Term Risk

CSET: The Center for Security and Emerging Technology

AI Safety camp

FLI: The Future of Life Institute

Lightcone Infrastructure

CLTR: Center for Long Term Resilience (formerly Alpenglow)

Rethink Priorities

Convergence

SERI: The Stanford Existential Risk Initiative

Other Research

Capital Allocators & Other Organisations

LTFF: Long-term future fund

OpenPhil: The Open Philanthropy Project

SFF: The Survival and Flourishing Fund

FTX Foundation

BERI: The Berkeley Existential Risk Initiative

Nonlinear Fund

80,000 Hours

AISS: AI Safety Support

Other News

Organisation Second Preferences

Methodological Thoughts

Inside View vs Outside View

Organisations vs Individuals

Politics

Openness

Research Flywheel

Differential AI progress

Near-term safety AI issues

Financial Reserves

Donation Matching

Poor Quality Research

The Bay Area

Conclusions

Disclosures

Looking for Research Assistant for Next Year

Sources