Thoughts on the AI Safety Summit company policy requests and responses

So8res

Comments 3

Sorted by

New & upvoted

vaniver

(I'm Matthew Gray)

Inflection is a late addition to the list, so Matt and I won’t be reviewing their AI Safety Policy here.

My sense from reading Inflection's response now is that they say the right things about red teaming and security and so on, but I am pretty worried about their basic plan / they don't seem to be grappling with the risks specific to their approach at all. Quoting from them in two different sections:

Inflection’s mission is to build a personal artificial intelligence (AI) for everyone. That means an AI that is a trusted partner: an advisor, companion, teacher, coach, and assistant rolled into one.

Internally, Inflection believes that personal AIs can serve as empathetic companions that help people grow intellectually and emotionally over a period of years or even decades.** Doing this well requires an understanding of the opportunities and risks that is grounded in long-standing research in the fields of psychology and sociology.** We are presently building our internal research team on these issues, and will be releasing our research on these topics as we enter 2024.

I think AIs thinking specifically about human psychology--and how to convince people to change their thoughts and behaviors--are very dual use (i.e. can be used for both positive and negative ends) and at high risk for evading oversight and going rogue. The potential for deceptive alignment seems quite high, and if Inflection is planning on doing any research on those risks or mitigation efforts specific to that, it doesn't seem to have shown up in their response.

I don't think this type of AI is very useful for closing the acute risk window, and so probably shouldn't be made until much later.

SummaryBot

Executive summary: The post provides thoughts on AI safety policies requested from AI labs by the UK government. It argues the policies are inadequate but some labs like Anthropic and OpenAI are relatively better. It suggests alternative priorities like compute limits, risk assessments, and contingency planning.

Key points:

The UK government's policy categories seem reasonable but miss key issues like independent risk assessments and contingency planning.
Current AI systems pose unacceptable risks; progress should halt until risks are addressed. But policies help labs acknowledge risks.
Anthropic and OpenAI's policies seem best, taking risks more seriously. DeepMind's is much worse. Meta's is far worse.
Governments should also institute compute limits, monitor chips, halt chip progress, require risk assessments, and develop contingency plans.
Independent risk assessments from actuaries could help determine which labs can continue operating.
If risks appear unaddressable before wide availability, governments need a plan for that scenario now.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Oliver Sourbut

Another high(er?) priority for governments:

start building multilateral consensus and preparations on what to do if/when
- AI developers go rogue
- AI leaked to/stolen by rogue operators
- AI goes rogue

Comments

More from the author

323

A personal reflection on SBF

So8res·3y ago·23m read

359

On Caring

So8res·11y ago·12m read

115

Comments on OpenAI's "Planning for AGI and beyond"

So8res·3y ago·15m read

Curated and popular this week

Counting animals: Stable population size is not equivalent to priority level

abrahamrowe, mal_graham🔸·1w ago·Curated 5d ago·16m read

AI Use Note: Main body text entirely human written. Claude (Opus 4.8) helped develop models of animal life histories in the appendix. Cross-posted from Good Structures. Executive Summary * Animal advocates sometimes make claims like “there are X of this animal...

114

Spiro: an update 2.5 years on and a fundraising ask for expansion

Habiba Banu·6d ago·6m read

Summary Back in November 2023 I posted here to launch Spiro and raise our first $198k. Two and a half years later this is an update and a fundraiser for the next step. The short version: we've now reached over-5,900 people with TB preventive medicine, including over 3,000 children under five years old. Our early results have held up well an...

How (not) to fundraise from Anthropic staff

Jack Lewars·5d ago·7m read

Adapted from my Substack, Funding Anthropalypse. Short version: if you want a share of the coming Anthropic and OpenAI windfall - the $37bn+ that could be in play next year - the way in is to become 'legibly excellent', so the evaluators and donors that frontier lab staff already trust point them to yo...

Recent opportunities to take action

Starting an EA group @ SUNY Binghamton

micahzarin·4h ago·1m read

Marginal Victories: career advising and opportunities for U.S. democracy preservation & political work

Annika Burman 🔸·1d ago·2m read

I'm stepping down as Hive's Executive Director, and we're hiring my successor

SofiaBalderson, Hive·1d ago·3m read

vaniver

(I'm Matthew Gray)

Inflection is a late addition to the list, so Matt and I won’t be reviewing their AI Safety Policy here.

Inflection’s mission is to build a personal artificial intelligence (AI) for everyone. That means an AI that is a trusted partner: an advisor, companion, teacher, coach, and assistant rolled into one.

Internally, Inflection believes that personal AIs can serve as empathetic companions that help people grow intellectually and emotionally over a period of years or even decades.** Doing this well requires an understanding of the opportunities and risks that is grounded in long-standing research in the fields of psychology and sociology.** We are presently building our internal research team on these issues, and will be releasing our research on these topics as we enter 2024.

I don't think this type of AI is very useful for closing the acute risk window, and so probably shouldn't be made until much later.

^{^}

Inflection is a late addition to the list, so Matt and I won’t be reviewing their AI Safety Policy here.

Thanks to Rob Bensinger for assembling, editing, and occasionally rephrasing/extending my draft of this post, with shallow-but-not-deep thumbs up from me.

^{^}

And, as OpenAI’s write-up notes: “We refer to our policy as a Risk-Informed Development Policy rather than a Responsible Scaling Policy because we can experience dramatic increases in capability without significant increase in scale, e.g., via algorithmic improvements.”

^{^}

Matthew Gray writes: “I think OpenAI did a surprisingly good job of responding to this with ‘the real deal’.” Matt cites this line from OpenAI’s discussion of “superalignment”:

Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on human ability to supervise AI. But these techniques will not work for superintelligence, because humans will be unable to reliably supervise AI systems much smarter than us.

^{^}

Doing this fully correctly would also require that you in some sense hold the money that goes to possible future people for risking their fate. Taking into account only the interests of people who are presently alive still doesn’t properly line up all the incentives, since present people could then have a selfish excessive incentive to trade away large amounts of future people’s value in exchange for relatively small amounts of present-day gains.

^{^}

I (Nate) agree with Matt here.

^{^}

Unlike the CFI post authors, I (Nate) would give all of the companies here an F. However, some get a much higher F grade than others.

^{^}

From DeepMind:

This is why we are building on our industry-leading general and infrastructure security approach. Our models are developed, trained, and stored within Google’s infrastructure, supported by central security teams and by a security, safety and reliability organisation consisting of engineers and researchers with world-class expertise. We were the first to introduce zero-trust architecture and software security best practices like fuzzing at scale, and we have built global processes, controls, and systems to ensure that all development (including AI/ML) has the strongest security and privacy guarantees. Our Detection & Response team provides a follow-the-sun model for 24/7/365 monitoring of all Google products, services and infrastructure - with a dedicated team for insider threat and abuse. We also have several red teams that conduct assessments of our products, services, and infrastructure for safety, security, and privacy failures.

Thoughts on the AI Safety Summit company policy requests and responses

1. Thoughts on the AI Safety Policy categories

2. Higher priorities for governments

3. Thoughts on the submitted AI Safety Policies