I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of four people that worked on understanding language model features in context, leading to the release of an open-source "transformer debugger" tool.
I resigned from OpenAI on February 15, 2024.
FWIW on timelines:
Thank you for your work there. I’m curious about what made you resign, and also about why you’ve chosen now to communicate that?
(I expect that you are under some form of NDA, and that if you were willing and able to talk about why you resigned then you would have done so in your initial post. Therefore, for readers interested in some possibly related news: last month, Daniel Kokotajlo quit OpenAI’s Futures/Governance team “due to losing confidence that it [OpenAI] would behave responsibly around the time of AGI,” and a Superalignment researcher was forced out of OpenAI in what may have been a political firing (source). OpenAI appears to be losing its most safety-conscious people.)
Hi William! Thanks for posting. Can you elaborate on your motivation for posting this Quick Take?
No comment.
Presumably NDA + forbidden to talk about the NDA (and hence forbidden to talk about being forbidden to talk about it ...).