
I’m planning a leave of absence (aiming for around 3 months and potentially more) from Open Philanthropy, starting on March 8, to explore working directly on AI safety.

I have a few different interventions I might explore. The first I’ll explore is AI safety standards: documented expectations (enforced via self-regulation at first, and potentially government regulation later) that AI labs won’t build and deploy systems that pose too much risk to the world, as evaluated by a systematic evaluation regime. (More here.) There’s significant interest from some AI labs in self-regulating via safety standards, and I want to see whether I can help with the work ARC and others are doing to hammer out standards that are both protective and practical - to the point where major AI labs are likely to sign on.

During my leave, Alexander Berger will serve as sole CEO of Open Philanthropy (as he did during my parental leave in 2021).

Depending on how things play out, I may end up working directly on AI safety full-time. Open Philanthropy will remain my employer for at least the start of my leave, but I’ll join or start another organization if I go full-time.

The reasons I’m doing this:

First, I’m very concerned about the possibility that transformative AI could be developed soon (possibly even within the decade - I don’t think this is >50% likely, but it seems too likely for my comfort). I want to be as helpful as possible, and I think the way to do this might be via working on AI safety directly rather than grantmaking.

Second, as a general matter, I’ve always aspired to help build multiple organizations rather than running one indefinitely. I think the former is a better fit for my talents and interests.

  • At both organizations I’ve co-founded (GiveWell and Open Philanthropy), I’ve had a goal from day one of helping to build an organization that can be great without me - and then moving on to build something else.
  • I think this went well with GiveWell thanks to Elie Hassenfeld’s leadership. I hope Open Philanthropy can go well under Alexander’s leadership.
  • Trying to get to that point has been a long-term project. Alexander, Cari, Dustin and I have been actively discussing the path to Open Philanthropy running without me since 2018.1 Our mid-2021 promotion of Alexander to co-CEO was a major step in this direction (putting him in charge of more than half of the organization’s employees and giving), and this is another step, which we’ve been discussing and preparing for for over a year (and announced internally at Open Philanthropy on January 20).

I’ve become increasingly excited about various interventions to reduce AI risk, such as working on safety standards. I’m looking forward to experimenting with focusing my energy on AI safety.

Footnotes

  1. This was only a year after Open Philanthropy became a separate organization, but it was several years after Open Philanthropy started as part of GiveWell under the title “GiveWell Labs.” 

Comments


[anonymous]

As AI heats up, I'm excited and frankly somewhat relieved to have Holden making this change. While I agree with 𝕮𝖎𝖓𝖊𝖗𝖆's comment below that Holden had a lot of leverage on AI safety in his recent role, I also believe he has a vast amount of domain knowledge that can be applied more directly to problem solving. We're in shockingly short supply of that kind of person, and the need is urgent.

Alexander has my full confidence in his new role as the sole CEO. I consider us incredibly fortunate to have someone like him already involved and prepared to succeed as the leader of Open Philanthropy.

My understanding is that Alexander has different views from Holden in that he prioritises global health and wellbeing over longtermist cause areas. Is there a possibility that Open Phil's longtermist giving decreases due to having a "non-longtermist" at the helm?

[anonymous]

I believe that’s an oversimplification of what Alexander thinks but don’t want to put words in his mouth.

In any case, this is one of the few decisions the 4 of us (including Cari) have always made together, so we have done a lot of aligning already. My current view, which is mostly shared, is that we’re currently underfunding x-risk even without longtermism math, both because FTXF went away and because I’ve updated towards shorter AI timelines in the past ~5 years. And even aside from that, we weren’t at full theoretical budget last year anyway. So that all nets out to an expected increase, not a decrease.

I’d love to discover new large x-risk funders though and think recent history makes that more likely.

OK, thanks for sharing!

And yes I may well be oversimplifying Alexander's view.

Ofer

In your recent Cold Takes post you disclosed that your wife owns equity in both OpenAI and Anthropic. (She was appointed to a VP position at OpenAI, as was her sibling, after you joined OpenAI's board of directors[1]). In 2017, under your leadership, OpenPhil decided to generally stop publishing "relationship disclosures". How do you intend to handle conflicts of interest, and transparency about them, going forward?

You wrote here that the first intervention that you'll explore is AI safety standards that will be "enforced via self-regulation at first, and potentially government regulation later". AI companies can easily end up with "self-regulation" that is mostly optimized to appear helpful, in order to avoid regulation by governments. Conflicts of interest can easily influence decisions w.r.t. regulating AI companies (mostly via biases and self-deception, rather than via conscious reasoning).


  1. EDIT: you joined OpenAI's board of directors as part of a deal between OpenPhil and OpenAI that involved recommending a $30M grant to OpenAI.

Can Holden clarify whether, and if so what proportion of, those shares in OpenAI and Anthropic are legally pledged for donation?

For context, my wife is the President and co-founder of Anthropic, and formerly worked at OpenAI.

80% of her equity in Anthropic is pledged for donation (though not in a legally binding way). None of her equity in OpenAI is. She may pledge more in the future if there is a tangible compelling reason to do so.

I plan to be highly transparent about my conflict of interest, e.g. I regularly open meetings by disclosing it if I’m not sure the other person already knows about it, and I’ve often mentioned it when discussing related topics on Cold Takes.

I also plan to discuss the implications of my conflict of interest for any formal role I might take. It’s possible that my role in helping with safety standards will be limited to advising with no formal powers (it’s even possible that I’ll decide I simply can’t work in this area due to the conflict of interest, and will pursue one of the other interventions I’ve thought about).

But right now I’m just exploring options and giving non-authoritative advice, and that seems appropriate. (I’ll also note that I expect a lot of advice and opinions on standards to come from people who are directly employed by AI companies; while this does present a conflict of interest, and a more direct one than mine, I think it doesn’t and can’t mean they are excluded from relevant conversations.)

Thanks for the clarification.

I notice that I am surprised and confused.

I'd have expected Holden to contribute much more to AI existential safety as CEO of Open Philanthropy (career capital, comparative advantage, specialisation, etc.) than via direct work.

I don't really know what to make of this.

That said, it sounds like you've given this a lot of deliberation and have a clear plan/course of action.

I'm excited about your endeavours in the project!

RE direct work, I would generally think of the described role as still a form of "leadership" — coordinating actors in the present — unlike "writing research papers" or "writing code". I expect Holden to have a strong comparative advantage at leadership-type work.

Yes, it would be very different if he'd said "I'm going to skill up on ML and get coding"!

(I work at Open Phil, speaking for myself)

FWIW, I think this could also make a lot of sense. I don't think Holden would be an individual contributor writing code forever, but skilling up in ML and completing concrete research projects seems like a good foundation for ultimately building a team doing something in AI safety.

Buck

I don't think Holden agrees with this as much as you might think. For example, he spent a lot of his time in the last year or two writing a blog.

I've been meaning to ask: Are there plans to turn your Cold Takes posts on AI safety and The Most Important Century into a published book? I think the posts would make for a very compelling book, and a book could reach a much broader audience and would likely get much more attention. (This has pros and cons of course, as you've discussed in your posts.)

Neat! Cover jacket could use a graphic designer in my opinion. It's also slotted under engineering? Am I missing something?

I threw that book together for people who want to read it on Kindle, but it’s quite half-baked. If I had the time, I’d want to rework the series (and a more recent followup series at https://www.cold-takes.com/tag/implicationsofmostimportantcentury/) into a proper book, but I’m not sure when or whether I’ll do this.

For what it's worth, I don't see an option to buy a Kindle version on Amazon - screenshot here

I think this was a goof due to there being a separate hardcover version, which has now been removed - try again?

Is it at all fair to say you’re shifting from a “marathon” strategy to a “sprint” strategy? I.e. prioritising work that you expect to help soon instead of later.

Is this move due to your personal timelines shortening?

I wouldn’t say I’m in “sprinting” mode - I don’t expect my work hours to go up (and I generally work less than I did a few years ago, basically because I’m a dad now).

The move is partly about AI timelines, partly about the opportunities I see and partly about Open Philanthropy’s stage of development.

I'd love to chat with you about directions here, if you're interested. I don't know anyone with a bigger value of  p(survival | West Wing levels of competence in major governments) - p(survival | leave it to OpenAI and DeepMind leadership). I've published technical AI existential safety research at top ML conferences/journals, and I've gotten two MPs in the UK onside this week. You can see my work at michael-k-cohen.com, and you can reach me at michael.cohen@eng.ox.ac.uk.

You may have already thought of this, but one place to start exploring what AI standards might look like is exploring what other safety standards for developing risky new things do in fact look like. The one I'm most familiar with (but not at all an expert on) is DO-178C Level A, the standard for developing avionics software where a bug could crash the plane. "Softer" examples worth looking at would include the SOC2 security certification standards.

I wrote a related thing here as a public comment to the NIST regulation framework developers, who I presume are high on your list to talk to as well: https://futuremoreperfect.substack.com/p/ai-regulation-wonkery

I’m in no position to judge how you should spend your time all things considered, but for what it’s worth, I think your blog posts on AI safety have been very clear and thoughtful, and I frequently recommend them to people (example). For example, I’ve started using the phrase “The King Lear Problem” from time to time (example).

Anyway, good luck! And let me know if there’s anything I can do to help you. 🙂

I think your first priority is promising and seemingly neglected (though I'm not familiar with a lot of work done by governance folk, so I could be wrong here). I also get the impression that MIRI folk believe they have an unusually clear understanding of risks, would like to see risky development slow down and are pessimistic about their near-term prospects for solving technical problems of aligning very capable intelligent systems and generally don't see any clearly good next steps. It appears to me that this combination of skills and views positions them relatively well for developing AI safety standards. I'd be shocked if you didn't end up talking to MIRI about this issue, but I just wanted to point out that from my point of view there seems to be a substantial amount of fit here.

are pessimistic about their near-term prospects for solving technical problems of aligning very capable intelligent systems and generally don't see any clearly good next steps

I don't think they claim to have better longer-term prospects, though.

I think they do? Nate at least says he’s optimistic about finding a solution given more time

MIRI folk believe they have an unusually clear understanding of risks

"Believe" being the operative word here. I really don't think they do.

I'm not sold on how well calibrated their predictions of catastrophe are, but I think they have contributed a large number of novel & important ideas to the field.

I don't think they would claim to have significantly better predictive models in a positive sense; they just have far stronger models of what isn't possible and cannot work for ASI, and it constrains their expectations about the long term far more. (I'm not sure I agree with, say, Eliezer's view on the uselessness of governance - but he has a very clear model, which is unusual.) I also don't think their view about timelines or takeoff speeds is really a crux - they have claimed that even if ASI is decades away, we still can't rely on current approaches to scale.
