Evan R. Murphy

Technical AI Governance Researcher @ UC Berkeley

606 karma · Joined Oct 2021 · Working (6-15 years) · Vancouver, BC, Canada

Bio

I'm doing research and other work focused on AI safety/security, governance and risk reduction. Currently my top projects are (last updated Feb 26, 2025):

Technical researcher for UC Berkeley at the AI Security Initiative, part of the Center for Long-Term Cybersecurity (CLTC)
Serving on the board of directors for AI Governance & Safety Canada

General areas of interest for me are AI safety strategy, comparative AI alignment research, prioritizing technical alignment work, analyzing the published alignment plans of major AI labs, interpretability, deconfusion research and other AI safety-related topics.

Research that I’ve authored or co-authored:

See publications on Google Scholar
Steering Behaviour: Testing for (Non-)Myopia in Language Models
Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios
(Scroll down to read other posts and comments I've written)

Before getting into AI safety, I was a software engineer for 11 years at Google and various startups. You can find details about my previous work on my LinkedIn.

While I'm not always great at responding, I'm happy to connect with other researchers or people interested in AI alignment and effective altruism. Feel free to send me a private message!

How others can help me

Give me feedback anonymously

Posts
7


Evan R. Murphy's Quick takes

Evan R. Murphy

· 5y ago · 1m read

AI Risk: Can We Thread the Needle? [Recorded Talk from EA Summit Vancouver '25]

Evan R. Murphy

· 9mo ago · 2m read

Proposal: Funding Diversification for Top Cause Areas

Evan R. Murphy

· 4y ago · 3m read

New US Senate Bill on X-Risk Mitigation [Linkpost]

Evan R. Murphy

· 4y ago · 1m read

New series of posts answering one of Holden's "Important, actionable research questions"

Evan R. Murphy

· 4y ago · 1m read

Action: Help expand funding for AI Safety by coordinating on NSF response

Evan R. Murphy

· 4y ago · 4m read

People in bunkers, "sardines" and why biorisks may be overrated as a global priority

Evan R. Murphy

· 5y ago · 3m read

Comments
73

AMA: Ask Career Advisors Anything

Evan R. Murphy · 1y · 3

I applied to a couple of these using the pages you linked to. One of them already got back to me with next steps. Thanks!

AMA: Ask Career Advisors Anything

Evan R. Murphy · 1y · 1

Hi! Would any of the career advisors possibly be available for a 1-on-1 call?

I currently work in technical AI governance/safety research. I'm contemplating getting into fundraising or donor advising, as well as some other possibilities, and would love to talk through this with someone.

Thanks for considering! 🙏

Joining the Carnegie Endowment for International Peace

Evan R. Murphy · 2y · 1

one thing I have been pretty enthused about for a while is putting more effort into investigating potentially concerning AI incidents in the wild. Based on case studies, I believe that exposing and helping the public understand any concerning incidents could easily be the most effective way to galvanize more interest in safety standards, including regulation. I'm not sure how many concerning incidents there are to be found in the wild today, but I suspect there are some, and I expect there to be more over time as AI capabilities advance.

Interesting idea - I can see how exposing AI incidents could be important. This brought to my mind the paper Malla: Demystifying Real-world Large Language Model Integrated Malicious Services. (No affiliation with the paper; it's just one that I remember reading and that we referenced in some Berkeley CLTC AI Security Initiative research earlier this year.) The researchers on the Malla paper dug into the dark web and uncovered hundreds of malicious services based on LLMs being distributed in the wild.

Evan R. Murphy's Quick takes

Evan R. Murphy · 2y · 6

Animal welfare

Open Phil claims that campaigns to make more Americans go vegan and vegetarian haven't been very successful. But does this analysis account for immigration?

If people who already live in the US are shifting their diets, but new immigrants skew omnivore, a simple analysis could easily miss the former shift because immigration is fairly large in the US.
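The composition effect described here can be sketched with a population-weighted average. All numbers below are hypothetical and chosen only to illustrate the mechanism; they are not estimates of actual US diets or immigration.

```python
def overall_share(resident_pop: float, resident_share: float,
                  immigrant_pop: float, immigrant_share: float) -> float:
    """Population-weighted share of vegetarians across two groups."""
    total = resident_pop + immigrant_pop
    if total <= 0:
        raise ValueError("total population must be positive")
    return (resident_pop * resident_share + immigrant_pop * immigrant_share) / total


# Hypothetical "before" snapshot: 100 units of residents, 5.0% vegetarian.
before = overall_share(100, 0.050, 0, 0.0)

# Hypothetical "after" snapshot: residents rise to 5.6% vegetarian, while
# 20 units of new immigrants arrive with a 2.0% vegetarian share.
after = overall_share(100, 0.056, 20, 0.020)

if __name__ == "__main__":
    print(f"before: {before:.1%}, after: {after:.1%}")
```

With these made-up inputs the headline share stays at about 5% in both snapshots, even though the resident share rose, which is the kind of shift a simple poll comparison would not show.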

Source of Open Phil claim at https://www.openphilanthropy.org/research/how-can-we-reduce-demand-for-meat/ :

But these advocates haven’t achieved the widespread dietary changes they’ve sought — and that boosters sometimes claim they have. Despite the claims, 6% of Americans aren’t vegan and vegetarianism hasn’t risen fivefold lately: Gallup polls show a constant 5-6% of Americans have identified as vegetarians since 1999 (Gallup found 2% identified as vegans the only time it asked, in 2012). The one credible poll showing vegetarianism doubling in recent years still found only 5-7% of Americans identifying as vegetarian in 2017 — consistent with the stable Gallup numbers.

Shutting down AI Safety Support

Evan R. Murphy · 3y · 11

Will the AI alignment Slack continue to run?

Thanks JJ and everyone who has worked on AISS for all your great work!

AGI x Animal Welfare: A High-EV Outreach Opportunity?

Evan R. Murphy · 3y · 1

Peter Singer and Tse Yip Fai were doing some work on animal welfare relating to AI last year: https://link.springer.com/article/10.1007/s43681-022-00187-z It looks like Fai at least is still working in this area. But I'm not sure whether they have considered or initiated outreach to AGI labs; that seems like a great idea.

If your AGI x-risk estimates are low, what scenarios make up the bulk of your expectations for an OK outcome?

Evan R. Murphy · 3y · 3

I place significant weight on the possibility that when labs are in the process of training AGI or near-AGI systems, they will be able to see alignment opportunities that we can't from a more theoretical or distanced POV. In this sense, I'm sympathetic to Anthropic's empirical approach to safety. I also think there are a lot of really smart and creative people working at these labs.

Leading labs also employ some people focused on the worst risks. For misalignment risks, I am most worried about deceptive alignment, and Anthropic recently hired one of the people who coined that term. (From this angle, I would feel safer about these risks if Anthropic were in the lead rather than OpenAI. I know less about OpenAI's current alignment team.)

Let me be clear though: Even if I'm right above and the massively catastrophic misalignment risk from one of these labs creating AGI is ~20%, I consider that very much an unacceptably high risk. I think even a 1% chance of extinction is unacceptably high. If some other kind of project had a 1% chance of causing human extinction, I don't think the public would stand for it. Imagine some particle accelerator or biotech project had a 1% chance of causing human extinction. If the public found out, I think they would want the project shut down immediately until it could be pursued safely. And I think they would be justified in that, if there's a way to coordinate on doing so.

If your AGI x-risk estimates are low, what scenarios make up the bulk of your expectations for an OK outcome?

Answer by Evan R. Murphy · May 02, 2023 · 1

A key part of my model right now relies on who develops the first AGI and on how many AGIs are developed.

If the first AGI is developed by OpenAI, Google DeepMind or Anthropic - all of whom seem relatively cautious (perhaps some more than others) - I put the chance of massively catastrophic misalignment at <20%.

If one of those labs is first and somehow able to prevent other actors from creating AGI after this, then that leaves my overall massively catastrophic misalignment risk at <20%. However, while I think it's likely one of these labs would be first, I'm highly uncertain about whether they would achieve the pivotal outcome of preventing subsequent AGIs.

So, if some less cautious actor overtakes the leading labs, or if the leading lab that first develops AGI cannot prevent many others from building AGI afterward, I think there's a much higher likelihood of massively catastrophic misalignment from one of these attempts to build AGI. In this scenario, my overall massively catastrophic misalignment risk is definitely >50%, and perhaps closer to the 75-90% range.
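The way these two scenarios combine into one overall number can be sketched with the law of total probability. The scenario weight `p_contained` (the chance that a cautious lab is first and also prevents subsequent AGIs) is not given in this answer, so the values swept below are purely illustrative assumptions. The conditional risks use 0.20 as the stated upper bound and 0.75 to 0.90 as the stated upper range.

```python
def overall_risk(p_contained: float, risk_contained: float,
                 risk_uncontained: float) -> float:
    """Law of total probability over two mutually exclusive scenarios."""
    if not 0.0 <= p_contained <= 1.0:
        raise ValueError("p_contained must be a probability in [0, 1]")
    return p_contained * risk_contained + (1.0 - p_contained) * risk_uncontained


# Conditional risks taken from the answer: <20% if a cautious lab is first and
# prevents later AGIs (0.20 used as an upper bound), roughly 75%-90% otherwise.
RISK_CONTAINED = 0.20
RISK_UNCONTAINED_RANGE = (0.75, 0.90)

if __name__ == "__main__":
    # Hypothetical scenario weights; the answer does not specify these.
    for p_contained in (0.1, 0.3, 0.5):
        low = overall_risk(p_contained, RISK_CONTAINED, RISK_UNCONTAINED_RANGE[0])
        high = overall_risk(p_contained, RISK_CONTAINED, RISK_UNCONTAINED_RANGE[1])
        print(f"p_contained={p_contained:.1f}: overall risk {low:.3f} to {high:.3f}")
```

The sweep only shows how sensitive the overall figure is to the unstated scenario weight; it does not supply an estimate of that weight.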

NYT: Google will ‘recalibrate’ the risk of releasing AI due to competition with OpenAI

Evan R. Murphy · 3y · 2

You're right - I wasn't very happy with my word choice calling Google the 'engine of competition' in this situation. The engine was already in place and involves the various actors working on AGI and the incentives to do so. But these recent developments with Google doubling down on AI to protect their search/ad revenue are revving up that engine.

NYT: Google will ‘recalibrate’ the risk of releasing AI due to competition with OpenAI

Evan R. Murphy · 3y · 13

It's somewhat surprising to me the way this is shaking out. I would expect DeepMind and OpenAI's AGI research to be competing with one another*. But here it looks like Google is the engine of competition, motivated less by any future-focused ideas about AGI and more by the fact that their core search/ad business model appears to be threatened by OpenAI's AGI research.

*And hopefully cooperating with one another too.
