3686 karma



AI governance/grantmaking. Formerly at the Center for AI Safety and Yale EA organizer.


Pragmatic AI Safety


Many people, myself included, have been having thoughts similar to yours for many years. Navigating EA's epistemic currents is treacherous. To be sure, so is navigating epistemic currents in lots of other environments, including the "default" environment for most people. But EA is sometimes presented as being "neutral" in certain ways, so it feels jarring to see that it is clearly not.

Nearly everyone I know who has been around EA long enough to do things like run a university group eventually confronts the fact that their beliefs have been shaped socially by the community in ways that are hard to understand, including by people paid to shape your beliefs. It's challenging to know what to do in light of that. Some people reject EA. Others, like you, take breaks to figure things out more for themselves. And others press on, while trying to course-correct somewhat. Many try to create more emotional distance, regardless of what they do. There's not really an obvious answer, and I don't feel I've fully figured it out myself. All this is to just say: you're not alone. If you or anyone else reading this wants to talk, I'm here.

Finally, I really like this related post, as well as this comment on it. When I ran the Yale EA in-depth fellowship, I assigned it as a reading.

Sorry not to weigh in on the object-level parts about university groups and what you think they should do differently, but since I've graduated, I'm no longer a community builder, so I'm somewhat less interested in weighing in on that.

It's also very much worth reading the linked pdf, which goes into more detail than the fact sheet.

Except, perhaps, dictators and other ne'er-do-wells.

I would guess that a significant number of power-seeking people in history and the present are power-seeking precisely because they think that those they are vying for power with are some form of "ne'er-do-wells." So the original statement:

Importantly, when longtermists say “we should try and influence the long-term future”, I think they/we really mean everyone.

with the footnote doesn't seem to mean very much. "Everyone, except those viewed as irresponsible," historically, at least, has certainly not meant everyone, and to some people means very few people.

There is for the ML safety component only. It's very different from this program in time commitment (much lower), stipend (much lower), and prerequisites (much higher, requires prior ML knowledge) though. There are a lot of online courses that just teach ML, so you could take one of those on your own and then this.


Sure, here they are! Also linked at the top now.

No, it is not being run again this year, sorry!

I have collected existing examples of this broad class of things on ai-improving-ai.safe.ai.

(More of a meta point somewhat responding to some other comments.)

It currently seems unlikely there will be a unified AI risk public communication strategy. AI risk is an issue that affects everyone, and many people are going to weigh in on it. That includes both people who are regulars on this forum and people who have never heard of it.

I imagine many people will not be moved by Yudkowsky's op-ed, and others will be. People who think AI x-risk is an important issue but who still disagree with Yudkowsky will have their own public writing that may be partially contradictory. Of course people should continue to talk to each other about their views, in public and in private, but I don't expect that to produce "message discipline" (nor should it).

The number of people concerned about AI x-risk is going to get large enough (and arguably already is) that credibility will become highly unevenly distributed among those concerned about AI risk. Some people may think that Yudkowsky lacks credibility, or that his op-ed damages it, but that needn't damage the credibility of everyone who is concerned about the risks. Back when there were only a few major news articles on the subject, that might have been more true, but it's not anymore. Now everyone from Geoffrey Hinton to Gary Marcus (somehow) to Elon Musk to Yuval Noah Harari is talking about the risks. While it's possible everyone could be lumped together as "the AI x-risk people," at this point, I think that's a diminishing possibility.

There is often a clash between "alignment" and "capabilities," with some saying AI labs are pretending to do alignment while doing capabilities, and others saying the two are so closely tied that it's impossible to do good alignment research without producing capability gains.

I'm not sure this discussion will be resolved anytime soon. But I think it's often misdirected.

I think often what people are wondering is roughly "is x a good person for doing this research?" Should it count as beneficial EA-flavored research, or is it just you being an employee at a corporate AI lab? The alignment and capabilities discussions often seem secondary to this.

Instead, I think we should stick to a different notion: something is "pro-social" (not attached to the term) AI x-risk research if it's research that (1) has a shot of reducing x-risk from AI (rather than increasing it or doing nothing) and (2) is not incentivized enough by factors external to the lab, to pro-social motivation, and to EA (for example: the market, the government, the public, social status in Silicon Valley, etc.).

Note (1) should include risks that the intervention changes timelines in some negative way, and (2) does not mean the intervention isn't incentivized at all, just that it isn't incentivized enough.

This is actually fairly similar to the scale/tractability/neglectedness framework, but it (1) incorporates downside risk and (2) doesn't run into the problem of having EAs want to do things "nobody else is doing" (including other EAs). EAs should simply do things that are underincentivized and good.

So, instead of asking things like "is OpenAI's alignment research real alignment?", ask "how likely is it to reduce x-risk?" and "is it incentivized enough by external factors?" That should be how we assess whether to praise the people there or tell people they should go work there.


Note: edited "external to EA" to "external to pro-social motivation and to EA"

I expect that if plant-based alternatives ever were to become as available, tasty, and cheap as animal products, a large proportion of people and likely nearly all EAs would become vegan. Cultural effects do matter, but in the end I expect them to be mostly downstream of technology in this particular case. Moral appeals have unfortunately had limited success on this issue.
