Scrappy note on the AI safety landscape. Very incomplete, but probably a good way to get oriented to (a) some of the orgs in the space, and (b) how the space is carved up more generally.
(A) Technical
(i) A lot of the safety work happens in the scaling-based AGI companies (OpenAI, GDM, Anthropic, and possibly Meta, xAI, Mistral, and some Chinese players). Some of it is directly useful, some of it is indirectly useful (e.g. negative results, datasets, open-source models, position pieces etc.), and some is not useful and/or a distraction. It's worth developing good assessment mechanisms/instincts about these.
(ii) A lot of safety work happens in collaboration with the AGI companies, but by individuals/organisations with some amount of independence and/or different incentives. Some examples: METR, Redwood, UK AISI, Epoch, Apollo. It's worth understanding what they're doing with AGI cos and what their theories of change are.
(iii) Orgs that don't seem to work directly with AGI cos but are deeply technically engaging with frontier models and their relationship to catastrophic risk: places like Palisade, FAR AI, CAIS. These orgs maintain even more independence, and are able to do/say things which maybe the previous tier might not be able to. A recent cool thing was CAIS finding that models don't do well on remote work tasks -- only 2.5% of tasks -- in contrast to OpenAI's GDPval findings, which suggest models have an almost 50% win-rate against industry professionals on a suite of "economically valuable, real-world" tasks.
(iv) Orgs that are pursuing other* technical AI safety bets, different from the AGI cos: FAR AI, ARC, Timaeus, Simplex AI, AE Studio, LawZero, many independents, some academics at e.g. CHAI/Berkeley, MIT, Stanford, MILA, Vector Institute, Oxford, Cambridge, UCL and elsewhere. It's worth understanding why they want to make these bets, including whether it's their comparative advantage, an alignment with their incentives/grants, or whether they're seeing things that others haven't been able to see yet. (*Some of the above might be pursuing similar bets to AGI cos but with fewer resources or with increased independence etc.)
(v) Orgs pursuing non-software technical bets: e.g. FlexHEG, TamperSec
(B) Non-technical or less technical, but still aimed (or could be aimed) at directly** working the problem
(i) Orgs that do more policy-focussed/outreach/advocacy/other-non-technical things: e.g. MIRI, CAIS, RAND, CivAI, FLI, Safe AI forum, SaferAI, EU AI office, CLTR, GovAI, LawAI, CSET, CSER
(ii) AGI cos policy and governance teams, e.g. the RSP teams, the government engagement teams, and maybe even some influence and interaction with product teams and legal departments.
** "directly" here means something like "make a strong case to delay the development of AGI giving us more time to technically solve the problem", a first-order effect, rather than something like "fund someone who can make a case to delay...", which is a higher order effect
(i) Direct talent development: Constellation, Kairos, BlueDot, ARENA, MATS, LASR, Apart Research, Tarbell, etc. These orgs aim to increase the number of people going into the above categories, or to speed them up. They don't usually (aim to) work directly on the problem, but sometimes incidentally do (e.g. via high quality outputs from MATS). There can be a multiplier effect for working in such orgs.
(ii) Infra: Constellation, FAR AI, Mox, LISA
(iii) Incubators: e.g. Seldon Labs, Constellation, Catalyze, EF, Fifty-Fifty
(D) Moving money
(i) Non-profit/philanthropic donors: e.g. OpenPhil, SFF, EA Funds, LongView, Schmidt Futures
(ii) VCs: e.g. Halcyon, Fifty-Fifty
For added coverage,
(E) Others
(i) Multipolar scenarios: CLR, ACS Prague, FOCAL (CMU), CAIF
(ii) Digital consciousness type-things: CLR, Eleos, NYU Center for Mind, Ethics, and Policy
(iii) Post-AGI futures: Forethought, MIT FutureTech
(F) For-profits trying to translate AI safety work into some kind of business model, to validate research and possibly be well situated should more regulation mandate evals, audits, certifications etc.: e.g. Goodfire, Lakera, GraySwan, and possibly dozens more startups; big professional services firms would also be itching to get in on this when the regulations happen.
It is well worth investigating whether to work on any of these: The field is wide open and there are many approaches to pursue. "Defence in depth" (1, 2, 3) implies that there is work to be done across a lot of different attack surfaces, and so it's maybe not so central to identify a singular best thing to work on; it's enough to find something that has a plausible theory of change, that seems to be neglected and/or is patching some hole in a huge array of defences -- we need lots of people/orgs/resources to help with finding and patching the countless holes!
"anyone" is a high bar! Maybe worth looking at what notable orgs mightwanttofund, as a way of spotting "useful safety work not covered by enough people"?
PSA: If you're doing evals things, every now and then you should look back at OpenPhil's page on capabilities evals to check against their desiderata and questions in sections 2.1-2.2, 3.1-3.4, 4.1-4.3 as a way to critically appraise the work you're doing.
I've now spoken to ~1,400 people as an advisor with 80,000 Hours, and if there's a quick thing I think is worth more people doing, it's doing a short reflection exercise about one's current situation.
Below are some (clusters of) questions I often ask in an advising call to facilitate this. I'm often surprised by how much purchase one can get simply from this -- noticing one's own motivations, weighing one's personal needs against a yearning for impact, identifying blind spots in current plans that could be triaged and easily addressed, etc.
A long list of semi-useful questions I often ask in an advising call
Your context:
What’s your current job like? (or like, for the roles you’ve had in the last few years…)
The role
The tasks and activities
Does it involve management?
What skills do you use? Which ones are you learning?
Is there something in your current job that you want to change, that you don’t like?
Default plan and tactics
What is your default plan?
How soon are you planning to move? How urgently do you need to get a job?
Have you been applying? Getting interviews, offers? Which roles? Why those roles?
Have you been networking? How? What is your current network?
Have you been doing any learning, upskilling? How have you been finding it?
How much time can you find to do things to make a job change? Have you considered e.g. a sabbatical or going down to a 3/4-day week?
What are you feeling blocked/bottlenecked by?
What are your preferences and/or constraints?
Money
Location
What kinds of tasks/skills would you want to use? (writing, speaking, project management, coding, math, your existing skills, etc.)
What skills do you want to develop?
Are you interested in leadership, management, or individual contribution?
Do you want to shoot for impact? How important is it compared to your other preferences?
How much certainty do you want to have wrt your impact?
If you could picture your perfect job – the perfect combination of the above – which of these preferences/constraints would you relax first in order to consider a role?
Reflecting more on your values:
What is your moral circle?
Do future people matter?
How do you compare problems?
Do you buy this x-risk stuff?
How do you feel about expected impact vs certain impact?
If possible, I'd recommend trying to answer these questions out loud with another person listening (just like in an advising call!); they might be able to notice confusions, tensions, and places worth exploring further. Some follow up prompts that might be applicable to many of the questions above:
How do you feel about that?
Why is that? Why do you believe that?
What would make you change your mind about that?
What assumptions is that built on? What would change if you changed those assumptions?
Have you tried to work on that? What have you tried? What went well, what went poorly, and what did you learn?
Is there anyone you can ask about that? Is there someone you could cold-email about that?
Good luck!
Signal boost: Check out the "Stars" and "Follows" on my GitHub account for ideas of where to get stuck into AI safety.
A lot of people want to understand AI safety by playing around with code and closing some issues, but don't know where to find such projects. So I've recently started scanning GitHub for AI safety-relevant projects and repositories. I've starred some, and followed some orgs/coders there as well, to make it easy for you to find these and get involved.
Excited to get more suggestions too! Feel free to comment here, or send them to me at sk@80000hours.org
With another EAG nearby, I thought now would be a good time to push out this draft-y note. I'm sure I'm missing a mountain of nuance, but I stand by the main messages:
"Keep Talking"
I think there are two things EAs could be doing more of, on the margin. They are cheap, easy, and have the potential to unlock value in unexpected ways.
Talk to more people
I say this 15 times a week. It's the most no-brainer thing I can think of, with a ridiculously low barrier to entry; it's usually net-positive for one party while often only drawing on the unproductive hours of the other. Almost nobody would be where they are without the conversations they had. Some anecdotes:
- A conversation led to both parties discovering a good mentor-mentee fit, leading to one of them dropping out of a PhD, being mentored on a project, and becoming an alignment researcher.
- A first conversation led to more conversations which led to more conversations, one of which illuminated a new route to impact which this person was a tremendously good fit for. They're now working as a congressional staffer.
- A chat with a former employee gave an applicant insight about a company they were interviewing with and helped them land the job (many, many such cases).
- A group that is running a valuable fellowship programme germinated from a conversation between three folks who previously were unacquainted (the founders) (again, many such cases).
Make more introductions to others (or at least suggest who they should reach out to)
By hoarding our social capital we might leave ungodly amounts of value on the table. Develop your instincts and learn to trust them! Put people you speak with in touch with other people who they should speak with -- especially if they're earlier in their discovery of using evidence and reason to do more good in the world. (By all means, be protective of those whose time is 2 OOMs more precious; but within +/- 1, let's get more people connected: exchanging ideas, improving our thinking, illuminating truth, building trust.)
At EAG, at the very least, point people to others they should be talking to. The effort in doing so is so, so low, and the benefits could be massive.
One habit I often recommend to make that second piece of advice stick even more: introduce people to other people as soon as you think of it (i.e. pause the conversation and send them an email address or list of names, or open a thread between the two people).
I often pause 1:1s to find links or send someone a message because I'm prone to forgetting to do follow-up actions unless I immediately do it (or write it down).