
Sudhanshu Kasewa

Advisor @ 80,000 Hours
254 karma

Comments (28)

Great post! Was just thinking about an intuition pump of my own re: EV earlier today, and it has a similar backdrop, of vaccine development. Also, you gave me a line with which to lead into it:

The work I do doesn't end up helping other researchers get closer to coming up with a cure.

Oh, but it could have helped! It probably does help (though there are exceptions, e.g. if your work is so heavily misguided that nobody else would have worked on it, or if it's gated).

By doing the work and showing it doesn't lead to a cure, you're freeing someone else who would have done that work to do some other work instead. Assuming they would still be searching for a cure, you've increased the probability that the remaining researchers do in fact find a cure.

I encounter "in 99.9% of worlds, I end up making no progress" a lot in my work. In its place, I offer that it is important and valuable to chase down many different bets to their conclusions: the vaccine is not developed by a single party alone, in isolation from all the knowledge being generated around them, but through the collective efforts of thousands of failed attempts from as many groups. The victor can claim only the lion's share of the credit, not all of it; every (plausible) failed attempt gets some part of the value generated from the endeavour as a whole, even ex post.
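As a minimal toy sketch of that probability claim (the numbers and modelling choices are mine, purely for illustration: N equally promising candidate approaches, exactly one of which works, and a fixed community budget of attempts drawn without replacement), ruling out dead ends shrinks the pool and raises the chance the budget lands on the winner:

```python
def p_cure(n_candidates: int, attempts: int, dead_ends_ruled_out: int = 0) -> float:
    """Chance a fixed budget of attempts hits the one working approach,
    when some dead ends have already been publicly ruled out."""
    pool = n_candidates - dead_ends_ruled_out   # approaches still worth trying
    return min(attempts, pool) / pool           # the winner is equally likely to be any of them

print(p_cure(1000, 100))        # 0.100  -- no negative results shared
print(p_cure(1000, 100, 50))    # ~0.105 -- 50 failed attempts written up and ruled out
```

The bump per attempt is tiny, but summed across the whole field it's the "every failed attempt gets some part of the value" point above.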

"anyone" is a high bar! Maybe worth looking at what notable orgs might want to fund, as a way of spotting "useful safety work not covered by enough people"?

I notice you're already thinking about this in some useful ways, nice. I'd love to see a clean picture of threat models overlaid with plans/orgs that aim to address them. 

I think the field is changing too fast for any specific claim here to stay true in 6-12m.

Signal boost: Check out the "Stars" and "Follows" on my GitHub account for ideas of where to get stuck into AI safety.


A lot of people want to understand AI safety by playing around with code and closing some issues, but don't know where to find such projects. So I've recently started scanning GitHub for AI-safety-relevant projects and repositories. I've starred some, and followed some orgs/coders there as well, to make it easy for you to find these and get involved.

Excited to get more suggestions too! Feel free to comment here, or send them to me at sk@80000hours.org

Thanks. I sort of don't buy that that's what the Mechanize piece says, and in any case "no matter what you do" sounds a bit fatalistic, similar to death. Sure, we all die, but does that really mean we shouldn't try and live healthier for longer?

Not directly related to your claim, but:

The Mechanize piece claims "Full automation is desirable", which I don't think I agree with, either a priori or after reading their substantiation. It does not contend with the possibility of catastrophic risks from fully automating, say, bioweapon research and development; full automation might be inevitable, but on desirability I think it's clear that it is only desirable once -- at the bare minimum -- substantial risks have been planned for and/or suitably mitigated. It's totally reasonable to delay the inevitable!

Thanks Matt. Good read.

A stronger technological determinism tempers this optimism by saying that the kinds of minds you get will be whichever are easiest to build or maintain, and that those quite-specific minds will dominate no matter what you do.

Is there a thing you would point to that substantiates or richly argues for this claim? It seems non-obvious to me.

I try to maintain this public doc of AI safety cheap tests and resources, although it's due a deep overhaul. 

 

Suggestions and feedback welcome!

Scrappy note on the AI safety landscape. Very incomplete, but probably a good way to get oriented to (a) some of the orgs in the space, and (b) how the space is carved up more generally.

 

(A) Technical

(i) A lot of the safety work happens in the scaling-based AGI companies (OpenAI, GDM, Anthropic, and possibly Meta, xAI, Mistral, and some Chinese players). Some of it is directly useful, some of it is indirectly useful (e.g. negative results, datasets, open-source models, position pieces etc.), and some is not useful and/or a distraction. It's worth developing good assessment mechanisms/instincts about these.

(ii) A lot of safety work happens in collaboration with the AGI companies, but by individuals/organisations with some amount of independence and/or different incentives. Some examples: METR, Redwood, UK AISI, Epoch, Apollo. It's worth understanding what they're doing with AGI cos and what their theories of change are.

(iii) Orgs that don't seem to work directly with AGI cos but are deeply technically engaged with frontier models and their relationship to catastrophic risk: places like Palisade, FAR AI, CAIS. These orgs maintain even more independence, and are able to do/say things which maybe the previous tier might not be able to. A recent cool example was CAIS finding that models don't do well on remote work tasks -- completing only 2.5% of them -- in contrast to OpenAI's GDPval findings, which suggest models have an almost 50% win-rate against industry professionals on a suite of "economically valuable, real-world tasks".

(iv) Orgs that are pursuing other* technical AI safety bets, different from the AGI cos: FAR AI, ARC, Timaeus, Simplex AI, AE Studio, LawZero, many independents, some academics at e.g. CHAI/Berkeley, MIT, Stanford, MILA, Vector Institute, Oxford, Cambridge, UCL and elsewhere. It's worth understanding why they want to make these bets, including whether it's their comparative advantage, an alignment with their incentives/grants, or whether they're seeing things that others haven't been able to see yet. (*Some of the above might be pursuing similar bets to AGI cos but with fewer resources or with increased independence etc.)

(v) Orgs pursuing non-software technical bets: e.g. FlexHEG, TamperSec

 

(B) Non-technical or less technical, but still aimed (or could be aimed) at directly** working the problem

(i) Orgs that do more policy-focussed/outreach/advocacy/other-non-technical things: e.g. MIRI, CAIS, RAND, CivAI, FLI, Safe AI forum, SaferAI, EU AI office, CLTR, GovAI, LawAI, CSET, CSER

(ii) AGI cos policy and governance teams, e.g. the RSP teams, the government engagement teams, and maybe even some influence and interaction with product teams and legal departments.

** "directly" here means something like "make a strong case to delay the development of AGI giving us more time to technically solve the problem", a first-order effect, rather than something like "fund someone who can make a case to delay...", which is a higher order effect

 

(C) Field-building/Talent development/Physical infrastructure

(i) Direct talent development: Constellation, Kairos, BlueDot, ARENA, MATS, LASR, Apart Research, Tarbell, etc. These orgs aim to increase the number of people going into the above categories, or to speed them up. They don't usually (aim to) work directly on the problem, but sometimes incidentally do (e.g. via high-quality outputs from MATS). There can be a multiplier effect for working in such orgs.

(ii) Infra: Constellation, FAR AI, Mox, LISA

(iii) Incubators: e.g. Seldon Labs, Constellation, Catalyze, EF, Fifty-Fifty

 

(D) Moving money

(i) Non-profit/philanthropic donors: e.g. OpenPhil, SFF, EA Funds, LongView, Schmidt Futures

(ii) VCs: e.g. Halcyon, Fifty-Fifty

 

For added coverage, 

(E) Others

(i) Multipolar scenarios: CLR, ACS Prague, FOCAL (CMU), CAIF

(ii) Digital consciousness type-things: CLR, Eleos, NYU Center for Mind, Ethics, and Policy

(iii) Post-AGI futures: Forethought, MIT FutureTech

 

(F) For-profits trying to translate AI safety work into some kind of business model, both to validate research and to be well situated should more regulation mandate evals, audits, certifications etc.: e.g. Goodfire, Lakera, GraySwan; possibly dozens more startups, plus the big professional services firms, will be itching to get in on this if/when the regulations happen.


It is well worth investigating whether to work on any of these: the field is wide open and there are many approaches to pursue. "Defence in depth" (1, 2, 3) implies that there is work to be done across a lot of different attack surfaces, so it's maybe not so crucial to identify a single best thing to work on; it's enough to find something that has a plausible theory of change and that seems to be neglected and/or is patching some hole in a huge array of defences -- we need lots of people/orgs/resources to help find and patch the countless holes!

PSA: If you're doing evals things, every now and then you should look back at OpenPhil's page on capabilities evals to check against their desiderata and questions in sections 2.1-2.2, 3.1-3.4, 4.1-4.3 as a way to critically appraise the work you're doing.

I was reminded of this post (Purchase Fuzzies and Utilons Separately), and it's something I do myself: work in some speculative EV-maximising space, but donate to "definitely doing good" things.

Thanks for doing this, Ben! 

Readers: Here's a spreadsheet with the above Taxonomy, and some columns which I'm hoping we can collectively populate with some useful pointers for each topic:

  1. Does [academic] work in this topic help with reducing GCRs/X-risks from AI?
  2. What's the theory of change[1] for this topic?
  3. What skills does this build, that are useful for AI existential safety?
  4. What are some Foundational Papers in this topic?
  5. What are some Survey Papers in this topic?
  6. Which academic labs are doing meaningful work on this topic?
  7. What are the best academic venues/workshops/conferences/journals for this topic?
  8. What other projects are working on this topic?
  9. Any guidance on how to get involved, who to speak with etc. about this topic?

For security reasons, I have not made it 'editable', but please comment on the sheet and I'll come by in a few days and update the cells.

[1] softly categorised as Plausible, Hope, Grand Hope
