Many new AI safety orgs are emerging, each tackling a slightly different problem (mechanistic interpretability, evals, scalable oversight, misuse prevention, etc.).

Has anyone tried systematically mapping these orgs to AI failure modes (e.g. misalignment, misuse, gradual disempowerment, deceptive alignment)?

One goal of such an exercise could be to see which threat models are well-covered and where gaps remain — something that could inform funding and research priorities.

A starting point might be combining existing taxonomies (e.g. the AI Risk Repository) with current org lists (e.g. the Map of AI Existential Safety).
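
As a rough illustration of the underlying data structure, here is a minimal sketch of an org-to-threat-model coverage matrix. All org names and focus tags in it are hypothetical placeholders; a real version would be populated from the two resources above.

```python
# Sketch of an org-to-threat-model coverage matrix.
# Org names and tags are illustrative placeholders, not a real dataset;
# a full version would draw on the AI Risk Repository taxonomy and an
# org list such as the Map of AI Existential Safety.

from collections import defaultdict

THREAT_MODELS = [
    "misalignment",
    "misuse",
    "gradual disempowerment",
    "deceptive alignment",
]

# Hypothetical mapping: each org tagged with the threat models it targets.
ORG_FOCUS = {
    "Org A (interpretability)": ["misalignment", "deceptive alignment"],
    "Org B (evals)": ["misalignment", "misuse"],
    "Org C (misuse prevention)": ["misuse"],
}


def coverage(org_focus: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the org -> threat-model map to list which orgs cover each threat model."""
    by_threat: dict[str, list[str]] = defaultdict(list)
    for org, threats in org_focus.items():
        for threat in threats:
            by_threat[threat].append(org)
    return {t: by_threat.get(t, []) for t in THREAT_MODELS}


if __name__ == "__main__":
    for threat, orgs in coverage(ORG_FOCUS).items():
        status = ", ".join(orgs) if orgs else "GAP: no orgs tagged"
        print(f"{threat}: {status}")
```

Even a toy version like this makes the gap analysis mechanical: any threat model with an empty org list is a candidate under-covered area.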

Has anyone seen a version of this, or would anyone be interested in building one?
