Many new AI safety orgs are emerging, each tackling a slightly different problem (mechanistic interpretability, evals, scalable oversight, misuse prevention, etc.).

Has anyone tried systematically mapping these orgs to AI failure modes (e.g. misalignment, misuse, gradual disempowerment, deceptive alignment)?

One goal of such an exercise could be to see which threat models are well-covered and where gaps remain — something that could inform funding and research priorities.

A starting point might be combining existing taxonomies (e.g. the AI Risk Repository) with current org lists (e.g. the Map of AI Existential Safety).
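
As a rough illustration of the underlying data structure, here is a minimal sketch of an org-to-threat-model coverage matrix. All org names and focus tags in it are hypothetical placeholders; a real version would be populated from the two resources above.

```python
# Sketch of an org-to-threat-model coverage matrix.
# Org names and tags are illustrative placeholders, not a real dataset;
# a full version would draw on the AI Risk Repository taxonomy and an
# org list such as the Map of AI Existential Safety.

from collections import defaultdict

THREAT_MODELS = [
    "misalignment",
    "misuse",
    "gradual disempowerment",
    "deceptive alignment",
]

# Hypothetical mapping: each org tagged with the threat models it targets.
ORG_FOCUS = {
    "Org A (interpretability)": ["misalignment", "deceptive alignment"],
    "Org B (evals)": ["misalignment", "misuse"],
    "Org C (misuse prevention)": ["misuse"],
}


def coverage(org_focus: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the org -> threat-model map to list which orgs cover each threat model."""
    by_threat: dict[str, list[str]] = defaultdict(list)
    for org, threats in org_focus.items():
        for threat in threats:
            by_threat[threat].append(org)
    return {t: by_threat.get(t, []) for t in THREAT_MODELS}


if __name__ == "__main__":
    for threat, orgs in coverage(ORG_FOCUS).items():
        status = ", ".join(orgs) if orgs else "GAP: no orgs tagged"
        print(f"{threat}: {status}")
```

Even a toy version like this makes the gap analysis mechanical: any threat model with an empty org list is a candidate under-covered area.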

Has anyone seen a version of this, or would anyone be interested in building one?
