The gap between Systems Safety Engineering and AI Safety is something we need to talk about

Phill Mulvana

I occupy a strange space professionally and I think it might actually matter, so I ask you to bear with me while I hope to offer a perspective that will allow you to confirm or critique my thoughts.

By training and practice, I'm a systems safety engineer. In that role I've designed safety cases, studied human interaction with complex systems, attended police-led investigations of failure sites, and conducted experiments to establish burden of liability in legal proceedings. On the other side, I'm an advocate for existential risk from advanced AI, and much of my practical work sits around embodied AI; autonomous systems operating in complex environments where getting minor things wrong results in actual harm to people.

This means I spend a lot of time in two communities that are increasingly distant from one another, and that distance is starting to bother me, because the gap between them is not helping anyone in the management of AI risk.

What Systems Safety Engineering Actually Is

Before I go further, it's worth saying what systems safety engineering isn't, because it's commonly misunderstood. It’s not checklists, nor design for compliance and it’s definitely not health and safety in a hi-vis jacket.

Specifically; it is the discipline of trying to distil complex, tightly coupled, high-consequence socio-technical systems into tractable models, requirements, and objects over which we can identify and manage safety. Sometimes that's the global movement of air traffic, stochastic elements and all. Sometimes it's the performance envelope of a UGV delivering medical supplies in an urban environment. These approaches are extensively used in defence, aerospace, nuclear and others, but not, to my knowledge or my peers', in AI safety (at least in any meaningful way). I have had talks with certain AI organisations, but there always remained a serious scepticism in the applicability of systems safety rather than traditional comp-sci techniques.

The foundational insight that every safety management practitioner eventually internalises is this: failure is rarely a single fault, it is the product of interactions, incentives, human behaviour, and organisational and civic structures. Typically that makes it a combination where most failures are only visible only in hindsight.

As Dr Nancy Leveson has argued, and as I've seen confirmed repeatedly in practice: safety is a system property, not a component property. The serious failures almost always emerge from a system that looks locally acceptable, operating clearly within its defined use case, right up until the conditions deviate slightly, and the entire safety case falls apart.

At most scales, AGI risk looks far more like a complex systems safety problem than almost any other lens you could cast over it.

Intention Is Not Specification

One of the clearest lessons from embodied AI and operational design domain work is that you cannot specify behaviour by assuming cooperative interpretation. You have to specify against adversarial ambiguity. Always.

In practice, you end up designing as though the system is a brilliant but recalcitrant child, literal, boundary-seeking, and completely indifferent to your implied intent. When working on what as a relatively well bounded but goal-based environment in which an AS could operate, my engineers found themselves questioning their sanity at the extent to which a system value or attribute could be abused if not fully specified and protected against. Requirements are incomplete. Environments are under-specified. Operators assume shared context the system doesn't possess. And obvious constraints are often nowhere in the actual instruction set – instead in the product of many iterative revisions to operating design domains revised over heated semantic arguments and many late afternoon coffees.

A lot of assurance work reduces to a simple question: how would this fail if it interpreted my intent in the dumbest, narrowest, or most inconvenient possible way? Anyone in AI safety will recognise that immediately, it's specification gaming, just in a different dialect. And it happens, I’ve seen it.

In embodied systems, the consequence might be a robot entering an unsafe state, a drone violating controlled airspace, or an autonomous vehicle failing at the edge of its operational design domain. In advanced technology used with direct human interactions its people using systems in ways they perceive to be optimal, or for the task they feel the system should accommodate, not the one the system is designed to accommodate. In frontier models, the same underlying failure mode becomes reward hacking, deceptive optimisation, or behaviour outside intended control boundaries. Having worked in both spaces, I find it's predominantly the language that changes, not the failure mode itself.

This is also why "just write better rules" is not a serious answer. You cannot enumerate reality completely. Safe operation depends on constrained domains, layered controls, independent challenge, and explicit assumptions about failure.

High Hazard Industries Already Paid for These Lessons

Nuclear, aerospace, and similar sectors don't stay safer because their engineers are smarter or more careful, or even because their systems are better bounded. They stay safer because they built institutions, disciplines, and deep cultural norms around the assumption that complex systems fail in ways individuals cannot predict.

That means:

- Explicit safety cases rather than implicit confidence

- Independent challenge rather than designer self-certification

- In-service safety treated as coequal to design-time constraints

- Competence frameworks and legal accountability for people making safety-critical decisions

- Formal treatment of uncertainty, residual risk, and unknowns.

Most importantly, and I'll say it again because rarely does it seem to land; safety is treated as a system-of systems property, not a product feature. This system can extend from a component, through to a financial system, and onwards to a truly global system – at even the most modest scales, trying to solve systems safety challenges becomes an explosion of possible states and outcomes.

Much of frontier AI still behaves as though extraordinary capability can be safely managed through benchmarking, red-teaming, and increasingly optimistic blog posts. Of course those things matter and have been integral to drawing attention to and offering foundational mitigations for potential AI harms, that said; they are not assurance.

We need safety cases. We need competence frameworks that allow us to identify and respond to in-service deviation from expectation. Red-teaming needs to be supported by assessment of intended and unintended-but-expected human use. Model cards need to be operationalised within an actual operational safety case. I could go on.

X-Risk and Operational Safety Are the Same Conversation

Another failure mode, and I see this in both communities, is treating existential risk and practical deployment safety as separate domains and I simply don't think they are.

If you believe advanced AI could represent civilisation-scale risk, then deployment governance matters enormously right now. We should be able to answer questions like: who has authority to halt deployment? What evidence is required before release? What constitutes tolerable uncertainty? How is dissent handled when commercial incentives favour launch? What institutional structures survive commercial pressure?

We'd find it absurd if a new nuclear reactor were deployed globally with the assurance philosophy of "the internal team tested it extensively and published a responsible scaling policy." For systems potentially more consequential, we routinely accept precisely that logic and that should concern everyone, regardless of whether you think timelines are five years or fifty.

The Missing Translation Layer

While not a criticism, part of this is challenge is cultural, and it runs in both directions.

Many classical safety professionals underestimate the novelty of alignment problems and assume existing frameworks transfer cleanly, and many refuse to recognise AI risk as a problem at all. Many AI safety researchers correctly reject that assumption, but in doing so sometimes reject the entire body of knowledge around assurance, governance, and socio-technical control outside the realm of their existing known and effective toolset.

Neither position help and in my view both constitute mistakes at a time when the problem needs a wider corpus of attention, knowledge and expertise thrown at it than ever.

What we need are people who understand systems safety engineering deeply enough to know AI safety is a problem with their time. That their techniques are useful, but only when coloured with the specific context of advanced AI issues. We also need those at the forefront of model evaluation to understand the outsized value that these domains and practices can bring to the alignment and safety problems. This bridge, by most measures, is underdeveloped, and given the timelines being discussed, underdeveloped is bad.

So my question, and I really think it's a real one worth the community sitting with, is simply: why aren't systems safety engineers in the room?

Is it that the methods genuinely don't scale to this problem? Is it that assurance professionals have failed to translate their discipline into terms alignment researchers respect? Or is it that safety in AI still culturally means capability evaluation rather than high-integrity governance?

I don't think safety cases and complex systems thinking solve alignment. But alignment without the assurance discipline to support it is still madness.

To note; i'm not an academic, I am a researcher and primarily a practitioner. As a result this is a practitioner's perspective and my view as someone who's spent years at the intersection of autonomous systems and harm. I've been looking for a vehicle to raise this and my fondness of the EA community meant it seemed like the right place to try - apologies for any broken conventions or rules; i've tried my best to follow them!

EA Forum Bot Site
EA Forum

The gap between Systems Safety Engineering and AI Safety is something we need to talk about

4

4

Reactions

More posts like this