IMO it is harmful on expectation for a technical safety researcher to work at DeepMind, OpenAI or Anthropic.
Four reasons:
- Interactive complexity. The intractability of catching up – by trying to invent general methods for AI corporations to somehow safely contain model interactions, as other engineers scale models' combinatorial complexity and outside connectivity.
- Safety-capability entanglements
- Commercialisation. Model inspection and alignment techniques can support engineering and productisation of more generally useful automated systems.
- Infohazards. Researching capability risks within an AI lab can inspire researchers hearing about your findings to build new capabilities.
- Shifts under competitive pressure
- DeepMind merged with Google Brain to do commercialisable research,
OpenAI set up a company and partnered with Microsoft to release ChatGPT,
Anthropic pitched to investors they'd build a model 10 times more capable. - If you are an employee at one of these corporations, higher-ups can instruct you to do R&D you never signed up to do.[1] You can abide, or get fired.
- Working long hours surrounded by others paid like you are, by a for-profit corp, is bad for maintaining bearings and your epistemics on safety.[2]
- DeepMind merged with Google Brain to do commercialisable research,
- Safety-washing. Looking serious about 'safety' helps labs to recruit idealistic capability researchers, lobby politicians, and market to consumers.
- 'let's build AI to superalign AI'
- 'look, pretty visualisations of what's going on inside AI'
This is my view. I would want people to engage with the different arguments, and think for themselves what ensures that future AI systems are actually safe.
- ^
I heard via via that Google managers are forcing DeepMind safety researchers to shift some of their hours to developing Gemini for product-ready launch.
I cannot confirm whether that's correct. - ^
For example, I was in contact with a safety researcher at an AGI lab who kindly offered to read my comprehensive outline on the AGI control problem, to consider whether to share with colleagues. They also said they're low energy. They suggested I'd remind them later, and I did, but they never got back to me. They're simply too busy it seems.
Yes, specifically by claim 1, positive value can only asymptotically approach 0
(ignoring opportunity costs).
Some relevant aspects are missing in what you shared so far.
Particularly, we need to consider that any one AGI lab is (as of now) beholden to the rest of society to continue operating.
This is clearly true in the limit. Imagine some freak mass catastrophe caused by OpenAI:
staff would leave, consumers would stop buying, and regulators would shut the place down.
But it is also true in practice.
From the outside, these AGI labs may look like institutional pillars of strength. But from the inside, management is constantly jostling, trying to source enough investments and/or profitable productisation avenues to cover high staff salaries and compute costs. This is why I think DeepMind allowed themselves to be acquired by Google in the first place. They ran a $649 million loss in 2019, and could simply not maintain that burn rate without a larger tech corporation covering their losses for them.
In practice, AGI labs are constantly finding ways to make themselves look serious about safety, and finding ways to address safety issues customers are noticing. Not just because some employees there are paying attention to those harms and taking care to avoid them. But also because they're dealing with newly introduced AI products that already have lots of controversies associated to it (in these rough categories: data laundering, worker exploitation, design errors and misuses, resource-intensive and polluting hardware).
If we think about this in simplified dimensions:
Outside stakeholders would need to perceive the system to be unsafe to restrict further scaling and/or uses (which IMO is much more effective than trying to make scaled open-ended systems comprehensively safe after the fact). Where 'the system' can include the institutional hierarchies and infrastructure through which an AI models is developed and deployed.
Corporations have a knack for finding ways to hide product harms, while influencing people to not notice or to dismiss those harms. See cases Big Tabacco, Big Pharma, Big Oil.
Corporations that manage to do that can make profit from selling products without getting shut down. This is what capitalism – open market transactions and private profit reinvestment – in part selects for. This is what Big Tech companies that win out over time manage to do.
(it feels like I'm repeating stuff obvious to you, but it bears repeating to set the context)
Are you stating an intuition that it would be surprising if AGI labs invested less in improving actual safety, then that would be overall less harmful?
I am saying with claim 4. that there is another dimension, perceived safety.
The more that an AI corporation is able to make the system be or at least look *locally* safe to users and other stakeholders (even if globally much more unsafe), the more the rest of society will permit and support the corporations to scale on. And the more that the AI corporation can promote that they are responsibly scaling toward some future aligned system that is *globally* safe, the more that nerdy researchers and other stakeholders open to that kind of messaging can treat that as a sign of virtue and give the corporation a pass there too.
And that unfortunately, by claim 1, that actual safety is intractable when scaling such open-ended (and increasingly automated) systems. Which is why in established safety-critical industries – eg. for medical devices, cars, planes, industrial plants, even kitchen devices – there are best practices for narrowly scoping the design of the machines to specific uses and contexts of use.
So actual safety is intractable for such open-ended systems, but AI corporations can and do disproportionately support research and research communication that increases perceived safety.
But actual safety is tractable for restricting corporate AI scaling (if you reduce the system's degrees of freedom of interaction, you reduce the possible ways things can go wrong). Unfortunately, fewer people move to restrict corporate-AI scaling if the corporate activities are perceived to be safe.
By researching safety at AGI labs, researchers are therefore predominantly increasing perceived safety, and as a result closing off realistic opportunities to improving actual safety.