IMO it is harmful in expectation for a technical safety researcher to work at DeepMind, OpenAI or Anthropic.
Four reasons:
- Interactive complexity. The intractability of catching up: trying to invent general methods for AI corporations to somehow safely contain model interactions, while other engineers keep scaling the models' combinatorial complexity and outside connectivity.
- Safety-capability entanglements
  - Commercialisation. Model inspection and alignment techniques can support the engineering and productisation of more generally useful automated systems.
  - Infohazards. Researching capability risks within an AI lab can inspire researchers who hear about your findings to build new capabilities.
- Shifts under competitive pressure
  - DeepMind merged with Google Brain to do commercialisable research, OpenAI set up a company and partnered with Microsoft to release ChatGPT, and Anthropic pitched to investors that they'd build a model 10 times more capable.
  - If you are an employee at one of these corporations, higher-ups can instruct you to do R&D you never signed up to do.[1] You can abide, or get fired.
  - Working long hours, surrounded by others paid as you are by a for-profit corp, is bad for maintaining your bearings and your epistemics on safety.[2]
- Safety-washing. Looking serious about 'safety' helps labs to recruit idealistic capability researchers, lobby politicians, and market to consumers.
  - 'let's build AI to superalign AI'
  - 'look, pretty visualisations of what's going on inside AI'
This is my view. I would want people to engage with the different arguments and think for themselves about what ensures that future AI systems are actually safe.
- ^
I heard indirectly that Google managers are forcing DeepMind safety researchers to shift some of their hours to developing Gemini for a product-ready launch. I cannot confirm whether that's correct.
- ^
For example, I was in contact with a safety researcher at an AGI lab who kindly offered to read my comprehensive outline on the AGI control problem, to consider whether to share it with colleagues. They also said they were low on energy. They suggested I remind them later, and I did, but they never got back to me. It seems they are simply too busy.
It depends on what you mean by 'work on safety'.
In other established industries, standard practice for designing machine products to be safe is to first narrowly scope the machinery's uses, the context of use, and the user group.
If employees worked at OpenAI / Anthropic / DeepMind on narrowing their operational scopes, all power to them! That would certainly help. It seems, though, that leadership, who aim to design unscoped automated machinery to be used everywhere for everyone, would not approve.
If working on safety means in effect playing a close-to-ceremonial role, where even though you really want to help you cannot hope to catch up with the scaling efforts, then I would reconsider. In other industries, when conscientious employees notice engineering malpractice that is already causing harm across society, sometimes one of them has the guts to find an attorney and become a whistleblower.
Also, in that case, I would prefer the AGI labs to not hire for those close-to-ceremonial roles.
I'd prefer them to be bluntly transparent to people in society that they are recklessly scaling ahead, and that they are just adding local band-aids to the 'Shoggoth' machinery.
Not that that is going to happen anyway.
If AGI labs can devote their budget to constructing operational design domains (the narrowly scoped conditions under which their systems are meant to operate), I'm all for it.
Again, that's counter to the leaders' intentions. Their intention is to scale everywhere and rely on the long-term safety researchers to tell them that there must be some yet-undiscovered general safe control patch.
I think we should avoid promoting AGI labs as places to work, or as places that will somehow improve safety. One of the reasons is indeed that I want us to be clear to idealistic, talented people that they should really reconsider investing their careers in supporting such an organisation.
BTW, I'm not quite answering from your suggested perspective of what an AGI lab "should do".
What feels relevant to me is what we can personally consider doing – as individuals connected to larger communities – so things won't get even worse.