I think you, and this community, have no idea how difficult it is to resist value/mission drift in these situations. This is not a friend-to-friend exchange. It's a small community of nonprofits and individuals up against the most valuable companies in the world. They aren't just gonna pick up the values of a few researchers by osmosis.
From your other comment, it seems like you have already been affected by the labs' influence via the technical research community. The emphasis on technical solutions only benefits them, and it just so happens that to work on the big models you have to work with them. This is not an open exchange where they have been just as influenced by us. Sam and Dario sure want you and the US government to think they are the right safety approach, though.
Here’s our crux:
"My subjective sense is there's a good chance we lose because all the necessary insights to build aligned AI were lying around, they just didn't get sufficiently developed or implemented."
For both theoretical and empirical reasons, I would assign a probability as low as 5% to there being alignment insights just lying around that could protect us at the superintelligence capability level and that don't require us to slow or stop development to implement in time.
I don’t see a lot of technical safety people engaging in advocacy, either? It’s not like they tried advocacy first and then decided on technical safety. Maybe you should question their epistemology.
What you write there makes sense, but as I said, it's not free to have people in those positions. I did a lot of thinking about this when I was working on wild animal welfare (WAW). It seems superficially like you could get the right kind of WAW-sympathetic person into agencies like the Fish and Wildlife Service and the EPA, and they would be there to nudge the agency, in ways no one else cared about, to help animals when the time came. I did some interviews, looked into some historical cases, and concluded that this is not a good idea.
Likewise, I think trying to influence the values and safety practices of labs by working there is a bad idea that is unlikely to be pulled off.
There should be protests against them (PauseAI US will be protesting them in SF on 2/28), and we should all consider them evil for building superintelligence when it is not safe! Dario is now openly calling for recursive self-improvement. They are the villains; this is not hard. The fact that you would count Zach's post, with "maybe" in the title, as scrutiny is evidence of the problem.
What you seem to be hinting at, essentially espionage, may honestly be the best reason to work in a lab. But of course those people would need to be willing to break NDAs, and there are better ways to get that information than taking a technical safety job.
(Edited to add context for bringing up "espionage" and to elaborate on the implications.)
Great piece: a great prompt to rethink things, with good digests of the implications.
If you agree that mass movement building is a priority, check out PauseAI-US.org (I am executive director), or donate here: https://www.zeffy.com/donation-form/donate-to-help-pause-ai
One implication I strongly disagree with is that people should be getting jobs in AI labs. I don't see you connecting that to actual safety impact, and I sincerely doubt that working as a researcher gives you any influence on safety at this point (if it ever did). Meanwhile, there is a definite cost to working at a lab: capture and NDA-walling. Already so many EAs work at Anthropic that it is shielded from scrutiny within EA, and the attachment to "our player" Anthropic has made it hard for many EAs to do the obvious thing and support PauseAI. Put simply: I see no meaningful path to impact on safety from working as an AI lab researcher, and I see serious risks to individual and community effectiveness and mission focus.
I didn't mean "there is no benefit to technical safety work"; I meant more like "emphasizing technical safety work to the exclusion of other things only benefits the labs", as in it benefits them and costs them nothing to do this.