Book a 1:1 with me: https://cal.com/tylerjohnston/book
Share anonymous feedback with me: https://www.admonymous.co/tylerjohnston
Thank you for writing this! It's probably the clearest and most rigorous presentation of these arguments I've seen, and I think a lot of the specific claims here are true and important to notice.
That being said, I want to offer some counterarguments, both for their own sake and to prompt discussion in case I'm missing something. I should probably add the disclaimer that I'm currently working at an organization advocating for stronger self-governance among AI companies, so I may have some pre-existing biases toward defending this strategy. But it also makes this question very relevant to me and I hope to learn something here.
Addressing particular sections:
Only Profit-Maximizers Stay At The Frontier
This section is interesting and reminds me of some metaphors I've heard comparing the mechanism of free markets to Darwinism... i.e. you have to profit-maximize, and if you don't, someone else will and they'll take your place. It's survival of the fittest, like it or not. Take this naïve metaphor seriously enough and you would expect most market ecosystems to be "red in tooth and claw," with bare-minimum wages, rampant corner-cutting, nothing remotely resembling CSR/ESG, etc.
One problem is: I'm not sure how true this is to begin with. Plenty of large companies act in non-profit-maximizing ways simply out of human error, or passivity, or because the market isn't perfectly competitive (maybe they and their nearest rivals are benefitting from entrenchment and economies of scale that mean they no longer have to), or perhaps most importantly, because they are all responding to non-financial incentives (such as the personal values of the people at the company) that their competitors are equally subject to.
But more convincingly, I think social good / avoiding dangerous accidents really are just more aligned with profit incentives than the metaphor would naively suggest. I know your piece acknowledges this, but you also write it off as having limitations, especially under race conditions aiming toward a particular capabilities threshold.
But that doesn't totally follow to me — under such conditions, while you might be more open to high-variance, high-risk strategies to reach that threshold, you might also be more averse to those strategies since the costs (direct or reputational or otherwise) imposed by accidents before that threshold is reached become so much more salient. In the case of AI, the costs of a major misuse incident from an AI product (threatening investment/employee retention/regulatory scrutiny/etc.) might outweigh the benefits of moving quickly or without regard to safety — even when racing to a critical threshold. A lot of this probably depends on how far off you think such a capability threshold is, and where relative to the frontier you currently are. This is all to say that race dynamics might make high-variance high-risk strategies more attractive, but they also might make them less attractive, and the devil is probably in the details. I haven't heard a good argument for how the AI case shakes out (and I've been thinking about it for a while).
Also, correct me if I'm wrong, but one thing the worldview you write about here would suggest is that we shouldn't trust companies to fulfill their commitments to carbon neutrality, or that if they do, they will soon no longer be on the forefront of their industry — doing so is expensive, nobody is requiring it of them (at least not on the timeline they are committing to), the commitment is easy to abandon, and even if they do it, someone who chooses not to will outcompete them and take their place at the forefront of the market. But I just don't really expect that to happen. I think in 2030 there's a good chance Apple's supply chain will be carbon-neutral, and that they'll still be in the lead for consumer electronics (either because the reputational benefits of the choice, and the downstream effects it has on revenue and employee retention and whatnot, made it the profit-maximizing thing to do, and/or because they were sufficiently large/entrenched that they can just make choices like that due to non-financial personal/corporate values without damaging their competitive position, even when doing so isn't maximally efficient.)
Early in the piece, you write:
A profit-driven tech corporation seems exceedingly unlikely to hinge astronomical capex on an AI corporation that does not give off the unmistakable impression of pursuing maximal profits.
But we can already prove this isn't true given that OpenAI has a profit cap, their deal with Microsoft had a built-in expiration, and Anthropic is a B-corp. Even if you don't trust that some of these measures will be adhered to (e.g. I believe the details of OpenAI's profit cap quietly changed over time), they certainly do not give off the unmistakable impression of maximal profit-seeking. I think these facts exist because (1) many of the people at these companies are thinking about social impact in addition to profit, (2) social responsibility is an important intermediate step to being profitable, or (3) the companies are so entrenched that there simply are no alternative, more aggressively profit-maximizing firms who can compete, i.e. they have the headroom to make concessions like this, much as Apple can make climate commitments. I'm not sure what the balance between these three explanations is, but #1 and #3 challenge the strong view that only seemingly hard-nosed profit-maximizers are going to win here, and #2 challenges the view that profit-maximizing is mutually exclusive with long-term safety efforts.
All this considered, my take here is instead something like "We should expect frontier AI companies to generally act in profit-maximizing ways, but we shouldn't expect them to always be perfectly profit-maximizing across all dimensions, nor should we expect that profit-maximizing is always opposed to safety."
Constraints from Corporate Structure Are Dangerously Ineffective
I don't have a major counterargument here, aside from the fact that well-documented and legally recognized corporate structures often can be pretty effective thanks in part to the fact that judges/regulators get input on when and how they can be changed, and while I'm no expert, my understanding is that there are ways to optimize for this.
But your idea that companies are exchangeable shells for what really matters under the hood — compute, data, algorithms, employees — seems very true and very underrated to me. I think of this as something like "realpolitik" for AI safety. What really matters, above ideology and figureheads and voluntary commitments, is where the actual power lies (which is also where the actual bottlenecks for developing AI are) and where that power wants to go.
Hope In RSPs Is Misguided
The claim that "RSPs on their own can and will easily be discarded once they become inconvenient" seems far too strong to me — and again, if it were true, we should expect to see this with all costly voluntary safety/CSR measures that are made in other industries (which often isn't the case).
A few things that may make non-binding voluntary commitments like RSPs hard to discard:
There's also the fact that RSPs aren't strictly an invention of the AI labs. Plenty of independent experts have been involved in developing and advocating for either RSPs or risk evaluation procedures that look like them.
Here, I think a more defensible claim would be "The fact that RSPs may be easily discarded when inconvenient should be a point in favor of binding solutions like legislation, or at least indicate that they should be considered one of many potentially fallible safeguards for a defense-in-depth strategy."
An optimistic view of RSPs might be that they are a good way to hold AI corporations accountable - that public and political attention would be able to somehow sanction labs once they did diverge from their RSPs. Not only is this a fairly convoluted mechanism of efficacy, it also seems empirically shaky: Meta is a leading AI corporation with industry-topping amounts of compute and talent and does not publish RSPs. This seems to have garnered neither impactful public and political scrutiny nor hurt the Meta AI business.
Minor factual point: probably worth noting that Meta, as well as most leading AI labs, have now committed to publish an RSP. Time will tell what their policy ends up looking like.
It's true that the presence of, and quality of, RSPs at individual companies doesn't seem to have translated to any public/political scrutiny yet. I'm optimistic this can change (it's what I'm working on), or perhaps even will change by default once models reach a new level of capabilities that make catastrophic risks from AI an ever-more-salient issue among the public.
The downside of choosing an RSP-based legislative process should be obvious - it limits, or at least frames, the option space to the concepts and mechanisms provided by the AI corporations themselves. But this might be a harmful limitation: As we have argued above, these companies are incentivized to mainly provide mechanisms they might be able to evade, that might fit their idiosyncratic technical advantages, that might strengthen their market position, etc. RSP codification hence seems like a worse way to safe AI legislation than standard regulatory and legislative processes.
This is a question: my understanding is that the RSP model was specifically inspired by regulatory pathways from other industries, where voluntary measures like this got codified into what is now seen (in retrospect) as sensible policy. Is this true? I can't remember where I heard it, and can't find mention of it now, but if so, it seems like those past cases might be informative in terms of how successful we can expect the RSP codification strategy to be today.
That actually brings me to one last meta point that I want to make, which is that I am tempted to think that we are just in a weird situation where there are psychological facts about the people at leading profit-driven AI labs that make the heuristic of profit maximization a poor predictor of their behavior, and a lot of this comes down to genuine, non-financial concern about long-term safety.
Earlier I mentioned how even in a competitive market, you might see multiple corporations collectively acting in non-profit-maximizing ways due to non-financial incentives acting upon the decision-makers at each of those companies. Companies are full of humans who make choices for non-financial reasons, like wanting to feel like a good person, wanting to have a peaceful home life where their loved ones accept and admire them, and genuinely wanting to fix problems in the world. I think the current psychological profile of AI lab leaders (and, indeed, the AI lab employees who hold the "real power" under the hood) is surprisingly biased toward genuine concern about the risks of AI. Many of them correctly recognized, way before anyone else, how important this technology would be.
Sorry for the long comment. I do think AI labs need fierce scrutiny and binding constraints, and that their incentives are largely not pointing in the right place and might bias them toward putting profit over safety (again, this is my main focus right now). But I'm also not ready to totally write off their ability to adopt genuinely valuable and productive voluntary measures to reduce AI risk.
Hey yanni,
I just wanted to return to this and say that I think you were directionally correct here and, in light of recent news, recommending jobs at OpenAI in particular was probably a worse mistake than I realized when I wrote my original comment.
Reading the recent discussion about this reminded me of your post, and it's good to see that 80k has updated somewhat. I still don't know quite how to feel about the recommendations they've left up in infosec and safety, but I think I'm coming around to your POV here.
Thank you for writing this criticism! I did give it a read, and I shared some of your concerns around the framing and geopolitical stance that the piece takes.
Regarding the OOM issue, you ask:
Order of magnitude of what? Compute? Effective compute? Capabilities?
I'll excerpt the following from the "count the OOMs" section of the essay:
We can decompose the progress in the four years from GPT-2 to GPT-4 into three categories of scaleups:
- Compute: We’re using much bigger computers to train these models.
- Algorithmic efficiencies: There’s a continuous trend of algorithmic progress. Many of these act as “compute multipliers,” and we can put them on a unified scale of growing effective compute.
- “Unhobbling” gains: By default, models learn a lot of amazing raw capabilities, but they are hobbled in all sorts of dumb ways, limiting their practical value. With simple algorithmic improvements like reinforcement learning from human feedback (RLHF), chain-of-thought (CoT), tools, and scaffolding, we can unlock significant latent capabilities.
We can “count the OOMs” of improvement along these axes: that is, trace the scaleup for each in units of effective compute. 3x is 0.5 OOMs; 10x is 1 OOM; 30x is 1.5 OOMs; 100x is 2 OOMs; and so on. We can also look at what we should expect on top of GPT-4, from 2023 to 2027.
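To make the conversion in that excerpt concrete: an OOM count is just the base-10 logarithm of a scaleup factor. Here's a minimal Python sketch (the helper names `ooms` and `round_to_half` are mine, not Aschenbrenner's) showing that his round figures are log10 values rounded to the nearest half-OOM:

```python
import math


def ooms(scaleup: float) -> float:
    """Orders of magnitude corresponding to a multiplicative scaleup (log base 10)."""
    return math.log10(scaleup)


def round_to_half(x: float) -> float:
    """Round to the nearest 0.5, matching the rounded figures in the essay."""
    return round(x * 2) / 2


for factor in (3, 10, 30, 100):
    exact = ooms(factor)
    print(f"{factor}x -> {exact:.3f} OOMs (about {round_to_half(exact)})")
```

So "3x is 0.5 OOMs" is shorthand: 3x is really about 0.477 OOMs, which rounds to 0.5.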
It's clear to me what Aschenbrenner is referring to when he says "OOMs" — it's orders of magnitude scaleups in the three things he mentions here. Compute (measured in training FLOP), algorithmic efficiencies (measured by looking at what fraction of training FLOP is needed to achieve comparable capabilities following algorithmic improvements), and unhobbling (as measured, or rather estimated, by what scaleup in training FLOP would have provided equivalent performance improvements to what was provided by the unhobbling). I'll grant you, as does he, that unhobbling is hand-wavy and hard to measure (although that by no means implies it isn't real).
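As a rough illustration of that measurement scheme, here's a sketch under my own assumptions: an algorithmic improvement is summarized by the fraction of training FLOP needed to match earlier capabilities, so its compute-equivalent multiplier is the reciprocal of that fraction. Because the three scaleups multiply, their OOM counts simply add. Every input number below is a hypothetical placeholder, not an estimate from the essay.

```python
import math


def ooms_from_multiplier(multiplier: float) -> float:
    """OOMs for a multiplicative scaleup (log base 10)."""
    return math.log10(multiplier)


def algorithmic_ooms(flop_fraction: float) -> float:
    """If new methods reach the same capability with `flop_fraction` of the
    original training FLOP, that acts like a 1 / flop_fraction compute multiplier."""
    return ooms_from_multiplier(1 / flop_fraction)


# Hypothetical inputs, for illustration only.
physical_compute_multiplier = 1_000      # raw training FLOP grew 1000x
flop_fraction_after_algos = 0.01         # same capability now needs 1% of the FLOP
unhobbling_equivalent_multiplier = 10    # estimated FLOP scaleup with equivalent gains

total_ooms = (
    ooms_from_multiplier(physical_compute_multiplier)
    + algorithmic_ooms(flop_fraction_after_algos)
    + ooms_from_multiplier(unhobbling_equivalent_multiplier)
)
print(f"effective compute scaleup: {total_ooms:.1f} OOMs")
```

Working in logs is what lets three different kinds of scaleup share one unit. The hand-wavy part, as noted, is the unhobbling input, which has to be estimated rather than measured directly.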
You could still take issue with other questions, as you do, including how strong the relationship is between compute and capabilities, or how well we can measure capabilities in the first place. But we can certainly measure floating point operations! So your accusation that he uses "OOMs" as a unit, and one that is unmeasurable and detached from reality, surprises me.
Also, speaking of the "compute-capabilities relationship" point, you write:
The general argument seems to be that increasing the first two "OOMs", i.e. increasing compute and improving algorithms, the AI capabilities will also increase. Interestingly, most of the examples given are actually counterexamples to this argument.
This surprised me as well since I took the fact that capabilities have improved with model scaling to be pretty incontrovertible. You give an example:
There are two image generation examples (Sora and GANs). In both examples, the images become clearer and have higher resolution as compute is increased or better algorithms are developed. This is framed as evidence for the claim that capabilities increase as "OOMs" increase. But this is clearly not the case: only the fidelity of these narrow-AI systems increase, not their capabilities.
I think I might see where the divergence between our reactions is. To me, capabilities for an image model means roughly "the capability to generate a clear, high-quality image depicting the prompt." As you admit, that has improved with scale. I think this definition probably best reflects common usage in the field, so I do think it supports his argument. And, I personally think that there are deeper capabilities being unlocked, too — for example, in the case of Sora, the capability of understanding (at least the practical implications of) object permanence and gravity and reflections. But I think others would be more inclined to disagree with that.
Huh, interesting! I guess you could define it this way, but I worry that muddies the definition of "campaign target." In common usage, I think the definition is approximately: what is the institution you are raising awareness about and asking to adopt a specific change? A simple test to determine the campaign target might be "What institution is being named in the campaign materials?" or "What institution has the power to end the campaign by adopting the demands of the campaigners?"
In the case of animal welfare campaigns against foodservice providers, it seems like that's clearly the foodservice companies themselves. Then, in the process of that campaign, one thing you'll do is raise awareness about the issue among that company's customers (e.g. THL's "foodservice provider guide" which raised awareness among public institutions), which isn't all that different from raising awareness among the public in a campaign targeting a B2C company.
I suppose this is just a semantic disagreement, but in practice, it suggests to me that B2B businesses are still vulnerable, in part because they aren't insulated from public opinion—they're just one degree removed from it.
EDIT: Another, much stronger piece of evidence in favor of influence on B2B: Chicken Watch reports 586 commitments secured from food manufacturers and 60 from distributors. Some of those companies are functionally B2C (e.g. manufacturing consumer packaged goods sold under their own brand) but some are clearly B2B (e.g. Perdue Farms' BCC commitment).
Thanks for the comment! I agree with a lot of your thinking here and that there will be many asymmetries.
One random thing that might surprise you: in fact, the sector that animal groups have had the most success with is a B2B one: foodservice providers. For B2B companies, individual customers are fewer in number and much more important in magnitude — so the prospect of convincing, for example, an entire hospital or university to switch their multi-million dollar contract to a competitor with a higher standard for animal welfare is especially threatening. I think the same phenomenon might carry over to the tech industry. However, even in the foodservice provider case, public perception is still one of the main driving factors (i.e., universities and hospitals care about the animal welfare practices of their suppliers in part because they know their students/clients care).
Your advice about outreach to employees and other stakeholders is well-taken too :) Thanks!
Hey! Thanks for the comment, this makes sense. I'm the founder and executive director (that's why I made this post under my name!), and The Midas Project is a nonprofit. By law, that means details about our funding will be made public in annual filings, which will be available upon request, and that our work has to exclusively serve the public interest and not privately benefit anyone associated with the organization (which is generally determined by the IRS and/or independent audits). Hope this assuages some concerns.
It's true we don't have a "team" page or anything like that. FWIW, this is clearly the norm for campaigning/advocacy nonprofits (for example, take a look at the websites for the animal groups I mentioned, or Greenpeace/Sunrise Movement in the climate space) and that precedent is a big part of why I chose the relative level of privacy here — though I'm open to arguments that we should do it differently. I think the most important consideration is protecting the privacy of individual contributors since this work has the potential to make some powerful enemies... or just to draw the ire of e/accs on Twitter. Maybe both! I would be more open to adding an “our leadership” page, which is more common for such orgs - but we’re still building out a leadership team so it seems a bit premature. And, like with funding, leadership details will all be in public filings anyway.
Thanks again for the feedback! It's useful.
Thank you!
You’re right that the main tasks are digital advocacy - but even if you’re not on social media, there are some direct outreach tasks that involve emailing and calling specific stakeholders. We have one task like that live on our action hub now, and will be adding more soon.
Outside of that, we could use all sorts of general volunteer support - anything from campaign recruitment to writing content. Also always eager to hear advice on strategy. Would love to chat more if you’re interested.
Good question! I basically agree with you about the relative importance of foundation model developers here (although I haven’t thought too much about the third point you mentioned. Thanks for bringing it up.)
I should say we are doing some other work to raise awareness about foundation model risks - especially at OpenAI, given recent events - but not at the level of this campaign.
The main constraint was starting (relatively) small. We’d really like to win these campaigns, and we don’t plan to let up until we have. The foundation model developers are generally some of the biggest companies in the world (hence the huge compute, as you mention), and the resources needed to win a campaign likely scale in proportion to the size of the target. We decided it’d be good to keep building our supporter base and reputation before taking the bigger players on. Cognition in particular seems to be in the center of the triple venn diagram between “making high-risk systems,” “way behind the curve on safety issues,” and “small enough that they can’t afford to ignore this.”
Btw, my background is in animal advocacy, and this is somewhat similar to how groups scaled there, i.e. they started by getting local restaurants to stop serving foie gras, and scaled up to getting McDonald's to phase out eggs from battery cages nationwide. Obviously we have less time with this issue, so I would like to scale quickly.
Thank you for writing this! Was really interesting to read. I'd love to see more posts of this nature. And it seems like you've done a lot for the world — thank you.
I have a couple questions, if you don't mind:
You write
I would love to hear your reasoning (pessimism about fulfillment? WAW looking better?) and what sort of evidence has convinced you. I think this is really important, and I haven't seen an argument for this publicly anywhere. Ditto about your skepticism of the organizations leading this work.
Did you mean to change one of the years in the two statements of this form?
I'd love to hear more about this. How much value do you think e.g. the median EA doing direct work is creating? Or, put another way, how significant an annual donation would exceed the value of a talented EA doing direct work instead?