In complete generality, you could write effective labor as
$$L_{\text{eff}} = F(L, C_{\text{inf}}, C_{\text{train}}).$$
That is, effective labor is some function of the number of human researchers we have, the effective inference compute we have (quantity of AIs we can run) and the effective training compute (quality of AIs we trained).
The perfect substitution claim is that once training compute is sufficiently high, then eventually we can spend the inference compute on running some AI that substitutes for human researchers. Mathematically, for some threshold $\bar{C}_{\text{train}}$, once $C_{\text{train}} \geq \bar{C}_{\text{train}}$,
$$F(L, C_{\text{inf}}, C_{\text{train}}) = L + \frac{C_{\text{inf}}}{c},$$
where $c$ is the compute cost to run the system.
So you could think of our analysis as saying, once we have an AI that perfectly substitutes for AI researchers, what happens next?
Now of course, you might expect substantial recursive self-improvement even with an AI system that doesn't perfectly substitute for human researchers. I think this is a super interesting and important question. I'm trying to think more about it, but it's hard to make progress because it's unclear what $F$ looks like. But let me try to gesture at a few things. Let's fix $C_{\text{train}}$ at some sub-human level.
If you assume, say, Cobb-Douglas, i.e.
$$F = L^{1-\alpha}\left(\frac{C_{\text{inf}}}{c}\right)^{\alpha},$$
where $\alpha$ denotes the share of labor tasks that AI can do, then you'll pick up another $\alpha$ in the explosion condition, i.e. $r > 1$ will become $\alpha r > 1$. This captures the intuition that as the fraction of tasks an AI can do increases, the explosion condition gets easier and easier to hit.
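For a quick sanity check on that claim, here is a minimal derivation under assumptions I'm adding (a law of motion $\dot{A} = F^{\lambda}A^{1-\beta}$ with returns to research $r = \lambda/\beta$, and an AI workforce that scales with the software level, $C_{\text{inf}}/c \propto A$). Holding human labor $L$ fixed,
$$\dot{A} = F^{\lambda}A^{1-\beta} \propto \left(L^{1-\alpha}A^{\alpha}\right)^{\lambda}A^{1-\beta} \propto A^{1+\alpha\lambda-\beta},$$
which blows up in finite time iff $\alpha\lambda - \beta > 0$, i.e. iff $\alpha r > 1$.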
Here is a fleshed-out version of Cheryl's response. Let's suppose actual research capital is $qK$, but we just used $K$ in our estimation equation.
Then the true estimation equation is
$$\log\left(\frac{qK}{L}\right) = \text{const} + \sigma\log\left(\frac{w}{r}\right) + \varepsilon;$$
re-arranging, we get
$$\log\left(\frac{K}{L}\right) = \text{const} - \log q + \sigma\log\left(\frac{w}{r}\right) + \varepsilon.$$
So if we regress $\log(K/L)$ on a constant and $\log(w/r)$, then the coefficient on $\log(w/r)$ is still $\sigma$ as long as $q$ is independent of $\log(w/r)$.
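As a quick numerical illustration of that independence condition, here is a hedged sketch on synthetic data (all parameter values made up): OLS recovers $\sigma$ when $\log q$ is independent of relative prices, and is biased when it isn't.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 10_000, 0.5                  # true elasticity (made-up value)

log_w_over_r = rng.normal(0.0, 1.0, n)  # relative prices
eps = rng.normal(0.0, 0.1, n)

# Mismeasurement factor q: independent of prices vs. correlated with them
for name, log_q in [
    ("q independent", rng.normal(0.0, 0.3, n)),
    ("q correlated ", 0.5 * log_w_over_r + rng.normal(0.0, 0.3, n)),
]:
    # Observed ratio: log(K/L) = const - log(q) + sigma * log(w/r) + eps
    log_K_over_L = -log_q + sigma * log_w_over_r + eps
    X = np.column_stack([np.ones(n), log_w_over_r])
    coef = np.linalg.lstsq(X, log_K_over_L, rcond=None)[0]
    print(f"{name}: sigma_hat = {coef[1]:.3f}")   # ~0.50 vs. ~0.00
```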
Nevertheless, I think this should increase your uncertainty in our estimates, because there is clearly a lot going on behind the scenes that we might not fully understand, like how research vs. training compute is measured.
Note that if you accept this, our estimation of $\sigma$ in the raw compute specification is wrong.
The cost-minimization problem becomes
$$\min_{L,\,C}\; wL + rC \quad \text{s.t.} \quad F(L, AC) = \bar{F}.$$
Taking FOCs and re-arranging,
$$\log\left(\frac{AC}{L}\right) = \text{const} + \sigma\log\left(\frac{Aw}{r}\right).$$
So our previous estimation equation was missing an $A$ on the relative prices. Intuitively, we understated the degree to which compute was getting cheaper. Now $A$ is hard to observe, but let's just assume it's growing exponentially with an 8-month doubling time, per this Epoch paper.
Imputing this guess of $A$, and estimating via OLS with firm fixed effects, gives us $\hat{\sigma} = \ldots$ with standard errors.
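For concreteness, here is a minimal sketch of that imputation-plus-fixed-effects step on synthetic data (the firm structure, trends, and every parameter value are made up; the real estimation uses the actual firm panel):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.5                                   # true elasticity (made-up)
firms, months = 4, 60
firm = np.repeat(np.arange(firms), months)
t = np.tile(np.arange(months), firms).astype(float)

log_A = (t / 8.0) * np.log(2.0)               # A doubles every 8 months
log_w_over_r = 0.02 * t + rng.normal(0, 0.2, firms * months)
firm_fe = rng.normal(0, 1, firms)[firm]

# Estimation equation with the A on relative prices:
#   log(AC/L) = FE + sigma * log(A * w / r) + eps
x = log_A + log_w_over_r
y = firm_fe + sigma * x + rng.normal(0, 0.1, firms * months)

# Firm fixed effects via within-firm demeaning
def demean(v):
    return v - (np.bincount(firm, v) / months)[firm]

xd, yd = demean(x), demean(y)
print("sigma_hat =", (xd @ yd) / (xd @ xd))   # should be close to 0.5
```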
Note that this doesn't change the estimation results for the frontier-experiments specification, since the $A$ in $\frac{AC}{AC_{\text{train}}}$ just cancels out.
I spent a bit of time thinking about this today.
Let's adopt the notation in your comment and suppose that $\rho$ is the same across research sectors, with common $\lambda$. Let's also suppose common $\beta$.
Then we get blow-up in $F$ iff $\lambda/\beta > 1$.
The intuition for this result is that when $\rho < 0$, you are bottlenecked by your slower growing sector.
If the slower growing sector is cognitive labor, then asymptotically $F \approx \text{const} \cdot A_{\text{cog}}$, and we get $\dot{A}_{\text{cog}} \propto A_{\text{cog}}^{1+\lambda-\beta}$, so we have blow-up iff $\lambda/\beta > 1$.
If the slower growing sector is experimental compute, then there are two cases. If experimental compute is blowing up on its own, then so is cognitive labor, because by assumption cognitive labor is growing faster. If experimental compute is not blowing up on its own, then asymptotically $F \approx \text{const} \cdot A_{\text{exp}}$ and we get $\dot{A}_{\text{exp}} \propto A_{\text{exp}}^{1+\lambda-\beta}$. Here we get a blow-up iff $\lambda/\beta > 1$.[1]
In contrast, if $\rho > 0$ then $F$ is approximately the fastest growing sector. You get blow-up in both sectors if either sector blows up. Therefore, you get blow-up iff $\lambda/\beta > 1$.
So if you accept this framing, complements vs. substitutes only matters if some sectors are blowing up but not others. If all sectors have high enough returns to research, then we get an intelligence explosion no matter what. This is an update for me, thanks!
I'm only analyzing blow-up conditions here. You could get e.g. double exponential growth here by having $\beta = 0$ and $F$ growing exponentially.
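To make the bottleneck logic concrete, here is a hedged simulation sketch. I'm assuming sector laws of motion $\dot{A}_i = F^{\lambda}A_i^{1-\beta}$ and a symmetric CES aggregator $F = \left(\tfrac{1}{2}A_1^{\rho} + \tfrac{1}{2}A_2^{\rho}\right)^{1/\rho}$; the functional forms and all parameter values are made up. With $\lambda/\beta > 1$ the system blows up in finite time for both signs of $\rho$, and with $\lambda/\beta < 1$ it doesn't.

```python
import numpy as np

def blow_up_time(rho, lam, beta, A0=(1.0, 2.0), dt=1e-3, t_max=50.0, cap=1e9):
    """Euler-integrate dA_i/dt = F^lam * A_i^(1-beta) with a CES aggregate F.

    Returns the (approximate) finite blow-up time, or None if F stays
    below `cap` until t_max."""
    A = np.array(A0, dtype=float)
    t = 0.0
    while t < t_max:
        F = (0.5 * A[0] ** rho + 0.5 * A[1] ** rho) ** (1.0 / rho)
        if F > cap:                      # numerical stand-in for blow-up
            return t
        A = A + dt * F**lam * A ** (1.0 - beta)
        t += dt
    return None

for rho in (-2.0, 0.5):                  # complements vs. substitutes
    for lam, beta in ((0.6, 0.3), (0.2, 0.4)):   # r = 2 vs. r = 0.5 (made-up)
        t_star = blow_up_time(rho, lam, beta)
        verdict = f"blow-up at t ~ {t_star:.1f}" if t_star else "no blow-up by t = 50"
        print(f"rho = {rho:+.1f}, r = {lam / beta:.1f}: {verdict}")
```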
This is a good point, we agree, thanks! Note that you need to assume that the algorithmic progress that gives you more effective inference compute is the same as the algorithmic progress that gives you more effective research compute. This seems pretty reasonable, but it's worth a discussion.
Although note that this argument only works with the CES-in-compute formulation. For the CES in frontier experiments, you would have the ratio $\frac{AC}{AC_{\text{train}}}$, so the $A$ cancels out.[1]
You might be able to avoid this by adding the $A$'s in a less naive fashion. You don't have to train larger models if you don't want to. So perhaps you can freeze the frontier, and then you get $\frac{AC}{C_{\text{train}}}$? I need to think more about this point.
Thanks for the insightful comment.
I take your overall point to be that the static optimization problem may not be properly specified. For example, costs may not be linear in labor size because of adjustment costs to growing very quickly, or costs may not be linear in compute because of bulk discounting. Moreover, these non-linear costs may be changing over time (e.g., adjustment costs might only matter in 2021-2024, as OpenAI and Anthropic have been scaling labor aggressively). I agree that this would bias the estimate of $\sigma$. Given the data we have, there should be some way to at least partially deal with this (e.g., by adding lagged labor as a control). I'll have to think about it more.
On some of the smaller comments:
$\text{wages}/r_{\text{research}}$ is around 0.28 (maybe you have better data here)
The best data we have is The Information's article that OpenAI spent $700M on salaries and $1000M on research compute in 2024, so the ratio is around $0.7$ (assuming you meant $\frac{wL}{rK}$ instead of $\frac{w}{r}$).
The whole industry is much larger now and elasticity of substitution might not be constant; if so this is worrying because to predict whether there's a software-only singularity we'll need to extrapolate over more orders of magnitude of growth and the human labor -> AI labor transition.
I agree. $\sigma$ might not be constant over time, which is a problem both for estimation/extrapolation and for predicting what an intelligence explosion might look like. For example, if $\sigma$ falls over time, then we may have a foom for a bit, until $\sigma$ falls below 1 and the foom fizzles. I've been thinking about writing something up about this.
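For intuition, here is a hedged simulation sketch of that scenario; the CES form, the law of motion $\dot{A} = F^{\lambda}A^{1-\beta}$, the path for $\sigma(t)$, and every parameter value are my own made-up assumptions, not the paper's model. Growth holds up while $\sigma > 1$, then decays once $\sigma$ drops below 1 and fixed human labor becomes the bottleneck:

```python
import numpy as np

# Sketch: what a falling elasticity of substitution could do to growth.
# F = CES(L, A; sigma(t)) with human labor L fixed; dA/dt = F^lam * A^(1-beta).
lam, beta, L = 0.5, 0.35, 1.0
A, dt, T = 1.0, 1e-3, 20.0

for step in range(int(T / dt)):
    t = step * dt
    sigma = 2.0 * np.exp(-0.1 * t)       # drifts down, crossing 1 around t ~ 6.9
    rho = 1.0 - 1.0 / sigma
    if abs(rho) < 1e-9:                  # Cobb-Douglas limit at sigma = 1
        F = np.sqrt(L * A)
    else:
        F = (0.5 * L**rho + 0.5 * A**rho) ** (1.0 / rho)
    growth = F**lam * A ** (-beta)       # growth rate of A
    A += dt * A * growth
    if step % int(2.0 / dt) == 0:
        print(f"t={t:5.1f}  sigma={sigma:.2f}  growth rate={growth:.3f}")
```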
Are you planning follow-up work, or is there other economic data we could theoretically collect that could give us higher confidence estimates?
Yes, although we haven't decided yet what is most useful to follow up on. Very short-term, there is trying to accommodate non-linear pricing. Of course, data on what non-linear pricing looks like would be helpful, e.g., how does Nvidia bulk discount?
We also may try to estimate $\sigma$ under non-linear pricing with the data we have.
Yep. We are treating $L$ as homogeneous (no differentiation in skill, speed, etc.). I'm interested in thinking about quality differentiation a bit more.