Tom_Davidson

Comments

Thanks, good Q.

 

I'm saying that if there is such a new paradigm, then we could get >10 years' worth of AI progress at 2020-25 rates, and >10 OOMs of effective compute growth as measured under the old paradigm. But, perhaps, within the new paradigm these gains are achieved while the efficiency of AI algorithms only increases slightly. E.g. imagine a new paradigm where each doubling of compute increases capabilities as much as a 1000X increase does today. Then, measured within the new paradigm, the algorithmic progress might look like just a couple of OOMs, so 'effective compute' isn't rising fast; but relative to the old paradigm, progress (and effective compute growth) is massive.
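As a worked example of that unit conversion (the specific numbers are purely illustrative, just extending the 1000X example above):

```latex
% Old paradigm: capability gains are indexed by OOMs of effective compute.
% Assumed new paradigm: one doubling of compute gives the capability gain that
% 1000X (= 3 OOMs) of effective compute gives under the old paradigm.
\begin{align*}
\text{4 new-paradigm compute doublings}
  &\;\approx\; 4 \times 3 = 12 \text{ OOMs of old-paradigm effective compute},\\
\text{physical compute growth}
  &\;=\; 4 \text{ doublings} \;\approx\; 1.2 \text{ OOMs},\\
\text{implied old-paradigm algorithmic progress}
  &\;\approx\; 12 - 1.2 \;\approx\; 11 \text{ OOMs},
\end{align*}
% even though, measured inside the new paradigm, algorithmic efficiency might
% have improved by only a couple of OOMs over the same period.
```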

Fwiw, this X thread discusses the units I'm using for "year of AI progress", and Eli Lifland gives a reasonable alternative. Either works as a way to understand the framework.

Can you give more examples of where people are getting this wrong?

I support the 80k pivot, and the blue dot page seems ok (but yes, I'd maybe prefer something more opinionated).

While these concerns make sense in theory, I'm not sure whether they're a problem in practice.

Nice!

I think that condition is equivalent to saying that A_cog explodes iff either

  • phi_cog + lambda > 1 and phi_exp + lambda > 1, or
  • phi_cog > 1

where the second possibility is the unrealistic one in which it could explode with just human input.

Agree that I wouldn't particularly expect the efficiency curves to be the same.

But if phi > 0 for both types of efficiency, then I think this argument will still go through.

To put it in math, there would be two types of AI software technology, one for experimental efficiency and one for cognitive labour efficiency: A_exp and A_cog. The equations are then:

dA_exp = A_exp^phi_exp F(A_exp K_res, A_cog K_inf)

dA_cog = A_cog^phi_cog F(A_exp K_res, A_cog K_inf)

 

And then I think you'll find that, even with sigma < 1, it explodes when phi_exp>0 and phi_cog>0.

Although note that this argument works only with the CES in compute formulation. For the CES in frontier experiments, you would have the  so the A cancels out.
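Here's a minimal numerical sketch of the two-equation system above, in the CES-in-compute formulation. The CES functional form for F, the parameter values, and the crude "blow-up" criterion are all illustrative assumptions on my part:

```python
# Sketch of:
#   dA_exp/dt = A_exp^phi_exp * F(A_exp*K_res, A_cog*K_inf)
#   dA_cog/dt = A_cog^phi_cog * F(A_exp*K_res, A_cog*K_inf)
# with F a degree-1-homogeneous CES aggregator, sigma < 1 (complements),
# and the compute stocks K_res, K_inf held fixed.

def ces(x, y, sigma=0.5, share=0.5):
    """CES aggregator with elasticity of substitution sigma."""
    rho = (sigma - 1.0) / sigma          # sigma = 0.5  ->  rho = -1 (complements)
    return (share * x**rho + (1.0 - share) * y**rho) ** (1.0 / rho)

def simulate(phi_exp, phi_cog, K_res=1.0, K_inf=1.0,
             dt=1e-4, t_max=50.0, blowup=1e12):
    """Euler-integrate the system; return the time A_cog crosses `blowup`, else None."""
    A_exp = A_cog = 1.0
    t = 0.0
    while t < t_max:
        F = ces(A_exp * K_res, A_cog * K_inf)
        A_exp += dt * A_exp**phi_exp * F
        A_cog += dt * A_cog**phi_cog * F
        t += dt
        if A_cog > blowup:
            return t                     # crude proxy for finite-time blow-up
    return None

print(simulate(phi_exp=0.3, phi_cog=0.3))    # blows up well before t_max despite sigma < 1
print(simulate(phi_exp=-0.5, phi_cog=-0.5))  # phi < 0: only polynomial growth, prints None
```

With both phis positive the growth rate keeps accelerating even though the two inputs are strong complements, because doubling the A's doubles both CES inputs; with both phis negative the system settles into sub-exponential growth.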

Yep, as you say in your footnote, you can choose to freeze the frontier, so you train models of a fixed capability using less and less compute (at least for a while). 

However, if , then a software-only intelligence explosion occurs only if . But if this condition held, we could get an intelligence explosion with constant, human-only research input. While not impossible, we find this condition fairly implausible. 

 

Hmm, I think a software-only intelligence explosion is plausible even if  , but without the implication that you can do it with human-only research input.

The basic idea is that when you double the efficiency of software, you can now:

  • Run twice as many experiments
  • Have twice as much cognitive labour

So both the inputs to software R&D double.

 

I think this corresponds to:

dA = A^phi F(A K_res, A K_inf)

 

And then you only need phi > 0 to get an intelligence explosion. Not phi > 1.
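A quick check of that claim, assuming F is a constant-returns (degree-1 homogeneous) CES aggregator and the compute stocks K_res, K_inf are held fixed:

```latex
\begin{align*}
\dot{A} \;=\; A^{\phi}\, F(A K_{res},\, A K_{inf})
        \;=\; A^{\phi} \cdot A\, F(K_{res}, K_{inf})
        \;=\; c\, A^{1+\phi},
\qquad c := F(K_{res}, K_{inf}).
\end{align*}
% This ODE reaches infinity in finite time whenever 1 + phi > 1, i.e. phi > 0,
% for any elasticity of substitution sigma.
```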

 

This is really an explosion in the efficiency at which you can run AI algorithms, but you could do that for a while and then quickly use your massive workforce to develop superintelligence, or start training your ultra-efficient algorithms using way more compute.

Thanks for this!

 

Let me try and summarise what I think is the high-level dynamic driving the result, and you can correct me if I'm confused.

 

CES in compute.

Compute has become cheaper while wages have stayed ~constant. The economic model then implies that:

  • If compute and labour were complements, then labs would spend a greater fraction of their research budgets on labour. (This prevents labour from becoming a bottleneck as compute becomes cheaper.)

Labs aren't doing this, suggesting that compute and labour are substitutes. 

 

CES in frontier experiments.

Frontier experiments have become more expensive while wages have stayed ~constant. The economic model then implies that:

  • If compute and labour were complements, then labs would spend a greater fraction of their research budgets on compute. (This relieves the key bottleneck of expensive frontier experiments.)

Labs are indeed doing this, suggesting that compute and labour are indeed complements. 

(Though your 'Research compute per employee' data shows they haven't been doing much of this since 2018, so the argument against the intelligence explosion is weaker here than I'd have expected.)
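For what it's worth, here's a minimal sketch of the cost-share mechanics behind both bullets above (the CES parameterisation and prices are illustrative, not anyone's calibration):

```python
# Cost-minimising a CES production function F = (a*C**rho + b*L**rho)**(1/rho),
# with elasticity of substitution sigma = 1/(1 - rho), gives the expenditure ratio
#   (p_C * C) / (p_L * L) = (a/b)**sigma * (p_C/p_L)**(1 - sigma).
# So if compute and labour are complements (sigma < 1), cheaper compute shifts the
# budget toward labour, and pricier frontier experiments shift it toward compute;
# with substitutes (sigma > 1) the pattern reverses.

def compute_budget_share(p_C, p_L=1.0, sigma=0.5, a=0.5, b=0.5):
    """Compute's share of research spending for a cost-minimising lab."""
    expenditure_ratio = (a / b) ** sigma * (p_C / p_L) ** (1.0 - sigma)
    return expenditure_ratio / (1.0 + expenditure_ratio)

for sigma in (0.5, 2.0):  # complements vs. substitutes
    low, high = compute_budget_share(0.1, sigma=sigma), compute_budget_share(10.0, sigma=sigma)
    print(f"sigma={sigma}: compute share at low price = {low:.2f}, at high price = {high:.2f}")
```

That's the sense in which the observed spending patterns point toward substitutes in the CES-in-compute formulation and complements in the CES-in-frontier-experiments formulation.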

The  condition is exactly what Epoch and Forethought consider when they analyze whether the returns to research are high enough for a singularity.[5]

Though we initially consider this, we then adjust for compute as an input to R&D and so end up considering the sigma = 1 condition. It's under that condition that I think it's more likely than not that the condition for a software-only intelligence explosion holds.

So you think people doing direct work should quit and earn to give if they could thereby double their salary? That can't be the right recommendation for everyone!

I like the vividness of the comparisons!

A few points against this being nearly as crazy as the comparisons suggest:

  • GPT-2030 may learn much less sample-efficiently, and much less compute-efficiently, than humans. In fact, this is pretty likely. Ball-parking, humans do ~1e24 FLOP before they're 30, which is ~20X less than GPT-4's training compute. And we learn languages/maths from way fewer data points. So the actual rate at which GPT-2030 itself gets smarter will be lower than the rates implied. (I unpack the ball-park in the sketch after this list.)
    • This is a sense of "learn" as in "improves its own understanding". There's another sense which is "produces knowledge for the rest of the world to use, eg science papers" where I think your comparisons are right. 
  • Learning may be bottlenecked by serial thinking time past a certain point, after which adding more parallel copies won't help. This could make the conclusion much less extreme.
  • Learning may also be bottlenecked by experiments in the real world, which may not immediately get much faster.
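Unpacking the FLOP ball-park from the first bullet (the per-second brain-compute figure and the GPT-4 training-compute figure are commonly cited estimates, used here as assumptions):

```latex
\begin{align*}
\text{human compute by age 30} &\approx 10^{15}\ \text{FLOP/s}
    \times 30\ \text{yr} \times 3.15 \times 10^{7}\ \text{s/yr}
    \approx 10^{24}\ \text{FLOP},\\
\text{GPT-4 training compute} &\approx 2 \times 10^{25}\ \text{FLOP}
    \approx 20 \times \big(10^{24}\ \text{FLOP}\big).
\end{align*}
```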