The tables have turned on AI sceptics

The AI revenue growth we've seen so far is compatible with several different explanations, including an AI investment bubble and narrow AI applications that are economically useful but will not lead to AGI anytime soon. Professional investors and financial analysts are generally split between these two camps. Only a small minority believe in near-term AGI.

Some criticisms of the famous METR time horizons graph:

As you mentioned, some of the problems and limitations of the METR time horizons graph are sometimes (but not always) clearly disclosed by METR employees, including the CEO of METR. However, note the wide difference between the caveated description of what the graph says and the interpretation of the graph as a strong indicator of rapid, exponential improvement in general AI capabilities.
Gary Marcus, a cognitive scientist and AI researcher, and Ernest Davis, a computer scientist and AAAI fellow, co-authored a blog post on the METR graph that looks at how the graph was made and concludes that “attempting to use the graph to make predictions about the capacities of future AI is misguided”.
Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, published a detailed breakdown of some of the problems with METR’s methodology. He concludes that it’s “impossible to draw meaningful conclusions from METR’s Long Tasks benchmark” and that the METR graph “contains far too many compounding errors to excuse”. Witkin calls out a specific tweet from METR, which presents the METR graph in the broad, uncaveated way that it’s often interpreted by believers in near-term AGI. He calls the tweet “an uncontroversial example of misleading science communication”. In a response to a comment on that post asking how much we should update our views based on the METR graph, Witkin responded, "to be very clear I am in fact claiming that the proper update is zero."

I'm just summarizing the conclusions here, not the substance of the critiques. I recommend that people go and read the critiques to see how the authors reach these conclusions.

I guess the point of the expert survey you cited was to explain that it does not support the idea of near-term AGI, right? I was confused because the title and introduction strongly state that the evidence has turned in favour of near-term AGI, but then you say that 2 out of the 4 pieces of evidence you cite do not support the idea of near-term AGI. I think you're just trying to do a general survey of the evidence, both the convincing and unconvincing evidence, right?

Something I changed my mind about after looking into both the AI Impacts survey and the Forecasting Research Institute's LEAP survey (as I wrote about here) is that survey results seem to be super sensitive to survey design, even to choices that seem small to the designers and that they don't expect to have an impact. I'm not sure these kinds of surveys really matter that much, anyway, but I'm at least more interested in surveys where the designers are careful about these factors that can bias the results. The effects are not small, either. In one case, the result was 750,000 times higher or lower depending on how the question was posed.

I agree that Bio Anchors is also not convincing evidence of anything, for the reasons explained here.

Overall, this post is a bit weird because the title and intro make a super strong claim — the tables have turned! — but then the body doesn't cash the cheque that the title and intro write. The new evidence that has turned the tables on AI skepticism is just AI revenue and the METR graph? So, if you agree that the METR graph has been debunked at this point, then it's just AI revenue. And what does AI revenue really show? Can narrow AI not make a lot of money? Are you really prepared to defend that claim? Have at it!

Maybe the claim is something really specific, which is that if you take AI revenue growth over the last 3 years and extrapolate the same rate of growth indefinitely, you end up with some ridiculously large number, and for that number to be true, we would need to have something like AGI. But you can't just take any trend and extrapolate it indefinitely. You need to have some explanation of what's causing the trend and whether it will continue or not. When you step on the accelerator of your car, extrapolating that trend forward indefinitely means you'll eventually exceed the speed of light. But we don't just extrapolate things forward, we think about cause and effect.

You could look at all sorts of industries (like SaaS) or companies (like Tesla) during a few years when growth is super fast, extrapolate that forever, and conclude that one day they will account for 100% of gross world product and take over the entire world economy. But we assume this won't happen because we understand what will prevent this from happening, and we also don't know about anything that would cause it to happen. So, will AI revenue increase until the Singularity happens? That depends on the technology. So, what will happen with the technology? Now we're back to square one! Looking at a chart of AI revenue doesn't settle anything. Will the chart go asymptotic into AI heaven? Or will it level out, or even crash? The answer to that question is not in the chart. It's in the world.

Extrapolation of past trends with no causal explanation of why the trend will continue is not empiricism! It is mysticism! It amounts to saying: we don't know what's happening or why or how, but, somehow, we know what will happen. This is not science. This is not financial analysis. This is not anything.

A facetious graph from The Economist extrapolating when the first 14-bladed razor will arrive:

My own facetious graph:

(Why do you expect this trend not to continue?)

JoshYou

To understand just how unprecedented Anthropic's revenue growth is, I think it's helpful to consult this analysis/revenue projection for OpenAI published late last year:

Because OpenAI’s growth rates are unprecedented, we can’t compare it to other software companies in a hypergrowth phase when their revenues exceed $4 billion per annum. There are none!
Instead, we look at proxies. Data from investor ICONIQ of fast-growing software companies, once their revenues exceed $50 million (yes, m-illion), shows that for every doubling of revenue, their growth rates tend to drop by a third to a half. In other words, we’re flying blind. Their AI companies have broken all records, so I’ve just assumed a similar slowdown in growth rates.
Traditional software companies see growth rates spike, then plateau after 1-2 years of product-market fit. However, OpenAI and Anthropic have maintained over 100% growth well into the multi-billion-dollar scale, something previously thought impossible.

So, it's very unusual, bordering on unprecedented, for a company to grow its revenue faster than 100% per year when it's already making multiple billions of annualized revenue. In OpenAI's case, it grew by around 200-300% in 2025; Anthropic's was much faster at around 800% in 2025, but at the time this could have been explained away as catch-up growth from a smaller competitor.

What happened in 2026? Anthropic tripled its revenue in one quarter. Annualized, this would be an 8000% growth rate (~80x per year)! Now, it would indeed be stupid to project this growth rate to the entire rest of the year, which would mean $800B annualized, approximately the greatest of any company in the world. But there's yet another report that Anthropic is imminently approaching $45 billion as of early May, which is consistent with this trend! (1.5x growth in about one month) Anthropic's 800% growth rate last year was already basically unprecedented, and now it's growing 10 times faster than that, at a much higher revenue baseline! This is going to slow, maybe in the next month, or perhaps two, or three. But after that, how long will Anthropic continue to grow at >=10x annually? After a year of this, Anthropic would be comparable to the largest tech giants, and it seems pretty safe to forecast that Anthropic will still be growing at over 100% per year at that point.
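For anyone who wants to check the annualization arithmetic above, here is a minimal Python sketch. The function names are made up for illustration, and the inputs are just the rough multiples quoted in this comment (3x in a quarter, 1.5x in about a month), not official figures.

```python
def annualized_multiple(multiple_per_period: float, periods_per_year: float) -> float:
    """Compound a per-period growth multiple over a full year."""
    return multiple_per_period ** periods_per_year


def growth_percent(multiple: float) -> float:
    """Convert a growth multiple (e.g. 3x) into a percentage increase (e.g. 200%)."""
    return (multiple - 1.0) * 100.0


# Tripling in one quarter, compounded over four quarters.
quarterly = annualized_multiple(3.0, 4)
# 1.5x in about one month, compounded over twelve months.
monthly = annualized_multiple(1.5, 12)

print(f"3x per quarter -> {quarterly:.0f}x per year ({growth_percent(quarterly):.0f}% growth)")
print(f"1.5x per month -> {monthly:.0f}x per year")
```

The monthly figure compounds to an even larger yearly multiple than the quarterly one, which is the sense in which the early May report is consistent with the trend.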

And the models are still going to get better!

JoshYou

As an addendum, I don't think Anthropic's revenue growth this year proves that transformative AI is imminent, though I do think it is strongly suggestive. But it has ruled out that LLMs are simply the next of several major tech advances that have happened in my lifetime, such as the Internet or smartphones, which led to c. $1T in annual revenue for the leaders in those industries. LLMs will in fact be much bigger than that, absent some major intervening event such as an enormous catastrophe or global AI moratorium.

I love this quote from a TED Talk given by the physicist David Deutsch:

So, in science, two false approaches blight progress. One's well-known: untestable theories. But the more important one is explanationless theories. Whenever you're told that some existing statistical trend will continue but you aren't given a hard-to-vary account of what causes that trend, you're being told a wizard did it.

What’s the causal explanation for why AI revenue is happening? What’s the causal explanation for why it will supposedly continue sustainably, long-term? What theory or hypothesis explains the data, and would justify extrapolating the current statistical trend far into the future?

There are over 50,000 publicly traded companies on the global stock market. The NASDAQ has existed for 55 years; the New York Stock Exchange for 234 years. We have lots of data on stock prices.

Here’s an important fact about stocks: you can’t just look at the price graph for a stock over the last year, or the last three years, extrapolate this price movement forward for the next three years, and buy the stock (or short it) on that basis. You can’t just extrapolate the trend long-term. It doesn’t work.

You need a causal explanation, an investment thesis, about why the stock will go up or down, and by how much. What’s your thesis? And what’s the evidence for that thesis? What’s the analysis behind it?

From what I can tell, the vast majority of economists, professional investors, and financial analysts do not believe that LLMs will lead to AGI or transformative AI within the next decade. Yet they’re well-aware of the revenue growth of AI companies over the last three years. Many professional investors — maybe about half — think AI is in a bubble. What gives? Are they not seeing these graphs? Are they not looking at the graphs hard enough? If the statistical trend of AI revenue growth over the last three years is going to continue for the next five to ten years or more, how could they think this? It’s a real mystery!

If you think trends always continue!

(Much more on this topic here.)


Here’s a longer and more specific explanation of why the revenue growth thing makes no sense.

In 2023, Anthropic's annualized revenue reached $100 million. Now, in 2026, Anthropic recently announced its annualized revenue has rapidly grown to $30 billion. This is an increase of 300x over 3 years. If Anthropic were to grow revenue at the same rate for another 3 years, in 2029, it would hit $9 trillion in annualized revenue. After 6 years, in 2032, it would hit $2.7 quadrillion in annualized revenue.

For comparison, gross world product (the combined GDP of all countries) is only $111 trillion. For Anthropic to sustain this growth rate for 6 years, the world economy would need to grow more than 24x. Otherwise, it’s impossible.
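The extrapolation above can be reproduced with a short Python sketch. The constants are the figures quoted in this comment ($100 million in 2023, $30 billion in 2026, $111 trillion gross world product); the function and variable names are hypothetical.

```python
def extrapolate(value: float, multiple_per_period: float, periods: int) -> float:
    """Blindly repeat the same growth multiple for a number of periods."""
    return value * multiple_per_period ** periods


REVENUE_2023 = 100e6        # annualized revenue, USD (figure quoted above)
REVENUE_2026 = 30e9         # annualized revenue, USD (figure quoted above)
GROSS_WORLD_PRODUCT = 111e12  # USD (figure quoted above)

# One "period" here is 3 years, and the multiple is the 2023 -> 2026 growth.
multiple_3y = REVENUE_2026 / REVENUE_2023
revenue_2029 = extrapolate(REVENUE_2026, multiple_3y, 1)
revenue_2032 = extrapolate(REVENUE_2026, multiple_3y, 2)

# How many times larger the world economy would need to be just to
# equal the extrapolated 2032 revenue.
gwp_multiple_needed = revenue_2032 / GROSS_WORLD_PRODUCT

print(f"3-year multiple: {multiple_3y:.0f}x")
print(f"2029: ${revenue_2029 / 1e12:.1f} trillion")
print(f"2032: ${revenue_2032 / 1e15:.1f} quadrillion")
print(f"World economy would need to grow {gwp_multiple_needed:.1f}x")
```

Nothing in the function knows anything about the company or the economy, which is the point: the output is only as meaningful as the assumption that the multiple repeats.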

Gross world product grew at an average rate of 3.2% per year from 2010 to 2019, according to the UN. It averaged 3.8% per year over the 20 years from around 1993 to 2023, according to the IMF. This growth rate would need to average about 70% per year over the next 6 years for gross world product to grow from $111 trillion now to $2.7 quadrillion in 2032 (a 24.3x increase, and 24.3^(1/6) is about 1.70). Obviously, economists are not forecasting this.

This is another illustration of how extrapolating statistical trends forward indefinitely doesn’t make sense and doesn’t work. Anthropic’s revenue growth and global economic growth are both statistical trends. Extrapolating them both forward leads to a logical contradiction — we get wildly different numbers for gross world product in 2032. We have to make a deeper analytical decision about what we think will happen, rather than just extrapolate trends forward. Just extrapolating trends doesn’t get you anywhere.

Everyone intuitively understands why simple extrapolation would be problematic in many, many, many cases. Jersey Mike’s Subs grew revenue from $2.68 billion in 2022 to $4.2 billion in 2025. That’s an annual growth rate of about 16%. If this rate of growth continued, its revenue would pass $30 trillion, roughly the current GDP of the United States, in about 60 years. Is Jersey Mike’s going to grow larger than the current United States economy in 60 years? Obviously not. These are the sort of absurd results you get if you just extrapolate statistical trends forward indefinitely.

People have a causal hypothesis for why Anthropic will grow revenue into the quadrillions within a decade: artificial general intelligence! But the flaw is in using Anthropic’s revenue growth rate as evidence for that hypothesis. For instance, let’s say I want to argue Jersey Mike’s Subs will invent artificial general intelligence within the next 60 years. I can point to Jersey Mike’s revenue growth and say, see, if we extrapolate the current trend, Jersey Mike’s revenue will grow larger than the current U.S. economy within about 60 years, and larger than the current world economy within about 70 years. The only thing that could explain this trend, I could argue, is if Jersey Mike’s invents AGI. Therefore, this trend is evidence that Jersey Mike’s will invent AGI.

Is Jersey Mike’s Subs' 16% annual growth rate evidence that it will invent AGI? No! Of course not! Obviously the trend won’t continue that long, and obviously the outcome you get by extrapolating the trend that long won’t be the real outcome. It’s not that AI companies are somehow special in that if you follow statistical trends through their indefinite continuation, you get radical results. You get radical results if you do that for fast food restaurants! This is not a feature of AI companies, it’s a feature of blind extrapolation!

If you believe Anthropic (or OpenAI, or Google DeepMind) will invent AGI within the next decade, and if you want to try to convince skeptics of this — including the many skeptical economists, financial analysts, and professional investors, who have no problem understanding companies’ revenue growth — it will not do to extrapolate revenue growth until it gets to an amount that would be impossible without AGI and then work backwards from that to the conclusion AGI will be invented. Equally possible, and more likely, AGI will just not be invented and revenue won’t grow that much. You have to argue for your causal hypothesis — that Anthropic (or another company) will invent AGI soon — using other kinds of evidence. Extrapolating revenue growth isn’t evidence. Not for Anthropic, and not for Jersey Mike’s.

Edited on 2026-05-21 at 05:45 UTC: Either the evidence to support Anthropic getting to $10 trillion or $100 trillion or $1 quadrillion in annual revenue is a) extrapolation of the statistical trend over the last 3 years, in which case it isn't evidence for the reasons described above — we can't just extrapolate statistical trends — or b) not an extrapolation of the statistical trend over the last 3 years but something else, e.g., a theory about the nature of LLMs, or a hypothesis about how scaling LLMs will get to AGI, in which case I refer to my original point that we need a causal explanation of what is happening to make a forecast, and can't just extrapolate statistical trends.

If you already believed back in 2023 that LLMs would soon scale to AGI, then maybe the last 3 years of revenue growth feels like confirmation of that view. I don't know. I can't speak to that. But the claim that the AI revenue we see now should be compelling to skeptics of near-term AGI and convert them to the near-term AGI view doesn't really make sense. There are logical steps missing. You need to fill in the argument by making the case that the only or best or most likely explanation for the AI revenue we've seen so far is that AGI will be invented soon. You need to make this case in such a way that is convincing to skeptics who don't already share your views about AI, LLMs, AGI, and so on. In other words, you need an additional argument about why your theory is the one that best fits the data. You can't just point to the data as automatic support for your theory, especially when, e.g., most economists, financial analysts, AI experts, and superforecasters seem to opt for a different theory.

JoshYou

I do have an implicit premise here that LLMs are the sort of thing where as they become more useful, as tracked by revenue, they are more AGI-like. If Jersey Mike's was growing as fast as Anthropic at the size of Anthropic, that would be a sign of imminent and dramatic social and economic changes, but those changes wouldn't be AGI.

I think if LLMs were making $10T or $100T in revenue, close to the size of the current world economy, while still apparently growing and progressing quickly, that would be a strong sign that they were AGI, or had very many of the important elements of AGI, or were otherwise highly transformative. If that happens, it is very unclear what would happen subsequently to AI revenues or to the economy in general. So I'm not appealing to blind and indefinite extrapolation, I'm appealing to the growing likelihood that LLMs will reach the revenue level that would make you think it is pretty much AGI.

So your counterargument works a lot less well as revenue levels get closer to these "evidence of AGI" levels, and revenue is growing extremely quickly with little sign, or in this case negative signs, of slowing down.

Denkenberger🔸

AI estimates that global knowledge work is ~$35T/yr and that the value of AI is ~6x the revenue (though that could change). That would imply we would only need ~$6T/yr AI revenue to substitute for all knowledge work. Now it is true that AI would be affecting the capital side of gross world product (GWP) (~$60T/yr), and that the GWP would grow by then. But there is the time delay of diffusion, indicating that the impact of the AI at the time would be much larger eventually. So I think even $1-3T/yr revenue with strong growth would be fairly strong evidence for AGI/TAI.

Would you be willing to agree to a bet on this? Anthropic’s revenue has grown at a compound annual growth rate (CAGR) of 570% over the last 3 years. If this trend continued for 1 more year, then Anthropic would hit $200 billion in annualized revenue less than a year from now.
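Here is a minimal Python sketch of where the 570% and $200 billion figures come from. It assumes the same endpoints used earlier in this thread ($100 million in 2023 and $30 billion in 2026); the function name is just for illustration.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate as a fraction (1.0 means +100% per year)."""
    return (end / start) ** (1.0 / years) - 1.0


START_REVENUE = 100e6  # 2023 annualized revenue, USD (assumed endpoint)
END_REVENUE = 30e9     # 2026 annualized revenue, USD (assumed endpoint)

rate = cagr(START_REVENUE, END_REVENUE, 3)
# Revenue one year out if the same rate simply continued.
one_more_year = END_REVENUE * (1.0 + rate)

print(f"CAGR: {rate * 100:.0f}%")
print(f"One more year at that rate: ${one_more_year / 1e9:.0f} billion")
```

The $200 billion threshold in the bet below is this extrapolated figure, rounded.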

However, Anthropic’s own revenue projection is for $150 billion in 2029. If we infer from Anthropic’s valuation, its investors are implicitly pricing in much slower revenue growth over the next 3 years than a 570% CAGR.^[1]

As an additional data point, an HSBC analyst projected $241 billion in revenue for Anthropic in 2030. Coatue Management predicted $200 billion in revenue in 2031.

So, I propose a bet: if by June 1, 2027, Anthropic has at least $200 billion in annualized revenue, you win. If by June 1, 2027, Anthropic has less than $200 billion in annualized revenue, I win.

I would be happy to bet for a nominal amount like $20 to the charity of the winner’s choice.

I'm also open to shorter-term bets. For instance, I would bet that Anthropic will not hit $125 billion in annualized revenue by the end of 2026 (which is what extrapolation would imply).

^[1]
A sustained 570% CAGR would imply Anthropic will hit $9 trillion in annualized revenue in 2029. Let’s apply a super conservative revenue multiple, 1.0 (unreasonably low). Let’s also apply a super steep discount rate, 25% (way too high for a normal megacap tech company). Even with these assumptions, we still get a $4.6 trillion valuation for Anthropic. Anthropic’s current valuation is under $1 trillion.
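The footnote's valuation arithmetic, as a rough Python sketch. This is a deliberately crude model (a single future revenue figure, a flat revenue multiple, and a flat discount rate), and the function name is hypothetical. It is not how anyone would actually value a company.

```python
def implied_valuation(future_revenue: float, revenue_multiple: float,
                      discount_rate: float, years: int) -> float:
    """Value today of (future revenue x multiple), discounted back `years` years."""
    future_value = future_revenue * revenue_multiple
    return future_value / (1.0 + discount_rate) ** years


# Inputs from the footnote: $9 trillion revenue in 3 years,
# a 1.0x revenue multiple, and a 25% annual discount rate.
valuation = implied_valuation(9e12, 1.0, 0.25, 3)

print(f"Implied valuation today: ${valuation / 1e12:.1f} trillion")
```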

As Scott notes, the maximum-entropy heuristic if you make no hypothesis about the explanation for a statistical trend is Lindy's law, the trend continuing for as long as it continued so far. So you might expect both the Anthropic and the Jersey Mike's trend to continue until ~2028, but not until 2050.

The Lindy effect is just a rule of thumb coined by some comedians in a restaurant called Lindy’s. Per Wikipedia:

The concept is named after Lindy's delicatessen in New York City, where the concept was informally theorized by comedians: a show running only two weeks would be expected to last another two weeks, while a show that has lasted two years could expect a further two-year run.[3][4]

It’s not a scientific principle. It’s not empirically true. (Scott Alexander doesn’t cite any evidence to support it.)^[1]

One area where we can see that the Lindy effect is empirically false is stock prices. If it were true, you could buy a portfolio of the 100 stocks that have gone up the most over the last 3 years, hold them for 3 years, and beat the S&P 500. But that doesn’t work.^[2]

Equity research analysts and institutional investors don’t approach financial modelling or earning estimates through blind extrapolation, or by applying a rule of thumb like the Lindy effect. They think causally, often in great detail, about companies’ future performance. And, even then, accurate forecasting is really hard.^[3]

Just by looking at Anthropic’s valuation, you can tell that investors are not baking in another 300x revenue growth in the next 3 years. For that to be true, Anthropic would need to be valued in the tens of trillions. (Multiply $9 trillion by even a low revenue multiple like the average for the S&P 500 and then apply a steep discount rate like 15%, you still get a valuation over $20 trillion.)

According to a document leaked to journalists, Anthropic’s own internal projection is around $150 billion in revenue in 2029. This is “only” a 5x increase from current annualized revenue, far below the 200-300x we’d get from extrapolation.^[4]

We so plainly and effortlessly see all the many, many, many places where blind extrapolation doesn’t work that we completely forget this when we look at the more ambiguous, uncertain cases. If you’ve just driven 100 metres toward a wall that is now 10 metres ahead of you, you obviously know you can’t just apply the Lindy effect and think you’re gonna be able to drive another 100 metres. If you ate two sandwiches today and one sandwich yesterday, maybe you’ll eat four sandwiches tomorrow, but you’re not likely going to eat eight the next day (which the Lindy effect would imply), and you’re definitely not going to eat 1,073,741,823 sandwiches a month from now.

Somehow, when it comes to certain technical topics, this all goes out the window. We forget the millions of cases where extrapolating trends just doesn’t work, and we say that graphs just have to keep going up and to the right. But why?

^[1]
Edit (2026-05-26 at 23:25 UTC):
There has been a small amount of serious, academic discussion of the Lindy effect in certain narrow, niche topic areas, but, as far as I know, virtually no one (or literally no one) in academia or science agrees with or even takes seriously that the Lindy effect is a generally or universally applicable rule you can use to predict trends — across all domains, across the whole universe? — with any accuracy.
Even the original concept raised informally by comedians is dubious. When do you decide to measure a show's duration? Whenever you decide to measure, you're effectively deciding that's the halfway point. Measure after the show's first day, and you'll be reliably wrong. You'll predict all shows last 2 days. Continue measuring every day and updating your prediction, and you'll also be reliably wrong, since for literally every single show, you'll predict it's 50% through its run on the day it closes. So, when do you decide to measure?
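To make the measurement problem concrete, here is a small Python sketch. It assumes the simplest reading of the rule (predicted total run = twice the days elapsed so far) and checks on which days that prediction would have matched a show's actual run. The function names are invented for this example.

```python
def lindy_predicted_total(days_elapsed: int) -> int:
    """Simplest Lindy rule: the show will last as long again as it has so far."""
    return 2 * days_elapsed


def days_prediction_was_right(actual_run_days: int) -> list:
    """Days (1-indexed) on which the Lindy prediction equals the actual run length."""
    return [
        day
        for day in range(1, actual_run_days + 1)
        if lindy_predicted_total(day) == actual_run_days
    ]


# A show that actually runs 100 days: the rule is right on one day only.
print(days_prediction_was_right(100))
# A show that runs 7 days: the rule is never exactly right.
print(days_prediction_was_right(7))
# On closing day of the 100-day show, the rule still predicts 100 more days.
print(lindy_predicted_total(100))
```

Measured daily, the rule is right on at most one day of any run, and that day can only be identified after the show has closed.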
^[2]
Edit (2026-05-26 at 23:25 UTC):

Pay close attention to what is being claimed here (and what isn't). Specifically, whether or not momentum investing can be reliably used to attain alpha — dubious, but let's leave that aside — what's straightforwardly empirically true is that stocks don't just keep going up (or down) by the same amount in 3-year periods that they did in the previous 3-year period.
If this example is too confusing or not intuitive or not helpful, just move on to another example. There are literally millions of examples where the Lindy effect is false, and where blind extrapolation doesn't work. This example assumes a bit of background in the topic area and might be too complex or too niche to be a good example of the general point.
^[3]
Edit (2026-05-26 at 23:25 UTC):

I'm not talking here about day trading, algorithmic trading, or high-frequency trading. This pertains to financial analysts and investors who actually make forecasts of companies' future financial performance.

^[4]

Edit (2026-05-26 at 23:25 UTC):

If you don't believe Anthropic, its investors, or financial analysts, but do trust LLM-based chatbots — well, yeesh, you're really getting things backwards — Claude, ChatGPT, and Google Gemini all say it doesn't make sense to apply the Lindy effect to Anthropic's revenue. But I make this point only to appease people who disbelieve reliable sources and believe unreliable sources. AI chatbots are unreliable, frequently wrong, and can't be trusted. Some funny and striking examples of this: ChatGPT on EA and massive disvalue, evil simulators, its cult status, and scheming billionaires.

One area where we can see that the Lindy effect is empirically false is stock prices. If it were true, you could buy a portfolio of the 100 stocks that have gone up the most over the last 3 years, hold them for 3 years, and beat the S&P 500. But that doesn’t work.

... your link straightforwardly shows the opposite? Momentum investing is moderately profitable in the first years before reverting to the mean as the momentum subsides.

Similarly, you can find plenty of work on the subject on the wiki page for the Lindy effect, notably connections with Zipf's law and the Pareto distribution. (The term "Lindy effect" itself was coined by Nassim Nicholas Taleb.)

Equity research analysts and institutional investors don’t approach financial modelling or earning estimates through blind extrapolation, or by applying a rule of thumb like the Lindy effect. They think causally, often in great detail, about companies’ future performance. And, even then, accurate forecasting is really hard.

True, and neither Scott nor I said otherwise. You should have a broad prior distribution, and after gaining more gears-level evidence you should update. On the other hand, it is also, uh, not true that quants can always afford to be strictly rigorous and never use rules of thumb of similar caliber.

Exponential growth in time horizon with a ~4mo doubling time has been confirmed by other organizations on very different distributions (1, 2, edit: 3). Furthermore, it correlates very well with the Epoch Capabilities Index.

The blog post by the Australian AI safety organization says, “We apply METR’s time-horizon methodology…” How would this address the criticisms raised of METR’s methodology?

At a glance, the FutureTech pre-print makes some interesting choices, e.g., task quality is only scored up to above-average and above-average gets a perfect score, and acknowledges some of the limitations with their methodology, e.g., all tasks used for this experiment must contain all relevant information in the LLM prompt. (Is that realistic for most work tasks?) I wonder if this pre-print will be submitted for publication in a journal? FutureTech seems to be one of those weird MIT hybrids between an academic research group and a management consultancy. I’m not sure if they’ve ever published a peer-reviewed paper.

[Edit on 2026-05-14 at 18:56 UTC: After reading Peter Slattery’s comment below, I spent a few more minutes looking into it, and I’m still not sure what FutureTech is or what kind of stuff they publish. If someone knows and can explain it, that would be helpful. I could spend more time and get to the bottom of it, but I don’t want to spend more time on it right now.

Please also note the EA Forum team has limited my ability to reply to comments, so I can’t reply further. But if you want to continue the discussion, I’m reachable here.]

Someone could take the time to do a deep dive into the FutureTech pre-print and write a review, but I wonder if that’s a good use of anyone’s time? Is there a reason to think this group publishes high-quality research that is worth getting into?

If someone thinks it’s worthwhile, and they also think the pre-print is unlikely to be submitted for peer review, one option would be to ask the EA organization called The Unjournal to commission a review by an external expert.

Peter Slattery 🔸

Are you sure you are thinking of the correct organization when you say:

FutureTech seems to be one of those weird MIT hybrids between an academic research group and a management consultancy. I’m not sure if they’ve ever published a peer-reviewed paper.

I say that because the lab has many publications, including in top peer-reviewed journals like Science. For more context, here is the publications page and here is the bio for Neil Thompson, the head of the lab:

Dr. Thompson’s work has over 3000 citations with an h-index of 21 across his publication portfolio, including such well known and renowned papers as Expertise, The Computational Limits of Deep Learning, and There’s plenty of room at the Top: What will drive computer performance after Moore’s law? Dr. Thompson has been invited to present his work and recommendations to Congressional Staffers (House and Senate), the US Federal Reserve, the Pentagon, National Security Staff, the Department of Commerce, the Department of Energy, Brookings Institute, and most recently presented at a World Summit on the same program as the Prime Minister of India and Former Prime Ministers of England and Australia. With experience in 80+ countries, Dr. Thompson’s research and impact is on a global scale.

Peter Slattery 🔸

Oh, and the preprint will almost certainly be submitted for peer review, but it might take 1-2 years before it is published.

Okay, if we suspect peer review will eventually happen but the process will be very slow, then it might still be worthwhile to commission an external review, whether through The Unjournal or some other route. I once actually did this with my own money just because I was really, desperately curious about a pre-print published by a company that would never be submitted for peer review. I think it ended up costing me $400-500, something like that.

Whether it’s worth the time, effort, and money depends on how much people actually care about this pre-print and think it’s important. Does anyone actually, sincerely think whether we’re on the cusp of apocalypse/utopia hangs on whether this pre-print is correct or not? How much is this particular pre-print actually a crux for anyone?

If it is actually a crux on which people’s expectations around AGI within the next decade hang, then it’s probably worth paying the $500 or $1,000 or whatever it costs to do a review. But if it isn’t on anyone’s top 10 list or even top 20 list of most important pieces of evidence for near-term AGI, then I guess… it probably doesn’t matter whether the pre-print’s findings are true or false.

The argument from an AI safety perspective about why it would be a cost-effective use of funds is straightforward. First, knowing whether the pre-print’s findings stand up under scrutiny is important insofar as the informational content of the pre-print is important for understanding AI. Second, there is currently very little high-quality evidence, and especially very little academic-calibre evidence, to present to skeptics who want to be convinced that an existentially consequential AGI is on the horizon. What could convince them? Well, potentially scientific evidence of this sort. And if your hopes or plans for AI safety depend on, or would be greatly helped by, the ability to bring skeptics on board, well, then it’s worth a relatively small investment to marshal evidence to convince skeptics.

Another potential candidate for external review is the Remote Labor Index pre-print. But the same caveat applies.

How would this address the criticisms raised of METR’s methodology?

How would this not? It doesn't use the same tasks nor does it use the same human baseliner panel as the HCAST dataset.

Robi Rahman🔸

This is a very impressively daft comment.

Revenue and benchmarks and razor blades aside, let's look at the object level. Have you used a frontier model recently? How smart were they three years ago and how smart are they now?

Ben_West🔸

Downvoted; I think this comment was unnecessarily rude.

Charlie_Guthmann

I disagree. Yarrow seems to have been in quite a soldier mindset on this issue for a while, and it's annoying. I don't believe they are coming from a place of good faith (obv not confident).

Robi Rahman🔸

shrug, that's fine, I don't mind the downvotes, but can we also enforce epistemic standards along with niceness? The above comment is refuting a strawman while not engaging with the rate of increase of AI capabilities, which is the crux of the post.

Charlie_Guthmann

Agreed, and honestly I think it's ok to be a little rude if people seem to be acting in bad faith.

Jacob_Peacock

Good post—I appreciate this synthesis of evidence and agree with your conclusions. One (minor) point of disagreement:

Likewise, they expected the labor force participation rate to be 55%, down only slightly from today’s roughly 61%.

I’d characterize a 6 percentage point decline as fairly substantial rather than “only slightly.” In absolute terms, 6pp may not sound like much, but relative to historical variation in labor force participation, it’s quite large.

Since measurement began in the 1940s, the labor force participation rate has remained within a relatively narrow 58–67% band. Even the COVID shock was associated with only about a 3pp decline. That historical range also spans the transition from a predominantly male workforce to much higher female labor force participation.

Ben_West🔸

And credit to the AI skeptics that they seem to mostly have updated in light of the new evidence (or at least claimed that they never actually believed in long timelines, which is maybe less noble, but ends up in the same place).

Vasco Grilo🔸

Hi Stefan. I liked your post. I remain open to bets against short AI timelines, or what they supposedly imply, up to 10 k$. Do you see any that we could make that is good for both of us under our own views?

Craig Green 🔸

KOUADIO MENIANSOU NOEL ARTHUR

Thank you for writing this survey of the evidence. I initially assumed from the title that you were going to present evidence that the attitudes of the general public are changing towards AI, rather than arguments intended to effect a change in their attitudes.

I feel compelled to note that Anthropic and OpenAI report ARR differently, making direct comparisons difficult. So, that chart could be misleading. For the purposes of this discussion, it is probably fine, as it captures the acceleration of growth of these companies, and we aren't trying to directly compare them to each other.

I do think that current-generation AI capabilities are already at the point where they could drive significant growth in the economy with an adequate inference infrastructure and time to develop workflows. Basically, what I'm trying to say is that the revenue growth of these companies may not be direct evidence that AGI is imminent in the technical sense. It seems possible to me that AGI could be stalled by technical challenges even as current-generation and similar AIs drive significant economic growth.

This is a rigorous and well-structured argument, and I find the revenue growth framing particularly compelling: it is the least theoretically laden of the three empirical anchors you present, and arguably the hardest to dismiss.
I want to add a perspective that I think is largely absent from timeline discussions: what these timelines mean when you're not in San Francisco, London, or Beijing.
I'm based in Abidjan, Côte d'Ivoire. I work in governance and program management, and I've spent the last few years watching how technology (including much more mundane technology than AGI) lands in contexts where infrastructure is fragile, institutions are under-resourced, and regulatory capacity is almost nonexistent. What I observe is a consistent pattern: the capability arrives long before the governance does. And the communities that bear the consequences of that gap are rarely the ones who were part of the conversation about whether to deploy.
Your point about METR's benchmarks not generalizing to "messier, open-ended tasks" resonates strongly from where I sit. In Côte d'Ivoire, almost every consequential task is messy and open-ended. Agricultural supply chains, local health delivery, land tenure disputes, budget transparency: these are exactly the domains where AI is most likely to be deployed next, and least likely to perform as cleanly as benchmarks suggest. The failure modes in these contexts are not theoretical.
This leads me to a concern that I think deserves more attention in timeline discussions: the question is not only when transformative AI arrives, but who governs its deployment in the interim. The revenue growth you cite is overwhelmingly concentrated in a handful of countries. The regulatory frameworks being built right now (in the EU, the US, the UK) are being built without meaningful input from the regions most likely to be on the receiving end of AI deployment decisions made elsewhere.
Whether timelines are short or long, that governance gap is already open. And closing it requires starting now, not after we've resolved the empirical debate about 2035 versus 2052.
I'd be curious whether others in this community are thinking seriously about what EA-aligned AI governance work looks like when it's designed for and by the Global South, rather than exported to it.

Hi Kouadio. Just want to let you know that your comments don't have paragraph breaks between the paragraphs. Maybe you are copying and pasting from another app and the formatting is getting messed up? I'm just saying this because the text looks like it's all in one big block and that makes it harder to read. I want to make sure you get a fair shot at saying what you want to say, and fixing this formatting issue will make people more likely to read your comments.

David Mathers🔸

The expert survey results are also just compatible with "short timelines", strictly speaking, if that means "AI that can do any work a human can for similar cost". If economists think that even that won't produce explosive growth but just a modest speed up, then they will not necessarily predict super-high growth by 2050 even if you specify that AGI arrives in 2030.

yuxin liu

This is indeed a rigorously argued article.

After reading it, I believe that the growth potential of artificial intelligence (AI) truly exists, and I also believe that AI has already begun to change our productivity, and this impact will continue to expand.

However, predictions about the future scalability of AI and its impact on productivity based on historical AI capability growth rate data may be somewhat simplistic.

The development progress of AI varies across different fields. While it may have already achieved significant results in areas such as programming, it may still require a long period of research in the field of embodied AI.

For example, if AI is to eventually achieve full automation of industrial production, thereby greatly liberating human labor, this requires online learning capabilities. This is because production scenarios require continuous iteration of production behavior strategies, whether it's updating a behavioral pattern in a complex production process (which is common in modern complex pipelines) or producing highly customized products. Research on online learning capabilities is still unclear at present.

Of course, this is just my intuitive conjecture and feeling, not a true prediction.

Josh Thorsteinson 🔸

If this trend continues, AI could be doing tasks that take humans a month within a few years.

Actually, if this trend continues, AI could be doing tasks that take humans a month in less than a year. Claude Mythos Preview (early) has a task length of "likely at least 16 hours" on the METR graph. Assuming its task length is 16 hours and extrapolating the 3-month doubling time you mention:

16 hours = 960 minutes. For a working month (~160 hours = 9,600 min), that's 10× further → log₂(10) ≈ 3.32 doublings → ~10 months.
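
The arithmetic above can be checked with a minimal Python sketch. The function name is hypothetical, and the 160-hour working month, 16-hour starting horizon, and 3-month doubling time are simply the figures assumed in this comment, not measured values:

```python
import math

def months_until_horizon(current_hours, target_hours, doubling_months):
    """Months needed for the task horizon to grow from current_hours to
    target_hours, assuming it keeps doubling every doubling_months."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months

# Figures assumed in the comment: 16-hour horizon today,
# a ~160-hour working month, and a 3-month doubling time.
doublings = math.log2(160 / 16)
months = months_until_horizon(16, 160, 3)
print(f"{doublings:.2f} doublings, about {months:.1f} months")
# prints "3.32 doublings, about 10.0 months"
```

The result is very sensitive to the doubling time: with a 6-month doubling time instead, the same calculation gives roughly 20 months.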

I'll also point out that AI is already doing some tasks that take humans a month. E.g. GPT-5.4 Pro solved this open problem from the FrontierMath benchmark, which they estimate would take a human expert 1-3 months to solve.

Michael Goff

Thanks for this analysis. I continue to be impressed with the advancements the industry has been making, which in the last five or so years in particular have been far beyond what I had expected. Nevertheless, I haven't fully moved out of the skeptic camp for two reasons. One reason, regarding the hazards of extrapolating curves, has been discussed in some other comments.

The other reason is that, despite some attempts to make it rigorous, I still find the term "artificial general intelligence" to be vague, and I expect it to continue to be subject to a moving goalposts problem. There was a time when researchers reasoned that, since chess is a pinnacle of human cognition, AGI would be inherent in a system that can play chess better than any person. This view was revealed to be obviously false after Deep Blue in 1997.

I think a bold prognostication about the development of AI would be on firmer grounds if we avoided anthropomorphisms such as "human level".

David Mathers🔸

The way to deal with the vagueness of "AGI" is to think about substitutability for human labour in an imaginary world where no regulatory barriers prevent this.

Kevin Kuruc

Nice post, I agree with the broad point. Thanks for writing!

I think I disagree with the claim [regarding the expert sample of economists] "I think they simply haven't thought very much about the impact of AI on economic growth." A quick skim suggests the sample selection was for economists actively working on the effects of AI.

I also think 3.5% growth is under-ratedly big. Absent AI, my guess is that most economists would predict a growth slowdown (demographic drag, ideas getting harder to find, etc.). The counterfactual rate could be something like ~1.75% in 2050. If so, this implies rapid AI progress would 2x the rate of economic growth relative to no AI. That's a big deal!
