Hide table of contents

Comment Permalink

Thanks for linking "Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models". Also, I agree with:

MacAskill and Moorhouse argue that increases in training compute, inference compute and algorithmic efficiency have been increasing at a rate of 25 times per year, compared to the number of human researchers which increases 0.04 times per year, hence the 500x faster rate of growth. This is an inapt comparison, because in the calculation the capabilities of ‘AI researchers’ are based on their access to compute and other performance improvements, while no such adjustment is made for human researchers, who also have access to more compute and other productivity enhancements each year.

That comparison seems simplistic and inapt for at least a few reasons. That does seem like pretty "trust me bro" justification for the intelligence explosion lol. Granted, I only listened to the accompanying podcast, so I can't speak too much to the paper.

Still, I am of two minds. I still buy into a lot of the premise of "Preparing for the Intelligence Explosion". I find the idea of getting collectively blind-sighted by rapid, uneven AI progress ~eminently plausible. There didn't even need to be that much of a fig leaf.

Don't get me wrong, I am not personally very confident in "expert level AI researcher for arbitrary domains" w/i the next few decades. Even so, it does seem like the sort of thing worth thinking about and preparing about.

From one perspective, AI coding tools are just recursive self improvement gradually coming online. I think I understand some of the urgency, but I appreciate the skepticism a lot too.

Preparing for an intelligence explosion is a worthwhile thought experiment at least. It seems probably good to know what we would do in a world with "a lot of powerful AI" given that we are in a world where all sorts of people are trying to research/make/sell ~"a lot of powerful AI". Like just in case, at least.

I think I see multiple sides. Lots to think about.

See in context

Why I am Still Skeptical about AGI by 2030

by James Fodor

May 27 min read 13

131

AI safetyBuilding effective altruismForecastingAI risk skepticismCriticism of effective altruist causesAI benchmarksAI evaluations and standardsAI forecasting

Frontpage

Why I am Still Skeptical about AGI by 2030

Introduction

Rates of Growth

The Limitations of Benchmarks

Real-World Adoption

Conclusion

13 comments

Introduction

I have been writing posts critical of mainstream EA narratives about AI capabilities and timelines for many years now. Compared to the situation when I wrote my posts in 2018 or 2020, LLMs now dominate the discussion, and timelines have also shrunk enormously. The ‘mainstream view’ within EA now appears to be that human-level AI will be arriving by 2030, even as early as 2027. This view has been articulated by 80,000 Hours, on the forum (though see this excellent piece excellent piece arguing against short timelines), and in the highly engaging science fiction scenario of AI 2027. While my article piece is directed generally against all such short-horizon views, I will focus on responding to relevant portions of the article ‘Preparing for the Intelligence Explosion’ by Will MacAskill and Fin Moorhouse.

Rates of Growth

The authors summarise their argument as follows:

Currently, total global research effort grows slowly, increasing at less than 5% per year. But total AI cognitive labour is growing more than 500x faster than total human cognitive labour, and this seems likely to remain true up to and beyond the point where the cognitive capabilities of AI surpasses all humans. So, once total AI cognitive labour starts to rival total human cognitive labour, the growth rate of overall cognitive labour will increase massively. That will drive faster technological progress.

MacAskill and Moorhouse argue that increases in training compute, inference compute and algorithmic efficiency have been increasing at a rate of 25 times per year, compared to the number of human researchers which increases 0.04 times per year, hence the 500x faster rate of growth. This is an inapt comparison, because in the calculation the capabilities of ‘AI researchers’ are based on their access to compute and other performance improvements, while no such adjustment is made for human researchers, who also have access to more compute and other productivity enhancements each year.

It is also highly unclear if current rates of increase can reasonably be extrapolated. In particular, the components of the rate of increase of ‘AI researchers’ are not independent, since if the rate of algorithmic improvement slows, then it is highly likely investments in training and inference compute will also slow down. Furthermore, most new technologies improve very rapidly at first and then performance significantly slows; the cost for genome sequencing is a good recent example. Such a slowdown may already be beginning. For example, after months of anticipation prior to its release in February, OpenAI recently announced they will remove their new GPT4.5 model from API access in July. This apparently is due to the high cost of such a large model with only modest improvement in performance. The recent release of Llama 4 was also met with mixed reception owing to disappointing performance and controversies about its development. For all these reasons, I do not believe the 500-fold greater rate of increase in ‘AI researchers’ compared to human researchers is particularly accurate nor can it be confidently extrapolated to continue over the coming decade.

The authors then argue that even in the absence of continued increases in compute, deployment of AI to improve AI research could lead to a ‘software feedback loop’, where AI capabilities continue to improve due to improvements in AI capabilities elicited by AI researchers. MacAskill and Moorhouse defend this claim by quoting evidence that “empirical estimates of efficiency gains in various software domains suggest that doubling cognitive inputs (research effort) generally yields more than a doubling in software performance or efficiency.” Here they cite a paper which presents estimates for returns on research efforts for four software domains: computer vision, sampling efficiency in reinforcement learning, SAT solvers, and linear programming. These are all substantially more narrowly-defined than the very general capabilities required for improving AI researcher capability. Furthermore, the two machine learning related tasks (computer vision and sampling efficiency in RL) covered timespans of only ten and four years respectively. Furthermore, the paper in question is a methodological survey, and highlights that all presented estimates suffer from significant methodological shortcomings that are very difficult to overcome empirically. As such, this evidence is not a very convincing reason to think that doubling the number of ‘AI researchers’ working on improving AI will result in a self-sustaining software feedback loop for any significant period of time.

The Limitations of Benchmarks

MacAskill and Moorhouse also argue that individual AI systems are becoming rapidly more capable at performing research-related tasks, and will soon reach parity with human researchers. Specifically, they claim that within the next five years there will be ‘models which surpass the research ability of even the smartest human researchers, in basically every important cognitive domain’. Given the centrality of this claim to their overall case, they devote surprisingly little space to substantiating it. Indeed, their justification consists entirely of appeals to rapid increases in the performance of LLMs on various benchmark tasks. They cite GPQA (multiple choice questions covering PhD-level science topics), RE-Bench (machine learning optimisation coding tasks), and SWE-Bench (real-world software tasks). They also mention that LLMs can now ‘answer questions fluently and with more general knowledge than any living person.’

Exactly why improved performance on these tasks should warrant the conclusion that models will soon surpass research ability on ‘basically every important cognitive domain’ is not explained. As a cognitive science researcher, I find this level of analysis incredibly simplistic. The authors don’t explain what they mean by ‘cognitive domain’ or how they arrive at their conclusions about the capabilities of current LLMs compared to humans. Wikipedia has a nice list of cognitive capabilities, types of thinking, and domains of thought, and it seems to me that current LLMs have minimal ability to perform most of these reliably. Of course, my subjective look at such a list isn’t very convincing evidence of anything. But neither is the unexamined and often unarticulated claim that performance on coding problems, math tasks, and science multiple choice questions is somehow predictive of performance across the entire scope of human cognition. I am continually surprised at the willingness of EAs to make sweeping claims about the cognitive capabilities of LLMs with little to no theoretical or empirical analysis of human cognition or LLMs, other than a selection of machine learning benchmarks.

Beyond these general concerns, I documented in my earlier paper several major limitations with the use of benchmarks for assessing the performance of LLMs. Here I summarise the major issues:

Tests should only be used to evaluate the capabilities of a person or model if they have been validated as successfully generalising to tasks beyond the test itself. Extensive research is conducted within cognitive psychology for human intelligence and other psychometric tests, but much less work has been done for LLM benchmarks. The research that has been conducted often shows limited generalisation and significant overfitting of models to benchmarks.
Use of adversarial and interpretation techniques has repeatedly found that LLMs perform poorly on many tasks when more difficult examples are used. Further, models often do not use appropriate reasoning steps, instead confabulating explanations that seem plausible but do not actually account for the provided solution.
LLMs often do not successfully generalise to versions of the task beyond those they were trained on. The models often use superficial heuristics and pattern-matching rather than genuine understanding or reasoning steps.
The training data for many LLMs is contaminated with questions and solutions from known benchmarks, as well as synthetic data generated from such benchmarks. This is worsened by strong incentives of developers to fudge the training or evaluation process to achieve better benchmark results. Most recently, OpenAI has attracted criticism for their reporting of results on both the ARC-AI and the Frontier Math benchmarks.

Even more recent results corroborate these points. One recent analysis of the performance of LLMs on a new, and hence previously unseen, mathematics task found that “all tested models struggled significantly: only GEMINI-2.5-PRO achieves a non-trivial score of 25%, while all other models achieve less than 5%. Through detailed analysis of reasoning traces, we identify the most common failure modes and find several unwanted artifacts arising from the optimization strategies employed during model training.” A separate analysis of similar data found that models regularly rely on pattern recognition and heuristic shortcuts rather than engaging in genuine mathematical reasoning.

Real-World Adoption

One final issue pertains to the speed with which LLMs can be adapted to perform real-world tasks. MacAskill and Moorhouse discuss at length the possibility for ‘AI researchers’ to dramatically speed up the process of scientific research. However so far, the only example of a machine learning system performing a significant scientific research task is AlphaFold, a system designed to predict the structure of protein molecules given their amino acid sequence. In addition to being eight years old, AlphaFold does not solve the problem of protein folding. It is simply a tool for predicting protein structure, and even in that narrow task is has many limitations. LLMs are increasingly utilised in cognitive science research as an object of study in their own right, as well as providing a useful tool for text processing or data validation. However I am not aware of any examples of LLMs being applied to significantly accelerate any aspect of scientific research. Perhaps this will change rapidly within the next few years, but MacAskill and Moorhouse do not give any reasons for thinking so beyond generic appeals to increased performance on coding and multiple-choice benchmarks.

Other lines of evidence also indicate that the real-world impact of LLMs is modest. For instance, a large survey of workers in 11 exposed occupations in Denmark found effects of LLM adoption on earnings and hours worked of less than 1%. Similarly, a series of interviews of 19 policy analysts, academic researchers, and industry professionals who have used benchmarks to inform decisions regarding adoption or development of LLMs found that most respondents were skeptical of the relevance of benchmark performance for real-world tasks. As with past technologies, many factors including reliability problems, supply chain bottlenecks, organisational inertia, user training, and difficulty in adapting to specific use cases will mean that the real-world impacts of LLMs are likely to develop over the timespan of decades rather than a few years.

Conclusion

The coming few years will undoubtedly see continued progress and ongoing adoption of LLMs in various economic sectors. However, I find the case for 3-5 timelines for the development of AGI to be unconvincing. These arguments are overly dependent on simple explanations of existing trends while paying insufficient attention to the known limitations of such benchmarks. Similarly, I find that such arguments often rely on extensive speculation based primarily on science fiction scenarios and thought experiments, rather than careful modelling, historical parallels, or detailed consideration of the similarities and differences between LLMs and human cognition.

131 Reactions

More posts like this

Comments13

Sorted by

New & upvoted

Click to highlight new comments since: Today at 6:20 AM

finmMay 8*13

Thanks for writing this James, I think you raise good points, and for the most part I think you're accurately representing what we say in the piece. Some scattered thoughts —

I agree that you can't blithely assume scaling trends will continue, or that scaling “laws” will continue to hold. I don't think both assumptions are entirely unmotivated, because the trends have already spanned many orders of magnitude. E.g. pretraining compute has spanned ≈ 1e8 OOMs of FLOP since the introduction of the transformer, and scaling “laws” for given measures of performance hold up across almost as many OOMs, experimentally and practically. Presumably the trends have to break down eventually, but if all you knew were the trends so far, it seems reasonable to spread your guesses over many future OOMs.

Of course we know more than big-picture trends. As you say, GPT 4.5 seemed pretty disappointing, on benchmarks and just qualitatively. People at labs are making noises about pretraining scaling as we know it slowing down. I do think short timelines^[1] look less likely than they did 6 months or a year ago. Progress is often made by a series of overlapping s-curves, and the source of future gains could come from elsewhere, like inference scaling. But we don't give specific reasons to expect that in the piece.

On the idea of a “software-only intelligence explosion”, this piece by Daniel Eth and Tom Davidson gives more detailed considerations which are specific to ML R&D. It's not totally obvious to me why you should expect returns to diminish faster in broad vs narrow domains. In very narrow domains (e.g. checkers-playing programs), you might intuitively think that the ‘well’ of creative ways to improve performance is shallower, so further improvements beyond the most obvious are less useful. That said, I'm not strongly sold on the “software-only intelligence explosion” idea; I think it's not at all guaranteed (maybe I'm 40% on it?) but worth having on the radar.

I also agree that benchmarks don't (necessarily) generalise well to real-world tasks. In fact I think this is one of the cruxes between people who expect AI progress to quickly transform the world in various ways. I do think the cognitive capabilities of frontier LLMs can meaningfully be described as “broad” in the sense that: if I tried to think of a long list of tasks and questions which span an intuitively broad range of “cognitive skills” before knowing about the skill distribution of LLMs^[2] then I expect the best LLMs to perform well across the large majority of those tasks and questions.

But I agree there is a gap between impressive benchmark scores and real-world usefulness. Partly there's a kind of o-ring dynamic, where the speed of progress is gated by the slowest-moving critical processes — speeding up tasks that normally take 20% of a researcher's time by 1,000x won't speed them up by more than 25% if the other tasks can't be substituted. Partly there are specific competences which AI models don't have, like reliably carrying out long-range tasks. Partly (and relatedly) there's an issue of access to implicit knowledge and other bespoke training data (like examples of expert practitioners carrying out long time-horizon tasks) which are scarce in the public internet training data. And so on.

Note that (unless we badly misphrased something) we are not arguing for “AGI by 2030” in the piece you are discussing. As always, it depends on what “AGI” means, but if it means a world where tech progress is advancing across the board at 10x rates, or a time where there is basically no cognitive skill where any human still outperforms some AI, then I also think that's unlikely by 2030.

One thing I should emphasise, and maybe we should have emphasised more, is the possibility of faster (say 10x) growth rates from continued AI progress. Where I think the best accounts of why this could happen do not depend on AI becoming superhuman in some very comprehensive way, or on a software-only intelligence explosion happening. The main thing we were trying to argue is that sustained AI progress could significantly accelerate tech progress and growth, after the point where (if it ever happens) the contribution to research from AI reaches some kind of parity with humans. I do think this is plausible, though not overwhelmingly likely.

Thanks again for writing this.

^{^}
That is, “AGI by 2030” or some other fixed calendar date.
^{^}
Things like memorisation, reading comprehension, abstract reasoning, general knowledge, etc.

SummaryBotMay 27

Executive summary: In this critical and cautious analysis, the author argues that predictions of AGI emerging by 2030—especially claims centered on AI self-improvement and benchmark performance—are based on overstated analogies, flawed extrapolations, and speculative reasoning, and should not be treated as robust forecasts.

Key points:

The 500x growth comparison is misleading: The claim that AI cognitive labor is growing 500 times faster than human labor relies on asymmetric assumptions—counting improvements from compute and algorithms for AI but not for humans—and ignores likely slowdowns in AI progress due to diminishing returns and cost constraints.
Doubts about the 'software feedback loop': The notion that AI will improve its own capabilities through a self-reinforcing loop lacks strong empirical support; cited studies cover narrow domains with methodological issues and don’t generalize to broader cognitive tasks.
LLM benchmark performance is an unreliable proxy for general cognition: The author challenges the assumption that excelling on benchmarks like GPQA or SWE-Bench translates to surpassing humans in "basically every important cognitive domain," noting the absence of theoretical grounding and the overfitting and training contamination in benchmark tests.
Real-world adoption is slow and limited: Despite high expectations, LLMs have not yet significantly accelerated scientific research or productivity in practical domains; barriers like integration challenges, modest labor impacts, and skepticism from practitioners suggest slower, more incremental change.
Overall skepticism toward short AGI timelines: The post critiques EA-aligned projections (e.g., AI 2027) as relying too heavily on trend extrapolation, speculative feedback loops, and sci-fi narratives, while underweighting historical precedent, deployment realities, and unresolved gaps between current LLM capabilities and general intelligence.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Jacob Watts🔸May 46

Thanks for linking "Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models". Also, I agree with:

MacAskill and Moorhouse argue that increases in training compute, inference compute and algorithmic efficiency have been increasing at a rate of 25 times per year, compared to the number of human researchers which increases 0.04 times per year, hence the 500x faster rate of growth. This is an inapt comparison, because in the calculation the capabilities of ‘AI researchers’ are based on their access to compute and other performance improvements, while no such adjustment is made for human researchers, who also have access to more compute and other productivity enhancements each year.

From one perspective, AI coding tools are just recursive self improvement gradually coming online. I think I understand some of the urgency, but I appreciate the skepticism a lot too.

I think I see multiple sides. Lots to think about.

Yarrow🔸May 35

Kudos on a well-written, well-researched, and well-argued post!

The part about "AI researchers" (which don't actually exist; there is no such thing as an "AI researcher") vs. human researchers gets at a simple mistake, which is confusing inputs and outputs.

For example, I believe that a GPU running a large language model (LLM) uses a lot more energy than a human brain.^[1] Is this evidence that LLMs are smarter than humans?

No, of course not.

If anything, people have interpreted this as a weakness of AI. Maybe it means the way current AI systems solve problems is too "brute force" and we're missing something fundamental about the nature of intelligence.

Saying that AI systems are using more and more compute over time doesn't directly say anything about their actual intelligence, any more than saying that AI systems are using more and more energy over time does.

How intelligent are AI systems compared to humans? How much has that been changing over time? How do we measure that? These are the key questions and, as you pointed out in your section on benchmarks, a lot of AI benchmarks don't seem like they really measure this.

One potential way to measure the intelligence of AI vs. humans is the extent to which AI can displace human labour. You nicely covered this on your section on real-world adoption.

One study I can contribute to the discussion is this one I found about a customer support centre adopting LLMs. The outcome was mixed, with the study finding that LLMs harmed the productivity of the centre's most experienced employees.

I think this post does a great job of pointing out the ways in which the arguments for near-term artificial general intelligence (AGI) rely on hand-wavy reasoning and make simple, fundamental mistakes — mistakes so simple and so fundamental that I'm confused why people like Will MacAskill don't catch these mistakes themselves, since surely they should be able to clearly see these mistakes themselves and don't need people like you or I to point them out.

^{^}
Here's one estimate I found after a quick Google, but I don't know if it's accurate.

vanessa16May 72

Thanks James, interesting post!

A minor question: where you say the following,

MacAskill and Moorhouse argue that increases in training compute, inference compute and algorithmic efficiency have been increasing at a rate of 25 times per year, compared to the number of human researchers which increases 0.04 times per year, hence the 500x faster rate of growth. This is an inapt comparison, because in the calculation the capabilities of ‘AI researchers’ are based on their access to compute and other performance improvements, while no such adjustment is made for human researchers, who also have access to more compute and other productivity enhancements each year.

do you think human researchers' access to compute and other productivity enhancements would have a significant impact on their research capacity? It's not obvious to me how bottlenecked human researchers are by these factors, whereas they seem much more critical to "AI researchers".

More generally, are there things you would like to see the EA community do differently, if it placed more weight on longer AI timelines? It seems to me that even if we think short timelines are a bit likely, we should probably put quite a lot of resources towards things that can have an impact in the short term.

James FodorMay 234

Hey Vanessa!

My main point here is that if we think increased compute and processing will be valuable to AI researchers, which MacAskill and Moorhouse argue will be the case because of ability to analyse existing data, perform computational experiements, etc, then we should also think that such improvements will also be valuable to human researchers. Indeed, if AI becomes so valuable in its own right, I would also expect AI tools to augment human researcher capabilities. This is one reason why I don't think its very meaningful to assume that the number of AI-equivalent researchers will increase very rapidly while the number of human researchers only grows a few percent per year. In my view we should be adjusting for the capabilities of human researchers as well as for AI-equivalent researchers.

As for what the EA community should do, like I've been saying for years I think there should be more diversity in thought, projects, orgs, etc, particularly in terms of support and attention given by thought leaders. I find there is surprisingly little diversity of thought about AI safety in particular, and the EA community could do a lot better in fostering more diverse research and discussion on this important issue.

SharmakeMay 242

The main unremovable advantages of AIs over humans will probably be in the following 2 areas:

A serial speed advantage, from 50-1000x, with my median in the 100-500x speed advantage range, and more generally the ability to run slower or faster to do more work proportionally, albeit there are tradeoffs at either extreme of either running slow or fast.
The ability for compute/software improvements to directly convert into more researchers with essentially 0 serial time necessary, unlike basically all of reproduction (about the only cases where it even gets close are the days/hours doubling time of flies and some bacteria/viruses, but these are doing much simpler jobs and it's uncertain whether you could add more compute/learning capability without slowing down their doubling time.)

This is the mechanism by which you can get way more AI researchers very fast, while human researchers don't increase proportionally.

Humans probably do benefit assuming AI is useful enough to automate say AI research away, but these 2 unremovable limitations fundamentally prevent anything like an explosion in research, unlike AI research.

SjlverMay 91

This was a valuable read for me. Thanks!

I share some of your skepticism. At the same time, I think the argument relies on reasons that are quite speculative, such as:

it's highly unclear if rates of progress can be extrapolated
benchmarks might not generalize to the full complexity of real tasks
here's a list of cognitive processes that AI can't do yet; it might continue to fail at them for years to come

I can't shake off the feeling that this type of argument has often aged poorly when it comes to AI. I've certainly been baffled many times by AI solving tasks that I predicted to be very hard.

In contrast, texts like "Preparing for the Intelligence Explosion" and ai-2027.com essentially assume that some trends continue for a few more years. While that also relies on many assumptions, and the results sound like Sci-Fi, it seems to carry a lower burden of proof. Or at least, I find it more compelling.

Reflecting on this, I think a big difference between me and others is that I work as a software engineer. In this field, AI progress feels very visceral. I'm one of the people who shifted from writing 95% code myself to having >50% of code AI-generated, within one year or so. I'm doing a lot more code reviews these days, and spend time thinking about how tasks could be split up and partially solved by agents. There's a keen sense that this is only the beginning. New tools and significantly better workflows arrive every few weeks.

I agree that it will take time for advances to reach other fields. Many of my non-software-engineer friends hardly use AI today. But that said, advances in AI-powered software engineering might be all that's needed for AIs to improve at breakneck pace.

Yarrow🔸May 93

I can't shake off the feeling that this type of argument has often aged poorly when it comes to AI. I've certainly been baffled many times by AI solving tasks that I predicted to be very hard.

This may be true for games like chess, go, and StarCraft, or for other narrow tests of AI. But for claims that AI will do something useful, practical, and economically valuable — like driving cars or replacing humans on assembly lines — the opposite is true. The predictions about rapid AI progress have been dead wrong and the AI skeptics have been right.

SjlverMay 105

Why do you think that? Personally, I've lost several bets. For example, I've bet NO on "Will an AI win a gold medal on the IOI (competitive programming contest) before 2027?" and have already lost that 20 months before the start of 2027.

As a former IOI participant, that achievement feels amazing. As a software engineer, I absolutely find AI tools useful, practical, and economically valuable.

Yarrow🔸May 10*11

If AI is having an economic impact by automating software engineers' labour or augmenting their productivity, I'd like to see some economic data or firm-level financial data or a scientific study that shows this.

Your anecdotal experience is interesting, for sure, but the other people who write code for a living who I've heard from have said, more or less, AI tools save them the time it would take to copy and paste code from Stack Exchange, and that's about it.

I think AI's achievements on narrow tests are amazing. I think AlphaStar's success on competitive StarCraft II was amazing. But six years after AlphaStar and ten years after AlphaGo, have we seen any big real-world applications of deep reinforcement learning or imitation learning that produce economic value? Or do something else practically useful in a way we can measure? Not that I'm aware of.

Instead, we've had companies working on real-world applications of AI, such as Cruise, shutting down. The current hype about AGI reminds me a lot of the hype about self-driving cars that I heard over the last ten years, from around 2015 to 2025. In the five-year period from 2017 to 2022, the rhetoric on solving Level 4/5 autonomy was extremely aggressive and optimistic. In the last few years, there have been some signs that some people in the industry are giving up, such as Cruise closing up shop.

Similarly, some companies, including Tesla, Vicarious, Rethink Robotics, and several others have tried to automate factory work and failed.

Other companies, like Covariant, have had modest success on relatively narrow robotics problems, like sorting objects into boxes in a warehouse, but nothing revolutionary.

The situation is complicated and the truth is not obvious, but it's too simple to say that predictions about AI progress have overall been too pessimistic or too conservative. (I'm only thinking about recent predictions, but one of the first predictions about AI progress, made in 1956, was wildly overoptimistic.^[1])

I wrote a post here and a quick take here where I give my other reasons for skepticism about near-term AGI. That might help fill in more information about where I'm coming from, if you're curious.

^{^}
Quote:
An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.

SjlverMay 123

The economic data seems to depend on one's point of view. I'm no economist and I certainly can't prove to you that AI is having an economic impact. Its use grows quickly though: Statistics on AI market size

It's also important, I think, to distinguish between AI capabilities and AI use. The AI-2027 text argues that a selected few AI capabilities matter most, namely those related to software and AI engineering. These will drive the recursive improvements. Changes to other parts of the industry are downstream of that. Both our viewpoints seem to be consistent with this model: I see rapidly increasing capabilities in software, and you see that other fields have not been so affected yet.

I'll finish with yet another anecdote, because it happened just yesterday. I was on a mountain hike with my nephew (11 years). He proudly told me that they had a difficult math task in school, and "I was one of the few that could solve it without ChatGPT".

It's an anecdote, of course. At the same time, effects of AI seem to be large in education, and changes in education probably lead to changes in the industry.

MilesWMay 81

Good post. You put into words a gut feeling I had but didn't have strong evidence for.

Regarding AlphaFold I've had discussions with @Laura Leighton about how some of the main bottlenecks in research right now (apart from funding hurhur) are in labor and physical tasks. There is room for automation but it's mostly mechanical (Laura please correct me if I'm misremembering)