
Note: This post was crossposted from Planned Obsolescence by the Forum team, with the author's permission. The author may not see or respond to comments on this post.

Researchers could potentially design the next generation of ML models more quickly by delegating some work to existing models, creating a feedback loop of ever-accelerating progress.

The concept of an “intelligence explosion” has played an important role in discourse about advanced AI for decades. Early computer scientist I.J. Good described it like this in 1965:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.

This presentation, like most other popular presentations of the intelligence explosion concept, focuses on what happens after we have a single AI system that can already do better at every task than any human (which Good calls an “ultraintelligent machine” above, and others have called “an artificial superintelligence”). It calls to mind an image of AI progress with two phases:

  • In Phase 1, humans are doing all the AI research, and progress ramps up steadily. We can more or less predict the rate of future progress (i.e. how quickly AI systems will improve their capabilities) by extrapolating from past rates of progress.[1]
  • Eventually humans succeed at building an artificial superintelligence (or ASI), leading to Phase 2. In Phase 2, this ASI is doing all of the AI research by itself. All of a sudden, progress in AI capabilities is no longer bottlenecked by slow human researchers, and an intelligence explosion is kicked off. The rate of progress in AI research goes up sharply; perhaps years of progress are compressed into days or weeks.

But I think this picture is probably too all-or-nothing. Today’s large language models (LLMs) like GPT-4 are not (yet) capable of completely taking over AI research by themselves, but they are able to write code, come up with ideas for ML experiments, and help troubleshoot bugs and other issues. Anecdotally, several ML researchers I know are starting to delegate simple tasks that come up in their research to these LLMs, and they say that makes them meaningfully more productive. (When ChatGPT went down for 6 hours, I know of one ML researcher who postponed their coding tasks for 6 hours and worked on other things in the meantime.[2])

If this holds true more broadly, researchers could potentially design and train the next generation of ML models more quickly and easily by delegating to existing LLMs.[3] This calls to mind a more continuous “intelligence explosion” that begins before we have any single artificial superintelligence:

  • Currently, human researchers collectively are responsible for almost all of the progress in AI research, but are starting to delegate a small fraction of the work to large language models. This makes it somewhat easier to design and train the next generation of models.
  • The next generation is able to handle harder tasks and a wider variety of tasks, so human researchers delegate more of their work to it. This makes it significantly easier to train the generation after that. Using models gives a much bigger boost than it did the last time around.
  • Each round of this process makes the whole field move faster and faster. In each round, human researchers delegate everything they can productively delegate to the current generation of models — and the more powerful those models are, the more they contribute to research and thus the faster AI capabilities can improve.
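
The rounds described above can be sketched as a toy simulation. This is only an illustration under loudly stated assumptions, not a forecast and not a model from this post: the function names `speedup` and `simulate_feedback_loop`, every parameter value, and the rule that each generation lets researchers delegate a fixed extra share of the work are all hypothetical. The speedup formula is an Amdahl's-law-style argument: if a fraction f of research work is delegated to models that do it k times faster, the overall speedup is 1 / ((1 - f) + f / k).

```python
def speedup(delegated_fraction, model_speed):
    """Amdahl-style overall research speedup.

    delegated_fraction: share of research work handed to models (0 to 1).
    model_speed: how many times faster models do that share (assumed value).
    """
    if not 0.0 <= delegated_fraction <= 1.0:
        raise ValueError("delegated_fraction must be between 0 and 1")
    if model_speed <= 0:
        raise ValueError("model_speed must be positive")
    human_share = 1.0 - delegated_fraction
    return 1.0 / (human_share + delegated_fraction / model_speed)


def simulate_feedback_loop(generations=6, initial_fraction=0.05,
                           fraction_step=0.15, model_speed=10.0,
                           baseline_years=2.0):
    """Simulate successive model generations under illustrative assumptions.

    Each generation would take baseline_years with humans alone. The
    current models shorten that time, and the resulting generation raises
    the delegable fraction by fraction_step (capped at 1.0).
    """
    records = []
    fraction = initial_fraction
    elapsed = 0.0
    for generation in range(1, generations + 1):
        factor = speedup(fraction, model_speed)
        years = baseline_years / factor  # time to build the next generation
        elapsed += years
        records.append({
            "generation": generation,
            "delegated_fraction": fraction,
            "speedup": factor,
            "years_to_next": years,
            "elapsed_years": elapsed,
        })
        # Hypothetical rule: stronger models can take on more of the work.
        fraction = min(1.0, fraction + fraction_step)
    return records


if __name__ == "__main__":
    for row in simulate_feedback_loop():
        print(
            f"gen {row['generation']}: "
            f"delegated {row['delegated_fraction']:.2f}, "
            f"speedup {row['speedup']:.2f}x, "
            f"{row['years_to_next']:.2f} years to next generation"
        )
```

Under these made-up parameters, the time needed for each generation shrinks every round, so progress accelerates gradually rather than jumping only once a single system can do all the research. Different assumptions about how fast the delegable fraction grows would change the shape of the curve considerably.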

This feedback loop could be getting started now. If it goes on for enough cycles without hitting any fundamental blockers, at some point our AI systems will have taken over all the work involved in designing more powerful AI systems. And it could keep going beyond that, with a research community consisting entirely of AIs working at an inhuman pace to make yet-more-sophisticated AIs. Once AI systems have automated AI research entirely, I think it’s likely that the full obsolescence regime that we discussed in our first post will come soon after.[4]

If so, the end state would be similar to what I.J. Good envisioned: we could have “artificial superintelligence”[5] that improves AI capabilities further and quickly leaves human capabilities far behind. But before we have artificial superintelligence, we might have already vastly accelerated the pace of progress in AI research[6] with the help of lesser models.

Exactly how much acceleration might happen before we have AI systems that can handle all the AI research by themselves, and how much might happen after? Will it feel like a pretty sudden jump — we spend a while with some neat, mildly useful AI assistants and then all of a sudden we develop AI that obsoletes humanity? Or will we have many years in which AI systems get increasingly impressive and perceptibly accelerate the pace of progress before humans are fully obsolete?

This is a very complicated question that I’m not going to get into in this post, but my colleague Tom Davidson put out a thorough research report exploring takeoff speeds — essentially, how quickly and suddenly we move from the world of today to the obsolescence regime. If you’re interested in this topic, I’d encourage you to check it out.

One important implication of Tom’s analysis: we may hit major milestones of AI progress sooner than you’d guess, and blow past them faster than you’d guess. Suppose you have some intuitions about, say, when an AI system might be able to win a gold medal in the International Math Olympiad. If you were previously picturing human researchers doing all the work of AI research, your guess should move toward “sooner” when you factor in the possibility that AI systems themselves could start helping a lot soon. Similarly, factoring in the possibility of this feedback loop should move your guess for when we might enter the obsolescence regime toward “sooner” as well.


  1. In reality, even if humans are the only ones doing AI research, we can’t always predict future progress by simply extrapolating from past progress. For example, if AI starts to get much more attention from investors and more money floods in, it’s likely that more people will switch into AI research, meaning that future research progress might go a lot faster than recent past progress. ↩︎

  2. I’d love to see more systematic data collection about this! ↩︎

  3. Is this actually an interesting or significant observation? After all, lots of tools (from calculators to better programming languages to search engines) have made programmers and researchers more productive historically. What would it matter if we could add LLMs to this list? In my mind, the key difference is that ML models could provide bigger, broader productivity gains than other tools, and these gains could keep increasing massively with each jump in scale. ↩︎

  4. Specifically, I’d guess this happens in less than a year. ↩︎

  5. Albeit potentially distributed across multiple systems, rather than housed in one machine. ↩︎

  6. And potentially in other areas of scientific R&D. ↩︎

Comments


TW123

I have collected existing examples of this broad class of things on ai-improving-ai.safe.ai.

https://arxiv.org/pdf/2303.08774v3.pdf#page=64 
This is a technical report about GPT-4. On page 64, it details a process they use for self-improvement in training: the model generates training data by itself. Super cool.
Credit to Vladimir_Nesov from LessWrong, who linked and mentioned this in a discussion. Interesting stuff.

I recently surveyed c.100 people working in IT to ask them about the extent to which they thought that AI would speed up coding. (Presumably if coding can be done faster, AI can be created more quickly too)

They estimated that coding can be done twice as fast thanks to AI tools, and that's before giving any credit to AI getting better in the future.

There are several reasons not to trust the survey too blindly, which I outline in my post on the topic.

(Presumably if coding can be done faster, AI can be created more quickly too)

Wait, which mechanisms did you have in mind? 

AI -> software coded up faster -> more software people go into AI -> AI becomes more popular?

AI -> coding for AI research is easier -> more AI research

AI -> code to implement neural networks written faster -> AI implemented more quickly (afaik not too big a factor? I might be wrong though)

AI -> code that writes e.g. symbolic AI from scratch -> AI?

I don't recommend that you update much on what I had in mind, since I wasn't thinking very hard about this point. What I had in mind was:

AI -> coding for AI research is easier -> more AI research

If someone discussed it with me, I might have also mentioned

AI -> code to implement neural networks written faster -> AI implemented more quickly 

AI -> code that writes e.g. symbolic AI from scratch -> AI?

(I wasn't particularly thinking of that though)

I guess the labour market effects (i.e. the below) might also apply, but I wasn't thinking of that

AI -> software coded up faster -> more software people go into AI -> AI becomes more popular?


You're absolutely right about the "black box" issue in current ML paradigms. It's like we're in a loop where we use mysterious models to enhance even more enigmatic models. While these AI systems, especially the advanced LLMs, are pushing the boundaries of what's possible in research, there's a growing concern about our understanding (or lack thereof) of how exactly they arrive at certain conclusions or solutions.

The dilemma here is two-fold. On one hand, AI's capability to expedite research and development is undeniable and immensely valuable. On the other, the increasing complexity and opacity of these models pose significant challenges, not just technically but ethically as well. If we continue down this path, we might reach a point where AI's decisions and methods are beyond our comprehension, raising questions about control and responsibility.

So, while the acceleration of AI research by AI itself is an exciting prospect, and tools like Mistral AI (https://mistral.ai/), Perplexity AI (https://perplexity.ai/), and Anakin AI (https://anakin.ai/) are coming into public view, it's crucial that we develop a parallel focus on making these systems more transparent and understandable. It's not just about making faster progress, but ensuring that this progress is aligned with our values and is under our control.
