This is a linkpost for https://youtu.be/VcVfceTsD0A

Context: Max Tegmark is a cofounder of the Future of Life Institute and a leading AI safety researcher at MIT. He recorded this podcast a few hours before the Future of Life Institute published its open letter calling for a six-month halt on training AI models more powerful than GPT-4. This is a summary of that podcast.

Summary

  • Max Tegmark argues that we'll develop Artificial General Intelligence much faster than most AI safety researchers expect. His arguments include: 
    • Training models on simple tasks (like predicting the next word in a sentence) produces surprisingly complex and general behaviour. 
    • Current models are known to be inefficient, and small improvements seem to be yielding outsized gains (not explained by more computational resources/data alone).
  • Since he doesn't see AI safety research keeping pace with AI development, he sees a coordinated global pause as the only chance to make the necessary changes. 
  • Research questions he prioritises include:
    • How do we create AI models which can output formal mathematical proofs about their workings to check if they're aligned?
    • How do we extract and analyse an AI model's functions from its parameters?
    • How do we train an AI model to be hesitant and seek feedback before executing actions?


Raw Notes

  • AI interpretability research is revealing a lot of inefficiencies in how modern models like transformers work. Ex: Researchers have found that models store facts (“Where is the Eiffel Tower?”) in sparse matrices within their weights, and this could be made much more efficient. If researchers add small fixes to get around these inefficiencies, performance could improve dramatically rather than just incrementally from more data and computational resources. 
    • A good analogy Max Tegmark gave for this was the invention of flight. Mimicking how birds fly had to wait for modern robotics, but we invented planes and other flying machines like hot air balloons long before we could do that. This was possible because of simpler, less efficient designs; even today’s planes are less energy-efficient than birds. 
    • The generalisable point is that we don’t need to mimic nature’s highly optimised designs to reach the end result faster, if less efficiently. To create human-level AI, you don’t need to figure out how brains work. To get GPT-4-level reasoning ability, you can just build an inefficient model that only tries to predict which word comes next in a sentence.
  • He believes it’s possible to slow down the AI arms race if there is enough public pressure for all countries to realise that an out-of-control model is bad for everyone. He says rapidly building AI systems that aren’t fully safe “isn’t an arms race, it’s a suicide race.”
  • Currently, we’re violating all the reasonable safety norms with AI development. 
    • We have arms race dynamics between companies and states. 
    • People are teaching AI systems to code (which researchers recommended avoiding so that AI systems aren’t able to create other software). 
    • People have already taught AI systems about manipulating human psychology (every recommender system on a major social media platform knows us better than other humans know us). 
    • People are giving the most advanced AI systems access to the Internet. RE: Bing Chat, AutoGPT, …
    • People are letting the most advanced AI systems interact with each other and improve each other. RE: AutoGPT prompting itself, using vector databases to hack together long-term memory for AI systems.
  • We’re already seeing difficulties in regulating AI because lobbyists for the largest tech companies are shaping policy more than policymakers are controlling the tech companies. Ex: In the EU, the AI Act kept getting pushback from individual companies trying to exclude their models from the act. 
    • And it’s entirely possible that the wealthiest companies keep consolidating even further, because compute and data requirements create economies of scale.
  • He mentions that a colleague of his produced a general mathematical proof that continuing to optimise a single goal forever eventually makes things worse. This formalises what many people would claim from everyday experience.
  • He says that throughout history, groups of people (and other animals) that were no longer needed were usually treated very badly. He gives the example of horses: there were many before cars, but they were largely cast aside once they were no longer needed. He doesn’t give a specific example for humans. 
  • Right now, he says, we’ve started to automate jobs that people really love, not just jobs that are boring, dangerous, or undesirable. Ex: A lot of artists are no longer needed because AI models can generate photos, videos, 3D assets, etc.
  • He says that optimising AI to be truthful would be broadly useful. He doesn’t elaborate much, but gives one example: there would be fewer misinformation problems and ensuing conflicts if every human could trust the output of some AI model in a guaranteed way, i.e. without needing to rely on some central authority to be truthful.
    • Technically, you could optimise for truthfulness by evaluating a model’s probabilistic claims with a Brier-score loss (see the sketch after these notes). 
  • He says that it’s a lot easier to verify that a mathematical proof is correct than to generate one (see the Lean sketch after these notes). He wants to build on this foundation to create reliable checkers for advanced AI algorithms to see whether they’re “trustworthy,” and then discard the algorithms that turn out not to be. 
    • Right now, this idea seems like it’s too vague to be useful. 
    • Another idea that’s currently at the vague-intuition level is creating systems that extract the functioning mechanisms from a model’s parameters and then run those mechanisms in an auditable way. 
  • He thinks we’re already past the point where cutting-edge systems like GPT-4 can be made open-source. Now, he thinks the potential for harm is just too high compared to the benefits of having multiple technical eyes to fix issues.
    • He sees that current safety checks on these systems try to prevent them from spreading misinformation, spreading harmful information, and conducting cyberattacks. He thinks these are good problems to work on, but they don’t have the same scale of importance as the potential of an advanced AI system getting full-blown general capabilities or the damage that already-general systems can do to the economy. 
    • He’s less concerned about AI systems intentionally trying to harm humans than about them doing so accidentally or as a side effect of pursuing another goal. 
  • Never mind just changing what the education system teaches kids; we need to change the education system itself so that it no longer fixes in advance what kids should be taught across a decade of rapid change. 
  • The largest cost of nuclear weapons would be the catastrophe they would cause to food systems. He says a recent model estimates that 98-99% of people in northern countries would die of famine following a global nuclear war.
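
To make the Brier-score point above concrete, here is a minimal sketch assuming the standard definition of the score (the podcast doesn't spell out the metric, and the forecasts below are made-up illustrations). The Brier score is the mean squared error between forecast probabilities and 0/1 outcomes; because it is a proper scoring rule, a forecaster minimises it only by reporting honest, calibrated probabilities.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes (lower is better)."""
    assert len(predicted_probs) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(predicted_probs, outcomes)) / len(outcomes)

# Illustrative forecasts for four events; the last event surprises both forecasters.
calibrated = [0.9, 0.8, 0.2, 0.1]
overconfident = [1.0, 1.0, 0.0, 0.0]
outcomes = [1, 1, 0, 1]

print(brier_score(calibrated, outcomes))     # 0.225
print(brier_score(overconfident, outcomes))  # 0.25
```

The overconfident forecaster is punished more for the surprise, which is the property that makes this kind of metric a candidate for optimising truthfulness.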
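
And a minimal Lean sketch of the verify-versus-generate asymmetry behind the proof-checker idea (a generic toy illustration, not anything Tegmark presents): finding a proof may take search or insight, but once it is written down, Lean's small trusted kernel checks it mechanically.

```lean
-- Toy example: however hard a proof was to find, checking it is a fast,
-- mechanical pass by Lean's kernel. `Nat.add_comm` is a standard-library lemma.
theorem sum_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The hope described in the notes is to get a similar asymmetry for AI systems: a small, trusted checker that can reject untrustworthy algorithms even if generating trustworthy ones is hard.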
Comments



Madhav - thanks for a very helpful summary of Max Tegmark's remarks. I agree with most of your comments about his views.

His line that 'this isn’t an arms race, it’s a suicide race' seems pretty compelling as a counter-argument against the view -- very commonly expressed on Twitter -- that the US must 'push ahead with AI, so we don't fall behind China'. 

Just listened to a podcast interview of yours, Geoffrey Miller (Manifold, with Steve Hsu). Do you really believe that it is viable to impose a very long pause (you mention 'just a few centuries')? The likelihood of such a thing taking place seems to me extremely remote, at least until we get a pragmatic example of the harm AI can do, a Trinity test of sorts.

Manuel - it may not be likely that we can impose a very long pause (on the order of centuries).

My main goal in proposing that was to remind people that with AI, we're talking about a 'major evolutionary transition' comparable to the evolution of multicellular life, or the evolution of human intelligence. Normally these take place on the time-span of hundreds of thousands of years to tens of millions of years. 

If AI development holds all the potential upside that we hope for, but also threatens some of the downside risks that we dread, it may be useful to think on these evolutionary time scales, rather than the 'quarterly profit' schedules that many Big Tech companies operate on.

Does anyone have the timestamp for when he argues that it's easier to check that a mathematical proof is correct than to find the proof? I am working on making that argument rigorous for the case of an AI planner and would like to reference the conversation.

Edit: I found it: it's at 01:46:50
