This is a linkpost for https://youtu.be/VcVfceTsD0A
Context: Max Tegmark is a co-founder of the Future of Life Institute and a leading AI safety researcher at MIT. He recorded a podcast a few hours before the Future of Life Institute published an open letter calling for a six-month pause on training AI systems more powerful than GPT-4. This is a summary of that podcast.
Summary
- Max Tegmark gives arguments for why we'll develop Artificial General Intelligence much faster than most AI safety researchers expect. Arguments include:
- Training models on simple tasks (like predicting the next word in a sentence) produces surprisingly complex, general behaviour.
- Current models are known to be inefficient, and small improvements seem to yield outsized gains (not explained by more compute or data alone).
- Since he doesn't see the rate of AI safety research matching the rate of AI development, he sees a coordinated global pause as the only chance to make the necessary changes.
- Research questions he prioritises include:
- How do we create AI models which can output formal mathematical proofs about their workings to check if they're aligned?
- How do we extract and analyse an AI model's functions from its parameters?
- How do we train an AI model to be hesitant and seek feedback before executing actions?
Raw Notes
- AI interpretability research is revealing a lot of inefficiencies in the way modern models like transformers work. Ex: Researchers see that models store facts (e.g. where the Eiffel Tower is) in sparse matrices within their weights, and this could be made far more efficient (a toy sketch of the storage-efficiency point follows these notes). If researchers add small fixes to get around such inefficiencies, there could be exponential rather than merely incremental improvements in performance, beyond what more data and compute alone would give.
- A good analogy Max Tegmark gave for this was about flight. Mechanically mimicking how birds fly only became possible with modern robotics, yet we invented planes, and even earlier flying machines like hot air balloons, long before that. This was possible because of simpler and less efficient designs; even today's planes are less energy-efficient than birds.
- The generalisable point is that we don't need to replicate nature's highly optimised designs to achieve the end result; cruder, less efficient approaches can get there much sooner. To create human-level AI, you don't need to figure out how brains work. To get GPT-4-level reasoning ability, you can just build an inefficient model that only tries to predict which words come next in a sentence.
- He believes it’s possible to slow down the AI arms race if we have enough public pressure so that all countries realise that an out-of-control model is bad for everyone. He says rapidly building AI systems that aren’t fully safe “isn’t an arms race, it’s a suicide race.”
- Currently, we’re violating all of the reasonable safety norms in AI development:
- We have arms race dynamics between companies and states.
- People are teaching AI systems to code (which researchers recommended avoiding so that AI systems aren’t able to create other software).
- People have already taught AI systems about manipulating human psychology (every recommender system on a major social media platform knows us better than other humans know us).
- People are giving the most advanced AI systems access to the Internet. RE: Bing Chat, AutoGPT, …
- People are letting the most advanced AI systems interact with each other and improve each other. RE: AutoGPT prompting itself, and vector databases being used to hack together long-term memory for AI systems (a minimal sketch of this kind of vector-store memory follows these notes).
- We’re already seeing difficulties in regulating AI because lobbyists for the largest tech companies have more influence over policy than policymakers have over the tech companies. Ex: In the EU, the AI Act kept getting pushback from individual companies trying to exclude their models from the act.
- And it’s entirely possible that the wealthiest companies consolidate even further, because compute and data requirements create economies of scale.
- He mentions that a colleague of his produced a general mathematical proof that continuing to optimise a single goal forever eventually makes things worse, a formalisation of what many people would claim from everyday experience (a toy numerical illustration of this over-optimisation effect is included after these notes).
- He says that throughout history, groups of people (and other animals) that were no longer needed usually were treated very badly. He gives an example of how there were many horses before cars, but they were largely cast aside when they were no longer needed. He doesn’t give a specific example for humans.
- He says we’ve now started to automate jobs that people really love, not just jobs that are boring, dangerous, or undesirable. Ex: Many artists are no longer needed because AI models can generate images, videos, 3D assets, etc.
- He says that optimising AI systems to be truthful would be broadly useful. He doesn’t elaborate much, but gives one example: there would be fewer misinformation problems and ensuing conflicts if every human could trust the output of some AI model in a guaranteed way. I.e. this would remove the need to rely on some central authority to be truthful.
- Technically, you could optimise for truthfulness by evaluating a model’s probabilistic claims with the Brier score (a short sketch of this metric appears after these notes).
- He says that it’s a lot easier to verify that a mathematical proof is correct than to generate the proof. He wants to build on this asymmetry to create reliable checkers for advanced AI algorithms, to see whether they’re “trustworthy,” and then discard the algorithms that turn out not to be (a toy illustration of the verify-versus-generate asymmetry is sketched after these notes).
- Right now, this idea seems like it’s too vague to be useful.
- Another idea that’s currently at the vague-intuition level is creating systems that extract the functioning mechanisms out of a model’s parameters and then run those mechanisms in an auditable way (a toy version of this idea appears after these notes).
- He thinks we’re already past the point where cutting-edge systems like GPT-4 can safely be made open-source. Now, the potential for harm is just too high compared to the benefit of having more technical eyes on the code to fix issues.
- He sees that current safety checks on these systems try to prevent them from spreading misinformation, sharing harmful information, and conducting cyberattacks. He thinks these are good problems to work on, but they aren’t on the same scale of importance as an advanced AI system gaining full-blown general capabilities, or the damage that already-general systems can do to the economy.
- He’s less concerned about AI systems intentionally trying to harm humans than about them doing so accidentally, or as a side effect of pursuing some other goal.
- Never mind just changing what the education system teaches kids; we need to change the education system itself so that it doesn’t lock in, years ahead of time, what kids should be taught across more than a decade of rapid change.
- The largest cost of nuclear weapons would be the catastrophe they cause to food systems. He cites a recent model estimating that 98-99% of people in northern countries would die of famine following a global nuclear war.
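Below are a few toy sketches of ideas mentioned in the notes above; they are my illustrations, not material from the podcast. First, the storage-efficiency point: if most entries of a weight matrix are (near) zero, storing only the non-zero entries takes far fewer numbers than storing the dense matrix. The matrix here is random and purely illustrative, not taken from any real model.

```python
import numpy as np

# Toy sparsity illustration: a 1000x1000 "weight matrix" where only ~1% of
# entries are non-zero can be stored as a dict of its non-zero entries,
# using roughly 100x fewer stored numbers than the dense array.
rng = np.random.default_rng(0)
dense = rng.normal(size=(1000, 1000)) * (rng.random((1000, 1000)) < 0.01)

rows, cols = np.nonzero(dense)
sparse = {(int(r), int(c)): float(dense[r, c]) for r, c in zip(rows, cols)}

print("dense entries stored :", dense.size)   # 1,000,000
print("sparse entries stored:", len(sparse))  # roughly 10,000
```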
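A minimal sketch of the vector-store "long-term memory" hack mentioned in the safety-norms list: store (embedding, text) pairs, then retrieve whichever stored memory is closest to a query embedding by cosine similarity. The 3-dimensional embeddings here are made up; real systems use a learned embedding model and a proper vector database.

```python
import numpy as np

# Minimal sketch of vector-store "long-term memory": store (embedding, text)
# pairs, then retrieve the stored memory closest to a query embedding.
memory_vectors = np.array([
    [0.9, 0.1, 0.0],   # pretend embedding of "user prefers concise answers"
    [0.0, 0.8, 0.2],   # pretend embedding of "project deadline is Friday"
    [0.1, 0.2, 0.9],   # pretend embedding of "user is learning Python"
])
memory_texts = [
    "user prefers concise answers",
    "project deadline is Friday",
    "user is learning Python",
]

def retrieve(query_vector, k=1):
    """Return the k stored memories most similar to the query (cosine similarity)."""
    q = np.asarray(query_vector, dtype=float)
    sims = memory_vectors @ q / (np.linalg.norm(memory_vectors, axis=1) * np.linalg.norm(q))
    return [memory_texts[i] for i in np.argsort(-sims)[:k]]

print(retrieve([0.05, 0.15, 0.95]))   # -> ['user is learning Python']
```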
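A toy numerical illustration of the over-optimisation result mentioned above (this is not the colleague's actual proof, and the functional form is invented): true value rises with a proxy metric at first, then falls as the proxy is pushed far beyond the regime where the two stay aligned.

```python
import math

# Toy illustration of over-optimisation: true value rises with the proxy
# at first, then falls as the proxy is pushed far beyond the regime where
# proxy and true value are aligned.
def true_value(proxy: float) -> float:
    # Invented functional form: aligned for small proxy values,
    # increasingly misaligned as the proxy grows.
    return proxy * math.exp(-proxy / 5.0)

for proxy in [0, 1, 2, 5, 10, 20, 40]:
    print(f"proxy pushed to {proxy:>2}: true value = {true_value(proxy):.3f}")
# Output rises until proxy = 5, then steadily declines towards zero.
```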
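A short sketch of the Brier score mentioned in the truthfulness note: it is just the mean squared error between a model's stated probabilities and what actually turned out to be true, so systematically overconfident or wrong claims score badly. The example probabilities and outcomes below are invented.

```python
import numpy as np

def brier_score(predicted_probs, outcomes):
    """Mean squared error between stated probabilities and binary outcomes.
    Lower is better; a calibrated, truthful predictor minimises it."""
    predicted_probs = np.asarray(predicted_probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((predicted_probs - outcomes) ** 2))

# A model's confidence that each of four claims is true, versus whether
# each claim actually turned out to be true (1) or false (0).
print(brier_score([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # well calibrated -> ~0.04
print(brier_score([0.9, 0.8, 0.3, 0.1], [0, 0, 1, 1]))  # confidently wrong -> ~0.69
```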
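A toy illustration of the verify-versus-generate asymmetry behind the proof-checking idea, using integer factoring as a stand-in for a proof (my analogy, not an example from the podcast): checking a claimed factorisation is a single multiplication, while finding it requires a search. Proof assistants exploit the same kind of asymmetry.

```python
# Verify-vs-generate asymmetry: checking a claimed factorisation takes one
# multiplication, while generating it requires a search.

def verify_factorisation(n: int, p: int, q: int) -> bool:
    """Cheap check: does the claimed certificate (p, q) really factor n?"""
    return 1 < p and 1 < q and p * q == n

def find_factorisation(n: int):
    """Expensive search: trial division to *generate* a factorisation."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None  # n is prime

n = 104_729 * 1_299_709               # product of the 10,000th and 100,000th primes
p, q = find_factorisation(n)          # slow: ~100,000 trial divisions
assert verify_factorisation(n, p, q)  # fast: a single multiplication
print(p, q)
```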
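A toy version of the mechanism-extraction idea: fit a tiny linear model to data generated by a known rule, read the rule back out of the learned parameters, and audit it against the model's behaviour. Real mechanistic interpretability on large neural networks is vastly harder; this only shows the shape of the idea.

```python
import numpy as np

# Toy "mechanism extraction": fit a tiny linear model to data generated by a
# known rule, then read the rule back out of the learned parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1]      # hidden rule: y = 2*x1 + 3*x2

# "Train" the model (ordinary least squares stands in for gradient descent).
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Extract" the mechanism from the parameters...
print("learned mechanism: y ~ %.2f*x1 + %.2f*x2" % tuple(weights))

# ...and audit it: does the extracted rule reproduce the model's behaviour?
assert np.allclose(X @ weights, y, atol=1e-8)
```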
Comments
Madhav - thanks for a very helpful summary of Max Tegmark's remarks. I agree with most of your comments about his views.
His line that 'this isn’t an arms race, it’s a suicide race' seems pretty compelling as a counter-argument against the view -- very commonly expressed on Twitter -- that the US must 'push ahead with AI, so we don't fall behind China'.
Just listened to a podcast interview of yours, Geoffrey Miller (Manifold, with Steve Hsu). Do you really believe that it is viable to impose a very long pause (you mention 'just a few centuries')? The likelihood of such a thing taking place seems to me extremely remote, at least until we get a pragmatic example of the harm AI can do, a Trinity test of sorts.
Manuel - it may not be likely that we can impose a very long pause (on the order of centuries).
My main goal in proposing that was to remind people that with AI, we're talking about a 'major evolutionary transition' comparable to the evolution of multicellular life, or the evolution of human intelligence. Normally these take place on the time-span of hundreds of thousands of years to tens of millions of years.
If AI development holds all the potential upside that we hope for, but also threatens some of the downside risks that we dread, it may be useful to think on these evolutionary time scales, rather than the 'quarterly profit' schedules that many Big Tech companies operate on.
Does anyone have the timestamp for when he argues that it's easier to show that a mathematical proof is correct than to find the proof? I am working on making that argument rigorous for the case of an AI planner and would like to reference the conversation.
Edit: I found it: it's at 01:46:50