Hi I'm Steve Byrnes, an AGI safety / AI alignment researcher in Boston, MA, USA, with a particular focus on brain algorithms. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, Twitter, Mastodon, Threads, Bluesky, GitHub, Wikipedia, Physics-StackExchange, LinkedIn.
I think wanting, or at least the relevant kind here, just is involuntary attention effects, specifically motivational salience
I think you can have involuntary attention effects that aren’t particularly related to wanting anything (I’m not sure if you’re denying that). If your watch beeps once every 10 minutes in an otherwise-silent room, each beep will create involuntary attention—the orienting response a.k.a. startle. But is it associated with wanting? Not necessarily. It depends on what the beep means to you. Maybe it beeps for no reason and is just an annoying distraction from something you’re trying to focus on. Or maybe it’s a reminder to do something you like doing, or something you dislike doing, or maybe it just signifies that you’re continuing to make progress and it has no action-item associated with it. Who knows.
Where I might disagree with "involuntary attention to the displeasure" is that the attention effects could sometimes be to force your attention away from an unpleasant thought, rather than to focus on it.
In my ontology, voluntary actions (both attention actions and motor actions) happen if and only if the idea of doing them is positive-valence, while involuntary actions (again both attention actions and motor actions) can happen regardless of their valence. In other words, if the reinforcement learning system is the reason that something is happening, it’s “voluntary”.
Orienting responses are involuntary (with both involuntary motor aspects and involuntary attention aspects). It doesn’t matter if orienting to a sudden loud sound has led to good things happening in the past, or bad things in the past. You’ll orient to a sudden loud sound either way. By the same token, paying attention to a headache is involuntary. You’re not doing it because doing similar things has worked out well for you in the past. Quite the contrary, paying attention to the headache is negative valence. If it was just reinforcement learning, you simply wouldn’t think about the headache ever, to a first approximation. Anyway, over the course of life experience, you learn habits / strategies that apply (voluntary) attention actions and motor actions towards not thinking about the headache. But those strategies may not work, because meanwhile the brainstem is sending involuntary attention signals that overrule them.
So for example, “ugh fields” are a strategy implemented via voluntary attention to preempt the possibility of triggering the unpleasant involuntary-attention process of anxious rumination.
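As a toy sketch of the voluntary / involuntary distinction above (not a model of the brain): the `Candidate` class, the `happens` rule, and the valence numbers are all made up for illustration, and the sketch assumes an involuntary signal always fires, which is a simplification of "can happen regardless of valence".

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical attention or motor action under consideration."""
    name: str
    valence: float     # how good "the idea of doing this" seems (illustrative numbers)
    involuntary: bool  # True if driven by an innate signal rather than learned value


def happens(c: Candidate) -> bool:
    # Assumption: an involuntary signal always fires, whatever the valence.
    if c.involuntary:
        return True
    # Voluntary actions happen if and only if the idea of doing them is positive-valence.
    return c.valence > 0


candidates = [
    Candidate("orient to sudden loud sound", valence=-0.2, involuntary=True),
    Candidate("attend to headache", valence=-1.0, involuntary=True),
    Candidate("distract self from headache", valence=0.5, involuntary=False),
    Candidate("dwell on headache on purpose", valence=-1.0, involuntary=False),
]

occurring = [c.name for c in candidates if happens(c)]
print(occurring)
```

In this toy version, attending to the headache still occurs despite its negative valence, while the only voluntary action that occurs is the positive-valence distraction strategy. The sketch does not capture the involuntary signal overruling that strategy.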
The thing you wrote is kinda confusing in my ontology. I’m concerned that you’re slipping into a mode where there’s a soul / homunculus “me” that gets manipulated by the exogenous pressures of reinforcement learning. If so, I think that’s a bad ontology—reinforcement learning is not an exogenous pressure on the “me” concept, it is part of how the “me” thing works and why it wants what it wants. Sorry if I’m misunderstanding.
IMO, suffering ≈ displeasure + involuntary attention to the displeasure. See my handy chart (from here):
I think wanting is downstream from the combination of displeasure + attention. Like, imagine there’s some discomfort that you’re easily able to ignore. Well, when you do think about it, you still immediately want it to stop!
I don’t recall the details of Tom Davidson’s model, but I’m pretty familiar with Ajeya’s bio-anchors report, and I definitely think that if you make an assumption “algorithmic breakthroughs are needed to get TAI”, then there really isn’t much left of the bio-anchors report at all. (…although there are still some interesting ideas and calculations that can be salvaged from the rubble.)
I went through how the bio-anchors report looks if you hold a strong algorithmic-breakthrough-centric perspective in my 2021 post Brain-inspired AGI and the "lifetime anchor".
See also here (search for “breakthrough”) where Ajeya is very clear in an interview that she views algorithmic breakthroughs as unnecessary for TAI, and that she deliberately did not include the possibility of algorithmic breakthroughs in her bio-anchors model (…and therefore she views the possibility of breakthroughs as a pro tanto reason to think that her report’s timelines are too long).
OK, well, I actually agree with Ajeya that algorithmic breakthroughs are not strictly required for TAI, in the narrow sense that her Evolution Anchor (i.e., recapitulating the process of animal evolution in a computer simulation) really would work given infinite compute and infinite runtime and no additional algorithmic insights. (In other words, if you do a giant outer-loop search over the space of all possible algorithms, then you’ll find TAI eventually.) But I think that’s really leaning hard on the assumption of truly astronomical quantities of compute [or equivalent via incremental improvements in algorithmic efficiency] being available in like 2100 or whatever, as nostalgebraist points out. I think that assumption is dubious, or at least it’s moot—I think we’ll get the algorithmic breakthroughs far earlier than anyone would or could do that kind of insane brute force approach.
For what it’s worth, Yann LeCun is very confidently against LLMs scaling to AGI, and yet LeCun seems to have at least vaguely similar timelines-to-AGI as Ajeya does in that link.
Ditto for me.
Oh hey here’s one more: Chollet himself (!!!) has vaguely similar timelines-to-AGI (source) as Ajeya does. (Actually if anything Chollet expects it a bit sooner: he says 2038-2048, Ajeya says median 2050.)
I agree with Chollet (and OP) that LLMs will probably plateau, but I’m also big into AGI safety—see e.g. my post AI doom from an LLM-plateau-ist perspective.
(When I say “AGI” I think I’m talking about the same thing that you called digital “beings” in this comment.)
Here are a bunch of agreements & disagreements.
if François is right, then I think this should be considered strong evidence that work on AI Safety is not overwhelmingly valuable, and may not be one of the most promising ways to have a positive impact on the world.
I think François is right, but I do think that work on AI safety is overwhelmingly valuable.
Here’s an allegory:
There’s a fast-breeding species of extraordinarily competent and ambitious intelligent aliens. They can do science much much better than Einstein, they can run businesses much much better than Bezos, they can win allies and influence much much better than Hitler or Stalin, etc. And they’re almost definitely (say >>90% chance) coming to Earth sooner or later, in massive numbers that will keep inexorably growing, but we don’t know exactly when this will happen, and we also don’t know in great detail what these aliens will be like—maybe they will have callous disregard for human welfare, or maybe they’ll be great. People have been sounding the alarm for decades that this is a big friggin’ deal that warrants great care and advanced planning, but basically nobody cares.
Then some scientist Dr. S says “hey those dots in the sky—maybe they’re the aliens! If so they might arrive in the next 5-10 years, and they’ll have the following specific properties”. All of a sudden there’s a massive influx of societal interest—interest in the dots in particular, and interest in alien preparation in general.
But it turns out that Dr. S was wrong! The dots are small meteors. They might hit Earth and cause minor damage but nothing unprecedented. So we’re back to not knowing when the aliens will come or what exactly they’ll be like.
Is Dr. S’s mistake “strong evidence that alien prep is not overwhelmingly valuable”? No! It just puts us back where we were before Dr. S came along.
(end of allegory)
(Glossary: the “aliens” are AGIs; the dots in the sky are LLMs; and Dr. S would be a guy saying LLMs will scale to AGI with no additional algorithmic insights.)
It would make AI Safety work less tractable
If LLMs will plateau (as I expect), I think there are nevertheless lots of tractable projects that would help AGI safety. Examples include:
It seems that many people in Open Phil have substantially shortened their timelines recently (see Ajeya here).
For what it’s worth, Yann LeCun is very confidently against LLMs scaling to AGI, and yet LeCun seems to have at least vaguely similar timelines-to-AGI as Ajeya does in that link.
Ditto for me.
See also my discussion here (“30 years is a long time. A lot can happen. Thirty years ago, deep learning was an obscure backwater within AI, and meanwhile people would brag about how their fancy new home computer had a whopping 8 MB of RAM…”)
To be clear, you can definitely find some people in AI safety saying AGI is likely in <5 years, although Ajeya is not one of those people. This is a more extreme claim, and does seem pretty implausible unless LLMs will scale to AGI.
I think this makes me very concerned about a strong ideological and philosophical bubble in the Bay regarding these core questions of AI.
Yeah some examples would be:
Many ≠ All! But to the extent that these things happen, I’m against it, and I do complain about it regularly.
(To be clear, I’m not opposed to contingency-planning for the possibility that LLMs will scale to AGIs. I don’t expect that contingency to happen, but hey, what do I know, I’ve been wrong before, and so has Chollet. But I find that these kinds of claims above are often stated unconditionally. Or even if they’re stated conditionally, the conditionality is kinda forgotten in practice.)
I think it’s also important to note that these habits above are regrettably common among both AI pessimists and AI optimists. As examples of the latter, see me replying to Matt Barnett and me replying to Quintin Pope & Nora Belrose.
By the way, this might be overly-cynical, but I think there are some people (coming into the AI safety field very recently) who understand how LLMs work but don’t know how (for example) model-based reinforcement learning works, and so they just assume that the way LLMs work is the only possible way for any AI algorithm to work.
On the whole though, I think much of the case by proponents for the importance of working on AI Safety does assume that current paradigm + scale is all you need, or rest on works that assume it.
Yeah this is more true than I would like. I try to push back on it where possible, e.g. my post AI doom from an LLM-plateau-ist perspective.
There were however plenty of people who were loudly arguing that it was important to work on AI x-risk before “the current paradigm” was much of a thing (or in some cases long before “the current paradigm” existed at all), and I think their arguments were sound at the time and remain sound today. (E.g. Alan Turing, Norbert Wiener, Yudkowsky, Bostrom, Stuart Russell, Tegmark…) (OpenPhil seems to have started working seriously on AI in 2016, which was 3 years before GPT-2.)
I’m confused what you’re trying to say… Supposing we do in fact invent AGI someday, do you think this AGI won’t be able to do science? Or that it will be able to do science, but that wouldn’t count as “automating science”?
Or maybe when you said “whether 'PASTA' is possible at all”, you meant “whether 'PASTA' is possible at all via future LLMs”?
Maybe you’re assuming that everyone here has a shared assumption that we’re just talking about LLMs, and that if someone says “AI will never do X” they obviously mean “LLMs will never do X”? If so, I think that’s wrong (or at least I hope it’s wrong), and I think we should be more careful with our terminology. AI is broader than LLMs. …Well maybe Aschenbrenner is thinking that way, but I bet that if you were to ask a typical senior person in AI x-risk (e.g. Karnofsky) whether it’s possible that there will be some big AI paradigm shift (away from LLMs) between now and TAI, they would say “Well yeah duh of course that’s possible,” and then they would say that they would still absolutely want to talk about and prepare for TAI, in whatever algorithmic form it might take.
OK yeah, “AGI is possible on chips but only if you have 1e100 of them or whatever” is certainly a conceivable possibility. :) For example, here’s me responding to someone arguing along those lines.
If there are any neuroscientists who have investigated this I would be interested!
There is never a neuroscience consensus but fwiw I fancy myself a neuroscientist and have some thoughts at: Thoughts on hardware / compute requirements for AGI.
One of various points I bring up is that:
My view is: implementing (1) via (3) would involve a lot of inefficient bottlenecks where there’s no low-level affordance that’s a good match to the algorithmic operation we want … but the same is true of implementing (1) via (2). Indeed, I think the human brain does what it does via some atrociously inefficient workarounds to the limitations of biological neurons, limitations which would not be applicable to silicon chips.
By contrast, many people thinking about this problem are often thinking about “how hard is it to use (3) to precisely emulate (2)?”, rather than “what’s the comparison between (1)←(3) versus (1)←(2)?”. (If you’re still not following, see my discussion here—search for “transistor-by-transistor simulation of a pocket calculator microcontroller chip”.)
Another thing is that, if you look at what a single consumer GPU can do when it runs an LLM or diffusion model… well it’s not doing human-level AGI, but it’s sure doing something, and I think it’s a sound intuition (albeit hard to formalize) to say “well it kinda seems implausible that the brain is doing something that’s >1000× harder to calculate than that”.
Yeah sure, here are two reasonable positions:
I think plenty of AI safety people are in (A), which is at least internally-consistent even if I happen to think they’re wrong. I also think there are lots of AI safety people who would say that they’re in (B) if pressed, but where they long ago lost track of the fact that that’s what they were doing and instead they’ve started treating the contingency as a definite expectation, and thus they say things that omit essential caveats, or are wrong or misleading in other ways. ¯\_(ツ)_/¯
One thing I like is checking https://en.wikipedia.org/wiki/2024 once every few months, and following the links when you're interested.