AGI in sight: our look at the game board

Andrea_Miotti

Comments 18

I just want to register a meta-level disagreement with this post: your recommendations seem like really bad epistemics. I don't think we should just use heuristics and information-cascade ourselves to death as a community; we should actually create good gears-level understandings for forecasting AI progress.

You cite that AI accelerationist arguments act as soldiers but you literally are deploying arguments as soldiers in this post!
You recommend terrible weird gossiping anti-agency mechanisms instead of pro-agency actions like work on safety, upskill, and field build.
You make a lot of arguments in negation that feel like weird sleights of hand. For instance, you say "We don’t know of any major AI lab that has participated in slowing down AGI development, or publicly expressed interest in it", but OpenAI's charter literally has the assist clause (regardless of whether you believe it's a promise they will keep, it exists).

To be clear, I think there are good arguments for short timelines (median 5-10 years), but you don't actually make them here^[1]. What you do instead is:

Say you can express technical disagreement but not say any empirical examples/obstacles because that's infohazardous.
A lot of the heuristics based arguments can't even be verified or prodded because they are "private conversations" which is I guess fine but then what do you want people to do with that?

I think people should think for themselves and engage with the arguments and models people provide for timelines and threat models, but this post doesn't do that. It just directionally vibes a high p(doom) with short timelines and tells people to panic and gossip.

^[1] For instance: https://www.lesswrong.com/posts/rzqACeBGycZtqCfaX/fun-with-12-ooms-of-compute

Guy Raveh

If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments

I've just completed a master's degree in ML, though not in deep learning. I'm very sure there are still major obstacles to AGI, that will not be overcome in the next 5 years nor in the next 20. Primary among them is robust handling of OOD situations.

Look at self-driving cars as an example. It was a test case for AI companies, requiring much less than AGI to succeed, and they've so far failed despite billions in investment. From hearing about a fleet of self-driving cars that would be on the market in 2021 or 2022, estimates are now leaning more towards decades from now.

titotal

I will publicly predict now that there will be no AGI in the next 20 years. I expect significant achievements will be made, but only in areas where large amounts of relevant training data exist or can be easily generated. It will also struggle to catch on in areas like healthcare where misfiring results cause large damage and lawsuits.

I will also predict that there might be a "stall" of AI progress in a few years, once all the low-hanging fruit problems are picked off, and the remaining problems like self-driving cars aren't well suited for the current advantages of AI.

Kene David Nwosu

From hearing about a fleet of self-driving cars that would be on the market in 2021 or 2022, estimates are now leaning more towards decades from now.

Aren't there self-driving cars on the road in a few cities now? (Cruise and maybe Zoox, if I recall correctly).

lauren

just so we're clear - self driving cars are, in fact, one of the key factors pushing timelines down, and they've also done some pretty impressive work on non-killeveryone-proof safety which may be useful as hunch seeds for ainotkilleveryoneism.

they're not the only source of interesting research, though.

also, I don't think most of us who expect agi soon expect reliable agi soon. I certainly don't expect reliability to come early at all by default.

slg

This post reads like it wants to convince its readers that AGI is near/will spell doom, picking and spelling out arguments in a biased way.

Just because many ppl on the Forum and LW (including myself) believe that AI Safety is very important and isn't given enough attention by important actors, I don't want to lower our standards for good arguments in favor of more AI Safety.

Some parts of the post that I find lacking:

"We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down."

I don't think more than 1/3 of ML researchers or engineers at DeepMind, OpenAI, or Anthropic would sign this statement.

"No one knows how to predict AI capabilities."

Many people are trying though (Ajeya Cotra, EpochAI), and I think these efforts aren't worthless. Maybe a different statement could be: "New AI capabilities appear discontinuously, and we have a hard time predicting such jumps. Given this larger uncertainty, we should worry more about unexpected and potentially dangerous capability increases".

"RLHF and Fine-Tuning have not worked well so far."

Not taking into account if RLHF scales (as linked, Jan Leike of OpenAI doesn't think so) and if RLHF leads to deception, from my cursory reading and experience, ChatGPT shows substantially better behavior than Bing, which might be due to the latter not using RLHF.

Overall I do agree with the article and think that recent developments have been worrying. Still, if the goal of the article is to get independently-thinking individuals to think about working on AI Safety, I'd prefer less extremized arguments.

Ada-Maaria Hyvärinen

We personally also recommend engaging with the writings of Eliezer, Paul, Nate, and John. We do not endorse all of their research, but they all have tackled the problem, and made a fair share of their reasoning public. If we want to get better together, they seem like a good start.

I realize this is a cross post and your original audience might know where to find all these recommendations even without further info, but if you want new people to look into their writings, it would be better to at least use full names of the authors you recommend.

Andrea_Miotti

Thanks a lot and good point, edited to include full names and links!

rvnnt

Eliezer Yudkowsky, Paul Christiano, Nate Soares (so8res), John Wentworth (johnswentworth).

Sanjay

There has been literally no regulation whatsoever to slow down AGI development

Thanks for your post; I'm sure it will be appreciated by many on this forum.

The claim that there has been literally no regulation whatsoever sounds a bit strong?

E.g. the US putting export bans on advanced chips to China? (BIS press release here, more commentary: 1, 2, 3, 4)

It looks to me like this was intended to slow down (China's) AI development, and indeed there is a reasonable chance that it will slow down (overall) AI development.

(To be clear, I see this as a point of detail on one specific claim, and doesn't meaningfully detract from the overall thrust of your post)

Robi Rahman🔸

I agree the export controls on chips to China have the effect of slowing down AGI development, but that probably wasn't the intent behind the US government's decision to do this. The putative reason is to prevent China from using them in military technology.

Andrea_Miotti

Thanks, great to hear you found it useful!

As you mention, the export controls are aimed at, and have the primary effect of, differentially slowing down a specific country's AI development, rather than AGI development overall.

This has a few relevant side effects, such as reduced proliferation and competition, but doesn't slow down the frontier of overall AGI development (nor does it aim to do so).

Violet Hour

Hm, I still feel as though Sanjay’s example cuts against your point somewhat. For instance, you mentioned encountering the following response:

“It is better for us to have AGI first than [other organization], that is less safety minded than us.”

To the extent that regulations slow down potential AGI competitors in China, I’d expect stronger incentives towards safety, and a correspondingly lower chance of encountering potentially dangerous capabilities races. So, even if export bans don’t directly slow down the frontier of AI development, it seems plausible that such bans could indirectly do so (by weakening the incentives to sacrifice safety for capabilities development).

Your post + comment suggests that you nevertheless expect such regulation to have ~0 effect on AGI development races, although I’m unsure which parts of your model are driving that conclusion. I can imagine a couple of alternative pictures, with potentially different policy implications.

Your model could involve potential participants in AGI development races viewing themselves primarily in competition with other (e.g.) US firms. This, combined with short timelines, could lead you to expect the export ban to have ~0 effect on capabilities development.
- On this view, you would be skeptical about the usefulness of the export ban on the basis of skepticism about China developing AGI (given your timelines), while potentially being optimistic about the counterfactual value of domestic regulation relating to chip production.
- If this is your model, I might start to wonder “Could the chip export ban affect the regulatory Overton Window, and increase the chance of domestic chip controls?”, in a way that makes the Chinese export ban potentially indirectly helpful for slowing down AGI.
- To be clear, I'm not saying the answer to my question above is "yes", only that this is one example of a question that I'd have on one reading of your model, which I wouldn't have on other readings.
Alternatively, your model might instead be skeptical about the importance of compute, and consequently skeptical about the value of governance regimes surrounding a wide variety of even-somewhat-quixotic-suggestions relating to domestic chip regulation.
- I sensed that you might have a less compute-centric view based on your questions to leading AI researchers, asking if they “truly believe there are any major obstacles left” which major AI companies were unable to “tear down with their [current?] resources”.
- Based on that question – alongside your assigning a significant probability to <5 year timelines – I sensed that you might have a (potentially not-publicly-disclosable) impression about the current rate of algorithmic progress.^[1]

I don’t want to raise overly pernickety questions, and I’m glad you’re sharing your concerns. I’m asking for more details about your underlying model because the audience here will consist of people who (despite being far more concerned about AGI than the general population) are on average far less concerned – and on average know less about the technical/governance space – than you are. If you’re skeptical about the value of extant regulation affecting AGI development, it would be helpful at least for me (and I’m guessing others?) to have a bit more detail on what’s driving that conclusion.

^[1] I don’t mean to suggest that you couldn’t have more ‘compute-centric’ reasons for believing in short timelines, only that some of your claims (+tone) updated me a bit in this direction.

Milan Griffes

... there is a lot we can actually do. We are currently working on it quite directly at Conjecture.

I was hoping this post would explain how Conjecture sees its work as contributing to the overall AI alignment project, and was surprised to see that that topic isn't addressed at all. Could you speak to it?

slg

Comment by Paul Christiano on LessWrong:

"RLHF and Fine-Tuning have not worked well so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples."
These three links are:
The first is Mysteries of mode collapse, which claims that RLHF (as well as OpenAI's supervised fine-tuning on highly-rated responses) decreases entropy. This doesn't seem particularly related to any of the claims in this paragraph, and I haven't seen it explained why this is a bad thing. I asked on the post but did not get a response.
The second is Discovering language model behaviors with model-written evaluations and shows that Anthropic's models trained with RLHF have systematically different personalities than the pre-trained model. I'm not exactly sure what claims you are citing, but I think you are making some really wild leaps.
The third is Compendium of problems with RLHF, which primarily links to the previous 2 failures and then discusses theoretical limitations.
I think these are bad citations for the claim that methods are "not working well" or that current evidence points towards trouble.
The current problems you list---"unhelpful, untruthful, and inconsistent"---don't seem like good examples to illustrate your point. These are mostly caused by models failing to correctly predict which responses a human would rate highly. That happens because models have limited capabilities and is rapidly improving as models get smarter. These are not the problems that most people in the community are worried about, and I think it's misleading to say this is what was "theorized" in the past.
I think RLHF is obviously inadequate for aligning really powerful models, both because you cannot effectively constrain a deceptively aligned model and because human evaluators will eventually not be able to understand the consequences of proposed actions. And I think it is very plausible that large language models will pose serious catastrophic risks from misalignment before they are transformative (it seems very hard to tell). But I feel like this post isn't engaging with the substance of those concerns or sensitive to the actual state of evidence about how severe the problem looks like it will be or how well existing mitigations might work.

Lixiang

I'm definitely not knowledgeable about AI, but my two cents is that there is a thing called the frame problem that makes AGI very hard to attain or even think about. I'm not gonna even try to exposit what that is, and that article is a bit dated, but I'd guess the problem still remains beyond anyone's comprehension.

CarlShulman

The kind of examples people used to use to motivate frame problem stories in the days of GOFAI in the 20th century are routinely solved by AI systems today.

Lixiang

Interesting, well maybe I'm off base then.

Edited to include DayDreamer, VideoDex and RT-1, h/t Alexander Kruel for these additional, better examples.
