
From our point of view, we are now in the end-game for AGI, and we (humans) are losing. When we share this with other people, they are reliably surprised. That’s why we believe it is worth writing our beliefs down here.

1. AGI is happening soon. Significant probability of it happening in less than 5 years.

Five years ago, there were many obstacles on what we considered to be the path to AGI.

But in the last few years, we’ve gotten:

We cannot think of any remaining obstacle that we expect to take more than 6 months to overcome once effort is invested in taking it down.

Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.

2. We haven’t solved AI Safety, and we don’t have much time left.

We are very close to AGI. But how good are we at safety right now? Well.

No one knows how to get LLMs to be truthful. LLMs make things up, constantly. It is really hard to get them to stop doing this, and we don’t know how to prevent it at scale.

Optimizers quite often break their setup in unexpected ways. There have been many examples of this. But in brief, the lessons we have learned are the following (a toy illustration follows the list):

  • Optimizers can yield unexpected results
  • Those results can be very weird (like breaking the simulation environment)
  • Yet very few people extrapolate from this and treat these as worrying signs
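
To make the pattern concrete, here is a minimal toy sketch (a made-up illustration of ours, not one of the documented cases above; every name and number in it is invented): a crude hill-climbing optimizer is handed a proxy reward that contains an unintended loophole, and it reliably ends up exploiting the loophole instead of doing what the designer meant.

    # A made-up, minimal example of Goodhart-style specification gaming:
    # the optimizer maximizes the proxy reward it is given, not the goal
    # the designer had in mind.
    import random

    def true_objective(x: float) -> float:
        """What the designer actually wants: x should end up near 1."""
        return -abs(x - 1.0)

    def proxy_reward(x: float) -> float:
        """What the optimizer is actually given.

        It matches the true objective for 'reasonable' x, but past x = 10 a
        bug (think: a physics glitch or an overflowing sensor) makes the
        reward grow without bound.
        """
        if x <= 10.0:
            return -abs(x - 1.0)
        return x - 10.0  # the unintended loophole

    def hill_climb(reward, x: float = 0.0, steps: int = 20_000) -> float:
        """Crude random-search hill climbing: keep any change that raises the reward."""
        best = reward(x)
        for _ in range(steps):
            candidate = x + random.uniform(-15.0, 15.0)
            score = reward(candidate)
            if score > best:
                x, best = candidate, score
        return x

    if __name__ == "__main__":
        random.seed(0)
        x_opt = hill_climb(proxy_reward)
        # The proxy score ends up huge while the true objective is terrible:
        # the optimizer "broke" the setup rather than doing what was meant.
        print(f"x found by optimizer : {x_opt:.1f}")
        print(f"proxy reward         : {proxy_reward(x_opt):.1f}")
        print(f"true objective       : {true_objective(x_opt):.1f}")

The specific bug does not matter; the dynamic does: the harder you optimize a misspecified objective, the more likely you are to land in a state the designer never intended.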

No one understands how large models make their decisions. Interpretability is extremely nascent, and mostly empirical. In practice, we are still completely in the dark about nearly all decisions taken by large models.

RLHF and Fine-Tuning have not worked well so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples.

No one knows how to predict AI capabilities. No one predicted the many capabilities of GPT-3. We only discovered them after the fact, while playing with the models. In some ways, we keep discovering capabilities now thanks to better interfaces and more optimization pressure by users, more than two years in. We’re seeing the same phenomenon happen with ChatGPT and the model behind Bing Chat.

We are uncertain about the true extent of the capabilities of the models we’re training, and we’ll be even more clueless about upcoming larger, more complex, more opaque models coming out of training. This has been true for a couple of years now.

3. Racing towards AGI: Worst game of chicken ever.

The race for powerful AGIs has already started. There already are general AIs; they are just not yet powerful enough to count as True AGIs.

Actors

Regardless of why people are doing it, they are racing for AGI. Everyone has their own theses, their own beliefs about AGIs, and their own motivations. For instance, consider:

AdeptAI is working on giving AIs access to everything. In their introduction post, one can read “True general intelligence requires models that can not only read and write, but act in a way that is helpful to users. That’s why we’re starting Adept: we’re training a neural network to use every software tool and API in the world”, and furthermore, that they “believe this is actually the most practical and safest path to general intelligence” (emphasis ours).

DeepMind has done a lot of work on RL agents and multi-modality. It is literally in their mission statement to “solve intelligence, developing more general and capable problem-solving systems, known as AGI”.

OpenAI has a mission statement more focused on safety: “We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome”. Unfortunately, they have also been a major kickstarter of the race with GPT-3 and then ChatGPT.

(Since we started writing this post, Microsoft deployed what could be OpenAI’s GPT-4 on Bing, plugged directly into the internet.)

Slowing Down the Race

There has been literally no regulation whatsoever to slow down AGI development. As far as we know, the efforts of key actors don’t go in this direction.

We don’t know of any major AI lab that has participated in slowing down AGI development, or publicly expressed interest in it.

Here are a few arguments that we have personally encountered, multiple times, for why slowing down AGI development is actually bad:

  • “AGI safety is not a big problem, we should improve technology as fast as possible for the people”
  • “Once we have stronger AIs, we can use them to work on safety. So it is better to race for stronger AIs and do safety later.”
  • “It is better for us to deploy AGI first than [authoritarian country], which would be bad.”
  • “It is better for us to have AGI first than [other organization], that is less safety minded than us.”
  • “We can’t predict the future. Possibly, it is better to not slow down AGI development, so that at some point there is naturally a big accident, and then the public and policymakers will understand that AGI safety is a big deal.”
  • “It is better to have AGI ASAP, so that we can study it longer for safety purposes, before others get it.”
  • “It is better to have AGI ASAP, so that at least it has access to less compute for RSI / world-takeover than in the world where it comes 10 years later.”
  • “Policymakers are clueless about this technology, so it’s impossible to slow down, they will just fail in their attempts to intervene. Engineers should remain the only ones deciding where the technology goes”

Remember that arguments are soldiers: there is a whole lot more interest in pushing for the “Racing is good” thesis than for slowing down AGI development.

Question people

We could say more. But:

  • We are not high status, “core” members of the community.
  • We work at Conjecture, so what we write should be read as biased.
  • There are expectations of privacy when people talk to us. Not complete secrecy about everything, but they do expect, for instance, that we will not directly attribute quotes to them, and we will not do so without each individual’s consent.
  • We expect we could say more things that would not violate expectations of privacy (public things, even!). But we expect that niceness norms (which we often find detrimental and naive) and legalities (because we work at what can be seen as a competitor) would heavily punish us.

So our message is: things are worse than what is described in the post!
Don’t trust blindly, don’t assume: ask questions and reward openness.

Recommendations:

  • Question people, report their answers in your whisper networks, in your Twitter sphere or whichever other places you communicate on.
    • An example of “questioning” is asking all of the following questions:
      • Do you think we should race toward AGI? If so, why? If not, do you think we should slow down AGI? What does your organization think? What is it doing to push for capabilities and race for AGI compared to slowing down capabilities?
      • What is your alignment plan? What is your organization’s alignment plan? If you don’t know if you have one, did you ask your manager/boss/CEO what their alignment plan is?
  • Don’t substitute social fluff for information: someone being nice, friendly, or well liked does not mean they have good plans, or any plan at all. The reverse also holds!
  • Gossiping and questioning people about their positions on AGI are prosocial activities!
  • Silence benefits people who lie or mislead in private, telling others what they want to hear.
  • Open Communication Norms benefit people who are consistent (not necessarily correct, or even honest, but at least consistent).

4. Conclusion

Let’s summarize our point of view:

  • AGI by default very soon: brace for impact
  • No safety solutions in sight: we have no airbag
  • Race ongoing: people are actually accelerating towards the wall

Should we just give up and die?

Nope! And not just for dignity points: there is a lot we can actually do. We are currently working on it quite directly at Conjecture.

We’re not hopeful that full alignment can be solved anytime soon, but we think that narrower sub-problems with tighter feedback loops, such as ensuring the boundedness of AI systems, are promising directions to pursue.

If you are interested in working together on this (not necessarily by becoming an employee or funding us), send an email with your bio and skills, or just a private message here.

We personally also recommend engaging with the writings of Eliezer Yudkowsky, Paul Christiano, Nate Soares, and John Wentworth. We do not endorse all of their research, but they all have tackled the problem, and made a fair share of their reasoning public. If we want to get better together, they seem like a good start.

5. Disclaimer

We acknowledge that the points above don’t go deeply into our models of why these situations are the case. Regardless, we wanted our point of view to at least be written in public.

For many readers, these problems will be obvious and require no further explanation. For others, these claims will be controversial: we’ll address some of these cruxes in detail in the future if there’s interest. 

Some of these potential cruxes include:

  • Adversarial examples are not only extreme cases, but rather they are representative of what you should expect conditioned on sufficient optimization.
  • Monitoring of increasingly advanced systems does not trivially work, since much of the cognition of advanced systems, and many of their dangerous properties, will be externalized the more they interact with the world.
  • Even perfect interpretability will not solve the problem alone: not everything is in the feed forward layer, and the more models interact with the world the truer this becomes.
  • Even with more data, RLHF and fine-tuning can’t solve alignment. These techniques don’t address deception and inner alignment, and what is natural in the RLHF ontology is not natural for humans and vice-versa.
  1. ^

    Edited to include DayDreamer, VideoDex and RT-1, h/t Alexander Kruel for these additional, better examples.

Comments

I just want to register a meta-level disagreement with this post, which is that your recommendations seem like really bad epistemics. I don't think we should just heuristic and information-cascade ourselves to death as a community, but should instead build good gears-level understandings for forecasting AI progress.

  1. You cite that AI accelerationist arguments act as soldiers, but you are literally deploying arguments as soldiers in this post!
  2. You recommend terrible, weird, anti-agency gossiping mechanisms instead of pro-agency actions like working on safety, upskilling, and field-building.
  3. You make a lot of arguments by negation that feel like weird sleights of hand. For instance, you say "We don’t know of any major AI lab that has participated in slowing down AGI development, or publicly expressed interest in it", but OpenAI's charter literally has the assist clause (regardless of whether or not you believe it's a promise they will keep, it exists).

To be clear, I think there are good arguments for short timelines (median 5-10 years), but you don't actually make them here[1]. What you do instead is:

  1. Say that people can express technical disagreement, but should not name any empirical examples/obstacles, because that would be infohazardous.
  2. Rest on heuristics-based arguments that can't even be verified or prodded because they come from "private conversations", which is, I guess, fine, but then what do you want people to do with that?

I think people should think for themselves and engage with the arguments and models people provide for timelines and threat models, but this post doesn't do that. It just directionally vibes a high p(doom) with short timelines and tells people to panic and gossip.

  1. ^

If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments

I've just completed a master's degree in ML, though not in deep learning. I'm very sure there are still major obstacles to AGI that will not be overcome in the next 5 years, nor in the next 20. Primary among them is robust handling of out-of-distribution (OOD) situations.

Look at self-driving cars as an example. They were a test case for AI companies, requiring much less than AGI to succeed, and they've so far failed despite billions in investment. From hearing about a fleet of self-driving cars that would be on the market in 2021 or 2022, estimates are now leaning more towards decades from now.

I will publicly predict now that there will be no AGI in the next 20 years. I expect significant achievements will be made, but only in areas where large amounts of relevant training data exist or can be easily generated. AI will also struggle to catch on in areas like healthcare, where misfiring results cause large damage and lawsuits.

I will also predict that there might be a "stall" in AI progress in a few years, once the low-hanging-fruit problems are picked off and the remaining problems, like self-driving cars, aren't well suited to the current strengths of AI.

From hearing about a fleet of self-driving cars that would be on the market in 2021 or 2022, estimates are now leaning more towards decades from now.

Aren't there self-driving cars on the road in a few cities now? (Cruise and maybe Zoox, if I recall correctly). 

just so we're clear - self-driving cars are, in fact, one of the key factors pushing timelines down, and they've also done some pretty impressive work on non-killeveryone-proof safety which may be useful as hunch seeds for ainotkilleveryoneism.

they're not the only source of interesting research, though.

also, I don't think most of us who expect agi soon expect reliable agi soon. I certainly don't expect reliability to come early at all by default.

slg

This post reads like it wants to convince its readers that AGI is near/will spell doom, picking and spelling out arguments in a biased way. 

Many people on the Forum and LW (including myself) believe that AI Safety is very important and isn't given enough attention by important actors, but that doesn't mean I want to lower our standards for good arguments in favor of more AI Safety.

Some parts of the post that I find lacking:

 "We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down."

I don't think more than 1/3 of ML researchers or engineers at DeepMind, OpenAI, or Anthropic would sign this statement.

"No one knows how to predict AI capabilities."

Many people are trying though (Ajeya Cotra, EpochAI), and I think these efforts aren't worthless. Maybe a different statement could be: "New AI capabilities appear discontinuously, and we have a hard time predicting such jumps. Given this larger uncertainty, we should worry more about unexpected and potentially dangerous capability increases".

"RLHF and Fine-Tuning have not worked well so far."

Setting aside whether RLHF scales (as linked, Jan Leike of OpenAI doesn't think so) and whether RLHF leads to deception: from my cursory reading and experience, ChatGPT shows substantially better behavior than Bing, which might be due to the latter not using RLHF.


Overall I do agree with the article and think that recent developments have been worrying. Still, if the goal of the article is to get independently thinking individuals to consider working on AI Safety, I'd prefer less extreme arguments.

We personally also recommend engaging with the writings of Eliezer, Paul, Nate, and John. We do not endorse all of their research, but they all have tackled the problem, and made a fair share of their reasoning public. If we want to get better together, they seem like a good start.

 

I realize this is a cross-post and your original audience might know where to find all these recommendations even without further info, but if you want new people to look into their writings, it would be better to at least use the full names of the authors you recommend.

Thanks a lot and good point, edited to include full names and links!

Eliezer Yudkowsky, Paul Christiano, Nate Soares (so8res), John Wentworth (johnswentworth).

There has been literally no regulation whatsoever to slow down AGI development

Thanks for your post; I'm sure it will be appreciated by many on this forum.

The claim that there has been literally no regulation whatsoever sounds a bit strong?

E.g. the US putting export bans on advanced chips to China? (BIS press release here, more commentary: 1, 2, 3, 4)

It looks to me like this was intended to slow down (China's) AI development, and indeed has a reasonable chance that it may slow down (overall) AI development.

(To be clear, I see this as a point of detail on one specific claim, and doesn't meaningfully detract from the overall thrust of your post)

I agree the export controls on chips to China have the effect of slowing down AGI development, but that probably wasn't the intent behind the US government's decision to do this. The putative reason is to prevent China from using them in military technology.

Thanks, great to hear you found it useful!

As you mention, the export controls are aimed at, and have the primary effect of, differentially slowing down a specific country's AI development, rather than AGI development overall.

This has a few relevant side effects, such as reduced proliferation and competition, but doesn't slow down the frontier of overall AGI development  (nor does it aim to do so).

 

Hm, I still feel as though Sanjay’s example cuts against your point somewhat. For instance, you mentioned encountering the following response: 

“It is better for us to have AGI first than [other organization], that is less safety minded than us.”

To the extent that regulations slow down potential AGI competitors in China, I’d expect stronger incentives towards safety, and a correspondingly lower chance of encountering potentially dangerous capabilities races. So, even if export bans don’t directly slow down the frontier of AI development, it seems plausible that such bans could indirectly do so (by weakening the incentives to sacrifice safety for capabilities development).

Your post + comment suggests that you nevertheless expect such regulation to have ~0 effect on AGI development races, although I’m unsure which parts of your model are driving that conclusion. I can imagine a couple of alternative pictures, with potentially different policy implications.

  • Your model could involve potential participants in AGI development races viewing themselves primarily in competition with other (e.g.) US firms. This, combined with short timelines, could lead you to expect the export ban to have ~0 effect on capabilities development.
    • On this view, you would be skeptical about the usefulness of the export ban on the basis of skepticism about China developing AGI (given your timelines), while potentially being optimistic about the counterfactual value of domestic regulation relating to chip production. 
    • If this is your model, I might start to wonder “Could the chip export ban affect the regulatory Overton Window, and increase the chance of domestic chip controls?”, in a way that makes the Chinese export ban potentially indirectly helpful for slowing down AGI. 
    • To be clear, I'm not saying the answer to my question above is "yes", only that this is one example of a question that I'd have on one reading of your model, which I wouldn't have on other readings.
  • Alternatively, your model might instead be skeptical about the importance of compute, and consequently skeptical about the value of governance regimes surrounding a wide variety of even-somewhat-quixotic-suggestions relating to domestic chip regulation.
    • I sensed that you might have a less compute-centric view based on your questions to leading AI researchers, asking if they “truly believe there are any major obstacles left” which major AI companies were unable to “tear down with their [current?] resources”. 
    • Based on that question – alongside your assigning a significant probability to <5 year timelines – I sensed that you might have a (potentially not-publicly-disclosable) impression about the current rate of algorithmic progress.[1]

I don’t want to raise overly pernickety questions, and I’m glad you’re sharing your concerns. I’m asking for more details about your underlying model because the audience here will consist of people who (despite being far more concerned about AGI than the general population) are on average far less concerned – and on average know less about the technical/governance space – than you are. If you’re skeptical about the value of extant regulation affecting AGI development, it would be helpful at least for me (and I’m guessing others?) to have a bit more detail on what’s driving that conclusion.
 

  1. ^

     I don’t mean to suggest that you couldn’t have more ‘compute-centric’ reasons for believing in short timelines, only that some of your claims (+tone) updated me a bit in this direction.

... there is a lot we can actually do. We are currently working on it quite directly at Conjecture

I was hoping this post would explain how Conjecture sees its work as contributing to the overall AI alignment project, and was surprised to see that that topic isn't addressed at all. Could you speak to it?

Comment by Paul Christiano on LessWrong:

 

""RLHF and Fine-Tuning have not worked well so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples.""

These three links are:

  • The first is Mysteries of mode collapse, which claims that RLHF (as well as OpenAI's supervised fine-tuning on highly-rated responses) decreases entropy. This doesn't seem particularly related to any of the claims in this paragraph, and I haven't seen it explained why this is a bad thing. I asked on the post but did not get a response.
  • The second is Discovering language model behaviors with model-written evaluations and shows that Anthropic's models trained with RLHF have systematically different personalities than the pre-trained model.  I'm not exactly sure what claims you are citing, but I think you are making some really wild leaps.
  • The third is Compendium of problems with RLHF, which primarily links to the previous 2 failures and then discusses theoretical limitations.

I think these are bad citations for the claim that methods are "not working well" or that current evidence points towards trouble.

The current problems you list---"unhelpful, untruthful, and inconsistent"---don't seem like good examples to illustrate your point. These are mostly caused by models failing to correctly predict which responses a human would rate highly. That happens because models have limited capabilities and is rapidly improving as models get smarter. These are not the problems that most people in the community are worried about, and I think it's misleading to say this is what was "theorized" in the past.

I think RLHF is obviously inadequate for aligning really powerful models, both because you cannot effectively constrain a deceptively aligned model and because human evaluators will eventually not be able to understand the consequences of proposed actions. And I think it is very plausible that large language models will pose serious catastrophic risks from misalignment before they are transformative (it seems very hard to tell). But I feel like this post isn't engaging with the substance of those concerns or sensitive to the actual state of evidence about how severe the problem looks like it will be or how well existing mitigations might work.

I'm definitely not knowledgeable about AI, but my two cents is that there is a thing called the frame problem that makes AGI very hard to attain or even think about. I'm not gonna even try to exposit what that is, and that article is a bit dated, but I'd guess the problem still remains beyond anyone's comprehension.

The kinds of examples people used to motivate frame-problem stories in the days of GOFAI in the 20th century are routinely solved by AI systems today.

Interesting, well maybe I'm off base then.
