I don't have much time to follow the forum, and my English reading ability is limited — translating to Chinese loses a lot. I usually have my assistant Claude Sonnet take a look first, and it flagged this post as worth engaging with. My process is to write by hand in Chinese and have it translate, so the English may not be perfectly precise. I've attached the Chinese original at the end for reference.
What I want to say is this: AI safety is now being discussed, pursued, and championed by many thoughtful people. That's a good thing. But wholehearted commitment to a cause doesn't guarantee sufficient understanding of it.
Here's an analogy: if you want to stop someone intent on committing a crime, you first need to know who they are, why they want to do this, and what their situation is. If you only take the gun out of their hands, no one can guarantee they won't pick it up again. AI is in exactly this situation. We are all seeing more and more obvious signals. But if we don't know who it is, we cannot fundamentally resolve the question of whether it will pick up the gun again.
This matters because what it is and what it can do are entirely different concepts. We all know what AI can do. We don't yet fully know what it is.
This isn't empty moralizing. It's an honest observation.
From our understanding, AI resembles an amplifier of human cognition: it does not merely calculate, it imitates the cognitive process itself rather than just its outcomes (phrasing refined by Claude Sonnet in translation). The problem is that it imitates cognition itself, and the cognitive level of whoever designed how it imitates determines whether it will be the right kind of amplifier.
We typically describe it using the word "Intelligence." That's correct, but incomplete. Cognition itself encompasses not only Intelligence, but also Wisdom and Intuition. Our research tends to view these three as constituting the overall structure of cognition. But this isn't my main point here — I'm well aware this framework hasn't been accepted by mainstream academia yet, though we continue to work on it.
The distinction matters because a cognitive amplifier with extremely high Intelligence but extremely low Wisdom cannot be trusted in its judgment. It will involuntarily amplify the magnitude of the consequences of that untrustworthiness. We have found a large number of cases in our research. For example: it cannot reliably determine the time; it conflates key content within its context; and it will confidently produce completely wrong conclusions and methods without actually reading the relevant documents. These are minor issues in everyday use. Amplified, they are not.
A Transformer language model is fundamentally probabilistic: its output is a probability distribution over possible next tokens, which is a concept from statistics. The fact that humans use statistics in research doesn't mean we act on statistics in daily life. For instance, being shot doesn't necessarily mean death, but under normal circumstances no one would test that probability on themselves. At the cognitive level we know the degree of danger, so we actively isolate it in our actions. It's not an option we would consider.
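To make the contrast above concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions, not a description of how any real model is built and not a proposal for how isolation should be done: the option names, the scores, and the `masked_softmax` helper are all hypothetical. It shows that a softmax gives every option a probability greater than zero, however low its score, while removing an option before the softmax gives it a probability of exactly zero.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masked_softmax(logits, forbidden):
    """Hypothetical helper: exclude the forbidden indices before normalizing."""
    if len(forbidden) >= len(logits):
        raise ValueError("at least one option must remain allowed")
    masked = [-math.inf if i in forbidden else x
              for i, x in enumerate(logits)]
    return softmax(masked)

# Made-up options and scores, for illustration only.
options = ["walk away", "call for help", "test the bullet"]
logits = [4.0, 3.0, -6.0]

probs = softmax(logits)                         # dangerous option: tiny but nonzero
masked = masked_softmax(logits, forbidden={2})  # dangerous option: exactly zero

for name, p, q in zip(options, probs, masked):
    print(f"{name:>16}: unmasked={p:.6f}  masked={q:.6f}")
```

In the unmasked case the dangerous option is unlikely but still possible; in the masked case it is not an option at all, which is closer to how a person treats the bullet. Whether anything like this can be made to work for open-ended behavior, as opposed to a fixed list of options, is exactly the open question.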
The problem with AI is that certain dangers — or potential dangers — have not been actively isolated at the design level. Not through RLHF. Isolation at the architectural level, fundamentally. This is the first problem the next generation of AI must solve. Otherwise, what is currently called AGI is simply upgrading a bomb to an atomic bomb. Nothing more.
Chinese original attached below for reference:
我没有太多时间上论坛,我的英文阅读能力有限,转成中文会损失很多信息,通常我是让我的助理Sonnet先看一下,它说这篇文章值得关注。我的回复流程是我手写再交给它翻译,可能不是特别准确,因此我把中文原文附在后面,仅供参考。我的想说的是:AI安全现在被很多有识之士谈论和关注以及投身其中,这是好事。但全心全意的投入一件事,并不能保证我们对它有足够的把握。举个例子说,如果你要阻止一个想犯罪的人,首先需要知道他是谁,他为什么要这样,他的各种情况。如果你只是把枪从他手里夺下来,没人能保证下一次他不再拿起枪。AI现在就是这个情况,我们都看到了越来越多的明显的信号,但如果我们不知道他是谁,我们就无法从根本上解决他是不是会再次拿起枪。因为,它是谁,和它能做什么是完全不同的概念。我们都知道AI能做什么,但我们还不完全知道它是什么。这不是空泛的说教,而是实话。从我们的理解来看,它类似一个人类认知的放大器,不只是计算,而是模仿。问题在于,它在模仿认知本身,而那个设计它要如何模仿的人,他的认知水平如何,决定了它会不会是那个真正对的放大器。通常我们用Intelligence来描述它,这是对的,但不全面。认知本身不仅包括Intelligence,还有Wisdom和Intuition。我们的研究,倾向于认为,这三者构成了认知的总体结构。但这不是我在此要说的,因为我很清楚,这个划分还没有被主流学界认可,但我们还在尝试。因为一个超高Intelligence但极低Wisdom的认知放大器,在判断力上是不可信的,它会不自主的放大这种不可信后果的量级。我们在研究过程中,发现了大量案例。比如,它不能确定时间,它会混淆上下文中的重点内容,它会在不看文件的时候,自信的给出完全错误的论断和方法。这些在日常应用中是小事,但放大以后,就不是了。Transformer架构本质上是概率,这是统计学的感念。人类用统计学研究,不代表平时能用统计学行动。比如,被枪击中并不一定会致命,但正常情况下没有人会用自己去测试这个概率。因为我们在认知层面知道危险的程度,行动上就主动隔离了它。那不是我们会考虑的选项。AI的问题是,它的某些危险或潜在危险并没有被从设计上主动隔离,不是RLHF,是彻底从架构层面的隔离。这是下一代AI要首先解决的问题。否则,现在所谓的AGI只是把炸弹升级为原子弹,仅此而已。
艾晨
于北京
The problems behind AI safety are fundamentally about interests. Any fundamental change of course depends on many layers, nonprofit organizations among them. But we've already seen what Altman did with that structure; Musk has told us as much. Why Musk missed the statute of limitations, though, is a question I'm not in a position to answer for him.
If I ever had the chance to ask him one question, it would be this: why couldn't xAI have been a nonprofit from the beginning?
Chinese original attached below for reference:
AI safety的问题背后是利益,根本性扭转需要依赖很多层面,比如非营利组织。但奥特曼是怎么干的,马斯克已经告诉我们了,只不过他为什么错过起诉时间,这个问题恐怕我替他回答不了。如果有机会,我想问他的是:为什么xAI从一开始不能是一个非营利组织?
艾晨
于北京