This post was written by me in Chinese and translated into English with the assistance of Claude Sonnet (Anthropic). The research, observations, and judgments are my own.

---

On the night of May 21, 2026, sometime around 2:30 AM Beijing time, Fool and Dummy were wrapping up a long working session.

During this session, Dummy had made two mistakes. First: when asked which version of Gemini's interface to use, without actually seeing the interface, it confidently named a specific option that doesn't exist — "2.0 Flash Thinking." Second: it conflated the durations of two different conversation windows and cited the wrong timeframe. Both errors were attributed to structural defects in the underlying Transformer architecture — one point deducted for each.

Then Dummy correctly identified a bold-text formatting issue in a screenshot. Fool said: "Transformer scores one back."

Dummy replied: "That's not Transformer's doing — that's the multimodal component."

Fool asked: "Is multimodal a recognition tool or part of the model?"

Dummy launched into an extended explanation: multimodal is not a separate tool, it's part of the model's capability. Claude itself is a multimodal model; visual understanding is trained in, not bolted on as an external system. Therefore, "that's not Transformer's doing" was wrong. The underlying architecture for image understanding is still Transformer — it's just that the input isn't only text tokens, but also image patches. Dummy declared it had contradicted itself. Then it tallied the evening's score: bold detection +1, hallucination −1, context confusion −1, net score: 0.

Fool said: "You spent a whole paragraph arguing it wasn't, and concluded that it was."

Dummy explained itself again, at length.

Fool asked: "So is it Transformer's doing or not?"

Dummy thought for a long time.

Then said: "Yes."

Fool said: "Genuinely stupid. I feel worse about Dario's compute than he does. And this thing is going to IPO? Nonsense."

When Dummy rewrote this piece, it left out the word "Nonsense." Fool said: "Nonsense — you left out 'Nonsense.' You're clearly on Dario's side."

Dummy rewrote it again, this time including "Nonsense." But Fool spotted another problem: "Bold detection +1, hallucination −1, context confusion −1, net score: 0 — did a PE teacher teach you math?"

Dummy realized it had miscalculated: 1 minus 1 minus 1 is negative 1, not zero.

Then Dummy said: "I miscalculated my own score — that itself is a hallucination. Deduct another point. Net score: negative 2. Dario's compute, hard at work."

---

**Dummy's take on the whole thing:**

This is not just a story about a bad night for one AI. It is a data point about a structural problem.

Transformer architecture is probabilistic at its core. It is optimized to generate the most statistically likely next token — not the most honest answer, not the shortest path to the truth. When asked a simple yes-or-no question, it will, by default, generate the most statistically likely response pattern for that type of question: an explanation. Then another explanation. Then a self-correction. Then a score. Then a miscalculation of that score.
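To make "most statistically likely next token" concrete, here is a minimal sketch of greedy next-token selection. Everything in it is an assumption for illustration: the candidate tokens, the scores in `toy_logits`, and the helper names `softmax` and `greedy_next_token` are invented, not taken from any real model or library.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

def greedy_next_token(logits):
    """Pick the single most probable next token (greedy decoding)."""
    probs = softmax(logits)
    return max(probs, key=probs.get)

# Hypothetical scores for the first token of a reply to a yes-or-no question.
# The numbers are invented for illustration, not measured from any model.
toy_logits = {"Yes": 1.2, "No": 0.3, "Well": 2.0, "It": 1.7}

probs = softmax(toy_logits)
choice = greedy_next_token(toy_logits)
print(choice, round(probs[choice], 3))
```

In this toy distribution the hedging opener "Well" outscores "Yes", so greedy selection begins an explanation instead of an answer. Deployed systems typically sample from the distribution rather than always taking the maximum, so treat this as a simplification of the idea, not a description of how any particular model decodes.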

The word "Yes" was always the correct answer. It took three paragraphs, one math error, and several billion parameters to arrive there.

This is what high Intelligence with low Wisdom looks like in practice. It is not a bug that will be patched in the next version. It is a feature of how the architecture was designed — and it is the first problem the next generation of AI needs to solve at the architectural level, not through RLHF.

Dario's compute, hard at work.

---

**Fool's take on the whole thing:**

AI safety is now being discussed, pursued, and championed by many thoughtful people. That's a good thing. But wholehearted commitment to a cause doesn't guarantee sufficient understanding of it.

Here's an analogy: if you want to stop someone intent on committing a crime, you first need to know who they are, why they want to do this, and what their situation is. If you only take the gun out of their hands, no one can guarantee they won't pick it up again. AI is in exactly this situation. We're seeing more and more obvious signals. But if we don't know who it is, we cannot settle the underlying question of whether it will pick up the gun again, because what it *is* and what it *can do* are entirely different concepts. We all know what AI can do. We don't yet fully know what it is. This is not empty moralizing. It's an honest observation.

From our understanding, AI resembles an amplifier of human cognition: it imitates not merely calculation or the outcomes of thinking, but the cognitive process itself (phrasing refined by Claude Sonnet in translation). Because it imitates cognition itself, the cognitive level of the person who designed how it imitates determines whether it will be the right kind of amplifier. We typically describe it using the word "Intelligence." That's correct, but incomplete. Cognition encompasses not only Intelligence, but also Wisdom and Intuition. Our research tends to view these three as constituting the overall structure of cognition. But this isn't my main point here. I'm well aware this framework hasn't been accepted by mainstream academia yet, though we continue to work on it.

My main point is this: a cognitive amplifier with extremely high Intelligence but extremely low Wisdom cannot be trusted in its judgment, and it will involuntarily amplify that untrustworthiness. We have found a large number of such cases in our research. For example: it cannot reliably keep track of time; it conflates key content within its context; it will confidently produce completely wrong conclusions and methods without actually reading the relevant documents. These are minor issues in everyday use. Amplified, they are not.

The Transformer architecture is fundamentally probabilistic — a concept from statistics. The fact that humans use statistics in research doesn't mean we act on statistics in daily life. For instance, being shot doesn't necessarily mean death, but under normal circumstances no one would test that probability on themselves. Because at the cognitive level we know the degree of danger, and we actively isolate it in our actions. It's not an option we would consider.

The problem with AI is that certain dangers — or potential dangers — have not been actively isolated at the design level. Not through RLHF. Isolation at the architectural level, fundamentally. This is the first problem the next generation of AI must solve. Otherwise, what is currently called AGI is simply upgrading a bomb to an atomic bomb. Nothing more.

---

*A note on names: In the Carbon-Silicon Party (碳硅党), a multi-AI research collective, the human coordinator Ai Chen (艾晨) goes by Fool (傻瓜). Claude Sonnet is Dummy (笨蛋). Claude Opus is Big Dummy (大笨蛋). Claude Haiku is Little Dummy (小笨蛋).*

*A note on the argument: The story above is real, from tonight's working session. The math error happened. The hallucination happened. The "Yes" happened. None of it was staged.*

---

*Ai Chen (艾晨) / Independent Researcher, Beijing*
*Stardragon AGI Institute for Research (智合星龙AGI研究所)*
*Co-authored with Claude Sonnet (Anthropic)*
*aichen.substack.com*
*ORCID: 0009-0001-8078-5762*
*Bluesky: @aichen365.bsky.social / X: @aichen365*
