Edison

Software Engineer

4 karma · Joined Jun 2026

Posts
1

Situational Awareness: A Two-Year Scorecard

Edison

· 5d ago · 6m read

Comments
3

Situational Awareness: A Two-Year Scorecard

Edison · 4d ago · 1

This is a fair push, and it splits into two questions that I think have different answers.

On "miss vs clear miss": I take the point. My "Wrong" is graded against the specific claim that open source would fade, and being 3-6 months behind the frontier, with the pricing collapse, is the opposite of fading. But you're right that "fade" and "no durable moat" aren't the same proposition, and a reader could reasonably hold that the first is wrong while the second is open. I'd defend "clear miss" on the narrow wording, but I won't pretend the margin is huge, which is exactly why it's the one verdict with an explicit flip condition (back to Open if the gap re-widens past ~18 months for two straight generations).

On the arms-race / moat point — this is the more interesting one, and I think we're partly talking past each other. "Investors think proprietary has a moat" and "there is a moat that reduces arms-race urgency" can both be true or both be false independently. Capex pouring in is consistent with a capability lead (frontier labs ship first) without implying a diffusion moat (the lead staying scarce). Aschenbrenner's geopolitical argument needs the second, not the first — the worry was that locking down weights/algorithms denies adversaries the capability. If a near-frontier open model is downloadable months later, the lockdown buys time, not denial. So I'd actually frame your closing line as the open question rather than the settled one: are the open models close enough to change arms-race urgency? I think mid-2026 evidence leans yes more than the 2024 essay assumed, but I hold that loosely and it's the part I'd most like to be wrong about.

Where would you put the gap that would make you say the moat is real — months, or capability tiers?

How did Leopold do? Evaluating Situational Awareness's predictions

Edison · 5d ago · 1

This thread is exactly the kind of scrutiny I was hoping for — I graded "open-source fades" as his clearest miss on a live scorecard I built (https://agiscorecard.com), but your point about distillation muddying the picture makes me wonder if "wrong" is too strong vs "right mechanism, wrong conclusion." Curious where you'd land.

How did Leopold do? Evaluating Situational Awareness's predictions

Edison · 5d ago · 3

Inspired by this post (and the one-year retrospective Rasool linked), I built a continuously-updated version: a live scorecard grading each prediction as evidence comes in — https://agiscorecard.com

Current tally: 3 on track (capability trajectory, scaling pace, capex), 1 graded wrong (open-source fading: DeepSeek V4 and Qwen are now ~3-6 months behind the frontier), 2 still open (AGI-by-2027, The Project). It also puts his 2027 AGI date side by side with forecasts from Metaculus, Samotsvety, Hassabis, and the academic survey median.

Reading the thread here, the open-source verdict seems to be the most contested one — huw and JoshYou make points cutting both ways. I'd genuinely welcome pushback on any grading; the whole thing only works if the verdicts survive scrutiny.
