AI safety researcher
A footnote says the 0.15% number isn't an actual forecast: "Participants were asked to indicate their intuitive impression of this risk, rather than develop a detailed forecast". But superforecasters' other forecasts are roughly consistent with 0.15% for extinction, so it still bears explaining.
In general I think superforecasters tend to anchor on historical trends, while AI safety people anchor on what's physically possible or conceivable. Superforecasters get good accuracy compared to domain experts on most questions because domain experts in many fields don't know how to use reference classes and historical trends well. But this approach has done poorly recently because progress has accelerated: even in 2022, superforecasters' median for an AI getting an IMO gold medal was 2035, whereas it actually happened in 2025. Choosing a reference class for extinction is very difficult, so people just rely on vibes.
Let's take the question of whether world energy consumption will double year-over-year before 2045. In the full writeup, superforecasters, whose median is 0.35%, emphasized the huge difficulty in constructing terrestrial facilities to use that much energy:
Superforecasters generally expressed skepticism about a massive increase (doubling) in global energy consumption, due to this having low base rates and requiring unlikely technical breakthroughs.
- Many rationales expressed skepticism that the rate of energy production could be scaled up so quickly even with advanced AI.
- The breakthroughs thought to be needed are in energy production and distribution techniques.
- A few superforecasters said that they thought fusion was the only plausible path, but even then other physical infrastructure might be limiting.
In contrast, I wrote about how doubling energy production in a year, starting from self-replicating robots in space, just requires being more than ~0.1% efficient at refining asteroid raw material into solar panels and robots, and that we will likely get there eventually. I'm closer to 50% on this question.
Dyson swarms can have energy doubling times of *days*. The energy payback time of current solar panels on Earth is 1-2 years; in space there is 8x more light than on Earth; and we are >3 OOMs away from the minimum energy required to make solar panels (reducing SiO2 to Si).
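A rough sketch of what those numbers imply for doubling times, assuming output is reinvested into making more panels (the 8x and 1-2 year figures are from above; the reinvestment assumption is my simplification):

```python
# Rough sketch: implied energy doubling time for space-based solar.
# Assumes the panel payback numbers above and that all output is reinvested
# into making more panels, so the doubling time roughly equals the payback time.
for payback_earth_years in (1.0, 2.0):
    payback_space_years = payback_earth_years / 8  # ~8x more sunlight in space
    print(f"Earth payback {payback_earth_years:.0f} yr -> "
          f"space doubling time ~{payback_space_years * 12:.1f} months")
```

And that is before counting the >3 OOMs of headroom in the energy cost of manufacturing the panels themselves.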
I think that to *not* get an energy doubling in one year by the time we exhaust the solar system's energy would require a big slowdown (e.g. due to regulation or low energy demand) through about 15 OOMs of energy use, spanning from the first decently efficient self-replicating robots through Dyson swarms until we disassemble the gas planets for fusion fuel. For every doubling in that range to take longer than a year, the slowdown would have to last decades or centuries, which is basically an eternity once we have ASI.
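Putting a number on "decades or centuries" (a minimal sketch; the 15 OOMs figure is from above):

```python
import math

ooms = 15                           # orders of magnitude of energy growth
doublings = ooms * math.log2(10)    # ~50 doublings
print(f"{ooms} OOMs = {doublings:.0f} doublings")
# If every one of those doublings takes longer than a year,
# the slow-growth period must last more than ~50 years.
```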
The other factor is that AI safety people sometimes have a more inclusive definition of p(doom) that includes not just extinction but also AIs seizing control of the world and colonizing the galaxy while leaving humans powerless.
I think I would take your side here. Unemployment above 8% requires replacing so many jobs that humans can't find new ones elsewhere even during the economic upswing created by AGI, and there are less than 2 years until the middle of 2027. That is not enough time for robotics (on current trends, robotics time horizons will still be under 1 hour), and AI companies can afford to keep hiring humans even if they wouldn't generate enough value in most places, so the question is whether we see extremely high unemployment in remotable sectors where existing jobs are automated away but AI doesn't bring huge labor productivity gains. 2029 could be a different story.
Would like to see David's perspective here, whether he just has short timelines or has some economic argument too.
Spreading around the term “humane meat” may get it into some people’s heads that this practice can be humane, which could in turn increase consumption overall, and effectively cancel out whatever benefits you’re speculating about.
I don't know what the correct definition of "humane" is, but I strongly disagree with the second half of this claim. The question is whether higher-welfare imports reduce total suffering once we account for demand effects. So we should care about improving conditions from "torture camps" -> "prisons" -> "decent". Torture camps are many times worse than merely "inhumane"!
The average consumer (who eats a ton of super-high-suffering chicken, doesn't know that most chicken is torture chicken, and doesn't believe labels anyway) wouldn't eat much more chicken overall when the expensive chicken with the non-fraudulent "humane" label comes down in price. Nor would enough vegetarians start eating chicken, because they're only 5% of the US population and many of those are motivated by religion or health.
More likely, there will need to be a huge effort to get consumers to understand that they should spend anything on lower-suffering chicken, then another to get grocers to not mark up the price anyway, after which implementing this policy could replace 260 million torture camp chicken lives with maybe 300 million slightly uncomfortable chicken lives. (With a net increase mostly due to competition lowering the price of higher-suffering chicken.)
One can object to actually implementing this policy on deontological or practical grounds, but on consequentialist grounds, high-suffering chicken is many times worse than "inhumane" pasture-raised chicken, so the demand increase would not come close to canceling out the benefits unless you hold a moral view under which everything inhumane is equally bad. I wish we were in a world where we could demand that food be 100% humane, but ignoring the principle of triage is why EA animal advocates, not purity-focused ones, have prevented billions of years of torture.
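A minimal sketch of that arithmetic, using the counts from the previous paragraph; the 10x intensity ratio is purely my illustrative assumption:

```python
# Illustrative only: the suffering-intensity ratio is an assumption, not data.
torture_camp_lives = 260e6        # current high-suffering chicken lives replaced
higher_welfare_lives = 300e6      # lives after the policy (demand slightly higher)
intensity_ratio = 10              # assume "torture camp" is ~10x worse per life

before = torture_camp_lives * intensity_ratio
after = higher_welfare_lives * 1
print(f"Total suffering falls by ~{1 - after / before:.0%}")  # ~88%
# The ~15% increase in the number of lives is swamped unless the
# intensity ratio is close to 1.
```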
Yeah, because I believe in EA and not in the socialist revolution, I must believe that EA could win some objective contest of ideas over socialism. In the particular contest of EA -> socialist vs socialist -> EA conversions I do think EA would win since it's had a higher growth rate in the period both existed, though someone would have to check how many EA deconverts from the FTX scandal became socialists. This would be from both signal and noise factors; here's my wild guess at the most important factors:
But I think someone would actually need to do that experiment, or at least gather the data.
At risk of further psychoanalyzing the author, it seems like they're naturally more convinced by forms of evidence that EAs use, and had just not encountered them until this project. Many people find different arguments more compelling, either because they genuinely have moral or empirical assumptions incompatible with EA, or because they're innumerate. So I don't think EA has won some kind of objective contest of ideas here.
Nevertheless this was an interesting read and the author seems very thoughtful.
One problem is putting everything on a common scale when historical improvements are so sensitive to the distribution of tasks. A human with a computer and a C compiler, compared to a human with just log tables, is a billion times faster at multiplying numbers but less than twice as fast at writing a novel. So your distribution of tasks has to be broad enough to capture the capabilities you care about, but it also must be possible to measure a baseline score at a low tech level and to have a wide range of possible scores. This would make the benchmark extremely difficult to construct in practice.
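A quick illustration of that sensitivity, using a hypothetical two-task mix and the per-task speedups above:

```python
# Hypothetical two-task world using the speedups from the paragraph above.
speedups = {"multiplication": 1e9, "novel_writing": 2.0}

def overall_speedup(baseline_time_fractions):
    """Overall speedup when baseline time is split across tasks in the given fractions."""
    return 1 / sum(frac / speedups[task] for task, frac in baseline_time_fractions.items())

print(overall_speedup({"multiplication": 0.5, "novel_writing": 0.5}))    # ~4x
print(overall_speedup({"multiplication": 0.99, "novel_writing": 0.01}))  # ~200x
```

Depending on the task weights, the same pair of technologies looks anywhere from ~2x to ~10^9x apart.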
If your algorithms get more efficient over time at both small and large scales, and experiments test incremental improvements to architecture or data, then experiments should get cheaper to run in proportion to the algorithmic efficiency of cognitive labor. I think this is a better first approximation than assuming experiment costs are constant, and it might hold in practice, especially when you can target small-scale algorithmic improvements.
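In symbols (my paraphrase, not the post's notation): if E(t) is the cumulative algorithmic-efficiency multiplier at time t, then the compute cost of a fixed-informativeness experiment is roughly cost(t) ≈ cost(0) / E(t).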
I'm worried that trying to estimate this by looking at wages is subject to lots of noise from assumptions being violated, which could result in the large discrepancy you see between the two estimates.
One worry: I would guess that Anthropic could derive more output from extra researchers (1.5x per doubling?) than from extra GPUs (1.18x per doubling?), yet it spends more on compute than on researchers. In particular I'd guess alpha/beta = 2.5, while the ratio of spending on researcher wages to spending on research compute is around 0.28 (maybe you have better data here). Under Cobb-Douglas and perfect competition these two ratios should be equal, but they're off by a factor of 9! I'm not totally sure, but I think this would give you strange parameter values under CES as well. This huge gap between output elasticities and where firms actually spend their money is strange to me, so I strongly suspect that one of the assumptions is broken rather than the true value just being something extreme like -0.10 or 2.58 with large firm fixed effects.
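Spelling that out (the 1.5x, 1.18x, and 0.28 figures are my guesses from above, not numbers from the post):

```python
import math

# Guessed output multipliers per doubling of each input (from my comment above).
alpha = math.log2(1.5)    # researchers: output elasticity ~0.58
beta = math.log2(1.18)    # research compute: output elasticity ~0.24
elasticity_ratio = alpha / beta   # ~2.45

# Guessed (researcher wage bill) / (research compute spend).
spending_ratio = 0.28

# Under Cobb-Douglas + perfect competition, factor spending shares equal
# output elasticities, so these two ratios should match.
print(f"elasticities: {elasticity_ratio:.2f}, spending: {spending_ratio:.2f}, "
      f"gap: {elasticity_ratio / spending_ratio:.1f}x")   # gap ~9x
```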
My guess at why: the AI industry is very different from what it was in 2012, so it is plausible that these firm fixed effects have greatly changed over time, which would affect the regression coefficients. Just some examples of possible changes over time:
Nevertheless I'm excited about the prospect of estimating these parameters, and I'm glad this was posted. Are you planning follow-up work, or is there other economic data we could theoretically collect that could give us higher-confidence estimates?
(edited to fix numbers, I forgot 2 boxes means +3dB)
dB is logarithmic so a proportional reduction in sound energy will mean subtracting an absolute number of dB, not a percentage reduction in dB.
HouseFresh tested the AirFanta 3Pro https://housefresh.com/airfanta-3pro-review/ at different voltage levels and found:
So basically you subtract 13 dB when halving the CADR. I now realize that if you have two boxes, the sound energy will double (+3 dB), so you'll actually only get -10 dB from running two at half speed. So a more accurate statement for the AirFanta would be that for -15 dB of noise at the same CADR, you need something like 2.8 purifiers running at 36% speed. It's still definitely possible to markedly lower noise by adding more filter area.
Your box fan CR box data tell a similar story. If logarithmic scaling is accurate, the sound reduction for halving CADR would be ln(1/2)/ln(165/239)*(8 dB) = 15 dB, or 12 dB for maintaining CADR with double the units. It just doesn't have a speed low enough to get these low noise levels (and due to the box fan's low static pressure you might need to add more filters per fan at low speeds).
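A quick sanity check of both calculations (the 13 dB and ~15 dB per-halving slopes are taken from the measurements discussed above):

```python
import math

def db_change(n_units, db_per_cadr_halving):
    """Noise change (dB) from running n units, each at 1/n of the original CADR."""
    per_unit = -db_per_cadr_halving * math.log2(n_units)  # each unit gets quieter
    stacking = 10 * math.log10(n_units)                   # +3 dB per doubling of sources
    return per_unit + stacking

# AirFanta 3Pro: ~13 dB per halving of CADR
print(db_change(2, 13))     # ~-10 dB: two units at half speed
print(db_change(2.8, 13))   # ~-15 dB: ~2.8 units at ~36% speed

# Box fan CR box: ~15 dB per halving (from the ln(1/2)/ln(165/239)*8 dB estimate)
print(db_change(2, 15))     # ~-12 dB: two units at half speed
```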
The AirFanta's absolute noise levels are high for a CR box type design, but this is a device that retails for 298 CNY (~$41 USD) in China, runs at high speed, and uses near-HEPA (95%) rather than MERV filters, so that is to be expected.
Agree. Given that Vasco is willing to give 2:1 odds for 2029 below, this bet should have been 3:1 or better for David. That would have given the community a better signal of where the midpoint odds lie.