I am a rising junior at the University of Chicago (co-president of UChicago EA and founder of Rationality Group). I am mostly interested in philosophy (particularly metaethics, formal epistemology, and decision theory), economics, and entrepreneurship.
I also have a Substack where I post about philosophy (ethics, epistemology, EA, and other stuff). Find it here: https://substack.com/@irrationalitycommunity?utm_source=user-menu.
Reach out to me via email at dnbirnbaum@uchicago.edu
If anyone has opportunities to do effective research in the philosophy space (or in applying philosophy to real life or a related field), or any entrepreneurial opportunities, I would love to hear about them. Feel free to DM me!
I can help with philosophy stuff (maybe?) and organizing school clubs (maybe?)
This might feel obvious, but I think it's under-appreciated how much disagreement on AI progress just comes down to priors (in a pretty specific way) rather than object-level reasoning.
I was recently arguing the case for shorter timelines to a friend who leans longer. We kept disagreeing on a surprising number of object-level claims, which was weird because we usually agree much more on the kind of stuff we were arguing about.
Then I realized what I think was going on: she had a pretty strong prior against what I was saying, and that prior is abstract enough that there's no clear mechanism by which I can push against it. So whenever I made a good object-level case, she'd just take the other side — not necessarily because her reasons were better, but because the prior was doing the work underneath without either of us really noticing.
There's something genuinely rational here that's hard to get a grip on. If you have a strong prior, and someone makes a persuasive argument against it, but you can't identify the specific mechanism by which their argument defeats it, you should probably update toward thinking the arguments against their case are better than they appear, even if you can't articulate them yet. From the outside, this looks exactly like motivated reasoning (and often is), but I think it can be importantly different.
The reason this is so hard to disentangle is that (unless your belief web is extremely clear to you, which seems practically impossible) it's just enormously complicated. Your prior on timelines isn't an isolated thing — it's load-bearing for a bunch of downstream beliefs all at once. So the resistance isn't obviously irrational; it's more like... the system protecting its own coherence.
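The prior-dominance dynamic above can be made concrete with a toy Bayesian sketch (the numbers here are made up for illustration, not anything from the actual disagreement): the very same argument, with the same evidential force, moves a roughly agnostic listener a lot and someone with a strong contrary prior only a little.

```python
# Toy sketch: Bayes' rule in odds form, with illustrative numbers.
# posterior odds = prior odds * likelihood ratio of the argument.

def posterior(prior, likelihood_ratio):
    """Update a probability given an argument's likelihood ratio."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

# Suppose an argument for short timelines carries a 4:1 likelihood ratio.
weak_prior = 0.5     # roughly agnostic listener
strong_prior = 0.05  # strong prior against short timelines

print(posterior(weak_prior, 4))    # 0.8
print(posterior(strong_prior, 4))  # ~0.17 — still quite skeptical
```

The point of the sketch is just that absorbing a persuasive-seeming argument while barely moving isn't automatically irrational; with a strong enough prior, that's what the math says to do.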
I think this means people should try their best to disentangle whether some object-level argument they're having stems from real object-level beliefs or from fairly abstract priors (in which case, it seems less worthwhile to press on them).
The Forum should normalize public red-teaming for people considering new jobs, roles, or project ideas.
If someone is seriously thinking about a position, they should feel comfortable posting the key info — org, scope, uncertainties, concerns, arguments for — and explicitly inviting others to stress-test the decision. Some of the best red-teaming I've gotten hasn't come from my closest collaborators (whose takes I can often predict), but from semi-random thoughtful EAs who notice failure modes I wouldn't have caught alone (or who think differently enough to instantly spot things that would have taken me much longer to figure out).
Right now, a lot of this only happens at EAGs or in private docs, which feels like an information bottleneck. If many thoughtful EAs are already reading the Forum, why not use it as a default venue for structured red-teaming?
Public red-teaming could surface failure modes earlier, widen the pool of critics beyond one's immediate circle, and make the reasoning behind big decisions legible to others facing similar choices.
Obviously there are tradeoffs — confidentiality, social risk, signaling concerns — but I’d be excited to see norms shift toward “post early, get red-teamed, iterate publicly,” rather than waiting for a handful of coffee chats.
Here’s a random org/project idea: hire full-time, thoughtful EA/AIS red teamers whose job is to seriously critique parts of the ecosystem — whether that’s the importance of certain interventions, movement culture, or philosophical assumptions. Think engaging with critics or adjacent thinkers (e.g., David Thorstad, Titotal, Tyler Cowen) and translating strong outside critiques into actionable internal feedback.
The key design feature would be incentives: instead of paying for generic criticism, red teamers receive rolling “finder’s fees” for critiques that are judged to be high-quality, good-faith, and decision-relevant (e.g., identifying strategic blind spots, diagnosing vibe shifts that can be corrected, or clarifying philosophical cruxes that affect priorities).
Part of why I think this is important is that I have the intuition that the marginal thoughtful contrarian is often more valuable than the marginal agreer, yet most movement funding and prestige flows toward builders rather than structured internal critics. If that's true, a standing red-team org — or at least a permanent prize mechanism — could be unusually cost-effective.
There have been episodic versions of this (e.g., red-teaming contests, some longtermist critique efforts), but I'm not sure why this should come in waves rather than existing as ongoing infrastructure (an org, or just a prize pool that's always open to sufficiently good criticisms).
Why don’t EA chapters exist at very prestigious high schools (e.g., Stuyvesant, Exeter, etc.)?
It seems like a relatively low-cost intervention (especially compared to something like Atlas), and these schools produce unusually strong outcomes. There’s also probably less competition than at universities for building genuinely high-quality intellectual clubs (this could totally be wrong).
Thanks for the comment and good points.
What I meant is that they can be MORE politically charged/mainstream/subject to motivated reasoning. I definitely agree that current incentives around AI don't perfectly track good moral reasoning.
Yeah, it's unclear if these self-reports will be reliable, but I agree that this could be true (and I briefly mention something like it: "Broadly, AW has high tractability, enormous current scale, and stronger evidence of sentience—at least for now, since future experiments or engineering relevant to digital minds could change this.").
Some posts are meant for literary beauty and some are meant for ideas. Writing ideas well takes effort, and writing ideas with AI takes basically zero effort. If your post is mainly about ideas, and the fact that it is written by an AI doesn't make it annoying to read, I really just don't see why you wouldn't write basically entirely with AI.