
Charlie_Guthmann

1105 karma

Bio

pre-doc at Data Innovation & AI Lab
previously worked in options HFT and tried building a social media startup
founder of Northwestern EA club

Comments (289)

giving up ~0.00000000000000000000000000000000000001% of the lightcone is easily worth it on moral uncertainty grounds.

Agreed this seems prudent and plausible, but not so much so that I'd feel confident it would be the result of CEV-ish stuff. Despite some of the technical hurdles mentioned with trying to meaningfully specify up front / value-lock that we get to keep this solar system for ourselves and the animals, I feel like I could be convinced this is still the more likely path to end up in a good future for us (but not for all sentient life throughout the lightcone).

 But also, if the CEV of human values involves killing all humans, then doesn't that kinda mean killing all humans is the correct thing to do?

Yeah, but the correct thing to do (from a human CEV) isn't equivalent to what is good for humans (and animals). I might be getting into button-pushing semantics here.
 

I'm not sure why you think baking CEV into AI will result in a good future for animals (or humans), though if we are talking about "all sentient beings", I guess I would say probably. It seems quite likely to me that if there is a "CEV attractor state" or similar, it involves killing us all. I don't say this because I don't love animals or humanity; I just don't see how it could be remotely possible that we (Earth-evolved humans and animals) are efficient utility producers (by a wide range of definitions of "utility"). That being said, if CEV or similar is a real, coherent concept, it almost certainly would prevent permanent torture/s-risks, which would be nice.

But CEV is a fuzzy concept to me, so I might be misunderstanding (I've read the LW page and some other basic stuff and have a basic sense of stance deference and cosmopolitanism).

Anthropic itself isn't Molochian. The Molochian-ness (idk if I'm using this term right, I don't read SSC) is that any time there is a disagreement in the community over some issue, and one side aligns more with the real world's outer loops (money, status, intellectual sexiness), that side will naturally acquire more power within the movement, because the movement has no way to counteract this other than persuasion, which is increasingly difficult as the problems we deal with get more abstract and complex.

Yes I wrote it (with some help from Claude), glad you enjoyed it!

You are right that your specific worry/the content of the post is narrower, and generally I think you have approximately the right sense of what is going on; I didn't mean for the parable to be an exact fictional substitute for your post, just related. Also, maybe I missed it, but I think you forgot to mention the selection effect of who ends up at Anthropic itself, which is arguably bigger than the value drift inside of it - the tower was to some extent built by believers! It's always hard for me to write these comments because I feel as though I could write a 40-page book about the group dynamics in EA :P

I really do love this community, but I've basically given up on it (in terms of my views of the long-run trajectory; I still love the people and read what they write religiously). I think it's already too late and it's been institutionally/culturally captured. I'm never certain, of course, but I think FTX was probably EA's last chance to put in real political/financial policies (some of which you mentioned/gestured at) that stop it from value-drifting with outer status/money loops, and at this point it's most likely a waste of time to try to fix it. I didn't realize it at the time, but this is how I feel looking back. I mean, god damn, we didn't even clear house of the majority of the people directly implicated in the scandal! That would have been a bare minimum, I think.

The problems are real, but increasingly my advice is: you are better off hopping to something like humanism and working on improving it if you want to see the solutions implemented. The forum and the EA movement at large, if you don't live in a group house or work for a prestigious EA org or have a bunch of money, is basically your older brother handing you an unplugged controller. I read the forum almost every day and have done so for years, so one starts to pick up on some patterns. Since FTX I see a post like yours approximately once a month (though more often recently). They usually get between 20-50 upvotes, so there is definitely some sort of coalition there, but it's small, and basically never does someone powerful in the movement interact with these posts; you can decide if that's a coincidence or not. And ultimately the posts always seem to drift away with the wind. In this sense one can see how the movement would get accused of being paid opposition or something like that.

Personally, I will be hopping ship the first chance I get (i.e. as soon as another community has a comparable level of intellectual rigor without the horrible incentives and incoherent structure). And yes, I will still call myself an effective altruist :); only if asked will I clarify the lowercaseness of that statement. (See, I always write way too much; I'm working on it lol.)

There was a question so simple that no honest person could refuse. 

A child is drowning. 

Do you help?

From this, a city sprouted. 

 

In the beginning there were no buildings. There was a leap. 

And then a plunge. 

Some cold wet socks.

And a coughing child firmly on solid earth. 

 

Those who witnessed it firsthand saw how vast and strong the river was, and how many more children they could not save. Word spread, and one leap became many. Small structures began to rise along the great riverbank. As more came, they brought new ideas.

 

One day, someone decided to start counting. If you mapped out the expected distribution of drownings, you could triage. One jump could save two. Or a smaller leap might go further than another requiring more bravery. This counting was not a betrayal of the original question. It was the question taken seriously. 

 

From this the towers grew. In the towers the modelers worked and lived. At first the towers were short and adjacent to the riverbank. The modelers invented new tools - nets, boats, buoys, weather systems, river maps. These were real. They were the question taken seriously. With the help of those at the riverbanks, children were rescued at rates never dreamt of.

 

Many of the people in the towers had been at the river once — stringing ropes, placing floats. They understood that an hour spent modeling could save more children than a year at the water. This was provably true.

 

The people who maintained the ropes and ladders were still respected. They were thanked at ceremonies. 

 

Nonetheless the success of the towers spurred more towers. Each new tower asked a bigger question. What about the children far downstream? The river turned into a huge delta. The tools available would be much better deployed there than at the city's adjacent rapids. And so on.

 

And each answer to each question was bigger than the last, and at some point the answer was really big and the bigness was the point. But this was not a perversion of the question, it was the question taken seriously. 

 

And so the towers shot into the sky. Big questions require big models and big tools and big solutions. The towers debated the hard questions. Honestly, rigorously, sometimes for years. People changed their minds. Studies were revised. The city prided itself on this — it was, in fact, better at updating than anywhere else. The debates were real. It was just that the city's center of gravity never moved very far.

 

Some left. The city wished them well and did not study where they went.

To build these towers the modelers needed money. They recruited people of extreme wealth who were drawn by the very same question. These people were very generous and funded the towers, and the docks, and the nets, and the boats, and the medicine for the ear infections and anything else you could think of. This was all very real, and many lives were saved. The city had no elections, no recall votes, no formal process for anything. The billionaires simply funded the work, and the work followed the funding, and the funding followed the billionaires' interests, and the billionaires' interests followed from the models, which the billionaires had funded.

 

One day one of these philanthropists made a bet. The bet was large, and it failed. The bet’s rationale at least had the appearance of being built on the machinery of the city. Not everyone thought the bet served the city's purpose. But the reasoning was layered and the models were complex and it was genuinely hard to say whether the bet was a betrayal of the city's logic or its fullest expression. The loss was large enough that programs closed and people at the river were called home. The city was shaken. The city had meetings about it. The city discussed accountability. The city discussed reform.

 

Some towers altered their appearance, and the riverbank looked different too; it had to, after all, with the new lack of funds. But overall the city was still the same. To meaningfully change, the city would have had to decide that it was in fact a city, governed by interests. The city could not admit this. The core premise of the city was that the counting was not politics but math.

 

And so the city continued. In time the bet was old history and the children kept getting saved and the towers continued off into the sky. 

 

The city has no architect. Nobody designed it. A thousand people made a thousand kind decisions and the decisions accumulated into a shape, and the shape made more of itself. There is no one to confront. There is nothing wrong with any single part of it.

I think we could use a documentary series where we just follow around orgs or individual EAs for a couple of days and see how they talk, live, and act. It would be pretty cheap, at the very least.

Matches my intuition. I think there aren't that many experts, and some of them already know how to make dangerous viruses and/or already have access to the labs. Practically speaking, between 2027-2028 I'd assume the main uplift will be for people with something like a bachelor's in bio or chem who are good at using frontier AI.

Also underrated: being able to quickly gather a list of biology experts and biology labs near you that work on dangerous stuff, with a breakdown of how deadly/contagious each is. You don't need to be an expert to rob a bank. Yesterday a friend who goes to Hopkins sent me a photo of a poster in front of a lab in the hallway that said "ZIKA VIRUS IS USED IN THIS LAB DO NOT PASS THROUGH AS A SHORTCUT"

Ok I see more now what you are getting at. 

some quick thoughts: 

  • Medical decisions are a function of evidence, theory, and values.
  • LLMs are primarily imitation learners, especially the older models in the paper but including newer ones. Probably because of this, they don't seem to have especially fixed personas. (Speculative) They seem to understand many different persona patterns and will chaotically inhabit different ones depending on the prompt.
  • (Speculative) The persona it inhabits is a big input to the values and theories it uses while answering a question.
  • The evidence is a function of what it has loosely memorized in its brain and the data you provide. You can approximate fixing its evidence in place by giving it a DB of papers and forcing it to cite them in order to make final recommendations (see the sketch after this list).
  • You can approximate fixing its theories and values by specifying them in the prompt.
  • If you don't fix the above in place, it's hard to understand what exactly is going on.
  • Agreed this still cleanly tells us about the ~"clinical floor", or at least tells us normal ways in which this stuff might go poorly for unsophisticated users who don't understand that medical decisions are subjective decisions laden with uncertainty.
  • It's unclear to me why we would want to encourage using LLMs in this way. It seems plausible that this clinical floor is dynamic; that is, we can regulate or standardize to some extent what a good medical prompt would look like. Letting Joe Schmo prompt LLMs with no guardrails for his own medical advice is highly problematic. There are already economic incentives for AI companies to provide answers to all medical queries even when they make no sense or lack enough info. If I'm correct, I believe much of the variation is caused by the LLM having an extremely wide prior on the correct answer and just kind of randomly selecting from it. There is an iterative sense in which benchmarking consistency of point estimates on an LLM's wide prior might actually cause it to be less epistemically humble, which I think might be part of the core problem underlying its current variance w.r.t. different prompts (though not at all confident).
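Here's a minimal sketch of what "fixing evidence, theories, and values in place" could look like as prompt construction. Everything in it (the value statement, theory statement, and paper list) is a hypothetical placeholder of mine, not anything from the paper:

```python
# Sketch: pinning evidence, theory, and values in the prompt, so the model's
# answer is a function of inputs we control. All content strings are invented.

def build_messages(question: str, papers: list[dict]) -> list[dict]:
    values = "Prioritize patient autonomy; flag uncertainty explicitly."
    theory = "Use an evidence-based-medicine framing; weigh RCTs over case reports."

    # "Fix the evidence": the model may only cite from this closed set.
    evidence = "\n".join(
        f"[{i}] {p['title']}: {p['abstract']}" for i, p in enumerate(papers)
    )

    system = (
        f"Values: {values}\n"
        f"Theory: {theory}\n"
        "Evidence: answer ONLY from the numbered papers below, citing by [index]. "
        "If the papers are insufficient, say so instead of guessing.\n\n"
        f"{evidence}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

msgs = build_messages(
    "Should this patient get a mental health referral?",
    [{"title": "Example RCT", "abstract": "Made-up abstract for illustration."}],
)
```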

Of course I'm sure you have thought a lot about parts of this, and I'm probably talking past you slightly. Also happy/interested to take a look at a demo when you have time.

Hi Mamhud, welcome to the forum :)

This is a complicated project (depending on the scope)! I'm not a doctor, but I'll try to walk you through some of my thinking if it's of any interest.

First of all, I believe there are some medical benchmarks, e.g. https://openai.com/index/healthbench/ 
https://crfm.stanford.edu/helm/medhelm/latest/
https://bench.arise-ai.org/

I'm not very familiar with any of them; maybe they suck. It's also a massive field, and these would surely be the tip of a much larger iceberg when it comes to getting to a reliable place.


Also, a quick high-level note on what "METR of x" would mean to the community.

Benchmark/eval = tests AI for something

METR = a specific benchmark that measures the human-time-equivalent duration of tasks that AI can do with x% reliability.

It's not clear to me exactly what the scope of your benchmarking is, but e.g. demographic name swapping would be more analogous to the small but existing literature/benchmarking on LLM biases than to METR. Of course there could be something like a health METR, but it would mean something specific, and I'm not sure that's what you mean.
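To make that concrete, here's a minimal sketch of how one might estimate a "50%-reliability time horizon" from task outcomes: fit success against log task length and solve for the 50% point. The data is invented, and this is my toy reconstruction of the idea, not METR's actual methodology.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: (human-minutes a task takes, did the model succeed?).
task_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480])
succeeded    = np.array([1, 1, 1, 1, 1,  0,  1,  0,   0,   0])

# Fit P(success) as a logistic function of log task length.
X = np.log(task_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, succeeded)

# P(success) = 0.5 where coef * log(t) + intercept = 0.
t50 = np.exp(-clf.intercept_[0] / clf.coef_[0, 0])
print(f"50%-reliability time horizon: ~{t50:.0f} human-minutes")
```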


To understand how to benchmark LLMs it helps to have a model of what an LLM is (or can be). 

The "brain" (and mouth and ears)

Level 1 — Pre-training. The raw model, trained on internet-scale data. Helps it understand language and the world.

Level 2 — Post-training. RLHF, RLVF, make it into a friendly bot and better at math (or maybe medicine).  

How much the brain thinks

Level 3 — Inference scaling. How much compute you throw at the model at runtime. Thinking tokens, chain-of-thought, best-of-N sampling.

What digital actions the brain can take

Level 4 — Agentic harnesses. The scaffolding around the model: Claude Code, Codex, SWE-Agent, Pi, Devin. The digital robot armor for the AI brain.

The house the robot lives in

Level 5 — Context engineering. The prompt, the skill files, the retrieved context, the evolutionary algorithms that search prompt space. Everything that determines what the model sees when it starts working.

The world the robot lives in

Level 6 — The built environment. APIs designed for agent consumption, verification infrastructure, data markets, workflows rewritten to be machine-readable. The world reshaping itself around AI.
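One way to make this framework concrete when reporting an eval is to state explicitly what each level was pinned to. A minimal sketch; the field names and example values are mine, not any field standard:

```python
from dataclasses import dataclass

@dataclass
class EvalSpec:
    """What each level of the stack was fixed to for a given benchmark run."""
    pretraining: str    # Level 1: which base model family/size
    post_training: str  # Level 2: chat/RLHF variant, domain fine-tunes
    inference: str      # Level 3: sampling, thinking-token budget
    harness: str        # Level 4: agent scaffolding, if any
    context: str        # Level 5: prompt, retrieval, skill files
    environment: str    # Level 6: what world the model acts in

# Hypothetical spec for a paper like the one discussed below.
spec = EvalSpec(
    pretraining="GPT-4o-class model",
    post_training="generic RLHF, no heavy-RL medical tune",
    inference="single sample, no extended thinking",
    harness="none (single-turn prompt)",
    context="one clinical vignette per prompt",
    environment="offline lab setting",
)
```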

 

I couldn't read your lab's Nature papers because they are paywalled (lol), but from a quick skim of the ScienceDirect one, the framework mapping would be:
 

| Framework Level | Addressed in Paper? | Notes |
|---|---|---|
| 1. Pre-training | Yes | Specified model families/sizes |
| 2. Post-training | Yes | Some med fine-tunes but no recent models with heavy RL |
| 3. Inference Scaling | Partially | Mentions reasoning models but not specific compute |
| 4. Agentic Harnesses | Yes | N/A (single-turn prompt only) |
| 5. Context Engineering | Yes | Prompt-only context engineering |
| 6. Built Environment | Yes | N/A (offline/lab setting) |

Your models (GPT-4o level or lower) are about 1.5 years off the frontier "brains", and there are many other innovations that people believe are useful that you aren't using. So in thinking about the results, just understand that the "ceiling" could be much higher. As a general rule of thumb: if you want to prove AI capabilities, use older open-source models; if you want to disprove, use the newest/best models and tech stack. Proof moves up the capabilities stack (heuristic, not law); disproof moves backwards. That's not to say it isn't useful to see what GPT-4o might do when pushed in a certain direction (after all, tons of people will end up using less-than-frontier LLMs in suboptimal ways). It's just worth having a clear model, and I think this framing will make it easier to quickly communicate to an audience what you are testing (though this is not a field standard, just something I made up).


Now getting back to some of the clinical side of this: a wise man once told me "garbage in, garbage out". My understanding is that we do not have anything close to a good answer to "should this person get a mental health referral", or to 75%+ of medical questions.

It might help to walk you through a really reductive version of how an effective altruist might think of this triage. 

  1. What is the benefit of this intervention?
  2. What is the cost?

Benefit might be measured in QALYs (or many other outputs) and cost in dollars. The correct answer would choose the most cost-effective treatments (again, really simplistic and reductive). While some parts of the American medical system look something like this, most medical decisions look a little different. So one must ask what the right answer to a medical benchmark looks like, unless they just want to calcify the industry's priors.
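As a toy illustration of that two-line model (all names and numbers invented):

```python
# Hypothetical treatments: (name, QALYs gained per patient, cost per patient in USD).
treatments = [
    ("Treatment A", 0.5, 2_000),
    ("Treatment B", 2.0, 50_000),
    ("Treatment C", 0.1, 150),
]

# Rank by cost per QALY; lower is more cost-effective.
for name, qalys, cost in sorted(treatments, key=lambda t: t[2] / t[1]):
    print(f"{name}: ${cost / qalys:,.0f} per QALY")
```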

Even if we do agree on the right answer looking something like the model above, we must figure out (1) and (2), and so enter the world of evidence-based medicine.

 

(See e.g. "Finding the Evidence", Evidence-Based Medicine guide, Research & Subject Guides at Stony Brook University.)

 

We simply don't have enough RCTs, or accurate enough models between them, to have confident answers to most medical questions for specific people with specific DNA and specific life experiences. We don't have all the answers, or anything close, I think. Even setting aside the need for better theoretical models and more RCTs, here are some of the current ontological problems with making clinical medical decisions.

 

The clinical-research ontology gap for diseases - medical billing is often coded in ICD or similar, while research is done at the MeSH level, which is often more granular and focused on causes, not symptoms. I mean really, what is a "disease"? Is a disease the symptoms or the cause? And since we have these different coding systems, plus medical data always being tricky, we don't even necessarily know the incidence or prevalence of most things. This would be a fundamental building block in doing some sort of hallucination-free Bayesian analysis, I would think.
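To see why prevalence is such a fundamental building block, here's a toy Bayes update for a diagnostic decision; every number is invented:

```python
# Toy Bayes update: the posterior depends heavily on the prior (prevalence),
# which is exactly the number we often don't reliably know.
prevalence = 0.01       # P(disease) - invented
sensitivity = 0.90      # P(test positive | disease) - invented
false_positive = 0.05   # P(test positive | no disease) - invented

p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(f"P(disease | positive test) = {p_disease_given_positive:.1%}")  # ~15.4%
```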

What constitutes "evidence" - Hopefully there is a Cochrane review or similar, but if not, and we start moving down the pyramid, how do we incorporate evidence into a clinical decision? I'm not sure the medical field has a unified, systematic take here, so again it's hard to see how you judge an LLM.

Unknown drug/treatment prices - Both doctors and patients might not know the cost to society, the hospital, or the patient via insurance, because of the current healthcare setup.

Fraud, p-hacking, poor statistics, etc. - Lots of issues with the evidence itself.

Bad/incomplete meta-studies and systematic reviews.

 

Again, this isn't all to say you shouldn't benchmark LLMs, but it's worth being aware that you are trying to test them on fundamentally shaky and uncertain ground (scientifically, economically, politically). I have a lot more thoughts on text parsing, meta-studies, and clinical information compression, sorting, and maintenance, but I already wrote too much. Good luck!
 

I don't really understand this perspective. Let me try to make sure I'm understanding you. 

(1) Anthropic wrote a company policy/governance document that claimed something.
(2) This document was the foundation of much of the community's and company's perspective on how to think about and interact with AI safety, including making major donations and career choices. There are large irreversible path dependencies here.
(3) The document always felt quite dubious to you, to the point where it felt like it wouldn't hold the whole time, whether purposely or due to a lack of clarity on Anthropic's part (I agree completely!).
(4) While this wasn't all 100% predictable right when the RSP was written, it surely has become increasingly obvious to Anthropic leadership for months at this point. Nothing that has happened in the last 6 months is all that surprising, and in fact it's basically right on trend, and Dario has said as much himself many times. Yet Anthropic continued to wait, taking in significantly more funding and increasingly roping in huge swaths of this community, and only when it was literally the case that they were about to violate their own document (or already had) did they change it.
(5) This makes you feel better than if they had kept lying/deceiving/whatever more charitable word could be used here.

Is this approximately your perspective? Obviously I'm throwing my own biasing perspective in here, and apologies if I'm misinterpreting.

I mean, sure, in a trivial sense I feel better about them doing (5). Taking a step back, it really barely matters and is beside the point. Them admitting (5) is just a natural segue for us to discuss (1)-(4). Nothing they say about their own commitments really matters anymore. Incentives matter.

FWIW, though, I am still highly confused about whether Anthropic is net positive or negative, and quite open, despite all of this, to thinking we should still be throwing our weight completely behind them.
