
Existential Choices Debate Week

I want to see a bargain solver for aligning AI to groups: a technical solution that would allow AI systems to solve the pie-cutting problem for groups and get them the most of what they want. The best solutions I've seen for maximizing long-run value involve using a bargain solver to decide what ASI does, which preserves the richness and cardinality of people's value functions and gives everyone as much of what they want as possible, weighted by importance. (See WWOTF Afterwards and the small literature on bargaining-theoretic approaches to moral uncertainty.) But existing democratic approaches to AI alignment don't seem to fully leverage AI tools; instead they align AI systems to democratic processes that aren't themselves empowered with AI tools (e.g. CIP's and CAIS's alignment to the written output of citizens' assemblies). Moreover, in my experience the best way to make something happen is just to build the solution. If you might be interested in building this tool and have the background, I would love to try to connect you to funding for it.

For deeper motivation see here.
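
To make the "bargain solver" idea concrete, here is a minimal sketch of one possible formalization. Everything in it is an assumption for illustration, not the proposal itself: it uses the weighted Nash bargaining solution (maximize the weighted product of utilities, with a disagreement payoff of zero), treats the "pie" as a set of divisible goods over which each party has linear utilities, and approximates the solution with proportional-response updates. The function name `weighted_nash_bargain` and its parameters are hypothetical.

```python
from typing import List, Sequence


def weighted_nash_bargain(
    values: Sequence[Sequence[float]],
    weights: Sequence[float],
    iterations: int = 500,
) -> List[List[float]]:
    """Approximate a weighted Nash bargaining split of divisible goods.

    values[i][j]: how much party i values the whole of good j (assumed
    non-negative, linear utility). weights[i]: party i's bargaining weight
    (assumed positive). Returns shares[i][j], the fraction of good j
    allocated to party i.
    """
    n, m = len(values), len(values[0])
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    if any(sum(row) <= 0 for row in values):
        raise ValueError("each party must value at least one good")

    # Start by spreading each party's weight across goods in proportion to value.
    bids = [
        [weights[i] * values[i][j] / sum(values[i]) for j in range(m)]
        for i in range(n)
    ]
    shares = [[0.0] * m for _ in range(n)]
    for _ in range(iterations):
        # The "price" of a good is the total weight currently bid on it.
        prices = [sum(bids[i][j] for i in range(n)) for j in range(m)]
        # Each party receives a share of each good proportional to its bid.
        shares = [
            [bids[i][j] / prices[j] if prices[j] > 0 else 0.0 for j in range(m)]
            for i in range(n)
        ]
        # Proportional response: re-bid in proportion to the utility
        # each good contributed in this round.
        for i in range(n):
            utility = sum(values[i][j] * shares[i][j] for j in range(m))
            bids[i] = [
                weights[i] * values[i][j] * shares[i][j] / utility
                for j in range(m)
            ]
    return shares
```

For example, `weighted_nash_bargain([[2.0, 1.0], [1.0, 2.0]], [1.0, 1.0])` converges toward each party receiving the good it values more, while two parties with identical values over a single good split it in proportion to their weights. A real tool would also need preference elicitation, non-linear utilities, and a principled disagreement point, none of which this sketch attempts.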

Parker_Whitfill

Mar 21

Is the alignment motivation distinct from just using AI to solve general bargaining problems?

tylermjohn · Mar 21 · 1

I don't know! It's possible that you can just solve a bargain and then align AI to that, like you can align AI to citizens' assemblies. I want to be pitched.


tylermjohn's Quick takes

by tylermjohn



tylermjohn · Apr 20* · 38

Career choice

I'd be excited to see 1-2 opportunistic EA-rationalist types looking into where regulation is a bottleneck to progress on x-risk/GHW and marginal deregulation would help, circulating 1-pagers among experts in these areas, and then pushing the ideas to DOGE/Mercatus/Executive Branch. I'm thinking of things like clinical trial requirements for vaccines, UV light, antitrust issues facing companies collaborating on safety and security, and maybe housing (though I'm not sure which of these are bottlenecked by federal action). For most of these there's downside risk if the message is low fidelity, the issue becomes polarized, or priorities are poorly set, hence collaborating with experts. I doubt there's that much useful stuff to be done here, but marginal deregulation looks very easy right now, so it looks good to strike while the iron is hot.

Angelina Li · Apr 24 · 8

There's this ACX post (that I only skimmed and don't have strong opinions about) which mostly seems to do this, minus the "pushing" part.

tylermjohn · Apr 12 · 16

Biosecurity & pandemics

AIxBio looks pretty bad and it would be great to see more people work on it

- We're pretty close to having a country of virologists in a data center with AI models that can give detailed and accurate instructions for all steps of a biological attack; with recent reasoning models, we might have this already
- These models have safeguards but they're trivial to overcome: Pliny the Liberator manages to jailbreak every new model within 24 hours and open sources the jailbreaks
- Open-source models will continue to be just a few months behind the frontier given distillation and amplification, and these can be fine-tuned to remove safeguards in minutes for less than $50
- People say it's hard to actually execute the biology work, but I don't see any bottlenecks to bioweapon production that can't be overcome by a bio undergrad with limitless scientific knowledge; on my current understanding, the bottlenecks are not manual dexterity bottlenecks like playing a violin, which require years of practice, they are knowledge bottlenecks
- Bio supply chain controls that make it harder to get ingredients aren't working and aren't on track to work
So it seems like we're very close to democratizing (even bespoke) bioweapons. When I talk to bio experts about this they often reassure me that few people want to conduct a biological attack, but I haven't seen much analysis on this and it seems hard to be highly confident.

While we gear up for a bioweapon democracy it seems that there are very few people working on worst-case bio, and most of the people working on it are working on access controls and evaluations. But I don't expect access controls to succeed, and I expect evaluations to mostly be useful for scaring politicians, due in part to the open source issue meaning we just can't give frontier models robust safeguards. The most likely thing to actually work is biodefense.

I suspect that too many people working on GCR have moved into working on AI alignment and reliability issues and too few are working on bio. I suspect there are bad incentives, given that AI is the new technology frontier and working with AI is good career capital, and given that AI work is higher status.

When I talk to people at the frontier of biosecurity, I learn that there's a clear plan and funding available, but the work is bottlenecked by entrepreneurial people who can pick up a big project and execute on it autonomously — these people don't even need a bio background. On my current guess, the next 3-5 such people who are ambivalent about what to do should go into bio rather than AI, in part because AI seems to be more bottlenecked by less generalist skills, like machine learning, communications, and diplomacy.

calebp · Apr 14 · 8

I think the main reasons that EAs are working on AI stuff over bio stuff are that there aren't many good routes into worst-case bio work (afaict largely due to infohazard concerns around field building), and that the x-risk case for biorisk isn't very compelling (maybe due to infohazard concerns around threat models).

tylermjohn · Apr 14 · 5

I think these are fair points, I agree the info hazard stuff has smothered a lot of talent development and field building, and I agree the case for x-risk from misaligned advanced AI is more compelling. At the same time, I don't talk to a lot of EAs and people in the broader ecosystem these days who are laser focused on extinction over GCR, that seems like a small subset of the community. So I expect various social effects, making a bunch more money, and AI being really cool and interesting and fast-moving are probably a bigger deal than x-risk compellingness simpliciter. Or at least they have had a bigger effect on my choices!

But insufficiently successful talent development / salience / comms is probably the biggest thing, I agree.

Marcus Abramovitch 🔸 · Apr 13 · 6

can you spell out the clear plan? feel free to DM me also

tylermjohn · Apr 13 · 6

Yup! The highest-level plan is in Kevin Esvelt's "Delay, Detect, Defend": use access controls and regulation to delay worst-case pandemics, build a nucleic acid observatory and other tools to detect nucleic acid sequences for superpandemics, and defend by hardening the world against biological attacks.

The basic defense, as per DDD, is:

- Develop and distribute adequate PPE to all essential workers
- Make sure the supply chain is robust, so that essential workers can distribute food and essential supplies in the event of a worst-case pandemic
- Deploy environmental defenses like far-UVC that massively reduce the spread and replication rate of pandemic pathogens

IMO "delay" has so far basically failed but "detect" has been fairly successful (though incomplete). Most of the important work now needs to be done, rapidly, on the "defend" side of things.

There are a lot more details on this, and the biosecurity community has really good ideas now about how to develop and distribute effective PPE and rapidly scale environmental defenses. There's also now interest in developing small molecule countermeasures that can stop pandemics early but are general enough to stop a lot of different kinds of biological attacks. A lot of this is bottlenecked by things like developing industrial-scale capacity for defense production or solving logistics around supply chain robustness and PPE distribution. Happy to chat more details or put you in touch with people better suited than me if it's relevant to your planning.

tylermjohn · Apr 26 · 7

AI safety

Just Compute: an idea for a highly scalable AI nonprofit

Just Compute is a 501(c)(3) organization whose mission is to buy cutting-edge chips and distribute them to academic researchers and nonprofits doing research for societal benefit. Researchers can apply to Just Compute to get access to the JC cluster, which supports research in AI safety, AI for good, AI for science, AI ethics, and the like, through a transparent and streamlined process. It's a lean nonprofit organization with a highly ambitious founder who seeks to raise billions of dollars for compute.

The case for Just Compute is fairly robust: it supports socially valuable AI research and creates opportunities for good researchers to work in AI for social benefit without having to join a scaling lab. And because frontier capabilities are compute constrained, it also slows down the frontier by using up a portion of the total available compute. The sales case for it is very strong, as it attracts a wide variety of donors interested in supporting AI research in the academy and at nonprofits. Donors can even earmark their donations for specific areas of research, if they'd like, perhaps with a portion of the donations mandatorily allocated to whatever JC sees as the most important area of AI research.

If a pair of co-founders wanted to launch this project, I think it could be a very cool moonshot!

Larks · Apr 27 · 23

Why does it make sense to bundle buying chips, operating a datacenter etc. with doing due diligence on grant applicants? Why should grant applicants prefer to receive compute credits from your captive neocloud than USD they can spend on any cloud they want - or on non-compute, if the need there is greater?

calebp · Apr 27 · 6

You might believe future GPU hours are currently underpriced (e.g. maybe we'll soon develop AI systems that can automate valuable scientific research). In such a scenario, GPU hours would become much more valuable, while standard compute credits (which iiuc are essentially just money designated for computing resources) would not increase in value. Buying the underlying asset directly might be a straightforward way to invest in GPU hours now before their value increases dramatically.

Maybe there are cleverer ways to bet on the price of GPU hours dramatically increasing that are conceptually simpler than nvidia share prices increasing, idk.
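
A toy version of the comparison in this comment, under loud simplifying assumptions: the prices are hypothetical, owned hardware is treated as a claim on a fixed number of GPU hours at today's price, and depreciation, power, and operating costs are ignored entirely. The function name `gpu_hours_comparison` is made up for illustration.

```python
from typing import Tuple


def gpu_hours_comparison(
    budget_usd: float,
    price_now_usd_per_hour: float,
    price_later_usd_per_hour: float,
) -> Tuple[float, float]:
    """Return (hours locked in at today's price, hours the same cash buys later).

    Assumption: compute credits are just dollars earmarked for compute, so
    their purchasing power follows whatever the later price turns out to be.
    """
    hours_locked_in_now = budget_usd / price_now_usd_per_hour
    hours_from_credits_later = budget_usd / price_later_usd_per_hour
    return hours_locked_in_now, hours_from_credits_later
```

With a hypothetical budget of $1,000,000, a price of $2 per GPU hour today, and $8 per GPU hour later, locking in today's price yields four times as many hours as holding credits. If prices fall instead, the comparison reverses.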

tylermjohn · Apr 27 · 1

You're probably right that operating a data center doesn't make sense. The initial things that pushed me in that direction were concerns about robustness of the availability of compute and the aim to cut into the supply of frontier chips labs have available to them rather than funge out other cloud compute users, but it's likely way too much overhead.

I don't worry about academics preferring to spend on other things; the point is specialization for efficient administration and a clear marketing narrative.

Zach Stein-Perlman · Apr 26 · 4

One problem is that donors would rather support their favorite research than a mixture that includes non-favorite research.

tylermjohn · Apr 26 · 1

Most major donors don't have time or expertise to vet research opportunities, so they'd rather outsource to someone else who can source and vet them.

[comment deleted] · Apr 27 · 1

Deleted by Throwaway81, 04/27/2025

Rebecca · Apr 28 · 0

I’m not sure if others share this intuition, but most of this gives off AI-generated vibes fyi

tylermjohn · Apr 30 · 5

Existential risk

AI swarm writers:

Comms is a big bottleneck for AI safety talent, policy, and public awareness. Currently the best human writers are better than the best LLMs, but LLMs are better writers than 99% of humans and much easier to align to a message and style than human employees. In many venues (particularly social media) factors other than writing and analytical quality drive discourse. This makes a lot of comms a numbers game. And the way you win a numbers game is by scaling a swarm of AI writers.

I'd like to see some people with good comms taste and epistemics, thoughtful quality control, and the diligence to keep at it experiment with controlling swarms of AI writers producing and distributing lots of decent-quality content on AI safety. Probably the easiest place to get started would be on social media, where outputs are shorter and the numbers game is much starker. As the swarms get good, they could be used for other comms, like blogs and op-eds. 4o is good at designing cartoons and memes, which could also be utilized.

To be clear, there is a failure mode here where elites associate AI safety with spammy bad reasoning and where mass content dilutes the public quality of the arguments for safety, which at the limit are very strong. But at the moment there is virtually zero content on AI safety, making the bar for improving discourse quality relatively low.

I've found some AI workflows that work pretty well, like recording long voice notes, turning them into transcripts, and using the transcript as context for the LLM to write. I'd be happy to walk interested people through this or, if helpful, write something public.

MichaelDickens · Apr 30 · 2

Please correct me if I'm misunderstanding you but this idea seems to follow from a chain of logic that goes like this:

1. We need more widely-read, high-quality writing on AI risk.
2. Therefore, we need a large quantity of writing on AI risk.
3. We can use LLMs to help produce this large quantity.

I disagree with #2. It's sufficient to make a smaller amount of really good content and distribute it widely. I think right now the bottleneck isn't a lack of content for public consumption, it's a lack of high-quality content.

And I appreciate some of the efforts to fix this, for example Existential Risk Observatory has written some articles in national magazines, MIRI is developing some new public materials, and there's a documentary in the works. I think those are the sorts of things we need. I don't think AI is good enough to produce content at the level of quality that I expect/hope those groups will achieve.

(Take this comment as a weak endorsement of those three things but not a strong endorsement. I think they're doing the right kinds of things; I'm not strongly confident that the results will be high quality, but I hope they will be.)

Although, I do agree with you that LLMs can speed up writing, and you can make the writing high-quality as long as there's enough human oversight. (TBH I am not sure how to do this myself, I've tried but I always end up writing ~everything by hand. But many people have had success with LLM-assisted writing.)

calebp · Apr 30 · 2

There's an adjacent take I agree with, which is more like:
1. AI will likely create many high-stakes decisions and a confusing environment
2. The situation would be better if we could use AI to stay in-step with AI progress on our ability to figure stuff out
3. Rather than waiting until the world is very confusing, maybe we should use AIs right now to do some kinds of intellectual writing, in ways we expect to improve as AIs improve (even if AI development isn't optimising for intellectual writing).

I think this could look a bit like a company with mostly AI workers that produces writing on a bunch of topics, or as a first step, a heavily LM-written (but still high-quality) Substack.

tylermjohn · Apr 30 · 1

If you want to reach a very wide audience the N times they need to read, think about, and internalize the message, you can either write N pieces that each reach that whole audience or N×y pieces that each reach a portion of that audience. Generally, if you have the ability to efficiently write N×y pieces, then the latter is going to be easier than the former. This is what I mean about comms being a numbers game, and I take this to be pretty foundational to a lot of comms work in marketing, political campaigning, and beyond.

Though I also agree with Caleb's adjacent take, largely because if you can build an AI company then you can create greater coverage for your idea, arguments, or data pursuant to the above.

Of course there's large and there's large. We may well disagree about how good LLMs are at writing. I think Claude is about 90th percentile as compared to tech journalists in terms of factfulness, clarity, and style.
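
The N versus N×y arithmetic in this comment can be sketched as follows. This is only an illustration under stated assumptions: each narrow piece is assumed to reach a distinct 1/y slice of the audience, so that N×y pieces give everyone N exposures, and the per-piece costs are hypothetical inputs rather than anything from the discussion. The function name `compare_comms_strategies` is made up.

```python
from typing import Dict


def compare_comms_strategies(
    n_exposures: int,
    y: int,
    cost_broad_piece: float,
    cost_narrow_piece: float,
) -> Dict[str, Dict[str, float]]:
    """Compare two ways of giving everyone in an audience n_exposures reads.

    "broad": n_exposures pieces that each reach the whole audience.
    "narrow": n_exposures * y pieces that each reach 1/y of the audience
    (assumed to be non-overlapping slices).
    Costs are hypothetical per-piece costs of writing and placing a piece.
    """
    broad_pieces = n_exposures
    narrow_pieces = n_exposures * y
    return {
        "broad": {
            "pieces": broad_pieces,
            "total_cost": broad_pieces * cost_broad_piece,
        },
        "narrow": {
            "pieces": narrow_pieces,
            "total_cost": narrow_pieces * cost_narrow_piece,
        },
    }
```

With N = 5 and y = 20, the narrow strategy needs 100 pieces instead of 5. It only comes out cheaper if a narrow piece costs less than 1/20 of a broad one, which is the sense in which being able to write N×y pieces efficiently matters.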

tylermjohn · Apr 30 · 1

You could instead or in addition do a bunch of paid advertising to get writing in front of everyone. I think that's a good idea too, but there are also risks here, like the problems that WWOTF's advertising faced when some people saw the same thing 10 times and were annoyed.

tylermjohn · Mar 20 · 4

Existential Choices Debate Week
