Jan_Kulveit

5015 karmaJoined Dec 2017

Bio

Studying behaviour and interactions of boundedly rational agents, AI alignment and complex systems.

Research fellow at Future of Humanity Institute, Oxford. Other projects: European Summer Program on Rationality. Human-aligned AI Summer School. Epistea Lab.

Posts
37

Sorted by New

Jan_Kulveit's Quick takes

Jan_Kulveit

· 2y ago · 1m read

ACS is hiring: why work here and why not

Jan_Kulveit

· 9mo ago · 2m read

Do Not Tile the Lightcone with Your Confused Ontology

Jan_Kulveit

· 1y ago

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Jan_Kulveit

· 1y ago

“Charity” as a conflationary alliance term

Jan_Kulveit

· 2y ago · 6m read

135

Distancing EA from rationality is foolish

Jan_Kulveit

· 2y ago · 2m read

Announcing Human-aligned AI Summer School

Jan_Kulveit

· 2y ago

Box inversion revisited

Jan_Kulveit

· 3y ago

We don't understand what happened with culture enough

Jan_Kulveit

· 3y ago · 7m read

152

Talking publicly about AI risk

Jan_Kulveit

· 3y ago

Sequences
1

Learning from crisis

Comments
228

Survey of AI safety leaders on x-risk, AGI timelines, and resource allocation (Feb 2026)

Jan_Kulveit4mo81

Just wanted to flag the group is heavily selected for belief alignment with something like "EA/Constellation/Trajan House" views, and "AI enabled human takeovers" was promoted as agenda to prioritize in multiple widely read memos by high statues people in the community (which the organisers prioritized in the reading list).

I dislike the "echo chambre" effect where the steps are:
- invite people partially based on alignment with the idea cluster
- tell them to read memos advocating something written by some of the most central people in the cluster
- poll attendees
- results are framed as "leaders and key thinkers in the x-risk and AI safety communities agree"

It is in some sense useful, but in my view the cluster of people invited represents maybe ~30% of thinking about x-risk and AI safety, and its mostly an amplification of existing voices.

"The slight lean against misaligned AI takeover resources is perhaps the most surprising result for this audience, and merits closer examination."

This is unsurprising given the marginal and somewhat confusing nature of the question. My wild guess is
- some attendees voted for everything; it is unclear what does it mean on the margin, probably to grow everything, and prioritize more neglected topics?
- some attendees understood the marginal question as "assuming fixed pie, how to change the allocation" - with this understanding you need to assign something negative weight for consistency

Unfalsifiable stories of doom

Jan_Kulveit5mo-6

Unfalsifiable stories of doom

Jan_Kulveit5mo4

I was considering hypothetical scenarios of the type "imagine this offer from MIRI arrived, would a lab accept" ; clearly MIRI is not making the offer because the labs don't have good alignment plans and they are obviously high integrity enough to not be corrupted by relatively tiny incentives like $3b

I would guess there are ways to operationalise the hypothethicals, and try to have, for example, Dan Hendrycks guess what would xAI do, him being an advisor.

With your bets about timelines - I did 8:1 bet with Daniel Kokotajlo against AI 2027 being as accurate as his previous forecast, so not sure which side of the "confident about short timelines" do you expect I should take. I'm happy to bet on some operationalization of your overall thinking and posting about the topic of AGI being bad, e.g. something like "3 smartest available AIs in 2035 compare all what we wrote in 2026 on EAF, LW and Twitter about AI and judge who was more confused, overconfident and miscalibrated".

Unfalsifiable stories of doom

Jan_Kulveit5mo4

The operationalisation you propose does not make any sense, Yudkowsky and Soares do not claim ChatGPT 5.2 will kill everyone or anything like that.

What about this:

MIRI approaches [a lab] with this offer: we have made some breakthrough in ability to verify if the way you are training AIs leads to misalignment in the way we are worried about. Unfortunately the way to verify requires a lot of computations (ie something like ARC), so it is expensive. We expect your whole training setup will pass this, but we will need $3B from you to run this; if our test will work, we will declare that your lab solved the technical part of AI alignment we were most worried about & some arguments which we expect to convince many people who listen to our views.

Or this: MIRI discusses stuff with xAI or Meta and convinces themselves their - secret - plan is by far the best chance humanity has, and everyone ML/AI smart and conscious should stop whatever they are doing and join them.

(Obviously these are also unrealistic / assume something like some lab coming with some plan which could even hypotehically work)

Unfalsifiable stories of doom

Jan_Kulveit5mo-7

ACS is hiring: why work here and why not

Jan_Kulveit9mo3

Travel: mostly planned (conferences, some research retreats).

We expect closely coordinated team work on the LLM psychology direction, with a bit looser connections to the gradual disempowerment / macrostrategy work. Broadly ACS is small enough that anyone is welcome to participate in anything they are interested in, and generally everyone has idea what others work on.

calebp's Quick takes

Jan_Kulveit9mo15

My impression is EAGx Prague 22 managed to balance 1:1s with other content simply by not offering SwapCard 1:1s slots part of the time, having a lot of spaces for small group conversations, and suggesting to attendees they should aim for something like balanced diet. (Turning off SwapCard slots does not prevent people from scheduling 1:1, just adds a little friction; empirically it seems enough to prevent the mode where people just fill their time by 1:1s).

As far as I understand this will most likely not happen, because weight given to / goodharting on metrics like people reporting 1:1s is the most valuable use of time, metrics tracking "connections formed" and weird psychological effect of 1:1 fests. (People feel stimulated, connected, energized,... Part of the effect is superficial). Also the counterfactual value lost from lack of conversational energy at scales ~3 to 12ppl is not visible and likely not tracked in feedback (I think this has predictable effects on what types of collaborations do start and which do not, and the effect is on the margin bad.) The whole is downstream of problems like Don't Over-Optimize Things / We can do better than argmax.

Btw I think you are too apologetic / self-deprecating ("inexperienced event organisers complaining about features of the conference"). I have decent experience running events and all what you wrote is spot on.

Jan_Kulveit's Quick takes

Jan_Kulveit2y4

Thanks for explanation. My guess is this decision should not be delegated to LLMs but mostly to authors (possibly with some emphasis on correct classification in the UI).

I think the "the post concerns an ongoing conversation, scandal or discourse that would not be relevant to someone who doesn't care about the EA community" should not be interpreted extensively, otherwise it can easily mean "any controversy or criticism". I will repost it without the links to current discussions - these are non-central, similar points are raised repeatedly over the years and it is easy to find dozens of texts making them.

Jan_Kulveit's Quick takes

Jan_Kulveit2y29

I wrote a post on “Charity” as a conflationary alliance term. You can read it on LessWrong, but I'm also happy to discuss it here.

If wondering why not post it here: Originally posted it here with a LW cross-post. It was immediately slapped with the "Community" tag, despite not being about community, but about different ways people try to do good, talk about charity & ensuing confusions. It is about the space of ideas, not about the actual people or orgs.

With posts like OP announcements about details of EA group funding or EAG admissions bar not being marked as community, I find it increasingly hard to believe the "Community" tag is driven by the stated principe marking "Posts about the EA community and projects that focus on the EA community" and not by other motives, like e.g. forum mods expressing the view "we want people to think less about this / this may be controversial / we prefer someone new to not read this".

My impression this moves substantial debates about ideas to the side, which is a state I don't want to cooperate on by just leaving it as it is -> moved the post on LessWrong and replaced by this comment.

Ben_West's Quick takes

Jan_Kulveit2y59

Seems plausible the impact of that single individual act is so negative that aggregate impact of EA is negative.

I think people should reflect seriously upon this possibility and not fall prey to wishful thinking (let's hope speeding up the AI race and making it superpower powered is the best intervention! it's better if everyone warning about this was wrong and Leopold is right!).

The broader story here is that EA prioritization methodology is really good for finding highly leveraged spots in the world, but there isn't a good methodology for figuring out what to do in such places, and there also isn't a robust pipeline for promoting virtues and virtuous actors to such places.

Jan_Kulveit

Bio

Posts 37

Sequences 1

Comments228

Posts
37

Sequences
1

Comments
228