Studying behaviour and interactions of boundedly rational agents, AI alignment and complex systems.
Research fellow at the Future of Humanity Institute, Oxford. Other projects: European Summer Program on Rationality, Human-aligned AI Summer School, Epistea Lab.
I was considering hypothetical scenarios of the type "imagine this offer from MIRI arrived, would a lab accept it?"; clearly MIRI is not making the offer, because the labs don't have good alignment plans and they are obviously high-integrity enough not to be corrupted by relatively tiny incentives like $3B.
I would guess there are ways to operationalise the hypotheticals, and to try to have, for example, Dan Hendrycks guess what xAI would do, given he is an advisor.
With your bets about timelines - I made an 8:1 bet with Daniel Kokotajlo against AI 2027 being as accurate as his previous forecast, so I'm not sure which side of "confident about short timelines" you expect me to take. I'm happy to bet on some operationalization of your overall thinking and posting about the topic of AGI being bad, e.g. something like "the three smartest available AIs in 2035 compare everything we wrote in 2026 on EAF, LW and Twitter about AI and judge who was more confused, overconfident and miscalibrated".
The operationalisation you propose does not make any sense: Yudkowsky and Soares do not claim ChatGPT 5.2 will kill everyone, or anything like that.
What about this:
MIRI approaches [a lab] with this offer: we have made a breakthrough in our ability to verify whether the way you are training AIs leads to the kind of misalignment we are worried about. Unfortunately, the verification requires a lot of computation (i.e. something like ARC), so it is expensive. We expect your whole training setup will pass, but we will need $3B from you to run it; if the test passes, we will declare that your lab has solved the technical part of AI alignment we were most worried about, and give some arguments which we expect to convince many people who listen to our views.
Or this: MIRI discusses things with xAI or Meta and convinces themselves that their - secret - plan is by far the best chance humanity has, and that everyone in ML/AI who is smart and conscientious should stop whatever they are doing and join them.
(Obviously these are also unrealistic / assume something like some lab coming up with a plan which could even hypothetically work.)
Travel: mostly planned (conferences, some research retreats).
We expect closely coordinated teamwork on the LLM psychology direction, with somewhat looser connections to the gradual disempowerment / macrostrategy work. Broadly, ACS is small enough that anyone is welcome to participate in anything they are interested in, and generally everyone has an idea of what others are working on.
My impression is EAGx Prague 2022 managed to balance 1:1s with other content simply by not offering SwapCard 1:1 slots for part of the time, having a lot of spaces for small-group conversations, and suggesting to attendees that they aim for something like a balanced diet. (Turning off SwapCard slots does not prevent people from scheduling 1:1s, it just adds a little friction; empirically, that seems to be enough to prevent the mode where people just fill their time with 1:1s.)
As far as I understand, this will most likely not happen, because of the weight given to / Goodharting on metrics like people reporting 1:1s as the most valuable use of their time, metrics tracking "connections formed", and the weird psychological effect of 1:1 fests. (People feel stimulated, connected, energized... Part of the effect is superficial.) Also, the counterfactual value lost from the lack of conversational energy at scales of ~3 to 12 people is not visible and likely not tracked in feedback (I think this has predictable effects on which types of collaborations do start and which do not, and the effect is on the margin bad). The whole thing is downstream of problems like Don't Over-Optimize Things / We can do better than argmax.
Btw I think you are too apologetic / self-deprecating ("inexperienced event organisers complaining about features of the conference"). I have decent experience running events, and everything you wrote is spot on.
Thanks for the explanation. My guess is this decision should not be delegated to LLMs but mostly to authors (possibly with some emphasis on correct classification in the UI).
I think "the post concerns an ongoing conversation, scandal or discourse that would not be relevant to someone who doesn't care about the EA community" should not be interpreted expansively, otherwise it can easily come to mean "any controversy or criticism". I will repost it without the links to current discussions - these are non-central; similar points have been raised repeatedly over the years and it is easy to find dozens of texts making them.
I wrote a post on “Charity” as a conflationary alliance term. You can read it on LessWrong, but I'm also happy to discuss it here.
If you are wondering why not post it here: I originally posted it here with a LW cross-post. It was immediately slapped with the "Community" tag, despite not being about the community, but about different ways people try to do good, how they talk about charity, and the ensuing confusions. It is about the space of ideas, not about actual people or orgs.
With posts like OP announcements about the details of EA group funding or the EAG admissions bar not being marked as Community, I find it increasingly hard to believe the "Community" tag is driven by the stated principle of marking "Posts about the EA community and projects that focus on the EA community", and not by other motives, e.g. forum mods expressing the view "we want people to think less about this / this may be controversial / we would prefer someone new not to read this".
My impression is this moves substantive debates about ideas to the side, which is a state I don't want to cooperate with by just leaving it as it is -> I moved the post to LessWrong and replaced it with this comment.
It seems plausible the impact of that single individual act is so negative that the aggregate impact of EA is negative.
I think people should reflect seriously upon this possibility and not fall prey to wishful thinking (let's hope speeding up the AI race and making it superpower-powered is the best intervention! let's hope everyone warning about this was wrong and Leopold is right!).
The broader story here is that EA prioritization methodology is really good for finding highly leveraged spots in the world, but there isn't a good methodology for figuring out what to do in such places, and there also isn't a robust pipeline for promoting virtues and virtuous actors to such places.
I don't think so. I think in practice:
1. - Some people don't like the big-R Rationality community very much.
AND
2a. - Some people don't think improving the EA community's small-r rationality/epistemics should be one of the top ~3-5 EA priorities.
OR
2b. - Some people do agree this is important, but don't clearly see the extent to which the EA community imported healthy epistemic vigilance and norms from Rationalist or Rationality-adjacent circles.
=>
- As a consequence, they are at risk of distancing themselves from small-r rationality as collateral damage / by neglect.
Also, I think many people in the EA community don't think it's important to try hard at being small-r rational at the level of aliefs. Whatever the actual situation revealed by actual decisions, I would expect the EA community to at least pay lip service to epistemics and reason, so I don't think stated preferences are strong evidence.
"Being against small-r rationality is like being against kindness or virtue; no one thinks of themselves as taking that stand."
Yes, I do agree almost no one thinks about themselves that way. I think it is maybe somewhat similar to "being against effective charity" - I would be surprised if people thought about themselves that way.
I think it could be a helpful response for people who are able to respond to signals of the type "someone who has demonstrably good forecasting skills, is an expert in the field, and has worked on this for a long time claims X" by at least re-evaluating whether their models make sense and are not missing some important considerations.
If someone is at least able to do that, they can, for example, ask a friendly AI, and it will tell them, based on conservative estimates and reference classes, that the original claim is likely wrong. It will still miss important considerations - in a way a typical forecaster also would - so the results are underestimates.
I think at the level of [some combination of lack of ability to think and motivated reasoning] where people are uninterested in e.g. sanity-checking their thinking with AIs, it is not worth the time to correct them. People are wrong on the internet all the time.
(I think the debate was moderately useful - I made an update from this debate & the voting patterns, broadly in the direction of the EA Forum descending to the level of a random place on the internet where confused people talk about AI, which is broadly not worth reading or engaging with. I'm no longer very active on EAF, but I've made some update.)