Hi, I'm Max :)
Fwiw, I also think the name is a bit complicated and less memorable than Open Philanthropy. Here's the reasoning from the Vox interview:
> Why “coefficient”? As CEO, Alexander Berger, puts it in my conversation with him, “coefficient is a multiplier”: the “co-” nods to collaboration with other givers; the “efficient” is a reminder of the north star of effectiveness.
Huge fan of your work, one of the few newsletters I read every week.
Random question: I wonder whether prediction markets might be a promising income stream for the team? E.g. Polymarket seems to have a bunch of overlap with the topics you're covering.
Also, thanks for making your news-parsing code open source; I was often curious what it looks like under the hood.
Hi Connacher! Thanks for the responses, makes sense.
On your question, one thing I often miss in expert surveys is an open-ended question like this: "Do you have any other considerations that would help with understanding this topic?"
I generally agree that quantitative questions are intimately connected with identifying cruxes. Being quantitative about concrete events is a neat way of forcing the experts to get more concrete and incentivizing them not to get lost in a vague story, etc. But I suspect that the experts' individual insights often might not seem like cruxes to them, as they're not used to thinking in those terms. So I think giving experts some prompts to just pour out their thoughts is often neglected. Furthermore, sometimes quantitative questions don't fully capture all the important angles of an issue, so it's useful to give respondents many chances to add additional comments.
Thanks for the work, this is great!
I especially appreciate the rationale summaries, and generally I'd encourage you to lean more into identifying underlying cruxes as opposed to quantitative estimates. (E.g. I'm skeptical that experts are sufficiently well calibrated to give particularly informative timeline forecasts.)
I'm looking forward to the risk-related surveys. Would be interesting to hear their thoughts on the likelihood of concrete risks. One idea that comes to mind would be conditional forecasts on specific interventions to reduce risks.
Also, I wonder whether the presentation on the website could also feature some more "snack-sized" insights next to the long-form, report-focused presentation. E.g. the Chicago Booth surveys of economic experts focus on ~2-3 questions per article, with a handful of expert rationales quoted in full. It keeps me coming back because it's informative and takes up less than 5 minutes of my time.
https://kentclarkcenter.org/us-economic-experts-panel/
PS: Just in case something went wrong:
Thanks for the interesting interview!
Fwiw, this section struck me as something that should be thought through more deeply:
> Luisa Rodriguez: [...] How confident are you that leaders in the countries that are set up to race and are already racing a little bit are going to see this as close to existential?
>
> Daniel Kokotajlo: I think it will be existential if one side is racing and the other side isn’t. And even if they don’t see that yet, by the time they have superintelligences, then they will see it — because the superintelligences, being superintelligent, will be able to correctly identify this strategic consideration, and probably communicate that to the humans around them.
>
> Luisa Rodriguez: Right. Once you have AGI, the AGI is like, “This is existential. We should do this big wartime effort to create a robot economy that’s going to give us this big advantage.”
>
> Daniel Kokotajlo: That’s right. [...]
Some quick random thoughts:
The more that Trump is perceived as a liability for the party, the more likely they are to go along with an impeachment after a scandal.
Thanks for writing this up, I think it's a really useful benchmark for tracking AI capabilities.
One minor feedback point: instead of reporting statistical significance in the summary, I'd report effect sizes, or maybe even better, just put the discrimination plots in the summary, as they give a very concrete and striking sense of the difference in performance. Statistical significance depends on how many datapoints you have, which makes the absence of a significant difference especially hard to interpret in terms of how real-world meaningful the difference is.
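To illustrate the point (a rough sketch with made-up numbers, not your data or code): with the same underlying gap between two groups, a t-test's p-value swings between "not significant" and "highly significant" purely as a function of sample size, while an effect size like Cohen's d stays on an interpretable scale.

```python
# Illustrative only: synthetic scores, not the benchmark's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def compare(n, shift=0.3):
    """Draw two groups with the same true gap (0.3 SD); report p-value and Cohen's d."""
    a = rng.normal(0.0, 1.0, n)    # e.g. scores of one group
    b = rng.normal(shift, 1.0, n)  # e.g. scores of the other group
    _, p = stats.ttest_ind(a, b)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d = (b.mean() - a.mean()) / pooled_sd  # effect size in standard-deviation units
    return p, d

for n in (20, 2000):
    p, d = compare(n)
    print(f"n={n:5d}  p={p:.3f}  Cohen's d={d:.2f}")
# Typical output: with n=20 the difference is usually not "significant", with
# n=2000 it is, even though the underlying effect is the same size in both cases.
```

The same logic is why a non-significant difference on a small sample says little about whether the real-world gap is small.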
It's great that you already have a rationale prompt for each question. I would probably recommend having one prompt like this at the end, marked "(Optional)", so experts can share any further thoughts they think might be useful.