AI Programme Officer at Longview Philanthropy and AI DPhil student at Oxford
Thanks for this post; I agree on many of the key points. I was Longview's grant investigator on CAIP and, as I wrote in our official reply to CAIP (posted here), I wish there had been enough 501c4 funding available to sustain CAIP. Unfortunately, funding for 501c4 work remains scarce.
If anyone reading this is interested in contributing >$100K to 501c4 policy advocacy or any other kind of work on AI safety, please feel free to reach out to me at aidan@longview.org. We've comprehensively reviewed the 501c4 policy advocacy ecosystem and many other opportunities, and we’d be happy to offer detailed info and donation recommendations to potential large donors.
Agreed with the other answers on the reasons why there's no GiveWell for AI safety. But in case it's helpful, I should say that Longview Philanthropy offers advice to donors looking to give >$100K per year to AI safety. Our methodology is a bit different from GiveWell's, but we do use cost-effectiveness estimates. We investigate funding opportunities across the AI landscape, from technical research to field-building to policy in the US, EU, and around the world, trying to find the most impactful opportunities for the marginal donor. We also do active grantmaking, such as our calls for proposals on hardware-enabled mechanisms and digital sentience. More details here. Feel free to reach out to aidan@longview.org or simran@longview.org if you'd like to learn more.
Now, Anthropic, OpenAI, Google DeepMind, and xAI say their most powerful models might have dangerous biology capabilities and thus could substantially boost extremists—but not states—in creating bioweapons.
I think the "not states" part of this is incorrect in the case of OpenAI, whose Deep Research system card said: "Our evaluations found that deep research can help experts with the operational planning of reproducing a known biological threat, which meets our medium risk threshold."
One other potential suggestion: Organizers should consider focusing on their own career development rather than field-building if their timelines are shortening and they think they can have an impact sooner through direct work than through field-building. Personally, I regret much of the time I spent starting an AI safety club in college because it traded off against building skills and experience in direct work. I think my impact through direct work has been significantly greater than my impact through field-building, and I should've spent more time on direct work in college.
What about corporations or nation states during times of conflict? Do you think it's accurate to model them as roughly as ruthless in pursuit of their own goals as future AI agents?
They don't have the same psychological makeup as individual people, they have a strong tradition and culture of maximizing self-interest, and they face strong incentives and selection pressures to maximize fitness (i.e., for companies to profit, for nation states to ensure their own survival) lest they be outcompeted by more ruthless competitors. While I'd expect these entities to show some care for goals besides self-interest, on average I think the most reliable predictor of their behavior is the maximization of their self-interest.
If they're roughly as ruthless as future AI agents, and we've developed institutions that somewhat robustly align their ambitions with pro-social action, then we should have some optimism that we can find similarly productive systems for working with misaligned AIs.
Human history provides many examples of agents with different values choosing to cooperate thanks to systems and institutions:
If two agents' utility functions are perfect inverses, then I agree that cooperation is impossible. But when agents share a preference for some outcomes over others, even if they disagree about the preference ordering of most outcomes, cooperation is possible. In such general-sum games, well-designed institutions can systematically promote cooperative behavior over conflict.
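As a toy illustration of that point (payoff numbers invented for the example), here's a quick sketch contrasting a strictly zero-sum game, where no outcome makes both agents better off than the conflict baseline, with a general-sum game, where one does:

```python
# Toy illustration (payoff numbers invented): in a strictly zero-sum game no outcome
# leaves both agents better off than the conflict baseline, so there is nothing for an
# institution to steer toward; in a general-sum game such outcomes can exist.

ZERO_SUM = {  # every payoff pair sums to zero: one agent's gain is the other's loss
    ("cooperate", "cooperate"): (0, 0),
    ("cooperate", "defect"):    (-3, 3),
    ("defect",    "cooperate"): (3, -3),
    ("defect",    "defect"):    (0, 0),
}

GENERAL_SUM = {  # prisoner's-dilemma-style payoffs: interests partially overlap
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def pareto_improvements(game, baseline=("defect", "defect")):
    """Outcomes at least as good for both agents as the conflict baseline,
    and strictly better for at least one of them."""
    base_a, base_b = game[baseline]
    return [
        outcome for outcome, (a, b) in game.items()
        if a >= base_a and b >= base_b and (a, b) != (base_a, base_b)
    ]

print(pareto_improvements(ZERO_SUM))     # [] -- no room for mutually beneficial deals
print(pareto_improvements(GENERAL_SUM))  # [('cooperate', 'cooperate')]
```

In the zero-sum case there is nothing for an institution to steer toward; in the general-sum case there is an outcome both agents prefer to mutual conflict, which is exactly where institutions can do useful work.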
Nice! This is a different question, but I'd be curious if you have any thoughts on how to evaluate risks from biological design tools (BDTs). There's a new NIST RFI on bio/chem models asking about this, and while I've seen some answers to the question, most of them express a ton of uncertainty and offer no great solutions. Maybe reliable evaluations aren't possible today, but what would we need to build them?
The topline comparison between LLMs and superforecasters seems a bit unfair. You compare a single LLM's forecast against the median from a crowd of superforecasters. But we know the median from a crowd is typically more accurate than any particular member of the crowd. Therefore I think it'd be more fair to compare a single LLM to a single superforecaster, or a crowd of LLMs against a crowd of superforecasters. Do we know whether the best LLM is better than the best individual forecaster in your sample, or how the median LLM compares to the median forecaster?
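To make the wisdom-of-crowds point concrete, here's a toy simulation (all parameters invented) comparing the Brier score of the crowd median against the average individual forecaster:

```python
# Toy simulation (invented parameters): the median forecast of a noisy crowd
# typically has a lower (better) Brier score than the average individual forecaster,
# which is why a crowd median is a tough baseline for any single forecaster or model.
import random
from statistics import median, mean

random.seed(0)
N_QUESTIONS, N_FORECASTERS = 200, 30

# Each question has a true probability; the binary outcome is sampled from it.
true_probs = [random.random() for _ in range(N_QUESTIONS)]
outcomes = [1 if random.random() < p else 0 for p in true_probs]

def brier(forecasts, outcomes):
    return mean((f - o) ** 2 for f, o in zip(forecasts, outcomes))

# Each forecaster reports the true probability plus independent noise, clipped to [0, 1].
def noisy_forecast(p, sigma=0.15):
    return min(1.0, max(0.0, random.gauss(p, sigma)))

crowd = [[noisy_forecast(p) for p in true_probs] for _ in range(N_FORECASTERS)]

# Aggregate: per-question median across the crowd.
median_forecasts = [median(f[q] for f in crowd) for q in range(N_QUESTIONS)]

print("crowd-median Brier:", round(brier(median_forecasts, outcomes), 4))
print("mean individual Brier:", round(mean(brier(f, outcomes) for f in crowd), 4))
# The crowd median scores better because independent errors partially cancel out.
```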
(Nitpick aside, this is very interesting research, thanks for doing it.)