This week, we are releasing new research on advanced artificial intelligence (AI), the opportunities and risks it presents, and the role donations can play in positively steering its development.
As with our previous research investigating areas such as nuclear risks and catastrophic biological risks, our report on advanced AI provides a comprehensive overview of the landscape, outlining for the first time how donations can cost-effectively reduce these risks.
You can find the technical report as a PDF here, or read a condensed version here.
In brief, the key points from our report are:
- General, highly capable AI systems are likely to be developed in the next couple of decades, and could emerge within the next few years.
- Such AI systems will radically upend the existing order - presenting a wide range of risks, up to and including catastrophic threats.
- AI companies - funded by big tech - are racing to build these systems without appropriate caution or restraint given the stakes at play.
- Governments are under-resourced, ill-equipped and vulnerable to regulatory capture from big tech companies, leaving a worrying gap in our defenses against dangerous AI systems.
- Philanthropists can and must step in where governments and the private sector are missing the mark.
- We recommend special attention to funding opportunities to (1) boost global resilience, (2) improve government capacity, (3) coordinate major global players, and (4) advance technical safety research.
Funding Recommendations
Alongside this report, we are sharing some of our latest recommended high-impact funding opportunities: The Centre for Long-Term Resilience, the Institute for Law and AI, the Effective Institutions Project and FAR AI are four promising organizations we have recently evaluated and recommend for more funding, covering our four respective focus areas. We are in the process of evaluating more organizations, and hope to release further recommendations.
Furthermore, Founders Pledge’s Global Catastrophic Risks Fund supports critical work on these issues. If you would like to support progress on a range of catastrophic risks - including those from advanced AI - then please consider donating to the Fund!
About Founders Pledge
Founders Pledge is a global non-profit empowering entrepreneurs to do the most good possible with their charitable giving. We equip members with everything needed to maximize their impact, from evidence-led research and advice on the world’s most pressing problems, to comprehensive infrastructure for global grant-making, alongside opportunities to learn and connect. To date, our members have pledged over $10 billion to charity and donated more than $950 million. We’re grateful to be funded by our members and other generous donors. founderspledge.com
From the full report:
I think the key weakness in this part of the argument is that it overlooks lawful, non-predatory strategies for satisfying goals. As a result, you give the impression that any AI that has non-human goals will, by default, take anti-social actions that harm others in pursuit of its goals. I believe this idea is false.
The concept of instrumental convergence, even if true[1], does not generally imply that almost all power-seeking agents will achieve their goals through nefarious means. Ordinary trade, compromise, and acting through the legal system (rather than outside of it) are usually rational means of achieving your goals.
Certainly among humans, a desire for resources (e.g. food, housing, material goods) does not automatically imply that humans will universally converge on unlawful or predatory behavior to satisfy it. That's because there are typically more benign ways of acquiring these resources than theft or social manipulation. In other words, we can generally get what we want in a way that is not negative-sum and does not hurt other people as a side effect.
To the extent you think power-seeking behavior among humans is usually positive-sum, but will become negative-sum when it manifests in AIs, this premise needs to be justified. One cannot explain the positive-sum nature of the existing human world by positing that humans are aligned with each other and have pro-social values, as this appears to be a poor explanation for why humans obey the law.
Indeed, the legal system itself can be seen as a way for power-seeking misaligned agents to compromise on a framework that allows agents within it to achieve their goals efficiently, without hurting others. In a state of full mutual inter-alignment with other agents, criminal law would largely be unnecessary. Yet it is necessary, because humans in fact do not share all their goals with each other.
It is likely, of course, that AIs will exceed human intelligence. But this fact alone does not imply that AIs will take unlawful actions to pursue their goals, since the legal system could become better at coping with more intelligent agents at the same time AIs are incorporated into it.
We could imagine an analogous case in which genetically engineered humans are introduced into the legal system. As these modified humans get smarter over time, and begin taking on roles within the legal system itself, our institutions would adapt, and likely become more capable of policing increasingly sophisticated behavior. In this scenario, as in the case of AI, "smarter" does not imply a proclivity towards predatory and unlawful behavior in pursuit of one's goals.
I personally doubt that the instrumental convergence thesis is true as it pertains to "sufficiently intelligent" AIs which were not purposely trained to have open-ended goals. I do not expect, for example, that GPT-5 or GPT-6 will spontaneously develop a desire to acquire resources or preserve their own existence, unless they are subject to specific fine-tuning that would reinforce those impulses.