This week, we are releasing new research on advanced artificial intelligence (AI), the opportunities and risks it presents, and the role donations can play in positively steering its development.
As with our previous research investigating areas such as nuclear risks and catastrophic biological risks, our report on advanced AI provides a comprehensive overview of the landscape, outlining for the first time how donations can cost-effectively reduce these risks.
You can find the technical report as a PDF here, or read a condensed version here.
In brief, the key points from our report are:
- General, highly capable AI systems are likely to be developed in the next couple of decades, with the possibility of emerging in the next few years.
- Such AI systems will radically upend the existing order - presenting a wide range of risks, scaling up to and including catastrophic threats.
- AI companies - funded by big tech - are racing to build these systems without appropriate caution or restraint given the stakes at play.
- Governments are under-resourced, ill-equipped and vulnerable to regulatory capture from big tech companies, leaving a worrying gap in our defenses against dangerous AI systems.
- Philanthropists can and must step in where governments and the private sector are missing the mark.
- We recommend special attention to funding opportunities to (1) boost global resilience, (2) improve government capacity, (3) coordinate major global players, and (4) advance technical safety research.
Funding Recommendations
Alongside this report, we are sharing some of our latest recommended high-impact funding opportunities. The Centre for Long-Term Resilience, the Institute for Law and AI, the Effective Institutions Project and FAR AI are four promising organizations we have recently evaluated and now recommend for further funding, one for each of our four focus areas. We are in the process of evaluating more organizations, and hope to release further recommendations.
Furthermore, Founders Pledge’s Global Catastrophic Risks Fund supports critical work on these issues. If you would like to support progress on a range of catastrophic risks - including those from advanced AI - please consider donating to the Fund!
About Founders Pledge
Founders Pledge is a global non-profit empowering entrepreneurs to do the most good possible with their charitable giving. We equip members with everything needed to maximize their impact, from evidence-led research and advice on the world’s most pressing problems, to comprehensive infrastructure for global grant-making, alongside opportunities to learn and connect. To date, our members have pledged over $10 billion to charity and donated more than $950 million. We’re grateful to be funded by our members and other generous donors. founderspledge.com
From the full report:
I dispute the claim that we need to get alignment right on the first try and that we are otherwise doomed. However, this question depends critically on what is meant by "first try". Let's consider two possible interpretations of the idea that we only get "one try" to develop AI:
Interpretation 1: "At some point we will build a general AI system for the first time. If this system is misaligned, then all humans will die. Otherwise, we will not all die."
Interpretation 2: "The decision to build AI is, in a sense, irreversible. Once we have deployed AI systems widely, it is unlikely that we could roll them back, just as we cannot roll back the internet or electricity."
I expect the first interpretation of this thesis will turn out to be incorrect, because the "first" general AI systems will likely be rather weak and unable to unilaterally disempower all of humanity. This seems evident to me because current AI systems are already fairly general (and increasingly so), yet they remain weak and are still far from being able to disempower humanity.
These current systems also seem to be increasing in their capabilities somewhat incrementally, albeit at a rapid pace[1]. I think it is highly likely that we will have many attempts at aligning general AI systems before they become more powerful than the rest of humanity combined, either individually or collectively. This implies that we do not get only "one try" to align AI—in fact, we will likely have many tries, and these attempts will help us accumulate evidence about the difficulty of alignment on the even more powerful systems that we build next.
To the extent that you are simply defining the "first try" as the last system developed before humans become disempowered, then this claim seems confused. Building such a system is better viewed as a "last try" than a "first try" at AI, since it would not necessarily be the first general AI system that we develop. It also seems likely that the construction of such a system would be aided substantially by AI-guided R&D, making it unclear to what extent it was really "humanity's try" at AI.
Interpretation 2 appears similarly confused. It may be true that the decision to deploy AI on a wide scale is irreversible, if indeed these systems have a lot of value and are generally intelligent, which would make it hard to "put the genie back in the bottle". However, AI does not seem unusual in this respect among technologies, as it is similarly nearly impossible to reverse the course of technological progress in almost all other domains.
More generally, it is simply a fundamental feature of all decision-making that actions are irreversible, in the sense that it is impossible to go back in time and make different decisions than the ones we in fact made. As a general property of the world, rather than a narrow feature of AI development in particular, this fact in isolation does little to motivate any specific AI policy.
[1] I do not think the existence of emergent capabilities implies that general AI systems are getting more capable in a discontinuous fashion, since emergent capabilities are generally quite narrow abilities rather than measures of the average competence of AI systems. On broad measures of intelligence, such as the MMLU benchmark, AI systems appear to be improving more incrementally. Moreover, many apparently emergent capabilities are merely artifacts of the way we measure them, and therefore do not reflect underlying discontinuities in latent abilities.