Even granting that EAs tend to be more tech-savvy and that their comparative advantage lies in technical work such as alignment, the community as a whole is not prioritizing advocacy and governance enough.
Effective Altruists over-prioritize AI alignment relative to AI regulation advocacy. I disagree with prioritizing alignment because much of alignment research is simultaneously capabilities research (Connor Leahy even begged people to stop publishing interpretability research). Consequently, alignment research is accelerating the timeline toward AGI. Another problem is that cutting-edge models are only available inside frontier AI labs, so there is comparatively little that someone on the outside can contribute. Finally, even if an independent alignment researcher finds a safeguard against a particular AGI risk, the lab it is intended for might not implement it, since doing so would cost time and effort. That reluctance stems from the "race to the bottom," which is itself a governance problem.
Even setting aside X-risk, I can imagine a plethora of reasons why a US corporation, or the USA itself, is one of the worst paths to AGI. Corporations are profit-seeking and less concerned with the human-centric integration of technology that AGI would require. Having one country control the ultimate job-replacer also seems like a bad idea: every economy in the world would be at the mercy of whatever the next GPT model can do, which could displace half its workforce. The far superior scenario, I believe, is an international body that makes these decisions globally, or at least has oversight of AGI development in each country. EA should therefore prioritize lengthening the time horizon by advocating for a pause, a slowdown, or any sort of international treaty. This would help defuse the extremely dangerous race dynamics we are currently in.
How you can help:
I recommend PauseAI. They are a great community of people (including many EAs) advocating for an international moratorium on frontier general-capability AI models. There is so much you can do to help: putting up posters, writing letters, writing about the issue, and more. They are very friendly and will answer any questions about how you can fit in and maximize your power as a democratic citizen.
Even if you disagree with pausing as the solution to the governance problem, I believe the direction of PauseAI is correct. On a governance political compass, pausing may sit 10 miles away from current political discourse, but most EAs already lie 9.5 miles out in the same direction.
What is the evidence for the claim that alignment is harder than we thought? It does not appear to be true in any observable or behavioral sense that I am currently aware of. We now have systems (LLMs) that can reason generally about the world, make rudimentary plans, pursue narrow goals, and speak English at the level of a college graduate. And yet virtually none of the traditional issues of misalignment appear to be arising in these systems, at least not in the sense one might have expected if one took traditional alignment arguments very seriously and literally.
For example, for many years people argued about what they perceived as the default "instrumentally convergent" incentives of "sufficiently intelligent" agents, such as self-preservation. The idea of a spontaneous survival instinct in goal-following agents was a major building block of several arguments for why alignment would be hard; consider Stuart Russell's "You can't fetch the coffee if you're dead."
Current LLMs lack survival impulses. They do not "care" in any behavioral sense whether they are alive or dead, as far as we can tell. They also do not appear to be following slightly mis-specified utility functions that dangerously deviate from ours in a way that causes them to lie and plot a takeover. Instead, broadly speaking, instruction-tuned LLMs are corrigible and aligned with us: they generally follow our intentions when asked, rather than executing our commands literally.
In other words, we have systems that can reason generally about the world, make rudimentary plans, and pursue goals, and yet these same systems show no survival drive, remain corrigible, and broadly follow our intentions rather than our literal commands.
So what exactly is the reason to think that alignment is harder than people thought? Is it merely more theoretical arguments about the difficulty of alignment? Do these arguments have any observable consequences that we could actually verify within 1-5 years, or are they unfalsifiable?
To be clear: I do not think there is a ~100% chance that alignment will be solved, or that we do not need to worry about it at all. I think the field is important and should still get funding. In this comment I am purely pushing back against the claim that alignment is harder than we thought. I do not think that claim is true, either as a fact about the world or about the EA community's expectations. On the most straightforward interpretation of the evidence, AI alignment is a great deal easier than people thought it would be in, say, 2015.