Suppose you believe AGI (or superintelligence) will be created in the future. In that case, you should also acknowledge its potential to address EA problems like global health and development, pandemics, animal welfare, and cause prioritization decision-making.
If you don't believe superintelligence is possible, you can continue pursuing other EA causes. But if you do believe it's coming, why spend time and money on problems that will likely all be solved by AI, assuming superintelligence arrives aligned with human values?
I've identified a few potential reasons why people continue to devote their time and money to non-AI-related EA causes:
- You aren't aware of the potential capabilities of superintelligence.
- You don't think superintelligence will arrive for a long time, or you're uncertain about the timeline.
- You're passionate about a particular cause, and superintelligence doesn't interest you.
- You believe that present suffering matters intrinsically, and that the suffering occurring now has a moral weight that can't be dismissed.
- You might even think that superintelligence won't be able to address particular problems.
It's widely believed (at least in the AI safety community) that the development of sufficiently advanced AI could lead to major catastrophes, a global totalitarian regime, or human extinction. To me, those risks outweigh any of the above reasons for focusing on other EA issues. I post this because I'd like to see more time and money allocated to AI safety, particularly to solving the alignment problem through automated AI labor (since I don't believe human labor can solve it anytime soon, but that's beyond the scope of this post).
So, do any of the reasons presented above apply to you? Or do you have different reasons for not focusing on AI risks?
I don't claim you can align human groups with individual humans. If I'm reading you correctly, I think you're committing a category error by assigning alignment properties to groups of people like nation states or companies. Alignment, as I'm using the term, is the alignment of an AI's goals or values with those of a person or group of people. We expect this, I think, partly because we're accustomed to telling computers what to do and having them do exactly what we say (though not always exactly what we mean).
Alignment is extremely tricky for unenhanced humans, but theoretically possible. My best guess at solving it would be to automate alignment research and development with AI itself: we'll soon reach sufficiently advanced AI capable of reasoning beyond anything anyone on Earth can come up with, so we just have to ensure that the AI is aligned, that the AI that trained it is aligned, and so on. My second-best guess would be brain-computer interfaces (BCIs), and my third would be interpretability via whole-brain emulation.
Assuming we do develop alignment techniques, I'd argue that exclusive alignment (that is, alignment with one person or a small group) is more difficult than alignment with humanity at large, for the following reasons (I realize some of these cut both ways, but I include them because I see them as more serious for exclusive alignment, value drift being one example):