This is a Draft Amnesty Week draft. It may not be polished, up to my usual standards, fully thought through, or fully fact-checked. 

TLDR: I made a back-of-the-envelope model for the value of steering the future of AI (link here).

I started with four questions:

a) How morally aligned can we expect the goals of an ASI to be?

b) How morally aligned can we expect future human goals to be?

c) How much can we expect ASI to increase or decrease human agency?

d) How would a [stronger AI safety (AIS) movement] affect these expectations?

Here, by agency, I mean the proportion of decisions made based on someone’s expressed preferences. In my model, I compare a world in which a superintelligence (ASI) suddenly arises (World A) with a world in which there is a boom of AI safety research (World B) or one possible goal of AI governance is achieved (World C). More in the doc.
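
To make the comparison concrete, here is a minimal sketch of the kind of calculation the spreadsheet makes: weight the decisions humans and ASI get to make (question c) by how morally aligned each party is (questions a and b), and compare worlds. Every number below is a hypothetical placeholder of mine, not a value from the linked doc.

```python
from dataclasses import dataclass

@dataclass
class World:
    name: str
    asi_alignment: float    # question a: how morally aligned the ASI's goals are (0-1)
    human_alignment: float  # question b: how morally aligned future human goals are (0-1)
    human_agency: float     # question c: share of decisions made on humans' expressed preferences (0-1)

    def expected_value(self) -> float:
        # Decisions steered by humans count with human alignment; the rest count with ASI alignment.
        asi_agency = 1.0 - self.human_agency
        return self.human_agency * self.human_alignment + asi_agency * self.asi_alignment

# Placeholder worlds - illustrative numbers only, not the estimates from the linked spreadsheet
worlds = [
    World("A: ASI arises suddenly",        asi_alignment=0.2, human_alignment=0.6, human_agency=0.1),
    World("B: AI safety research boom",    asi_alignment=0.6, human_alignment=0.6, human_agency=0.5),
    World("C: AI governance goal reached", asi_alignment=0.4, human_alignment=0.6, human_agency=0.7),
]

baseline = worlds[0].expected_value()
for w in worlds:
    print(f"{w.name}: EV = {w.expected_value():.2f} ({w.expected_value() - baseline:+.2f} vs. World A)")
```

The value of steering is then just the difference between World B or C and World A; the interactive version lets you plug in your own estimates.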

Although Bostrom (12), Ord (Precipice, Chapter 1) and MacAskill (WWOTF) tackle all of these questions, I’m not aware of a post that puts them into a single “interactive” Excel, so that’s what I tried to do. Seeing how they weigh in my mind makes me think consciousness and the reliability of progress are somewhat under-discussed parts of the equation.

I was influenced by Joscha Bach’s arguments from this debate, which rest on:

  1. a strong credence humanity will go extinct without AGI (based on the Limits to Growth reports)
  2. a theory of valence that infers normativity from evolved preferences (presumably making a future based on a random AI value function likely good)

The extinction-from-limits-to-growth angle rests on resource depletion, which I explored in this post without finding a credible basis for Bach’s argument. However, I think it’s reasonable to question the value of x-risk reduction if one is uncertain that civilization without AGI could yield much positive value. Similarly, I think Bach’s specific theory of valence is likely wrong, but I grant that the hypothesized conclusion should be taken seriously given a wider range of views on consciousness and AI.

As a result, my guess is that whether or not AI safety succeeds at steering the values of ASI, the future will be better than today. However, these considerations haven’t changed my general outlook: it’s much more likely that the future will be good if humanity makes a conscious effort to shape the trajectory and values of ASI, and this conclusion seems robust even to quite exotic considerations.

Nevertheless, my reflection highlighted a few ideas:

1. Alignment isn’t just about the control problem

Yes, AIS increases human agency (question C), but it also increases the probability that the amount of agency given to humans or to ASI will depend on their moral alignment (interactions C-A and C-B), and it directly improves the probability that whatever AI gets developed will be morally aligned (question A). To a limited extent, AIS may also improve human (moral) decision-making (question B) via the routes discussed within AI ethics (such as preventing the rise of extremism via AI manipulation).

2. Increasing human agency does not guarantee positive outcomes.

It seems a truly long-lasting value lock-in is only possible with heavy help from AI. Therefore, the risk that we would solve the alignment problem but nevertheless irrationally prevent ourselves from building a friendly AI seems very low - relative to the billions of years we’ve got to realize our potential, cultural evolution is quick. More on this in point 5.

This consideration also suggests that one possible risk of increasing humanity’s attempts to carefully shape AI values could be increasing the chances of a value lock-in. However, I think that if we solve the control problem (i.e. humans stay in the decision loop), an AI capable of a value lock-in would understand how our meta-values interact with our true values. In other words, coherent extrapolated volition (CEV) is a more rational way of interpreting goals than taking them literally, so I have decent confidence an aligned ASI would recognize that. And there don’t seem to be important differences in CEV, that is, in meta-values (more in point 6).

I think there's a big chance I'm wrong here. If ASI arises by scaling an LLM, it could be analogous to a human who is very smart in terms of System 1 (can instantly produce complex plans to achieve goals) but not so rational, i.e. not so bright in terms of System 2 (doesn't care to analyze how philosophically coherent these goals are). However, these scenarios seem like precisely the kind of problem that is reduced by increasing the attention devoted to AI safety.

3. Consciousness, progress and uncertainty seem like key factors.

Understanding consciousness seems important for evaluating what value we would lose if an AI proceeded to convert the universe's resources according to whatever value function happened to win the AI race. I explored this interaction more in a previous post.

Understanding progress seems important for evaluating whether humanity would be better equipped to create an ASI in 100 or 1,000 years. For this purpose, I think "better equipped" can be nicely operationalized in a very value-uncertain way as "making decisions based on more reflection, evidence and higher-order considerations". Part of this question is whether morally misaligned actors, such as authoritarian regimes or terrorists, may use this time to catch up and perhaps use an AI to halt humanity's potential (point 5).

The specific flavor of uncertainty we choose seems crucial. If it pushes us towards common-sense morality, or if it pushes us to defer to later generations, AIS seems like a clear top priority. If it pushes us towards views that assign moral patienthood to AI, it may count against some forms of AIS (an infinite pause) while supporting others (e.g. implementing reliable AI philosophy / meta-cognition, see Chi's recent post) (point 6).

4. Increasing ASI agency does not guarantee negative outcomes.

The orthogonality thesis, as proposed by Bostrom, is hard to disagree with - it does seem possible to imagine an AI holding any combination of goals and intelligence. However, the thesis alone doesn’t rule out a correlation - i.e. the possibility that, given somewhat flexible goals, an AI is more likely to end up morally aligned than misaligned.

Given the grand uncertainty and importance of these questions, hoping that such a correlation exists would be a terrible plan. Nevertheless, there are a few interesting reasons one might think it does:

  • Humans act as an existence proof that alignment with morality "by default" is possible (or even likely on priors) - Batson suggests people treat others' wellbeing as an intrinsic value (i.e. true altruism exists), which is why I suspect the CEV of most of humanity would converge on a world model close to the moral ideal. However:
    • This approach could be biased by anthropic effects - if we hadn't developed morality, we wouldn't be talking about it.
    • Some suggest RLHF could be analogous to this process; most disagree.
  • It could be that positive value means the fulfillment of preferences. In this way, virtually any ASI capable of having coherent preferences may be maximizing moral value by realizing them.
  • If an AI starts to reflect on what it should aim to achieve, it may have to solve what “it” is, i.e. the philosophy of self. It may conclude (personal) identity is an unsustainable concept and accept open individualism or a kind of veil of ignorance - if you don't know in which intelligent entity you will be the next moment, you should optimize for everyone's well-being.
  • Consciousness may be changing the architecture of intelligent networks. Or vice versa, intelligent networks may naturally benefit from creating positive qualia.

5. Progress with humans in charge seems reliable

As long as humans have agency - collective leverage against actors who would like to take power into their own hands - any value dissatisfaction creates tension, and therefore, over the long run, systems that are positive for human well-being seem more stable. A much more speculative question is to what extent this dynamic also selects for worldviews that are more congruent with people's belief that they are rational and moral. Both of these questions seem uncertain; however, I think there are good reasons to believe that the evolution of democracy and the expansion of the moral circle were not flukes, but the result of selection for belief systems that are indeed congruent with evidence, higher-order reflection and human well-being.

  • Mainly, democracies seem more stable than autocracies. The typical story of both right- and left-authoritarian regimes of the 20th century is spontaneous collapse - or (in the case of China or Vietnam) adaptation to become more tolerable. The spirit of democracy seems so omnipresent that existing authoritarian regimes generally pay lip service to democracy and seem pressured to accommodate the opinions of their populations. In China, around 90% of people support democracy; in Arab countries, this figure reaches around 72%.
  • One could fear that the gap in birth rates between religious fundamentalists and the cosmopolitan population should make us expect the future to have less rational values. To evaluate this hypothesis, one could inspect demographic projections of religiosity as a crude heuristic. Indeed, a look at the global projection for 2050 shows a 3% decline in irreligion. However, I suspect that as the demographic transition unfolds and people become richer, religious practice will become more reminiscent of the rich parts of the Arab world. Eventually, I think we should expect these regions to follow the current demographic trends in the US, where secularism is on the rise. My point here isn't to argue that these specific trends are necessarily optimistic, but rather that in societies with material abundance and information access, horizontal memetic cultural evolution (ideas spreading) seems to be more effective than vertical evolution (ideas getting “inherited”), e.g. at selecting out cultural systems that are strictly antagonistic to science.
  • One could fear that populism will get more intense with AI, leading to worse governance. I think this is a problem we should take seriously. Nevertheless, I again think the evidence leans towards optimism. Firstly, AI may also improve social media's defenses against fake news. Secondly, new populism does not seem dependent on false information per se - rather, on misleading interpretations of reality. Being 1 SD more exposed to fake news increases populist voting by only 0.19 SD. Similarly, conspiracy beliefs don't seem to have changed much over recent years. And importantly, our trust in video evidence seems to be adjusting to the decreasing cost of creating deepfakes. Deepfakes that are impossible to recognize are already very easy to make - nevertheless, none of the attempts to sway wars and elections this way seems to have made a significant difference so far.

Let’s say humans won’t become an interplanetary species. In that case, I’d expect our species to continue thriving on this planet for the remaining lifetime of Earth, i.e. something like 500 million years. Let’s say current AI safety efforts overshoot and, as a result, our civilization implements a tough international law that prevents it from making use of the positive side of AI and spreading between the stars. This could constitute a suboptimal lock-in. However, it seems unlikely to me that without AI, humans would be able to lock in a bad idea for long enough to matter. In the 17th century, slavery and witch trials were commonly accepted. If it took us a hundred times longer to reach some moral threshold, we would have used up only 0.006% of the remaining lifetime of our planet. In the optimistic scenario where we utilize the full lifetime of the universe, the available time could be a trillion times longer.
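
A quick sanity check of that 0.006% figure, using the paragraph’s rough numbers (I’m assuming ~300 years since the 17th century):

```python
# Back-of-the-envelope check of the "0.006%" figure above.
years_since_17th_century = 300        # rough gap between the 17th century and today
slowdown_factor = 100                 # suppose moral progress took a hundred times longer
earth_remaining_years = 500_000_000   # rough remaining habitable lifetime of Earth

fraction_used = years_since_17th_century * slowdown_factor / earth_remaining_years
print(f"{fraction_used:.3%}")         # -> 0.006%
```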

6. “Indiscriminate moral uncertainty” supports AIS

Naively, absolute moral uncertainty would imply practical moral nihilism: every moral claim would have a 50% probability of being true, and therefore there would be no reason to judge actions on a moral basis. However, such a position requires ~100% credence that for each claim this probability is indeed 0.5 and that no further inspection can move it by any margin, which is paradoxically an expression of ridiculous certainty. True moral uncertainty probably leads to attempts to increase humanity’s philosophical reflection (a toy illustration follows the list below). This seems philosophically very straightforward:

  • Yes, humans disagree about values to an extent that, from some perspective, assigns negative value to most charitable attempts. However, compared to our value disagreements, our meta-value disagreements seem incredibly small - nearly everyone believes they choose their beliefs according to what is true and what brings fulfillment.
  • It seems hard to argue that more reflection gets us farther from the truth. And it seems hard to argue that knowing the truth brings us less of what we meta-value. Therefore, steering progress towards reflection seems like a robust way to increase the fulfillment of humanity’s meta-values.
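
Here is a toy way to see why the “naive nihilism” reading undermines itself: if every moral claim really had an unmovable 50% credence, acting on morality would have zero expected value, but any nonzero chance that reflection shifts credences toward the truth gives reflection positive expected value. This is my own illustration, not part of the linked model, and the numbers are arbitrary.

```python
import random

# Toy comparison: acting on unmovable 50/50 moral credences vs. reflecting first.
# Assumes reflection has some (here arbitrary) chance of nudging credence toward the truth.

def expected_value(reflect_first: bool, p_reflection_helps: float = 0.3, trials: int = 100_000) -> float:
    total = 0.0
    for _ in range(trials):
        claim_is_true = random.random() < 0.5          # the moral claim is true half the time
        credence = 0.5                                  # "indiscriminate" uncertainty
        if reflect_first and random.random() < p_reflection_helps:
            credence = 0.9 if claim_is_true else 0.1    # reflection moves credence toward the truth
        acts = credence > 0.5                           # act on the claim only if credence exceeds 0.5
        total += (1.0 if claim_is_true else -1.0) if acts else 0.0
    return total / trials

print(expected_value(reflect_first=False))  # ~0: coin-flip credences make moral action worthless
print(expected_value(reflect_first=True))   # ~0.15: even unreliable reflection has positive expected value
```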

AIS could be a necessary precursor to making sure we have time for such reflection. This is a less “obviously true” statement, but the uncertainty here is epistemic, not moral. And if ASI doesn’t happen in our lifetimes, such an effort would merely be a waste, not actively harmful, which seems like a point in its favor from the position of “sincere” moral uncertainty.

Lastly, more uncertainty about cause X increases the case for developing an (aligned) ASI. For instance, one could argue that perhaps the universe is full of deadly rays that wipe life out the moment they reach it, but we can’t observe any signs of them, because once we could observe them, we’d already be dead. However, I think the Grabby Aliens model provides an interesting argument against this reasoning - just based on conventional assumptions about the great filters, our civilization is suspiciously early in the universe (see this fun animated explainer). Therefore, any additional strong historical selection effect seems unlikely on priors.
