This is a special post for quick takes by Lukas_Gloor. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
This is still in the brainstorming stage; I think there's probably a convincing line of argument for "AI alignment difficulty is high, at least on priors" that includes the following points:
Many humans don't seem particularly aligned to "human values" (not just thinking of dark triad traits, but also things like self-deception, cowardice, etc.)
There's a loose analogy where AI is "more technological progress," and "technological progress" so far hasn't always been aligned to human flourishing (it has solved or improved a lot of long-standing problems of civilization, like infant mortality, but has also created some new ones, like political polarization, obesity, unhappiness from constant bombardment with images of people who are richer and more successful than you, etc.). So, based on this analogy, why think things will somehow fall into place with AI training so that the new forces that be will, for once, become aligned?
AI will accelerate everything, and if you accelerate something that isn't set up in a secure way, it goes off the rails ("small issues will be magnified").
I think that a corollary of the first point is that we can learn a lot about alignment by looking at humans who seem unusually aligned to human values (although I'd say, more generally, aligned to the interests of all conscious beings), e.g. highly attained meditators with high integrity, altruistic motivations, rationality skills, and a healthy balance of systematizer and empathizer mindsets. From phenomenological reports, their subagentic structures seem quite unlike anything most of us experience day to day. That, plus a few core philosophical assumptions, can get you a really long way in deducing e.g. Anthropic's constitutional AI principles from first principles.
I find these analogies more reassuring than worrying TBH
[Takeaways from Covid forecasting on Metaculus]
I’m probably going to win the first round of the Li Wenliang forecasting tournament on Metaculus, or maybe get second. (My screen name shows up in second on the leaderboard, but it’s a glitch that’s not resolved yet because one of the resolutions depends on a strongly delayed source.) (Update: I won it!)
With around 52 questions, this was the largest forecasting tournament on the virus. It ran from late February until early June.
I learned a lot during the tournament. Besides claiming credit, I want to share some observations and takeaways from this forecasting experience, inspired by Linch Zhang’s forecasting AMA:
I did well at forecasting, but it came at the expense of other things I wanted to do. In February, March, and April, Covid completely absorbed me. I spent several hours per day reading news and felt anxious about keeping my forecasts up to date. This was exhausting; I was relieved when the tournament came to an end.
I had previously dabbled in AI forecasting. Unfortunately, I can’t tell if I excelled at it because the Metaculus domain for it went dormant. In any case, I noticed that I felt more motivated to delve into Covid questions because they seemed more connected. It felt like I was not only learning random information to help me with a single question, but I was acquiring a kind of expertise. (Armchair epidemiology? :P ) I think this impression was due to a mixture of perhaps suboptimal question design for the AI Metaculus domain and the increased difficulty of picking up useful ML intuitions on the go.
One thing I think I’m good at is identifying reasons why past trends might change. I’m always curious to understand the underlying reasons behind some trends. I come up with lots of hypotheses because I like the feeling of generating a new insight. I often realized that my hunches were wrong, but in the course of investigating them, I improved my understanding.
I have an aversion to making complex models; I always feel like model uncertainty is too large anyway. When forecasting Covid cases, I mostly looked for countries where similar situations had already played out. Then I'd think about factors that might be different in the new situation and make intuition-based adjustments in the direction predicted by those differences.
I think my main weakness is laziness. Occasionally, when there was an easy way to do it, I'd spot-check hypotheses by making predictions about past events that I hadn't yet read about. However, I didn't do this nearly enough. I also relied too much on factoids I had picked up from somewhere without verifying how accurate they were. For instance, I had it stuck in my head that someone said the case doubling rate was 4 days. So I operated with this assumption for many days of forecasting before realizing that it was actually looking more like 2.5 days in densely populated areas, and that I should in any case have spent more time looking firsthand into this crucial variable (see the back-of-the-envelope sketch after this list). Lastly, I noticed a bunch of times that other forecasters were talking about issues I didn't have a good grasp on (e.g., test-positivity rates), and I felt that I'd probably improve my forecasting if I looked into them, but I preferred to stick with approaches I was more familiar with.
IT skills really would have helped me generate forecasts faster. I had to do crazy things with pen and paper because I lacked them. (But none of what I did involved more than elementary-school math.)
I learned that confidently disagreeing with the community forecast is different from “not confidently agreeing.” I lost a bunch of points twice due to underconfidence. In cases where I had no idea about some issue and saw the community predict <10%, I didn’t want to go <20% because that felt inappropriate given my lack of knowledge about the plausible-sounding scenario. I couldn't confidently agree with the community, but since I also didn't confidently disagree with them, I should have just deferred to their forecast. Contrarianism is a valuable skill, but one also has to learn to trust others in situations where one sees no reason not to.
I realized early that when I changed my mind on some consideration that initially had me predict different from the community median, I should make sure to update thoroughly. If I no longer believe my initial reason for predicting significantly above the median, maybe I should go all the way to slightly below the median next. (The first intuition is to just move closer to it but still stay above.)
From playing a lot of poker, I have the habit of imagining that I make some bet (e.g., a bluff or thin value bet) and it will turn out that I’m wrong in this instance. Would I still feel good about the decision in hindsight? This heuristic felt very useful to me in forecasting. It made me reverse initially overconfident forecasts when I realized that my internal assumptions didn’t feel like something I could later on defend as “It was a reasonable view at the time.”
I made a couple of bad forecasts after I stopped following developments every day. I realized I needed to re-calibrate how much to trust my intuitions once I no longer had a good sense of everything that was happening.
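As a minimal back-of-the-envelope sketch of the doubling-rate point above (in Python, with made-up starting numbers), this only illustrates how sensitive a two-week projection is to the assumed doubling time:

```python
# Hypothetical illustration: sensitivity of a short-term case projection
# to the assumed doubling time. All numbers are made up for the example.
initial_cases = 1_000
horizon_days = 14

for doubling_time_days in (4.0, 2.5):
    projected = initial_cases * 2 ** (horizon_days / doubling_time_days)
    print(f"doubling time {doubling_time_days} days -> "
          f"~{projected:,.0f} cases after {horizon_days} days")

# Output: ~11,314 cases (4-day doubling time) vs. ~48,503 (2.5 days):
# over 4x apart from one unchecked assumption.
```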
Some things I was particularly wrong about:
This was well before I started predicting on Metaculus, but up until about February 5th, I was way too pessimistic about the death rate for young healthy people. I think I lacked the medical knowledge to have the right prior about how strongly age-skewed most illnesses are, and therefore updated too strongly upon learning about the deaths of two young healthy Chinese doctors.
Like others, I overestimated the importance of hospital overstrain. I assumed that this would make the infection fatality rate about 1.5x–2.5x worse in countries that don’t control their outbreaks. This didn’t happen.
I was somewhat worried about food shortages initially, and was surprised by the resilience of the food distribution chains.
I expected more hospitalizations in Sweden in April.
I didn’t expect the US to put >60 countries on the level-3 health warning travel list. I was confident that they would not do this, because “If a country is gonna be safer than the US itself, why not let your citizens travel there??”
I was nonetheless too optimistic about the US getting things under control eventually, even though I saw comments from US-based forecasters who were more pessimistic.
My long-term forecasts for case numbers tended to be somewhat low. (Perhaps this was in part related to laziness; the Metaculus interface made it hard to create long tails for the distribution.)
Some things I was particularly right about:
I was generally early to recognize the risks from novel coronavirus / Covid.
For European countries and the US initially, I expected lockdown measures to work roughly as well as they did. I confidently predicted lower than the community for the effects of the first peak.
I somewhat confidently ruled out IFR estimates <0.5% already in early March, and I think this was for good reasons, even though I continued to accumulate better evidence for my IFR predictions later and was wrong about the effects of hospital overstrain.
I very confidently doubled down against <0.5% IFR estimates in late March, despite the weird momentum that developed around taking them seriously, and the confusion about the percentage of asymptomatic cases.
I have had very few substantial updates since mid March. I predicted the general shape of the pandemic quite well, e.g. here or here.
I confidently predicted that the UK and the Netherlands (later) would change course about their initial “no lockdown” policy.
I noticed early that Indonesia had a large undetected outbreak. A couple of days after I predicted this, the deaths there jumped from 1 to 5 and its ratio of confirmed cases to deaths became the worst (or second worst?) in the world at the time.
(I have stopped following the developments closely by now.)
+1 to the congratulations from JP! I may have mentioned this before, but I considered your forecasts and comments for covidy questions to be the highest-quality on Metaculus, especially back when we were both very active.
You may not have considered it worth your time in the end, but I still think it's good for EAs to do things that on the face of it seem fairly hard, and to develop better self-models and models of the world as a result.
I know it might not be what you're looking for, but congratulations!
This was a great writeup, thanks for taking the time to make it. Congrats on the contest, too!
I'm sorry to hear your experience was stressful. Do you intend to go back to Metaculus in a more relaxed way? I know some users restrict themselves to a subset of topics, for example.
Can you provide some links on the latest IFR estimates? A quick Google search leads me to the same 0.5% ballpark.
I'm not following the developments anymore. I could imagine that the IFR is now lower than it used to be in April because treatment protocols have improved.
[AI pause can be the best course of action even if the baseline risk of AI ruin were "only" 5%]
Some critics of pausing frontier AI research argue that AI pause would be an extreme measure (curtailing tech progress, which has historically led to so much good) that is only justifiable if we have very high confidence that the current path leads to ruin.
On the one hand, I feel like the baseline risk is indeed very high (I put it at 70% all-things-considered).
At the same time, I'm frustrated that there's so much discussion of "is the baseline risk really that high?" compared to "is not pausing really the optimal way for us to go into this major civilizational transition?"
I feel like arguing about ruin levels can be a distraction from what should be a similarly important crux. Something has gone wrong if people think pausing can only make sense if the risks of AI ruin are >50%.
The key question is: Will pausing reduce the risks?
Even if the baseline risk were "only" 5%, assuming we have a robust argument that pausing for (say) five years will reduce it to (say) 4%, that would clearly be good! (It would be very unfortunate for the people who will die preventable deaths in the next five years, but pausing would still be better on net, on these assumptions.)
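A minimal sketch of the arithmetic behind this conditional claim (the numbers below are placeholder assumptions for illustration, not estimates from the original argument):

```python
# Toy expected-value comparison for "pause vs. no pause".
# All numbers are placeholder assumptions for illustration only.
p_ruin_no_pause = 0.05    # assumed baseline risk of AI ruin
p_ruin_with_pause = 0.04  # assumed risk after a five-year pause
value_of_future = 1.0     # normalize the value of a non-ruined long-term future
cost_of_delay = 0.001     # assumed cost of five years of foregone benefits, same units

ev_no_pause = (1 - p_ruin_no_pause) * value_of_future
ev_with_pause = (1 - p_ruin_with_pause) * value_of_future - cost_of_delay

print(f"EV without pause: {ev_no_pause:.3f}")   # 0.950
print(f"EV with pause:    {ev_with_pause:.3f}")  # 0.959

# On these assumptions, pausing wins: a one-percentage-point reduction in ruin
# risk outweighs the delay cost unless that cost is comparable to ~1% of the
# value of the entire future.
```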
So, what assumptions would have to be true for continuing ahead to be better than pausing?
(Also, if someone is worried that there are negative side effects from pausing, such as that it'll be politically/societally hard to re-start things later on once alignment research has made breakthroughs, or that some country unaligned with Western values is getting closer to building TAI itself, then that's a discussion worth having! However, we then have to look at "the best implementation of pausing we can realistically get if we advocate for the smartest thing," and not "a version of pausing that makes no attempts whatsoever to reduce bad side effects.")
I think the factor missing here is the matter of when pushing for a pause is appropriate.
Like, imagine an (imo likely) scenario where a massive campaign gets off the ground, with a lot of publicity behind it, to try and prevent GPT-5 from being released on existential risk grounds. It fails, and GPT-5 is released anyway, and literally nothing majorly bad happens. And then the same thing happens for GPT-6 and GPT-7.
In this scenario, the idea of pausing AI could easily become a laughing stock. Then, when an actually dangerous AI comes out, the idea of pausing is still discredited, and you're missing a tool when you really need it.
Even if I believed the risk of overall doom was 5% (way too high imo), I wouldn't support the pause movement now; I'd want to wait to advocate a pause until there was a significant chance of imminent danger.
Yeah, I agree. I wrote about timing considerations here; this is an important part of the discussion.
"5%" is underestimating skepticism. Even those who publicized artificial intelligence risk didn't claim much higher chances:
https://80000hours.org/articles/existential-risks/
The point I wanted to make in the short form was directed at a particular brand of skeptic.
When I said,
Something has gone wrong if people think pausing can only make sense if the risks of AI ruin are >50%.
I didn't mean to imply that anyone who opposes pausing would consider >50% ruin levels their crux.
Likewise, I didn't mean to imply that "let's grant 5% risk levels" is something that every skeptic would go along with (but good that your comment is making this explicit!).
For what it's worth, if I had to give the range within which, today (June 2024), I think the people whose epistemics I respect most highly can reasonably disagree on this question, I would probably not include credences <<5% in that range (I'd maybe put it at more like 15-90%?). (This is of course subject to change if I encounter surprisingly good arguments for something outside the range.) But that's a separate(!) discussion, separate from the conditional statement that I wanted to argue for in my short form. (Obviously, other people will draw the line elsewhere.)
On the 80k article, I think it aged less well than what one could perhaps have written at the time, but it was written at a time when AI risk concerns still seemed fringe. So, just because it (in my view) didn't age amazingly doesn't mean that it was unreasonable at the time. Back then, I'd probably have called it "lower than what I would give, but within the range of what I consider reasonable."
[Is pleasure ‘good’?]
What do we mean by the claim “Pleasure is good”?
There’s an uncontroversial interpretation and a controversial one.
Vague and uncontroversial claim: When we say that pleasure is good, we mean that, all else equal, pleasure is always unobjectionable and often desired.
Specific and controversial claim: When we say that pleasure is good, what we mean is that, all else equal, pleasure is an end we should be striving for. This captures points like:
that pleasure is in itself desirable,
that no mental states without pleasure are in themselves desirable,
that more pleasure is always better than less pleasure.
People who say “pleasure is good” claim that we can establish this by introspection about the nature of pleasure. I don’t see how one could establish the specific and controversial claim from mere introspection. After all, even if I personally valued pleasure in the strong sense (I don’t), I couldn’t, with my own introspection, establish that everyone does the same. People’s psychologies differ, and how pleasure is experienced in the moment doesn’t fully determine how one will relate to it. Whether one wants to dedicate one’s life (or, for altruists, at least the self-oriented portions of one's life) to pursuing pleasure depends on more than just what pleasure feels like.
Therefore, I think pleasure is only good in the weak sense. It’s not good in the strong sense.
Another argument that points to "pleasure is good" is that people and many animals are drawn to things that give them pleasure, and that people generally describe their own pleasurable states as good. Given a random person off the street, I'm willing to bet that after introspection they will suggest that they value pleasure in the strong sense. So while this may not be universally accepted, I still think it could hold weight.
Also, a symmetric statement can be said regarding suffering, which I don't think you'd accept. People who say "suffering is bad" claim that we can establish this by introspection about the nature of suffering.
From reading Tranquilism, I think that you'd respond to these as saying that people confuse "pleasure is good" with an internal preference or craving for pleasure, while suffering is actually intrinsically bad. But taking an epistemically modest approach would require quite a bit of evidence for that, especially as part of the argument is that introspection may be flawed.
I'm curious as to how strongly you hold this position. (Personally, I'm totally confused here, but I lean toward the strong sense of "pleasure is good" while thinking that overall pleasure holds little moral weight.)
Another argument that points to "pleasure is good" is that people and many animals are drawn to things that give them pleasure
It's worth pointing out that this association isn't perfect. See [1] and [2] for some discussion. Tranquilism allows that if someone is in some moment neither drawn to (craving) (more) pleasurable experiences nor experiencing pleasure (or as much as they could be), this isn't worse than if they were experiencing (more) pleasure. If more pleasure is always better, then contentment is never good enough, but to be content is to be satisfied, to feel that it is good enough or not feel that it isn't good enough. Of course, this is in the moment, and not necessarily a reflective judgement.
I also approach pleasure vs suffering in a kind of conditional way, like an asymmetric person-affecting view, or "preference-affecting view":
I would say that something only matters if it matters (or will matter) to someone, and an absence of pleasure doesn't necessarily matter to someone who isn't experiencing pleasure, and certainly doesn't matter to someone who does not and will not exist, and so we have no inherent reason to promote pleasure. On the other hand, there's no suffering unless someone is experiencing it, and according to some definitions of suffering, it necessarily matters to the sufferer. (A bit more on this argument here, but applied to good and bad lives.)
I agree that pleasure is not intrinsically good (i.e. I also deny the strong claim). I think it's likely that experiencing the full spectrum of human emotions (happiness, sadness, anger, etc.) and facing challenges are good for personal growth and therefore improve well-being in the long run. However, I think that suffering is inherently bad, though I'm not sure what distinguishes suffering from displeasure.
[I’m an anti-realist because I think morality is underdetermined]
I often find myself explaining why anti-realism is different from nihilism / “anything goes.” I wrote lengthy posts in my sequence on moral anti-realism (2 and 3) about partly this point. However, maybe the framing “anti-realism” is needlessly confusing because some people do associate it with nihilism / “anything goes.” Perhaps the best short explanation of my perspective goes as follows:
I’m happy to concede that some moral facts exist (in a comparatively weak sense), but I think morality is underdetermined.
This means that beyond the widespread agreement on some self-evident principles, expert opinions won’t converge even if we had access to a superintelligent oracle. Multiple options will be defensible, and people will gravitate to different attractors in value space.
I think if you concede that some moral facts exist, it might be more accurate to call yourself a moral realist. The indeterminacy of morality could be a fundamental feature, allowing for many more acts to be ethically permissible (or no worse than other acts) than with a linear (complete) ranking. I think consequentialists are unusually prone to try to rank outcomes linearly.
I read this recently, which describes how moral indeterminacy can be accommodated within moral realism, although it was kind of long for what it had to say. I think expert agreement (or ideal observers/judges) could converge on moral indeterminacy: they could agree that we can't know how to rank certain options and further that there's no fact of the matter.
Thanks for bringing up this option! I don't agree with this framing for two reasons:
As I point out in my sequence's first post, some of the ways in which "moral facts exist" could be true are underwhelming.
I don't think moral indeterminacy necessarily means that there's convergence of expert judgments. At least, the way in which I think morality is underdetermined explicitly predicts expert divergence. Morality is "real" in the sense that experts will converge up to a certain point, and beyond that, some experts will have underdetermined moral values while others will have made choices within what's allowed by indeterminacy. Out of the ones that made choices, not all choices will be the same.
I think what I describe in the second bullet point will seem counterintuitive to many people because they think that if morality is underdetermined, your views on morality should be underdetermined, too. But that doesn't follow! I understand why people have the intuition that this should follow, but it really doesn't work that way when you look at it closely. I've been working on spelling out why.
[When thinking about what I value, should I take peer disagreement into account?]
Consider the question “What’s the best career for me?”
When we think about choosing careers, we don’t update to the career choice of the smartest person we know or the person who has thought the most about their career. Instead, we seek out people who have approached career choice with a similar overarching goal/framework (in my case, 80,000 Hours is a good fit), and we look toward the choices of people with similar personalities (in my case, I notice a stronger personality overlap with researchers than managers, operations staff, or those doing earning to give).
When it comes to thinking about one’s values, many people take peer disagreement very seriously.
I think that can be wise, but it shouldn't be done unthinkingly. I believe that the quest to figure out one's values shares strong similarities with the quest of figuring out one's ideal career. Before deferring to others in one's deliberations, I recommend making sure that they are asking the same questions (not everything that comes with the label "morality" is the same) and that they are psychologically similar to you in the ways that seem fundamental to what you care about as a person.
[Are underdetermined moral values problematic?]
If I think my goals are merely uncertain, but in reality they are underdetermined and the contributions I make to shaping the future will be driven, to a large degree, by social influences, ordering effects, lock-in effects, and so on, is that a problem?
I can’t speak for others, but I’d find it weird. I want to know what I’m getting up for in the morning.
On the other hand, there's a sense in which underdetermined values are beneficial, because they make it easier for the community to coordinate and pull things in the same direction.
[New candidate framing for existential risk reduction]
The default [edit: implicit] framing for reducing existential risk is something like this: "Currently, humans have control over what we want, but there's a risk that we would lose this control. For instance, transformative AI that's misaligned with what we'd want could prevent us from actualizing good futures."
I don't find this framing particularly compelling. I don't feel like people are particularly "in control of things." There are areas/domains where our control is growing, but there are also areas/domains where it is waning (e.g., cost disease; dysfunctional institutions). (Or, instead of "control waning," we can also think of misaligned forces taking away some of our control – for instance with filter bubbles and other polarizing forces reducing the sense that all people have a shared reality.)
The framing I find most compelling is the following:
"Humans aren't particularly in control of things, but there are areas where technological progress has given us surprisingly advanced capabilities, and every now and then, some groups of people manage to use those capabilities really well. If we want to reduce existential risks, we'd require almost god-like degrees of control over the future and the wisdom/foresight to use it to our advantage. AI risk, in particular, seems especially important from this perspective – for two reasons. (1) AI will likely be radically transformative. Since it's generally much easier to design good systems from scratch rather than make tweaks to existing systems, transformative AI (precisely because of its potential to be transformative) is our best chance to get in control of things. (2) If we fail to align AI, we won't be left in a position where we could attain control over things later."
Fwiw (1) is more naturally phrased as an opportunity associated with AI than a risk ("AI opportunity" vs "AI risk"). And if so you may want to use another term than "existential risk reduction" for the concept that you've identified.
A bit related to an opportunity+risk framing of AI: Artificial Intelligence as a Positive and Negative Factor in Global Risk.
"The default framing for reducing existential risk is something like this. "Currently, humans have control over what we want, but there's a risk that we would lose this control"
Can you perhaps point to some examples?
To me it seems that the default framing is often focused on extinction risks, and then non-extinction existential risks are mentioned as a sort of secondary case. Under this framing you're not really mentioning the issue of control, but are rather mostly focusing on the distinction between survival and extinction.
Maybe you had specific writings (focusing on AI risk?) in mind though?
Good points. I should have written that the point about control is implicit. The default framing focuses on risks, as you say, not on making something happen that gives us more control than we currently have. I think there's a natural reading of the existential risk framings that implicitly says something like "current levels of control might be adequate if it weren't for destructive risks" or perhaps "there's a trend where control increases by default and things might go well unless some risk comes about." To be clear, that's by no means a necessary implication of any text on existential risks. It's just something that is under-discussed, and the lack of discussion suggests that some people might think that way.
The second part of my comment here is relevant for this thread's theme – it explains my position a bit better.
In discussions on the difficulty of aligning transformative AI, I've seen reference class arguments like "When engineers build and deploy things, it rarely turns out to be destructive."
I've always felt like this is pointing at the wrong reference class.
My above comment on framings explains why. I think the reference class for AI alignment difficulty levels should be more like: "When have the people who deployed transformative technology correctly foreseen bad long-term societal consequences and taken the right costly steps to mitigate them?"
(Examples could be: Keeping a new technology secret; or Facebook, in an alternate history, setting up a governance structure where "our algorithm affects society poorly" would receive a lot of sincere attention even at management levels, securely so throughout the company's existence.)
Admittedly, I'm kind of lumping together the alignment and coordination problems. Someone could have the view that "AI alignment," with a narrow definition of what counts as "aligned," is comparatively easy, but coordination could still be hard.
[Moral uncertainty and moral realism are in tension]
Is it ever epistemically warranted to have high confidence in moral realism, and also be morally uncertain not only between minor details of a specific normative-ethical theory but between theories?
I think there's a tension there. One possible reply might be the following. Maybe we are confident in the existence of some moral facts, but multiple normative-ethical theories can accommodate them. Accordingly, we can be moral realists (because some moral facts exist) and be morally uncertain (because there are many theories to choose from that accommodate the little bits we think we know about moral reality).
However, what do we make of the possibility that moral realism could be true only in a very weak sense? For instance, maybe some moral facts exist, but most of morality is underdetermined. Similarly, maybe the true morality is some all-encompassing and complete theory, but humans might be forever epistemically closed off to it. If so, then, in practice, we could never go beyond the few moral facts we already think we know for sure.
Assuming a conception of moral realism that is action-relevant for effective altruism (e.g., because it predicts reasonable degrees of convergence among future philosophers, or makes other strong claims that EAs would be interested in), is it ever epistemically warranted to have high confidence in that, and be open-endedly morally uncertain?
Another way to ask this question: If we don't already know/see that a complete and all-encompassing theory explains many of the features of folk discourse on morality, why would we assume that such a complete and all-encompassing theory exists in a form that is accessible to us? Even if there are, in some sense, "right answers" to moral questions, we need more evidence to conclude that morality is not vastly underdetermined.
For more detailed arguments on this point, see section 3 in this post.