A reflection on the posts I have written in the last few months, elaborating on my views
In a series of recent posts, I have sought to challenge the conventional view among longtermists that prioritizes the empowerment or preservation of the human species as the chief goal of AI policy. It is my opinion that this view is likely rooted in a bias that automatically favors human beings over artificial entities—thereby sidelining the idea that future AIs might create equal or greater moral value than humans—and treating this alternative perspective with unwarranted skepticism.
I recognize that my position is controversial and likely to remain unpopular among effective altruists for a long time. Nevertheless, I believe it is worth articulating my view at length, as I see it as a straightforward application of standard, common-sense utilitarian principles that merely lead to an unpopular conclusion. I intend to continue elaborating on my arguments in the coming months.
My view follows from a few basic premises. First, that future AI systems are quite likely to be moral patients; second, that we shouldn’t discriminate against them based on arbitrary distinctions, such as their being instanti...
Thanks for writing on this important topic!
I think it's interesting to assess how popular or unpopular these views are within the EA community. This year and last year, we asked people in the EA Survey about the extent to which they agreed or disagreed that:
Most expected value in the future comes from digital minds' experiences, or the experiences of other nonbiological entities.
This year about 47% (strongly or somewhat) disagreed, while 22.2% agreed (roughly a 2:1 ratio).
However, among people who rated AI risks a top priority, respondents leaned towards agreement, with 29.6% disagreeing and 36.6% agreeing (a 0.8:1 ratio).[1]
Similarly, among the most highly engaged EAs, attitudes were roughly evenly split between 33.6% disagreement and 32.7% agreement (1.02:1), with much lower agreement among everyone else.
This suggests to me that collective opinion among EAs who strongly prioritise AI risks, and among the most highly engaged, is not so hostile to digital minds. Of course, for practical purposes, what matters most might be the attitudes of a small number of decisionmakers, but I think the attitudes of engaged EAs matter for epistemic reasons.
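As a quick check on the arithmetic, the disagree-to-agree ratios quoted above can be recomputed from the rounded percentages. This is only a sketch: the group labels and the function name are illustrative, and "about 47%" is taken as exactly 47.0.

```python
# Disagree/agree percentages as quoted in the comment (rounded figures).
# "about 47%" is treated as 47.0 here, which is an assumption.
survey = {
    "all respondents": (47.0, 22.2),
    "AI risk rated a top priority": (29.6, 36.6),
    "most highly engaged": (33.6, 32.7),
}


def disagree_to_agree_ratio(disagree_pct: float, agree_pct: float) -> float:
    """Return how many respondents disagreed for each one who agreed."""
    return disagree_pct / agree_pct


ratios = {
    group: disagree_to_agree_ratio(disagree, agree)
    for group, (disagree, agree) in survey.items()
}

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}:1")
```

From the rounded inputs, the first two ratios come out at roughly 2.1:1 and 0.8:1, matching the comment. The third comes out nearer 1.03:1 than the quoted 1.02:1, which is plausibly just rounding in the published percentages.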
Interestingly, a
I haven't read your other recent comments on this, but here's a question on the topic of pausing AI progress. (The point I'm making is similar to what Brad West already commented.)
Let's say we grant your assumptions (that AIs will have values that matter the same as or more than human values and that an AI-filled future would be just as or more morally important than one with humans in control). Wouldn't it still make sense to pause AI progress at this important junction to make sure we study what we're doing so we can set up future AIs to do as well as (reasonably) possible?
You say that we shouldn't be confident that AI values will be worse than human values. We can put a pin in that. But values are just one feature here. We should also think about agent psychologies and character traits and infrastructure beneficial for forming peaceful coalitions. On those dimensions, some traits or setups seem (somewhat robustly?) worse than others?
We're growing an alien species that might take over from humans. Even if you think that's possibly okay or good, wouldn't you agree that we can envision factors about how AIs are built/trained and about what sort of world they are placed in that affe... (read more)
I think it's interesting and admirable that you're dedicated to a position that's so unusual in this space.
I assume I'm in the majority here that my intuitions are quite different from yours, however.
One quick point when we're here:
> this view is likely rooted in a bias that automatically favors human beings over artificial entities—thereby sidelining the idea that future AIs might create equal or greater moral value than humans—and treating this alternative perspective with unwarranted skepticism.
I think that a common, but perhaps not well vocalized, utilitarian take is that humans don't have much of a special significance in terms of creating well-being. The main option would be a much more abstract idea, some kind of generalization of hedonium or consequentialism-ium or similar. For now, let's define hedonium as "the ideal way of converting matter and energy into well-being, after a great deal of deliberation."
As such, it's very tempting to try to separate concerns and have AI tools focus on being great tools, and separately optimize hedonium to be efficient at being well-being. While I'm not sure if AIs would have zero qualia, I'd feel a lot more confident that the... (read more)
I realize my position can be confusing, so let me clarify it as plainly as I can: I do not regard the extinction of humanity as anything close to “fine.” In fact, I think it would be a devastating tragedy if every human being died. I have repeatedly emphasized that a major upside of advanced AI lies in its potential to accelerate medical breakthroughs—breakthroughs that might save countless human lives, including potentially my own. Clearly, I value human lives, as otherwise I would not have made this particular point so frequently.
What seems to cause confusion is that I also argue the following more subtle point: while human extinction would be unbelievably bad, it would likely not be astronomically bad in the strict sense used by the "astronomical waste" argument. The standard “astronomical waste” argument says that if humanity disappears, then all possibility for a valuable, advanced civilization vanishes forever. But in a scenario where humans die out because of AI, civilization would continue—just not with humans. That means a valuable intergalactic civilization could still arise, populated by AI rather than by humans. From a purely utilitarian perspective that counts the exis... (read more)
At the same time though I don't think you mean to endorse 1).
I have read or skimmed some of his posts and my sense is that he does endorse 1). But at the same time he says
> critics seem to frequently conflate my arguments with other, simpler positions that can be more easily dismissed.
so maybe this is one of these cases and I should be more careful.
In this "quick take", I want to summarize some of my idiosyncratic views on AI risk.
My goal here is to list just a few ideas that cause me to approach the subject differently from how I perceive most other EAs view the topic. These ideas largely push me in the direction of making me more optimistic about AI, and less likely to support heavy regulations on AI.
(Note that I won't spend a lot of time justifying each of these views here. I'm mostly stating these points without lengthy justifications, in case anyone is curious. These ideas can perhaps inform why I spend significant amounts of my time pushing back against AI risk arguments. Not all of these ideas are rare, and some of them may indeed be popular among EAs.)
I want to say thank you for holding the pole of these perspectives and keeping them in the dialogue. I think that they are important and it's underappreciated in EA circles how plausible they are.
(I definitely don't agree with everything you have here, but typically my view is somewhere between what you've expressed and what is commonly expressed in x-risk focused spaces. Often also I'm drawn to say "yeah, but ..." -- e.g. I agree that a treacherous turn is not so likely at global scale, but I don't think it's completely out of the question, and given that I think it's worth serious attention safeguarding against.)
In fact, it is difficult for me to name even a single technology that I think is currently underregulated by society.
The obvious example would be synthetic biology, gain-of-function research, and similar.
I also think AI itself is currently massively underregulated even entirely ignoring alignment difficulties. I think the probability of the creation of AI capable of accelerating AI R&D by 10x this year is around 3%. It would be extremely bad for US national interests if such an AI was stolen by foreign actors. This suffices for regulation ensuring very high levels of security IMO. And this is setting aside ongoing IP theft and similar issues.
In particular, I am persuaded by the argument that, because evaluation is usually easier than generation, it should be feasible to accurately evaluate whether a slightly-smarter-than-human AI is taking unethical actions, allowing us to shape its rewards during training accordingly. After we've aligned a model that's merely slightly smarter than humans, we can use it to help us align even smarter AIs, and so on, plausibly implying that alignment will scale to indefinitely higher levels of intelligence, without necessarily breaking down at any physically realistic point.
This reasoning seems to imply that you could use GPT-2 to oversee GPT-4 by bootstrapping from a chain of models of scales between GPT-2 and GPT-4. However, this isn't true: the weak-to-strong generalization paper finds that this doesn't work, and indeed bootstrapping like this doesn't help at all for ChatGPT reward modeling (I believe it helps on chess puzzles and on nothing else they investigate).
I think this sort of bootstrapping argument might work if we could ensure that each model in the chain was sufficiently aligned and capable of reasoning such that it would carefully reason about what humans would want if the... (read more)
I'm curious why there hasn't been more work exploring a pro-AI or pro-AI-acceleration position from an effective altruist perspective. Some points:
I think a more important reason is the additional value of the information and the option value. It's very likely that the change resulting from AI development will be irreversible. Since we're still able to learn about AI as we study it, taking additional time to think and plan before training the most powerful AI systems seems to reduce the likelihood of being locked into suboptimal outcomes. Increasing the likelihood of achieving "utopia" rather than landing into "mediocrity" by 2 percent seems far more important than speeding up utopia by 10 years.
AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can't simply apply a naive argument that AI threatens total extinction of value
Paul Christiano wrote a piece a few years ago about ensuring that misaligned ASI is a “good successor” (in the moral value sense),[1] as a plan B to alignment (Medium version; LW version). I agree it’s odd that there hasn’t been more discussion since.[2]
Here's a non-exhaustive list of guesses for why I think EAs haven't historically been sympathetic [...]: A belief that AIs won't be conscious, and therefore won't have much moral value compared to humans.
I’ve wondered about this myself. My take is that this area was overlooked a year ago, but there’s now some good work being done. See Jeff Sebo’s Nov ‘23 80k podcast episode, as well as Rob Long’s episode, and the paper that the two of them co-authored at the end of last year: “Moral consideration for AI systems by 2030”. Overall, I’m optimistic about this area becoming a new forefront of EA.
accelerationism would have, at best, temporary effects
I’m confused by this point, and for me this is the overriding crux between m...
Under purely longtermist views, accelerating AI by 1 year increases available cosmic resources by 1 part in 10 billion. This is tiny. So the first order effects of acceleration are tiny from a longtermist perspective.
Thus, a purely longtermist perspective doesn't care about the direct effects of delay/acceleration and the question would come down to indirect effects.
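The size of that first-order effect can be made concrete with a small sketch. It assumes a simple linear model in which every year of acceleration gains the same "1 part in 10 billion" of reachable resources, and it treats long-run value as proportional to resources. The function names and the linear model are illustrative assumptions, not something stated in the comment.

```python
# Assumption: each year of acceleration gains the fraction quoted in the
# comment ("1 part in 10 billion"), and the gain scales linearly with years.
FRACTION_GAINED_PER_YEAR = 1e-10


def resource_gain(years_accelerated: float) -> float:
    """Fraction of cosmic resources gained by accelerating AI."""
    return years_accelerated * FRACTION_GAINED_PER_YEAR


def break_even_probability_drop(years_accelerated: float,
                                p_good_outcome: float = 1.0) -> float:
    """Drop in the probability of a good outcome that would cancel the gain.

    Treats value as (probability of a good outcome) x (resources reached),
    so a gain of fraction g is worth g * p_good_outcome in probability terms.
    """
    return resource_gain(years_accelerated) * p_good_outcome


if __name__ == "__main__":
    for years in (1, 10, 100):
        print(years, resource_gain(years), break_even_probability_drop(years))
```

Under these assumptions, even a decade of acceleration is offset by a change of about one in a billion in the probability of a good outcome, which is why the comment turns to indirect effects.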
I can see indirect effects going either way, but delay seems better on current margins (this might depend on how much optimism you have on current AI safety progress, governance/policy progress, and whether you think humanity retaining control relative to AIs is good or bad). All of these topics have been explored and discussed to some extent.
When focusing on the welfare/preferences of currently existing people, I think it's unclear whether accelerating AI looks good or bad; it depends on optimism about AI safety, how you trade off old people versus young people, and death via violence versus death from old age. (Misaligned AI takeover killing lots of people is by no means assured, but seems reasonably likely by default.)
I expect there hasn't been much investigation of accelerating AI to advance the preferences of currently ...
I generally agree that we should be more concerned about this. In particular, I find people who will happily approve Shut Up and Multiply sentiment but reject this consideration suspect in their reasoning.
A more extreme version of this is that, given the massively greater efficiency with which a digital consciousness could convert matter and energy to utilons (IIRC naively about 3 orders of magnitude according to Bostrom, before any increase from greater coordination), on strict expected value reasoning you have to be extremely confident that this won't happen - or at least have a much stronger rebuttal than 'AI won't necessarily be conscious'.
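One naive way to read the expected-value point is as a break-even calculation. The sketch below normalizes the value of a biological future to 1, assigns zero value to a digital future without morally relevant consciousness, and takes the "about 3 orders of magnitude" figure at face value. All three are simplifying assumptions, and the names are illustrative.

```python
def parity_probability(efficiency_multiplier: float) -> float:
    """Probability of valuable digital minds at which expected values tie.

    Assumes a biological future is worth 1, a valuable digital future is
    worth `efficiency_multiplier`, and a non-valuable digital future is
    worth 0, so the tie occurs where p * efficiency_multiplier == 1.
    """
    return 1.0 / efficiency_multiplier


# "about 3 orders of magnitude", as recalled in the comment
EFFICIENCY_MULTIPLIER = 10 ** 3

threshold = parity_probability(EFFICIENCY_MULTIPLIER)
print(f"break-even probability: {threshold}")
```

On this toy model, the biological future only comes out ahead if the probability of valuable digital minds is below one in a thousand, which is one way to read "extremely confident".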
Separately, I think there might be a case for accelerationism even if you think it increases the risk of AI takeover and that AI takeover is bad, on the grounds that in many scenarios advancing faster might still increase the probability of human descendants getting through the time of perils before some other threat destroys us (every year we remain in our current state is another year in which we run the risk of, for example, a global nuclear war or civilisation-ending pandemic).
My stance is that we (more-or-less) know humans are conscious and have moral values that, while they have failed to prevent large amounts of harm, seem to have the potential to be good.
I claim there's a weird asymmetry here where you're happy to put trust into humans because they have the "potential" to do good, but you're not willing to say the same for AIs, even though they seem to have the same type of "potential".
Whatever your expectations about AIs, we already know that humans are not blank slates that may or may not be altruistic in the future: we actually have a ton of evidence about the quality and character of human nature, and it doesn't make humans look great. Humans are not mainly described as altruistic creatures. I mentioned factory farming in my original comment, but one can examine the way people spend their money (i.e. not mainly on charitable causes), or the history of genocides, war, slavery, and oppression for additional evidence.
Probably a core point of disagreement here is whether, presented with a "random" intelligent actor, we should expect it to promote welfare or prevent suffering "by default".
I don't expect humans to "promote welfare or prevent suffering"...
It seems like you're just substantially more pessimistic than I am about humans. I think factory farming will be ended, and though it seems like humans have caused more suffering than happiness so far, I think their default trajectory will be to eventually stop doing that, and to ultimately do enough good to outweigh their ignoble past. I don't think this is certain by any means, but I think it's a reasonable extrapolation. (I maybe don't expect you to find it a reasonable extrapolation.)
Meanwhile I expect the typical unaligned AI may seize power for some purpose that seems to us entirely trivial, and may be uninterested in doing any kind of moral philosophy, and/or may not place any terminal (rather than instrumental) value in paying attention to other sentient experiences in any capacity. I do think humans, even with their kind of terrible track record, are more promising than that baseline, though I can see why other people might think differently.
I think the fact that people are partial to humanity explains a large fraction of the disagreement people have with me.
Maybe; it's hard for me to know. But I predict most of the pushback you're getting from relatively thoughtful longtermists isn't due to this.
I've noticed that EAs are happy to concede that AIs could be moral patients, but are generally reluctant to admit AIs as moral agents, in the way they'd be happy to accept humans as independent moral agents (e.g. newborns) into our society.
I agree with this.
I'd call this "being partial to humanity", or at least, "being partial to the values of the human species".
I think "being partial to humanity" is a bad description of what's going on because (e.g.) these same people would be considerably more on board with aliens. I think the main thing going on is that people have some (probably mistaken) levels of pessimism about how AIs would act as moral agents which they don't have about (e.g.) aliens.
To test this hypothesis, I recently asked three questions on Twitter about whether people would be willing to accept immigration through a portal to another universe from three sources:
- "a society of humans who are very similar to us"
- "a
It is becoming increasingly clear to many people that the term "AGI" is vague and should often be replaced with more precise terminology. My hope is that people will soon recognize that other commonly used terms, such as "superintelligence," "aligned AI," "power-seeking AI," and "schemer," suffer from similar issues of ambiguity and imprecision, and should also be approached with greater care or replaced with clearer alternatives.
To start with, the term "superintelligence" is vague because it encompasses an extremely broad range of capabilities above human intelligence. The differences within this range can be immense. For instance, a hypothetical system at the level of "GPT-8" would represent a very different level of capability compared to something like a "Jupiter brain", i.e., an AI with the computing power of an entire gas giant. When people discuss "what a superintelligence can do" the lack of clarity around which level of capability they are referring to creates significant confusion. The term lumps together entities with drastically different abilities, leading to oversimplified or misleading conclusions.
Similarly, "aligned AI" is an ambiguous term because it means differen... (read more)
I don't think this is true, or at least I think you are misrepresenting the tradeoffs and diversity here. There is some publication bias here because people are more precise in papers, but honestly, scientists are also not more precise than many top LW posts in the discussion section of their papers, especially when covering wider-ranging topics.
Predictive coding papers use language incredibly imprecisely, analytic philosophy often uses words in really confusing and inconsistent ways, economists (especially macroeconomists) throw out various terms in quite imprecise ways.
But also, as soon as you leave the context of official publications, but are instead looking at lectures, or books, or private letters, you will see people use language much less precisely, and those contexts are where a lot of the relevant intellectual work happens. Especially when scientists start talking about the kind of stuff that LW likes to talk about, like intelligence and philosophy of science, there is much less rigor (and also, I recommend people read a human's guide to words as a general set of arguments for why "precise definitions" are really not viable as a constraint on language)
I might elaborate on this at some point, but I thought I'd write down some general reasons why I'm more optimistic than many EAs on the risk of human extinction from AI. I'm not defending these reasons here; I'm mostly just stating them.
ETA: feel free to ignore the below, given your caveat, though you may find it helpful if you choose to write an expanded form of any of the arguments later to have some early objections.
Correct me if I'm wrong, but it seems like most of these reasons boil down to not expecting AI to be superhuman in any relevant sense (since if it is, effectively all of them break down as reasons for optimism)? To wit:
> Correct me if I'm wrong, but it seems like most of these reasons boil down to not expecting AI to be superhuman in any relevant sense
No, I certainly expect AIs will eventually be superhuman in virtually all relevant respects.
Resource allocation is relatively equal (and relatively free of violence) among humans because even humans that don't very much value the well-being of others don't have the power to actually expropriate everyone else's resources by force.
Can you clarify what you are saying here? If I understand you correctly, you're saying that humans have relatively little wealth inequality because there's relatively little inequality in power between humans. What does that imply about AI?
I think there will probably be big inequalities in power among AIs, but I am skeptical of the view that there will be only one (or even a few) AIs that dominate over everything else.
I do not think GPT-4 is meaningful evidence about the difficulty of value alignment.
I'm curious: does that mean you also think that alignment research performed on GPT-4 is essentially worthless? If not, why?
I think it's extremely unlikely that GPT-4 has preferences over world states in a way that most humans wou
(Clarification about my views in the context of the AI pause debate)
I'm finding it hard to communicate my views on AI risk. I feel like some people are responding to the general vibe they think I'm giving off rather than the actual content. Other times, it seems like people will focus on a narrow snippet of my comments/post and respond to it without recognizing the context. For example, one person interpreted me as saying that I'm against literally any AI safety regulation. I'm not.
For a full disclosure, my views on AI risk can be loosely summarized as follows:
It seems to me that a big crux about the value of AI alignment work is what target you think AIs will ultimately be aligned to in the future in the optimistic scenario where we solve all the "core" AI risk problems to the extent they can be feasibly solved, e.g. technical AI safety problems, coordination problems, the problem of having "good" AI developers in charge etc.
There are a few targets that I've seen people predict AIs will be aligned to if we solve these problems: (1) "human values", (2) benevolent moral values, (3) the values of AI developers, (4) the CEV of humanity, (5) the government's values. My guess is that a significant source of disagreement that I have with EAs about AI risk is that I think none of these answers are actually very plausible. I've written a few posts explaining my views on this question already (1, 2), but I think I probably didn't make some of my points clear enough in these posts. So let me try again.
In my view, in the most likely case, it seems that if the "core" AI risk problems are solved, AIs will be aligned to the primarily selfish individual revealed preferences of existing humans at the time of alignment. This essentially refers to the the... (read more)
In some circles that I frequent, I've gotten the impression that a decent fraction of existing rhetoric around AI has gotten pretty emotionally charged. And I'm worried about the presence of what I perceive as demagoguery regarding the merits of AI capabilities and AI safety. Out of a desire to avoid calling out specific people or statements, I'll just discuss a hypothetical example for now.
Suppose an EA says, "I'm against OpenAI's strategy for straightforward reasons: OpenAI is selfishly gambling everyone's life in a dark gamble to make themselves immortal." Would this be a true, non-misleading statement? Would this statement likely convey the speaker's genuine beliefs about why they think OpenAI's strategy is bad for the world?
To begin to answer these questions, we can consider the following observations:
In my latest post I talked about whether unaligned AIs would produce more or less utilitarian value than aligned AIs. To be honest, I'm still quite confused about why many people seem to disagree with the view I expressed, and I'm interested in engaging more to get a better understanding of their perspective.
At the least, I thought I'd write a bit more about my thoughts here, and clarify my own views on the matter, in case anyone is interested in trying to understand my perspective.
The core thesis that I was trying to defend is the following view:
My view: It is likely that by default, unaligned AIs—AIs that humans are likely to actually build if we do not completely solve key technical alignment problems—will produce comparable utilitarian value compared to humans, both directly (by being conscious themselves) and indirectly (via their impacts on the world). This is because unaligned AIs will likely both be conscious in a morally relevant sense, and they will likely share human moral concepts, since they will be trained on human data.
Some people seem to merely disagree with my view that unaligned AIs are likely to be conscious in a morally relevant sense. And a few others have a sema... (read more)
(A clearer and more fleshed-out version of this argument is now a top-level post. Read that instead.)
I strongly dislike most AI risk analogies that I see EAs use. While I think analogies can be helpful for explaining a concept to people for the first time, I think they are frequently misused, and often harmful. The fundamental problem is that analogies are consistently mistaken for, and often deliberately intended as, arguments for particular AI risk positions. And the majority of the time when analogies are used this way, I think they are misleading and imprecise, routinely conveying the false impression of a specific, credible model of AI, when in fact no such credible model exists.
Here are two particularly egregious examples of analogies I see a lot that I think are misleading in this way:
I think these analogies are typically poor because, when evaluated carefully, they establish almost nothing of importance beyond the logical possibility of severe AI misalignment. Worse, they give the impression of a model for how we should think about AI behavior, even when the speak... (read more)
I'm considering posting an essay about how I view approaches to mitigate AI risk in the coming weeks. I thought I'd post an outline of that post here first as a way of judging what's currently unclear about my argument, and how it interacts with people's cruxes.
Current outline:
In the coming decades I expect the world will transition from using AIs as tools to relying on AIs to manage and govern the world broadly. This will likely coincide with the deployment of billions of autonomous AI agents, rapid technological progress, widespread automation of labor, and automated decision-making at virtually every level of our society.
Broadly speaking, there are (at least) two main approaches you can take now to try to improve our chances of AI going well:
[This shortform comment has now been superseded by a slightly longer post.]
Many effective altruists have shown interest in expanding moral consideration to AIs, which I appreciate. However, in my experience, these EAs have primarily focused on AI welfare—mostly by ensuring that AIs are treated well and protected from harm—rather than advocating for AI rights, which has the potential to grant AIs legal autonomy and freedoms. While these two approaches overlap significantly, there is a tendency for these approaches to come apart in the following way:
I want to challenge an argument that I think drives a lot of AI risk intuitions. I think the argument goes something like this:
My problem with this argument is that "human values" can refer to (at least) three different things, and under every plausible interpretation, the argument appears internally inconsistent.
Broadly speaking, I think "human values" usually refers to one of three concepts:
Under the first interpretation, I think premise (2) of the original argum...
Here's a fictional dialogue with a generic EA that I think can perhaps help explain how some of my thoughts about AI risks differ from those of most EAs:
EA: "Future AIs could be unaligned with human values. If this happened, it would likely be catastrophic. If AIs are unaligned, they'll hide their intentions until they're in a position to strike in a violent coup, and then the world will end (for us at least)."
Me: "I agree that sounds like it would be very bad. But maybe let's examine why this scenario seems plausible to you. What do you mean when you say AIs might be unaligned with human values?"
EA: "Their utility functions would not overlap with our utility functions."
Me: "By that definition, humans are already unaligned with each other. Any given person has almost a completely non-overlapping utility function with a random stranger. People—through their actions rather than words—routinely value their own life and welfare thousands of times higher than that of strangers. Yet even with this misalignment, the world does not end in any strong sense. Nor does this fact automatically imply the world will end for a given group within humanity."
EA: "Sure, but that's because humans mostly all have s...
I have so many axes of disagreement that it is hard to figure out which one is most relevant. I guess let's go one by one.
Me: "What do you mean when you say AIs might be unaligned with human values?"
I would say that pretty much every agent other than me (and probably me in different times and moods) is "misaligned" with me, in the sense that I would not like a world where they get to dictate everything that happens without consulting me in any way.
This is a quibble because in fact I think if many people were put in such a position they would try asking others what they want and try to make it happen.
Consider a random retirement home. Compared to the rest of the world, it has basically no power. If the rest of humanity decided to destroy or loot the retirement home, there would be virtually no serious opposition.
This hypothetical assumes too much, because people outside care about the lovely people in the retirement home, and they represent their interests. The question is, will some future AIs with relevance and power care for humans, as humans become obsolete?
I think this is relevant, because in the current world there is a lot of variety. There are people who care about ret... (read more)
EA: "Their utility functions would not overlap with our utility functions."
Me: "By that definition, humans are already unaligned with each other. Any given person has almost a completely non-overlapping utility function with a random stranger. People—through their actions rather than words—routinely value their own life and welfare thousands of times higher than that of strangers. Yet even with this misalignment, the world does not end in any strong sense."
EA: "Sure, but that's because humans are all roughly the same intelligence and/or capability. Future AIs will be way smarter and more capable than humans."
Just for the record, this is when I got off the train for this dialogue. I don't think humans are misaligned with each other in the relevant ways, and if I could press a button to have the universe be optimized by a random human's coherent extrapolated volition, then that seems great and thousands of times better than what I expect to happen with AI-descendants. I believe this for a mixture of game-theoretic reasons and genuinely thinking that other humans' values do really actually capture most of what I care about.
I find it slightly strange that EAs aren't emphasizing semiconductor investments more given our views about AI.
(Maybe this is because of a norm against giving investment advice? This would make sense to me, except that there's also a cultural norm against criticizing charities that people donate to, and EAs seemed to blow right through that one.)
I commented on this topic last year. Later, I was informed that some people have been thinking about this and acting on it to some extent, but overall my impression is that there's still a lot of potential value left on the table. I'm really not sure though.
Since I might be wrong and I don't really know what the situation is with EAs and semiconductor investments, I thought I'd just spell out the basic argument, and see what people say:
I mostly agree with this (and did also buy some semiconductor stock last winter).
Besides plausibly accelerating AI a bit (which I think is a tiny effect at most unless one plans to invest millions), a possible drawback is motivated reasoning (e.g., one may feel less inclined to think critically of the semi industry, and/or less inclined to favor approaches to AI governance that reduce these companies' revenue). This may only matter for people who work in AI governance, and especially compute governance.
I'm considering writing a post that critically evaluates the concept of a decisive strategic advantage, i.e. the idea that in the future an AI (or set of AIs) will take over the world in a catastrophic way. I think this concept is central to many arguments about AI risk. I'm eliciting feedback on an outline of this post here in order to determine what's currently unclear or weak about my argument.
The central thesis would be that it is unlikely that an AI, or a unified set of AIs, will violently take over the world in the future, especially at a time when h... (read more)
Some people seem to think the risk from AI comes from AIs gaining dangerous capabilities, like situational awareness. I don't really agree. I view the main risk as simply arising from the fact that AIs will be increasingly integrated into our world, diminishing human control.
Under my view, the most important thing is whether AIs will be capable of automating economically valuable tasks, since this will prompt people to adopt AIs widely to automate labor. If AIs have situational awareness, but aren't economically important, that's not as concerning.
The risk... (read more)
I hold a few core ethical ideas that are extremely unpopular: the idea that we should treat the natural suffering of animals as a grave moral catastrophe, the idea that old age and involuntary death are the number one enemy of humanity, and the idea that we should treat so-called farm animals with a very high level of compassion.
Given the unpopularity of these ideas, you might be tempted to think that the reason they are unpopular is that they are exceptionally counterintuitive ones. But is that the case? Do you really need a modern education and philosophical t... (read more)
I have now posted as a comment on LessWrong my summary of some recent economic forecasts and whether they are underestimating the impact of the coronavirus. You can help me by critiquing my analysis.
A trip to Mars that brought back human passengers also has the chance of bringing back microbial Martian passengers. This could be an existential risk if microbes from Mars harm our biosphere in a severe and irreparable manner.
From Carl Sagan in 1973, "Precisely because Mars is an environment of great potential biological interest, it is possible that on Mars there are pathogens, organisms which, if transported to the terrestrial environment, might do enormous biological damage - a Martian plague, the twist in the plot of H. G. Wells' War of the ... (read more)
In response to human labor being automated, a lot of people support a UBI funded by a tax on capital. I don't think this policy is necessarily unreasonable, but if later the UBI gets extended to AIs, this would be pretty bad for humans, whose only real assets will be capital.
As a result, the unintended consequence of such a policy may be to set a precedent for a massive wealth transfer from humans to AIs. This could be good if you are utilitarian and think the marginal utility of wealth is higher for AIs than humans. But selfishly, it's a big cost.
I have read or skimmed some of his posts and my sense is that he does endorse 1). But at the same time he says
so maybe this is one of these cases and I should be more careful.