Yarrow Bouchard 🔸

1445 karma · Joined · Canada · strangecosmos.substack.com

Bio

Pronouns: she/her or they/them. 

Parody of Stewart Brand’s Whole Earth button.

I got interested in effective altruism back before it was called effective altruism, back before Giving What We Can had a website. Later on, I got involved in my university EA group and helped run it for a few years. Now I’m trying to figure out where effective altruism can fit into my life these days and what it means to me.

I write on Substack, and used to write on Medium.

Posts
31


Sequences
2

Criticism of specific accounts of imminent AGI
Skepticism about near-term AGI

Comments
712

Topic contributions
13

Much could be said in response to this comment. Probably the most direct and succinct response is my post “Unsolved research problems on the road to AGI”.

Largely for the reasons explained in that post, I think AGI is much less than 0.01% likely in the next decade.

How much of a post are you comfortable for AI to write?

I will never let AI write a single sentence! I resent reading AI-generated writing passed off as written by a human, and I would never inflict this upon my readers.

I have found that the most common explanation for why people use AI for writing is a lack of self-confidence. I keep encouraging people to write in their own words and use their own voice, because all the flaws of unpolished human writing are vastly preferable to chatbot writing.

Thank you for your supportive comment. I think David Mathers is an exceptionally and commendably valuable contributor to the EA Forum in terms of engaging deeply with the substance of arguments around AI safety and AGI forecasting. David engages in discussions with a high level of reasoning transparency, which I deeply appreciate. It isn’t always clear to me why people who fall on the opposite side of debates around AI safety and AGI forecasting believe what they do, and talking to David has helped me understand this better. I would love to have more discussions about these topics with David, or with interlocutors like him. I feel as though there is still much work to be done in bringing the cruxes of these debates into sharp relief.

The EA Forum has a little-used “Dialogues” feature that I think has some potential. Anyone who would be interested in having a Dialogue on AGI forecasting and/or AGI safety should send me a private message.

On to the rest of your comment:

I think the current investments in AGI safety will end up being wasted. I think it’s a bit like paying philosophers in the 1920s to think about how to mitigate social media addiction, years before the first proper computer was built, and even before the concept of a Turing machine was formalized. There is simply too much unknown about how AGI might eventually be built.

Conversely, investments in narrow, prosaic “AI safety” like making LLM chatbots less likely to give people dangerous medical advice are modestly useful today but will have no applicability to AGI much later on. Other than having the name “AI” in common and running on computers using probably some sort of connectionist architecture, I don’t think today’s AI systems will have any meaningful resemblance to AGI, if it is eventually created.

I can’t remember where — I thought it was maybe in a comment on this post, but apparently not — but I seem to recall someone on the EA Forum saying that MIRI or Yudkowsky deserved credit for correctly predicting the sort of “alignment” failures that modern AI systems like LLMs would exhibit. (If anyone remembers the specific comment I’m thinking of, please let me know.) I want to set the record straight and explain why this is not true.

Reinforcement learning was originally developed in the late 1970s and in the 1980s. Years before the founding of MIRI (originally called the Singularity Institute, and created with a different focus) and before Yudkowsky first wrote about “friendly AI”, RL researchers noticed the phenomenon of “reward hacking” or “specification gaming” (although these exact terms were not always used to describe it). One example is found in the 1998 paper “Learning to Drive a Bicycle using Reinforcement Learning and Shaping”. The authors created a bicycle riding simulation and tasked an RL agent with riding the bicycle toward a target. The RL agent found it could maximize reward by riding in circles around the target (page 6):

We agree with Mataric [Mataric, 1994] that these heterogeneous reinforcement functions have to be designed with great care. In our first experiments we rewarded the agent for driving towards the goal but did not punish it for driving away from it. Consequently the agent drove in circles with a radius of 20–50 meters around the starting point. Such behavior was actually rewarded by the reinforcement function…

One could cite many more examples like this.[1]
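To make the failure mode concrete, here is a minimal toy sketch (my own illustration, not the paper’s actual simulation) of the flawed shaping described above: the agent is rewarded for moving toward the goal but is not punished for moving away from it, so endlessly circling the goal earns more total reward than actually reaching it.

```python
import math

def shaped_reward(prev_dist, new_dist):
    """Flawed shaping: reward any step that reduces distance to the
    goal; do not punish steps that increase it."""
    return 1.0 if new_dist < prev_dist else 0.0

def dist(p):
    # Distance from the goal, placed at the origin.
    return math.hypot(p[0], p[1])

# Policy A: drive straight to the goal; the episode ends after 10 steps.
straight_path = [(10.0 - t, 0.0) for t in range(11)]

# Policy B: circle the goal indefinitely at a slightly wobbly radius (~30),
# so distance keeps dipping and rising without ever approaching zero.
circle_path = [
    (30 * (1 + 0.01 * math.sin(t)) * math.cos(t / 5),
     30 * (1 + 0.01 * math.sin(t)) * math.sin(t / 5))
    for t in range(1000)
]

def total_reward(path):
    return sum(shaped_reward(dist(a), dist(b))
               for a, b in zip(path, path[1:]))

print(total_reward(straight_path))  # modest reward for actually reaching the goal
print(total_reward(circle_path))    # far more reward, goal never reached
```

The fix the authors allude to is exactly what the sketch omits: a matching penalty for steps that move away from the goal, which makes circling reward-neutral.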

It is possible to mistake an awareness of well-known concepts in a field (such as RL or AI more generally) for prescience or insight. Readers of MIRI’s or Yudkowsky’s writing should be wary of this.

  1. ^

    A similar example to the one just given but using real robots made of Lego is found in the 2004 paper “Lego Mindstorms Robots as a Platform for Teaching Reinforcement Learning”. One robot learned to continually drive backwards and forwards along the same stretch of track to maximize reward (page 5):

    After some experimentation with the reinforcement signal, the reinforcement learning system was much more successful on the line-following task than on the walking task. 

    In the initial trials, the reinforcement signal used rewarded the robot with positive reinforcement for any action which led to the robot remaining on the track (measured by applying a threshold to the summed value of the light sensors). As the actions available did not provide an option for staying still, this was expected to lead to the robot moving forward along the path and eventually traversing the circuit. However the learning algorithm discovered that alternating turning left and right allowed the robot to reverse slowly in a straight line, and hence maximal reinforcement could be achieved by travelling along a straight section of line at the beginning of the track, and then reversing back along that same section of track.

There's an expert consensus that tobacco is harmful, and there is a well-documented history of tobacco companies engaging in shady tactics. There is also a well-documented history of government propaganda being misleading and deceptive, and if you asked anyone with relevant expertise — historians, political scientists, media experts, whoever — they would certainly tell you that government propaganda is not reliable.

But just lumping "AI accelerationist companies" in with that is not justified. "AI accelerationist" just means anyone who works on making AI systems more capable and doesn't agree with the AI alignment/AI safety community's peculiar worldview. In practice, that means you're saying most people with expertise in AI are compromised and not worth listening to, but you are willing to listen to this weird random group of people, some of whom, like Yudkowsky, have no technical expertise in contemporary AI paradigms (i.e. deep learning and deep reinforcement learning). This seems like a recipe for disaster, like deciding that capitalist economists are all corrupt and that only Marxist philosophers are worth trusting.

A problem with motivated reasoning arguments, when stretched to this extent, is that anyone can accuse anyone on the thinnest of pretexts. And rather than engaging with people's views and arguments in any serious, substantive way, it just turns into a lot of finger pointing.

Yudkowsky's gotten paid millions of dollars to prophesy AI doom. Many people have argued that AI safety/AI alignment narratives benefit the AI companies and their investors. The argument goes like this: Exaggerating the risks of AI exaggerates AI's capabilities. Exaggerating AI's capabilities makes the prospective financial value of AI much higher than it really is. Therefore, talking about AI risk or even AI doom is good business.

I would add that exaggerating risk may be a particularly effective way to exaggerate AI's capabilities. People tend to be skeptical of anything that sounds like pie-in-the-sky hope or optimism. On the other hand, talking about risk sounds serious and intelligent. Notice what goes unsaid: many near-term AGI believers think there's a high chance of some unbelievably amazing utopia just on the horizon. How many times have you heard someone imagine that utopia? One? Zero? And how many times have you heard various AI doom or disempowerment stories? Why would no one ever bring up this amazing utopia they think might happen very soon?

Even if you're very pessimistic and think there's a 90% chance of AI doom, a 10% chance of utopia is still pretty damn interesting. And many people are much more optimistic, thinking there's around a 1-30% chance of doom, which implies a 70%+ chance of utopia. So, what gives? Where's the utopia talk? Even when people talk about the utopian elements of AGI futures, they emphasize the worrying parts: if intelligent machines produce effectively unlimited wealth, how will we organize the economy? What policies will we need to implement? How will people cope? We need to start worrying about this now! When I think about what would happen if I won the lottery, my mind does not go to worrying about the downsides.

I think the overwhelming majority of people who express views on this topic are true believers. I think they are sincere. I would only be willing to accuse someone of possibly doing something underhanded if, independently, they had a track record of deceptive behaviour. (Sam Altman has such a track record, and generally I don't believe anything he says anymore. I have no way of knowing what's sincere, what's a lie, and what's something he's convinced himself of because it suits him to believe it.) I think the specific accusation that AI safety/AI alignment is a deliberate, conscious lie cooked up to juice AI investment is silly. It's probably true, though, that people at AI companies have some counterintuitive incentive or bias toward talking up AI doom fears.

However, my general point is that just as it's silly to accuse AI safety/alignment people of being shills for AI companies, it also seems silly to me to say that AI companies (or "AI accelerationist" companies, which is effectively all major AI companies and almost all startups) are the equivalent of tobacco companies, and you shouldn't pay attention to what people at AI companies say about AI. Motivated reasoning accusations made on thin grounds can put you into a deluded bubble (e.g. becoming a Marxist) and I don't think AI is some clear-cut, exceptional case like tobacco or state propaganda where obviously you should ignore the message.

There's a fine line between steelmanning people's views and creating new views that are facially similar to those views but are crucially different from the views those people actually hold. I think what you're describing is not steelmanning, but developing your own views different from Yudkowsky and Soares' — views that they would almost certainly disagree with in strong terms.

I think it would be constructive for you to publish the views you developed after reading Yudkowsky and Soares' book. People might find that useful to read. That could give people something interesting to engage with. But if you write that Yudkowsky and Soares' claim about alien preferences is wrong, many people will disagree with you (including Yudkowsky and Soares, if they read it). So, it's important to get very clear on what different people in a discussion are saying and what they're not saying. Just to keep everything straight, at least.

I agree the alien preferences thing is not necessarily a crux of AI doom arguments more generally, but it is certainly a crux of Yudkowsky and Soares' overall AI doom argument specifically. Yes, you can change their overall argument into some other argument that doesn't depend on the alien preferences thing anymore, but then that's no longer their argument, that's a different argument.

I agree that Yudkowsky and Soares (and their book) are not fully representative of the AI safety community's views, and probably no single text or person (or pair of people) are. I agree that it isn't really reasonable to say that if you can refute Yudkowsky and Soares (or their book), you refute the AI safety community's views overall. So, I agree with that critique.

In contrast, if Mechanize succeeds, Matthew Barnett will probably be a billionaire.

If Mechanize succeeds in its long-term goal of "the automation of all valuable work in the economy", then everyone on Earth will be a billionaire.

So, if the best version of Yudkowsky and Soares' argument is not the one made in their book, what is the best version? Can you explain how that version of the argument, which they made previously elsewhere, is different from the version in the book?

I can't tell if you're saying:

a) that the alien preferences thing is not a crux of Yudkowsky and Soares' overall argument for AI doom (it seems like it is) or if

b) the version of the specific argument about alien preferences they gave in the book isn't as good as previous versions they've given (which is why I asked what version is better) or if

c) you're saying that Yudkowsky and Soares' book overall isn't as good as their previous writings on AI alignment.

I don't know that academic reviewers of Yudkowsky and Soares' argument would take a different approach. The book is supposed to be the most up-to-date version of the argument, and one the authors took a lot of care in formulating. It doesn't feel intuitive to go back and look at their earlier writings and compare different versions of the argument, which aren't obviously different at first glance. (Will MacAskill and Clara Collier both complained the book wasn't sufficiently different from previous formulations of the argument, i.e. wasn't updated enough in light of advancements in deep learning and deep reinforcement learning over the last decade.) I think an academic reviewer might just trust that Yudkowsky and Soares' book is going to be the best thing to read and respond to if they want to engage with their argument.

You might, as an academic, engage in a really close reading of many versions of a similar argument made by Aristotle in different texts, if you're a scholar of Aristotle, but this level of deep textual analysis doesn't typically apply to contemporary works by lesser-known writers outside academia.

The academic philosopher David Thorstad is writing a blog series in response to the book. I haven't read it yet, so I don't know whether he draws on Yudkowsky and Soares' writings other than the book itself. However, I think it would be perfectly fine for him to just focus on the book, and not seek out other texts from the same authors that make the same argument in maybe a better form.

If what you're saying is that there are multiple independent (and mutually incompatible) arguments for the AI safety community's core claims, including ones that Yudkowsky and Soares don't make, then I agree with that. I agree you can criticize that sentence in the Mechanize co-founders' essay if you believe Yudkowsky's views and arguments don't actually unify (or adequately represent) the views and arguments of the AI safety community overall. Maybe you could point out what those other arguments are and who has formulated them best. Maybe the Mechanize co-founders would write a follow-up piece engaging with those non-Yudkowsky arguments as well, to give a more complete engagement with the AI safety community's worldview.

I think the claim that Yudkowsky's views on AI risk are meaningfully influenced by money is very weak.

To be clear, I agree. I also agree with your general point that other factors are often more important than money. Some of these factors include the allure of millennialism, or the allure of any sort of totalizing worldview or "ideology".

I was trying to make a general point against accusations of motivated reasoning related to money, at least in this context. If two sets of people are each getting paid to work on opposite sides of an issue, why only accuse one side of motivated reasoning?

This is indicated by the hundreds of comments, tweets, in-person arguments, DMs, and posts from at least 2023 onward in which I expressed skepticism about AI risk arguments and AI pause proposals.

Thanks for describing this history. Evidence of a similar kind lends strong credence to the idea that Yudkowsky, too, formed his views independently of the influence of money.

My general view is that reasoning is complex, motivation is complex, people's real psychology is complex, and that the forum-like behaviour of accusing someone of engaging in X bias is probably a misguided pop science simplification of the relevant scientific knowledge. For instance, when people engage in distorted thinking, the actual underlying reasoning often seems to be a surprisingly complicated multi-step sequence.

The essay above that you co-wrote is incredibly strong. I was the one who originally sent it to Vasco and, since he is a prolific cross-poster and I don't like to cross-post under my name, encouraged him to cross-post it. I'm glad more people in the EA community have now read it. I think everyone in the EA community should read it. It's regrettable that there's only been one object-level comment on the substance of the essay so far, and so many comments about this (to me) relatively uninteresting and unimportant side point about money biasing people's beliefs. I hope more people will comment on the substance of the essay at some point.
