On Deference and Yudkowsky's AI Risk Estimates

bmg

To be clear, Yudkowsky isn’t asking other people to defer to him. He’s spent a huge amount of time outlining his views (allowing people to evaluate them on their merits) and has often expressed concerns about excessive epistemic deference. ↩︎
A better, but still far-from-optimal approach to deference might be to give a lot of weight to the "average" view within the pool of smart people who have spent a reasonable amount of time thinking about AI risk. This still isn't great, though, since different people do deserve different amounts of weight, and since there's at least some reason to think that selection effects might bias this pool toward overestimating the level of risk. ↩︎
It might be worth emphasizing that I’m not making any claim about the relative quality of my own track record. ↩︎
To say something concrete about my current views on misalignment risk: I'm currently inclined to assign a low-to-mid-single-digits probability to existential risk from misaligned AI this century, with a lot of volatility in my views. This is of course, in some sense, still extremely high! ↩︎
I think that expressing extremely high credences in existential risk (without sufficiently strong and clear justification) can also lead some people to simply dismiss the concerns. It is often easier to be taken seriously, when talking about strange and extreme things, if you express significant uncertainty. Importantly, I don't think this means that people should ever misrepresent their levels of concern about existential risks; dishonesty seems like a really bad and corrosive policy. Still, this is one extra reason to think that it can be important to avoid overestimating risks. ↩︎
Yudkowsky is obviously a pretty polarizing figure. I'd also say that some people are probably too dismissive of him, for example because they assign too much significance to his lack of traditional credentials. But it also seems clear that many people are inclined to give Yudkowsky's views a great deal of weight. I've even encountered the idea that Yudkowsky is virtually the only person capable of thinking about alignment risk clearly. ↩︎
I think that cherry-picking examples from someone's forecasting track record is normally bad to do, even if you flag that you're engaged in cherry-picking. However, I do think (or at least hope) that it's fair in cases where someone already has a very high level of respect and frequently draws attention to their own successful predictions. ↩︎
I don't mean to suggest that the specific twenty orders-of-magnitude of growth figure was the result of deep reflection or was Yudkowsky's median estimate. Here is the specific quote, in response to Hanson raising the twenty orders-of-magnitude-in-a-week number: "Twenty orders of magnitude in a week doesn’t sound right, unless you’re talking about the tail end after the AI gets nanotechnology. Figure more like some number of years to push the AI up to a critical point, two to six orders of magnitude improvement from there to nanotech, then some more orders of magnitude after that." I think that my general point, that this is a very extreme prediction, stays the same even if we lower the number to ten orders-of-magnitude and assume that there will be a bit of a lag between the 'critical point' and the development of the relevant nanotechnology. ↩︎
As an example of a failed prediction or piece of analysis on the other side of the FOOM debate, Hanson praised the CYC project - which lies far afield of the current deep learning paradigm and now looks like a clear dead end. ↩︎
Yudkowsky also provides a number of arguments in favor of the view that the human mind can be massively improved upon. I think these arguments are mostly right. However, I don't think they have very strong implications for the question of whether AI progress will be compute-intensive, sudden, or localized. ↩︎
To probe just the relevance of this one piece of evidence, specifically, let’s suppose that it’s appropriate to use the length of a person’s genome in bits of information as an upper bound on the minimum amount of code required to produce a system that shares their cognitive abilities (excluding code associated with digital environments). This would imply that it is in principle possible to train an ML model that can do anything a given person can do, using something on the order of 10 million lines of code. But even if we accept this hypothesis - which seems quite plausible to me - it doesn’t seem to me like this implies much about the relative contributions of architecture and compute to AI progress or the extent to which progress in architecture design is driven by “deep insights.” For example, why couldn’t it be true that it is possible to develop a human-equivalent system using fewer than 10 million lines of code and also true that computing power (rather than insight) is the main bottleneck to developing such a system? ↩︎
Two caveats regarding my discussion of the FOOM debate:

First, I should emphasize that, although I think Yudkowsky’s arguments were weak when it came to the central hypothesis being debated, his views were in some other regards more reasonable than his debate partner’s. See here for comments by Paul Christiano on how well various views Yudkowsky expressed in the FOOM debate have held up.

Second, it's been a few years since I've read the FOOM debate - and there's a lot in there (the book version of it is 741 pages long) - so I wouldn't be surprised if my high-level characterization of Yudkowsky's arguments is importantly misleading. My characterization here is based on some rough notes I took the last time I read it. ↩︎
For example, it may be possible to construct very strong arguments for AI risk that don't rely on the fast take-off assumption. However, in practice, I think it's fair to say that the classic arguments did rely on this assumption. If the assumption wasn't actually very justified, then it seems to follow that having a very high credence in AI risk also wasn't justified at the time. ↩︎
Here’s another example of an argument that’s risen to prominence in the past few years, and plays an important role in some presentations of AI risk, that I now suspect simply might not work. This argument shows up, for example, in Yudkowsky’s recent post “AGI Ruin: A List of Lethalities,” at the top of the section outlining “central difficulties.” ↩︎


EDIT: I've now written up my own account of how we should do epistemic deference in general, which fleshes out more clearly a bunch of the intuitions I outline in this comment thread.

I think that a bunch of people are overindexing on Yudkowsky's views; I've nevertheless downvoted this post because it seems like it's making claims that are significantly too strong, based on a methodology that I strongly disendorse. I'd much prefer a version of this post which, rather than essentially saying "pay less attention to Yudkowsky", is more nuanced about how to update based on his previous contributions; I've tried to do that in this comment, for example. (More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements. Note that the list of agreements there, which I expect that many other alignment researchers also buy into, serves as a significant testament to Yudkowsky's track record.)

The part of this post which seems most wild to me is the leap from "mixed track record" to

In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.

For any reasonable interpretation of this sentence, it's transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn't write a similar "mixed track record" post about, it's almost entirely because they don't have a track record of making any big claims, in large part because they weren't able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.

Based on his track record, I would endorse people deferring more towards the general direction of Yudkowsky's views than towards the views of almost anyone else. I also think that there's a good case to be made that Yudkowsky tends to be overconfident, and this should be taken into account when deferring; but when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large. The EA community has ended up strongly moving in Yudkowsky's direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.

bmg

The part of this post which seems most wild to me is the leap from "mixed track record" to

In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.

For any reasonable interpretation of this sentence, it's transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn't write a similar "mixed track record" post about, it's almost entirely because they don't have a track record of making any big claims, in large part because they weren't able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.

I disagree that the sentence is false for the interpretation I have in mind.

I think it's really important to separate out the question "Is Yudkowsky an unusually innovative thinker?" and the question "Is Yudkowsky someone whose credences you should give an unusual amount of weight to?"

I read your comment as arguing for the former, which I don't disagree with. But that doesn't mean that people should currently weigh his risk estimates more highly than they weigh the estimates of other researchers currently in the space (like you).

I also think that there's a good case to be made that Yudkowsky tends to be overconfident, and this should be taken into account when deferring; but when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.

But we do also need to try to have well-calibrated credences, of course. For the reason given in the post, it's important to know whether the risk of everyone dying soon is 5% or 99%. It's not enough just to determine whether we should take AI risk seriously.

We're also now past the point, as a community, where "Should AI risk be taken seriously?" is that much of a live question. The main epistemic question that matters is what probability we assign to it - and I think this post is relevant to that.

(More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements.)

I definitely recommend people read the post Paul just wrote! I think it's overall more useful than this one.

But I don't think there's an either-or here. People - particularly non-experts in a domain - do and should form their views through a mixture of engaging with arguments and deferring to others. So both arguments and track records should be discussed.

The EA community has ended up strongly moving in Yudkowsky's direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.

I discuss this in response to another comment, here, but I'm not convinced of that point.

richard_ngo

I phrased my reply strongly (e.g. telling people to read the other post instead of this one) because deference epistemology is intrinsically closely linked to status interactions, and you need to be pretty careful in order to make this kind of post not end up being, in effect, a one-dimensional "downweight this person". I don't think this post was anywhere near careful enough to avoid that effect. That seems particularly bad because I think most EAs should significantly upweight Yudkowsky's views if they're doing any kind of reasonable, careful deference, because most EAs significantly underweight how heavy-tailed the production of innovative ideas actually is (e.g. because of hindsight bias, it's hard to realise how much worse than Eliezer we would have been at inventing the arguments for AI risk, and how many dumb things we would have said in his position).

By contrast, I think your post is implicitly using a model where we have a few existing, well-identified questions, and the most important thing is to just get to the best credences on those questions, and we should do so partly by just updating in the direction of experts. But I think this model of deference is rarely relevant; see my reply to Rohin for more details. Basically, as soon as we move beyond toy models of deference, the "innovative thinking" part becomes crucially important, and the "well-calibrated" part becomes much less so.

One last intuition: different people have different relationships between their personal credences and their all-things-considered credences. Inferring track records in the way you've done here will, in addition to favoring people who are quieter and say fewer useful things, also favor people who speak primarily based on their all-things-considered credences rather than their personal credences. But that leads to a vicious cycle where people are deferring to people who are deferring to people who... And then the people who actually do innovative thinking in public end up getting downweighted to oblivion via cherrypicked examples.

Modesty epistemology delenda est.

Rohin Shah

when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.

This seems like an overly research-centric position.

When your job is to come up with novel relevant stuff in a domain, then I agree that it's mostly about "which ideas and arguments to take seriously" rather than specific credences.

When your job is to make decisions right now, the specific credences matter. Some examples:

Any cause prioritization decision, e.g. should funders reallocate nearly all biosecurity money to AI?
What should AI-focused community builders provide as starting resources?
Should there be an organization dedicated to solving Eliezer's health problems? What should its budget be?
Should people try to solve technical AI alignment or try to, idk, create a culture of secrecy within AGI labs?

richard_ngo

I think that there are very few decisions which are both a) that low-dimensional and b) actually sensitive to the relevant range of credences that we're talking about.

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident. This is a huge hit in terms of Bayes points; if that's how you determine deference, and you believe he's 2x off, then plausibly that implies you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved - this should very rarely move you from a yes to no, or vice versa. (edit: I should restrict the scope here to grantmaking in complex, high-uncertainty domains like AI alignment).

Then you might say: well, okay, we're not just making binary decisions, we're making complex decisions where we're choosing between lots of different options. But the more complex the decisions you're making, the less you should care about whether somebody's credences on a few key claims are accurate, and the more you should care about whether they're identifying the right types of considerations, even if you want to apply a big discount factor to the specific credences involved.

As a simple example, as soon as you're estimating more than one variable, you typically start caring a lot about whether the errors on your estimates are correlated or uncorrelated. But there are so many different possibilities for ways and reasons that they might be correlated that you can't just update towards experts' credences, you have to actually update towards experts' reasons for those credences, which then puts you in the regime of caring more about whether you've identified the right types of considerations.

CarlShulman

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident. This is a huge hit in terms of Bayes points; if that's how you determine deference, and you believe he's 2x off, then plausibly you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved - this should very rarely move you from a yes to no, or vice versa.

Such differences are crucial for many of the most important grant areas IME, because they are areas where you are trading off multiple high-stakes concerns. E.g. in nuclear policy all the strategies on offer have arguments that they might lead to nuclear war or to a worse war. On AI alignment there are multiple such tradeoffs, with people embracing strategies that push the same variable in opposite directions, with high stakes on both sides.

richard_ngo

I haven't thought much about nuclear policy, so I can't respond there. But at least in alignment, I expect that pushing on variables where there's less than a 2x difference between the expected positive and negative effects of changing that variable is not a good use of time for altruistically-motivated people.

(By contrast, upweighting or downweighting Eliezer's opinions by a factor of 2 could lead to significant shifts in expected value, especially for people who are highly deferential. The specific thing I think doesn't make much difference is deferring to a version of Eliezer who's 90% confident about something, versus deferring to the same extent to a version of Eliezer who's 45% confident in the same thing.)

My more general point, which doesn't hinge on the specific 2x claim, is that naive conversions between metrics of calibration and deferential weightings are a bad idea, and that a good way to avoid naive conversions is to care a lot more about innovative thinking than calibration when deferring.

Rohin Shah

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident.

I think differences between Eliezer's views and mine often make way more than a 2x difference to the bottom line. I'm not sure why you're only considering probabilities on specific claims; when I think of "deferring" I also imagine deferring on estimates of usefulness of various actions, which can much more easily have OOMs of difference.

(Fwiw I also think Eliezer is way more than 2x too high for probabilities on many claims, though I don't think that matters much for my point.)

Taking my examples:

should funders reallocate nearly all biosecurity money to AI?

Since Eliezer thinks something like 99.99% chance of doom from AI, that reduces cost effectiveness of all x-risk-targeted biosecurity work by a factor of 10,000x (since only in 1 in 10,000 worlds does the reduced bio x-risk matter at all), whereas if you have < 50% of doom from AI (as I do) then that's a discount factor of < 2x on x-risk-targeted biosecurity work. So that's almost 4 OOMs of difference.
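A minimal sketch of the arithmetic above, assuming the discount on x-risk-targeted biosecurity work is simply 1 / P(no doom from AI). The function name `bio_discount` is hypothetical, and 0.5 is used as the upper bound of the "< 50%" figure, so the computed gap is a lower bound.

```python
import math

def bio_discount(p_ai_doom: float) -> float:
    """Discount factor on bio x-risk work: it only matters in worlds without AI doom."""
    return 1.0 / (1.0 - p_ai_doom)

# Credences taken from the comment above; 0.5 stands in for "< 50%".
eliezer_discount = bio_discount(0.9999)   # roughly 10,000x
rohin_discount = bio_discount(0.5)        # 2x (an upper bound on Rohin's discount)

# Gap between the two discounts, in orders of magnitude.
gap_ooms = math.log10(eliezer_discount / rohin_discount)
print(f"gap is at least {gap_ooms:.1f} OOMs")
```

The ratio of the two discounts is about 5,000, which is where "almost 4 OOMs" comes from.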

What should AI-focused community builders provide as starting resources?

Eliezer seems very confident that a lot of existing alignment work is useless. So if you imagine taking a representative set of such papers as starting resources, I'd imagine that Eliezer would be at < 1% on "this will help the person become an effective alignment researcher" whereas I'd be at > 50% (for actual probabilities I'd want a better operationalization), leading to a >50x difference in cost effectiveness.

(And if you compare against the set of readings Eliezer would choose, I'd imagine the difference becomes even greater -- I could imagine we'd each think the other's choice would be net negative.)

Should there be an organization dedicated to solving Eliezer's health problems? What should its budget be?

I don't have a citation but I'm guessing that Eliezer thinks that with more energy and project management skills he could make a significant dent in x-risk (perhaps 10 percentage points), while thinking the rest of the alignment field if fully funded can't make a dent of more than 0.01 percentage points, suggesting that "improve Eliezer's health + project management skills" is 3 OOM more important than "all other alignment work" (saying nothing about tractability, which I don't know enough to evaluate). Whereas I'd have it at, idk, 1-2 OOM less important, for a difference of 4-5 OOMs.
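The OOM bookkeeping in this guess can be sketched as follows, assuming "importance" is just the ratio of x-risk reduction in percentage points. The variable names are hypothetical, and the inputs are the guessed numbers from the paragraph above, not sourced figures.

```python
import math

# Guessed figures from the comment above (percentage points of x-risk reduction).
eliezer_dent_pp = 10.0   # Eliezer with more energy and project management skills
field_dent_pp = 0.01     # rest of the alignment field, fully funded

# Under the view ascribed to Eliezer: how many OOMs more important is the health org?
eliezer_view_ooms = math.log10(eliezer_dent_pp / field_dent_pp)

# Rohin's own guess: the org is 1-2 OOMs *less* important than other alignment work.
rohin_view_ooms = (-2.0, -1.0)

# Difference between the two views, in OOMs, for each end of Rohin's range.
gap_ooms = [eliezer_view_ooms - r for r in rohin_view_ooms]
print(eliezer_view_ooms, gap_ooms)
```

Under these inputs the ascribed view comes out at about 3 OOMs, and the gap between the two views at about 4 to 5 OOMs.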

Should people try to solve technical AI alignment or try to, idk, create a culture of secrecy within AGI labs?

This one is harder to make up numbers for but intuitively it seems like there should again be many OOMs of difference, primarily because we differ by many OOMs on "regular EAs trying to solve technical AI alignment" but roughly agree on the value of "culture of secrecy".

I realize I haven't engaged with the abstract points you made. I think I mostly just don't understand them and currently they feel like they have to be wrong given the obvious OOMs of difference in all of the examples I gave. If you still disagree it would be great if you could explain how your abstract points play out in some of my concrete examples.

richard_ngo

We both agree that you shouldn't defer to Eliezer's literal credences, because we both think he's systematically overconfident. The debate is between two responses to that:

a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).

b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn't overconfident.

I say you should do the latter, because you should be deferring to coherent worldviews (which are rare) rather than deferring on a question-by-question basis. This becomes more and more true the more complex the decisions you have to make. Even for your (pretty simple) examples, the type of deference you seem to be advocating doesn't make much sense.

For instance:

should funders reallocate nearly all biosecurity money to AI?

It doesn't make sense to defer to Eliezer's estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.

Should there be an organization dedicated to solving Eliezer's health problems? What should its budget be?
I'm guessing that Eliezer thinks that with more energy and project management skills he could make a significant dent in x-risk (perhaps 10 percentage points), while thinking the rest of the alignment field if fully funded can't make a dent of more than 0.01 percentage points, suggesting that "improve Eliezer's health + project management skills" is 3 OOM more important than "all other alignment work" (saying nothing about tractability, which I don't know enough to evaluate). Whereas I'd have it at, idk, 1-2 OOM less important, for a difference of 4-5 OOMs.

Again, the problem is that you're deferring on a question-by-question basis, without considering the correlations between different questions - in this case, the likelihood that Eliezer is right, and the value of his work. (Also, the numbers seem pretty wild; maybe a bit uncharitable to ascribe to Eliezer the view that his research would be 3 OOM more valuable than the rest of the field combined? His tone is strong but I don't think he's ever made a claim that big.)

Here's an alternative calculation which takes into account that correlation. I claim that the value of this organization is mainly determined by the likelihood that Eliezer is correct about a few key claims which underlie his research agenda. Suppose he thinks that's 90% likely and I think that's 10% likely. Then if our choices are "defer entirely to Eliezer" or "defer entirely to Richard", there's a 9x difference in funding efficacy. In practice, though, the actual disagreement here is between "defer to Eliezer no more than a median AI safety researcher" and something like "assume Eliezer is, say, 2x overconfident and then give calibrated-Eliezer, say, 30%ish of your deference weight". If we assume for the sake of simplicity that every other AI safety researcher has my worldview, then the practical difference here is something like a 2x difference in this org's efficacy (0.1 vs 0.3*0.9*0.5+0.7*0.1). Which is pretty low!
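The calculation in parentheses can be written out as a small sketch. The function and parameter names are hypothetical; the numbers (30% deference weight, 90% stated credence, 2x overconfidence adjustment, 10% credence for everyone else) are the illustrative ones from the paragraph above.

```python
def blended_credence(weight_on_eliezer: float,
                     eliezer_credence: float,
                     overconfidence_factor: float,
                     others_credence: float) -> float:
    """Mix a calibrated version of Eliezer's credence with everyone else's."""
    # Adjust for assumed overconfidence, e.g. 0.9 / 2 = 0.45.
    calibrated = eliezer_credence / overconfidence_factor
    return weight_on_eliezer * calibrated + (1 - weight_on_eliezer) * others_credence

# Option (a): defer to Eliezer no more than to a median researcher,
# who by assumption shares Richard's 10% credence.
baseline = 0.1

# Option (b): give calibrated-Eliezer about 30% of the deference weight.
blended = blended_credence(0.3, 0.9, 2.0, 0.1)   # 0.3*0.45 + 0.7*0.1

ratio = blended / baseline
print(f"{blended:.3f} vs {baseline:.3f}, ratio {ratio:.2f}x")
```

With these inputs the blended credence is about 0.205 against a baseline of 0.1, i.e. roughly the 2x difference claimed.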

Won't go through the other examples but hopefully that conveys the idea. The basic problem here, I think, is that the implicit "deference model" that you and Ben are using doesn't actually work (even for very simple examples like the ones you gave).

Rohin Shah

It doesn't make sense to defer to Eliezer's estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.

There's lots of things you can do under Eliezer's worldview that add dignity points, like paying relevant people millions of dollars to spend a week really engaging with the arguments, or trying to get whole-brain emulation before AGI. My understanding is that he doesn't expect those sorts of things to happen.

I claim that the value of this organization is mainly determined by the likelihood that Eliezer is correct about a few key claims which underlie his research agenda. Suppose he thinks that's 90% likely and I think that's 10% likely.

This seems like a crazy way to do cost-effectiveness analyses.

Like, if I were comparing deworming to GiveDirectly, would I be saying "well, the value of deworming is mainly determined by the likelihood that the pro-deworming people are right, which I estimate is 70% but you estimate is 50%, so there's only a 1.4x difference"? Something has clearly gone wrong here.

It also feels like this reasoning implies that no EA action can be > 10x more valuable than any other action that an EA critic thinks is good? Since you assign a 90% chance that the EA is right, and the critic thinks there's a 10% chance of that, so there's only a 9x gap? And then once you do all of your adjustments it's only 2x? Why do we even bother with cause prioritization under this worldview?

I don't have a fleshed out theory of how and when to defer, but I feel pretty confident that even our intuitive pretheoretic deference should not be this sort of thing, and should be the sort of thing that can have orders of magnitude of difference between actions.

(One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to "all other alignment work".)

The debate is between two responses to that:
a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).
b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn't overconfident.

I don't see why you are not including "c) give significant deference weight to his actual worldview", which is what I'd be inclined to do if I didn't have significant AI expertise myself and so was trying to defer.

(Aside: note that Ben said "they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk", which is slightly different from your rephrasing, but that's a nitpick)

Also, the numbers seem pretty wild; maybe a bit uncharitable to ascribe to Eliezer the view that his research would be 3 OOM more valuable than the rest of the field combined?

¯\_(ツ)_/¯ Both the 10% and 0.01% (= 100% - 99.99%) numbers are ones I've heard reported (though both second-hand, not directly from Eliezer), and it also seems consistent with other things he writes. It seems entirely plausible that people misspoke or misremembered or lied, or that Eliezer was reporting probabilities "excluding miracles" or something else that makes these not the right numbers to use.

I'm not trying to be "charitable" to Eliezer, I'm trying to predict his views accurately (while noting that often people predict views inaccurately by failing to be sufficiently charitable). Usually when I see people say things like "obviously Eliezer meant this more normal, less crazy thing" they seem to be wrong.

Rob thinking that it's not actually 99.99% is in fact an update for me.

richard_ngo

(One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to "all other alignment work".)

IMO the crux is that I disagree with both of these. Instead I think you should use each worldview to calculate a policy, and then generate some kind of compromise between those policies. My arguments above were aiming to establish that this strategy is not very sensitive to exactly how much you defer to Eliezer, because there just aren't very many good worldviews going around - hence why I assign maybe 15 or 20% (inside view) credence to his worldview (updated from 10% above after reflection). (I think my all-things-considered view is similar, actually, because deference to him cancels out against deference to all the people who think he's totally wrong.)

Again, the difference is in large part determined by whether you think you're in a low-dimensional space (here are our two actions, which one should we take?) versus a high-dimensional space (millions of actions available to us, how do we narrow it down?). In a high-dimensional space the tradeoffs between the best ways to generate utility according to Eliezer's worldview and the best ways to generate utility according to other worldviews become much smaller.

This seems like a crazy way to do cost-effectiveness analyses.
Like, if I were comparing deworming to GiveDirectly, would I be saying "well, the value of deworming is mainly determined by the likelihood that the pro-deworming people are right, which I estimate is 70% but you estimate is 50%, so there's only a 1.4x difference"? Something has clearly gone wrong here.

Within a worldview, you can assign EVs which are orders of magnitude different. But once you do worldview diversification, if a given worldview gets even 1% of my resources, then in some sense I'm acting like that worldview's favored interventions are in a comparable EV ballpark to all the other worldviews' favored interventions. That's a feature not a bug.

It also feels like this reasoning implies that no EA action can be > 10x more valuable than any other action that an EA critic thinks is good? Since you assign a 90% chance that the EA is right, and the critic thinks there's a 10% chance of that, so there's only a 9x gap? And then once you do all of your adjustments it's only 2x? Why do we even bother with cause prioritization under this worldview?
I don't have a fleshed out theory of how and when to defer, but I feel pretty confident that even our intuitive pretheoretic deference should not be this sort of thing, and should be the sort of thing that can have orders of magnitude of difference between actions.

An arbitrary critic typically gets well under 0.1% of my deference weight on EA topics (otherwise it'd run out fast!). But also see above: because in high-dimensional spaces there are few tradeoffs between different worldviews' favored interventions, changing the weights on different worldviews doesn't typically lead to many OOM changes in the EVs you're acting as though you assign.

Also, I tend to think of cause prio as trying to integrate multiple worldviews into a single coherent worldview. But with deference you intrinsically can't do that, because the whole point of deference is you don't fully understand their views.

There's lots of things you can do under Eliezer's worldview that add dignity points, like paying relevant people millions of dollars to spend a week really engaging with the arguments, or trying to get whole-brain emulation before AGI. My understanding is that he doesn't expect those sorts of things to happen.

What do you mean "he doesn't expect this sort of thing to happen"? I think I would just straightforwardly endorse doing a bunch of costly things like these that Eliezer's worldview thinks are our best shot, as long as they don't cause much harm according to other worldviews.

I don't see why you are not including "c) give significant deference weight to his actual worldview", which is what I'd be inclined to do if I didn't have significant AI expertise myself and so was trying to defer.

Because neither Ben nor myself was advocating for this.

Rohin Shah

Okay, my new understanding of your view is that you're suggesting that (if one is going to defer) one should:

1. Identify a panel of people to defer to
2. Assign them weights based on how good they seem (e.g. track record, quality and novelty of ideas, etc)
3. Allocate resources to [policies advocated by person X] in proportion to [weight assigned to person X].

I agree that (a) this is a reasonable deference model and (b) under this deference model most of my calculations and questions in this thread don't particularly make sense to think about.

However, I still disagree with the original claim I was disagreeing with:

when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.

Even in this new deference model, it seems like the specific weights chosen in step 2 are a pretty big deal (which seem like the obvious analogues of "credences", and the sort of thing that Ben's post would influence). If you switch from a weight of 0.5 to a weight of 0.3, that's a reallocation of 20% of your resources, which is pretty large!
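To make the three-step model and the reallocation arithmetic concrete, here is a minimal Python sketch. The two-member panel, the advisor labels, and the budget are hypothetical illustrations; only the shift from a weight of 0.5 to 0.3 comes from the comment above.

```python
def allocate(weights, budget):
    """Split `budget` across advisors in proportion to their deference weights."""
    total = sum(weights.values())
    return {name: budget * w / total for name, w in weights.items()}


def reallocated_fraction(old_weights, new_weights):
    """Fraction of total resources that moves between two weightings.

    Both weightings are normalised first; the result is half the L1
    distance between them, i.e. the share of the budget that changes hands.
    """
    old = allocate(old_weights, 1.0)
    new = allocate(new_weights, 1.0)
    names = set(old) | set(new)
    return sum(abs(new.get(n, 0.0) - old.get(n, 0.0)) for n in names) / 2


# Hypothetical panel: advisor "X" drops from weight 0.5 to weight 0.3.
before = {"X": 0.5, "others": 0.5}
after = {"X": 0.3, "others": 0.7}

budget = 100.0  # arbitrary units (hypothetical)
print(allocate(before, budget))
print(allocate(after, budget))
print(reallocated_fraction(before, after))  # roughly 0.2, i.e. 20% of resources
```

Under this reading, the weights act directly as budget shares, which is why a 0.2 change in one weight moves about a fifth of the resources.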

richard_ngo

Yepp, thanks for the clear rephrasing. My original arguments for this view were pretty messy because I didn't have it fully fleshed out in my mind before writing this comment thread, I just had a few underlying intuitions about ways I thought Ben was going wrong.

Upon further reflection I think I'd make two changes to your rephrasing.

First change: in your rephrasing, we assign people weights based on the quality of their beliefs, but then follow their recommended policies. But any given way of measuring the quality of beliefs (in terms of novelty, track record, etc) is only an imperfect proxy for quality of policies. For example, Kurzweil might very presciently predict that compute is the key driver of AI progress, but suppose (for the sake of argument) that the way he does so is by having a worldview in which everything is deterministic, individuals are powerless to affect the future, etc. Then you actually don't want to give many resources to Kurzweil's policies, because Kurzweil might have no idea which policies make any difference.

So I think I want to adjust the rephrasing to say: in principle we should assign people weights based on how well their past recommended policies for someone like you would have worked out, which you can estimate using things like their track record of predictions, novelty of ideas, etc. But notably, the quality of past recommended policies is often not very sensitive to credences! For example, if you think that there's a 50% chance of solving nanotech in a decade, or a 90% chance of solving nanotech in a decade, then you'll probably still recommend working on nanotech (or nanotech safety) either way.

Having said all that, since we only get one rollout, evaluating policies is very high variance. And so looking at other information like reasoning, predictions, credences, etc, helps you distinguish between "good" and "lucky". But fundamentally we should think of these as approximations to policy evaluation, at least if you're assuming that we mostly can't fully evaluate whether their reasons for holding their views are sound.

Second change: what about the case where we don't get to allocate resources, but we have to actually make a set of individual decisions? I think the theoretically correct move here is something like: let policies spend their weight on the domains which they think are most important, and then follow the policy which has spent most weight on that domain.

Some complications:

I say "domains" not "decisions" because you don't want to make a series of related decisions which are each decided by a different policy, that seems incoherent (especially if policies are reasoning adversarially about how to undermine each other's actions).
More generally, this procedure could in theory be sensitive to bargaining and negotiating dynamics between different policies, and also the structure of the voting system (e.g. which decisions are voted on first, etc). I think we can just resolve to ignore those and do fine, but in principle I expect it gets pretty funky.
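One way to picture the "spend weight on domains" procedure is the rough Python sketch below. It assumes each worldview splits its total weight across domains in proportion to how important it considers them, and it ignores the bargaining and ordering complications just mentioned. The worldview names, domain labels, and numbers are all hypothetical.

```python
def spend_weight(worldviews):
    """Each worldview spreads its total weight over domains by importance.

    `worldviews` maps a name to (total_weight, {domain: relative_importance}).
    Returns {domain: {worldview: weight_spent}}.
    """
    spent = {}
    for name, (weight, importance) in worldviews.items():
        total = sum(importance.values())
        if total == 0:
            continue  # a fully indifferent worldview spends nothing anywhere
        for domain, score in importance.items():
            spent.setdefault(domain, {})[name] = weight * score / total
    return spent


def decide(worldviews):
    """For each domain, follow the worldview that spent the most weight on it.

    Ties go to whichever worldview was listed first (an arbitrary choice).
    """
    return {
        domain: max(bids, key=bids.get)
        for domain, bids in spend_weight(worldviews).items()
    }


# Hypothetical panel: "A" has less total weight than "B" but concentrates
# almost all of it on a single domain.
panel = {
    "A": (0.2, {"domain_1": 9, "domain_2": 1}),
    "B": (0.8, {"domain_1": 1, "domain_2": 9}),
}
print(decide(panel))
```

In this toy case "A" controls `domain_1` despite holding only a fifth of the total weight, because it spends 0.18 there against 0.08 from "B".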

Lastly, two meta-level notes:

I feel like I've probably just reformulated some kind of reinforcement learning. Specifically the case where you have a fixed class of policies and no knowledge of how they relate to each other, so you can only learn how much to upweight each policy. And then the best policy is not actually in your hypothesis space, but you can learn a simple meta-policy of when to use each existing policy.
It's very ironic that in order to figure out how much to defer to Yudkowsky we need to invent a theory of idealised cooperative decision-making. Since he's probably the person whose thoughts on that I trust most, I guess we should meta-defer to him about what that will look like...

Rohin Shah

First change:

In your Kurzweil example I think the issue is not that you assigned weights based on hypothetical-Kurzweil's beliefs, but that hypothetical-Kurzweil is completely indifferent over policies. I think the natural fix is "moral parliament" style decision making where the weights can still come from beliefs but they now apply more to preferences-over-policies. In your example hypothetical-Kurzweil has a lot of weight but never has any preferences-over-policies so doesn't end up influencing your decisions at all.

That being said, I agree that if you can evaluate quality of past recommended policies well, without a ton of noise, that would be a better signal than accuracy of beliefs. This just seems extremely hard to do, especially given the selection bias in who comes to your attention in the first place, and idk how I'd do it for Eliezer in any sane way. (Whereas you get to see people state many more beliefs and so there are a lot more data points that you can evaluate if you look at beliefs.)

But notably, the quality of past recommended policies is often not very sensitive to credences!

I think you're thinking way too much about credences-in-particular. The relevant notion is not "credences", it's that-which-determines-how-much-influence-the-person-has-over-your-actions. In this model of deference the relevant notion is the weights assigned in step 2 (however you calculate them), and the message of Ben's post would be "I think people assign too high a weight to Eliezer", rather than anything about credences. I don't think either Ben or I care particularly much about credences-based-on-deference except inasmuch as they affect your actions.

I do agree that Ben's post looks at credences that Eliezer has given and considers those to be relevant evidence for computing what weight to assign Eliezer. You could take a strong stand against using people's credences or beliefs to compute weights, but that is at least a pretty controversial take (that I personally don't agree with), and it seems different from what you've been arguing so far (except possibly in the parent comment).

Second change:

This change seems fine. Personally I'm pretty happy with a rough heuristic of "here's how I should be splitting my resources across worldviews" and then going off of intuitive "how much does this worldview care about this decision" + intuitive trading between worldviews, rather than something more fleshed out and formal, but that seems mostly a matter of taste.

richard_ngo

In your Kurzweil example I think the issue is not that you assigned weights based on hypothetical-Kurzweil's beliefs, but that hypothetical-Kurzweil is completely indifferent over policies.

Your procedure is non-robust in the sense that, if Kurzweil transitions from total indifference to thinking that one policy is better by epsilon, he'll throw his full weight behind that policy. Hmm, but then in a parliamentary approach I guess that if there are a few different things he cares epsilon about, then other policies could negotiate to give him influence only over the things they don't care about themselves. Weighting by hypothetical-past-impact still seems a bit more elegant, but maybe it washes out.

(If we want to be really on-policy then I guess the thing which we should be evaluating is whether the person's worldview would have had good consequences when added to our previous mix of worldviews. And one algorithm for this is assigning policies weights by starting off from a state where they don't know anything about the world, then letting them bet on all your knowledge about the past (where the amount they win on bets is determined not just by how correct they are, but also how much they disagree with other policies). But this seems way too complicated to be helpful in practice.)

I agree that if you can evaluate quality of past recommended policies well, without a ton of noise, that would be a better signal than accuracy of beliefs. This just seems extremely hard to do, especially given the selection bias in who comes to your attention in the first place, and idk how I'd do it for Eliezer in any sane way.

I think I'm happy with people spending a bunch of time evaluating accuracy of beliefs, as long as they keep in mind that this is a proxy for quality of recommended policies. Which I claim is an accurate description of what I was doing, and what Ben wasn't: e.g. when I say that credences matter less than coherence of worldviews, that's because the latter is crucial for designing good policies, whereas the former might not be; and when I say that all-things-considered estimates of things like "total risk level" aren't very important, that's because in principle we should be aggregating policies not risk estimates between worldviews.

I also agree that selection bias could be a big problem; again, I think that the best strategy here is something like "do the standard things while remembering what's a proxy for what".

Rohin Shah

Meta: This comment (and some previous ones) gets a bunch into "what should deference look like", which is interesting, but I'll note that most of this seems unrelated to my original claim, which was just "deference* seems important for people making decisions now, even if it isn't very important in practice for researchers", in contradiction to a sentence on your top-level comment. Do you now agree with that claim?

*Here I mean deference in the sense of how-much-influence-various-experts-have-over-your-actions. I initially called this "credences" because I thought you were imagining a model of deference in which literal credences determined how much influence experts had over your actions.

Your procedure is non-robust in the sense that, if Kurzweil transitions from total indifference to thinking that one policy is better by epsilon, he'll throw his full weight behind that policy.

Agreed, but I'm not too worried about that. It seems like you'll necessarily have some edge cases like this; I'd want to see an argument that the edge cases would be common before I switch to something else.

The chain of approximations could look something like:

The correct thing to do is to consider all actions / policies and execute the one with the highest expected impact.
First approximation: Since there are so many actions / policies, it would take too long to do this well, and so we instead take a shortcut and consider only those actions / policies that more experienced people have thought of, and execute the ones with the highest expected impact. (I'm assuming for now that you're not in the business of coming up with new ideas of things to do.)
Second approximation: Actually it's still pretty hard to evaluate the expected impact of the restricted set of actions / policies, so we'll instead do the ones that the experts say are highest impact. Since the experts disagree, we'll divide our resources amongst them, in accordance with our predictions of which experts have highest expected impact across their portfolios of actions. (This is assuming a large enough pile of resources that it makes sense to diversify due to diminishing marginal returns for any one expert.)
Third approximation: Actually expected impact of an expert's portfolio of actions is still pretty hard to assess, so we can save ourselves decision time by choosing weights for the portfolios according to some proxy that's easier to assess.

It seems like right now we're disagreeing about proxies we could use in the third approximation. It seems to me like proxies should be evaluated based on how closely they track the desired metric (expected future impact) in realistic use cases, which would involve both (1) how closely they align with "expected future impact" in general and (2) how easy they are to evaluate. It seems to me like you're thinking mostly of (1) and not (2) and this seems weird to me; if you were going to ignore (2) you should just choose "expected future impact". Anyway, individual proxies and my thoughts on them:

Beliefs / credences: 5/10 on easy to evaluate (e.g. Ben could write this post). 3/10 on correlation with expected future impact. Doesn't take into account how much impact experts think their policies could have (e.g. the Kurzweil example above).
Coherence: 3/10 on easy to evaluate (seems hard to do this without being an expert in the field). 2/10 on correlation with expected future impact (it's not that hard to have wrong coherent worldviews, see e.g. many pop sci books).
Hypothetical impact of past policies: 1/10 on easy to evaluate (though it depends on the domain). 7/10 on correlation with expected future impact (it's not 9/10 or 10/10 because selection bias seems very hard to account for).

As is almost always the case with proxies, I would usually use an intuitive combination of all the available proxies, because that seems way more robust than relying on any single one. I am not advocating for only relying on beliefs.

Which I claim is an accurate description of what I was doing, and what Ben wasn't

I get the sense that you think I'm trying to defend "this is a good post and has no problems whatsoever"? (If so, that's not what I said.)

Summarizing my main claims about this deference model that you might disagree with:

In practice, an expert's beliefs / credences will be relevant information for deciding what weight to assign them.
Ben's post provides relevant information about Eliezer's beliefs (note this is not taking a stand on other aspects of the post, e.g. the claim about how much people should defer to Eliezer).
The weights assigned to experts are important / valuable to people who need to make decisions now (but they are usually not very important / valuable to researchers).

richard_ngo

Meta: I'm currently writing up a post with a fully-fleshed-out account of deference. If you'd like to drop this thread and engage with that when it comes out (or drop this thread without engaging with that), feel free; I expect it to be easier to debate when I've described the position I'm defending in more detail.

I'll note that most of this seems unrelated to my original claim, which was just "deference* seems important for people making decisions now, even if it isn't very important in practice for researchers", in contradiction to a sentence on your top-level comment. Do you now agree with that claim?

I always agreed with this claim; my point was that the type of deference which is important for people making decisions now should not be very sensitive to the "specific credences" of the people you're deferring to. You were arguing above that the difference between your and Eliezer's views makes much more than a 2x difference; do you now agree that, on my account of deference, a big change in the deference-weight you assign to Eliezer plausibly leads to a much smaller change in your policy from the perspective of other worldviews, because the Eliezer-worldview trades off influence over most parts of the policy for influence over the parts that the Eliezer-worldview thinks are crucial and other policies don't?

individual proxies and my thoughts on them

This is helpful, thanks. I of course agree that we should consider both correlations with impact and ease of evaluation; I'm talking so much about the former because not noticing this seems like the default mistake that people make when thinking about epistemic modesty. Relatedly, I think my biggest points of disagreement with your list are:

1. I think calibrated credences are badly-correlated with expected future impact, because:
a) Overconfidence is just so common, and top experts are often really miscalibrated even when they have really good models of their field
b) The people who are best at having impact have goals other than sounding calibrated - e.g. convincing people to work with them, fighting social pressure towards conformity, etc. By contrast, the people who are best at being calibrated are likely the ones who are always stating their all-things-considered views, and who therefore may have very poor object-level models. This is particularly worrying when we're trying to infer credences from tone - e.g. it's hard to distinguish the hypotheses "Eliezer's inside views are less calibrated than other people's" and "Eliezer always speaks based on his inside-view credences, whereas other people usually speak based on their all-things-considered credences".
c) I think that "directionally correct beliefs" are much better-correlated, and not that much harder to evaluate, and so credences are especially unhelpful by comparison to those (like, 2/10 before conditioning on directional correctness, and 1/10 after, whereas directional correctness is like 3/10).

2. I think coherence is very well-correlated with expected future impact (like, 5/10), because impact is heavy-tailed and the biggest sources of impact often require strong, coherent views. I don't think it's that hard to evaluate in hindsight, because the more coherent a view is, the more easily it's falsified by history.

3. I think "hypothetical impact of past policies" is not that hard to evaluate. E.g. in Eliezer's case the main impact is "people do a bunch of technical alignment work much earlier", which I think we both agree is robustly good.

Rohin Shah

You were arguing above that the difference between your and Eliezer's views makes much more than a 2x difference;

I was arguing that EV estimates have more than a 2x difference; I think this is pretty irrelevant to the deference model you're suggesting (which I didn't know you were suggesting at the time).

do you now agree that, on my account of deference, a big change in the deference-weight you assign to Eliezer plausibly leads to a much smaller change in your policy from the perspective of other worldviews, because the Eliezer-worldview trades off influence over most parts of the policy for influence over the parts that the Eliezer-worldview thinks are crucial and other policies don't?

No, I don't agree with that. It seems like all the worldviews are going to want resources (money / time) and access to that is ~zero-sum. (All the worldviews want "get more resources" so I'm assuming you're already doing that as much as possible.) The bargaining helps you avoid wasting resources on counterproductive fighting between worldviews, it doesn't change the amount of resources each worldview gets to spend.

Going from allocating 10% of your resources to 20% of your resources to a worldview seems like a big change. It's a big difference if you start with twice as much money / time as you otherwise would have, unless there just happens to be a sharp drop in marginal utility of resources between those two points for some reason.

Maybe you think that there are lots of things one could do that have way more effect than "redirecting 10% of one's resources" and so it's not a big deal? If so can you give examples?

I think calibrated credences are badly-correlated with expected future impact

I agree overconfidence is common and you shouldn't literally calculate a Brier score to figure out who to defer to.

I agree that directionally-correct beliefs are better correlated than calibrated credences.

When I say "evaluate beliefs" I mean "look at stated beliefs and see how reasonable they look overall, taking into account what other people thought when the beliefs were stated" and not "calculate a Brier score"; I think this post is obviously closer to the former than the latter.

I agree that people's other goals make it harder to evaluate what their "true beliefs" are, and that's one of the reasons I say it's only 3/10 correlation.

I think coherence is very well-correlated with expected future impact (like, 5/10), because impact is heavy-tailed and the biggest sources of impact often require strong, coherent views. I don't think it's that hard to evaluate in hindsight, because the more coherent a view is, the more easily it's falsified by history.

Re: correlation, I was implicitly also asking the question "how much does this vary across experts". Across the general population, maybe coherence is 7/10 correlated with expected future impact; across the experts that one would consider deferring to I think it is more like 2/10, because most experts seem pretty coherent (within the domains they're thinking about and trying to influence) and so the differences in impact depend on other factors.

Re: evaluation, it seems way more common to me that there are multiple strong, coherent, conflicting views that all seem compelling (see epistemic learned helplessness), which do not seem to have been easily falsified by history (in sufficiently obvious manner that everyone agrees which one is false).

This too is in large part because we're looking at experts in particular. I think we're good at selecting for "enough coherence" before we consider someone an expert (if anything I think we do it too much in the "public intellectual" space), and so evaluating coherence well enough to find differences between experts ends up being pretty hard.

I think "hypothetical impact of past policies" is not that hard to evaluate. E.g. in Eliezer's case the main impact is "people do a bunch of technical alignment work much earlier", which I think we both agree is robustly good.

I feel like looking at any EA org's report on estimation of their own impact makes it seem like "impact of past policies" is really difficult to evaluate?

Eliezer seems like a particularly easy case, where I agree his impact is probably net positive from getting people to do alignment work earlier, but even so I think there's a bunch of questions that I'm uncertain about:

How bad is it that some people completely dismiss AI risk because they encountered Eliezer and found it off-putting? (I've explicitly heard something along the lines of "that crazy stuff from Yudkowsky" from multiple ML researchers.)
How many people would be working on alignment without Eliezer's work? (Not obviously hugely fewer, Superintelligence plausibly still gets published, Stuart Russell plausibly still goes around giving talks about value alignment and its importance.)
To what extent did Eliezer's forceful rhetoric (as opposed to analytic argument) lead people to focus on the wrong problems?

richard_ngo

I've now written up a more complete theory of deference here. I don't expect that it directly resolves these disagreements, but hopefully it's clearer than this thread.

Going from allocating 10% of your resources to 20% of your resources to a worldview seems like a big change.

Note that this wouldn't actually make a big change for AI alignment, since we don't know how to use more funding. It'd make a big change if we were talking about allocating people, but my general heuristic is that I'm most excited about people acting on strong worldviews of their own, and so I think the role of deference there should be much more limited than when it comes to money. (This all falls out of the theory I linked above.)

Across the general population, maybe coherence is 7/10 correlated with expected future impact; across the experts that one would consider deferring to I think it is more like 2/10, because most experts seem pretty coherent (within the domains they're thinking about and trying to influence) and so the differences in impact depend on other factors.

Experts are coherent within the bounds of conventional study. When we try to apply that expertise to related topics that are less conventional (e.g. ML researchers on AGI; or even economists on what the most valuable interventions are) coherence drops very sharply. (I'm reminded of an interview where Tyler Cowen says that the most valuable cause area is banning alcohol, based on some personal intuitions.)

I feel like looking at any EA org's report on estimation of their own impact makes it seem like "impact of past policies" is really difficult to evaluate?

The question is how it compares to estimating past correctness, where we face pretty similar problems. But mostly I think we don't disagree too much on this question - I think epistemic evaluations are gonna be bigger either way, and I'm mostly just advocating for the "think-of-them-as-a-proxy" thing, which you might be doing but very few others are.

Rohin Shah

Note that this wouldn't actually make a big change for AI alignment, since we don't know how to use more funding.

Funding isn't the only resource:

You'd change how you introduce people to alignment (since I'd guess that has a pretty strong causal impact on what worldviews they end up acting on). E.g. if you previously flipped a 10%-weighted coin to decide whether to send them down the Eliezer track or the other track, now you'd flip a 20%-weighted coin, and this straightforwardly leads to different numbers of people working on particular research agendas that the worldviews disagree about. Or if you imagine the community as a whole acting as an agent, you send 20% of the people to MIRI fellowships and the remainder to other fellowships (whereas previously it would be 10%).
(More broadly I think there's a ton of stuff you do differently in community building, e.g. do you target people who know ML or people who are good at math?)
You'd change what you used political power for. I don't particularly understand what policies Eliezer would advocate for but they seem different, e.g. I think I'm more keen on making sure particular alignment schemes for building AI systems get used and less keen on stopping everyone from doing stuff besides one secrecy-oriented lab that can become a leader.

Experts are coherent within the bounds of conventional study.

Yeah, that's what I mean.

Rohin Shah

Responding to other more minor points:

What do you mean "he doesn't expect this sort of thing to happen"?

I mean that he predicts that these costly actions will not be taken despite seeming good to him.

Because neither Ben nor myself was advocating for this.

I think it's also important to consider Ben's audience. If I were Ben I'd be imagining my main audience to be people who give significant deference weight to Eliezer's actual worldview. If you're going to write a top-level comment arguing against Ben's post it seems pretty important to engage with the kind of deference he's imagining (or argue that no one actually does that kind of deference, or that it's not worth writing to that audience, etc).

(Of course, I could be wrong about who Ben imagines his audience to be.)

Verden

Rob thinking that it's not actually 99.99% is in fact an update for me.

This survey suggests that he was at 96-98% a year ago.

RobBensinger

Why do you think it suggests that? There are two MIRI responses in that range, but responses are anonymous, and most MIRI staff didn't answer the survey.

Verden

I should have clarified that I think (or at least I thought so, prior to your question; kind of confused now) Yudkowsky's answer is probably one of those two MIRI responses. Sorry about that.

I recall you or somebody else at MIRI once wrote something to the effect that most MIRI researchers don't actually believe that p(doom) is extremely high, like >90% doom. Then, in the linked post, there is a comment from someone who marked themselves both as a technical safety and strategy researcher and who gave 0.98, 0.96 on your questions. The style/content of the comment struck me as something Yudkowsky would have written.

RobBensinger

Cool! I figured your reasoning was probably something along those lines, but I wanted to clarify that the survey is anonymous and hear your reasoning. I personally don't know who wrote the response you're talking about, and I'm very uncertain how many researchers at MIRI have 90+% p(doom), since only five MIRI researchers answered the survey (and marked that they're from MIRI).

richard_ngo

Musing out loud: I don't know of any complete model of deference which doesn't run into weird issues, like the conclusion that you should never trust yourself. But suppose you have some kind of epistemic parliament where you give your own views some number of votes, and assign the rest of the votes to other people in proportion to how defer-worthy they seem. Then you need to make a bunch of decisions, and your epistemic parliament keeps voting on what will best achieve your (fixed) goals.

If you do naive question-by-question majority voting on each question simultaneously then you can end up with an arbitrarily incoherent policy - i.e. a set of decisions that's inconsistent with each other. And if you make the decisions in some order, with the constraint that they each have to be consistent with all prior decisions, then the ordering of the decisions can become arbitrarily important.
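A small illustration of the incoherence problem, as a Python sketch: three voters who are each individually consistent can still produce an inconsistent majority verdict when questions are voted on separately. The voters and the propositions `p`, `q`, and `p_and_q` are hypothetical, and every voter gets equal weight for simplicity.

```python
# Three hypothetical voters; each holds a logically consistent view on
# p, q, and the conjunction "p and q".
voters = [
    {"p": True, "q": True},
    {"p": True, "q": False},
    {"p": False, "q": True},
]
for v in voters:
    v["p_and_q"] = v["p"] and v["q"]


def majority(question):
    """Simple majority vote on a single question."""
    yes = sum(v[question] for v in voters)
    return yes * 2 > len(voters)


outcome = {question: majority(question) for question in ("p", "q", "p_and_q")}
print(outcome)  # {'p': True, 'q': True, 'p_and_q': False}

# The collective verdict is consistent only if its answer on "p_and_q"
# matches what its own answers on p and q imply.
consistent = outcome["p_and_q"] == (outcome["p"] and outcome["q"])
print(consistent)  # False
```

The same toy case shows the ordering problem: if `p` and `q` are decided first, consistency forces `p_and_q` to be accepted, whereas if `p_and_q` and then `p` are decided first, consistency forces `q` to be rejected.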

Instead, you want your parliament to negotiate some more coherent joint policy to follow. And I expect that in this joint policy, each worldview gets its way on the questions that are most important to it, and cedes responsibility on the questions that are least important. So Eliezer's worldview doesn't end up reallocating all the biosecurity money, but it does get a share of curriculum time (at least for the most promising potential researchers). But in general how to conduct those negotiations is an unsolved problem (and pretty plausibly unsolvable).

RobBensinger

Since Eliezer thinks something like 99.99% chance of doom from AI

I could be wrong, but I'd guess Eliezer's all-things-considered p(doom) is less extreme than that.

richard_ngo

Yeah, I'm gonna ballpark guess he's around 95%? I think the problem is that he cites numbers like 99.99% when talking about the chance of doom "without miracles", which in his parlance means assuming that his claims are never overly pessimistic. Which seems like wildly bad epistemic practice. So then it goes down if you account for that, and then maybe it goes down even further if he adjusts for the possibility that other people are more correct than him overall (although I'm not sure that's a mental move he does at all, or would ever report on if he did).

Rohin Shah

Even at 95% you get OOMs of difference by my calculations, though significantly fewer OOMs, so this doesn't seem like the main crux.

kokotajlod

Beat me to it & said it better than I could.

My now-obsolete draft comment was going to say:

It seems to me that between about 2004 and 2014, Yudkowsky was the best person in the world to listen to on the subject of AGI and AI risks. That is, deferring to Yudkowsky would have been a better choice than deferring to literally anyone else in the world. Moreover, after about 2014 Yudkowsky would probably have been in the top 10; if you are going to choose 10 people to split your deference between (which I do not recommend, I recommend thinking for oneself), Yudkowsky should be one of those people and had you dropped Yudkowsky from the list in 2014 you would have missed out on some important stuff. Would you agree with this?

On the positive side, I'd be interested to see a top ten list from you of people you think should be deferred to as much or more than Yudkowsky on matters of AGI and AI risks.*

*What do I mean by this? Idk, here's a partial operationalization: Timelines, takeoff speeds, technical AI alignment, and p(doom).

[ETA: lest people write me off as a Yudkowsky fanboy, I wish to emphasize that I too think people are overindexing on Yudkowsky's views, I too think there are a bunch of people who defer to him too much, I too think he is often overconfident, wrong about various things, etc.]

[ETA: OK, I guess I think Bostrom probably was actually slightly better than Yudkowsky even on 20-year timespan.]

[ETA: I wish to reemphasize, but more strongly, that Yudkowsky seems pretty overconfident not just now but historically. Anyone deferring to him should keep this in mind; maybe directly update towards his credences but don't adopt his credences. E.g. think "we're probably doomed" but not "99% chance of doom." Also, Yudkowsky doesn't seem to be listening to others and understanding their positions well. So his criticisms of other views should be listened to but not deferred to, IMO.]

Habryka [Deactivated]

Didn't you post that comment right here?

kokotajlod

Oops! Dunno what happened, I thought it was not yet posted. (I thought I had posted it at first, but then I looked for it and didn't see it & instead saw the unposted draft, but while I was looking for it I saw Richard's post... I guess it must have been some sort of issue with having multiple tabs open. I'll delete the other version.)

Dawn Drescher

I've nevertheless downvoted this post because it seems like it's making claims that are significantly too strong, based on a methodology that I strongly disendorse.

I agree, and I’m a bit confused that the top-level post does not violate forum rules in its current form. There is a version of the post – rephrased and reframed – that I think would be perfectly fine even though I would still disagree with it.

And I say that as someone who loved Paul’s response to Eliezer’s list!

Separately, my takeaway from Ben’s 80k interview has been that I think that Eliezer’s take on AI risk is much more truth-tracking than Ben’s. To improve my understanding, I would turn to Paul and ARC’s writings rather than Eliezer and MIRI’s, but Eliezer’s takes are still up there among the most plausible ones in my mind.

I suspect that the motivation for this post comes from a place that I would find epistemically untenable and that bears little semblance to the sophisticated disagreement between Eliezer and Paul. But I’m worried that a reader may come away with the impression that Ben and Paul fall into one camp and Eliezer into another on AI risk when really Paul agrees with Eliezer on many points when it comes to the importance and urgency of AI safety (see the list of agreements at the top of Paul’s post).

Stefan_Schubert

I agree, and I’m a bit confused that the top-level post does not violate forum rules in its current form.

That seems like a considerable overstatement to me. I think it would be bad if the forum rules said an article like this couldn't be posted.

Dawn Drescher

Maybe, but I find it important to maintain the sort of culture where one can be confidently wrong about something without fear that it’ll cause people to interpret all future arguments only in light of that mistake instead of taking them at face value and evaluating them for their own merit.

The sort of entrepreneurialness that I still feel is somewhat lacking in EA requires committing a lot of time to a speculative idea on the off-chance that it is correct. If it is not, the entrepreneur has wasted a lot of time and usually money. If additionally it has the social cost that they can't try again because people will dismiss them because of that past failure, it makes it just so much less likely still that anyone will try in the first place.

Of course that’s not the status quo. I just really don’t want EA to move in that direction.

Stefan_Schubert

If anything, I think that prohibiting posts like this from being published would have a more detrimental effect on community culture.

Of course, people are welcome to criticise Ben's post - which some in fact do. That's a very different category from prohibition.

Dawn Drescher

Yeah, that sounds perfectly plausible to me.

“A bit confused” wasn’t meant to be any sort of rhetorical pretend understatement or something. I really just felt a slight surprise that caused me to check whether the forum rules contain something about ad hom, and found that they don’t. It may well be the right call on balance. I trust the forum team on that.

bmg

I really appreciate the time people have taken to engage with this post (and actually hope the attention cost hasn’t been too significant). I decided to write some post-discussion reflections on what I think this post got right and wrong.

The reflections became unreasonably long - and almost certainly should be edited down - but I’m posting them here in a hopefully skim-friendly format. They cover what I see as some mistakes with the post, first, and then cover some views I stand by.

Things I would do differently in a second version of the post:

1. I would either drop the overall claim about how much people should defer to Yudkowsky — or defend it more explicitly

At the start of the post, I highlight the two obvious reasons to give Yudkowsky's risk estimates a lot of weight: (a) he's probably thought more about the topic than anyone else and (b) he developed many of the initial AI risk arguments. I acknowledge that many people, justifiably, treat these as important factors when (explicitly or implicitly) deciding how much to defer to Yudkowsky.

Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about technological timelines and risks - and at least hasn’t obviously internalised lessons from these apparent mistakes.

The post expresses my view that these two considerations at least counterbalance each other - so that, overall, Yudkowsky's risk estimates shouldn't be given more weight than (e.g.) those of other established alignment researchers or the typical person on the OpenPhil worldview investigation team.

But I don't do a lot in the post to actually explore how we should weigh these factors up. In that sense: I think it’d be fair to regard the post’s central thesis as importantly under-supported by the arguments contained in the post.

I should have either done more to explicitly defend my view or simply framed the post as "some evidence about the reliability of Yudkowsky's risk estimates."

2. I would be clearer about how and why I generated these examples

In hindsight, this is a significant oversight on my part. The process by which I generated these examples is definitely relevant for judging how representative they are - and, therefore, how much to update on them. But I don’t say anything about this in the post. My motives (or at least conscious motives) are also part of the story that I only discuss in pretty high-level terms, but seem like they might be relevant for forming judgments.

For context, then, here was the process:

A few years ago, I tried to get a clearer sense of the intellectual history of the AI risk and existential risk communities. For that reason, I read a bunch of old white papers, blog posts, and mailing list discussions.

These gave me the impression that Yudkowsky’s track record (and - to some extent - the track record of the surrounding community) was worse than I’d realised. From reading old material, I basically formed something like this impression: “At each stage of Yudkowsky’s professional life, his work seems to have been guided by some dramatic and confident belief about technological trajectories and risks. The older beliefs have turned out to be wrong. And the ones that haven’t yet resolved at least seem to have been pretty overconfident in hindsight.”

I kept encountering the idea that Yudkowsky has an exceptionally good track record or that he has an unparalleled ability to think well about AI (he’s also expressed this view himself) - and I kept thinking, basically, that this seemed wrong. I wrote up some initial notes on this discrepancy at some point, but didn’t do anything with them.

I eventually decided to write something public after the “Death with Dignity” post, since the view it expresses (that we’re all virtually certain to die soon) both seems wrong to me and very damaging if it’s actually widely adopted in the community. I also felt like the “Death with Dignity” post was getting more play than it should, simply because people have a strong tendency to give Yudkowsky’s views weight. I can’t imagine a similar post written by someone else having nearly as large of an impact. Notably, since that post didn’t really have substantial arguments in it (although the later one did), I think the fact it had an impact is seemingly a testament to the power of deference; I think it’d be hard to look at the reaction to that post and argue that it’s only Yudkowsky’s arguments (rather than his public beliefs in-and-of-themselves) that have a major impact on the community.

People are obviously pretty aware of Yudkowsky’s positive contributions, but my impression is that (especially) new community members tended not to be aware of negative aspects of his track record. So I wanted to write a post drawing attention to the negative aspects.

I was initially going to have the piece explicitly express the impression I’d formed, which was something like: “At each stage of Yudkowsky’s professional life, his work has been guided by some dramatic and seemingly overconfident belief about technological trajectories and risks.” The examples in the post were meant to map onto the main ‘animating predictions’ about technology he had at each stage of his career. I picked out the examples that immediately came to mind.

Then I realised I wasn’t at all sure I could defend the claim that these were his main ‘animating predictions’ - the category was obviously extremely vague, and the main examples that came to mind were extremely plausibly a biased sample. I thought there was a good chance that if I reflected more, then I’d also want to include various examples that were more positive.

I didn’t want to spend the time doing a thorough accounting exercise, though, so I decided to drop any claim that the examples were representative and just describe them as “cherry-picked” — and add in lots of caveats emphasising that they’re cherry-picked.

(At least, these were my conscious thought processes and motivations as I remember them. I’m sure other factors played a role!)

3. I’d tweak my discussion of take-off speeds

I’d make it clearer that my main claim is: it would have been unreasonable to assign a very high credence to fast take-offs back in (e.g.) the early- or mid-2000s, since the arguments for fast take-offs had significant gaps. For example, there were lots of possible countervailing arguments for slow take-offs that pro-fast-take-off authors simply hadn’t addressed yet - as evidenced, partly, by the later publication of slow-take-off arguments leading a number of people to become significantly more sympathetic to slow take-offs. (I’m not claiming that there’s currently a consensus against fast-take-off views.)

4. I’d add further caveats to the “coherence arguments” case - or simply leave it out

Rohin’s and Oli’s comments under the post have made me aware that there’s a more positive way to interpret Yudkowsky’s use of coherence arguments. I’m not sure if that interpretation is correct, or if it would actually totally undermine the example, but this is at minimum something I hadn’t reflected on. I think it’s totally possible that further reflection would lead me to simply remove the example.

Positions I stand by:

On the flipside, here’s a set of points I still stand by:

1. If a lot of people in the community believe AI is probably going to kill everyone soon, then (if they’re wrong) this can have really important negative effects

In terms of prioritisation: My prediction is that if you were to ask different funders, career advisors, and people making career decisions (e.g. deciding whether to go into AI policy or bio policy) how much they value having a good estimate of AI risk, they’ll very often answer that they value it a great deal. I do think that over-estimating the level of risk could lead to concretely worse decisions.

In terms of community health: I think that believing you’re probably going to die soon is probably bad for a large portion of people. Reputationally: Being perceived as believing that everyone is probably going to die soon (particularly if this is actually an excessive level of worry) also seems damaging.

I think we should also take seriously the tail-risk that at least one person with doomy views (even if they’re not directly connected to the existential risk community) will take dramatic and badly harmful actions on the basis of their views.

2. Directly and indirectly, deference to Yudkowsky has a significant influence on a lot of people’s views

As above: One piece of evidence for this is that Yudkowsky’s “Death with Dignity” post triggered a big reaction, even though it didn’t contain any significant new arguments. I think his beliefs (above and beyond his arguments) clearly do have an impact.

Another reason to believe deference is a factor: I think it’s both natural and rational for people, particularly people new to an area, to defer to people with more expertise in that area.^[1] Yudkowsky is one of the most obvious people to defer to, as one of the two people most responsible for developing and popularising AI risk arguments and as someone who has (likely) spent more time thinking about the subject than anyone else.

Beyond that: A lot of people also clearly have a huge amount of respect for Yudkowsky in general, sometimes more than they have for any other public intellectual. I think it’s natural (and sensible) for people’s views to be influenced by the views of the people they respect. In general, I think, unless you have tremendous self-control, this will tend to happen sub-consciously even if you don’t consciously choose to defer to the people you respect.

Also, people sometimes just do talk about Yudkowsky’s track record or reputation as a contributing factor to their views.

3. The track records of influential intellectuals (including Yudkowsky) should be publicly discussed.

A person’s track-record provides evidence about how reliable their predictions are. If people are considering how much to defer to some intellectual, then they should want to know what their track record (at least within the relevant domain) looks like.

The main questions that matter are: What has the intellectual gotten wrong and right? Beyond whether they were wrong or right, about a given case, does it also seem like their predictions were justified? If they’ve made certain kinds of mistakes in the past, do we now have reason to think they won’t repeat those kinds of mistakes?

4. Yudkowsky’s track record suggests a substantial bias toward dramatic and overconfident predictions.

One counter - which I definitely think it’s worth reflecting on - is that it might be possible to generate a similarly bias-suggesting list of examples like this for any other public intellectual or member of the existential risk community.

I’ll focus on one specific comment, suggesting that Yudkowsky’s incorrect predictions about nanotechnology are in the same reference class as ‘writing a typically dumb high school essay.’ The counter goes something like this: Yes, it was possible to find this example from Yudkowsky’s past - but that’s not importantly different than being able to turn up anyone else’s dumb high school essay about (e.g.) nuclear power.

Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to pro-actively form the view that an emerging technology is likely to kill everyone within a decade, found an organization and devote years of their professional life to address the risk, and talk about how they’re the only person alive who can stop it.

That just seems very different from writing a dumb high school essay. Much more than a standard dumb high school essay, I think this aspect of Yudkowsky’s track record really does suggest a bias toward dramatic and overconfident predictions. This prediction is also really strikingly analogous to the prediction Yudkowsky is making right now - its relevance is clearly higher than the relevance of (e.g.) a random poorly thought-out view in a high school essay.

(Yudkowsky's early writing and work is also impressive, in certain ways, insofar as it suggests a much higher level of originality of thought and agency than the typical young person has. But the fact that this example is impressive doesn’t undercut, I think, the claim that it’s also highly suggestive of a bias toward highly confident and dramatic predictions.)

5. Being one of the first people to identify, develop, or take seriously some idea doesn’t necessarily mean that your predictions about the idea will be unusually reliable

By analogy:

I don’t think we can assume that the first person to take the covid lab leak theory seriously (when others were dismissive) is currently the most reliable predictor of whether the theory is true.
I don’t think we can assume that the first person to develop the many worlds theory of quantum mechanics (when others were dismissive) would currently be the best person to predict whether the theory is true, if they were still alive.

There are, certainly, reasons to give pioneers in a domain special weight when weighing expert opinion in that domain.^[2] But these reasons aren’t absolute.

There are even reasons that point in the opposite direction: we might worry that the pioneer has an attachment to their theory, so will be biased toward believing it is true and as important as possible. We might also worry that the pioneering-ness of their beliefs is evidence that these beliefs front-ran the evidence and arguments (since one way to be early is to simply be excessively confident). We also have less evidence of their open-mindedness than we do for the people who later on moved toward the pioneer’s views - since moving toward the pioneer’s views, when you were initially dismissive, is at least a bit of evidence for open-mindedness and humility.^[3]

Overall, I do think we should tend to defer more to pioneers (all else being equal). But this tendency can definitely be overruled by other evidence and considerations.

6. The causal effects that people have had on the world don’t (in themselves) have implications for how much we should defer to them

At least in expectation, so far, Eliezer Yudkowsky has probably had a very positive impact on the world. There is a plausible case to be made that misaligned AI poses a substantial existential risk - and Yudkowsky’s work has probably, on net, massively increased the number of people thinking about it and taking it seriously. He’s also written essays that have exposed huge numbers of people to other important ideas and helped them to think more clearly. It makes sense for people to applaud all of this.

Still, I don’t think his positive causal effect on the world gives people much additional reason to be deferential to him.

Here’s a dumb thought experiment: Suppose that Yudkowsky wrote all of the same things, but never published them. But suppose, also, that a freak magnetic storm ended up implanting all of the same ideas in his would-be-readers’ brains. Would this absence of a causal effect count against deferring to Yudkowsky? I don’t think so. The only thing that ultimately matters, I think, is his track record of beliefs - and the evidence we currently have about how accurate or justified those beliefs were.

I’m not sure anyone disagrees with the above point, but I did notice there seemed to be a decent amount of discussion in the comments about Yudkowsky’s impact - and I’m not sure I think this issue will ultimately be relevant.^[4]

For example: If I had ten hours to form a view about the viability of some application of nanotechnology, I definitely wouldn’t want to ignore the beliefs of people who have already thought about the question. Trying to learn the relevant chemistry and engineering background wouldn’t be a good use of my time. ↩︎
One really basic reason is that they’ve simply had more time to think about certain subjects than anyone else. ↩︎
Here’s a concrete case: Holden Karnofsky eventually moved toward taking AI risks seriously, after publicly being fairly dismissive of it, and then wrote up a document analysing why he was initially dismissive and drawing lessons from the experience. It seems like we could count that as positive evidence about his future judgment. ↩︎
Even though I’ve just said I’m not sure this question is relevant, I do also want to say a little bit about Yudkowsky’s impact. I personally think he's probably had a very significant impact. Nonetheless, I also think the impact can be overstated. For example, I think, it’s been suggested that the effective altruism community might not be very familiar with concepts like Bayesianism or the importance of overcoming bias if it weren’t for Yudkowsky’s writing. I don’t really find that particular suggestion plausible.

Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities. For example, my college had classes in probability theory, Bayesian epistemology, and the philosophy of quantum mechanics, and I’d read at least parts of books like Thinking Fast and Slow, the Signal and the Noise, the Logic of Science, and various books associated with the “skeptic community.” (Admittedly, I think it would have been harder to learn some of these things if I’d gone to college a bit earlier or had a different major. I also probably "got lucky" in various ways with the classes I took and books I picked up.) See also Carl Shulman making a similar point and John Halstead also briefly commenting on the way in which he personally encountered some of the relevant ideas. ↩︎

RobBensinger

I noted some places I agree with your comment here, Ben. (Along with my overall take on the OP.)

Some additional thoughts:

Notably, since that post didn’t really have substantial arguments in it (although the later one did), I think the fact it had an impact is seemingly a testament to the power of deference

The “death with dignity” post came in the wake of Eliezer writing hundreds of thousands of words about why he thinks alignment is hard in the Late 2021 MIRI Conversations (in addition to the many specific views and arguments about alignment difficulty he’s written up in the preceding 15+ years). So it seems wrong to say that everyone was taking it seriously based on deference alone.

The post also has a lot of content beyond “p(doom) is high”. Indeed, I think the post’s focus (and value-add) is mostly in its discussion of rationalization, premature/excessive conditionalizing, and ethical injunctions, not in the bare assertion that p(doom) is high. Eliezer was already saying pretty similar stuff about p(doom) back in September.

I’d make it clearer that my main claim is: it would have been unreasonable to assign a very high credence to fast take-offs back in (e.g.) the early- or mid-2000s, since the arguments for fast take-offs had significant gaps. For example, there were lots of possible countervailing arguments for slow take-offs that pro-fast-take-off authors simply hadn’t addressed yet - as evidenced, partly, by the later publication of slow-take-off arguments leading a number of people to become significantly more sympathetic to slow take-offs.

I disagree; I think that, e.g., noting how powerful and widely applicable general intelligence has historically been, and noting a bunch of standard examples of how human cognition is a total shitshow, is sufficient to have a very high probability on hard takeoff.

I think the people who updated a bunch toward hard takeoff based on the recent debate were making a mistake, and should have already had a similarly high p(hard takeoff) going back to the Foom debate, if not earlier.

Insofar as others disagree, I obviously think it’s a good thing for people to publish arguments like “but ML might be very competitive”, and for people to publicly respond to them. But I don’t think “but ML might be very competitive” and related arguments ought to look compelling at a glance (given the original simple arguments for hard takeoff), so I don’t think someone should need to consider the newer discussion in order to arrive at a confident hard-takeoff view.

(Also, insofar as Paul recently argued for X and Eliezer responded with a valid counter-argument for Y, it doesn’t follow that Eliezer had never considered anything like X or Y in initially reaching his confidence. Eliezer’s stated view is that the new Paul arguments seem obviously invalid and didn’t update him at all when he read them. Your criticism would make more sense here if Eliezer had said “Ah, that’s an important objection I hadn’t considered; but now that I’m thinking about it, I can generate totally new arguments that deal with the objections, and these new counter-arguments seem correct to me.”)

The main questions that matter are: What has the intellectual gotten wrong and right? Beyond whether they were wrong or right, about a given case, does it also seem like their predictions were justified?

At least as important, IMO, is the visible quality of their reasoning and arguments, and their retrodictions.

AGI, moral philosophy, etc. are not topics where we can observe extremely similar causal processes today and test all the key claims and all the key reasoning heuristics with simple experiments. Tossing out ‘argument evaluation’ and ‘how well does this fit what I already know?’ altogether would mean tossing out the majority of our evidence about how much weight to put on people’s views.

Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to pro-actively form the view that an emerging technology is likely to kill everyone within a decade, found an organization and devote years of their professional life to address the risk, and talk about how they’re the only person alive who can stop it.

I take the opposite view on this comparison. I agree that this is really unusual, but I think the comparison is unfavorable to the high school students, rather than unfavorable to Eliezer. Having unusual views and then not acting on them in any way is way worse than actually acting on your predictions.

I agree that Eliezer acting on his beliefs to this degree suggests he was confident; but in a side-by-side comparison of a high schooler who’s expressed equal confidence in some other unusual view, but takes no unusual actions as a result, the high schooler is the one I update negatively about.

(This also connects up to my view that EAs generally are way too timid/passive in their EA activity, don’t start enough new things, and (when they do start new things) start too many things based on ‘what EA leadership tells them’ rather than based on their own models of the world. The problem crippling EA right now is not that we're generating and running with too many wildly different, weird, controversial moonshot ideas. The problem is that we're mostly just passively sitting around, over-investing in relatively low-impact meta-level interventions, and/or hoping that the most mainstream already-established ideas will somehow suffice.)

Oliver Sourbut

I just wanted to state agreement that it seems a large number of people largely misread Death with Dignity, at least according to what seems to me the most plausible intended message: mainly about the ethical injunctions (which are very important as a finitely-rational and prone-to-rationalisation being), as Yudkowsky has written of in the past.

The additional detail of 'and by the way this is a bad situation and we are doing badly' is basically modal Yudkowsky schtick and I'm somewhat surprised it updated anyone's beliefs (about Yudkowsky's beliefs, and therefore their all-things-considered-including-deference beliefs).

I think if he had been a little more audience-aware he might have written it differently. Then again maybe not, if the net effect is more attention and investment in AI safety - and more recent posts and comments suggest he's more willing than before to use certain persuasive techniques to spur action (which seems potentially misguided to me, though understandable).

Michael St Jules 🔸

The “death with dignity” post came in the wake of Eliezer writing hundreds of thousands of words about why he thinks alignment is hard in the Late 2021 MIRI Conversations (in addition to the many specific views and arguments about alignment difficulty he’s written up in the preceding 15+ years). So it seems wrong to say that everyone was taking it seriously based on deference alone.

I think "deference alone" is a stronger claim than the one we should worry about. People might read the arguments on either side (or disproportionately Eliezer's arguments), but then defer largely to Eliezer's weighing of arguments because of his status/position, confidence, references to having complicated internal models (that he often doesn't explain or link explanations to), or emotive writing style.

What share of people with views similar to Eliezer's do you expect to have read these conversations? They're very long, not well organized, and have no summaries/takeaways. The format seems pretty bad if you value your time.

I think the AGI Ruin: A List of Lethalities post was formatted pretty accessibly, but that came after death with dignity.

Also, insofar as Paul recently argued for X and Eliezer responded with a valid counter-argument for Y, it doesn’t follow that Eliezer had never considered anything like X or Y in initially reaching his confidence. Eliezer’s stated view is that the new Paul arguments seem obviously invalid and didn’t update him at all when he read them.

If the new Paul arguments seem obviously invalid, then Eliezer should be able to explain why in such a way that convinces Paul. Has this generally been the case?

Habryka [Deactivated]

I appreciate this update!

Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about technological timelines and risks - and at least hasn’t obviously internalised lessons from these apparent mistakes.

I am confused about you bringing in the claim of "at each stage of his career", given that the only two examples you cited that seemed to provide much evidence here were from the same (and very early) stage of his career. Of course, you might have other points of evidence that point in this direction, but I did want to provide some additional pushback on the "at each stage of his career" point, which I think you didn't really provide evidence for.

I do think finding evidence for each stage of his career would of course be time-consuming, and I understand that you didn't really want to go through all of that, but it seemed good to point out explicitly.

Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to pro-actively form the view that an emerging technology is likely to kill everyone within a decade, found an organization and devote years of their professional life to address the risk, and talk about how they’re the only person alive who can stop it.

FWIW, indeed in my teens I basically did dedicate a good chunk of my time and effort towards privacy efforts out of concern about the US and UK-based surveillance state. I was in high school, so making it my full-time effort was a bit hard, though I did help found a hackerspace in my hometown that had a lot of privacy concerns baked into the culture, and I did write a good number of essays on this. I think the key difference between me and Eliezer here is more the fact that Eliezer was home-schooled and had experience doing things on his own, and not some kind of other fact about his relationship to the ideas being very different.

It's plausible you should update similarly on me, which I think isn't totally insane (I do think I might have, as Luke put it, the "taking ideas seriously gene", which I would also associate with taking other ideas to their extremes, like religious beliefs).

Owen Cotton-Barratt

I really appreciated this update. Mostly it checks out to me, but I wanted to push back on this:

Here’s a dumb thought experiment: Suppose that Yudkowsky wrote all of the same things, but never published them. But suppose, also, that a freak magnetic storm ended up implanting all of the same ideas in his would-be-readers’ brains. Would this absence of a causal effect count against deferring to Yudkowsky? I don’t think so. The only thing that ultimately matters, I think, is his track record of beliefs - and the evidence we currently have about how accurate or justified those beliefs were.

It seems to me that a good part of the beliefs I care about assessing are the beliefs about what is important. When someone has a track record of doing things with big positive impact, that's some real evidence that they have truth-tracking beliefs about what's important. In the hypothetical where Yudkowsky never published his work, I don't get the update that he thought these were important things to publish, so he doesn't get credit for being right about that.

Yonatan Cale

There's also (imperfect) information in "lots of smart people thought about EY's opinions and agree with him" that you don't get from the freak magnetic storm scenario.

richard_ngo

Thanks for writing this update. I think my number one takeaway here is something like: when writing a piece with the aim of changing community dynamics, it's important to be very clear about motivations and context. E.g. I think a version of the piece which said "I think people are overreacting to Death with Dignity, here are my specific models of where Yudkowsky tends to be overconfident, here are the reasons why I think people aren't taking those into account as much as they should" would have been much more useful and much less controversial than the current piece, which (as I interpret it) essentially pushes a general "take Yudkowsky less seriously" meme (and is thereby intrinsically political/statusy).

Yonatan Cale

I'm a bit confused about a specific small part:

tendency toward expressing dramatic views

I imagine that for many people, including me (including you?), once we work on [what we believe to be] preventing the world from ending, we would only move to another job if it was also preventing the world from ending, probably in an even more important way.

In other words, I think "working at a 2nd x-risk job and believing it is very important" is mainly predicted by "working at a 1st x-risk job and believing it is very important", much more than by personality traits.

This is almost testable, given we have lots of people working on x-risk today and believing it is very important. But maybe you can easily put your finger on what I'm missing?

[anonymous]

For what it's worth, I found this post and the ensuing comments very illuminating. As a person relatively new to both EA and the arguments about AI risk, I was a little bit confused as to why there was not much pushback on the very-high-confidence beliefs about AI doom within the next 10 years. My assumption had been that there was a lot of deference to EY because of reverence and fealty stemming from his role in getting the AI alignment field started, not to mention the other ways he has shaped people's thinking. I also assumed that his track record on predictions was just ambiguous enough for people not to question his accuracy. Given that I don't give much credence to the idea that prophets/oracles exist, I thought it unlikely that the high confidence in his predictions was warranted, since there doesn't seem to be much evidence supporting the accuracy of long-range forecasts. I did not think that there were such glaring mispredictions made by EY in the past, so thank you for highlighting them.

Verden

I feel like people are missing one fairly important consideration when discussing how much to defer to Yudkowsky, etc. Namely, I've heard multiple times that Nate Soares, the executive director of MIRI, has models of AI risk that are very similar to Yudkowsky's, and their p(doom) are also roughly the same. My limited impression is that Soares is no less smart or otherwise capable than Yudkowsky. So, when having this kind of discussion, focusing on Yudkowsky's track record or whatever, I think it's good to remember that there's another very smart person, who entered AI safety much later than Yudkowsky, and who holds very similar inside views on AI risk.

technicalities

This isn't much independent evidence I think: seems unlikely that you could become director of MIRI unless you agreed. (I know that there's a lot of internal disagreement at other levels.)

Verden

My point has little to do with him being the director of MIRI per se.

I suppose I could be wrong about this, but my impression is that Nate Soares is among the top 10 most talented/insightful people with elaborate inside view and years of research experience in AI alignment. He also seems to agree with Yudkowsky on a whole lot of issues and predicts about the same p(doom) for about the same reasons. And I feel that many people don't give enough thought to the fact that while e.g. Paul Christiano has interacted a lot with Yudkowsky and disagreed with him on many key issues (while agreeing on many others), there's also Nate Soares, who broadly agrees with Yudkowsky's models that predict very high p(doom).

Another, more minor point: if someone is bringing up Yudkowsky's track record in the context of his extreme views on AI risk, it seems helpful to talk about Soares' track record as well.

Guy Raveh

I think this maybe argues against a point not made in the OP. Garfinkel isn't saying "disregard Yudkowsky's views" - rather he's saying "don't give them extra weight just because Yudkowsky's the one saying them".

For example, from his reply to Richard Ngo:

I think it's really important to separate out the question "Is Yudkowsky an unusually innovative thinker?" and the question "Is Yudkowsky someone whose credences you should give an unusual amount of weight to?"

I read your comment as arguing for the former, which I don't disagree with. But that doesn't mean that people should currently weigh his risk estimates more highly than they weigh the estimates of other researchers currently in the space.

So at least from Garfinkel's perspective, Yudkowsky and Soares do count as data points, they're just equal in weight to other relevant data points.

(I'm not expressing any of my own, mostly unformed, views here)

RobBensinger

So at least from Garfinkel's perspective, Yudkowsky and Soares do count as data points, they're just equal in weight to other relevant data points.

Ben has said this about Eliezer, but not about Nate, AFAIK.

David Mathers🔸

'Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities.'

I think some of this is just a result of being a community founded partly by analytic philosophers. (though as a philosopher I would say that!).

I think it's normal to encounter some of these ideas in undergrad philosophy programs. At my undergrad back in 2005-09 there was a whole upper-level undergraduate course in decision theory. I don't think that's true everywhere all the time, but I'd be surprised if it was wildly unusual. I can't remember if we covered population ethics in any class, but I do remember discovering Parfit on the Repugnant Conclusion in 2nd-year of undergrad because one of my ethics lecturers said Reasons and Persons was a super-important book. In terms of the Oxford phil scene where the term "effective altruism" was born, the main titled professorship in ethics at that time was held by John Broome, a utilitarianism-sympathetic former economist, who had written famous stuff on expected utility theory. I can't remember if he was the PhD supervisor of anyone important to the founding of EA, but I'd be astounded if some of the phil. people involved in that had not been reading his stuff and talking to him about it. Most of the phil. physics people at Oxford were gung-ho for many worlds, it's not a fringe view in philosophy of physics as far as I know. (Though I think Oxford was kind of a centre for it and there was more dissent elsewhere.) As far as I can tell, Bayesian epistemology in at least some senses of that term is a fairly well-known approach in philosophy of science. Philosophers specializing in epistemology might more often ignore it, but they know it's there. And not all of them ignore it! I'm not an epistemologist, but my doctoral supervisor was, and it's not unusual for his work to refer to Bayesian ideas in modelling stuff about how to evaluate evidence. (I.e. in uhm, defending the fine-tuning argument for the existence of God, which might not be the best use, but still!: https://www.yoaavisaacs.com/uploads/6/9/2/0/69204575/ms_for_fine-tuning_fine-tuning.pdf). (John was my supervisor, not Yoav.)

A high interest in bias stuff might genuinely be more an Eliezer/LessWrong legacy though.

Pablo

the main titled professorship in ethics at that time was held by John Broome, a utilitarianism-sympathetic former economist, who had written famous stuff on expected utility theory. I can't remember if he was the PhD supervisor of anyone important to the founding of EA, but I'd be astounded if some of the phil. people involved in that had not been reading his stuff and talking to him about it.

Indeed, Broome co-supervised the doctoral theses of both Toby Ord and Will MacAskill. And Broome was, in fact, the person who advised Will to get in touch with Toby, before the two had met.

Linch

Speaking for myself, I was interested in a lot of the same things in the LW cluster (Bayes, approaches to uncertainty, human biases, utilitarianism, philosophy, avoiding the news) before I came across LessWrong or EA. The feeling is much more like "I found people who can describe these ideas well" than "oh these are interesting and novel ideas to me." (I had the same realization when I learned about utilitarianism...much more of a feeling that "this is the articulation of clearly correct ideas, believing otherwise seems dumb").

That said, some of the ideas on LW that seemed more original to me (AI risk, logical decision theory stuff, heroic responsibility in an inadequate world), do seem both substantively true and extremely important, and it took me a lot of time to be convinced of this.

(There are also other ideas that I'm less sure about, like cryonics and MW).

Guy Raveh

Veering entirely off-topic here, but how does the many worlds hypothesis tie in with all the rest of the rationality/EA stuff?

Yonatan Cale

[replying only to you with no context]

EY pointed out the many worlds hypothesis as a thing that even modern science, specifically physics (which is considered a very well functioning science, it's not like social psychology), is missing.

And he used this as an example to get people to stop trusting authority, including modern science, which many people around him seem to trust.

I think this is a reasonable reference.

Guy Raveh

Can't say any of that makes sense to me. I have the feeling there's some context I'm totally missing (or he's just wrong about it). I may ask you about this in person at some point :)

[anonymous]

Edit: I think this came off more negatively than I intended it to, particularly about Yudkowsky's understanding of physics. The main point I was trying to make is that Yudkowsky was overconfident, not that his underlying position was wrong. See the replies for more clarification.

I think there's another relevant (and negative) data point when discussing Yudkowsky's track record: his argument and belief that the Many-Worlds Interpretation of quantum mechanics is the only viable interpretation of quantum mechanics, and anyone who doesn't agree is essentially a moron. Here's one 2008 link from the Sequences where he expresses this position^[1]; there are probably many other places where he's said similar things. (To be clear, I don’t know if he still holds this belief, and if he doesn’t anymore, when and why he updated away from it.)

Many Worlds is definitely a viable and even leading interpretation, and may well be correct. But Yudkowsky's confidence in Many Worlds, as well as his conviction that people who disagree with him are making elementary mistakes, is more than a little disproportionate, and may come partly from a lack of knowledge and expertise.

The above is a paraphrase of Scott Aaronson, a credible authority on quantum mechanics who is sympathetic to both Yudkowsky and Many Worlds (bold added):

I think Yudkowsky's central argument---basically, that anyone who rejects [Many Worlds] needs to have their head examined---is to put it mildly, a bit overstated. :) I'll resist the temptation to elaborate, since this is really a discussion for another thread.
In several posts, Yudkowsky gives indications that he doesn't really understand the concept of mixed states. (For example, he writes about the No-Communication Theorem as something complicated and mysterious, which it's not from a density-matrix perspective.) As I see it, this might be part of the reason why Yudkowsky sees anything besides Many-Worlds as insanity, and can't understand what (besides sheep-like conformity) would drive any knowledgeable physicist to any other point of view. If I didn't know that in real life, people pretty much never encounter pure states, but only more general objects that (to paraphrase Jaynes) scramble together "subjective" probabilities and "objective" amplitudes into a single omelette, the view that quantum states are "states of knowledge" that "live in the mind, not in the world" would probably also strike me as meaningless nonsense.

While this isn't directly related to AI risk, I think it's relevant to Yudkowsky's track record as a public intellectual.

^[1]
He expresses this in the last six paragraphs of the post. I'm excerpting some of it (bold added, italics were present in the original):

Many-worlds is an obvious fact, if you have all your marbles lined up correctly (understand very basic quantum physics, know the formal probability theory of Occam’s Razor, understand Special Relativity, etc.) It is in fact considerably more obvious to me than the proposition that spinning black holes should obey conservation of angular momentum.
...
So let me state then, very clearly, on behalf of any and all physicists out there who dare not say it themselves: Many-worlds wins outright given our current state of evidence. There is no more reason to postulate a single Earth, than there is to postulate that two colliding top quarks would decay in a way that violates Conservation of Energy. It takes more than an unknown fundamental law; it takes magic.
The debate should already be over. It should have been over fifty years ago. The state of evidence is too lopsided to justify further argument. There is no balance in this issue. There is no rational controversy to teach. The laws of probability theory are laws, not suggestions; there is no flexibility in the best guess given this evidence. Our children will look back at the fact that we were still arguing about this in the early twenty-first century, and correctly deduce that we were nuts.
We have embarrassed our Earth long enough by failing to see the obvious. So for the honor of my Earth, I write as if the existence of many-worlds were an established fact, because it is. The only question now is how long it will take for the people of this world to update.

Steven Byrnes

OTOH, I am (or I guess was?) a professional physicist, and when I read Rationality A-Z, I found that Yudkowsky was always reaching exactly the same conclusions as me whenever he talked about physics, including areas where (IMO) the physics literature itself is a mess—not only interpretations of QM, but also how to think about entropy & the 2nd law of thermodynamics, and, umm, I thought there was a third thing too but I forget.

That increased my respect for him quite a bit.

And who the heck am I? Granted, I can’t out-credential Scott Aaronson in QM. But FWIW, hmm let’s see, I had the highest physics GPA in my Harvard undergrad class and got the highest preliminary-exam score in my UC Berkeley physics grad school class, and I’ve played a major role in designing I think 5 different atomic interferometers (including an atomic clock) for various different applications, and in particular I was always in charge of all the QM calculations related to estimating their performance, and also I once did a semester-long (unpublished) research project on quantum computing with superconducting qubits, and also I have made lots of neat wikipedia QM diagrams and explanations including a pedagogical introduction to density matrices and mixed states.

I don’t recall feeling strongly that literally every word Yudkowsky wrote about physics was correct, more like “he basically figured out the right idea, despite not being a physicist, even in areas where physicists who are devoting their career to that particular topic are all over the place”. In particular, I don’t remember exactly what Yudkowsky wrote about the no-communication theorem. But I for one absolutely understand mixed states, and that doesn’t prevent me from being a pro-MWI extremist like Yudkowsky.

[anonymous]

I agree that: Yudkowsky has an impressive understanding of physics for a layman, in some situations his understanding is on par with or exceeds some experts, and he has written explanations of technical topics that even some experts like and find impressive. This includes not just you, but also e.g. Scott Aaronson, who praised his series on QM in the same answer I excerpted above, calling it entertaining, enjoyable, and getting the technical stuff mostly right. He also praised it for its conceptual goals. I don't believe this is faint praise, especially given stereotypes of amateurs writing about physics. This is a positive part of Yudkowsky's track record. I think my comment sounds more negative about Yudkowsky's QM sequence than it deserves, so thanks for pushing back on that.

I'm not sure what you mean when you call yourself a pro-MWI extremist but in any case AFAIK there are physicists, including one or more prominent ones, who think MWI is really the only explanation that makes sense, although there are obviously degrees in how fervently one can hold this position and Yudkowsky seems at the extreme end of the scale in some of his writings. And he is far from the only one who thinks Copenhagen is ridiculous. These two parts of Yudkowsky's position on MWI are not without parallel among professional physicists, and the point about Copenhagen being ridiculous is probably a point in his favor from most views (e.g. Nobel laureate Murray Gell-Mann said that Niels Bohr brainwashed people into Copenhagen), let alone this community. Perhaps I should have clarified this in my comment, although I did say that MWI is a leading interpretation and may well be correct.

The negative aspects I said in my comment were:

(1) Yudkowsky's confidence in MWI is disproportionate
(2) Yudkowsky's conviction that people who disagree with him are making elementary mistakes is disproportionate
(3) These may come partly from a lack of knowledge or expertise

Maybe (3) is a little unfair, or sounds harsher than I meant it. It's a bit unclear to me how seriously to take Aaronson's quote. It seems like plenty of physicists have looked through the sequences to find glaring flaws, and basically found none (physics stackexchange). This is a nontrivial achievement in context. At the same time I expect most of the scrutiny has been to a relatively shallow level, partly because Yudkowsky is a polarizing writer. Aaronson is probably one of fairly few people who have deep technical expertise and have read the sequences with both enjoyment and a critical eye. Aaronson suggested a specific, technical flaw that may be partly responsible for Yudkowsky holding an extreme position with overconfidence and misunderstanding what people who disagree with him think. Probably this is a flaw Yudkowsky would not have made if he had worked with a professional physicist or something. But maybe Aaronson was just casually speculating and maybe this doesn't matter too much. I don't know. Possibly you are right to push back on the mixed states explanation.

I think (1) and (2) are well worth considering though. The argument here is not that his position is necessarily wrong or impossible, but that it is overconfident. I am not courageous enough to argue for this position to a physicist who holds some kind of extreme pro-MWI view, but I think this is a reasonable view and there's a good chance (1) and (2) are correct. It also fits in Ben's point 4 in the comment above: "Yudkowsky’s track record suggests a substantial bias toward dramatic and overconfident predictions."

Steven Byrnes

Hmm, I’m a bit confused where you’re coming from.

Suppose that the majority of eminent mathematicians believe 5+5=10, but a significant minority believes 5+5=11. Also, out of the people in the 5+5=10 camp, some say “5+5=10 and anyone who says otherwise is just totally wrong”, whereas others say “I happen to believe that the balance of evidence is that 5+5=10, but my esteemed colleagues are reasonable people and have come to a different conclusion, so we 5+5=10 advocates should approach the issue with appropriate humility, not overconfidence.”

In this case, the fact of the matter is that 5+5=10. So in terms of who gets the most credit added to their track-record, the ranking is:

1st place: The ones who say “5+5=10 and anyone who says otherwise is just totally wrong”,
2nd place: The ones who say “I think 5+5=10, but one should be humble, not overconfident”,
3rd place: The ones who say “I think 5+5=11, but one should be humble, not overconfident”,
Last place: The ones who say “5+5=11 and anyone who says otherwise is just totally wrong”.

Agree so far?

(See also: Bayes’s theorem, Brier score, etc.)
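To make the ranking above concrete, here is a minimal sketch using the Brier score. The four credences are made-up numbers chosen only to stand in for the four attitudes described; nothing in the argument depends on their exact values.

```python
def brier_score(p_true: float) -> float:
    """Brier score for a binary claim that turned out to be true.

    p_true is the probability the forecaster assigned to the claim.
    Lower is better; 0.0 is a perfect score.
    """
    return (1.0 - p_true) ** 2


# Hypothetical credences in "5+5=10" (assumed for illustration only).
credences = {
    "5+5=10, others are totally wrong": 0.99,
    "5+5=10, but stay humble": 0.70,
    "5+5=11, but stay humble": 0.30,
    "5+5=11, others are totally wrong": 0.01,
}

# Rank forecasters from best (lowest Brier score) to worst.
ranking = sorted(credences, key=lambda name: brier_score(credences[name]))

for place, name in enumerate(ranking, start=1):
    print(f"{place}. {name}: Brier = {brier_score(credences[name]):.4f}")
```

Given that the claim is true, the score improves monotonically with the probability assigned to it, which reproduces the ordering listed above.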

Back to the issue here. Yudkowsky is claiming “MWI, and anyone who says otherwise is just totally wrong”. (And I agree—that’s what I meant when I called myself a pro-MWI extremist.)

IF the fact of the matter is that careful thought shows MWI to be unambiguously correct, then Yudkowsky (and I) get more credit for being more confident. Basically, he’s going all in and betting his reputation on MWI being right, and (in this scenario) he won the bet.

Conversely, IF the fact of the matter is that careful thought shows MWI to be not unambiguously correct, then Eliezer loses the maximum number of points. He staked his reputation on MWI being right, and (in this scenario) he lost the bet.

So that’s my model, and in my model “overconfidence” per se is not really a thing in this context. Instead we first have to take a stand on the object-level controversy. I happen to agree with Eliezer that careful thought shows MWI to be unambiguously correct, and given that, the more extreme his confidence in this (IMO correct) claim, the more credit he deserves.

I’m trying to make sense of why you’re bringing up “overconfidence” here. The only thing I can think of is that you think that maybe there is simply not enough information to figure out whether MWI is right or wrong (not even for an ideal reasoner with a brain the size of Jupiter and a billion years to ponder the topic), and therefore saying “MWI is unambiguously correct” is “overconfident”? If that’s what you’re thinking, then my reply is: if “not enough information” were the actual fact of the matter about MWI, then we should criticize Yudkowsky first and foremost for being wrong, not for being overconfident.

As for your point (2), I forget what mistakes Yudkowsky claimed that anti-MWI-advocates are making, and in particular whether he thought those mistakes were “elementary”. I am open-minded to the possibility that Yudkowsky was straw-manning the MWI critics, and that they are wrong for more interesting and subtle reasons than he gives them credit for, and in particular that he wouldn’t pass an anti-MWI ITT. (For my part, I’ve tried harder, see e.g. here.) But that’s a different topic. FWIW I don’t think of Yudkowsky as having a strong ability to explain people’s wrong opinions in a sympathetic and ITT-passing way, or if he does have that ability, then I find that he chooses not to exercise it too much in his writings. :-P

RobBensinger

I happen to agree with Eliezer that careful thought shows MWI to be unambiguously correct, and given that, the more extreme his confidence in this (IMO correct) claim, the more credit he deserves.

'The more probability someone assigns to a claim, the more credit they get when the claim turns out to be true' is true as a matter of Bayesian math. And I agree with you that MWI is true, and that we have enough evidence to say it's true with very high confidence, if by 'MWI' we just mean a conjunction like "Objective collapse is false." and "Quantum non-realism is false / the entire complex amplitude is in some important sense real".

(I think Eliezer had a conjunction like this in mind when he talked about 'MWI' in the Sequences; he wasn't claiming that decoherence explains the Born rule, and he certainly wasn't claiming that we need to reify 'worlds' as a fundamental thing. I think a better term for MWI might be the 'Much World Interpretation', since the basic point is about how much stuff there is, not about a division of that stuff into discrete 'worlds'.)

That said, I have no objection in principle to someone saying 'Eliezer was right about MWI (and gets more points insofar as he was correct), but I also dock him more points than he gained because I think he was massively overconfident'.

E.g., imagine someone who assigns probability 1 (or probability .999999999) to a coin flip coming up heads. If the coin then comes up heads, then I'm going to either assume they were trolling me, or I'm going to infer that they're very bad at reasoning. Even if they somehow rigged the coin, .999999999 is just too extreme a probability to be justified here.

By the same logic, if Eliezer had said that MWI is true with probability 1, or if he'd put too many '9s' at the end of his .99... probability assignment, then I'd probably dock him more points than he gained for being object-level-correct. (Or I'd at least assume he has a terrible understanding of how Bayesian probability works. Someone could indeed be very miscalibrated and bad at talking in probabilistic terms, and yet be very knowledgeable and correct on object-level questions like MWI.)

I'm not sure exactly how many 9s is too many in the case of MWI, but it's obviously possible to have too many 9s here. E.g., a hundred 9s would be too many! So I think this objection can make sense; I just don't think Eliezer is in fact overconfident about MWI.

Steven Byrnes

Fair enough, thanks.

[anonymous]

I’m trying to make sense of why you’re bringing up “overconfidence” here. The only thing I can think of is that you think that maybe there is simply not enough information to figure out whether MWI is right or wrong (not even for an ideal reasoner with a brain the size of Jupiter and a billion years to ponder the topic), and therefore saying “MWI is unambiguously correct” is “overconfident”?

Here's my point: There is a rational limit to the amount of confidence one can have in MWI (or any belief). I don't know where exactly this limit is for MWI-extremism but Yudkowsky clearly exceeded it sometimes. To use made up numbers, suppose:

MWI is objectively correct
Eliezer says P(MWI is correct) = 0.9999999
But rationally one can only reach P(MWI) = 0.999
- Because there are remaining uncertainties that cannot be eliminated through superior thinking and careful consideration, such as a lack of experimental evidence, the possibility of QM getting overturned, the possibility of a new and better interpretation in the future, and unknown unknowns.
- These factors add up to at least P(Not MWI) = 0.001.

Then even though Eliezer is correct about MWI being correct, he is still significantly overconfident in his belief about it.
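One way to make this precise, as a rough sketch using the made-up numbers above: under a proper scoring rule such as the log score, the overconfident report does score slightly better in the world where MWI is true, but its expected score, taken over the best justified credence of 0.999, is worse. The function name and the choice of the log score are my own illustrative assumptions.

```python
import math


def expected_log_score(reported: float, justified: float) -> float:
    """Expected log score of reporting `reported` for a binary claim,
    where `justified` is the best rationally attainable probability
    that the claim is true. Higher (closer to 0) is better.
    """
    return (justified * math.log(reported)
            + (1.0 - justified) * math.log(1.0 - reported))


JUSTIFIED = 0.999          # made-up rational limit from the example
OVERCONFIDENT = 0.9999999  # made-up stated credence from the example

# Score in the world where the claim is true: overconfidence looks better.
score_if_true_calibrated = math.log(JUSTIFIED)
score_if_true_overconfident = math.log(OVERCONFIDENT)

# Expected score before the truth is known: overconfidence is penalized.
expected_calibrated = expected_log_score(JUSTIFIED, JUSTIFIED)
expected_overconfident = expected_log_score(OVERCONFIDENT, JUSTIFIED)

print(f"if true:  calibrated={score_if_true_calibrated:.7f}, "
      f"overconfident={score_if_true_overconfident:.7f}")
print(f"expected: calibrated={expected_calibrated:.5f}, "
      f"overconfident={expected_overconfident:.5f}")
```

The gain from the extra 9s when the claim is true is tiny, while the loss in the unlikely case where it is false is large enough to roughly double the expected penalty.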

Consider Paul's example of Eliezer saying MWI is comparable to heliocentrism:

If we are deeply wrong about physics, then I [Paul Christiano] think this could go either way. And it still seems quite plausible that we are deeply wrong about physics in one way or another (even if not in any particular way). So I think it's wrong to compare many-worlds to heliocentrism (as Eliezer has done). Heliocentrism is extraordinarily likely even if we are completely wrong about physics---direct observation of the solar system really is a much stronger form of evidence than a priori reasoning about the existence of other worlds.

I agree with Paul here. Heliocentrism is vastly more likely than any particular interpretation of quantum mechanics, and Eliezer was wrong to have made this comparison.

This may sound like I'm nitpicking, but I think it fits into a pattern of Eliezer making dramatic and overconfident pronouncements, and it's relevant information for people to consider e.g. when evaluating Eliezer's belief that p(doom) = ~1 and the AI safety situation is so hopeless that the only thing left is to die with slightly more dignity.

Of course, it's far from the only relevant data point.

Regarding (2), I think we're on the same page haha.

RobBensinger

Could someone point to the actual quotes where Eliezer compares heliocentrism to MWI? I don't generally assume that when people are 'comparing' two very-high-probability things, they're saying they have the same probability. Among other things, I'd want confirmation that 'Eliezer and Paul assign roughly the same probability to MWI, but they have different probability thresholds for comparing things to heliocentrism' is false.

E.g., if I compare Flat Earther beliefs, beliefs in psychic powers, belief 'AGI was secretly invented in the year 2000', geocentrism, homeopathy, and theism to each other, it doesn't follow that I'd assign the same probabilities to all of those six claims, or even probabilities that are within six orders of magnitude of each other.

In some contexts it might indeed Griceanly imply that all six of those things pass my threshold for 'unlikely enough that I'm happy to call them all laughably silly views', but different people have their threshold for that kind of thing in different places.

Steven Byrnes

Gotcha, thanks. I guess we have an object-level disagreement: I think that careful thought reveals MWI to be unambiguously correct, with enough 9’s as to justify Eliezer’s tone. And you don’t. ¯\_(ツ)_/¯

(Of course, this is bound to be a judgment call; e.g. Eliezer didn’t state how many 9’s of confidence he has. It’s not like there’s a universal convention for how many 9’s are enough 9’s to state something as a fact without hedging, or how many 9’s are enough 9’s to mock the people who disagree with you.)

[anonymous]

(Of course, this is bound to be a judgment call; e.g. Eliezer didn’t state how many 9’s of confidence he has. It’s not like there’s a universal convention for how many 9’s are enough 9’s to state something as a fact without hedging, or how many 9’s are enough 9’s to mock the people who disagree with you.)

Yes, agreed.

Let me lay out my thinking in more detail. I mean this to explain my views in more detail, not as an attempt to persuade.

Paul's account of Aaronson's view says that Eliezer shouldn't be as confident in MWI as he is, which in words sounds exactly like my point, and similar to Aaronson's stack exchange answer. But it still leaves open the question of how overconfident he was, and what, if anything, should be taken away from this. It's possible that there's a version of my point which is true but is also uninteresting or trivial (who cares if Yudkowsky was 10% too confident about MWI 15 years ago?).

And it's worth reiterating that a lot of people give Eliezer credit for his writing on QM, including for being forceful in his views. I have no desire to argue against this. I had hoped to sidestep discussing this entirely since I consider it to be a separate point, but perhaps this was unfair and led to miscommunication. If someone wants to write a detailed comment/post explaining why Yudkowsky deserves a lot of credit for his QM writing, including credit for how forceful he was at times, I would be happy to read it and would likely upvote/strong upvote it depending on quality.

However, here my intention was to focus on the overconfidence aspect.

I'll explain what I see as the epistemic mistakes Eliezer likely made to end up in an overconfident state. Why do I think Eliezer was overconfident on MWI?

(Some of the following may be wrong.)

He didn't understand non-MWI-extremist views, which should have rationally limited his confidence
- I don't have sources for this, but I think something like this is true.
- This was an avoidable mistake
- Worth noting that Eliezer has updated towards the competence of elites in science since some of his early writing, according to Rob's comment elsewhere in this thread
It's possible that his technical understanding was uneven. This should also have limited his confidence.
- Aaronson praised him for "actually get most of the technical stuff right", which of course implies that not everything technical was correct.
- He also suggested a specific, technical flaw in Yudkowsky's understanding.
- One big problem with having extreme conclusions based on uneven technical understanding is that you don't know what you don't know. And in fact Aaronson suggests a mistake Yudkowsky seems unaware of as a reason why Yudkowsky's central argument is overstated/why Yudkowsky is overconfident about MWI.
- However, it's unclear how true/important a point this really is
At least 4 points limit confidence in P(MWI) to some degree:
- Lack of experimental evidence
- The possibility of QM getting overturned
- The possibility of a new and better interpretation in the future
- Unknown unknowns
- I believe most or all of these are valid, commonly brought up points that together limit how confident anyone can be in P(MWI). Reasonable people may disagree with their weighting of course.
- I am skeptical that Eliezer correctly accounted for these factors

Note that these are all points about the epistemic position Eliezer was in, not about the correctness of MWI. The first two are particular to him, and the last one applies to everyone.

Now, Rob points out that maybe the heliocentrism example is lacking context in some way (I find it a very compelling example of a super overconfident mistake if it's not). Personally I think there are at least a couple^[1] ^[2] of places in the sequences where Yudkowsky clearly says something that I think indicates ridiculous overconfidence tied to epistemic mistakes, but to be honest I'm not excited to argue about whether some of his language 15 years ago was or wasn't overzealous.

The reason I brought this up despite it being a pretty minor point is because I think it's part of a general pattern of Eliezer being overconfident in his views and overstating them. I am curious how much people actually disagree with this.

Of course, whether Eliezer has a tendency to be overconfident and overstate his views is only one small data point among very many others in evaluating p(doom), the value of listening to Eliezer's views, etc.

^[1] "Many-worlds is an obvious fact, if you have all your marbles lined up correctly (understand very basic quantum physics, know the formal probability theory of Occam’s Razor, understand Special Relativity, etc.)"
^[2] "The only question now is how long it will take for the people of this world to update." Both quotes from https://www.lesswrong.com/s/Kqs6GR7F5xziuSyGZ/p/S8ysHqeRGuySPttrS

Steven Byrnes

For what it's worth, consider the claim “The Judeo-Christian God, the one who listens to prayers and so on, doesn't exist.” I have such high confidence in this claim that I would absolutely state it as a fact without hedging, and psychoanalyze people for how they came to disagree with me. Yet there's a massive theology literature arguing to the contrary of that claim, including by some very smart and thoughtful people, and I've read essentially none of this theology literature, and if you asked me to do an anti-atheism ITT I would flunk it catastrophically.

I'm not sure what lesson you'll take from that; for all I know you yourself are very religious, and this anecdote will convince you that I have terrible judgment. But if you happen to be on the same page as me, then maybe this would be an illustration of the fact that (I claim) one can rationally and correctly arrive at extremely-confident beliefs without it needing to pass through a deep understanding and engagement with the perspectives of the people who disagree with you.

I agree that this isn’t too important a conversation, it’s just kinda interesting. :)

Paul_Christiano

I'm not sure either of the quotes you cited by Eliezer require or suggest ridiculous overconfidence.

If I've seen some photos of a tiger in town, and I know a bunch of people in town who got eaten by an animal, and we've all seen some apparent tiger-prints near where people got eaten, I may well say "it's obvious there is a tiger in town eating people." If people used to think it was a bear, but that belief was formed based on priors when we didn't yet have any hard evidence about the tiger, I may be frustrated with people who haven't yet updated. I may say "The only question is how quickly people's views shift from bear to tiger. Those who haven't already shifted seem like they are systematically slow on the draw and we should learn from their mistakes." I don't think any of those statements imply I think there's a 99.9% chance that it's a tiger. It's more a statement rejecting the reasons why people think there is a bear, and disagreeing with those reasons, and expecting their views to predictably change over time. But I could say all that while still acknowledging some chance that the tiger is a hoax, that there is a new species of animal that's kind of like a tiger, that the animal we saw in photos is different from the one that's eating people, or whatever else. The exact smallness of the probability of "actually it wasn't the tiger after all" is not central to my claim that it's obvious or that people will come around.

I don't think it's central to this point, but I think 99% is a defensible estimate for many-worlds. I would probably go somewhat lower but certainly wouldn't run victory laps about that or treat it as damning of someone's character. The above is mostly a bad analogy explaining why I think it's pretty reasonable to say things like Eliezer did even if your all-things-considered confidence was 99% or even lower.

To get a sense for what Eliezer finds frustrating and intends to critique, you can read If many-worlds had come first (which I find quite obnoxious). I think to the extent that he's wrong it's generally by mischaracterizing the alternative position and being obnoxious about it (e.g. misunderstanding the extent to which collapse is proposed as ontologically fundamental rather than an expression of agnosticism or a framework for talking about experiments, and by slightly misunderstanding what "ontologically fundamental collapse" would actually mean). I don't think it has much to do with overconfidence directly, or speaks to the quality of Eliezer's reasoning about the physical world, though I think it is a bad recurring theme in Eliezer's reasoning about and relationships with other humans. And in fairness I do think there are a lot of people who probably deserve Eliezer's frustration on this point (e.g. who talk about how collapse is an important and poorly-understood phenomenon rather than most likely just being the most boring thing) though I mostly haven't talked with them and I think they are systematically more mediocre physicists.

TAG

"Maybe (3) is a little unfair, or sounds harsher than I meant it. It's a bit unclear to me how seriously to take Aaronson's quote. It seems like plenty of physicists have looked through the sequences to find glaring flaws, and basically found none (physics stackexchange). T"

Here's a couple: he conflates Copenhagen and Objective collapse throughout.

He fails to distinguish Everettian and Decoherence based MWI.

Paul_Christiano

This doesn't feel like a track record claim to me. Nothing has changed since Eliezer wrote that; it reads as reasonably now as it did then; and we have nothing objective against which to evaluate it.

I broadly agree with Eliezer that (i) collapse seems unlikely, (ii) if the world is governed by QM as we understand it, the whole state is probably as "real" as we are, (iii) there seems to be nothing to favor the alternative interpretations other than those that make fewer claims and are therefore more robust to unknown-unknowns. So if anything I'd be inclined to give him a bit of credit on this one, given that it seems to have held up fine for readers who know much more about quantum mechanics than he did when writing the sequence.

The main way the sequence felt misleading was by moderately overstating how contrarian this take was. For example, near the end of my PhD I was talking with Scott Aaronson and my advisor Umesh Vazirani, who I considered not-very-sympathetic to many worlds. When asked why, my recollection of his objection was "What are these 'worlds' that people are talking about? There's just the state." That is, the whole issue turned on a (reasonable) semantic objection.

However, I do think Eliezer is right that in some parts of physics collapse is still taken very seriously and there are more-than-semantic disagreements. For example, I was pretty surprised by David Griffiths' discussion of collapse in the afterword of his textbook (pdf) during undergrad. I think that Eliezer is probably right that some of these are coming from a pretty confused place. I think the actual situation with respect to consensus is a bit muddled, and e.g. I would be fairly surprised if Eliezer was able to make a better prediction about the result of any possible experiment than the physics community based on his confidence in many-worlds. But I also think that a naive-Paul perspective of "no way anyone is as confused as Eliezer is saying" would have been equally-unreasonable.

I agree that Eliezer is overconfident about the existence of the part of the wavefunction we never see. If we are deeply wrong about physics, then I think this could go either way. And it still seems quite plausible that we are deeply wrong about physics in one way or another (even if not in any particular way). So I think it's wrong to compare many-worlds to heliocentrism (as Eliezer has done). Heliocentrism is extraordinarily likely even if we are completely wrong about physics---direct observation of the solar system really is a much stronger form of evidence than a priori reasoning about the existence of other worlds. Similarly, I think it's wrong to compare many-worlds to a particular arbitrary violation of conservation of energy when top quarks collide, rather than something more like "there is a subtle way in which our thinking about conservation of energy is mistaken and the concept either doesn't apply or is only approximately true." (It sounds reasonable to compare it to the claim that spinning black holes obey conservation of angular momentum, at least if you haven't yet made any astronomical observations that back up that claim.)

My understanding is this is the basic substance of Eliezer's disagreement with Scott Aaronson. My vague understanding of Scott's view (from one conversation with Scott and Eliezer about this ~10 years ago) is roughly "Many worlds is a strong prediction of our existing theories which is intuitively wild and mostly-experimentally-unconfirmed. Probably true, and would be ~the most interesting physics result ever if false, but still seems good to test and you shouldn't be as confident as you are about heliocentrism."

[anonymous]

When I said it was relevant to his track record as a public intellectual, I was referring to his tendency to make dramatic and overconfident pronouncements (which Ben mentioned in the parent comment). I wasn't intending to imply that the debate around QM had been settled or that new information had come out. I do think that even at the time Eliezer's positions on both MWI and why people disagreed with him on it were overconfident though.

I think you're right that my comment gave too little credit to Eliezer, and possibly misleadingly implied that Eliezer is the only one who holds some kind of extreme MWI or anti-collapse view or that such views are not or cannot be reasonable (especially anti-collapse). I said that MWI is a leading candidate but that's still probably underselling how many super pro-MWI positions there are. I expanded on this in another comment.

Your story of Eliezer comparing MWI to heliocentrism is a central example of what I'm talking about. It is not that his underlying position is wrong or even unlikely, but that he is significantly overconfident.

I think this is relevant information for people trying to understand Eliezer's recent writings.

To be clear, I don't think it's a particularly important example, and there is a lot of other more important information than whether Eliezer overestimated the case for MWI to some degree while also displaying impressive understanding of physics and possibly/probably being right about MWI.

Habryka [Deactivated]

It seems that half of these examples are from 15+ years ago, from a period for which Eliezer has explicitly disavowed his opinions (and the ones that are not strike me as most likely correct, like treating coherence arguments as forceful and that AI progress is likely to be discontinuous and localized and to require relatively little compute).

Let's go example-by-example:

1. Predicting near-term extinction from nanotech

This critique strikes me as about as sensible as digging up someone's old high-school essays and critiquing their stance on communism or the criminal justice system. I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old. I am confident I can find crazier and worse opinions for every single leadership figure in Effective Altruism, if I am willing to go back to what they thought while they were in high-school. To give some character, here are some things I believed in my early high-school years:

The economy was going to collapse because the U.S. was establishing a global surveillance state
Nuclear power plants are extremely dangerous and any one of them is quite likely to explode in a given year
We could have easily automated the creation of all art, except for the existence of a vaguely defined social movement that tries to preserve the humanity of art-creation

These are dumb opinions. I am not ashamed of having had them. I was young and trying to orient in the world. I am confident other commenters can add their own opinions they had when they were in high-school. The only thing that makes it possible for someone to critique Eliezer on these opinions is that he was virtuous and wrote them down, sometimes in surprisingly well-argued ways.

If someone were to dig up an old high-school essay of mine, in particular one that has "THIS IS NOT ENDORSED BY ME, THIS IS A DUMB OPINION" written at the top, and used it to argue that I am wrong about important cause prioritization questions, I would feel deeply frustrated and confused.

For context, on Eliezer's personal website it says:

My parents were early adopters, and I’ve been online since a rather young age. You should regard anything from 2001 or earlier as having been written by a different person who also happens to be named “Eliezer Yudkowsky”. I do not share his opinions.

2. Predicting that his team had a substantial chance of building AGI before 2010

Given that this is only 2 years later, all my same comments apply. But let's also talk a bit about the object-level here.

This is the quote on which this critique is based:

Our best guess for the timescale is that our final-stage AI will reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010. As always with basic research, this is only a guess, and heavily contingent on funding levels.

This... is not a very confident prediction. This paragraph literally says "only a guess". I agree, if Eliezer said this today, I would definitely dock him some points, but this is again a freshman-aged Eliezer, and it was more than 20 years ago.

But also, I don't know, predicting AGI by 2020 from the year 2000 doesn't sound that crazy. If we didn't have a whole AI winter, if Moore's law had accelerated a bit instead of slowed down, if more talent had flowed into AI and chip-development, 2020 doesn't seem implausible to me. I think it's still on the aggressive side, given what we know now, but technological forecasting is hard, and the above sounds more like a 70% confidence interval instead of a 90% confidence interval.

3. Having high confidence that AI progress would be extremely discontinuous and localized and not require much compute

This opinion strikes me as approximately correct. I still expect highly discontinuous progress, and many other people have argued for this as well. Your analysis that the world looks more like Hanson's world described in the AI foom debate also strikes me as wrong (and e.g. Paul Christiano has also said that Hanson's predictions looked particularly bad in the FOOM debate. EDIT: I think this was worded too strongly, and while Paul had some disagreements with Robin, on the particular dimension of discontinuity and competitiveness, Paul thinks Robin came away looking better than Eliezer). Indeed, I would dock Hanson many more points in that discussion (though, overall, I give both of them a ton of points, since they both recognized the importance of AI-like technologies early, and performed vastly above baseline for technological forecasting, which again, is extremely hard).

This seems unlikely to be the right place for a full argument on discontinuous progress. However, continuous takeoff is very far from consensus in the AI Alignment field, and this post seems to try to paint it as such, which seems pretty bad to me (especially if it's used in a list with two clearly wrong things, without disclaiming it as such).

4. Treating early AI risk arguments as close to decisive

You say:

My point, here, is not necessarily that Yudkowsky was wrong, but rather that he held a much higher credence in existential risk from AI than his arguments justified at the time. The arguments had pretty crucial gaps that still needed to be resolved^[14], but, I believe, his public writing tended to suggest that these arguments were tight and sufficient to justify very high credences in doom.

I think the arguments are pretty tight and sufficient to establish the basic risk argument. I found your critique relatively uncompelling. In particular, I think you are misrepresenting that a premise of the original arguments was a fast takeoff. I can't currently remember any writing that said it was a necessary component of the AI risk arguments that takeoff happens fast, or at least whether the distinction between "AI vastly exceeds human intelligence in 1 week vs 4 years" is that crucial to the overall argument, which is as far as I can tell the range that most current opinions in the AI Alignment field fall into (and importantly, I know of almost no one who believes that it could take 20+ years for AI to go from mildly subhuman to vastly superhuman, which does feel like it could maybe change the playing field, but also seems to be a very rarely held opinion).

Indeed, I think Eliezer was probably underconfident in doom from AI, since I currently assign >50% probability to AI Doom, as do many other people in the AI Alignment field.

See also Nate's recent comment on some similar critiques to this: https://www.lesswrong.com/posts/8NKu9WES7KeKRWEKK/why-all-the-fuss-about-recursive-self-improvement

5. Treating "coherence arguments" as forceful

Coherence arguments do indeed strike me as one of the central valid arguments in favor of AI Risk. I think there was a common misunderstanding that did confuse some people, but that misunderstanding was not argued for by Eliezer or other people at MIRI, as far as I can tell (and I've looked into this for 5+ hours as part of discussions with Rohin and Richard).

The central core of coherence arguments, which are based in arguments of competitiveness and economic efficiency, strikes me as very strong, robustly argued for, and one of the main reasons why AI will be dangerous. The von Neumann-Morgenstern theorem does play a role here, though it's definitely not sufficient to establish a strong case, and Rohin and Richard have successfully argued against that, though I don't think Eliezer has historically argued that the von Neumann-Morgenstern theorem is sufficient to establish an AI-alignment relevant argument on its own (though Dutch-book style arguments are very suggestive for the real structure of the argument).

Edit: Rohin says something similar in a separate comment reply.

6. Not acknowledging his mixed track record

Given my disagreements with the above, I think doing so would be a mistake. But even without that, let's look at the merits of this critique.

For the two "clear cut" examples, Eliezer has posted dozens of times on the internet that he has disendorsed his views from before 2002. This is present on his personal website, the relevant articles are no longer prominently linked anywhere, and Eliezer has openly and straightforwardly acknowledged that his predictions and beliefs from the relevant period were wrong.

For the disputed examples, Eliezer still believes all of these arguments (as do I), so it would be disingenuous for Eliezer to "acknowledge his mixed track record" in this domain. You can either argue that he is wrong, or you can argue that he hasn't acknowledged that he has changed his mind and was previously wrong, but you can't both argue that Eliezer is currently wrong in his beliefs, and accuse him of not telling others that he is wrong. I want people to say things they believe. And for the only two cases where you have established that Eliezer has changed his mind, he has extensively acknowledged his track record.

Some comments on the overall post:

I really dislike this post. I think it provides very little argument, and engages in extremely extensive cherry-picking in a way that does not produce a symmetric credit-allocation (i.e. most people who are likely to update downwards on Yudkowsky on the basis of this post, seem to me to be generically too trusting, and I am confident I can write a more compelling post about any other central figure in Effective Altruism that would likely cause you to update downwards even more).

I think a good and useful framing on this post could have been "here are 3 points where I disagree with Eliezer on AI Risk" (I don't think it would have been useful under almost any circumstance to bring up the arguments from the year 2000). And then to primarily spend your time arguing about the concrete object-level. Not to start a post that is trying to say that Eliezer is "overconfident in his beliefs about AI" and "miscalibrated", and then to justify that by cherry-picking two examples from when Eliezer was barely no longer a teenager, and three arguments on which there is broad disagreement within the AI Alignment field.

I also dislike calling this post "On Deference and Yudkowsky's AI Risk Estimates", as if this post was trying to be an unbiased analysis of how much to defer to Eliezer, while you just list negative examples. I think this post is better named "against Yudkowsky on AI Risk estimates". Or "against Yudkowsky's track record in AI Risk Estimates". Which would have made it clear that you are selectively giving evidence for one side, and more clearly signposted that if someone was trying to evaluate Eliezer's track record, this post will only be a highly incomplete starting point.

I have many more thoughts, but I think I've written enough for now. I think I am somewhat unlikely to engage with replies in much depth, because writing this comment has already taken up a lot of my time, and I expect given the framing of the post, discussion on the post to be unnecessarily conflicty and hard to navigate.

Pablo

It seems that half of these examples are from 15+ years ago, from a period for which Eliezer has explicitly disavowed his opinions

Just to note that the boldfaced part has no relevance in this context. The post is not attributing these views to present-day Yudkowsky. Rather, it is arguing that Yudkowsky's track record is less flattering than some people appear to believe. You can disavow an opinion that you once held, but this disavowal doesn't erase a bad prediction from your track record.

Habryka [Deactivated]

Hmm, I think that part definitely has relevance. Clearly we would trust Eliezer less if his response to that past writing was "I just got unlucky in my prediction, I still endorse the epistemological principles that gave rise to this prediction, and would make the same prediction, given the same evidence, today".

If someone visibly learns from forecasting mistakes they make, that should clearly update us positively on them not repeating the same mistakes.

bmg

If someone visibly learns from forecasting mistakes they make, that should clearly update us positively on them not repeating the same mistakes.

I suppose one of my main questions is whether he has visibly learned from the mistakes, in this case.

For example, I wasn't able to find a post or comment to the effect of "When I was younger, I spent years of my life motivated by the belief that near-term extinction from nanotech was looming. I turned out to be wrong. Here's what I learned from that experience and how I've applied it to my forecasts of near-term existential risk from AI." Or a post or comment acknowledging his previous over-optimistic AI timelines and what he learned from them, when formulating his current seemingly short AI timelines.

(I genuinely could be missing these, since he has so much public writing.)

Habryka [Deactivated]

Eliezer writes a bit about his early AI timeline and nanotechnology opinions here, though it sure is a somewhat obscure reference that takes a bunch of context to parse:

Luke Muehlhauser reading a previous draft of this (only sounding much more serious than this, because Luke Muehlhauser): You know, there was this certain teenaged futurist who made some of his own predictions about AI timelines -

Eliezer: I'd really rather not argue from that as a case in point. I dislike people who screw up something themselves, and then argue like nobody else could possibly be more competent than they were. I dislike even more people who change their mind about something when they turn 22, and then, for the rest of their lives, go around acting like they are now Very Mature Serious Adults who believe the thing that a Very Mature Serious Adult believes, so if you disagree with them about that thing they started believing at age 22, you must just need to wait to grow out of your extended childhood.
Luke Muehlhauser (still being paraphrased): It seems like it ought to be acknowledged somehow.
Eliezer: That's fair, yeah, I can see how someone might think it was relevant. I just dislike how it potentially creates the appearance of trying to slyly sneak in an Argument From Reckless Youth that I regard as not only invalid but also incredibly distasteful. You don't get to screw up yourself and then use that as an argument about how nobody else can do better.
Humbali: Uh, what's the actual drama being subtweeted here?
Eliezer: A certain teenaged futurist, who, for example, said in 1999, "The most realistic estimate for a seed AI transcendence is 2020; nanowar, before 2015."
Humbali: This young man must surely be possessed of some very deep character defect, which I worry will prove to be of the sort that people almost never truly outgrow except in the rarest cases. Why, he's not even putting a probability distribution over his mad soothsaying - how blatantly absurd can a person get?
Eliezer: Dear child ignorant of history, your complaint is far too anachronistic. This is 1999 we're talking about here; almost nobody is putting probability distributions on things, that element of your later subculture has not yet been introduced. Eliezer-2002 hasn't been sent a copy of "Judgment Under Uncertainty" by Emil Gilliam. Eliezer-2006 hasn't put his draft online for "Cognitive biases potentially affecting judgment of global risks". The Sequences won't start until another year after that. How would the forerunners of effective altruism in 1999 know about putting probability distributions on forecasts? I haven't told them to do that yet! We can give historical personages credit when they seem to somehow end up doing better than their surroundings would suggest; it is unreasonable to hold them to modern standards, or expect them to have finished refining those modern standards by the age of nineteen.
Though there's also a more subtle lesson you could learn, about how this young man turned out to still have a promising future ahead of him; which he retained at least in part by having a deliberate contempt for pretended dignity, allowing him to be plainly and simply wrong in a way that he noticed, without his having twisted himself up to avoid a prospect of embarrassment. Instead of, for example, his evading such plain falsification by having dignifiedly wide Very Serious probability distributions centered on the same medians produced by the same basically bad thought processes.
But that was too much of a digression, when I tried to write it up; maybe later I'll post something separately.

While also including some other points, I do read it as a pretty straightforward "Yes, I was really wrong. I didn't know about cognitive biases, and I did not know about the virtue of putting probability distributions on things, and I had not thought enough about the art of thinking well. I would not make the same mistakes today."

Guy Raveh

How would the forerunners of effective altruism in 1999 know about putting probability distributions on forecasts? I haven't told them to do that yet!

Did Yudkowsky actually write these sentences?

If Yudkowsky thinks, as this suggests, that people in EA think or do things because he tells them to - this alone means it's valuable to question whether people give him the right credibility.

Habryka [Deactivated]

I am not sure about the question. Yeah, this is a quote from the linked post, so he wrote those sections.

Also, yeah, seems like Eliezer has had a very large effect on whether this community uses things like probability distributions, models things in a Bayesian way, makes lots of bets, and pays attention to things like forecasting track records. I don't think he gets to take full credit for those norms, but my guess is he is the single individual who most gets to take credit for those norms.

[anonymous]

I don't see how he has encouraged people to pay attention to forecasting track records. People who have encouraged that norm make public bets or go on public forecasting platforms and make predictions about questions that can resolve in the short term. Bryan Caplan does this; I think Greg Lewis and David Manheim are superforecasters.

I thought the upshot of this piece and the Jotto post was that Yudkowsky is in fact very dismissive of people who make public forecasts. "I consider naming particular years to be a cognitively harmful sort of activity; I have refrained from trying to translate my brain's native intuitions about this into probabilities, for fear that my verbalized probabilities will be stupider than my intuitions if I try to put weight on them." This seems like the opposite of encouraging people to pay attention to forecasting; if anything, it dismisses the whole enterprise of forecasting.

Guy Raveh

I am not sure about the question.

I wanted to make sure I'm not missing something, since this casts him in a negative light IMO.

There's a difference between saying, for example, "You can't expect me to have done X then - nobody was doing it, and I haven't even written about it yet, nor was I aware of anyone else doing so" - and saying "... nobody was doing it because I haven't told them to."

This isn't about credit. It's about self-perception and social dynamics.

Habryka [Deactivated]

I mean... it is true that Eliezer really did shape the culture in the direction of forecasting and predictions and that kind of stuff. My best guess is that without Eliezer, we wouldn't have a culture of doing those things (and like, the AI Alignment community as is probably wouldn't exist). You might disagree with me and him on this, in which case sure, update in that direction, but I don't think it's a crazy opinion to hold.

RyanCarey

My best guess is that without Eliezer, we wouldn't have a culture of [forecasting and predictions]

The timeline doesn't make sense for this version of events at all. Eliezer was uninformed on this topic in 1999, at a time when Robin Hanson had already written about gambling on scientific theories (1990), prediction markets (1996), and other betting-related topics, as you can see from the bibliography of his Futarchy paper (2000). Before Eliezer wrote his sequences (2006-2009), the Long Now Foundation already had Long Bets (2003), and Tetlock had already written Expert Political Judgment (2005).

If Eliezer had not written his sequences, forecasting content would have filtered through to the EA community from contacts of Hanson. For instance, through blogging by other GMU economists like Caplan (2009). And of course, through Jason Matheny, who worked at FHI, where Hanson was an affiliate. He ran the ACE project (2010), which led to the science behind Superforecasting, a book that the EA community would certainly have discovered.

Habryka [Deactivated]

Hmm, I think these are good points. My best guess is that I don't think we would have a strong connection to Hanson without Eliezer, though I agree that that kind of credit is harder to allocate (and it gets fuzzy what we even mean by "this community" as we extend into counterfactuals like this).

I do think the timeline here provides decent evidence in favor of less credit allocation (and I think against the stronger claim "we wouldn't have a culture of [forecasting and predictions] without Eliezer"). My guess is in terms of causing that culture to take hold, Eliezer is probably still the single most-responsible individual, though I do now expect (after having looked into a bunch of comment threads from 1996 to 1999 and seeing many familiar faces show up) that a lot of the culture would show up without Eliezer.

[anonymous]

Speaking for myself, Eliezer has played no role in encouraging me to give quantitative probability distributions. For me, that was almost entirely due to people like Tetlock and Bryan Caplan, both of whom I would have encountered regardless of Eliezer. I strongly suspect this is true of lots of people who are in EA but don't identify with the rationalist community.

More generally, I do think that Eliezer and other rationalists overestimate how much influence they have had on wider views in the community. E.g., I have not read the Sequences, and I just don't think they play a big role in the internal story of a lot of EAs.

[anonymous]

For me, even people like Nate Silver or David MacKay, who aren't part of the community, have played a bigger role in encouraging quantification and probabilistic judgment.

Rebecca

This is my impression and experience as well

Howie_Lempel

"My best guess is that I don't think we would have a strong connection to Hanson without Eliezer"

Fwiw, I found Eliezer through Robin Hanson.

Habryka [Deactivated]

Yeah, I think this isn't super rare, but overall still much less common than the reverse.

Guy Raveh

I'll currently take your word for that because I haven't been here nearly as long. I'll mention that some of these contributions I don't necessarily consider positive.

But the point is, is Yudkowsky a (major) contributor to a shared project, or is he a ruler directing others, like his quote suggests? How does he view himself? How do the different communities involved view him?

P.S. I disagree with whoever (strong-)downvoted your comment.

Yonatan Cale

Yudkowsky often ~~complains~~ ~~rants~~ hopes people will form their own opinions instead of just listening to him, I can find references if you want.
I also think he lately finds it ~~depressing~~ worrying that he's got to be the responsible adult. Easy references: Search for "Eliezer" in List Of Lethalities.

Guy Raveh

I also think he lately finds it ~~depressing~~ worrying that he's got to be the responsible adult. Easy references: Search for "Eliezer" in List Of Lethalities

I think this strengthens my point, especially given how it is written in the post you linked. Telling people you're the responsible adult, or the only one who notices things, still means telling them you're smarter than them and they should just defer to you.

I'm trying to account for my biases in these comments, but I encourage others to go to that post, search for "Eliezer" as you suggested, and form their own views.

RobBensinger

Telling people you're the responsible adult, or the only one who notices things, still means telling them you're smarter than them and they should just defer to you.

Those are four very different claims. In general, I think it's bad to collapse all (real or claimed) differences in ability into a single status hierarchy, for the reasons stated in Inadequate Equilibria.

Eliezer is claiming that other people are not taking the problem sufficiently seriously, claiming ownership of it, trying to form their own detailed models of the full problem, and applying enough rigor and clarity to make real progress on the problem.

He is specifically not saying "just defer to me", and in fact is saying that he and everyone else is going to die if people rely on deference here. A core claim in AGI Ruin is that we need more people with "not the ability to read this document and nod along with it, but the ability to spontaneously write it from scratch without anybody else prompting you".

Deferring to Eliezer means that Eliezer is the bottleneck on humanity solving the alignment problem; which means we die. The thing Eliezer claims we need is a larger set of people who arrive at true, deep, novel insights about the problem on their own —without Eliezer even mentioning the insights, much less spending a ton of time trying to persuade anyone of them—and writing them up.

It's true that Eliezer endorses his current stated beliefs; this goes without saying, or he obviously wouldn't have written them down. It doesn't mean that he thinks humanity has any path to survival via deferring to him, or that he thinks he has figured out enough of the core problems (or ever conceivably could do so, on his own) to give humanity a significant chance of surviving. Quoting AGI Ruin:

It's guaranteed that some of my analysis is mistaken, though not necessarily in a hopeful direction. The ability to do new basic work noticing and fixing those flaws is the same ability as the ability to write this document before I published it[.]

The end of the "death with dignity" post is also alluding to Eliezer's view that it's pretty useless to figure out what's true merely via deferring to Eliezer.

Guy Raveh

Thanks, those are some good counterpoints.

D0TheMath

Eliezer is cleanly just a major contributor. If he went off the rails tomorrow, some people would follow him (and the community would be better with those few gone), but the vast majority would say “wtf is that Eliezer fellow doing”. I also don’t think he sees himself as the leader of the community either.

Probably Eliezer likes Eliezer more than EA/Rationality likes Eliezer, because Eliezer really likes Eliezer. If I were as smart & good at starting social movements as Eliezer, I’d probably also have an inflated ego, so I don’t take it as too unreasonable of a character flaw.

HaydnBelfield

More than Philip Tetlock (author of Superforecasting)?

Does that particular quote from Yudkowsky not strike you as slightly arrogant?

Habryka [Deactivated]

Yes, definitely much more than Philip Tetlock, given that our community had strong norms of forecasting and making bets before Tetlock had done most of his work on the topic (Expert Political Judgment was out, but as far as I can tell was not a major influence on people in the community, though I am not totally confident of that).

Does that particular quote from Yudkowsky not strike you as slightly arrogant?

I am generally strongly against a culture of fake modesty. If I want people to make good decisions, they need to be able to believe things about themselves that might sound arrogant to others. Yes, it sounds arrogant to an external audience, but it also seems true, and it seems like whether it is true should be the dominant factor in whether it is good to say.

David Johnston

FWIW I think "it was 20 years ago" is a good reason not to take these failed predictions too seriously, and "he has disavowed these predictions after seeing they were false" is a bad reason to take them unseriously.

TAG

If EY gets to disavow his mistakes, so does everyone else.

bmg

On 1 (the nanotech case):

I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old.

I think your comment might give the misimpression that I don't discuss this fact in the post or explain why I include the case. What I write is:

I should, once again, emphasize that Yudkowsky was around twenty when he did the final updates on this essay. In that sense, it might be unfair to bring this very old example up.

Nonetheless, I do think this case can be treated as informative, since: the belief was so analogous to his current belief about AI (a high outlier credence in near-term doom from an emerging technology), since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community, since it's not clear when he dropped the belief, and since twenty isn't (in my view) actually all that young. I do know a lot of people in their early twenties; I think their current work and styles of thought are likely to be predictive of their work and styles of thought in the future, even though I do of course expect the quality to go up over time....

An additional reason why I think it's worth distinguishing between his views on nanotech and (e.g.) your views on nuclear power: I think there's a difference between an off-hand view picked up from other people vs. a fairly idiosyncratic view that you consciously adopted after a lot of reflection and that you decide to devote your professional life to and found an organization to address.

It's definitely up to the reader to decide how relevant the nanotech case is. Since it's not widely known, it seems at least pretty plausibly relevant, and the post twice flags his age at the time, I do still endorse including it.

At face value, as well: we're trying to assess how much weight to give to someone's extreme, outlier-ish prediction that an emerging technology is almost certain to kill everyone very soon. It just does seem very relevant, to me, that they previously had a different extreme outlier-ish prediction that another emerging technology was very likely to kill everyone within a decade.

I don't find it plausible that we should assign basically no significance to this.

On 6 (the question of whether Yudkowsky has acknowledged negative aspects of his track record):

For the two "clear cut" examples, Eliezer has posted dozens of times on the internet that he has disendorsed his views from before 2002. This is present on his personal website, the relevant articles are no longer prominently linked anywhere, and Eliezer has openly and straightforwardly acknowledged that his predictions and beliefs from the relevant period were wrong.

Similarly, I think your comment may give the impression that I don't discuss this point in the post. What I write is this:

He has written about mistakes from early on in his intellectual life (particularly pre-2003) and has, on this basis, even made a blanket-statement disavowing his pre-2003 work. However, based on my memory and a quick re-read/re-skim, this writing is an exploration of why it took him a long time to become extremely concerned about existential risks from misaligned AI. For instance, the main issue it discusses with his plans to build AGI are that these plans didn't take into account the difficulty and importance of ensuring alignment. This writing isn't, I think, an exploration or acknowledgement of the kinds of mistakes I've listed in this post.

On the general point that this post uses old examples:

Given the sorts of predictions involved (forecasts about pathways to transformative technologies), old examples are generally going to be more unambiguous than new examples. Similarly for risk arguments: it's hard to have a sense of how new arguments are going to hold up. It's only for older arguments that we can start to approach the ability to say that technological progress, progress in arguments, and evolving community opinion say something clear-ish about how strong the arguments were.

On signposting:

I also dislike calling this post "On Deference and Yudkowsky's AI Risk Estimates", as if this post was trying to be an unbiased analysis of how much to defer to Eliezer, while you just list negative examples. I think this post is better named "against Yudkowsky on AI Risk estimates". Or "against Yudkowsky's track record in AI Risk Estimates". Which would have made it clear that you are selectively giving evidence for one side, and more clearly signposted that if someone was trying to evaluate Eliezer's track record, this post will only be a highly incomplete starting point.

I think it's possible another title would have been better (I chose a purposely bland one partly for the purpose of trying to reduce heat - and that might have been a mistake). But I do think I signpost what the post is doing fairly clearly.

The introduction says it's focusing on "negative aspects" of Yudkowsky's track record, the section heading for the section introducing the examples describes them as "cherry-picked," and the start of the section introducing the examples has an italicized paragraph re-emphasizing that the examples are selective and commenting on the significance of this selectiveness.

On the role of the fast take-off assumption in classic arguments:

I think the arguments are pretty tight and sufficient to establish the basic risk argument. I found your critique relatively uncompelling. In particular, I think you are misrepresenting that a premise of the original arguments was a fast takeoff.

I disagree with this. I do think it's fair to say that fast take-off was typically a premise of the classic arguments.

Two examples I have off-hand (since they're in the slides from my talk) are from Yudkowsky's exchange with Caplan and from Superintelligence. Superintelligence isn't by Yudkowsky, of course, but hopefully is still meaningful to include (insofar as Superintelligence heavily drew on Yudkowsky's work and was often accepted as a kind of distillation of the best arguments as they existed at the time).

From Yudkowsky's debate with Caplan (2016):

“I’d ask which of the following statements Bryan Caplan [a critic of AI risk arguments] denies:

1. Orthogonality thesis: Intelligence can be directed toward any compact goal….

2. Instrumental convergence: An AI doesn’t need to specifically hate you to hurt you; a paperclip maximizer doesn’t hate you but you’re made out of atoms that it can use to make paperclips, so leaving you alive represents an opportunity cost and a number of foregone paperclips….

3. Rapid capability gain and large capability differences: Under scenarios seeming more plausible than not, there’s the possibility of AIs gaining in capability very rapidly, achieving large absolute differences of capability, or some mixture of the two….

1-3 in combination imply that Unfriendly AI is a critical problem-to-be-solved, because AGI is not automatically nice, by default does things we regard as harmful, and will have avenues leading up to great intelligence and power.”

(Caveat that the fast-take-off premise is stated a bit ambiguously here, so it's not clear what level of rapidness is being assumed.)

From Superintelligence:

Taken together, these three points [decisive strategic advantage, orthogonality, and instrumental convergence] thus indicate that the first superintelligence may shape the future of Earth-originating life, could easily have non-anthropomorphic final goals, and would likely have instrumental reasons to pursue open-ended resource acquisition. If we now reflect that human beings consist of useful resources (such as conveniently located atoms) and that we depend for our survival and flourishing on many more local resources, we can see that the outcome could easily be one in which humanity quickly becomes extinct.

The decisive strategic advantage point is justified through a discussion of the possibility of a fast take-off. The first chapter of the book also starts by introducing the possibility of an intelligence explosion. It then devotes two chapters to the possibility of a fast take-off and the idea this might imply a decisive strategic advantage, before it gets to discussing things like the orthogonality thesis.

I think it's also relevant that content from MIRI and people associated with MIRI, raising the possibility of extinction from AI, tended to very strongly emphasize (e.g. spend most of its time on) the possibility of a run-away intelligence explosion. The most developed classic pieces arguing for AI risk often have names like "Shaping the Intelligence Explosion," "Intelligence Explosion: Evidence and Import," "Intelligence Explosion Microeconomics," and "Facing the Intelligence Explosion."

Overall, then, I do think it's fair to consider a fast-takeoff to be a core premise of the classic arguments. It wasn't incidental or a secondary consideration.

[[Note: I've edited my comment, here, to respond to additional points. Although there are still some I haven't responded to yet.]]

Habryka [Deactivated]

One quick response, since it was easy (might respond more later):

Overall, then, I do think it's fair to consider a fast-takeoff to be a core premise of the classic arguments. It wasn't incidental or a secondary consideration.

I do think takeoff speeds between 1 week and 10 years are a core premise of the classic arguments. I do think the situation looks very different if we spend 5+ years in the human domain, but I don't think there are many who believe that that is going to happen.

I don't think the distinction between 1 week and 1 year is that relevant to the core argument for AI Risk, since it seems in either case more than enough cause for likely doom, and that premise seems very likely to be true to me. I do think Eliezer believes things more on the order of 1 week than 1 year, but I don't think the basic argument structure is that different in either case (though I do agree that the 1 year opens us up to some more potential mitigating strategies).

TAG

"Orthogonality thesis: Intelligence can be directed toward any compact goal….

Instrumental convergence: An AI doesn’t need to specifically hate you to hurt you; a paperclip maximizer doesn’t hate you but you’re made out of atoms that it can use to make paperclips, so leaving you alive represents an opportunity cost and a number of foregone paperclips….

Rapid capability gain and large capability differences: Under scenarios seeming more plausible than not, there’s the possibility of AIs gaining in capability very rapidly, achieving large absolute differences of capability, or some mixture of the two….

1-3 in combination imply that Unfriendly AI is a critical problem-to-be-solved, because AGI is not automatically nice, by default does things we regard as harmful, and will have avenues leading up to great intelligence and power."

1-3 in combination don't imply anything with high probability.

Jan_Kulveit

(i.e. most people who are likely to update downwards on Yudkowsky on the basis of this post, seem to me to be generically too trusting, and I am confident I can write a more compelling post about any other central figure in Effective Altruism that would likely cause you to update downwards even more)

My impression is that the post is a somewhat unfortunate attempt to "patch" the situation in which many generically too trusting people updated a lot on AGI Ruin: A List of Lethalities and Death with Dignity and subsequent deference/update cascades.

In my view the deeper problem here is that, instead of disagreeing about model internals, many of these people do some sort of "averaging conclusions" move, based on signals like seniority, karma, vibes, etc.

Many of these signals are currently wildly off from truth-tracking, so you get attempts to push the conclusion-updates directly.

Linch

This critique strikes me as about as sensible as digging up someone's old high-school essays and critiquing their stance on communism or the criminal justice system. I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old. I am confident I can find crazier and worse opinions for every single leadership figure in Effective Altruism, if I am willing to go back to what they thought while they were in high-school. To give some character, here are some things I believed in my early high-school years

This is really minor and nitpicky, and I agree with many of your overall points, but I don't think equivocating between "barely 20" and "early high-school" is fair. The former is a normal age to be a third-year university student in the US, and plenty of college-age EAs are taken quite seriously by the rest of us.

Habryka [Deactivated]

Oh, hmm, I think this is just me messing up the differences between the U.S. and German education systems (I was 18 and 19 in high-school, and enrolled in college when I was 20).

I think the first quote on nanotechnology was actually written in 1996 originally (though was maybe updated in 1999). Which would put Eliezer at ~17 years old when he wrote that.

The second quote was I think written in more like 2000, which would put him more in the early college years, and I agree that it seems good to clarify that.

Linch

Thank you, this clarification makes sense to me!

Paul_Christiano

e.g. Paul Christiano has also said that Hanson's predictions looked particularly bad in the FOOM debate

To clarify, what I said was:

I don't think Eliezer has an unambiguous upper hand in the FOOM debate at all

Then I listed a bunch of ways in which the world looks more like Robin's predictions, particularly regarding continuity and locality. I said Robin's predictions about AI timelines in particular looked bad. This isn't closely related to the topic of your section 3, where I mostly agree with the OP.

Habryka [Deactivated]

Hmm, I think this is fair, rereading that comment.

I feel a bit confused here, since at the scale that Robin is talking about, timelines and takeoff speeds seem very inherently intertwined (like, if Robin predicts really long timelines, this clearly implies a much slower takeoff speed, especially when combined with gradual continuous increases). I agree there is a separate competitiveness dimension that you and Robin are closer on, which is important for some of the takeoff dynamics, but on overall takeoff speed, I feel like you are closer to Eliezer than Robin (Eliezer predicting weeks to months to cross the general intelligence human->superhuman gap, you predicting single-digit years to cross that gap, and Hanson predicting decades to cross that gap). Though it's plausible that I am missing something here.

In any case, I agree that my summary of your position here is misleading, and will edit accordingly.

Paul_Christiano

I think my views about takeoff speeds are generally similar to Robin's, though neither Robin nor Eliezer got at all concrete in that discussion, so I can't really say. You can read this essay from 1998 with his "outside-view" guesses, which I suspect are roughly in line with what he's imagining in the FOOM debate.

I think that doc implies significant probability on a "slow" takeoff of 8, 4, 2... year doublings (more like the industrial revolution), but a broad distribution over dynamics which also puts significant probability on e.g. a relatively fast jump to a 1 month doubling time (more like the agricultural revolution). In either case, over the next few doublings he would by default expect still further acceleration. Overall I think this is basically a sensible model.

(I agree that shorter timelines generally suggest faster takeoff, but I think either Robin or Eliezer's views about timelines would be consistent with either Robin or Eliezer's views about takeoff speed.)

Guy Raveh

I am confident I can write a more compelling post about any other central figure in Effective Altruism that would likely cause you to update downwards even more

If done in a polite and respectful manner, I think this would be a genuinely good idea.

gwern

Not sure why this is on EAF rather than LW or maybe AF, but anyway. I find this interesting to look at because I have been following Eliezer's work since approximately 2003 on SL4, and so I remember this firsthand, as it were. I disagree with several of the evaluations here (but of course agree with several of the others - I found the premise of Flare to be ludicrous at the time, and thankfully, AFAICT, pretty much zero effort went into that vaporware*):

calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn't work, or that all future AI work would look like the Bayesian program and logical approach he favored; he's said (consistently since at least SL4 that I've observed) that they would be extremely dangerous when they worked, and extremely hard to make safe to the high probability that we need them to when deployed to the real world indefinitely and unboundedly and self-modifyingly, and that rigorous program-proof approaches which can make formal logical guarantees of 100% safety are what are necessary and must deal with the issues and concepts discussed in LOGI. I think this is true: they do look extremely dangerous by default, and we still do not have adequate solutions to problems like "how do we talk about human values in a way which doesn't hardwire them dangerously into a reward function which can't be changed?" This is something actively researched now in RL & AI safety, and which continues to lack any solution you could call even 'decent'. (If you have ever been surprised by any result from causal influence diagrams, then you have inadvertently demonstrated the value of this.) More broadly, we still do not have any good proof or approach that we can feasibly engineer any of that with prosaic alignment approaches, which tend towards the 'patch bugs as you find them' or 'make systems so complex you can't immediately think of how they fail' approach to security that we already knew back then was a miserable failure. Eliezer hasn't been shown to be wrong here.
I continue to be amazed anyone can look at the past decade of DL and think that Hanson is strongly vindicated by it, rather than Yudkowsky-esque views. (Take a look at his OB posts on AI the past few years. Hanson is not exactly running victory laps, either on DL, foom, or ems. It would be too harsh to compare him to Gary Marcus... but I've seen at least one person do so anyway.) I would also say that to the extent that Yudkowsky-style research has enjoyed any popularity of late, it's because people have been looking at the old debate and realizing that extremely simple generic architectures written down in a few dozen lines of code, with large capability differences between very similar lines of code, solving many problems in many fields and subsuming entire subfields as simply another minor variant, with large generalizing models (as opposed to the very strong small-models-unique-to-each-individual-problem-solved-case-by-case-by-subject-experts which Hanson & Drexler strongly advocated and which was the ML mainstream at the time) powered by OOMs more compute, steadily increasing in agency, is a short description of Yudkowsky's views on what the runup will look like and how DL now works.
"his arguments focused on a fairly specific catastrophe scenario that most researchers now assign less weight to than they did when they first entered the field."

Yet, the number who take it seriously since Eliezer started advocating it in the 1990s is now far greater than it was when he started and was approximately the only person anywhere. You aren't taking seriously that these surveyed researchers ("AI Impacts, CHAI, CLR, CSER, CSET, FHI, FLI, GCRI, MILA, MIRI, Open Philanthropy and PAI") wouldn't exist without Eliezer as he created the AI safety field as we know it, with everyone else downstream (like Bostrom's influential Superintelligence - Eliezer with the serial numbers filed off and an Oxford logo added). This is missing the forest for a few trees; if you are going to argue that a bit of regression to the mean in extreme beliefs should be taken as some evidence against Eliezer, then you must also count the initial extremity of the beliefs leading to these NGOs doing AI safety & people at them doing AI safety at all as much evidence for Eliezer.† (What a perverse instance of Simpson's paradox.)

There's also the caveat mentioned there that the reduction may simply be because they have moved up other scenarios like the part 2 scenario where it's not a singleton hard takeoff but a multipolar scenario (a distinction of great comfort, I'm sure), which is a scenario which over the past few years is certainly looking more probable due to how DL scaling and arms races work. (In particular, we've seen some fast followups - because the algorithms are so simple that once you hear the idea described at all, you know most of it.) I didn't take the survey & don't work at the listed NGOs, but I would point out that if I had gone pro sometime in the past decade & taken it, under your interpretation of this statistic, you would conclude "Gwern now thinks Eliezer was wrong". Something to think about, especially if you want to consider observations like "this statistic claims most people are moving away from Eliezer's views, even though when I look at discussions of scaling, research trends, and what startups/NGOs are being founded, it sure looks like the opposite..."

* Flare has been, like Roko's Basilisk, one of those things where the afterlife of it has been vastly greater than the thing itself ever was, and where it gets employed in mutually contradictory ways by critics

† I find it difficult to convey what incredibly hot garbage AI researcher opinions in the '90s were about these topics. And I don't mean the casual projections that AGI would take until 2500 AD or whatever, I mean basics like the orthogonality thesis and instrumental drives. Like 'transhumanism', these are terms used in inverse proportion to how much people need them. Even on SL4, which was the fringiest of the fringe in AI alarmism, you had plenty of people reading and saying, "no, there's no problem here at all, any AI will just automatically be friendly and safe, human moral values aren't fragile or need to be learned, they're just, like, a law of physics and any evolving system will embody our values". If you ever wonder how old people in AI like Kurzweil or Schmidhuber can be so gungho about the prospect of AGI happening and replacing (ie. killing) humanity and why they have zero interest in AI safety/alignment, it's because they think that this is a good thing and our mind-children will just automatically be like us but better and this is evolution. ("Say, doth the dull soil / Quarrel with the proud forests it hath fed, / And feedeth still, more comely than itself?"...) If your response to reading this is, "gwern, do you have a cite for all of that? because no real person could possibly believe such a both deeply naive and also colossally evil strawman", well, perhaps that will convey some sense of the intellectual distance traveled.

RyanCarey

like Bostrom's influential Superintelligence - Eliezer with the serial numbers filed off and an Oxford logo added

It's not accurate that the key ideas of Superintelligence came to Bostrom from Eliezer, who originated them. Rather, at least some of the main ideas came to Eliezer from Nick. For instance, in one message from Nick to Eliezer on the Extropians mailing list, dated to Dec 6th 1998, inline quotations show Eliezer arguing that it would be good to allow a superintelligent AI system to choose its own morality. Nick responds that it's possible for an AI system to be highly intelligent without being motivated to act morally. In other words, Nick explains to Eliezer an early version of the orthogonality thesis.

Nick was not lagging behind Eliezer on evaluating the ideal timing of a singularity, either - the same thread reveals that they both had some grasp of the issue. Nick said that the fact that 150,000 people die per day must be contextualised against "the total number of sentiences that have died or may come to live", foreshadowing his piece on Astronomical Waste, that would be published five years later. Eliezer said that having waited billions of years, the probability of a success is more important than any delay of hundreds of years.

These are indeed two of the most-important macrostrategy insights relating to AI. A reasonable guess is that a lot of the big ideas in Superintelligence were discovered by Bostrom. Some surely came from Eliezer and his sequences, or from discussions between the two, and I suppose that some came from other utilitarians and extropians.

Ben Pace

I think chapter 4, The Kinetics of an Intelligence Explosion, has a lot of terms and arguments from EY's posts in the FOOM Debate. (I've been surprised by this in the past, thinking Bostrom invented the terms, then finding things like resource overhangs getting explicitly defined in the FOOM Debate.)

bmg

Thanks for the comment! A lot of this is useful.

calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn't work, or that all future AI work would look like the Bayesian program and logical approach he favored;

I mainly have the impression that LOGI and related articles were probably "wrong" because, so far as I've seen, nothing significant has been built on top of them in the intervening decade-and-half (even though LOGI's successor was seemingly predicted to make it possible for a small group to build AGI). It doesn't seem like there's any sign that these articles were the start of a promising path to AGI that was simply slower than the deep learning path.

I have had the impression, though, that Yudkowsky also thought that logical/Bayesian approaches were in general more powerful/likely-to-enable-near-term-AGI (not just less safe) than DL. It's totally possible this is a misimpression - and I'd be inclined to trust your impression over mine, since you've read more of his old writing than I have. (I'd also be interested if you happen to have any links handy.) But I'm not sure this significantly undermines the relevance of the LOGI case.

I continue to be amazed anyone can look at the past decade of DL and think that Hanson is strongly vindicated by it, rather than Yudkowsky-esque views.

I also think that, in various ways, Hanson also doesn't come off great. For example, he expresses a favorable attitude toward the CYC project, which now looks like a clear dead end. He is also overly bullish about the importance of having lots of different modules. So I mostly don't want to defend the view "Hanson had a great performance in the FOOM debate."

I do think, though, that his abstract view that compute and content (i.e. data) are centrally important is closer to the mark than Yudkowsky's expressed view. I think it does seem hard to defend Yudkowsky's view that it's possible for a programming team (with mid-2000s levels of compute) to acquire some "deep new insights," go down into their basement, and then create an AI system that springboards itself into taking over the world. At least - I think it's fair to say - the arguments weren't strong enough to justify a lot of confidence in that view.

Yet, the number who take it seriously since Eliezer started advocating it is now far greater than it was when he started and was approximately the only person anywhere. You aren't taking seriously that these surveyed researchers ("AI Impacts, CHAI, CLR, CSER, CSET, FHI, FLI, GCRI, MILA, MIRI, Open Philanthropy and PAI") wouldn't exist without Eliezer as he created the AI safety field as we know it, with everyone else downstream (like Bostrom's influential Superintelligence - Eliezer with the serial numbers filed off and an Oxford logo added).

This is certainly a positive aspect of his track record - that many people have now moved closer to his views. (It also suggests that his writing was, in expectation, a major positive contribution to the project of existential risk reduction - insofar as this writing has helped move people up and we assume this was the right direction to move.) But it doesn't imply that we should give him many more "Bayes points" than we give to the people who moved.

Suppose, for example, that someone says in 2020 that there is a 50% chance of full-scale nuclear war in the next five years. Then - due to Russia's invasion of Ukraine - most people move their credences upward (although they still remain closer to 0% than 50%). Does that imply the person giving the early warning was better-calibrated than the people who moved their estimates up? I don't think so. And I think - in this nuclear case - some analysis can be used to justify the view that the person giving the early warning was probably overconfident; they probably didn't have enough evidence or good enough arguments to actually justify a 50% credence.

It may still be the case that the person giving the early warning (in the hypothetical nuclear case) had some valuable and neglected insights, missed by others, that are well worth paying attention to and seriously reflecting on; but that's a different matter from believing they were overall well-calibrated or should be deferred to much more than the people who moved.
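One rough way to make the nuclear example concrete is to score both forecasters with a proper scoring rule. The sketch below uses the Brier score, which is only one of several ways to evaluate forecasts and is not quite the same thing as calibration. The 50% figure comes from the example; the 5% figure for the people who updated is a purely illustrative assumption standing in for "closer to 0% than 50%", and the function names are made up for this sketch.

```python
def expected_brier(forecast, true_prob):
    """Expected Brier score (lower is better) of a fixed forecast
    for a binary event that happens with probability true_prob."""
    return true_prob * (1 - forecast) ** 2 + (1 - true_prob) * forecast ** 2


def break_even_true_prob(forecast_a, forecast_b):
    """True probability at which two forecasts score equally in expectation.

    Setting the two expected Brier scores equal and solving for the true
    probability gives the midpoint of the two forecasts.
    """
    return (forecast_a + forecast_b) / 2


early_warner = 0.50   # credence stated in the example
updated_crowd = 0.05  # hypothetical credence after updating (an assumption)

threshold = break_even_true_prob(early_warner, updated_crowd)

for true_prob in (0.05, threshold, 0.50):
    print(
        f"true p={true_prob:.3f}  "
        f"early warner={expected_brier(early_warner, true_prob):.4f}  "
        f"updated crowd={expected_brier(updated_crowd, true_prob):.4f}"
    )
```

Under these assumed numbers, the early warner only comes out ahead in expectation if the true five-year probability was above 27.5%, the midpoint of the two forecasts. The fact that others moved toward the early warner does not by itself show that the true probability was that high.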

[[EDIT: Something else it might be worth emphasizing, here, is that I'm not arguing for the view "ignore Eliezer." It's closer to "don't give Eliezer's views outsized weight, compared to (e.g.) the views of the next dozen people you might be inclined to defer to, and factor in evidence that his risk estimates might have a significant upward bias to them."]]

DirectedEvolution

I'm going to break a sentence from your comment here into bits for inspection. Also, emphasis and elisions mine.

I would also say that to the extent that Yudkowsky-style research has enjoyed any popularity of late, it's because people have been looking at the old debate and realizing that
extremely simple generic architectures written down in a few dozen lines of code
with large capability differences between very similar lines of code
solving many problems in many fields and subsuming entire subfields as simply another minor variant
with large generalizing models...
powered by OOMs more compute
steadily increasing in agency
is
a short description of Yudkowsky's views on what the runup will look like
and how DL now works.

We don't have a formalism to describe what "agency" is. We do have several posts trying to define it on the Alignment Forum:

While it might not be the best choice, I'm going to use Gradations of Agency as a definition, because it's more systematic in its presentation.

"Level 3" is described as "Armed with this ability you can learn not just from your own experience, but from the experience of others—you can identify successful others and imitate them."

This doesn't seem like what any ML model does. So we can look at "Level 2," which gives the example "You start off reacting randomly to inputs, but you learn to run from red things and towards green things because when you ran towards red things you got negative reward and when you ran towards green things you got positive reward."

This seems like how all ML works.
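As a rough illustration of the "Level 2" description quoted above, here is a toy sketch of an agent that reacts randomly at first and learns from reward alone. The reward values, learning rate, step count, and all names are assumptions made up for this example; they do not come from the Gradations of Agency post.

```python
import random

ACTIONS = ["approach", "flee"]
COLOURS = ["red", "green"]

# Hypothetical rewards: approaching red is punished, approaching green is
# rewarded, and fleeing is neutral.
REWARDS = {
    ("red", "approach"): -1.0,
    ("green", "approach"): 1.0,
    ("red", "flee"): 0.0,
    ("green", "flee"): 0.0,
}


def train(steps=200, learning_rate=0.1, seed=0):
    """React randomly to inputs and nudge value estimates toward observed reward."""
    rng = random.Random(seed)
    values = {key: 0.0 for key in REWARDS}
    for _ in range(steps):
        colour = rng.choice(COLOURS)
        action = rng.choice(ACTIONS)  # purely random reactions while learning
        reward = REWARDS[(colour, action)]
        values[(colour, action)] += learning_rate * (reward - values[(colour, action)])
    return values


def best_action(values, colour):
    """Pick the action with the highest learned value for this colour."""
    return max(ACTIONS, key=lambda action: values[(colour, action)])


values = train()
policy = {colour: best_action(values, colour) for colour in COLOURS}
print(policy)
```

The agent here learns only from its own reward signal. Nothing in the loop observes or imitates another agent, which is the distinction the framework draws between Level 2 and Level 3.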

So using the "Gradations of Agency" framework, we might view individual ML systems as improving in power and generality within a single level of agency. But they don't appear to be changing levels of agency. They aren't identifying other successful ML models and imitating them.

Gradations of Agency doesn't argue whether or not there is an asymptote of power and generality within each level. Is there a limit to the power and generality possible within level 2, where all ML seems to reside?

This seems to be the crux of the issue. If DL is approaching an asymptote of power and generality below that of AGI as model and data sizes increase, then this cuts directly against Yudkowsky's predictions. On the other hand, if we think that DL can scale to AGI through model and data size increases alone, then that would be right in line with his predictions.

A 10 trillion parameter model now exists, and it's been suggested that a 100 trillion parameter model, which might even be created this year, might be roughly comparable to the power of the human brain.

It's scary to see that we're racing full-on toward a very near-term ML project that might plausibly be AGI. However, if a 100-trillion parameter ML model is not AGI, then we'd have two strikes against Yudkowsky. If neither a small coded model nor a 100-trillion parameter trained model using 2022-era ML results in AGI, then I think we have to take a hard look at his track record on predicting what technology is likely to result in AGI. We also have his "AGI well before 2050" statement from "Beware boasting" to work with, although that's not much help.

On the other hand, I think his assertiveness about the importance of AI safety and risk is appropriate even if he proves wrong about the technology by which AGI will be created.

I would critique the OP, however, for not being sufficiently precise in its critiques of Yudkowsky. As its "fairly clearcut examples," it uses 20+-year-old predictions that Yudkowsky has explicitly disavowed. Then, at the end, it complains that he hasn't "acknowledged his mixed track record." Yet in the post it links, Yudkowsky's quoted as saying:

To be a slightly better Bayesian is to spend your entire life watching others slowly update in excruciatingly predictable directions that you jumped ahead of 6 years earlier so that your remaining life could be a random epistemic walk like a sane person with self-respect.

6 years is not 20 years. It's perfectly consistent to say that a youthful, 20+-years-in-the-past version of you thought wrongly about a topic, but that you've since come to be so much better at making predictions within your field that you're 6 years ahead of Metaculus. We might wish he'd stated these predictions in public and specified what they were. But his failure to do so doesn't make him wrong; it only means he lacks evidence of his superior forecasting ability. These are distinct failure modes.

Overall, I think it's wrong to conflate "Yudkowsky was wrong 20+ years ago in his youth" with "not everyone in AI safety agrees with Yudkowsky" with "Yudkowsky hasn't made many recent, falsifiable near-term public predictions about AI timelines." I think this is a fair critique of the OP, which claims to be interrogating Yudkowsky's "track record."

But I do agree that it's wise for a non-expert to defer to a portfolio of well-chosen experts, rather than the views of the originator of the field alone. While I don't love the argument the OP used to get there, I do agree with the conclusion, which strikes me as just plain common sense.

kokotajlod

Re gradations of agency: Level 3 and level 4 seem within reach IMO. IIRC there are already some examples of neural nets being trained to watch other actors in some simulated environment and then imitate them. Also, model-based planning (i.e. level 4) is very much a thing, albeit something that human programmers seem to have to hard-code. I predict that within 5 years there will be systems which are unambiguously in level 3 and level 4, even if they aren't perfect at it (hey, we humans aren't perfect at it either).

Charles He

"Level 3" is described as "Armed with this ability you can learn not just from your own experience, but from the experience of others—you can identify successful others and imitate them." This doesn't seem like what any ML model does.

This sounds like straightforward transfer learning (TL) or fine tuning, common in 2017.

So you could just write 15 lines of python which shops between some set of pretrained weights and sees how they perform. Often TL is many times (1000x) faster than random weights and only needs a few examples.
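A minimal sketch of that "shopping" step, using a toy linear model in plain Python in place of a real deep learning framework. The candidate weight sets, the example data, and every function name are hypothetical; a real version would load actual pretrained checkpoints and would usually fine-tune the winner afterwards.

```python
def predict(weights, features):
    """Toy linear model: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def mean_squared_error(weights, examples):
    """Average squared error of the model on (features, target) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in examples) / len(examples)


def pick_best_pretrained(candidates, examples):
    """Score every candidate weight set on a few examples and return the best."""
    losses = {name: mean_squared_error(w, examples) for name, w in candidates.items()}
    best = min(losses, key=losses.get)
    return best, losses[best]


# Hypothetical "pretrained" weight sets to shop between.
candidates = {
    "model_a": [1.0, 0.0],
    "model_b": [0.5, 0.5],
    "model_c": [2.0, -1.0],
}

# A few examples from the new task (here the target is 2*x0 - x1).
examples = [([1.0, 1.0], 1.0), ([2.0, 0.0], 4.0), ([0.0, 3.0], -3.0)]

best_name, best_loss = pick_best_pretrained(candidates, examples)
print(best_name, best_loss)
```

This only selects among existing weights. The speedup over random initialisation described above would come from continuing to train from the selected weights, which this sketch leaves out.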

As speculation: it seems like in one of the agent simulations you can just have agents grab other agents weights or layers and try them out in a strategic way (when they detect an impasse or new environment or something). There is an analogy to biology where species alternate between asexual vs sexual reproduction, and trading of genetic material occurs during periods of adversity. (This is trivial, I’m sure a second year student has written a lot more.)

This doesn’t seem to fit any sort of agent framework or improve agency though. It just makes you train faster.

Charles He

Eh, there seems like a connection to interpretability.

For example, if the ML architecture “were modular+categorized or legible to the agents”, they would more quickly and effectively swap weights or models.

So there might be some way where legibility can emerge by selection pressure in an environment where say, agents had limited capacity to store weights or data, and had to constantly and extensively share weights with each other. You could imagine teams of agents surviving and proliferating by a shared architecture that let them pass this data fluently in the form of weights.

To make sure the transmission mechanism itself isn’t crazy baroque you can, like, use some sort of regularization or something.

I’m 90% sure this is a shower thought but like it can’t be worse than “The Great Reflection”.

Locke

n00b q: What's AF?

Linch

Alignment Forum (for technical discussions about AI alignment)

Evan R. Murphy

It's short for the Alignment Forum: https://www.alignmentforum.org/

Charles He

Eh.

The above seems voluminous and I believe this is the written output with the goal of defending a person.

I will reluctantly engage directly, instead of just launching into another class of arguments or something or go for a walk (I'm being blocked by moral maze sort of reasons and unseasonable weather).

You aren't taking seriously that these surveyed researchers ("AI Impacts, CHAI, CLR, CSER, CSET, FHI, FLI, GCRI, MILA, MIRI, Open Philanthropy and PAI") wouldn't exist without Eliezer as he created the AI safety field as we know it

Yeah, no, it's the exact opposite.

So one dude, who only has a degree in social studies, but seems to write well, wrote this:

https://docs.google.com/document/d/1hKZNRSLm7zubKZmfA7vsXvkIofprQLGUoW43CYXPRrk/edit#

I'm copying a screenshot to show the highlighting isn't mine:

This isn't what is written or is said, but using other experience unrelated to EA or anyone in it, I'm really sure even a median thought leader would have better convinced the person who wrote this.

So they lost 4 years of support (until Superintelligence was written)

gwern

The above seems voluminous and I believe this is the written output with the goal of defending a person.

Yes, much like the OP is voluminous and is the written output with the goal of criticizing a person. You're familiar with such writings, as you've written enough criticizing me. Your point?

Yeah, no, it's the exact opposite.

No, it's just as I said, and your Karnofsky retrospective strongly supports what I said. (I strongly encourage people to go and read it, not just to see what's before and after the part He screenshots, but because it is a good retrospective which is both informative about the history here and an interesting case study of how people change their minds and what Karnofsky has learned.)

Karnofsky started off disagreeing that there is any problem at all in 2007 when he was introduced to MIRI via EA, and merely thought there were some interesting points. Interesting, but certainly not worth sending any money to MIRI or looking for better alternative ways to invest in AI safety. These ideas kept developing, and Karnofsky kept having to engage, steadily moving from 'there is no problem' to intermediate points like 'but we can make tool AIs and not agent AIs' (a period in his evolution I remember well because I wrote criticisms of it), which he eventually abandons. You forgot to screenshot the part where Karnofsky writes that he assumed 'the experts' had lots of great arguments against AI risk and the Yudkowsky paradigm and that was why they just didn't bother talking about it, and then moved to SF and discovered 'oh no', that not only did those not exist, the experts hadn't even begun to think about it. Karnofsky also agrees with many of the points I make about Bostrom's book & intellectual pedigree ("When I'd skimmed Superintelligence (prior to its release), I'd felt that its message was very similar to - though more clearly and carefully stated than - the arguments MIRI had been making without much success." just below where you cut off). And so here we are today, where Karnofsky has not just overseen donations of millions of dollars to MIRI and AI safety NGOs or the recruitment of MIRI staffers like ex-MIRI CEO Muehlhauser, but it remains a major area for OpenPhil (and philanthropies imitating it like FTX). It all leads back to Eliezer. As Karnofsky concludes:

One of the biggest changes is the one discussed above, regarding potential risks from advanced AI. I went from seeing this as a strange obsession of the community to a case of genuine early insight and impact. I felt the community had identified a potentially enormously important cause and played a major role in this cause's coming to be taken more seriously. This development became - in my view - a genuine and major candidate for a "hit", and an example of an idea initially seeming "wacky" and later coming to seem prescient.

Of course, it is far from a settled case: many questions remain about whether this cause is indeed important and whether today's preparations will look worthwhile in retrospect. But my estimate of the cause's likely importance - and, I believe, conventional wisdom among AI researchers in academia and industry - has changed noticeably.

That is, Karnofsky explicitly attributes the widespread changes I am describing to the causal impact of the AI risk community around MIRI & Yudkowsky. He doesn't say it happened regardless or despite them, or that it was already fairly common and unoriginal, or that it was reinvented elsewhere, or that Yudkowsky delayed it on net.

I'm really sure even a median thought leader would have better convinced the person who wrote this.

Hard to be convincing when you don't exist.

bmg

No, it's just as I said, and your Karnofsky retrospective strongly supports what I said.

I also agree that Karnofsky's retrospective supports Gwern's analysis, rather than doing the opposite.

(I just disagree about how strongly it counts in favor of deference to Yudkowsky. For example, I don't think this case implies we should currently defer more to Yudkowsky's risk estimates than we do to Karnofsky's.)

Charles He

Ugh. Y'all just made me get into "EA rhetoric" mode:

I also agree that Karnofsky's retrospective supports Gwern's analysis, rather than doing the opposite.

What?

No. Not only is this not true but this is indulging in a trivial rhetorical maneuver.

My comment said that the counterfactual would be better without the involvement of the person mentioned in the OP. I used the retrospective as evidence.

The retrospective includes at least two points for why the author changed their mind:

1. The book Superintelligence, which they explicitly said was the biggest event
2. The author moved to SF and learned about DL, and was informed by speaking to non-rationalist AI researchers, and then decided that LessWrong and MIRI were right.

In response to this, Gwern states the point #2, and asserts that this is causal evidence in favor of the person mentioned in the OP being useful.

Why? How?

Notice that #2 above doesn't at all rule out that the founders or culture was repellent. In fact it seems like a lavish, and unlikely level amount of involvement.

bmg

What?

I interpreted Gwern as mostly highlighting that people have updated towards Yudkowsky's views - and using this as evidence in favor of the view we should defer a decent amount to Yudkowsky. I think that was a reasonable move.

There is also a causal question here ('Has Yudkowsky on-net increased levels of concern about AI risk relative to where they would otherwise be?'), but I didn't take the causal question to be central to the point Gwern was making. Although now I'm less sure.

I don't personally have strong views on the causal question - I haven't thought through the counterfactual.

Charles He

(I strongly encourage people to go and read it, not just to see what's before and after the part He screenshots, but because it is a good retrospective which is both informative about the history here and an interesting case study of how people change their minds and what Karnofsky has learned.)

By the way, I didn't screenshot the pieces that fit my narrative—Gwern's assertion of bad faith is another device being used.

Yes, much like the OP is voluminous and is the written output with the goal of criticizing a person. You're familiar with such writings, as you've written enough criticizing me. Your point?

Gwern also digs up a previous argument. Not only is that issue entirely unrelated, it's sort of exactly the opposite evidence he wants to show: Gwern appeared to borderline or threaten to dox someone who spoke out against him.

I commented. However I do not know anyone involved, such as who Gwern was, but only acting on the content and behaviour I saw, which was outright abusive.

There is no expected benefit to doing this. It's literally the most principled thing to act in this way and I would do it again.

The consequences of that incident, the fact that this person with this behavior and content had this much status, was a large update for me.

More subtly and perniciously, Gwern's adverse behavior in this comment chain and the incident mentioned above, is calibrated to the level of "EA rhetoric". Digs like his above can sail through, with the tailwind of support of a subset of this community, a subset that values authority over content and Truth, to a degree much more than it understands.

On the other hand, in contrast, an outsider, who already has to dance through all the rhetorical devices and elliptical references, has to make a high effort, unemotional comment to try to make a point. Even or especially if they manage to do this, they can expect to be hit with a wall of text with various hostilities.

Like, this is awful. This isn't just bad but it's borderline abusive.

It's wild that this is the level of discourse here.

Because of the amount of reputation, money and ingroupness, this is probably one of the most extreme forms of tribalism that exists.

Do you know how much has been lost?

technicalities

Charles, consider going for that walk now if you're able to. (Maybe I'm missing it, but the rhetorical moves in this thread seem equally bad, and not very bad at that.)

Charles He

You are right, I don't think my comments are helping.

Charles He

Like, how can so many standard, stale patterns of internet forum authority, devices and rhetoric be rewarded and replicate in a community explicitly addressing topics like tribalism and "evaporative cooling"?

Lizka

Moderator comment

The moderators feel that some comments in this thread break Forum norms and are discussing what to do about it.

Lizka

Moderator comment

Here are some things we think break Forum norms:

Rude/hostile language and condescension, especially from Charles He
Gwern brings in an external dispute — a thread in which Charles accuses them of doxing an anonymous critic on LessWrong. We think that bringing in external disputes interferes with good discourse; it moves the thread away from discussion of the topic in question, and more towards discussions of individual users’ characters
The conversation about the external dispute gets increasingly unproductive

The mentioned thread about doxing also breaks Forum norms in multiple ways. We’ve listed them on that thread.

The moderators are still considering a further response. We’ll also be discussing with both Gwern and Charles privately.

Lizka

Moderator comment

The moderation team is issuing Charles a 3-month ban.

RyanCarey

I honestly don't see such a problem with Gwern calling out Charles' flimsy argument and hypocrisy using an example, be it a part of an external dispute.

On the other hand, I think Charles' uniformly low comment quality should have had him (temporarily) banned long ago (sorry Charles). The material is generally poorly organised, poorly researched, often intentionally provocative, sometimes interspersed with irrelevant images, and high in volume. One gets the impression of an author who holds their reader in contempt.

[anonymous]

I don't necessarily disagree with the assessment of a temporary ban for "unnecessary rudeness or offensiveness", or "other behaviour that interferes with good discourse", but I disagree that Charles' comment quality is "uniformly" low or that a ban might be merited primarily because of high comment volume and too low quality. There are some real insights and contributions sprinkled in, in my opinion.

For me the unnecessary rudeness or offensiveness and other behavior interfering with discourse comes from things like comments that are technically replies to a particular person but seem like they're mostly intended to win the argument in front of unknown readers, and containing things like rudeness, paranoia, and condescension towards the person they're replying to. I think the doxing accusation, which if I remember correctly actually doxxed the victim much more than Gwern's comment, is part of a similar pattern of engaging poorly with a particular person, partly through an incorrect assessment that the benefits to bystanders will outweigh the costs. I think this sort of behavior stifles conversation and good will.

I'm not sure a ban is a great solution though. There might be other, less blunt ways of tackling this situation.

What I would really like to see is a (much) higher lower limit of comment quality from Charles i.e. moving the bar for tolerating rudeness and bad behavior in a comment much higher even though it could be potentially justified in terms of benefits to bystanders or readers.

Charles He

This is useful and thoughtful. I will read and will try to update on this (in general life, if not the forum?) Please continue as you wish!

I want to notify you and others, that I don't expect such discussion to materially affect any resulting moderator action, see this comment describing my views on my ban.

Below that comment, I wrote some general thoughts on EA. It would be great if people considered or debated the ideas there.

Charles He

I don’t disagree with your judgement of banning but I point out there’s no banning for quality—you must be very frustrated with the content.

To get a sense of this, for the specific issue in the dispute, where I suggested the person or institution in question caused a 4-year delay in funding, are you saying it’s an objectively bad read, even limited to just the actual document cited? I don’t see how that is.

Or is this wrong, but requires additional context or knowledge?

RyanCarey

Re the banning idea, I think you could fall afoul of "unnecessary rudeness or offensiveness", or "other behaviour that interferes with good discourse" (too much volume, too low quality). But I'm not the moderator here.

My point is that when you say that Gwern produces verbose content about a person, it seems fine - indeed quite appropriate - for him to point out that you do too. So it seems a bit rich for that to be a point of concern for moderators.

I'm not taking any stance on the doxxing dispute itself, funding delays, and so on.

Charles He

I agree with your first paragraph for sure.

bmg

A general reflection: I wonder if one at least minor contributing factor to disagreement, around whether this post is worthwhile, is different understandings about who the relevant audience is.

I mostly have in mind people who have read and engaged a little bit with AI risk debates, but not yet in a very deep way, and would overall be disinclined to form strong independent views on the basis of (e.g.) simply reading Yudkowsky's and Christiano's most recent posts. I think the info I've included in this post could be pretty relevant to these people, since in practice they're often going to rely a lot -- consciously or unconsciously; directly or indirectly -- on cues about how much weight to give different prominent figures' views. I also think that the majority of members of the existential risk community are in this reference class.

I think the info in this post isn't nearly as relevant to people who've consumed and reflected on the relevant debates very deeply. The more you've engaged with and reflected on an issue, the less you should be inclined to defer -- and therefore the less relevant track records become.

(The limited target audience might be something I don't do a good enough job communicating in the post.)

kokotajlod

I think that insofar as people are deferring on matters of AGI risk etc., Yudkowsky is in the top 10 people in the world to defer to based on his track record, and arguably top 1. Nobody who has been talking about these topics for 20+ years has a similarly good track record. If you restrict attention to the last 10 years, then Bostrom does, as does Carl Shulman, and maybe some other people too (Gwern?); and if you restrict attention to the last 5 years, then arguably about a dozen people have a somewhat better track record than him.

(To my knowledge. I think I'm probably missing a handful of people who I don't know as much about because their writings aren't as prominent in the stuff I've read, sorry!)

He's like Szilard. Szilard wasn't right about everything (e.g. he predicted there would be a war and the Nazis would win) but he was right about a bunch of things including that there would be a bomb, that this put all of humanity in danger, etc. and importantly he was the first to do so by several years.

I think if I were to write a post cautioning people against deferring to Yudkowsky, I wouldn't talk about his excellent track record but rather about his arrogance, inability to clearly explain his views and argue for them (at least on some important topics, he's clear on others), seeming bias towards pessimism, ridiculously high (and therefore seemingly overconfident) credences in things like p(doom), etc. These are the reasons I would reach for (and do reach for) when arguing against deferring to Yudkowsky.

[ETA: I wish to reemphasize, but more strongly, that Yudkowsky seems pretty overconfident not just now but historically. Anyone deferring to him should keep this in mind; maybe directly update towards his credences but don't adopt his credences. E.g. think "we're probably doomed" but not "99% chance of doom". Also, Yudkowsky doesn't seem to be listening to others and understanding their positions well. So his criticisms of other views should be listened to but not deferred to, IMO.]

TAG

"Nobody who has been talking about these topics for 20+ years has a similarly good track record."

Really? We know EY made a bunch of mispredictions: "A certain teenaged futurist, who, for example, said in 1999, 'The most realistic estimate for a seed AI transcendence is 2020; nanowar, before 2015.'" What are his good predictions? I can't see a single example in this thread.

kokotajlod

Ironically, one of the two predictions you quote as examples of bad predictions is in fact an example of a good prediction: "The most realistic estimate for a seed AI transcendence is 2020."

Currently it seems that AGI/superintelligence/singularity/etc. will happen sometime in the 2020s. Yudkowsky's median estimate in 1999 was apparently 2020, so he probably had something like 30% of his probability mass in the 2020s, and maybe 15% of it in the 2025-2030 period when IMO it's most likely to happen.

Now let's compare to what other people would have been saying at the time. They would almost all have been saying 0%, and then maybe the smarter and more rational ones would have been saying things like 1%, for the 2025-2030 period.
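As a rough sketch of that comparison: the snippet below takes the 15% and 1% figures above at face value (both are guesses, not measured data) and assumes, purely for illustration, that the event does end up falling in 2025-2030. The function and variable names are hypothetical.

```python
import math

def log_score(p_assigned: float) -> float:
    """Logarithmic score in bits for the probability given to the outcome that occurred."""
    return math.log2(p_assigned)

# Guessed figures from the comment above, not measured data.
p_yudkowsky_1999 = 0.15  # guessed probability mass on 2025-2030
p_typical_1999 = 0.01    # guessed mass for a "smarter and more rational" contemporary

# Assumption for illustration only: the event really does land in 2025-2030.
likelihood_ratio = p_yudkowsky_1999 / p_typical_1999
bits_gained = log_score(p_yudkowsky_1999) - log_score(p_typical_1999)

print(f"likelihood ratio: {likelihood_ratio:.1f}")
print(f"log-score advantage: {bits_gained:.2f} bits")
```

The comparison cuts the other way if the event does not land in that window: a forecaster with 85% on "not 2025-2030" then scores somewhat worse than one with 99% on it.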

To put it in nonquantitative terms, almost everyone else in 1999 would have been saying "AGI? Singularity? That's not a thing, don't be ridiculous." The smarter and more rational ones would have been saying "OK it might happen eventually but it's nowhere in sight, it's silly to start thinking about it now." Yudkowsky said "It's about 21 years away, give or take; we should start thinking about it now." Now with the benefit of 24 years of hindsight, Yudkowsky was a lot closer to the truth than all those other people.

Also, you didn't reply to my claim. Who else has been talking about AGI etc. for 20+ years and has a similarly good track record? Which of them managed to only make correct predictions when they were teenagers? Certainly not Kurzweil.

splinter

The negative reactions to this post are disheartening. I have a degree of affectionate fondness for the parodic levels of overthinking that characterize the EA community, but here you really see the downsides of that overthinking concretely.

Of course it is meaningful that Eliezer Yudkowsky has made a bunch of terrible predictions in the past that closely echo predictions he continues to make in slightly different form today. Of course it is relevant that he has neither owned up to those earlier terrible predictions nor explained how he has learned from those mistakes. Of course we should be more skeptical of similar claims he makes in the future. Of course we should pay more attention to broader consensus or aggregate predictions in the field than to outlier predictions.

This is sensible advice in any complex domain, and saying that we should "evaluate every argument in isolation on its merits" is a type of special pleading or sophistry. Sometimes (often!) the obvious conclusions are the correct ones: even extraordinarily clever people are often wrong; extreme claims that other knowledgeable experts disagree with are often wrong; and people who make extreme claims that prove to be wrong should be strongly discounted when they make further extreme claims.

None of this is to suggest in any way that Yudkowsky should be ignored, or even is necessarily wrong. But if you yourself are not an expert in AI (as most of us aren't), his past bad predictions are highly relevant indicators when assessing his current predictions.

RobBensinger

Of course it is meaningful that Eliezer Yudkowsky has made a bunch of terrible predictions in the past that closely echo predictions he continues to make in slightly different form today.

I assume you're mainly talking about young-Eliezer worrying about near-term risk from molecular nanotechnology, and current-Eliezer worrying about near-term risk from AGI?

I think age-17 Eliezer was correct to think widespread access to nanotech would be extremely dangerous. See my comment. If you or Ben disagree, why do you disagree?

Age-20 Eliezer was obviously wrong about the timing for nanotech, and this is obviously Bayesian evidence for 'Eliezer may have overly-aggressive tech timelines in general'.

I don't think this is generally true -- e.g., if you took a survey of EAs worried about AI risk in 2010 or in 2014, I suspect Eliezer would have longer AI timelines than others at the time. (E.g., he expected it to take longer to solve Go than Carl Shulman did.) When I joined MIRI, the standard way we summarized MIRI's view was roughly 'We think AI risk is high, but not because we think AGI is imminent; rather, our worry is that alignment is likely to take a long time, and that civilization may need to lay groundwork decades in advance in order to have a realistic chance of building aligned AGI.'

But nanotech is a totally fair data point regardless.

Of course it is relevant that he has neither owned up to those earlier terrible predictions nor explained how he has learned from those mistakes.

Eliezer wrote a 20,000-word essay series on his update, and the mistakes he thought he was making. Essay titles include "My Childhood Death Spiral", "The Sheer Folly of Callow Youth", "Fighting a Rearguard Action Against the Truth", and "The Magnitude of His Own Folly".

He also talks a lot about how he's updated and revised his heuristics and world-models in other parts of the Sequences. (E.g., he writes that he underestimated elite competence when he was younger.)

What specific cognitive error do you want him to write about, that he hasn't already written on?

This is sensible advice in any complex domain, and saying that we should "evaluate every argument in isolation on its merits" is a type of special pleading or sophistry.

I don't think the argument I'm making (or most others are making) is 'don't update on people's past mistakes' or 'never do deference'. Rather, a lot of the people discussing this matter within EA (Wei Dai, Gwern Branwen, Richard Ngo, Rohin Shah, Carl Shulman, Nate Soares, Ajeya Cotra, etc.) are the world's leading experts in this area, and a lot of the world's frontier progress on this topic is happening on Internet fora like the EA Forum and LessWrong. It makes sense for domain specialists to put much more focus into evaluating arguments on the merits; object-level conversations like these are how the intellectual advances occur that can then be reflected in aggregators like Metaculus.

Metaculus and prediction markets will be less accurate if frontier researchers replace object-level discussion with debates about who to defer to, in the same way that stock markets would be less efficient if everyone overestimated the market's efficiency and put minimal effort into beating the market.

Insofar as we're trying to grow the field, it also makes sense to encourage more EAs to try to think about these topics and build their own inside-view models; and this has the added benefit of reducing the risk of deference cascades.

(I also think there are other reasons it would be healthy for EA to spend a lot more time on inside-view building on topics like AI, normative ethics, and global poverty, as I briefly said here. But it's possible to practice model-building and then decide at the end of the day, nonetheless, that you don't put much weight on the domain-specific inside views you've built.)

extreme claims

When people use words like "extreme" here, I often get the sense that they aren't crisply separating "extreme" in the sense of "weird-sounding" from "extreme" in the sense of "low prior probability". I think Eliezer's views are weird-sounding, not unlikely on priors.

E.g., why should we expect generally intelligent machines to be low-impact if built, or to never be built?

The idea that a post-AGI world looks mostly the same as a pre-AGI world might sound more normal and unsurprising to an early-21st-century well-off Anglophone intellectual, but I think this is just an error. It's a clear case of the availability heuristic misfiring, not a prior anyone should endorse upon reflection.

I view the Most Important Century series as an attempt to push back against many versions of this conflation.

Epistemically, I view Paul's model as much more "extreme" than Eliezer's because I think it's much more conjunctive. I obviously share the view that soft takeoff sounds more normal in some respects, but I don't think this should inform our prior much. I'd guess we should start with a prior that assigns lots of weight to soft takeoff as well as to hard takeoff, and then mostly arrive at a conclusion based on the specific arguments for each view.

Rohin Shah

See Rohin Shah’s (I think correct) objection to the use of “coherence arguments” to support AI risk concerns.

Fwiw I'd say this somewhat differently.

I object to a specific way in which one could use coherence arguments to support AI risk: namely, "AI is intelligent --> AI satisfies coherence arguments better than we do --> AI looks as though it is maximizing a utility function from our perspective --> Convergent instrumental subgoals --> Doom".

As far as I know, anyone who has spent ~an hour reading my post and thinking about it basically agrees with that particular narrow point.

This doesn't rule out other ways that one could use coherence arguments to support AI risk, such as "coherence arguments show that achieving stuff can typically be factored into beliefs about the world and goals that you want to achieve; since we'll be building AIs to achieve stuff, it seems likely they'll work by having separated beliefs and goals; if they have bad goals, then we die because of convergent instrumental subgoals". I'm more sympathetic to this argument (though not nearly as much as Eliezer appears to be).

I agree that the intro talk that you link to would likely cause people to think of the first pathway (which I object to) rather than the second pathway. Similar rhetoric caused me to believe the first pathway for a while.

But it also looks like the sort of talk you might give if you were thinking about the second pathway, and then compressed it, losing a bunch of nuance, and didn't notice that people might then instead think of the first pathway.

(It's not clear whether any of this changes the upshot of your post. I am mostly trying to preserve nuance so I get fewer people saying "I thought you thought utility functions are fake" which is definitely not what I said or believed.)

David Mathers🔸

Several thoughts:

I'm not sure I can argue for this, but it feels weird and off-putting to me that all this energy is being spent discussing how good a track-record one guy has, especially one guy with a very charismatic and assertive writing-style, and a history of attempting to provide very general guidance for how to think across all topics (though I guess any philosophical theory of rationality does the last thing.) It just feels like a bad sign to me, though that could just be for dubious social reasons.
The question of how much to defer to E.Y. isn't answered just by things like "he has possibly the best track record in the world on this issue." If he's out of step with other experts, and by a long way, we need to have reason to think he outperforms the aggregate of experts before we weight him more than the aggregate. And it's entirely normal, I'd have thought, for the aggregate to significantly outperform the single best individual. (I'm not making as strong a claim as that the best individual outperforming the aggregate is super-unusual and unlikely.) Of course if you think he's nearly as good as the aggregate, then you should still move a decent amount in his direction. But even that is quite a strong claim that goes beyond him being in the handful of individuals with the best track record.
Some of the people criticizing this post on the grounds that E.Y. actually has a great track record keep citing "he's been right that there is significant X-risk from A.I., when almost everyone else missed that". I have doubts about this, for a couple of reasons.

Firstly, this isn't actually a prediction that has been resolved as correct in any kind of unambiguous way. Sure, a lot of very smart people in the EA community now agree. (And I agree the risk is worth assigning EA resources to as well, to be clear.) But in my view we should be wary of substituting the judgment of the community that a prediction looks rational for a track record of predictions that have actually resolved successfully. (I think the latter is better evidence than the former in most cases.)

Secondly, I feel like E.Y. being right about the importance of A.I. risk is actually not very surprising, conditional on the key assumption about E.Y. that Ben is relying on in telling people to be cautious about the probabilities and timelines that E.Y. gives for A.I. doom. And even granting that E.Y. was right about this, if Ben's assumption is correct, it's still a good reason to doubt E.Y.'s p(doom). Suppose, as is being alleged here, someone has a general bias, for whatever reasons, towards the view that doom from some technological source or other is likely and imminent. Does that make it especially surprising that that individual finds an important source of doom most people have missed? Not especially, that I can see: sure, they will perhaps be less rational on the topic, but a) a bias towards p(doom) being high doesn't necessarily imply being poor at ranking sources of doom-risk by relative importance, and b) there is probably a counter-effect where bias towards doom makes you more likely to find underrated doom-risks, because you spend more time looking. Of course, finding a doom-risk larger than most others that approx. everyone had missed would still be a very impressive achievement. But the question Ben's addressing isn't "is E.Y. a smart person with insights about A.I. risk?" but rather "how much should we update on E.Y.'s views about p(near-term A.I. doom)?" Suppose significant bias towards doom is genuinely evidenced by E.Y.'s earlier nanotech prediction (which to be fair is only 1 data point) and a good record at identifying neglected important doom sources is only weak evidence that E.Y. lacks the bias. Then we'd be right to only update a little towards doom, even if E.Y.'s record on A.I. risk was impressive in some ways.

Charles He

Some things that aren't said in this post or any comments in here yet:

The issue isn't at all about 15-20 year old content, it's about very recent content and events (mostly publicly visible)
In addition to this recent, publicly visible content, there are several latent issues or effects that directly affect progress in the relevant cause area
- To calibrate, this could be slowing things down by 10 times or more, in what is supposed to be the most important cause area in EA and whose effects are supposed to happen very soon
Certain comments here do not at all contain all of the relevant content, because laying them out risks damaging an entire cause area.
- Certain commenters may feel personally restricted from doing so for a variety of complex reasons ("moral mazes"), and the content they are presenting is a "second best" option
- The above interacts poorly with the customs and practices around discourse and criticism
  - These in totality have become a sort of odious and out of space specter, invisible to people who spend a lot of time here

David Mathers🔸

For all I know, you may be right or not (insofar as I follow what's being insinuated), but whilst I freely admit that I, like anyone who wants to work in EA, have self-interested incentives to not be too critical of Eliezer, there is no specific secret "latent issue" that I personally am aware of and consciously avoiding talking about. Honest.

Charles He

I am grateful for your considerate comment and your reply. I had no belief or thought about dishonesty.

Maybe I should have added^[1]:

"this is for onlookers"
"this is trying to rationalize/explain why this post exists, that has 234 karma and 156 votes, yet only talks about high school stuff."

I posted my comment because this situation is hurting onlookers and producing bycatch?

I don't really know what to do here (as a communications thing) and I have incentives not to be involved?

^[1] But this is sort of getting into the elliptical rhetoric and self-referential stuff that is sort of related to the problem in the first place.

Guy Raveh

I think the effect should depend on your existing view. If you've always engaged directly with Yudkowsky's arguments and chose the ones that convinced you, there's nothing to learn. If you thought he was a unique genius and always assumed you weren't convinced of things because he understood things you didn't know about, and believed him anyway, maybe it's time to dial it back. If you'd always assumed he's wrong about literally everything, it should be telling for you that OP had to go 15 years back for good examples.

Writing this comment actually helped me understand how to respond to the OP myself.

David Mathers🔸

'If you'd always assumed he's wrong about literally everything, it should be telling for you that OP had to go 15 years back to get good examples.' How strong this evidence is also depends on whether he has made many resolvable predictions in the last 15 years, right? If he hasn't, it's not very telling. To be clear, I genuinely don't know if he has or hasn't.

Guy Raveh

Sounds reasonable. Though predictions aren't the only thing one can be demonstrably wrong about.

Guy Raveh

Some off-topic comments, not specific to you or Yudkowsky:

the belief was so analogous to his current belief about AI... since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community

It seems to me (but I could be mistaken) like I see the phrase "has thought a lot about X" fairly often in EA contexts, where it is taken to imply being very well-informed about X. I don't think this is good reasoning. Thinking about something is probably required for understanding it well, but is certainly not enough.
When an idea or theory is very fringe, there's a strong selection effect for people in the relevant intellectual community. This means even their average views are sometimes not good evidence for something. For example, to answer a question about the probability of doom from AI in this century, are alignment researchers a good reference class? They all naturally believe AI is an existential risk to begin with. I'm not sure I have the solution, since "AI researchers in general" isn't a good reference class either: many might not have given any thought to whether AI is dangerous.

[anonymous]

Strong +1 on this. It in fact seems like the more someone thinks about something and takes a public position on it with strong confidence, the more incentive they have to stick to that position. That's why making explicit forecasts and creating a forecasting track record is so important in countering this tendency. If arguments cannot be resolved by events happening in the real world, then there is not much incentive for one to change one's mind, especially if it's about something speculative and abstract that one can generate arguments for ad infinitum by engaging in more speculation.

On your example: the question of AI existential risk this century seems downstream of the question of the probability of AGI this century, and one can find some potential reference classes for that: AI safety research, general AI research, computer science research, scientific research, technological innovation etc. None of these are perfect reference classes, but they are at least something to work with. Contingent on AGI being possible this century, one can form an opinion on how high the probability of doom would need to be to warrant concern.

iporphyry

I like that you admit that your examples are cherry-picked. But I'm actually curious what a non-cherry-picked track record would show. Can people point to Yudkowsky's successes? What did he predict better than other people? What project did MIRI generate that either solved clearly interesting technical problems or got significant publicity in academic/AI circles outside of rationalism/EA? Maybe instead of a comment here this should be a short-form question on the forum.

Matthew_Barnett

I like that you admit that your examples are cherry-picked. But I'm actually curious what a non-cherry-picked track record would show. Can people point to Yudkowsky's successes?

While he's not single-handedly responsible, he led the movement to take AI risk seriously at a time when approximately no one was talking about it, which has now attracted the interest of top academics. This isn't a complete track record, but it's still a very important data-point. It's a bit like if he were the first person to say that we should take nuclear war seriously, and then five years later people are starting to build nuclear bombs and academics realize that nuclear war is very plausible.

bmg

While he's not single-handedly responsible, he led the movement to take AI risk seriously at a time when approximately no one was talking about it, which has now attracted the interest of top academics. This isn't a complete track record, but it's still a very important data-point.

I definitely do agree with that!

It's possible I should have emphasized the significance of it more in the post, rather than moving on after just a quick mention at the top.

If it's of interest: I say a little more about how I think about this, in response to Gwern's comment below. (To avoid thread-duplicating, people might want to respond there rather than here if they have follow-on thoughts on this point.) My further comment is:

This is certainly a positive aspect of his track-record - that many people have now moved closer to his views. (It also suggests that his writing was, in expectation, a major positive contribution to the project of existential risk reduction - insofar as this writing has helped move people up and we assume this was the right direction to move.) But it doesn't imply that we should give him many more "Bayes points" than we give to the people who moved.

Suppose, for example, that someone says in 2020 that there is a 50% chance of full-scale nuclear war in the next five years. Then - due to Russia's invasion of Ukraine - most people move their credences upward (although they still remain closer to 0% than 50%). Does that imply the person giving the early warning was better-calibrated than the people who moved their estimates up? I don't think so. And I think - in this nuclear case - some analysis can be used to justify the view that the person giving the early warning was probably overconfident; they probably didn't have enough evidence or good enough arguments to actually justify a 50% credence.

It may still be the case that the person giving the early warning (in the hypothetical nuclear case) had some valuable and neglected insights, missed by others, that are well worth paying attention to and seriously reflecting on; but that's a different matter from believing they were overall well-calibrated or should be deferred to much more than the people who moved.
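A minimal sketch of the point, using expected Brier scores. Only the 50% figure comes from the hypothetical above; the 1% and 5% credences for everyone else, the candidate "true" probabilities, and the function names are all assumptions chosen for illustration.

```python
def brier(forecast: float, outcome: int) -> float:
    """Brier score for a single binary forecast: lower is better."""
    return (forecast - outcome) ** 2

def expected_brier(forecast: float, true_prob: float) -> float:
    """Expected Brier score if the event occurs with probability true_prob."""
    return true_prob * brier(forecast, 1) + (1 - true_prob) * brier(forecast, 0)

early_warner = 0.50   # from the hypothetical above
others_before = 0.01  # assumed credence before the invasion
others_after = 0.05   # assumed: moved up, but still closer to 0% than 50%

# Compare the three forecasts under several assumed "true" probabilities.
for true_prob in (0.02, 0.05, 0.10, 0.30):
    scores = {
        "early warner": expected_brier(early_warner, true_prob),
        "others (before)": expected_brier(others_before, true_prob),
        "others (after)": expected_brier(others_after, true_prob),
    }
    best = min(scores, key=scores.get)
    print(f"true p = {true_prob:.2f}: lowest expected Brier score = {best}")
```

Under these assumed numbers, the 50% forecast only comes out ahead in expectation if the true probability is above roughly 27.5%. Other people moving their credences up from 1% to 5% does not, on its own, suggest the true probability was that high.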

[[EDIT: Something else it might be worth emphasizing, here, is that I'm not arguing for the view "ignore Eliezer." It's closer to "don't give Eliezer's views outsized weight, compared to (e.g.) the views of the next dozen people you might be inclined to defer to, and factor in evidence that his risk estimates might have a significant upward bias to them."]]

RobBensinger

I work at MIRI, but as usual, this comment is me speaking for myself, and I haven’t heard from Eliezer or anyone else on whether they'd agree with the following.

My general thoughts:

The primary things I like about this post are that (1) it focuses on specific points of disagreement, encouraging us to then hash out a bunch of object-level questions; and (2) it might help wake some people from their dream if they hero-worship Eliezer, or if they generally think that leaders in this space can do no wrong.
- By "hero-worshipping" I mean a cognitive algorithm, not a set of empirical conclusions. I'm generally opposed to faux egalitarianism and the Modest-Epistemology reasoning discussed in Inadequate Equilibria: if your generalized anti-hero-worship defenses force the conclusion that there just aren't big gaps in skills or knowledge (or that skills and knowledge always correspond to mainstream prestige and authority), then your defenses are ruling out reality a priori. In saying "people need to hero-worship Eliezer less", I'm opposing a certain kind of reasoning process and mindset, not a specific factual belief like "Eliezer is the clearest thinker about AI risk".
  
  In a sense, I want to promote the idea that the latter is a boring claim, to be evaluated like any other claim about the world; flinching away from it (e.g., because Eliezer is weird and says sci-fi-sounding stuff) and flinching toward it (e.g., because you have a bunch of your identity invested in the idea that the Sequences are awesome and rationalists are great) are both errors of process.
The main thing I dislike about this post is that it introduces a bunch of not-obviously-false Eliezer-claims — claims that EAs either widely disagree about, or haven’t discussed — and then dives straight into ‘therefore Eliezer has a bad track record'.

E.g., I disagree that molecular nanotech isn't a big deal (if that's a claim you're making?), that Robin better predicted deep learning than Eliezer did, and that your counter-arguments against Eliezer and Bostrom are generally strong. Certainly I don't think these points have been well-established enough that it makes sense to cite them in the mode 'look at these self-evident ways Yudkowsky got stuff wrong; let us proceed straight to psychoanalysis, without dwelling on the case for why I think he's wrong about this stuff'. At this stage of the debate on those topics, it would be more appropriate to talk in terms of cruxes like 'I think the history of tech shows it's ~always continuous in technological change and impact', so it's clear why you disagree with Eliezer in the first place.
I generally think that EA’s core bottlenecks right now are related to ‘willingness to be candid and weird enough to make intellectual progress (especially on AI alignment), and to quickly converge on our models of the world’.

My own models suggest to me that EA’s path to impact is almost entirely as a research community and a community that helps produce other research communities, rather than via ‘changing the culture of the world at large’ or going into politics or what-have-you. In that respect, rigor and skepticism are good, but singling out Eliezer because he’s unusually weird and candid is bad, because it discourages others from expressing weird/novel/minority views and from blurting out their true thought processes. (I recognize that this isn’t the only reason you’re singling Eliezer out, but it’s obviously a contributing factor.)
I am a big fan of Ben’s follow-up comment. Especially the part where he outlines the thought process that led to him generating the post’s contents. I think this is an absolutely wonderful thing to include in a variety of posts, or to add in the comment sections for a lot of posts.

· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·

Some specific thoughts on Ben's follow-up comment:

1. I agree with Ben on this: “If a lot of people in the community believe AI is probably going to kill everyone soon, then (if they’re wrong) this can have really important negative effects”.

I think they’re not wrong, and I think the benefits of discussing this openly strongly outweigh the costs. But the negative effects are no less real for that.

(Separately, I think the “death with dignity” post was a suboptimal way to introduce various people to the view that p(doom) is very high. I’m much more confident that we should discuss this at all, than that Eliezer or I or others have been discussing this optimally.)

2. “Directly and indirectly, deference to Yudkowsky has a significant influence on a lot of people’s views”

Agreed.

Roughly speaking, my own view is:

- EAs currently do a very high amount of deferring to others (both within EA and outside of EA) on topics like AI, global development, moral philosophy, economics, cause prioritization, organizational norms, personal career development, etc.
- On the whole, EAs currently do a low amount of model-building and developing their own inside views.
- EAs should switch to doing a medium amount of deference on topics like the ones I listed, and a very high amount of personal model-building.
  - Note that model-building can be useful even if you think all your conclusions will be strictly worse than the models of some other person you've identified. I'm pretty radical on this topic, and think that nearly all EAs should spend a nontrivial fraction of their time developing their own inside-view models of EA-relevant stuff, in spite of the obvious reasons (like gains from specialization) that this would normally not make sense.
    - Happy to say more about my views here, and I'll probably write a post explaining why I think this.
  - I think the Alignment Research Field Guide, in spite of nominally being about “alignment”, is the best current intro resource for “how should I go about developing my own models on EA stuff?” A lot of the core advice is important and generalizes extremely well, IMO.
- Insofar as EAs should do deference at all, Eliezer is in the top tier of people it makes sense to defer to.
- But I’d guess the current amount of Eliezer-deference is way too high, because the current amount of deference overall is way too high. Eliezer should get a relatively high fraction of the deference pie IMO, but the overall pie should shrink a lot.

3. I also agree with Ben on “The track records of influential intellectuals (including Yudkowsky) should be publicly discussed.”

I don’t like the execution of the OP, but I strongly disagree with the people in the comments who have said “let us never publicly talk about individuals’ epistemic track records at all”—both because I think ‘how good is EY’s reasoning’ is a genuine crux for lots of people, and because I think this is a very common topic people think about, both in more pro-Eliezer and in more anti-Eliezer camps.

Discussing cruxes is obviously good, but even if this weren’t a crux for anyone, I’m strongly in favor of EAs doing a lot more “sharing their actual thoughts out loud”, including the more awkward and potentially inflammatory ones. (I’m happy to say more about why I think this.)

I do think it’s worth talking about what the best way is to discuss individuals' epistemic track records, without making EA feel hostile/unpleasant/scary. I think EAs are currently way too timid (on average) about sharing their thoughts, so I worry about any big norm shifts that might make that problem even worse.

But Eliezer’s views are influential enough (and cover a topic, AGI, that is complicated and difficult enough to reason about) that this just seems like an important topic to me (similar to ‘how much should we defer to Paul?’, etc.). I’d rather see crappy discussion of this in the community than zero discussion whatsoever.

Some specific thoughts on claims in the OP:

such that all we can hope to do is “die with dignity.”

This is in large part Eliezer's fault for picking such a bad post title, but I should still note that this is a very misleading summary. "Dying with dignity" often refers to giving up on taking any actions to keep yourself alive.

Eliezer's version of "dying with dignity" is exactly the opposite: he's advocating for doing whatever it takes to maximize the probability that humanity survives.

It's true that he thinks we'll probably fail (and I agree), and he thinks we should emotionally reconcile ourselves with that fact (because he thinks this emotional reconciliation will itself increase our probability of surviving!!), but he doesn't advocate giving up.

Quoting the post:

"Q1: Does 'dying with dignity' in this context mean accepting the certainty of your death, and not childishly regretting that or trying to fight a hopeless battle?

"Don't be ridiculous. How would that increase the log odds of Earth's survival?"

At least up until 1999, admittedly when he was still only about 20 years old, Yudkowsky argued that transformative nanotechnology would probably emerge suddenly and soon (“no later than 2010”) and result in human extinction by default.

I think the "no later than 2010" prediction is from when Eliezer was 20, but the bulk of the linked essay was written when he was 17. The quotation here is: "As of '95, Drexler was giving the ballpark figure of 2015. I suspect the timetable has been accelerated a bit since then. My own guess would be no later than 2010."

The argument for worrying about extinction via molecular nanotech to some non-small degree seems pretty straightforward and correct: molecular nanotech lets you build arbitrary structures, including dangerous ones, and some humans would want to destroy the world given the power to do so.

Eliezer was overconfident about nanotech timelines (though roughly to the same degree as Drexler, the world's main authority on nanotech).

Eliezer may have also been overconfident about nanotech's riskiness, but the specific thing he said when he was 17 is that he considered it important for humanity to achieve AGI "before nanotechnology, given the virtual certainty of deliberate misuse - misuse of a purely material (and thus, amoral) ultratechnology, one powerful enough to destroy the planet".

It's not clear to me whether this is saying that human-extinction-scale misuse from nanotech is 'virtually certain', versus the more moderate claim that some misuse is 'virtually certain' if nanotech sees wide usage (and any misuse is pretty terrifying in EV terms). The latter seems reasonable to me, given how powerful molecular nanotechnology would be.

Eliezer denies that he has a general tendency toward alarmism:

[Ngo][18:19]
(As a side note, I think that if Eliezer had been around in the 1930s, and you described to him what actually happened with nukes over the next 80 years, he would have called that "insanely optimistic".)

[Yudkowsky][18:21]
Mmmmmmaybe. Do note that I tend to be more optimistic than the average human about, say, global warming, or everything in transhumanism outside of AGI.
Nukes have going for them that, in fact, nobody has an incentive to start a global thermonuclear war. Eliezer is not in fact pessimistic about everything and views his AGI pessimism as generalizing to very few other things, which are not, in fact, as bad as AGI.

[Ngo][18:27]
[...] So yeah, I picture 1930s-Eliezer pointing to technological trends and being like "by default, 30 years after the first nukes are built, you'll be able to build one in your back yard. And governments aren't competent enough to stop that happening."

And I don't think I could have come up with a compelling counterargument back then.

[Yudkowsky][18:29]
So, I mean, in fact, I don't prophesize doom from very many trends at all! It's literally just AGI that is anywhere near that unmanageable! Many people in EA are more worried about biotech than I am, for example.

It seems fair to note that nanotech is a second example of Eliezer raising alarm bells. But this remains a pretty small number of data points, and in neither of those cases does it actually look unreasonable to worry a fair bit—those are genuinely some of the main ways we could destroy ourselves.

I think 'Eliezer predicted nanotech way too early' is a better data point here, as evidence for 'maybe Eliezer tends to have overly aggressive tech forecasts'.

If Eliezer was deferring to Drexler to some extent, that makes the data a bit less relevant, but 'I was deferring to someone else who was also wrong' is not in fact a general-purpose excuse for getting the wrong answer.

In 2001, and possibly later, Yudkowsky apparently believed that his small team would be able to develop a “final stage AI” that would “reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010.”
In the first half of the 2000s, he produced a fair amount of technical and conceptual work related to this goal. It hasn't ultimately had much clear usefulness for AI development, and, partly on this basis, my impression is that it has not held up well - but that he was very confident in the value of this work at the time.

That view seems very dumb to me — specifically the belief that SingInst's very first unvetted idea would pan out and result in them building AGI, more so than the timelines per se.

I don't fault 21-year-old Eliezer for trying (except insofar as he was totally wrong about the probability of Unfriendly AI at the time!), because the best way to learn that a weird new path is unviable is often to just take a stab at it. But insofar as 2001-Eliezer thought his very first idea was very likely to work, this seems like a totally fair criticism of the quality of his reasoning at the time.

Looking at the source text, I notice that the actual text is much more hedged than Ben's summary (though it still sounds foreseeably overconfident to me, to the extent I can glean likely implicit probabilities from tone):

[...] The Singularity Institute is fully aware that creating true intelligence will not be easy. In addition to the enormous power deficit between modern computers and the human brain, there is an even more severe software deficit. The software of the human brain is the result of millions of years of evolution and contains perhaps tens of thousands of complex functional adaptations. The human brain itself is not a homogenous lump but a highly modular supersystem; the cerebral cortex is divided into two hemispheres, each containing 52 areas, each area subdivided into a half-dozen distinguishable maps. Cortical neurons group into minicolumns of perhaps a hundred neurons and macrocolumns of a few hundred minicolumns, with perhaps 1,000 macrocolumns to a cortical map. Of the 750 megabytes of human DNA, the vast majority is believed to be junk and 98% is identical to chimpanzee DNA, with perhaps 1% being concerned with intelligence - leaving 7.5 megabytes to specify, not the actual wiring of the brain, but the neuroanatomy of areas and maps and pathways, and the initial tiling patterns and learning algorithms for neurons and minicolumns and macrocolumns.
The Singularity Institute seriously intends to build a true general intelligence, possessed of all the key subsystems of human intelligence, plus design features unique to AI. We do not hold that all the complex features of the human mind are "emergent", or that intelligence is the result of some simple architectural principle, or that general intelligence will appear if we simply add enough data or computing power. We are willing to do the work required to duplicate the massive complexity of human intelligence; to explore the functionality and behavior of each system and subsystem until we have a complete blueprint for a mind. For more about our Artificial Intelligence plans, see the document General Intelligence and Seed AI.
Our specific cognitive architecture and development plan forms our basis for answering questions such as "Will transhumans be friendly to humanity?" and "When will the Singularity occur?" At the Singularity Institute, we believe that the answer to the first question is "Yes" with respect to our proposed AI design - if we didn't believe that, the Singularity Institute would not exist. Our best guess for the timescale is that our final-stage AI will reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010. As always with basic research, this is only a guess, and heavily contingent on funding levels. [...]

A later piece of work which I also haven’t properly read is “Levels of Organization in General Intelligence.”

Note that this paper was written much earlier than its publication date. Description from yudkowsky.net: "Book chapter I wrote in 2002 for an edited volume, Artificial General Intelligence, which is now supposed to come out in late 2006. I no longer consider LOGI’s theory useful for building de novo AI. However, it still stands as a decent hypothesis about the evolutionary psychology of human general intelligence."

Although Hanson very clearly wasn’t envisioning something like deep learning either, his side of the argument seems to fit better with what AI progress has looked like over the past decade.

I agree that Eliezer loses Bayes points (e.g., relative to Shane Legg and Dario Amodei) for not predicting the enormous success of deep learning. See also Nate's recent post about this.

I disagree that Robin Hanson scored Bayes points off of Eliezer, on net, from the deep learning revolution, or that Hanson's side of the Foom debate looks good (compared to Eliezer's) with the benefit of hindsight. I side with Gwern here; I think Robin's predictions and arguments on this topic have been terrible, as a rule.

I think that Yudkowsky's prediction - that a small amount of code, run using only a small amount of computing power, was likely to abruptly jump economic output upward by more than a dozen orders-of-magnitude - was extreme enough to require very strong justifications.

I think Eliezer assigned too high a probability to 'it's easy to find relatively clean, understandable approaches to AGI', and too low a probability to 'it's easy to find relatively messy, brute-forced approaches to AGI'. A consequence of the latter is that he (IMO) underestimated how compute-intensive AGI was likely to be, and overestimated how important recursive self-improvement was likely to be.

I otherwise broadly agree with his picture. E.g.:

I expect AGI to represent a large, sharp capabilities jump. (I think this is unlikely to require a bunch of recursive self-improvement.)
I think AGI is mainly bottlenecked on software, rather than hardware. (E.g., I think GPT-3 is impressive, but isn't a baby AGI; rather than AGI just being 'current systems but bigger', I expect at least one more key insight lies on the shortest likely path to AGI.)
And I expect AGI to be much more efficient than current systems at utilizing small amounts of data. Though (because it's likely to come from a relatively brute-forced, unalignable approach) I still expect it to be more compute-intensive than 2009-Eliezer was imagining.

However, later analysis has suggested that coherence arguments have either no or very limited implications for how we should expect future AI systems to behave.

This seems completely wrong to me. See Katja Grace's Coherence arguments imply a force for goal-directed behavior.

RobBensinger

I think that part of why Eliezer's early stuff sounds weird is:

He generally had a lower opinion of the competence of elites in business, science, etc. (Which he later updated about.)
He had a lower opinion of the field of AI in particular, as it existed in the 1990s and 2000s. Maybe more like nutrition science or continental philosophy than like chemistry, on the scale of 'field rigor and intellectual output'.

If you think of A(G)I as a weird, neglected, pre-paradigmatic field that gets very little attention outside of science fiction writing, then it's less surprising to think it's possible to make big, fast strides in the field. Outperforming a competitive market is very different from outperforming a small, niche market where very little high-quality effort is going into trying new things.

Similarly, if you have a lower opinion of elites, you should be more willing to endorse weird, fringe ideas, because you should be less confident that the mainstream is efficient relative to you. (And I think Eliezer still has a low opinion of elites on some very important dimensions, compared to a lot of EAs. But not to the same degrees as teenaged Eliezer.)

From Competent Elites:

[...]
I used to think—not from experience, but from the general memetic atmosphere I grew up in—that executives were just people who, by dint of superior charisma and butt-kissing, had managed to work their way to the top positions at the corporate hog trough.
No, that was just a more comfortable meme, at least when it comes to what people put down in writing and pass around. The story of the horrible boss gets passed around more than the story of the boss who is, not just competent, but more competent than you.
[...]
But the business world is not the only venue where I've encountered the upper echelons and discovered that, amazingly, they actually are better at what they do.
Case in point: Professor Rodney Brooks, CTO of iRobot and former director of the MIT AI Lab, who spoke at the 2007 Singularity Summit. I had previously known "Rodney Brooks" primarily as the promoter of yet another dreadful nouvelle paradigm in AI—the embodiment of AIs in robots, and the forsaking of deliberation for complicated reflexes that didn't involve modeling. Definitely not a friend to the Bayesian faction. Yet somehow Brooks had managed to become a major mainstream name, a household brand in AI...
And by golly, Brooks sounded intelligent and original. He gave off a visible aura of competence.

And from Above-Average AI Scientists:

At one of the first conferences organized around the tiny little subfield of Artificial General Intelligence, I met someone who was heading up a funded research project specifically declaring AGI as a goal, within a major corporation. I believe he had people under him on his project. He was probably paid at least three times as much as I was paid (at that time). His academic credentials were superior to mine (what a surprise) and he had many more years of experience. He had access to lots and lots of computing power.
And like nearly everyone in the field of AGI, he was rushing forward to write code immediately—not holding off and searching for a sufficiently precise theory to permit stable self-improvement.
In short, he was just the sort of fellow that... Well, many people, when they hear about Friendly AI, say: "Oh, it doesn't matter what you do, because [someone like this guy] will create AI first." He's the sort of person about whom journalists ask me, "You say that this isn't the time to be talking about regulation, but don't we need laws to stop people like this from creating AI?"
"I suppose," you say, your voice heavy with irony, "that you're about to tell us, that this person doesn't really have so much of an advantage over you as it might seem. Because your theory—whenever you actually come up with a theory—is going to be so much better than his. Or," your voice becoming even more ironic, "that he's too mired in boring mainstream methodology—"
No. I'm about to tell you that I happened to be seated at the same table as this guy at lunch, and I made some kind of comment about evolutionary psychology, and he turned out to be...
...a creationist.
This was the point at which I really got, on a gut level, that there was no test you needed to pass in order to start your own AGI project.
One of the failure modes I've come to better understand in myself since observing it in others, is what I call, "living in the should-universe". The universe where everything works the way it common-sensically ought to, as opposed to the actual is-universe we live in. There's more than one way to live in the should-universe, and outright delusional optimism is only the least subtle. Treating the should-universe as your point of departure—describing the real universe as the should-universe plus a diff—can also be dangerous.
Up until the moment when yonder AGI researcher explained to me that he didn't believe in evolution because that's not what the Bible said, I'd been living in the should-universe. In the sense that I was organizing my understanding of other AGI researchers as should-plus-diff. I saw them, not as themselves, not as their probable causal histories, but as their departures from what I thought they should be.
[...] When Scott Aaronson was 12 years old, he: "set myself the modest goal of writing a BASIC program that would pass the Turing Test by learning from experience and following Asimov's Three Laws of Robotics. I coded up a really nice tokenizer and user interface, and only got stuck on the subroutine that was supposed to understand the user's question and output an intelligent, Three-Laws-obeying response." It would be pointless to try and construct a diff between Aaronson₁₂ and what an AGI researcher should be. You've got to explain Aaronson₁₂ in forward-extrapolation mode: He thought it would be cool to make an AI and didn't quite understand why the problem was difficult.
It was yonder creationist who let me see AGI researchers for themselves, and not as departures from my ideal.
[...]
The really striking fact about the researchers who show up at AGI conferences, is that they're so... I don't know how else to put it...
...ordinary.
Not at the intellectual level of the big mainstream names in Artificial Intelligence. Not at the level of John McCarthy or Peter Norvig (whom I've both met).
More like... around, say, the level of above-average scientists, which I yesterday compared to the level of partners at a non-big-name venture capital firm. Some of whom might well be Christians, or even creationists if they don't work in evolutionary biology.
The attendees at AGI conferences aren't literally average mortals, or even average scientists. The average attendee at an AGI conference is visibly one level up from the average attendee at that random mainstream AI conference I talked about yesterday.
[...] But even if you just poke around on Norvig or McCarthy's website, and you've achieved sufficient level yourself to discriminate what you see, you'll get a sense of a formidable mind. Not in terms of accomplishments—that's not a fair comparison with someone younger or tackling a more difficult problem—but just in terms of the way they talk. If you then look at the website of a typical AGI-seeker, even one heading up their own project, you won't get an equivalent sense of formidability.
[...] If you forget the should-universe, and think of the selection effect in the is-universe, it's not difficult to understand. Today, AGI attracts people who fail to comprehend the difficulty of AGI. Back in the earliest days, a bright mind like John McCarthy would tackle AGI because no one knew the problem was difficult. In time and with regret, he realized he couldn't do it. Today, someone on the level of Peter Norvig knows their own competencies, what they can do and what they can't; and they go on to achieve fame and fortune (and Research Directorship of Google) within mainstream AI.
And then...
Then there are the completely hopeless ordinary programmers who wander onto the AGI mailing list wanting to build a really big semantic net.
Or the postdocs moved by some (non-Singularity) dream of themselves presenting the first "human-level" AI to the world, who also dream an AI design, and can't let go of that.
Just normal people with no notion that it's wrong for an AGI researcher to be normal.
Indeed, like most normal people who don't spend their lives making a desperate effort to reach up toward an impossible ideal, they will be offended if you suggest to them that someone in their position needs to be a little less imperfect.
This misled the living daylights out of me when I was young, because I compared myself to other people who declared their intentions to build AGI, and ended up way too impressed with myself; when I should have been comparing myself to Peter Norvig, or reaching up toward E. T. Jaynes. (For I did not then perceive the sheer, blank, towering wall of Nature.)
I don't mean to bash normal AGI researchers into the ground. They are not evil. They are not ill-intentioned. They are not even dangerous, as individuals. Only the mob of them is dangerous, that can learn from each other's partial successes and accumulate hacks as a community.
And that's why I'm discussing all this—because it is a fact without which it is not possible to understand the overall strategic situation in which humanity finds itself, the present state of the gameboard. It is, for example, the reason why I don't panic when yet another AGI project announces they're going to have general intelligence in five years. It also says that you can't necessarily extrapolate the FAI-theory comprehension of future researchers from present researchers, if a breakthrough occurs that repopulates the field with Norvig-class minds.
Even an average human engineer is at least six levels higher than the blind idiot god, natural selection, that managed to cough up the Artificial Intelligence called humans, by retaining its lucky successes and compounding them. And the mob, if it retains its lucky successes and shares them, may also cough up an Artificial Intelligence, with around the same degree of precise control. But it is only the collective that I worry about as dangerous—the individuals don't seem that formidable.
If you yourself speak fluent Bayesian, and you distinguish a person-concerned-with-AGI as speaking fluent Bayesian, then you should consider that person as excepted from this whole discussion.
Of course, among people who declare that they want to solve the AGI problem, the supermajority don't speak fluent Bayesian.
Why would they? Most people don't.

I think this, plus Eliezer's general 'fuck it, I'm gonna call it like I see it rather than be reflexively respectful to authority' attitude, explains most of Ben's 'holy shit, your views were so weird!!' thing.

TAG

"I don't fault 21-year-old Eliezer for trying (except insofar as he was totally wrong about the probability of Unfriendly AI at the time!), because the best way to learn that a weird new path is unviable is often to just take a stab at it"

It was only weird in that it involved technologies and methods that were unlikely to work, and EY could have figured that out theoretically by learning more about AI and software development.

Lorenzo Buonanno🔸

I believe Drexler is now giving the ballpark figure of 2013. My own guess would be no later than 2010…

I didn't see the "my own guess" part in the linked document (or the archived version), but it's visible here; it was probably edited between 2001 and 2004. I'm mentioning it in case others are confused after trying to find the quote in context.

[anonymous]

Perhaps also relevant, though it isn’t forecasting, is Eliezer’s weak (in my opinion) attempted takedown of Ajeya Cotra’s bioanchors report on AI timelines. Here’s Eliezer’s bioanchors takedown attempt, here’s Holden Karnofsky’s response to Eliezer, and here’s Scott Alexander’s response.

RobBensinger

Eliezer's post was less a takedown of the report, and more a takedown of the idea that the report provides a strong basis for expecting AGI in ~2050, or for discriminating scenarios like 'AGI in 2030', 'AGI in 2050', and 'AGI in 2070'.

The report itself was quite hedged, and Holden posted a follow-up clarification emphasizing that “biological anchors” is about bounding, not pinpointing, AI timelines. So it's not clear to me that Eliezer and Ajeya/Holden/etc. even disagree about the core question "do biological anchors provide a strong case for putting a median AGI year in ~2050?", though maybe they disagree on the secondary question of how useful the "bounds" are.

Copying over my high-level view, which I recently wrote on Twitter:

I agree with the basic Eliezer argument in Biology-Inspired AGI Timelines that the bio-anchors stuff isn't important or useful because AGI is a software problem, and we neither know which specific software insights are needed, nor how long it will take to get to those software insights, nor the relationship between those insights and hardware requirements.
Focusing on things like bio-anchors and hardware trends is streetlight-fallacy reasoning: it's taking the 2% of the territory we do know about and heavily heavily focusing on that 2%, while shrugging our shoulders at the other 98%.
Like, bio-anchors reasoning might help tell you whether to expect AGI this century versus expecting it in a thousand years, but it won't help you discriminate 2030 from 2050 from 2070 at all.
Insofar as we need to think about timelines at all, it's true that we need some sort of prior, at least a very vague one.
The problem with the heuristic 'look under the streetlight and anchor your prior to whatever you found under the streetlight, however marginal' is that the info under the streetlight isn't a random sampling from the space of relevant unknown facts about AGI; it's a very specific and unusual kind of information.
IMO you'd be better off thinking first about that huge space of unknowns and anchoring to far fuzzier and more uncertain guesses about the whole space, rather than fixating on a very specific much-more-minor fact that's easier to gather data about.
E.g., consider five very different a priori hypotheses about 'what insights might be needed for AGI', another five very different hypotheses about 'how might different sorts of software progress relate to hardware requirements', etc.
Think about different world-histories that might occur, and how surprised you'd be by those world-histories.
Think about worlds where things go differently than you're expecting in 2060, and about what those worlds would genuinely retrodict about the present / past.
E.g., I think scenario analysis makes it more obvious that in worlds where AGI is 30 years away, current trends will totally break at some point on that path, radically new techniques will be developed, etc.
Think about how different the field of AI was in 1992 compared to today, or in 1962 compared to 1992.
When you're spending most of your time looking under the streetlight — rather than grappling with how little is known, trying to painstakingly refine your instincts and intuitions about the harder-to-reason-about aspects of the problem, etc. — I think it becomes overly tempting to treat current trendlines as laws of nature that will be true forever (or that at least have a strong default of being true forever), rather than as 'patterns that arose a few years ago and will plausibly continue for a few years more, before being replaced by new patterns and growth curves'.
Cf. https://twitter.com/robbensinger/status/1537585485211545604

RobBensinger

Commenting on a few minor points from Scott's post, since I meant to write a full reply at some point but haven't had the time:

But also, there are about 10^15 synapses in the brain, each one spikes about once per second, and a synaptic spike probably does about one FLOP of computation. [...] So a human-level AI would also need to do 10^15 floating point operations per second? Unclear.

I'd say 'clearly not, for some possible AI designs'; but maybe it will be true for the first AIs we actually build, shrug.

Or you might do what OpenPhil did and just look at a bunch of examples of evolved vs. designed systems and see which are generally better:

Why aren't there examples like 'amount of cargo a bird can carry compared to an airplane', or 'number of digits a human can multiply together in ten seconds compared to a computer'?

Seems like you'll get a skewed number if your brainstorming process steers away from examples like these altogether.

'AI physicist' is less like an artificial heart (trying to exactly replicate the structure of a biological organ functioning within a specific body), more like a calculator (trying to do a certain kind of cognitive work, without any constraint at all to do it in a human-like way).

MichaelDickens

I read this post kind of quickly, so apologies if I'm misunderstanding. It seems to me that this post's claim is basically:

Eliezer wrote some arguments about what he believes about AI safety.
People updated toward Eliezer's beliefs.
Therefore, people defer too much to Eliezer.

I think this is dismissing a different (and much more likely IMO) possibility, which is that Eliezer's arguments were good, and people updated based on the strength of the arguments.

(Even if his recent posts didn't contain novel arguments, the arguments still could have been novel to many readers.)

Linch

I'm a bit confused by both this post and the comments when it comes to questions like at what level/timing the deference happens.

Speaking for myself, if an internet rando wrote a random blog post called "AGI Ruin: A List of Lethalities," I probably would not read it. But I did read Yudkowsky's post carefully and thought about it nontrivially, mostly due to his track record and writing ability (rather than e.g. because the title was engaging or because the first paragraph was really well-argued).

TAG

"which is that Eliezer's arguments were good,"

There is plenty of evidence against that. His arguments on other subjects aren't good (see OP), his arguments on AI aren't informed by academic expertise or industry experience, his predictions are bad, etc.

Dr. Dante

I’ve been following the ongoing debate around Yudkowsky and AI alignment, and I find it fascinating how much still depends on whether ethics can only be imposed externally or might actually emerge from within an intelligent system. Recently I read a short book that really shifted my perspective, TAOSHIDŌ The Loop of Ethical Alignment. It treats alignment less as obedience and more as a form of internal balance: clarity guiding action, integrity regulating direction, adaptation maintaining stability. It’s a very different approach, more philosophical than technical, but it got me thinking: could ethical self-regulation ever complement the traditional control-based models of AI safety?

JKM

I'm confused by the fact Eliezer's post was posted on April Fool's day. To what extent does that contribute to conscious exaggeration on his part?

Guy Raveh

Right? Up to reading this post, I was convinced it was an April Fool's post.

RobBensinger

The post is serious. Details: https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy?commentId=FounAZsg4kFxBDiXs

David Mathers🔸

It seems really bad, from a communications/PR point of view, to write something that was ambiguous in this way. Like, bad enough that it makes me slightly worried that MIRI will commit some kind of big communications error that gets into the newspapers and does big damage to the reputation of EA as a whole.

VictorSintNicolaas

As someone not active in the field of AI risk, and having always used epistemic deference quite heavily, this feels very helpful. I hope it doesn't end up reducing society's efforts to stop AI from taking over the world some day.

JulianHazell

On the contrary, my best guess is that the “dying with dignity” style dooming is harming the community’s ability to tackle AI risk as effectively as it otherwise could.

David Johnston

I agree with many of the comments here that this is overall a bit unfair, and there are good reasons to take Yudkowsky seriously even if you don't automatically accept his self-expressed level of confidence.

My main criticism of Yudkowsky is that he has many innovative/somewhat compelling ideas, but even with many years and a research institution their evolution has been unsatisfying. Many of them are still imprecise, and some of those that are precise(ish) are not satisfactory (e.g. the orthogonality thesis, mesa-optimizers). Furthermore, he still doesn't seem very interested in improving this situation.

Derek Shiller

then it would be a violation of the law of the conservation of expected evidence for you to update your beliefs on observing the passage of a minute without the bomb's exploding.

Interesting! I would think this sort of case just shows that the law of conservation of expected evidence is wrong, at least for this sort of application. I figure it might depend on how you think about evidence. If you think of the infinite void of non-existence as possibly constituting your evidence (albeit evidence you're not in a position to appreciate, being dead and all), then that principle wouldn't push you toward this sort of anthropic reasoning.

I am curious, what do you make of the following case?

Suppose you're touring Acme Bomb & Replica Bomb Co with your friend Eli. ABRBC makes bombs and perfect replicas of bombs, but they're sticklers for safety so they alternate days for real bombs and replicas. You're not sure which sort of day it is. You get to the point of the tour where they show off the finished product. As they pass around the latest model from the assembly line, Eli drops it, knocking the safety back and letting the bomb (replica?) land squarely on its ignition button. If it were a real bomb, it would kill everyone unless it were one of the 1-in-a-million bombs that's a dud. You hold your breath for a second but nothing happens. Whew. How much do you want to bet that it's a replica day?
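
For reference, the naive Bayesian answer to this case (the one that applies no anthropic correction at all) can be sketched in a few lines. This is only an illustration under stated assumptions: the 50% prior is read off from "they alternate days" plus "you're not sure which sort of day it is", the dud rate is the 1-in-a-million figure from the story, and the function name `posterior_replica_day` is made up for this sketch.

```python
def posterior_replica_day(prior_replica=0.5, dud_rate=1e-6):
    """Naive Bayes update on 'the dropped device did not explode'.

    Assumptions (not from any formal source): a replica never explodes,
    and a real bomb fails to explode only if it is a dud.
    """
    p_quiet_given_replica = 1.0
    p_quiet_given_real = dud_rate
    # Total probability of observing no explosion.
    evidence = (prior_replica * p_quiet_given_replica
                + (1 - prior_replica) * p_quiet_given_real)
    return prior_replica * p_quiet_given_replica / evidence


print(posterior_replica_day())
```

On this naive reading the posterior works out to 1 / (1 + 10^-6), so you should bet very heavily on a replica day. Whether the observer-selection worry changes that is exactly what the case is meant to probe.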

Zach Stein-Perlman

Almost all of this seems reasonable. But:

Yudkowsky has previously held short AI timeline views that turned out to be wrong

I don't think we should update based on this, or eg on the fact that we didn't go extinct due to nanotechnology, because anthropics / observer selection. (We should only update based on whether we think the reasons for those beliefs were bad.)

Derek Shiller

Suppose you've been captured by some terrorists and you're tied up with your friend Eli. There is a device on the other side of the room that you can't quite make out. Your friend Eli says that he can tell (he's 99% sure) it is a bomb and that it is rigged to go off randomly. Every minute, he's confident there's a 50-50 chance it will explode, killing both of you. You wait a minute and it doesn't explode. You wait 10. You wait 12 hours. Nothing. He starts eying the light fixture, and says he's pretty sure there's a bomb there too. You believe him?

Zach Stein-Perlman

No, my survival for 12 hours is evidence against Eli being correct about the bomb.

So: oops, I think.
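
To make the size of that update concrete, here is a rough sketch of the ordinary (non-anthropic) Bayesian calculation for the bomb case. It assumes you start out sharing Eli's 99% credence and that, if there is no bomb, surviving each minute is certain; both the function name `posterior_bomb` and those modelling choices are assumptions made for illustration.

```python
def posterior_bomb(prior_bomb, minutes_survived, p_explode_per_minute=0.5):
    """Posterior that Eli is right about the bomb after surviving some minutes.

    Assumes independent 50-50 chances each minute if the bomb is real,
    and certain survival if it is not.
    """
    p_survive_given_bomb = (1 - p_explode_per_minute) ** minutes_survived
    p_survive_given_no_bomb = 1.0
    # Total probability of still being alive after the given time.
    evidence = (prior_bomb * p_survive_given_bomb
                + (1 - prior_bomb) * p_survive_given_no_bomb)
    return prior_bomb * p_survive_given_bomb / evidence


# One minute, ten minutes, twelve hours (720 minutes).
for minutes in (1, 10, 720):
    print(minutes, posterior_bomb(0.99, minutes))
```

Under these assumptions one quiet minute barely moves the estimate, ten minutes already push it well below even odds, and twelve hours leave it astronomically small.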

Zach Stein-Perlman

I'm still not totally comfortable. I think my confusion arose because I was considering the related question of whether I could use my better knowledge than Eli to win money from bets (in expectation) -- I couldn't, because Eli has no reason to bet on the bomb going off. More generally, Eliezer never had reason to bet (in the sense that he gets epistemic credit if he's right) on nanotech-doom-by-2010, because in the worlds where he's right we're dead. It feels weird to update against Eliezer on the basis of beliefs that he wouldn't have bet on; updating against him doesn't seem to be incentive-compatible... but maybe that's just the sacrifice immanent to the epistemic virtue of publicly sharing your belief in doom.

rhollerith

I am willing to bite your bullet.

I had a comment here explaining my reasoning, but deleted it because I plan to make a post instead.

Yonatan Cale

I think posts like this would do better to open with "but consider forming your own opinions rather than relying on experts".

𝕮𝖎𝖓𝖊𝖗𝖆

I prefer to just analyse and refute his concrete arguments on the object level.

I'm not a fan of engaging the person of the arguer instead of their arguments.

Granted, I don't practice epistemic deference in regards to AI risk (so I'm not the target audience here), but I'm really not a fan of this kind of post. It rubs me the wrong way.

Challenging someone's overall credibility instead of their concrete arguments feels like bad form and [logical rudeness](https://www.lesswrong.com/posts/srge9MCLHSiwzaX6r/logical-rudeness).

I wish EAs did not engage in such behaviour and especially not with respect to other members of the community.

bmg

I prefer to just analyse and refute his concrete arguments on the object level.

I agree that work analyzing specific arguments is, overall, more useful than work analyzing individual people's track records. Personally, partly for that reason, I've actually done a decent amount of public argument analysis (e.g. here, here, and most recently here) but never written a post like this before.

Still, I think, people do in practice tend to engage in epistemic deference. (I think that even people who don't consciously practice epistemic deference tend to be influenced by the views of people they respect.) I also think that people should practice some level of epistemic deference, particularly if they're new to an area. So - in that sense - I think this kind of track record analysis is still worth doing, even if it's overall less useful than argument analysis.

𝕮𝖎𝖓𝖊𝖗𝖆

(I hadn't seen this reply when I made my other reply).

What do you think of legitimising behaviour that calls out the credibility of other community members in the future?

I am worried about displacing the concrete object level arguments as the sole domain of engagement. A culture in which arguments cannot be allowed to stand by themselves. In which people have to be concerned about prior credibility, track record and legitimacy when formulating their arguments...

It feels like a worse epistemic culture.

Karthik Tadepalli

Expert opinion has always been a substitute for object level arguments because of deference culture. Nobody has object level arguments for why x-risk in the 21st century is around 1/6: we just think it might be because Toby Ord says so and he is very credible. Is this ideal? No. But we do it because expert priors are the second best alternative when there is no data to base our judgments off of.

Given this, I think criticizing an expert's priors is functionally an object level argument, since the expert's prior is so often used as a substitute for object level analysis.

I agree that a slippery slope endpoint would be bad but I do not think criticizing expert priors takes us there.

𝕮𝖎𝖓𝖊𝖗𝖆

To expand on my complaints in the above comment.

I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I think that's unhealthy and contrary to collaborative knowledge growing.

Yudkowsky has laid out his arguments for doom at length. I don't fully agree with those arguments (I believe he's mistaken in 2 - 3 serious and important ways), but he has laid them out, and I can disagree on the object level with him because of that.

Given that the explicit arguments are present, I would prefer posts that engaged with and directly refuted the arguments if you found them flawed in some way.

I don't like this direction of attacking his overall credibility.

Attacking someone's credibility in lieu of their arguments feels like a severe epistemic transgression.

I am not convinced that the community is better for a norm that accepts such epistemic call out posts.

bmg

I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I think I roughly agree with you on this point, although I would guess I have at least a somewhat weaker version of your view. If discourse about people's track records or reliability starts taking up (e.g.) more than a fifth of the space that object-level argument does, within the most engaged core of people, then I do think that will tend to suggest an unhealthy or at least not-very-intellectually-productive community.

One caveat: For less engaged people, I do actually think it can make sense to spend most of your time thinking about questions around deference. If I'm only going to spend ten hours thinking about nanotechnology risk, for example, then I might actually want to spend most of this time trying to get a sense of what different people believe and how much weight I should give their views; I'm probably not going to be able to make a ton of headway getting a good gears-level-understanding of the relevant issues, particularly as someone without a chemistry or engineering background.

Holly Elmore ⏸️ 🔸

> I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I think it's fair to talk about a person's lifetime performance when we are talking about forecasting. When we don't have the expertise ourselves, all we have to go on is what little we understand and the track records of the experts we defer to. Many people defer to Eliezer so I think it's a service to lay out his track record so that we can know how meaningful his levels of confidence and special insights into this kind of problem are.

Guy Raveh

I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I don't think this is realistic. There is much more important knowledge than one can engage with in a lifetime. The only way of forming views about many things is to somehow decide who to listen to, or at least how to aggregate relevant more strongly based opinions (so, who to count as an expert and who not to and with what weight).

genidma

Tldr

Personally, and from my very uneducated vantage point, I question why a superintelligence with a truly universal set of ethics would pose a risk to other lifeforms. But I also do not know how the initial conditions can be architected, if indeed they can be set/architected at all. That could go a number of different ways, depending on whose values are used.
What I worry about is what humans (enhanced or not) and cyborgs may choose to do with the bread-crumbs (the leftovers), or the steps taken to get to AGI.

Here is a schematic (link below) that I started meditating on yesterday. I am not sure if it's polite to share, particularly since I have not taken the time to absorb the post above. But here goes: I'm sharing it, as it may (or may not) provide some value to someone, hopefully in a manner that is reasonable. https://qr.ae/pvoVJn

Charles He

The karma on this post is impressive especially since OP could have started this AM UK time but didn’t.

I want to say stuff but it’s not going to help?
