Seth Herd

Comments (32)

I don't have a nice clean citation. I don't think one exists. I've looked at an awful lot of individual opinions and different surveys. I guess the biggest reason I'm convinced this correlation exists is that arguments for low p(doom) very rarely actually engage the arguments for risk at their strong points (and when they do, the discussions are inconclusive in both directions - I'm not arguing that alignment is hard, but that it's very much unknown how hard it is).

There appears to be a very high correlation between misunderstanding the state of play and optimism. And because the state of the arguments is very complex, the vast majority of the world misunderstands it pretty severely.

I very much wish it were otherwise; I am an optimist who has become steadily more pessimistic as I've made alignment my full-time focus, because the arguments for pessimism are subtle (and often poorly communicated) but strong.

 

The arguments for the difficulty of alignment are far too strong to be rationally dismissed down to the 1.4% or whatever it was that the superforecasters arrived at. They have very clearly missed some important points of argument.

The anticorrelation with academic success seems quite right and utterly irrelevant. As a career academic, I have been noticing for decades that academic success has some quite perverse incentives.

I agree that there are bad arguments for pessimism as well as optimism. The use of bad logic in some prominent arguments says nothing about the strength of other arguments. Arguments on both sides are far from conclusive. So you can hope the arguments for the fundamental difficulty of aligning network-based AGI are wrong, but assigning a high probability that they're wrong, without understanding them in detail and constructing valid counterarguments, is tempting but not rational.

If there's a counterargument you find convincing, please point me to it! Because while I'm arguing from the outside view, my real argument is that this is an issue that is unique in intellectual history, so it can really only be evaluated from the inside view. So that's where most of my thoughts on the matter go. 

All of which isn't to say the doomers are right and we're doomed if we don't stop building network-based AGI. I'm saying we don't know. I'm arguing that assigning a high probability right now, based on limited knowledge, to humanity accomplishing alignment is not rationally justified.

I think that fact is reflected in the correlation of p(doom) with time-on-task specifically on alignment. If that correlation isn't real I'd be shocked, because it looks very strong to me, and I do work hard to correct for my own biases. But it's possible I'm wrong about it. If so, it will make my day, and perhaps my month or year!

It is ultimately a question that needs to be resolved at the object level; until then, we just have to take guesses about how to allocate resources based on outside views.

 

I see! Thanks for the clarification. It's a fascinating argument if I'm understanding it correctly now: it could be worth substantially increasing our risk of extinction if doing so substantially increased our odds of capturing more of the potential value in our light cone.

I'm not a dedicated utilitarian, so I tend to value futures with some human flourishing and little suffering vastly higher than futures with no sentient beings. But I am actually convinced that we should tilt a little toward futures with more flourishing.

Aligning AGI seems like the crux for both survival and flourishing (as does aligning society, in the likely case that "aligned" AGI is intent-aligned to take orders from individuals). But there will be small strategic choices that emphasize flourishing over mere survival, and based on this discussion I'll lean toward those, because outside of myself and my loved ones my preferences become largely utilitarian.
 

It should also be borne in mind that creating misaligned AGI runs a pretty big risk of wiping out not just us but any other sentient species in the lightcone.

 

Agreed on all counts, except that a strong value on rationality seems very likely to be an advantage in reaching more-correct beliefs on average. Feeling good about changing one's mind, instead of bad, is going to lead to more belief changes, and those tend to lead toward truth.

Good points on the rationalist community being a bit insular. I don't think about that much myself because I've never been involved with the bay area rationalist community, just LessWrong.

Copied from my comment on LW, because it may actually be more relevant over here, where not everyone is convinced that alignment is hard. It's a really sketchy presentation of what I think are strong arguments for why the consensus on this is wrong.


I really wish I could agree. I think we should definitely think about flourishing when it's a win/win with survival efforts. But saying we're near the ceiling on survival looks wildly too optimistic to me. This is after very deeply considering our position and the best estimate of our odds, primarily concerning the challenge of aligning superhuman AGI (including the surrounding societal complications).

There are very reasonable arguments to be made about the best estimate of alignment/AGI risk. But disaster likelihoods below 10% really just aren't viable when you look in detail, and it seems like that's what you need to argue that we're near the ceiling on survival.

The core claim here is "we're going to make a new species which is far smarter than we are, and that will definitely be fine because we'll be really careful how we make it" in some combination with "oh we're definitely not making a new species any time soon, just more helpful tools". 

When examined in detail, assigning a high confidence to those statements is just as silly as it looks at a glance. That is obviously a very dangerous thing and one we'll do pretty much as soon as we're able. 

90% plus on survival looks like a rational view from a distance, but there are very strong arguments that it's not. This won't be a full presentation of those arguments; I haven't written it up satisfactorily yet, so here's the barest sketch.

Here's the problem: The more people think seriously about this question, the more pessimistic they are. 

(Edit: we asymptote at different points, but almost universally far above 10% p(doom).)

And those who've spent more time on this particular question should be weighted far higher. Time-on-task is the single most important factor for success in every endeavor. It's not a guarantee, but it dwarfs raw intelligence as a predictor of success in every domain (although the two are multiplicative).

The "expert forecasters" you cite don't have nearly the time-on-task of thinking about the AGI alignment problem. Those who actually work in that area are very systematically more pessimistic the longer and more deeply we've thought about it. There's not a perfect correlation, but it's quite large.

This should be very concerning from an outside view.

This effect clearly goes both ways, but selection only starts to explain the correlation. Those who intuitively find AGI very dangerous are prone to go into the field, and they'll be subject to confirmation bias. But if they were wrong, a substantial subset should be shifting away from that view after being exposed to every argument for optimism. That expectation is strengthened by the correlation between rationalist culture and alignment thinking; valuing rationality provides resistance (but certainly not immunity!) to motivated reasoning and confirmation bias by aligning one's motivations with updating on arguments and evidence.

I am an optimistic person, and I deeply want AGI to be safe. I would be overjoyed for a year if I somehow updated to only 10% chance of AGI disaster. It is only my correcting for my biases that keeps me looking hard enough at pessimistic arguments to believe them based on their compelling logic.

And everyone is affected by motivated reasoning, particularly the optimists. This is complex, but after doing my level best to correct for motivations, it looks to me like the bias effects have far more leeway to work when there's less to push against. The more evidence and arguments are considered, the less bias takes hold. This is from the literature on motivated reasoning and confirmation bias, which was my primary research focus for a few years and a primary consideration for the last ten.

That would've been better as a post or a short form, and more polished. But there it is FWIW, a dashed-off version of an argument I've been mulling over for the past couple of years.

I'll still help you aim for flourishing, since having an optimistic target is a good way to motivate people to think about the future.

Edit: I realize this isn't an airtight argument and apologize for the tone of confidence in the absence of presenting the whole thing carefully and with proper references.

It seems like having genuinely safety-minded people within orgs is invaluable. Do you think that having them refuse to join is going to meaningfully slow things down?

It just takes one brave or terrified person in the know to say "these guys are internally deploying WHAT? I've got to stop this!" 

I worry very much that we won't have one such person in the know at OpenAI. I'm very glad we have them at Anthropic.

Having said that, I agree that Anthropic should not be shielded from criticism.

Your assumption that influence flows only one way in organizations seems based on fear, not psychology. If someone believes AGI is a real risk, they should be motivated enough to resist some pressure from superiors who merely argue that they're doing good stuff.

If you won't actively resist changing your beliefs once you join a culture with importantly different beliefs, then don't join such an org.

 

While Anthropic's plan is a terrible one, so is PauseAI's. We have no good plans. And we mustn't fight amongst ourselves.

This seems almost exactly like the repugnant conclusion. Taken to extremes, intuition disagrees with logic. When that happens, it's usually the worse for intuition.

I'm not a utilitarian, but I find the repugnant conclusion impossible to reject if you are.

If you want to choose what is good for everyone, there's little argument about what that is in those cases.

 And if we're talking about what's good for everyone, that's got to be a linear sum of what's good for each someone. If the sum is nonlinear, who exactly is worth less than the others? This leads to the repugnant conclusion and your conclusion here.
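As a minimal sketch of that linearity point (the population sizes and utility numbers below are made up purely for illustration), take total welfare as an unweighted linear sum over individuals:

\[
W = \sum_i u_i, \qquad
W_A = 10^{6} \times 10 = 10^{7}, \qquad
W_B = 10^{10} \times 0.01 = 10^{8} > W_A
\]

Here $W_A$ is a small population flourishing at utility 10 each and $W_B$ is a vast population barely content at utility 0.01 each; the linear sum ranks the vast, barely-content world higher, which is exactly the repugnant conclusion.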

Other definitions of "good for everyone" seem to always mean "what I idiosyncratically prefer for everyone else but me".

Seth Herd · 43% agree

We do not have adequate help with AGI x-risk, and the societal issues demand many skillsets that alignment workers typically lack. Surviving AGI and avoiding s-risk far outweigh all other concerns by any reasonable utilitarian logic. 

You were getting disagree votes because it sounded like you were claiming certainty. I realize that you weren't trying to do that, but that's how people were taking it, and I find that quite understandable. Chicken, as an analogy, implies certain death if neither player swerves, in the standard formulation. Qualifying your statement even a little would've gotten your point across better.

 FWIW I agree with your statement as I interpret it. I do tend to think that an objective measure of misalignment risk (I place it around 50% largely based on model uncertainty on all sides) makes the question of which side is safer basically irrelevant. 

Which highlights the problem with this type of miscommunication. You were making probably by far the most important point here, but it didn't play a prominent role because it wasn't communicated in a way the audience would understand.

That's very helpful! They don't try to include the cost of fear; the old story I'd heard about cage-free environments was that there's more fighting and cannibalism. But given the other large benefits, I'm convinced that cage-free is better. 

Wait now - I thought cage-free chickens suffered as much as or more than caged ones? I heard the claim a long time ago but never looked into it closely.
