RobBensinger

Buck: So following up on your Will post: It sounds like you genuinely didn't understand that Will is worried about AI takeover risk and thinks we should try to avert it, including by regulation. Is that right?

I'm just so confused here. I thought your description of his views was a ridiculous straw man, and at first I thought you were just being some combination of dishonest and rhetorically sloppy, but now my guess is you're genuinely confused about what he thinks?

(Happy to call briefly if that would be easier. I'm interested in talking about this a bit because I was shocked by your post and want to prevent similar things happening in the future if it's easy to do so.)

Rob: I was mostly just going off Will's mini-review; I saw that he briefly mentioned "governance agendas" but otherwise everything he said seemed to me to fit 'has some worries that AI could go poorly, but isn't too worried, and sees the current status quo as basically good -- alignment is going great, the front-running labs are sensible, capabilities and alignment will by default advance in a way that lets us ratchet the two up safely without needing to do anything special or novel'

so I assumed if he was worried, it was mainly about things that might disrupt that status quo

Buck: what about his line "I think the risk of misaligned AI takeover is enormously important."

alignment is going great, the front-running labs are sensible

This is not my understanding of what Will thinks.

[added by Buck later: And also I don’t think it’s an accurate reading of the text.]

Rob: 🙏

that's helpful to know!

Buck: I am not confident I know exactly what Will thinks here. But my understanding is that his position is something like: The situation is pretty scary (hence him saying "I think the risk of misaligned AI takeover is enormously important."). There is maybe a 5% overall chance of AI takeover, which is a bad and overly large number. The AI companies are reckless and incompetent with respect to these risks, compared to what you'd hope given the stakes. Rushing through superintelligence would be extremely dangerous for AI takeover and other reasons.

[added/edited by Buck later: I interpret the review as saying:

- He thinks the probability of AI takeover and of human extinction due to AI takeover is substantially lower than you do.
  - This is not because he thinks “AI companies/humanity are very thoughtful about mitigating risk from misaligned superintelligence, and they are clearly on track to develop techniques that will give developers justified confidence that AIs powerful enough that their misalignment poses risk of AI takeover are aligned”. It’s because he is more optimistic about what will happen if AI companies and humanity are not very thoughtful and competent.
- He thinks that the arguments given in the book have important weaknesses.
- He disagrees with the strategic implications of the worldview described in the book.

For context, I am less optimistic than he is, but I directionally agree with him on both points.]

In general, MIRI people often misunderstand someone saying, "I think X will probably be fine because of consideration Y" to mean "I think that plan Y guarantees that X will be fine". And often, Y is not a plan at all, it's just some purported feature of the world.

Another case is people saying "I think that argument A for why X will go badly fails to engage with counterargument Y", which MIRI people round off to "X is guaranteed to go fine because of my plan Y"

Rob: my current guess is that my error is downstream of (a) not having enough context from talking to Will or seeing enough other AI Will-writing, and (b) Will playing down some of his worries in the review

I think I was overconfident in my main guess, but I don't think it would have been easy for me to have Reality as my main guess instead

Buck: When I asked the AIs, they thought that your summary of Will's review was inaccurate and unfair, based just on his review.

It might be helpful to try checking this way in the future.

I'm still interested in how you interpreted his line "I think the risk of misaligned AI takeover is enormously important."

Rob: I think that line didn't stick out to me at all / it seemed open to different interpretations, and mainly trying to tell the reader 'mentally associate me with some team other than the Full Takeover Skeptics (eg I'm not LeCun), to give extra force to my claim that the book's not good'.

like, I still associate Will to some degree with the past version of himself who was mostly unconcerned about near-term catastrophes and thought EA's mission should be to slowly nudge long-term social trends. "enormously important" from my perspective might have been a polite way of saying 'it's 1 / 10,000 likely to happen, but that's still one of the most serious risks we face as a society'

it sounds like Will's views have changed a lot, but insofar as I was anchored to 'this is someone who is known to have oddly optimistic views and everything-will-be-pretty-OK views about the world' it was harder for me to see what it sounds like you saw in the mini-review

(I say this mainly as autobiography since you seemed interested in debugging how this happened; not as 'therefore I was justified/right')

Buck: Ok that makes sense

Man, how bizarre

Claude had basically the same impression of your summary as I did

Which makes me feel like this isn't just me having more context as a result of knowing Will and talking to him about this stuff.

Rob: I mean, I still expect most people who read Will's review to directionally update the way I did -- I don't expect them to infer things like

"The situation is pretty scary."

"The AI companies are reckless and incompetent wrt these risks."

"Rushing through superintelligence would be extremely dangerous for AI takeover and other reasons."

or 'a lot of MIRI-ish proposals like compute governance are a great idea' (if he thinks that)

or 'if the political tractability looked 10-20x better then it would likely be worth seriously looking into a global shutdown immediately' (if he thinks something like that??)

I think it was reasonable for me to be confused about what he thinks on those fronts and to press him on it, since I expect his review to directionally make people waaaaaaay more misinformed and confused about the state of the world

and I think some of his statements don't make sense / have big unresolved tensions, and a lot of his arguments were bad and misinformed. (not that him strawmanning MIRI a dozen different ways excuses me misrepresenting his view; but I still find it funny how uninterested people apparently are in the 'strawmanning MIRI' side of things? maybe they see no need to back me up on the places where my post was correct, because they assume the Light of Truth will shine through and persuade people in those cases, so the only important intervention is to correct errors in the post?)

but I should have drawn out those tensions by posing a bunch of dilemmas and saying stuff like 'seems like if you believe W, then bad consequence X; and if you believe Y, then bad consequence Z. which horn of the dilemma do you choose, so I know what to argue against?', rather than setting up a best-guess interpretation of what Will was saying (even one with a bunch of 'this is my best guess' caveats)

I think Will was being unvirtuously cagey or spin-y about his views, and this doesn't absolve me of responsibility for trying to read the tea leaves and figure out what he actually thinks about 'should government ever slow down or halt the race to ASI?', but it would have been a very easy misinterpretation for him to prevent (if his views are as you suggest)

it sounds like he mostly agrees about the parts of MIRI's view that we care the most about, eg 'would a slowdown/halt be good in principle', 'is the situation crazy', 'are the labs wildly irresponsible', 'might we actually want a slowdown/halt at some point', 'should govs wake up to this and get very involved', 'is a serious part of the risk rogue AI and not just misuse', 'should we do extensive compute monitoring', etc.

it's not 100% of what we're pushing but it's overwhelmingly more important to us than whether the risk is more like 20-50% or more like 'oh no'

I think most readers wouldn't come away from Will's review thinking we agree on any of those points, much less all of them

Buck:

I expect his review to directionally make people waaaaaaay more misinformed and confused about the state of the world

I disagree

and I think some of his statements don't make sense / have big unresolved tensions, and a lot of his arguments were bad and misinformed.

I think some of his arguments are dubious, but I don't overall agree with you.

I think Will was being unvirtuously cagey or spin-y about his views, and this doesn't absolve me of responsibility for trying to read the tea leaves and figure out what he actually thinks about 'should government ever slow down or halt the race to ASI?', but it would have been a very easy misinterpretation for him to prevent (if his views are as you suggest)

I disagree for what it's worth.

it sounds like he mostly agrees about the parts of MIRI's view that we care the most about, eg 'would a slowdown/halt be good in principle', 'is the situation crazy', 'are the labs wildly irresponsible', 'might we actually want a slowdown/halt at some point', 'should govs wake up to this and get very involved', 'is a serious part of the risk rogue AI and not just misuse', 'should we do extensive compute monitoring', etc.
it's not 100% of what we're pushing but it's overwhelmingly more important to us than whether the risk is more like 20-50% or more like 'oh no'

I think that the book made the choice to center a claim that people like Will and me disagree with: specifically, "With the current trends in AI progress, building superintelligence is overwhelmingly likely to lead to misaligned AIs that kill everyone."

It's true that much weaker claims (e.g. all the stuff you have in quotes in your message here) are the main decision-relevant points. But the book chooses to not emphasize them and instead emphasize a much stronger claim that in my opinion and Will's opinion it fails to justify.

I think it's reasonable for Will to substantially respond to the claim that you emphasize, rather than different claims that you could have chosen to emphasize.

I think a general issue here is that MIRI people seem to me to be responding at a higher simulacrum level than the one at which criticisms of the book are operating. Here you did that partly because you interpreted Will as himself operating at a higher simulacrum level than the plain reading of the text.

I think it's a difficult situation when someone makes criticisms that, on the surface level, look like straightforward object level criticisms, but that you suspect are motivated by a desire to signal disagreement. I think it is good to default to responding just on the object level most of the time, but I agree there are costs to that strategy.

And if you want to talk about the higher simulacra levels, I think it's often best to do so very carefully and in a centralized place, rather than in a response to a particular person.

I also agree with Habryka’s comment that Will chose a poor phrasing of his position on regulation.

Rob: If we agree about most of the decision-relevant claims (and we agree about which claims are decision-relevant), then I think it's 100% reasonable for you and Will to critique less-decision-relevant claims that Eliezer and Nate foregrounded; and I also think it would be smart to emphasize those decision-relevant claims a lot more, so that the world is likely to make better decisions. (And so people's models are better in general; I think the claims I mentioned are very important for understanding the world too, not just action-relevant.)

I especially think this is a good idea for reviews sent to a hundred thousand people on Twitter. I want a fair bit more of this on LessWrong too, but I can see a stronger claim having different norms on LW, and LW is also a place where a lot of misunderstandings are less likely because a lot more people here have context.

Re simulacra levels: I agree that those are good heuristics. For what it's worth, I still have a much easier time mentally generating a review like Will's when I imagine the author as someone who disagrees with that long list of claims; I have a harder time understanding how none of those points of agreement came up in the ensuing paragraphs if Will tacitly agreed with me about most of the things I care about.

Possibly it's just a personality or culture difference; if I wrote "This is a shame, because I think the risk of misaligned AI takeover is enormously important" (especially in the larger context of the post it occurred in) I might not mean something all that strong (a lot of things in life can be called "enormously important" from one perspective or another); but maybe that's the Oxford-philosopher way of saying something closer to "This situation is insane, we're playing Russian roulette with the world, this is an almost unprecedented emergency."

(Flagging that this is all still speculative because Will hasn't personally confirmed what his views are someplace I can see it. I've been mostly deferring to you, Oliver, etc. about what kinds of positions Will is likely to endorse, but my actual view is a bit more uncertain than it may sound above.)

A Reply to MacAskill on "If Anyone Builds It, Everyone Dies"

RobBensinger · 10mo · 2

Copying over my response from LW:

I wasn't exclusively looking at that line; I was also assuming that if Will liked some of the book's core policy proposals but disliked others, then he probably wouldn't have expressed such a strong blanket rejection. And I was looking at Will's proposal here:

[IABIED skips over] what I see as the crucial period, where we move from the human-ish range to strong superintelligence[1]. This is crucial because it’s both the period where we can harness potentially vast quantities of AI labour to help us with the alignment of the next generation of models, and because it’s the point at which we’ll get a much better insight into what the first superintelligent systems will be like. The right picture to have is not “can humans align strong superintelligence”, it’s “can humans align or control AGI-”, then “can {humans and AGI-} align or control AGI” then “can {humans and AGI- and AGI} align AGI+” and so on.

This certainly sounds like a proposal that we advance AI as fast as possible, so that we can reach the point where productive alignment research is possible sooner.

The next paragraph then talks about "a gradual ramp-up to superintelligence", which makes it sound like Will at least wants us to race to the level of superintelligence as quickly as possible, i.e., he wants the chain of humans-and-AIs-aligning-stronger-AIs to go at least that far:

Elsewhere, EY argues that the discontinuity question doesn’t matter, because preventing AI takeover is still a ‘first try or die’ dynamic, so having a gradual ramp-up to superintelligence is of little or no value. I think that’s misguided.

... Unless he thinks this "gradual ramp-up" should be achieved via switching over at some point from the natural continuous trendlines he expects from industry, to top-down government-mandated ratcheting up of a capability limit? But I'd be surprised if that's what he had in mind, given the rest of his comment.

Wanting the world to race to build superintelligence as soon as possible also seems like it would be a not-that-surprising implication of his labs-have-alignment-in-the-bag claims.

And although it's not totally clear to me how seriously he's taking this hypothetical (versus whether he mainly intends it as a proof of concept), he does propose that we could build a superintelligent paperclip maximizer and plausibly be totally fine (because it's risk averse and promise-keeping), and his response to "Maybe we won't be able to make deals with AIs?" is:

I agree that’s a worry; but then the right response is to make sure that we can.

Not "in that case maybe we shouldn't build a misaligned superintelligence", but "well then we'd sure better solve the honesty problem!".

All of this together makes me extremely confused if his real view is basically just "I agree with most of MIRI's policy proposals but I think we shouldn't rush to enact a halt or slowdown tomorrow".

If his view is closer to that, then that's great news from my perspective, and I apologize for the misunderstanding. I was expecting Will to just straightforwardly accept the premises I listed, and for the discussion to proceed from there.

I'll add a link to your comment at the top of the post so folks can see your response, and if Will clarifies his view I'll link to that as well.

Twitter says that Will's tweet has had over a hundred thousand views, so if he's a lot more pro-compute-governance, pro-slowdown, and/or pro-halt than he sounded in that message, I hope he says loud stuff in the near future to clarify his views to folks!

A Reply to MacAskill on "If Anyone Builds It, Everyone Dies"

RobBensinger · 10mo · 2

Oliver gave an argument for "this misrepresents Will's views" on LessWrong, saying:

I currently think this is putting too much weight on a single paragraph in Will's review. The paragraph is:
[IABIED:] "All over the Earth, it must become illegal for AI companies to charge ahead in developing artificial intelligence as they’ve been doing."

[Will:] "The positive proposal is extremely unlikely to happen, could be actively harmful if implemented poorly (e.g. stopping the frontrunners gives more time for laggards to catch up, leading to more players in the race if AI development ends up resuming before alignment is solved), and distracts from the suite of concrete technical and governance agendas that we could be implementing."
I agree that what Will is saying literally here is that "making it illegal for AI companies to charge ahead as they've been doing is extremely unlikely to happen, and probably counterproductive". I think this is indeed a wrong statement that implies a kind of crazy worldview. I also think it's very unlikely what Will meant to say.
I think what Will meant to say is something like "the proposal in the book, which I read as trying to ban AGI development, right now, globally, using relatively crude tools like banning anyone from having more than 8 GPUs, is extremely unlikely to happen and the kind of thing that could easily backfire".
I think the latter is a much more reasonable position, and I think it does not imply most of the things you say Will must believe in this response. My best guess is that Will is in favor of regulation that allows slowing things down, in favor of compute monitoring, and even in favor of conditional future pauses. The book does talk about them, and I find Will's IMO kind of crazily dismissive engagement with these proposals pretty bad, but I do think you are just leaning far too much on a very literal interpretation of what Will said in a way that I think is unproductive.
(I dislike Will's review for a bunch of other reasons, which includes his implicit mischaracterization of the policies proposed in the book, but my response would look very different than this post)

Response to Aschenbrenner's "Situational Awareness"

RobBensinger · 2y · 9

Leopold's scenario requires that the USG come to deeply understand all the perils and details of AGI and ASI (since they otherwise don't have a hope of building and aligning a superintelligence), but then needs to choose to gamble its hegemony, its very existence, and the lives of all its citizens on a half-baked mad science initiative, when it could simply work with its allies to block the tech's development and maintain the status quo at minimal risk.

Success in this scenario requires a weird combination of USG prescience with self-destructiveness: enough foresight to see what's coming, but paired with a weird compulsion to race to build the very thing that puts its existence at risk, when it would potentially be vastly easier to spearhead an international alliance to prohibit this technology.

Response to Aschenbrenner's "Situational Awareness"

RobBensinger · 2y · 20

Three high-level reasons I think Leopold's plan looks a lot less workable:

- It requires major scientific breakthroughs to occur on a very short time horizon, including unknown breakthroughs that will manifest to solve problems we don't understand or know about today.
- These breakthroughs need to come in a field that has not been particularly productive or fast in the past. (Indeed, forecasters have been surprised by how slowly safety/robustness/etc. have progressed in recent years, and simultaneously surprised by the breakneck speed of capabilities.)
- It requires extremely precise and correct behavior by a giant government bureaucracy that includes many staff who won't be the best and brightest in the field -- inevitably, many technical and nontechnical people in the bureaucracy will have wrong beliefs about AGI and about alignment.

The "extremely precise and correct behavior" part means that we're effectively hoping to be handed an excellent bureaucracy that will rapidly and competently solve a thirty-digit combination lock requiring the invention of multiple new fields and the solving of a variety of thorny and poorly-understood technical problems -- all in a handful of years.

(It also requires that various empirical predictions all pan out. E.g., Leopold could do everything right and get the USG fully on board and get the USG doing literally everything right by his lights -- and then the plan ends up destroying the world rather than saving it because it turned out ASI was a lot more compute-efficient to train than he expected, resulting in the USG being unable to monopolize the tech and unable to achieve a sufficiently long lead time.)

My proposal doesn't require qualitatively that kind of success. It requires governments to coordinate on banning things. Plausibly, it requires governments to overreact to a weird, scary, and publicly controversial new tech to some degree, since it's unlikely that governments will exactly hit the target we want. This is not a particularly weird ask; governments ban things (and coordinate or copy-paste each other's laws) all the time, in far less dangerous and fraught areas than AGI. This is "trying to get the international order to lean hard in a particular direction on a yes-or-no question where there's already a lot of energy behind choosing 'no'", not "solving a long list of hard science and engineering problems in a matter of months and weeks and getting a bureaucracy to output the correct long string of digits to nail down all the correct technical solutions and all the correct processes to find those solutions".

The CCP's current appetite for AGI seems remarkably small, and I expect them to be more worried that an AGI race would leave them in the dust (and/or put their regime at risk, and/or put their lives at risk), than excited about the opportunity such a race provides. Governments around the world currently, to the best of my knowledge, are nowhere near the cutting edge in ML. From my perspective, Leopold is imagining a future problem into being ("all of this changes") and then trying to find a galaxy-brained incredibly complex and assumption-laden way to wriggle out of this imagined future dilemma, when the far easier and less risky path would be to not have the world powers race in the first place, have them recognize that this technology is lethally dangerous (something the USG chain of command, at least, would need to fully internalize on Leopold's plan too), and have them block private labs from sending us over the precipice (again, something Leopold assumes will happen) while not choosing to take on the risk of destroying themselves (nor permitting other world powers to unilaterally impose that risk).

Response to Aschenbrenner's "Situational Awareness"

RobBensinger · 2y · 20

I think it's still good for some people to work on alignment research. The future is hard to predict, and we can't totally rule out a string of technical breakthroughs, and the overall option space looks gloomy enough (at least from my perspective) that we should be pursuing multiple options in parallel rather than putting all our eggs in one basket.

That said, I think "alignment research pans out to the level of letting us safely wield vastly superhuman AGI in the near future" is sufficiently unlikely that we definitely shouldn't be predicating our plans on that working out. AFAICT Leopold's proposal is that we just lie down and die in the worlds where we can't align vastly superhuman AI, in exchange for doing better in the worlds where we can align it; that seems extremely reckless and backwards to me, throwing away higher-probability success worlds in exchange for more niche and unlikely success worlds.

I also think alignment researchers thus far, as a group, have mainly had the effect of shortening timelines. I want alignment research to happen, but not at the cost of reducing our hope in the worlds where alignment doesn't pan out, and thus far a lot of work labeled "alignment" has either seemed to accelerate the field toward AGI, or seemed to provide justification/cover for increasing the heat and competitiveness of the field, which seems pretty counterproductive to me.

Personal reflections on FTX

RobBensinger · 2y · 4

Fair! That's at least a super nonstandard example of an "opinion poll".

Personal reflections on FTX

RobBensinger · 2y · 7

There’s a knock against prediction markets, here, too. A Metaculus forecast, in March of 2022 (the end of the period when one could make forecasts on this question), gave a 1.3% chance of FTX making any default on customer funds over the year. The probability that the Metaculus forecasters would have put on the claim that FTX would default on very large numbers of customer funds, as a result of misconduct, would presumably have been lower.

Metaculus isn't a prediction market; it's just an opinion poll of people who use the Metaculus website.

Personal reflections on FTX

RobBensinger · 2y · 60

Since writing that post, though, I now lean more towards thinking that someone should “own” managing the movement, and that that should be the Centre for Effective Altruism.

I agree with this. Failing that, I feel strongly that CEA should change its name. There are costs to having a leader / manager / "coordinator-in-chief", and costs to not having such an entity; but the worst of both worlds is to have ambiguity about whether a person or org is filling that role. Then you end up with situations like "a bunch of EAs sit on their hands because they expect someone else to respond, but no one actually takes the wheel", or "an org gets the power of perceived leadership, but has limited accountability because it's left itself a lot of plausible deniability about exactly how much of a leader it is".

Quick Update on Leaving the Board of EV

RobBensinger · 2y · 49

Update Apr. 15: I talked to a CEA employee and got some more context on why CEA hasn't done an SBF investigation and postmortem. In addition to the 'this might be really difficult and it might not be very useful' concern, they mentioned that the Charity Commission investigation into EV UK is still ongoing a year and a half later. (Google suggests that statutory inquiries by the Charity Commission take an average of 1.2 years to complete, so the super long wait here is sadly normal.)

Although the Commission has said "there is no indication of wrongdoing by the trustees at this time", and the risk of anything crazy happening is lower now than it was a year and a half ago, I gather that it's still at least possible that the Commission could take some drastic action like "we think EV did bad stuff, so we're going to take over the legal entity that includes the UK components of CEA, 80K, GWWC, GovAI, etc.", which may make it harder for CEA to usefully hold the steering wheel on an SBF investigation at this stage.

Example scenario: CEA tries to write up some lessons learned from the SBF thing, with an EA audience in mind; EAs tend to have unusually high standards, and a CEA staffer writes a comment that assumes this context, without running the comment by lawyers because it seemed innocent enough; because of those high standards, the Charity Commission misreads the CEA employee as implying a way worse thing happened than is actually the case.

This particular scenario may not be a big risk, but the sum of the risk of all possible scenarios like that (including scenarios that might not currently be on their radar) seems non-negligible to the CEA person I spoke to, even though they don't think there's any info out there that should rationally cause the Charity Commission to do anything wild here. And trying to do serious public reflection or soul-searching while also carefully nitpicking every sentence for possible ways the Charity Commission could misinterpret something, doesn't seem like an optimal set-up for deep, authentic, and productive soul-searching.

The CEA employee said that they thought this was one reason (but not the only reason) EV is unlikely to run a postmortem of this kind.

My initial thoughts on all this: This is very useful info! I had no idea the Charity Commission investigation was still ongoing, and if there are significant worries about that, that does indeed help make CEA and EV’s actions over the last year feel a lot less weird-and-mysterious to me.

I’m not sure I agree with CEA or EV’s choices here, but I no longer feel like there’s a mystery to be explained here; this seems like a place where reasonable people can easily disagree about what the right strategy is. I don't expect the Charity Commission to in fact take over those organizations, since as far as I know there's no reason to do that, but I can see how this would make it harder for CEA to do a soul-searching postmortem.

I do suspect that EV and/or CEA may be underestimating the costs of silence here. I could imagine a frog-boiling problem arising here, where it made sense to delay a postmortem for a few months based on a relatively small risk of disaster (and a hope that the Charity Commission investigation in this case might turn out to be brief), but it may not make sense to continue to delay in this situation for years on end. Both options are risky; I suspect the risks of inaction and silence may be getting systematically under-weighted here. (But it’s hard to be confident when I don’t know the specifics of how these decisions are being made.)

I ran the above by Oliver Habryka, who said:

“I talked to a CEA employee and got some more context on why CEA hasn't done an SBF investigation and postmortem.”
Seems like it wouldn't be too hard for them to just advocate for someone else doing it?
Or to just have whoever is leading the investigation leave the organization.
In general it seems to me that an investigation is probably best done in a relatively independent vehicle anyways, for many reasons.
“My thoughts on all this: This is very useful info! I had no idea the Charity Commission investigation was still ongoing, and that does indeed help make CEA and EV’s actions over the last year feel a lot less weird-and-mysterious to me.”
Agree that this is an important component (and a major component for my models).

I have some information suggesting that maybe Oliver and/or the CEA employee's account is wrong, or missing part of the story?? But I'm confused about the details, so I'll look into things more and post an update here if I learn more.

RobBensinger

Posts 38

Sequences 1

Comments 585

Topic contributions 2