R

RobBensinger

@ Machine Intelligence Research Institute
8361 karmaJoined Berkeley, CA, USA

Sequences
1

Late 2021 MIRI Conversations

Comments
584

Topic contributions
2

Copying over my response from LW:

I wasn't exclusively looking at that line; I was also assuming that if Will liked some of the book's core policy proposals but disliked others, then he probably wouldn't have expressed such a strong a blanket rejection. And I was looking at Will's proposal here:

[IABIED skips over] what I see as the crucial period, where we move from the human-ish range to strong superintelligence[1]. This is crucial because it’s both the period where we can harness potentially vast quantities of AI labour to help us with the alignment of the next generation of models, and because it’s the point at which we’ll get a much better insight into what the first superintelligent systems will be like. The right picture to have is not “can humans align strong superintelligence”, it’s “can humans align or control AGI-”, then “can {humans and AGI-} align or control AGI” then “can {humans and AGI- and AGI} align AGI+” and so on.

This certainly sounds like a proposal that we advance AI as fast as possible, so that we can reach the point where productive alignment research is possible sooner.

The next paragraph then talks about "a gradual ramp-up to superintelligence", which makes it sound like Will at least wants us to race to the level of superintelligence as quickly as possible, i.e., he wants the chain of humans-and-AIs-aligning-stronger-AIs to go at least that far:

Elsewhere, EY argues that the discontinuity question doesn’t matter, because preventing AI takeover is still a ‘first try or die’ dynamic, so having a gradual ramp-up to superintelligence is of little or no value. I think that’s misguided.

... Unless he thinks this "gradual ramp-up" should be achieved via switching over at some point from the natural continuous trendlines he expects from industry, to top-down government-mandated ratcheting up of a capability limit? But I'd be surprised if that's what he had in mind, given the rest of his comment.

Wanting the world to race to build superintelligence as soon as possible also seems like it would be a not-that-surprising implication of his labs-have-alignment-in-the-bag claims.

And although it's not totally clear to me how seriously he's taking this hypothetical (versus whether he mainly intends it as a proof of concept), he does propose that we could build a superintelligent paperclip maximizer and plausibly be totally fine (because it's risk averse and promise-keeping), and his response to "Maybe we won't be able to make deals with AIs?" is:

I agree that’s a worry; but then the right response is to make sure that we can. 

Not "in that case maybe we shouldn't build a misaligned superintelligence", but "well then we'd sure better solve the honesty problem!".

All of this together makes me extremely confused if his real view is basically just "I agree with most of MIRI's policy proposals but I think we shouldn't rush to enact a halt or slowdown tomorrow".

If his view is closer to that, then that's great news from my perspective, and I apologize for the misunderstanding. I was expecting Will to just straightforwardly accept the premises I listed, and for the discussion to proceed from there.

I'll add a link to your comment at the top of the post so folks can see your response, and if Will clarifies his view I'll link to that as well.

Twitter says that Will's tweet has had over a hundred thousand views, so if he's a lot more pro-compute-governance, pro-slowdown, and/or pro-halt than he sounded in that message, I hope he says loud stuff in the near future to clarify his views to folks!

Oliver gave an argument for "this misrepresents Will's views" on LessWrong, saying:

I currently think this is putting too much weight on a single paragraph in Will's review. The paragraph is: 

[IABIED:] "All over the Earth, it must become illegal for AI companies to charge ahead in developing artificial intelligence as they’ve been doing."

[Will:] "The positive proposal is extremely unlikely to happen, could be actively harmful if implemented poorly (e.g. stopping the frontrunners gives more time for laggards to catch up, leading to more players in the race if AI development ends up resuming before alignment is solved), and distracts from the suite of concrete technical and governance agendas that we could be implementing."

I agree that what Will is saying literally here is that "making it illegal for AI companies to charge ahead as they've been doing is extremely unlikely to happen, and probably counterproductive". I think this is indeed a wrong statement that implies a kind of crazy worldview. I also think it's very unlikely what Will meant to say. 

I think what Will meant to say is something like "the proposal in the book, which I read as trying to ban AGI development, right now, globally, using relatively crude tools like banning anyone from having more than 8 GPUs, is extremely unlikely to happen and the kind of thing that could easily backfire". 

I think the latter is a much more reasonable position, and I think does not imply most of the things you say Will must believe in this response. My best guess is that Will is in favor of regulation that allows slowing things down, in favor of compute monitoring, and even in favor of conditional future pauses. The book does talk about them, and I find Will's IMO kind of crazily dismissive engagement with these proposals pretty bad, but I do think you are just leaning far too much on a very literal interpretation of what Will said in a way that I think is unproductive.

(I dislike Will's review for a bunch of other reasons, which includes his implicit mischaracterization of the policies proposed in the book, but my response would look very different than this post)

Leopold's scenario requires that the USG come to deeply understand all the perils and details of AGI and ASI (since they otherwise don't have a hope of building and aligning a superintelligence), but then needs to choose to gamble its hegemony, its very existence, and the lives of all its citizens on a half-baked mad science initiative, when it could simply work with its allies to block the tech's development and maintain the status quo at minimal risk.

Success in this scenario requires a weird combination of USG prescience with self-destructiveness: enough foresight to see what's coming, but paired with a weird compulsion to race to build the very thing that puts its existence at risk, when it would potentially be vastly easier to spearhead an international alliance to prohibit this technology.

Three high-level reasons I think Leopold's plan looks a lot less workable:

  1. It requires major scientific breakthroughs to occur on a very short time horizon, including unknown breakthroughs that will manifest to solve problems we don't understand or know about today.
  2. These breakthroughs need to come in a field that has not been particularly productive or fast in the past. (Indeed, forecasters have been surprised by how slowly safety/robustness/etc. have progressed in recent years, and simultaneously surprised by the breakneck speed of capabilities.)
  3. It requires extremely precise and correct behavior by a giant government bureaucracy that includes many staff who won't be the best and brightest in the field -- inevitably, many technical and nontechnical people in the bureaucracy will have wrong beliefs about AGI and about alignment.

The "extremely precise and correct behavior" part means that we're effectively hoping to be handed an excellent bureaucracy that will rapidly and competently solve a thirty-digit combination lock requiring the invention of multiple new fields and the solving of a variety of thorny and poorly-understood technical problems -- all in a handful of years.

(It also requires that various empirical predictions all pan out. E.g., Leopold could do everything right and get the USG fully on board and get the USG doing literally everything right by his lights -- and then the plan ends up destroying the world rather than saving it because it turned out ASI was a lot more compute-efficient to train than he expected, resulting in the USG being unable to monopolize the tech and unable to achieve a sufficiently long lead time.)

My proposal doesn't require qualitatively that kind of success. It requires governments to coordinate on banning things. Plausibly, it requires governments to overreact to a weird, scary, and publicly controversial new tech to some degree, since it's unlikely that governments will exactly hit the target we want. This is not a particularly weird ask; governments ban things (and coordinate or copy-paste each other's laws) all the time, in far less dangerous and fraught areas than AGI. This is "trying to get the international order to lean hard in a particular direction on a yes-or-no question where there's already a lot of energy behind choosing 'no'", not "solving a long list of hard science and engineering problems in a matter of months and weeks and getting a bureaucracy to output the correct long string of digits to nail down all the correct technical solutions and all the correct processes to find those solutions".

The CCP's current appetite for AGI seems remarkably small, and I expect them to be more worried that an AGI race would leave them in the dust (and/or put their regime at risk, and/or put their lives at risk), than excited about the opportunity such a race provides. Governments around the world currently, to the best of my knowledge, are nowhere near the cutting edge in ML. From my perspective, Leopold is imagining a future problem into being ("all of this changes") and then trying to find a galaxy-brained incredibly complex and assumption-laden way to wriggle out of this imagined future dilemma, when the far easier and less risky path would be to not have the world powers race in the first place, have them recognize that this technology is lethally dangerous (something the USG chain of command, at least, would need to fully internalize on Leopold's plan too), and have them block private labs from sending us over the precipice (again, something Leopold assumes will happen) while not choosing to take on the risk of destroying themselves (nor permitting other world powers to unilaterally impose that risk).

I think it's still good for some people to work on alignment research. The future is hard to predict, and we can't totally rule out a string of technical breakthroughs, and the overall option space looks gloomy enough (at least from my perspective) that we should be pursuing multiple options in parallel rather than putting all our eggs in one basket.

That said, I think "alignment research pans out to the level of letting us safely wield vastly superhuman AGI in the near future" is sufficiently unlikely that we definitely shouldn't be predicating our plans on that working out. AFAICT Leopold's proposal is that we just lay down and die in the worlds where we can't align vastly superhuman AI, in exchange for doing better in the worlds where we can align it; that seems extremely reckless and backwards to me, throwing away higher-probability success worlds in exchange for more niche and unlikely success worlds.

I also think alignment researchers thus far, as a group, have mainly had the effect of shortening timelines. I want alignment research to happen, but not at the cost of reducing our hope in the worlds where alignment doesn't pan out, and thus far a lot of work labeled "alignment" has either seemed to accelerate the field toward AGI, or seemed to provide justification/cover for increasing the heat and competitiveness of the field, which seems pretty counterproductive to me.

Fair! That's at least a super nonstandard example of an "opinion poll".

There’s a knock against prediction markets, here, too. A Metaculus forecast, in March of 2022 (the end of the period when one could make forecasts on this question), gave a 1.3% chance of FTX making any default on customer funds over the year. The probability that the Metaculus forecasters would have put on the claim that FTX would default on very large numbers of customer funds, as a result of misconduct, would presumably have been lower.

Metaculus isn't a prediction market; it's just an opinion poll of people who use the Metaculus website.

Since writing that post, though, I now lean more towards thinking that someone should “own” managing the movement, and that that should be the Centre for Effective Altruism.

I agree with this. Failing that, I feel strongly that CEA should change its name. There are costs to having a leader / manager / "coordinator-in-chief", and costs to not having such an entity; but the worst of both worlds is to have ambiguity about whether a person or org is filling that role. Then you end up with situations like "a bunch of EAs sit on their hands because they expect someone else to respond, but no one actually takes the wheel", or "an org gets the power of perceived leadership, but has limited accountability because it's left itself a lot of plausible deniability about exactly how much of a leader it is".

Update Apr. 15:  I talked to a CEA employee and got some more context on why CEA hasn't done an SBF investigation and postmortem. In addition to the 'this might be really difficult and it might not be very useful' concern, they mentioned that the Charity Commission investigation into EV UK is still ongoing a year and a half later. (Google suggests that statutory inquiries by the Charity Commission take an average of 1.2 years to complete, so the super long wait here is sadly normal.)

Although the Commission has said "there is no indication of wrongdoing by the trustees at this time", and the risk of anything crazy happening is lower now than it was a year and a half ago, I gather that it's still at least possible that the Commission could take some drastic action like "we think EV did bad stuff, so we're going to take over the legal entity that includes the UK components of CEA, 80K, GWWC, GovAI, etc.", which may make it harder for CEA to usefully hold the steering wheel on an SBF investigation at this stage.

Example scenario: CEA tries to write up some lessons learned from the SBF thing, with an EA audience in mind; EAs tend to have unusually high standards, and a CEA staffer writes a comment that assumes this context, without running the comment by lawyers because it seemed innocent enough; because of those high standards, the Charity Commission misreads the CEA employee as implying a way worse thing happened than is actually the case.

This particular scenario may not be a big risk, but the sum of the risk of all possible scenarios like that (including scenarios that might not currently be on their radar) seems non-negligible to the CEA person I spoke to, even though they don't think there's any info out there that should rationally cause the Charity Commission to do anything wild here. And trying to do serious public reflection or soul-searching while also carefully nitpicking every sentence for possible ways the Charity Commission could misinterpret something, doesn't seem like an optimal set-up for deep, authentic, and productive soul-searching.

The CEA employee said that they thought this is one reason (but not the only reason) EV is unlikely to run a postmortem of this kind.

 

My initial thoughts on all this: This is very useful info! I had no idea the Charity Commission investigation was still ongoing, and if there are significant worries about that, that does indeed help make CEA and EV’s actions over the last year feel a lot less weird-and-mysterious to me.

I’m not sure I agree with CEA or EV’s choices here, but I no longer feel like there’s a mystery to be explained here; this seems like a place where reasonable people can easily disagree about what the right strategy is. I don't expect the Charity Commission to in fact take over those organizations, since as far as I know there's no reason to do that, but I can see how this would make it harder for CEA to do a soul-searching postmortem.

I do suspect that EV and/or CEA may be underestimating the costs of silence here. I could imagine a frog-boiling problem arising here, where it made sense to delay a postmortem for a few months based on a relatively small risk of disaster (and a hope that the Charity Commission investigation in this case might turn out to be brief), but it may not make sense to continue to delay in this situation for years on end. Both options are risky; I suspect the risks of inaction and silence may be getting systematically under-weighted here. (But it’s hard to be confident when I don’t know the specifics of how these decisions are being made.)

 

I ran the above by Oliver Habryka, who said:

“I talked to a CEA employee and got some more context on why CEA hasn't done an SBF investigation and postmortem.”

Seems like it wouldn't be too hard for them to just advocate for someone else doing it?

Or to just have whoever is leading the investigation leave the organization.

In general it seems to me that an investigation is probably best done in a relatively independent vehicle anyways, for many reasons.

“My thoughts on all this: This is very useful info! I had no idea the Charity Commission investigation was still ongoing, and that does indeed help make CEA and EV’s actions over the last year feel a lot less weird-and-mysterious to me.”

Agree that this is an important component (and a major component for my models).


I have some information suggesting that maybe Oliver and/or the CEA employee's account is wrong, or missing part of the story?? But I'm confused about the details, so I'll look into things more and post an update here if I learn more.

I feel like "people who worked with Sam told people about specific instances of quite serious dishonesty they had personally observed" is being classed as "rumour" here, which whilst not strictly inaccurate, is misleading, because it is a very atypical case relative to the image the word "rumour" conjures.

I agree with this.

[...] I feel like we still want to know if any one in leadership argued "oh, yeah, Sam might well be dodgy, but the expected value of publicly backing him is high because of the upside". That's a signal someone is a bad leader in my view, which is useful knowledge going forward.

I don't really agree with this. Everyone has some probability of turning out to be dodgy; it matters exactly how strong the available evidence was. "This EA leader writes people off immediately when they have even a tiny probability of being untrustworthy" would be a negative update about the person's decision-making too!

Load more