Duncan Sabien

Comments

I disagree re: motte and bailey; the above is not at all in conflict with the position of the book (which, to be clear, I endorse and agree with and is also my position).

re: "you can imagine," I strongly encourage people to be careful about leaning too hard on their own ability to imagine things; it's often fraught and a huge chunk of the work MIRI does is poking at those imaginings to see where they collapse.  

I'll note that core MIRI predictions about e.g. how machines will be misaligned at current levels of sophistication are being borne out—things we have been saying for years about e.g. emergent drives and deception and hacking and brittle proxies.  I'm pretty sure that's not "rooted in the actual nuts and bolts details" in the way you're wanting, but it still feels ... relevant.

Noting that this is more "opinion of an employee" than "the position of MIRI overall"—I've held a variety of positions within the org and can't speak for e.g. Nate or Eliezer or Malo:

  • The Agent Foundations team feels, to me, like it was a slam dunk at the time; the team produced a ton of good research and many of their ideas have become foundational to discussions of agency in the broader AI sphere
  • The book feels like a slam dunk
  • The research push of 2020/2021 (that didn't pan out) feels to me like it was absolutely the right bet, but resulted in (essentially) nothing; it was an ambitious, many-person project for a speculative idea that had a shot at being amazing.

I think it's hard to generalize lessons, because various projects are championed by various people and groups within the org ("MIRI" is nearly a ship of Theseus).  But some very basic lessons include:

  • Things pretty much only have a shot at all when there are people with a clear and ambitious vision/when there's an owner
  • When we say to ourselves "this has an X% chance of working out" we seem to be actually pretty calibrated
  • As one would expect, smaller projects and clearer projects work out more frequently than larger or vaguer ones

(Sorry, that feels sort of useless, but.)

From my limited perspective/to the best of my ability to see and describe, budget is essentially allocated in an "Is this worth doing?  If so, how do we find the resources to make it work?" sense.  MIRI's funding situation has always been pretty odd; we don't usually have a pie that must be divided up carefully so much as a core administrative apparatus that needs to be continually funded + a preexisting pool of resources that can be more or less freely allocated + a sense that there are allies out there who are willing to fund specific projects if we fall short and want to make a compelling pitch.

Unfortunately, I can't really draw analogies that help an outsider evaluate future projects.  We're intending to try stuff that's different from anything we've tried before, which means it's hard to draw on the past (except insofar as the book and surrounding publicity were also something we'd never tried before, so you can at least a little bit assess our ability to pivot and succeed at stuff outside our wheelhouse by looking at the book).

Oh, I agree that if one feels equipped to go actually look at the arguments, one doesn't need any argument-from-consensus.  This is just, like, "if you are going to defer, defer reasonably."  Thanks for your comment; I feel similarly/endorse.

Made a small edit to reflect.

Again speaking more for the broad audience:

"Some experts downvote Yudkowsky's standing to opine" is not a reasonable standard; some experts think vaccines cause autism.  You can usually find someone with credentials in a field who will say almost anything.

The responsible thing to do (EDIT: if you're deferring at all, as opposed to evaluating the situation for yourself) is to go look at the balance of what experts in a field are saying, and in this case, they're fairly split, with plenty of respected big names (including many who disagree with Eliezer on many questions) saying he knows enough of what he's talking about to be worth listening to.  I get that Yarrow is not convinced, but I trust Hinton, who has reservations of his own but not of the form "Eliezer should be dismissed out of hand for lack of some particular technical expertise."

Also: when the experts in a field are split, and the question is one of existential danger, the split itself is not reassuring.  Experts in nuclear physics do not drastically diverge in their predictions about what will happen inside a bomb or reactor, because we understand nuclear physics.  When experts in the field of artificial intelligence have wildly different predictions, and the disagreement cannot be conclusively resolved, that is a sign of looseness in everyone's understanding.  And when you ask normal people on the street "hey, if one expert says an invention will kill everyone, and another says it won't, and you ask the one who says it won't where their confidence comes from, and they say 'because I'm pretty sure we'll muddle our way through, with unproven techniques that haven't been invented yet, the risk of killing everyone is probably under 5%,' how do you feel?"

they tend to feel alarmed.

And that characterization is not uncharitable—the optimists in this debate do not have an actual concrete plan.  You can just go check.  It all ultimately boils down to handwaving and platitudes and "I'm sure we'll stay ahead of capabilities [for no explicable reason]."

And we're intentionally aiming at something that exceeds us along the very axis that led us to dominate the planet, so ... ?

Another way of saying this: it's very, very weird that the burden of proof on this brand-new and extremely powerful technology is "make an airtight case that it's dangerous" instead of "make an airtight case that it's a good idea."  Even a 50/50 shared burden would be better than the status quo.

I'll note that

In response to any sort of criticism or disagreement, Yudkowsky and other folks’ default response seems to be to fly into a rage and to try to attack or humiliate the person making the criticism/expressing the disagreement.

...seems false.

If deep learning doesn't change things, Yudkowsky/MIRI should explain why not.

Speaking in my capacity as someone who currently works for MIRI, but who emphatically does not understand all things that Eliezer Yudkowsky understands, and can't authoritatively represent him (or Nate, or the other advanced researchers at MIRI who are above my intellectual pay grade):

My own understanding is that Eliezer has, all along, for as long as I've known him and been following his work, been fairly agnostic as to questions of how AGI and ASI will be achieved, and what the underlying architectures of the systems will be.

I've often seen Eliezer say "I think X will not work" or "I think Y is less doomed than X," but in my experience it's always been with a sort of casual shrug and an attitude of "but of course these are very hard calls" and also with "and it doesn't really matter to the ultimate outcome except insofar as some particular architecture might make reliable alignment possible at all."

Eliezer's take (again, as I understand it) is something like "if you have a system that is intelligent enough and powerful enough to do the actual interesting work that humans want to do, such as end all wars and invent longevity technology and get us to the stars (and achieve these goals in the real world, which involves also being competent at things like persuasion and communication), then that system is going to be very, very, very hard to make safe.  It's going to be easier by many orders of magnitude to create systems that are capable of that level of sophisticated agency that don't care about human flourishing, than it will be to hit the narrow target of a sufficiently sophisticated system that also does in fact happen to care."

That's true regardless of whether you're working with deep learning or symbolic AI.  In fact, deep learning makes it worse—Eliezer was pointing at "even if you build this thing out of nuts and bolts that you thoroughly understand, alignment is a hard problem," and instead we have ended up in a timeline where the systems are grown rather than crafted, giving us even less reason to be confident or hopeful.

(This is a trend: people often misunderstand MIRI's attempts to underscore how hard the problem is as being concrete predictions about what will happen, cf. the era in which people were like, well, obviously any competent lab trying to build ASI will keep their systems airgapped and secure and have a very small number of trusted and monitored employees acting as intermediaries.  MIRI's response was to demonstrate how even in such a paradigm, a sufficiently sophisticated system would have little trouble escaping the box.  Now, all of the frontier labs routinely feed their systems the entire internet and let those systems interact with any human on Earth and in many cases let those systems write and deploy their own code with no oversight, and some people say "haha, look, MIRI was wrong."  Those people are confused.)

Symbolic AI vs. deep learning was never a crux, for Eliezer or the MIRI view.  It was a non-crucial sidebar in which Eliezer had some intuitions and guesses, some of which he was more confident about and others less confident, and some of those guesses turned out wrong, and none of that ever mattered to the larger picture.  The crucial considerations are the power/sophistication/intelligence of the system and the degree to which its true goals can be specified/pinned down; being wrong about whether deep learning or symbolic AI specifically could reach the required level of sophistication is mostly irrelevant.

One could argue "well, Eliezer proved himself incapable of predicting the future with those guesses!" but this would be, in my view, disingenuous.  Eliezer has long said, and continues to say, "look, guesses about how the board will look in the middle of the chess game are fraught, I'm willing to share my intuitions but they are far more likely to be wrong than right; it's hard to know what moves Stockfish will make or how the game will play out; what matters is that it's still easy to know with high confidence that Stockfish will win."

That claim was compelling to me in 2015, and it remains compelling to me in 2025, and the things that have happened in the world in the ten years in between have, on the whole, made the case for concern stronger rather than weaker.

To pull out one comment from your response below:

The author of the review's review does not demonstrate to me that they understand Collier's point.

...Collier's review does not even convincingly demonstrate that they read the book, since they get some extremely basic facts about it loudly, loudly wrong, in a manner that's fairly crucial for their criticisms.  I think that you should hold the reviewer and the review's reviewer to the same standard, rather than letting the person you agree with more off the hook.

Fair warning: I wrote this response less for Yarrow specifically and more for the benefit of the EA forum userbase writ large, so I'm not promising that I will engage much beyond this reply.  I might!  But I also might not.  I think I said the most important thing I had to say, in the above.

EDIT: oh, for more on how this:

In fact, there are plenty of reasons why the fact that AIs are grown and not crafted might cut against the MIRI argument. For one: The most advanced, generally capable AI systems around today are trained on human-generated text, encoding human values and modes of thought.

...is badly, badly wrong, see the supplemental materials for the book, particularly chapters 2, 3, and 4, which exhaustively addressed this point long before Collier ever made it, because we knew people would make it.  (It's also addressed in the book, but I guess Collier missed that in their haste to say a bunch of things it seems they already believed and were going to say regardless.)

(Speaking in my capacity as someone who currently works for MIRI)

I think the degree to which we withheld work from the public for fear of accelerating progress toward ASI might be a little overstated in the above.  We adopted a stance of closed-by-default research years ago for that reason, but that's not the reason we don't publish e.g. concrete and exhaustive lists of outputs and budgets.

We do publish some lists of some outputs, and we do publish some degree of budgetary breakdowns, in some years.

But mainly, we think of ourselves as asking for money from only one of the two kinds of donors.  MIRI feels that it's pretty important to maintain strategic and tactical flexibility, to be able to do a bunch of different weird things that we think each have a small chance of working out without exhaustive justification of each one, and to avoid the trap of focusing only on clearly legible short chains of this → that (as opposed to trying both legible and less-legible things).

(A colleague of mine once joked that "wages are for people who can demonstrate the value of their labor within a single hour; I can't do that, which is why I'm on a salary."  A similar principle applies here.)

In the past, funding MIRI led to outputs like our alignment research publications.  In the more recent past, funding MIRI has led to outputs like the work of our technical governance team, and the book (and its associated launch campaign and various public impacts).

That's enough for some donors—"If I fund these people, my money will go into various experiments that are all aimed at ameliorating existential risk from ASI, with a lean toward the sorts of things that no one else is trying, which means high variance and lots of stuff that doesn't pan out and the occasional home run."

Other donors are looking to more clearly purchase a specific known product, and those donors should rightly send fewer of their dollars to MIRI, because MIRI has never been and does not intend to ever be quite so clear and concrete and locked-in.

(One might ask "okay, well, why post on the EA forum, which is overwhelmingly populated by the other kind of donor, who wants to track the measurable effectiveness of their dollars?" and the answer is "mostly for the small number who are interested in MIRI-like efforts anyway, and also for historical reasons since the EA and rationality and AI safety communities share so much history."  Definitely we do not feel entitled to anyone's dollars, and the hesitations of any donor who doesn't want to send their money toward MIRI-like efforts are valid.)

Another way to think about this (imo) is "do you screen falsehoods immediately, such that none ever enter, or do you prune them later at leisure?"

Sometimes, assembling false things (such as rough approximations or heuristics!) can give you insight as to the general shape of a new Actually True thing, but discovering the new Actually True thing using only absolutely pure definite grounded vetted airtight parts would be way harder and wouldn't happen in expectation.

And if you're trying to (e.g.) go "okay, men are stronger than women, and adults are smarter than kids" and somebody interrupts to go "aCtUaLlY this is false" because they have a genuinely correct point about, e.g., the variance present in bell curves, and there being some specific women who are stronger than many men and some specific children who are smarter than many adults ... this whole thing just derails the central train of thought that was trying to go somewhere.

(And if the "aCtUaLlY" happens so reliably that you can viscerally feel it coming, as you start to type out your rough premises, you get demoralized before you even begin, close your draft, and go do something else instead.)

"This post is anonymised because I don't want to have to deal with the interpersonal consequences of the beliefs I hold; I don't want people knowing that I hold my beliefs, and would rather trick people into associating with me in ways they might not if they actually knew my true stance."

Selfish piggyback plug for the concept of sazen.

The essay itself is the argument for why EAs shouldn't steelman things like the TIME piece.

(I understand you're disagreeing with the essay and that's :thumbsup: but, like.)

If you set out to steelman things that were generated by a process antithetical to truth, what you end up with is something like [justifications for Christianity]; privileging-the-hypothesis is an unwise move.

If one has independent reasons to think that many of the major claims in the article are true, then I think the course most likely to not-mislead one is to follow those independent reasons, and not spend a lot of time anchored on words coming from a source that's pretty clearly not putting truth first on the priority list.
