
Erich_Grunewald 🔸

Senior Researcher @ Institute for AI Policy and Strategy
2832 karma · Joined · Working (6-15 years) · Berlin, Germany · www.erichgrunewald.com

Bio

Anything I write here is written purely on my own behalf, and does not represent my employer's views (unless otherwise noted).

Comments (312)

A much better order of operations would be to 1) try to negotiate with China to establish an international regulatory framework (plan A), with export controls and other measures being imposed as something that is explicitly linked to China not agreeing to that framework, in the same way sanctions on Russia are imposed explicitly because of its aggression against Ukraine, and 2) only if they refuse, try to crush them (plan B).

Maybe if you are President of the United States you can first try the one thing, and then the other. But from the perspective of an individual, you have to assume there's some probability of each of these plans (and other strategies) being executed, and that everything will be really messy (e.g., different actors having different strategies in mind, even within the US). Softening export controls seems like something you could do as part of executing Plan A, but as I mentioned above, it's very unclear to me whether unilaterally doing so makes Plan A more likely to be the chosen strategy, and it does likely make Plan B and Plan C go worse.

When political will in the US to try for plan A is lacking, I think waiting until circumstances make that plan realistic while preparing the groundwork for it is a better strategy than going straight ahead for plan B.

I think you're thinking people have more control over which strategy is adopted than I think they do? Or, what circumstances do you have in mind? Because waiting seems pretty costly.

But I think maybe the cruxiest bits are (a) I think export controls seem great in Plan B/C worlds, which seem much likelier than Plan A worlds, and (b) I think unilaterally easing export controls is unlikely to substantially affect the likelihood of Plan A happening (all else equal). It seems like you disagree with both, or at least with (b)?

I think if your sole objective is to enact a bilateral pause, then easing export controls may be the best option, or maybe not. It's pretty unclear to me how that shakes out; I could definitely also see unilateral concessions being quite detrimental (for reasons similar to those Peter mentions in the other comment).

But I would guess most of the people you are responding to think enacting a cooperative pause is some combination of very unlikely and/or undesirable, and also that export controls help a lot in the absence of such an agreement. The main ways export controls help in other plans are by giving the US more slack to (one can hope) spend on safety, and/or because superintelligence developed in the US would imo likely be safer (cf. this comment), and/or because imo US values are better, and this would likely be reflected in the AIs (cf. Claude versus Grok).

If you thought Yudkowsky and Soares used overly confident language and would have taken the "QED" as further evidence of that, but this particular example turns out not to have been written by Yudkowsky and Soares, that's some evidence against your hypothesis. But instead of updating away a little, you seemed to dismiss that evidence and double down. (I think you originally replied to the original comment approvingly, or at least non-critically, but then deleted that comment after I replied to it; I could be misremembering that, though.)

For what it's worth, I think you're right that Yudkowsky at least uses overly confident language sometimes -- or I should say, is overly confident sometimes, because I think his language generally reflects his beliefs -- but I would've been surprised to see him use "QED" in that way, which is why I reacted to the original comment here with skepticism and checked whether "QED" actually appeared in the book (it didn't). I take that to imply I was better calibrated than anyone who did not so react.

Interesting!

Given that these failures were predictable, it should be possible to systematically predict many analogous failures that might result from training AI systems on specific data sets or (simulated) environments.

Your framework seems to work for simple cases like "ice cream, sucralose, or sex with contraception", but I don't think it works for more complex cases like "peacocks would like giant colorful tails"?

There is also so much human behaviour that would have been essentially impossible to predict just from first principles and natural selection under constraints: poetry, chess playing, comedy, monasticism, sports, philosophy, effective altruism. These behaviours seem further removed from your detectors for instrumentally important subgoals, and/or to have a more complex relationship to those detectors, but they're still widespread and important parts of human life. This seems to support the argument that the relationship between how a mind was evolved (e.g., by natural selection) and what it ends up wanting is unpredictable, possibly in dangerous ways.

Your model might still tell us that generalisation failures are very likely to occur, even if, as I am suggesting, it can't predict many of the specific ways things will misgeneralise. But I'm not sure this offers much practical guidance when trying to develop safer AI systems. Then again, maybe I'm wrong about that?

Who are "they"? If you mean Yudkowsky and Soares, "QED" is something that Hanson (the author of this critique) includes in his paraphrase of Yudkowsky and Soares, but I don't think it's anything Yudkowsky and Soares wrote in their book. The quoted argument is not actually a quote, but a paraphrase.

For what it's worth, though the "funness" of AI safety research (maybe especially technical AI safety research) is probably a factor in determining how many people are interested in working on it, I would be surprised if it's a factor in determining how much money is allocated to the field.

(To be clear, I do think many of these charities do some good and are run with the best of intentions, etc. But I still also stand by the statement in the parent comment.)

That is the most PR-optimized list of donations I have ever seen in my life.
