Erich_Grunewald 🔸

Senior Researcher @ Institute for AI Policy and Strategy
2823 karma · Joined · Working (6-15 years) · Berlin, Germany · www.erichgrunewald.com

Bio

Anything I write here is written purely on my own behalf, and does not represent my employer's views (unless otherwise noted).

Comments
310

If you thought Yudkowsky and Soares used overly confident language and would have taken the "QED" as further evidence of that, then the fact that this particular example turns out not to have been written by Yudkowsky and Soares is some evidence against your hypothesis. But instead of updating away a little, you seem to have dismissed that evidence and doubled down. (I think you originally replied to the original comment approvingly, or at least non-critically, and then deleted that reply after I responded to it, but I could be misremembering.)

For what it's worth, I think you're right that Yudkowsky at least uses overly confident language sometimes -- or I should say, is overly confident sometimes, because I think his language generally reflects his beliefs -- but I would've been surprised to see him use "QED" in that way, which is why I reacted to the original comment here with skepticism and checked whether "QED" actually appeared in the book (it didn't). I take that to mean I was better calibrated than anyone who did not react that way.

Interesting!

Given that these failures were predictable, it should be possible to systematically predict many analogous failures that might result from training AI systems on specific data sets or (simulated) environments.

Your framework seems to work for simple cases like "ice cream, sucralose, or sex with contraception", but I don't think it works for more complex cases like "peacocks would like giant colorful tails"?

There is also so much human behaviour that would have been essentially impossible to predict just from first principles and natural selection under constraints: poetry, chess playing, comedy, monasticism, sports, philosophy, effective altruism. These behaviours seem further removed from your detectors for instrumentally important subgoals, and/or to have a more complex relationship to those detectors, but they're still widespread and important parts of human life. This seems to support the argument that the relationship between how a mind was evolved (e.g., by natural selection) and what it ends up wanting is unpredictable, possibly in dangerous ways.

Your model might still tell us that generalisation failures are very likely to occur, even if, as I am suggesting, it can't predict many of the specific ways things will misgeneralise. Still, I'm not sure this offers much practical guidance when trying to develop safer AI systems. But maybe I'm wrong about that?

Who are "they"? If you mean Yudkowsky and Soares, "QED" is something that Hanson (the author of this critique) includes in his paraphrase of Yudkowsky and Soares, but I don't think it's anything Yudkowsky and Soares wrote in their book. The quoted argument is not actually a quote, but a paraphrase.

For what it's worth, while the "funness" of AI safety research -- maybe especially technical AI safety research -- is probably a factor in how many people are interested in working on it, I would be surprised if it's a factor in how much money is allocated to the field.

(To be clear, I do think many of these charities do some good and are run with the best of intentions, etc. But I still also stand by the statement in the parent comment.)

That is the most PR-optimized list of donations I have ever seen in my life.

Thanks for sharing this. I did an Erasmus exchange year in Italy in 2010-11 that was very important for my personal growth, although it was not particularly beneficial professionally or academically.

Nice work!

On AI chip smuggling, instead of the report you listed, which is now somewhat outdated, I recommend reading Countering AI Chip Smuggling Has Become a National Security Priority, which is essentially a Pareto improvement over the older one.

I also think Chris Miller's How US Export Controls Have (and Haven't) Curbed Chinese AI provides a good overview of the AI chip export controls, and it is still quite up-to-date.
