Working on behavioral research, AI governance, compute governance. Previously an IAPS Fellow, Brown.
Happy to chat about anything, just reach out.
I wrote some criticism in this comment. Mainly, I argue that
(1) A pause could be undesirable. A pause could be net-negative in expectation (with high variance depending on implementation specifics), and PauseAI should take this concern more seriously.
(2) Fighting doesn't necessarily bring you closer to winning. PauseAI's approach *could* be counterproductive even for the aim of achieving a pause, whether or not it's desirable. From my comment:
Although the analogy of war is compelling and lends itself well to your post's argument, in politics fighting often does not get one closer to winning. Putting up a bad fight may be worse than putting up no fight at all. If the goal is winning (instead of just putting up a fight), then taking criticism to your fighting style seriously should be paramount.
This is a valuable post, but I don't think it engages with a lot of the concern about PauseAI advocacy. I have two main reasons why I broadly disagree:
1. A pause could be undesirable.
AI safety is an area with a lot of uncertainty. Importantly, this uncertainty isn't merely about the nature of the risks but about the impact of potential interventions.
Of all interventions, pausing AI development is, some think, a particularly risky one. There are dangers like:
People at PauseAI are probably less concerned about the above (or more concerned about model autonomy, catastrophic risks, and short timelines).
Although you may have felt that you did your "scouting" work and arrived at a position worth defending as a warrior, others' comparably thorough scouting work has led them to a different position. Their opposition to your warrior-like advocacy, then, may not come (as your post suggests) from a purist notion that we should preserve elite epistemics at the cost of impact, but from a fundamental disagreement about the desirability of the consequences of a pause (or other policies), or of advocacy for a pause.
If our shared goal is the clichéd securing-benefits-and-minimizing-risks, or even just minimizing risks, one should be open to thoughtful colleagues' input that one's actions may be counterproductive to that end-goal.
2. Fighting does not necessarily get one closer to winning.
Although the analogy of war is compelling and lends itself well to your post's argument, in politics fighting often does not get one closer to winning. Putting up a bad fight may be worse than putting up no fight at all. If the goal is winning (instead of just putting up a fight), then taking criticism to your fighting style seriously should be paramount.
I still concede that a lot of people dismiss PauseAI merely because they see it as cringe. But I don't think this is the core of most thoughtful people's criticism.
To be very clear, I'm not saying that PauseAI people are wrong, or that a pause will always be undesirable, or that they are using the wrong methods. I am responding to
(1) the feeling that this post dismissed criticism of PauseAI without engaging with object-level arguments, and the feeling that this post wrongly ascribed outside criticism to epistemic purism and a reluctance to "do the dirty work," and
(2) the idea that the scout-work is "done" already and an AI pause is currently desirable. (I'm not sure I'm right here at all, but I have reasons [above] to think that PauseAI shouldn't be so sure either.)
Sorry for not editing this better; I wanted to write it quickly. I welcome people's responses, though I may not be able to reply to them!
A few quick ideas:
1. On the methods side, I find the potential use of LLMs/AI as research participants in psychology studies interesting (not necessarily related to safety). This may sound ridiculous at first but I think the studies are really interesting.
From my post on studying AI-nuclear integration with methods from psychology:
[Using] LLMs as participants in a survey experiment, something that is seeing growing interest in the social sciences (see Manning, Zhu, & Horton, 2024; Argyle et al., 2023; Dillion et al., 2023; Grossmann et al., 2023).
2. You may be interested in, or get good ideas from, the Large Language Model Psychology research agenda (safety-focused). I haven't gone into it, so this is not an endorsement.
3. Then you have comparative analyses of human and LLM behavior. E.g. the Human vs. Machine paper (Lamparth, 2024) compares humans' and LLMs' decision-making in a wargame. I do something similar with a nuclear decision-making simulation, but it's not in paper/preprint form yet.
Thanks Clare! Your comment was super informative and thorough.
One thing that I would lightly dispute is that 360 feedback is easily gameable. I (anecdotally) feel like people with malevolent traits (“psychopaths” here) often have trouble remaining “undiscovered” and so have to constantly move or change social circles.
Of course, almost by definition I wouldn’t know any psychopaths that are still undiscovered. But 360 feedback could help discover the “discoverable” subgroup, since the test is not easily gameable by them.
Any thoughts?
Glad it was useful!
I don't feel qualified to give advice on teaching a language to small kids, although I do have a few thoughts. Please take them with a grain of salt, as I've never done this.
I'm assuming you mean your kids, not kids in a classroom? If this is the case:
That's all I could think of. That said, I think a quick Google/YouTube search might uncover much more valuable guidance on this!
This is missing the point of my 2nd argument. It sure sounds better to "fight and lose than roll over and die."
But I'm saying that "fighting" in the way that PauseAI is "fighting" could make it more likely that you lose.
Not saying "fighting" in general will have this effect. Or that this won't ever change. Or that I'm confident about this. Just saying: take criticism seriously, acknowledge the uncertainty, don't rush into action just because you want to do something.
Unrelated to my argument: Not sure what you mean by "high probability," but I'd take a combination of these views as a reasonable prior: XPT.