As someone trying to start a social movement (PauseAI), I wish EAs were more understanding and forgiving of the fact that there isn't a great literature I can just follow. I feel confident that jumping in and finding my way was a good thing to do, because advocacy and activism were neglected angles on a very important problem.
Most of my thinking and decision-making with PauseAI US is based on my world model, not on specific beliefs about the efficacy of different practices or philosophies in other social movements. I expect local conditions, factors specific to the topic and landscape of AI, and organizational factors like my leadership style to matter more than which approach is "best" ceteris paribus.
This is totally spitballing, but doing anything that encourages modularity in the circuits (or perhaps at another level?) of the AIs and the ability to swap mind modules would be really good for interpretability.
Ever since this project, I've had a vague sense that genome architecture has something interesting to teach us about interpreting/predicting NNs, but I've never had a particularly useful insight from it. I love this book on it by Michael Lynch, if anyone's interested.
I've heard this idea of AI group selection floated a few times, but people used to say it was too computationally intensive. Now, who knows?
The closest biology the idea brings to mind is this paper showing that selecting chickens as groups leads to better overall yields (in factory farming :( ) for the reasons you predict: they aren't as aggressive or as stressed by crowding as the chickens that are individually selected for the biggest yields.
The examples of gradient hackers with positive effects seem like they could be following a pattern: "here's a sub-system doing something bad (e.g. transposons copying themselves incessantly), which the system needs to defend against, so the system finds a way to defend itself (e.g. introns) that carries other (maybe greater) benefits but wouldn't have been found otherwise." Does that seem like it explains things?
Yes, this is broadly accurate from my knowledge of positive examples (for the organism) of drive. They either contribute more scratch (TEs), or they drive through a nifty innovation that can be co-opted (homing endonucleases for mating-type switching in yeast, V(D)J recombination in immune cells). It's possible there are other positive contributions that we don't know about, of course.
I knew it! I've been wondering about this for literally years, thanks for confirming that this is a thing that happens.
The coolest example is Cupressus dupreziana, the androgenetic cypress. It's hard to observe a history of extinctions from meiotic drive, bc it's not a cause of death that fossilizes, but in this case we're seeing it just before it completes. When I learned about this, there were only 28 individuals left in the species. Genome exclusion is covered in chapter 10 of Burt & Trivers.
Re: analogies to recombination, as I was preparing these old notes to post, I did wonder whether I should see the cost function or the task being trained on as somewhat analogous, in the sense that they are sort of templates against which performance is checked. It's a very tenuous thought and I can't quite make the analogy work, but maybe you or someone else can do something with it.
I have to admit, I wouldn't have taken it much to heart if these studies hadn't found much effect (nor if they had found a huge effect). And I feel exposed here bc I know that looks bad, like I'm resisting actual evidence in favor of my vibes, but I really think my model is better and the evidence in these studies should only tweak it.
I'm just not that hopeful that, with the few historical examples we have, you can control enough of the variables to really know that through this kind of analysis. I also think the definition of aims and impacts is too narrow: Overton window pushing can manifest in many, many ways and still contribute to the desired solution.
I'm comfortable with pursuing protests through PauseAI US because they were a missing mood in the AI Safety discussion. They are a form of discussion and persuasion, and I approach them similarly to how I decide to address AI danger in writing or in interviews. They are also a form of showing up in force for the cause, in a way that genuinely signals commitment bc it is very hard to get people to do it, which is important to movement building even when the numbers are small. The point of protests is not only to get whatever the theme of that protest was (the theme of our protests is either shutting down your company or getting an international treaty lol); they feed back into the whole movement and community, which can have many unanticipated but directionally desirable impacts.
I don't think my approach to protests is perfect by any means, and I may have emphasized them too much and failed to do things I should have done to grow them. But I make my calls about doing protests based on many considerations of how they will affect the rhetorical and emotional environment of the space. I wish there were studies that could tell me how to do this better, but there aren't, just as there aren't studies that tell me exactly what to write to change people's minds on AI danger in the right way. (Actually, a good comparison here would be "does persuasive writing work?", bc there we all have personal experiences of knowing it worked, but as a whole the evidence for it achieving its aims might be thin.)
What is the upshot of this? Is this for new audiences to read? It seems like the most straightforward application of it is futures betting, not positively influencing the future.
Perhaps you're indicating that the money will run out if frontier AI doesn't become self-sustaining by 2030? Maybe we can do something to make that more likely?
Because I do struggle to see how this helps.
When I learned more about eyestalk ablation while reviewing the Rethink Priorities report, I was surprised by how little it seemed to bother the shrimp, and I did downgrade my concern about welfare from that particular practice. However, I think what people are reacting to is more the barbarity of it than the level or amount of harm. (After all, they already knew the shrimp get killed at the end.) I think it's just so bizarre and gross and exploitative-feeling that it shocks them out of complacency in how they view the shrimp. I think they helplessly imagine themselves losing their own eye and empathize with the shrimp in a powerful, gut-level way, and that this is why it has been impactful to talk about.
Imo the biggest reason not to do this is that it labels the person or goes after their character. There's an implied threat that they will be dismissed out of hand bc they are categorically in bad faith. It can be weaponized.