In my experience, orgs work much harder to get donations from a "grantmaker" than from an individual.
I made my first big donation in 2015, when I donated $20K to REG. I talked to a bunch of orgs in the process of trying to decide where to donate. Some of them didn't respond at all, and many of the responses I did get were shallow.
A few months later, I took a philanthropy class at Stanford where we split up into groups and each group was responsible for figuring out where to donate a $20K grant. The level of communication I got from nonprofits was dramatically different. Orgs bent over backwards to be as communicative and helpful as possible.
My experience was that orgs didn't put much priority on a $20K grant from me as an individual, but they jumped at the possibility of a $20K grant from a Stanford Grantmaker.
For my future donations, I'm considering whether I should rebrand my emails: I could tell nonprofits something like "I'm reaching out as a representative on behalf of the Greatest Happiness Fund, a grantmaker that focuses on supporting effective charities" (Greatest Happiness Fund is the name of my DAF). Maybe I would get better responses that way. It feels a little manipulative though.
I was reading the AI-enabled coups report and one of the mitigations is roughly "build the AI model such that it refuses to do coups". If you can pull that off, then it solves the problem, but it means the model is incorrigible, and therefore you have to exactly specify the correct values up front because you're locked in.
There might be some other report saying the way to avoid bad value lock-in is to make AI corrigible, and you can object to that by pointing out how it enables coups.
You can't look at the problems separately; you have to consider them at the same time and find a solution that works for every problem.
Having an AI that doesn't willingly participate in coups doesn't imply that you need to specify all of the AI's values in advance, or that it will be incorrigible in a broad (and x-risk-increasing) sense.
I think that the people preventing AI-assisted coups are imagining pretty corrigible AIs (in the sense that Claude right now is very corrigible); they just won't want to do coups (in a similar sense to Claude not wanting to help with bioweapons research), and this just seems pretty workable.
Copying from a comment I wrote yesterday:
Either ASI has more than zero values locked in, or it's fully corrigible. If any values at all are locked in, then we need to have a pretty robust understanding of what the consequences of that will be, because we can't change it ever. Like I don't think we know how to encode something like "don't let people do power grabs, but be fully corrigible in every other way". I don't know how much that's downstream of the facts that (1) we don't know how to encode any values at all and (2) we don't know how to encode corrigibility, but my intuition is that even if we solve #1 and #2, the problem of "don't pick incorrigible values that will screw everything up down the road" is still a hard problem.
This is related to Max Harms' work on CAST. Part of his argument is that pure corrigibility is a more robust target than any set of values because a near miss fails gracefully, whereas if you try to encode any values at all, a near miss could be catastrophic. He's talking more about the "AI kills everyone" flavor of catastrophe, which is valid, but what I'm talking about here is more that a near miss could permanently lock us into a bad (or maybe just not-that-good) future. It's a different argument, but the concern arises for a similar reason: if you're specifying values, then you have to get the specification right, beyond just ensuring that the AI does what you want.