I'm posting this in preparation for Draft Amnesty Week (Feb 24 - March 2), but please also use this thread for posts you don't plan to write for Draft Amnesty. The last time I posted this question, there were some great responses.
If you have multiple ideas, I'd recommend putting them in different answers, so that people can respond to them separately.
It would be great to see:
- Both spur-of-the-moment, vague ideas and further-along, more considered ideas. If you're in the latter camp, you could even share a Google Doc with an outline for feedback.
- Commenters signalling, with Reactions and upvotes, the content they'd like to see written.
- Commenters responding with helpful resources or suggestions.
- Commenters proposing Dialogues with authors who suggest similar ideas, or with whom they have an interesting disagreement (Draft Amnesty Week might be a great time for scrappy/unedited dialogues).
Draft Amnesty Week
If the responses here encourage you to develop one of your ideas, Draft Amnesty Week (February 24 - March 2) might be a great time to post it. Posts tagged "Draft Amnesty Week" don't have to be thoroughly thought through or even fully drafted. Bullet points and missing sections are allowed. You can have a lower bar for posting.
My own take is that while I don't want to defend the "find a correct utility function" approach to alignment as being sufficient at this time, I do think it is actually necessary, and that the modern era is an anomaly in how much we can get away with misalignment being checked by institutions that go beyond any individual.
The basic reason we can get away with not solving the alignment problem is that humans depend on other humans; in particular, you cannot replace human workers with much cheaper workers whose preferences can be controlled arbitrarily.
AI threatens to remove that dependence on other humans, which is a critical part of how we can get away with not needing the correct utility function.
I like the Intelligence Curse series because it points out that when an elite doesn't need the commoners for anything, and the commoners have no selfish value to the elite, the default outcome is that the elites let the commoners starve to death, unless the elites are value-aligned.
The Intelligence Curse series is below:
https://intelligence-curse.ai/defining/
In this analogy, the AIs are the elites and the rest of humanity are the commoners.