I'd like to distill AI Safety posts and papers, and I'd like to see more distillations generally. Ideally, posts and papers would meet the following criteria:
- Potentially high-impact if more people understood them
- Use a lot of jargon, or are otherwise complex and difficult to understand
- Less well-known than you think they should be (within the AI x-risk space)
What posts or papers meet these criteria?
For example, I think the longer blog posts by Anthropic and OpenAI on their approaches to alignment are important, under-appreciated, and sometimes (wrongly, in my view) dismissed as disingenuous.
Commentary on these plans from skeptical researchers could also be worth including.