Paul, I think deceptive alignment (or other spontaneous, stable-across-situations goal pursuit) after just pretraining is very unlikely. I am happy to take bets if you're interested. If so, email me (alex@turntrout.com), since I don't check this very much.
I think that "deceptively aligned during pre-training" is closer to e.g. Eliezer's historical views.
I agree, and the actual published arguments for deceptive alignment I've seen don't depend on any difference between pretraining and finetuning, so they can't apply to only one of the two. (People have tried to claim to me, unsurprisingly, that the arguments haven't historically focused on pretraining.)
The EA community has a significant undersupply of information from victims of abusive conduct, since the victims are often branded as "triggered" or "irrational". I've heard this from female friends, I've read about this (e.g. in the TIME article), and I myself paid social costs in sharing a different kind of negative experience. Victims often pay significant social costs to talk about their experiences.
Community norms should not impose costs on sharing such information. I'm sorry you had to pay these costs, Frances. Thank you for speaking out. Hopefully this post decreases the cost in these communities. In fact, such important information should be socially subsidized, not taxed (since e.g. speaking out often requires reliving trauma, which is unpleasant; and most of the benefit is external).