Alexander Turner 🔸

Research scientist @ Google DeepMind
34 karma · Joined · Working (0-5 years) · turntrout.com

Participation
1

  • Attended an EA Global conference

Comments
2

I’ve repeatedly encountered the harmful notion that it is easy for victims to speak up; that they are immediately rallied behind, celebrated, and believed. This is not true. I was not rallied behind. I had to put in an unspeakable amount of effort for more than a year, and eventually invoke outside legal counsel, to see what I consider basic recognition.

The EA community has a significant undersupply of information from victims of abusive conduct, since the victims are often branded as "triggered" or "irrational". I've heard this from female friends, I've read about this (e.g. in the TIME article), and I myself paid social costs in sharing a different kind of negative experience. Victims often pay significant social costs to talk about their experiences.

Community norms should not impose costs on sharing such information. I'm sorry you had to pay these costs, Frances. Thank you for speaking out. Hopefully this post decreases the cost in these communities. In fact, such important information should be socially subsidized, not taxed (since e.g. speaking out often requires reliving trauma, which is unpleasant; and most of the benefit is external).

Paul, I think deceptive alignment (or other spontaneous, stable-across-situations goal pursuit) after just pretraining is very unlikely. I am happy to take bets if you're interested. If so, email me (alex@turntrout.com), since I don't check this very often.

I think that "deceptively aligned during pre-training" is closer to e.g. Eliezer's historical views.

I agree, and the actual published arguments for deceptive alignment I've seen don't depend on any difference between pretraining and finetuning, so they can't apply to only one of them. (People have tried to claim to me, unsurprisingly, that the arguments haven't historically focused on pretraining.)