Thanks for this! Firstly these seem like two separate Qs (“which actors am I most worried about?” vs the specific risks posed by defender-first access programmes).
My current guess (as a relatively uninformed outsider) is that the most worrying near-term category is actors with some combination of pre-existing domain knowledge, operational access, and (obviously) intent. So in bio I guess this would mean: state/state-adjacent programmes, small highly motivated teams with wet-lab access, and maybe compromised actors inside institutions that give them access. I definitely don't currently think frontier models are good enough that a literal random person could cause catastrophe if they wanted to (though this could change).
Re defender-first access specifically: yeah makes sense that this could add risk. E.g. if there's stuff like insider misuse, account compromise/coercion, leakage of outputs/workflows, etc.
I don't really have more concrete thoughts beyond this. It would be interesting to see some analysis (if there isn't already one internally) on how robust OAI's Rosalind Biodefense and Anthropic's Project Glasswing are to risky actors getting into the programme.
(Somewhat relatedly I remember reading somewhere that Mythos reportedly got accessed by some unauthorised users via a third-party private Discord group or something).