This blog post was written fast to communicate a concept I think is important. I may edit this post for legibility later.

I think evaluation and support mechanisms should be somewhat “air-gapped,” or isolated, in their information-gathering and decision-making processes. The incentives of optimal evaluators (to critique flaws) seem to run counter to the incentives of optimal supporters (to improve flaws). Individuals who might benefit from support may be discouraged from seeking it by fear of harsher evaluation if their private struggles are shared with evaluators. Evaluators who want to provide support may worry about compromising their evaluation ability if they make inconsistent exceptions. To optimally evaluate and support individuals, I believe that it is necessary to establish and declare appropriate information air gaps between different ecosystem roles.

Evaluation mechanisms, such as academic exams, job interviews, grant applications, and the peer review process, aim to critique an individual or their output. To be maximally effective, evaluation mechanisms should be somewhat adversarial to identify flaws and provide useful criticism. It is in the interests of evaluators to have access to all information about a candidate; however, it is not always in the candidate’s best interests to share all information that might affect the evaluation. It is also in the interests of evaluators for candidates to get access to all the support they need to improve.

Even if an attribute that disadvantages a job candidate (e.g., a disability) is protected by antidiscrimination law, an evaluator may be biased against it, either unconsciously or on the basis that it might genuinely reduce performance. Of course, evaluators should be required to ignore or overcome biases against protected attributes, but this “patch” may break or fail to convince candidates to divulge all evaluation-relevant information. Additionally, when a candidate shares sensitive information with an evaluator, the evaluator might not have the appropriate resources or experience to provide beneficial support. Thus, an independent support role might serve the interests of evaluators.

Support mechanisms, such as psychological counseling, legal support, and drug rehabilitation programs, aim to help individuals overcome their personal challenges, often to improve their chances at evaluation. To be maximally effective, support mechanisms should encourage candidates to divulge highly personal and unflattering information. It is in the interests of supporters to guarantee that sensitive information that could affect evaluation is not shared with evaluators (barring information that might prevent harm to others). Generally, the more information a supporter can access, the better support they can provide.

If a candidate has a secret challenge (e.g., a drug problem) that might rightly bias an evaluator (e.g., an employer), they might be motivated not to seek support for this problem if the supporter (e.g., a psychologist or support group) cannot guarantee this information will be kept private. Candidates can be told that evaluators will not punish them for revealing sensitive information, but this policy seems difficult to enforce convincingly. Thus, it is in the interests of supporters to advertise and uphold confidentiality.

A consequentialist who wants to both filter out poor candidates and benefit candidates who could improve from support will have to strike a balance between the competing incentives of evaluation and support. One particularly effective mechanism used in society is establishing and advertising independent, air-gapped evaluators and supporters. I think the EA and AI safety communities could benefit from more confidential support roles, like the CEA community health team, that understand the struggles unique to these communities (though such roles should never entirely replace professional legal or counseling services when those are appropriate).






I feel like surprisingly often within EA the evaluation of people/orgs is not adversarial. I’ve heard of lots of cases of people being very transparent with hiring managers (as they are very keen for the manager to make a good decision) or being very transparent with funders where the applicant wants to know their project is worthwhile in the view of the fund.

I am not sure how cruxy this is for the claim that evaluation and support should be air-gapped by default, but it seemed like people wanting the air gap most of the time was fairly important to the key argument of the post.

I used to do this, i.e., try to be super open about everything. Not anymore. The reason is the information bottleneck. There is no way I can possibly transmit all relevant information, and my experience with funding evaluation (and some secondhand anecdotes) is that I don't get a chance to clear up any misconceptions. So if I have some personal issue that someone might think would interfere with my job, but which I have strong reason to think would not be a problem for complicated reasons, then I would just keep quiet about it to funders and other evaluators.

Sure, in a perfect world where there were no information constraints (and also assuming everyone is aligned), revealing everything is an optimal policy. But this is not the world we live in.

Yeah, I think that EA is far better at encouraging and supporting disclosure to evaluators than, for example, private industry. I also think EAs are more likely to genuinely report their failures (and I take pride in doing this myself, to the extent I'm able). However, I feel that there is still room for more support in the EA community that is decoupled from evaluation, for individuals that might benefit from this.

I think the EA and AI safety communities could benefit from more confidential support roles, like the CEA community health team

They are not air-gapped!


On the other hand, Shay is:

AI Safety Support - Health Coach

I'm also pretty sure AISS's job coaching is air-gapped too, but I'm only 90% sure. I'll ping JJ to ask.

I mostly agree with this post though I can think of some concrete cases where I’m more confused.

I think in most cases that come to mind the support services are already pretty air-gapped from the evaluators. Can you point to some examples where you think that the gap is insufficient, or is this mostly pointing at a feature you'd like to see in support services?

I think of "management responsibilities" as very much putting on both hats. Though this seems like a very normal arrangement in broader society, so presumably not what Ryan is pointing towards.

In my management role, I have to juggle these responsibilities. I think an HR department should generally exist, even if management is really fair and only wants the best for the world, we promise (not bad faith, just humour).

It would not surprise me if most HR departments are set up as the result of lots of political pressures from various special interests within orgs, and that they are mostly useless at their “support” role.

With more confidence, I’d guess a smart person could think of a far better way to do support that looks nothing like an HR department.

I think MATS would be far better served by ignoring the HR frame, and just trying to rederive all the properties of what an org which does support well would look like. The above post looks like a good start, but it’d be a shame if you all just went with a human resources department. Traditional companies do not in fact seem like they would be good at the thing you are talking about here.

Unless there are some weird incentives I know nothing about, effective community support is the kind of thing you should expect to do better than all of civilization at, if you are willing to think about it from first principles for 10 minutes.

I'm not advocating a stock HR department with my comment. I used "HR" as a shorthand for "community health agent who is focused on support over evaluation." This is why I didn't refer to HR departments in my post. Corporate HR seems flawed in obvious ways, though I think it's probably usually better than nothing, at least for tail risks.

This post is mainly explaining part of what I'm currently thinking about regarding community health in EA and at MATS. If I think of concrete, shareable examples of concerns regarding insufficient air-gapping in EA or AI safety, I'll share them here.

I think that what you are pointing to is a real distinction. I'd also point at:

  • Evaluations done when the individual wants to improve themselves
  • Evaluations done when the individual is being assessed by an external party (e.g., job interview, certification exam).

In principle, evaluations of the first kind could also be super invasive, and it might be in the interest of the candidate for them to be so.

Just wanted to add that since your post doesn't quite contemplate evaluations of the first kind.

I think the distinction you make is real. In the language of this post, I consider the first type of evaluation you mention as a form of "support." Whether someone desires comfort or criticism, they might prefer this to be decoupled from evaluation that might disadvantage them.
