Nathan_Barnard

803 karma · Joined Nov 2019

Bio

Blog at The Good Blog https://thegoodblog.substack.com/

Comments
74

Strongly agree that there should be more explicit defences of this argument.

One way of doing this in a co-operative way might be working on co-operative AI, since it seems to increase the likelihood that misaligned AI goes well, or at least less badly.

Yeah, I think a Bayesian perspective is really helpful here and this reply seems right.  

I think overrated-underrated is useful because it's trying to say whether we should be doing more or less of X on the margin. Often it's much more useful to know whether something is good on the current margin rather than on average. 

There isn't only one notion of utility - utility in decision theory is different from utility in ethics. Utility in decision theory can indeed be derived from choices over lotteries. It is incomparable between individuals (without further assumptions) and is unique only up to positive affine transformation, because it's just representing choices.

Utility in moral philosophy refers to value, typically the value of experiences (as opposed to other conceptions of the good, like satisfaction of preferences). It is comparable between individuals without further assumptions and isn't invariant under positive affine transformation.
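A minimal sketch of the distinction, with made-up numbers: all names, utilities, lotteries and policies below are hypothetical and only illustrate the point. Rescaling one person's decision-theoretic utility by a positive affine transformation leaves their choices over lotteries unchanged, but it can flip which policy maximises a sum across people, which is why the interpersonal comparison needs further assumptions.

```python
def expected_utility(lottery, utility):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def preferred(lotteries, utility):
    """Name of the lottery with the highest expected utility."""
    return max(lotteries, key=lambda name: expected_utility(lotteries[name], utility))


def affine(utility, a, b):
    """Positive affine transformation u -> a*u + b, with a > 0."""
    if a <= 0:
        raise ValueError("a must be positive")
    return {outcome: a * u + b for outcome, u in utility.items()}


def best_policy(policies, utilities):
    """Policy with the highest *sum* of utilities across people.

    Summing across people assumes the scales are comparable, which
    decision-theoretic utility does not give us on its own.
    """
    def total(name):
        return sum(utilities[person][outcome]
                   for person, outcome in policies[name].items())
    return max(policies, key=total)


# Hypothetical utility functions over three outcomes.
alice = {"bad": 0.0, "ok": 6.0, "good": 10.0}
bob = {"bad": 0.0, "ok": 1.5, "good": 2.0}

lotteries = {
    "safe": {"ok": 1.0},
    "gamble": {"bad": 0.5, "good": 0.5},
}

# Each policy assigns an outcome to each person.
policies = {
    "X": {"alice": "good", "bob": "bad"},
    "Y": {"alice": "ok", "bob": "good"},
}

bob_rescaled = affine(bob, 10.0, 0.0)

# Bob's choices over lotteries are the same before and after rescaling.
print(preferred(lotteries, bob), preferred(lotteries, bob_rescaled))

# The utility-sum ranking of policies is not.
print(best_policy(policies, {"alice": alice, "bob": bob}))
print(best_policy(policies, {"alice": alice, "bob": bob_rescaled}))
```

Both of Bob's utility functions represent exactly the same choice behaviour, so nothing in his choices tells us which scale to use when adding his utility to Alice's.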

An individual's utility (on either of the definitions) may or may not be changed by the political process. 

Consider a new far-right party entering the political sphere. They successfully shift political conversations to be more anti-immigration, with a heavy focus on immigrant men committing sexual violence.

A voter exposed to these new political conversations has their choice behaviour changed because they now feel more angry towards immigrants and want to hurt them, rather than because they think that more restrictive immigration policies would make them personally safer, for instance. 

This same voter also has their utility - in the moral philosophy sense - changed by the new political conversation. They now feel sadistic pleasure when they hear on the news about immigrants being deported, so their subjective experiences are better when deportations happen.

I strongly reject the claim that we should imagine voters as exclusively deciding how to vote in terms of the personal benefits they derive in expectation from policies. I think people support capital punishment mostly because it fits with their inbuilt sense of justice rather than because they think it benefits them. 

We could (probably) represent this voter as being an expected utility maximiser where they have positive utility from capital punishment, in the decision theory sense. This is a different claim from the claim that a voter expects their subjective experiences to be more positively valenced when there's capital punishment. 

I'm afraid I can't comment on what ignorance factors do or do not account for under Bayesian regret without rereading the paper, but it's of course possible that they do account for that disparity between actual and assumed preferences. 

My views here are just deferring to gender scholars I respect. 

Yes, I agree with this - but if this is part of the theory of change, then Athena should probably privilege applicants with these different backgrounds, and I don't know if they intend to do this.

I'm sceptical that there are substantial benefits to generating AI safety research ideas from gender diversity. I haven't read the literature here, but my prior on these types of interventions is that the effect size is small. 

Regardless, I think Athena is good for the same reasons Lewis put forward in his comment - the evidence that women are excluded from male-dominated work environments seems strong, and it's very important that we get as many talented researchers into AI safety as possible. This also seems like an especially serious problem in the AIS community, where anecdotal claims of difficulties from unwanted romantic/sexual advances are common.

I think claims about the intellectual benefits of gender diversity haven't been subjected to sufficient scrutiny because they're convenient to believe. To believe this kind of claim, I would need to see high-quality causal inference research, and I haven't seen such research, nor does the linked article cite any. The linked NatGeo article doesn't seem to me to bring relevant evidence to bear on the question. I completely buy that having more women in the life sciences leads to better medical treatment for women, but the causal mechanism at work there doesn't seem like it would apply to AI safety research.
