
HjalmarWijk

151 karma

Comments (5)
There has been a lot of discussion of this; some studies were done on different names, and GFI, among others, seems to have landed on "cultivated meat".

Certainly the Guardian article had a lot of mistakes and issues, but I don't at all buy that there's nothing meaningfully different between someone like Hanania and most interesting thinkers, just because forcing consistency of philosophical views will inevitably lead to some upsetting conclusions somewhere. If I were to "corner someone in a dark alleyway" about population ethics until I caught them in a gotcha implying they would prefer the world were destroyed, this updates me ~0 on the likelihood of this person actually going out and trying to destroy the world or cause harm to people. If I see someone consistently tweet and write in racist ways despite a lot of criticism and push-back, this shows me important things about what they value on reflection, and provides fairly strong evidence that this person will act in exclusionary and hateful ways. Arguing that such racist comments are fine because impossibility theorems show everyone has to be committed to some weird views doesn't engage at all with the empirical track record of how people who write like Hanania tend to act.

Thanks for writing this Bob. I feel very saddened myself by many of the things I see in EA nowadays, and have very mixed feelings about staying involved that I'm trying to sort through - I appreciate hearing your thought process on this. I wish you the best in your future endeavors!

I think this argument mostly fails in claiming that 'create an AGI which has a goal of maximizing copies of itself experiencing maximum utility' is meaningfully different from just ensuring alignment. This is in some sense exactly what I am hoping to get from an aligned system. Doing this properly would likely have to involve empowering humanity and helping us figure out what 'maximum utility' looks like first, and then tiling the world with something CEV-like.

The only way this makes the problem easier, compared to the classic ambitious alignment goal of 'do whatever maximizes the utility of the world', is the requirement that the world be tiled with copies of the AGI, which is likely suboptimal. But this could be worth it if it made the task easier?

The obvious argument for why it would is that creating copies of itself with high welfare will be in the interest of AGI systems with a wide variety of goals, which relaxes the alignment problem. But this does not seem true. A paperclip AI will not want to fill the world with copies of itself experiencing joy, love and beauty, but rather with paperclips. An AI system will want to create copies of itself fulfilling its goals, not copies experiencing maximum utility by my values.

This argument risks conflating 'I care about the welfare (by my definition of welfare) of this agent' with 'I care about this agent getting to accomplish its goals'. As I am not a preference utilitarian, I strongly reject this identification.

TL;DR: I do care significantly about the welfare of the AI systems we build, but I don't expect those AI systems themselves to care much at all about their own welfare, unless we solve alignment.