
Neel Nanda

6272 karma · neelnanda.io

Bio

I lead the DeepMind mechanistic interpretability team

Comments (456)

I'd argue that you also need some assumptions around is-ought, whether to be a consequentialist or not, what else (if anything) you value and how this trades off against suffering, etc. And you also need to decide on some boundaries for which entities are capable of suffering in a meaningful way, which there's widespread disagreement on (in a way that imo goes beyond being empirical)

It's enough to get you something like "if suffering can be averted costlessly then this is a good thing" but that's pretty rarely practically relevant. Everything has a cost

I agree that you need ridiculously fundamental assumptions like "I am not a Boltzmann brain that ephemerally emerged from the aether and is about to vanish" and "we are not in a simulation". But if you have that kind of thing, I think you can reasonably discuss objective reality

Less controversial is a very long way from objective - why do you think that "caring about the flourishing of society" is objectively ethical?

Re the idea of an attractor, idk, history has sure had a lot of popular beliefs I find abhorrent. How do we know there even is convergence at all rather than cycles? And why does being convergent imply objective? If you told me that the supermajority of civilization concluded that torturing criminals was morally good, that would not make me think it was ethical.

My overall take is that objective is just an incredibly strong word for which you need incredibly strong justifications, and your justifications don't seem close, they seem more about "this is a Schelling point" or "this is a reasonable default that we can build a coalition around"

Idk, I would just downvote posts with unproductively bad titles, and not downvote posts with strong but justified titles. Further posts that seem superficially justified but actually don't justify the title properly are also things I dislike and downvote. I don't think we need a slippery slope argument here when the naive strategy works fine

What do you say to someone who doesn't share your goals? Eg someone who thinks that happiness is only justified if it's earned, that most people do not deserve it because they do "bad thing X", and who is therefore against promoting happiness for them

Morality is Objective: 100% disagree

What would this even mean? If I assert that X is wrong, and someone else asserts that it's fine, how do we resolve this? We can appeal to common values from which to derive this conclusion, but that's pretty arbitrary and largely just feels like my opinion. Claiming that morality is objective just feels groundless.

Yep, this seems extremely reasonable - I am in practice far more annoyed if a piece makes attacks and does not deliver

I agree in general, but think that titotal's specific use was fine. In my opinion, the main goal of that post was not to engage with the AI 2027 authors, which had already been done extensively in private, but rather to communicate their views to the broader community. Titles in particular are extremely limited: many people only read the title, titles are a key way people decide whether to read on, and efficiency of communication is extremely important. The point they were trying to convey was that these models, which are treated as high status and prestigious, should not be, and I disagree that non-violent communication could have achieved a similar effect to that title (note, I don't particularly like how they framed the post, but I think this was perfectly reasonable from their perspective)

I think A>B, eg I often find people who don't know each other in London whom it is valuable to introduce. People are not as on the ball as you think; the market is very far from efficient

Though many of the useful intros I make are very international, and I would guess that it's most useful to have a broad network across the world. So maybe C is best, though I expect that regular conference and business trips are enough

I think this is reasonable as a way for the community to reflexively react to things, to be honest. The question I'm trying to answer when I see someone making a post with an argument that seems worth engaging with is: what's the probability that I'll learn something new or change my mind as a result of engaging with this?

When there's a foundational assumption disagreement, it's quite difficult to have productive conversations. The conversation kind of needs to be about the disagreement about that assumption, which is a fairly specific kind of discussion. Eg if someone hasn't really thought about AI alignment much, thinks it's not an issue, but isn't familiar with the reasons I believe it matters, then I put a much lower (though still non-zero) probability that I'll make useful updates from talking to them. Because I have a bunch of standard arguments for the most obvious objections people sometimes raise, and don't learn much from stating them. And I think there's a lot of value to having high-context discussion spaces where people broadly agree on these foundational claims.

These foundational claims are pretty difficult to establish consensus on if people have different priors, and discussing them doesn't really tend to move people either way. I get a lot of value from discussing technical details of what working on AI safety is like with people, much more so than I get from the average "does AI safety matter at all?" conversation.

Obviously, if someone could convince me that AI safety doesn't matter, that would be a big deal. But I'd guess it's only really worth the effort if I'm reasonably sure the person understands why I believe it does matter and disagrees anyway, in a way that isn't stemming from some intractable foundational disagreements in worldviews
