Strawmen, steelmen, and mithrilmen: getting the principle of charity right

MichaelPlant

Something you hear a lot in discussions is that it's important not to strawman arguments: to assume they are much weaker than they are. This is uncharitable.

Instead, the suggestion is you should steelman arguments: consider the strongest version of an argument, even if that's not what the person has said, and then evaluate that. Steelmanning is thought of as getting the principle of charity just right.

However, I suspect there should be a third category, call it mithrilman: where arguments are treated as much stronger than they are, and you accept them even though you don't understand the reasons. For the non-nerds out there, mithril is Tolkien's fictional super-strong metal in Lord of the Rings.

Whilst strawmanning is being too uncharitable, mithrilmanning is being too charitable. You don't want to do either. Goldilocks lies inbetwixt the two.

I see mithrilmanning quite a lot among effective altruists. Usually, it goes something like this. People are discussing a view or argument they've heard person X make. The individuals are sitting around, brows furrowed, and struggling to find a good steelman of the argument: they can't work out what plausible reasons that person could have for their conclusion. After a while, even though they can't find a suitable steelman, someone says "Well, X does seem really smart, so...". Everyone nods. The conversation moves on.

What's happened is that someone has suggested the group should defer, even though they can't follow the reasoning or provide it themselves. This seems to happen much more often when person X is important (not least because you don't want to risk looking stupid).

I think there can be good cases where one should defer, but I'm worried I see too much of this. We should give people the benefit of the doubt - assume they are smart, thoughtful, etc. rather than fools - but we should still doubt. To err is human. We all make mistakes. We make progress by pointing those out.

So, if you think someone is really smart, but you can't make sense of what they are thinking, at least hesitate on deferring to them. If possible, ask them to explain. It seems too charitable to assume they are right, not charitable enough to assume they are wrong. In assuming that they can give you a sensible answer, you are treating them with appropriate charity.

I don't think I need to say why strawmanning is bad. The danger of mithrilmanning is you end up with too much deference, an information cascade and ultimately false beliefs. People end up believing what X says, even though no one really understands why.

So, if you find yourself overhearing, or even saying yourself, "well, they do seem really smart..." consider adding "um, are we mithrilmanning this? We don't want to defer uncritically."

Comments (11)

Brad West🔸 · Jun 9 2023 · 19

A lot of the work with mithrilmen is keeping an argument at a level of abstraction where it sounds sensible as a principle, while declining to interrogate it further, perhaps because venerated people hold that position.

titotal · Jun 10 2023 · 18

From personal experience, I can tell you that really really smart people are wrong all the time. They're much more likely to be wrong when talking outside of their domain of expertise, but even a physics professor talking about physics will inevitably get stuff wrong in regular conversation.

If someone says something that doesn't make sense, you should of course try and understand their argument and see if you're missing something. But "this person made a mistake" should always be a hypothesis under consideration, and it's often the most likely explanation.

Geoffrey Miller · Jun 12 2023 · 3

Michael - good points.

It sounds like proper steelmanning is mostly applied to arguments, evidence, values, and reasons, whereas mithrilmanning is often applied more to specific influential individuals who tend to be associated with certain positions. (e.g. we might think 'Yann LeCun's machine learning research has been cited 600,000 times, so he must have some valid points when he expresses the view that we shouldn't worry about AI extinction risk', even though he sounds irrational and deranged on this topic.) The mithril armor is really being wrapped around some prestigious person who's making apparently weak arguments, more than around the apparently weak arguments themselves.

My suggestion for overcoming mithrilmanning is to find a less prestigious, but still reputable person, who makes the same arguments, and interrogate the validity of those arguments as if they're from a less influential source. (e.g. if some less-famous AI researcher makes basically the same arguments as LeCun, then dissect that less-famous person's arguments, rather than trying to face down LeCun, as if he's some Final Boss in a scary video game.) This is basically a social psychology hack to make us less intimidated by some famous person's reputation, so we can engage with the quality of their arguments, without getting misled by our instincts to submit and defer to high-status individuals.

MichaelPlant · Jun 12 2023 · 6

Yes, reflecting on this since posting, I have been wondering if there is some important distinction between the principle of charity applied to arguments in the abstract vs its application to the (understated) reasoning of individuals in some particular instance. Steelmanning seems good in the former case, because you're aiming to work your way to the truth. But steelmanning goes too far, and becomes mithrilmanning, in the latter case when you start assuming the individuals must have good reasons, even though you don't know what they are.

Perhaps mithrilmanning involves an implicit argument from authority ("this person is an authority. Therefore they must be right. Why might they be right?").

rime · Jun 11 2023 · 2

The problem with strawmanning and steelmanning isn't a matter of degree, and I don't think goldilocks can be found in that dimension at all. If you find yourself asking "how charitable should I be in my interpretation?" I think you've already made a mistake.

Instead, I'd like to propose a fourth category. Let's call it.. uhh.. the "blindman"! ^^

The blindman interpretation is to forget you're talking to a person, stop caring about whether they're correct, and just try your best to extract anything usefwl from what they're saying.^[1] If your inner monologue goes "I agree/disagree with that for reasons XYZ," that mindset is great for debating or if you're trying to teach, but it's a distraction if you're purely aiming to learn. If I say "1+1=3" right now, it has no effect wrt what you learn from the rest of this comment, so do your best to forget I said it.

For example, when I skimmed the post "agentic mess", I learned something I thought was exceptionally important, even though I didn't actually read enough to understand what they believe. It was the framing of the question that got me thinking in ways I hadn't before, so I gave them a strong upvote because that's my policy for posts that cause me to learn something I deem important--however that learning comes about.

Likewise, when I scrolled through a different post, I found a single sentence^[2] that made me realise something I thought was profound. I actually disagree with the main thesis of the post, but my policy is insensitive to such trivial matters, so I gave it a strong upvote. I don't really care what they think or what I agree with, what I care about is learning something.

^[1] "What they believe is tangential to how the patterns behave in your own models, and all that matters is finding patterns that work."
From a comment on reading to understand vs reading to defer/argue/teach.
^[2] "The Waluigi Effect: After you train an LLM to satisfy a desirable property $P$, then it's easier to elicit the chatbot into satisfying the exact opposite of property $P$."

Jack Lewars · Jun 15 2023 · 3

You might enjoy the book 'Thanks for the Feedback', which basically emphasises this point a lot.

NickLaing · Jun 10 2023 · 2

Thanks, I really like this, and would appreciate some examples so I can get my head around it. It might be hard without being uncharitable, but I struggle to think of concrete examples at the moment.

Jack Lewars · Jun 15 2023 · 21

I guess any of the following might be examples (emphasis on might):

it seems bad to buy expensive historic buildings, which don't seem fit-for-purpose for the proposed use case and have really high running costs - but the people involved are really smart, so...
it seems bad to fly people to the Bahamas to do coworking and collaboration, and it seems like this is being driven by a billionaire's desire for company and personal convenience. It seems like this wouldn't be the method you would choose if you were starting from a point of maximising impact and cost-effectiveness - but the people seem really smart, so...
it seems bad that the largest recipients of funding from the FTX Future Fund are organisations where one of the FTX grantmakers sits on their Board, but...
it seems very very very bad to say you would take the bet every time, if someone told you that there was a 51% chance that you'd double the universe and a 49% chance that you'd destroy it, but...

I'm not sure if people did defer to these arguments because of the people making them rather than a sincere belief that they are good, but it seems at least possible (especially the last one).

NickLaing · Jun 15 2023 · 3

Fantastic examples, I understand it better now.

And I 100% agree with you: I assessed all of those examples above and was bewildered that so many people seemed to defend them, often based on the fact that "smart and good people" had made the decision.

Nice one.

MichaelPlant · Jun 15 2023 · 2

Or, a senior AI researcher says that AI poses no risk because it's years away. This doesn't really make sense - what will happen in a few years? But he does seem smart and works for a prestigious tech company, so...

Benny Smith · Jun 12 2023 · 1

Thanks for writing this! Seems useful to have a term for excessive charitability. Being able to point at it succinctly might help mitigate information cascades.