148 karma · Joined Jun 2023


An example of invested but not attached: I'm investing time/money/energy into taking classes about subject X. I chose subject X because it could help me generate more value Y that I care about. But I'm not attached to getting good at X, I'm invested in the process of learning it.

I feel more confused after reading your other points. What is your definition of rationality? Is this definition also what EA/LW people usually mean? (If so, who introduces this definition?)

When you say rationality is "what gets you good performance", that seems like it could lead to arbitrary circular reasoning about what is and isn't rational. If I exaggerate this concern and define rationality as "what gets you the best life possible", that's not a helpful definition because it leads to the unfalsifiable claim that rationality is optimal while providing no practical insight.

I've seen EA writing (particularly about AI safety) that goes something like:
I know X and Y thought leaders in AI safety, they're exceptionally smart people with opinion A, so even though I personally think opinion B is more defensible, I also think I should be updating my natural independent opinion in the direction of A, because they're way smarter and more knowledgeable than me.

I'm struggling to see how this update strategy makes sense. It seems to have merit when X and Y know/understand things that literally no other expert knows, but aside from that, in all other scenarios that come to mind, it seems neutral at best, otherwise a worse strategy than totally disregarding the "thought leader status" of X and Y.

Am I missing something?

Two things:

1. I think of "Invested but not attached [to the outcome]" as a Pareto-optimal strategy that is neither attached nor detached.

2. I disagree with the second to last paragraph, "Mud-dredging does improve your rationality, however. That's why betting works." I think that if you're escaping in the mountains, then it's true that coming down from the mountain will give you actual data and some degree of accountability. But it's not obvious to me that 1) mud-dredging increases rationality, 2) the kind of rationality that mud-dredging maybe increases is actually more beneficial than harmful in the long run in terms of performance. Furthermore, among all the frameworks out there in terms of mental health or productivity, I believe that creativity is almost universally valued as a thing to foster more than rationality, in terms of performance/success, so I'm curious about where you're coming from.

I appreciate your perspective, and FWIW I have no immediate concerns about the accuracy of your investigation or the wording of your post.

Correct me if I'm wrong: you would like any proposed change in rules or norms to still support what you tried to achieve in that post, which is provide accurate information, presented fairly, and hopefully leading people to update in a way that leads to better decision making.

I support this. I agree that it's important to have some kind of channel to address the kinds of concerns you raised, and I probably would have seen your post as a positive contribution had I read it and been a part of EA, etc. back then (though I'm not aware of the full context). At the same time, I'm saying that posts like yours could have even better outcomes with a little additional effort or adjustment in the writing.

I encourage you to think about my proposed alternatives not as blockers to this kind of positive contribution. That is not their intended purpose. As an example, if a DTHP rule allows DTHPs but requires a compulsory disclosure at the top addressing the relevant needs, feelings, and requests of the writer, I don't think this particularly bars contributions from happening, and I think it would also serve to 1) save time for the writer by having them reflect on their underlying purpose for writing, and 2) dampen certain harmful biases that a reader is likely to experience from a traditional hit piece.

If such a rule existed back then, presumably you would have taken it into account during writing. If you visualize what you would have done in that situation, do you think the rule would have negatively impacted 1) what you set out to express in your post and 2) the downstream effects of your post?

I've just partly read and partly skim read that post for the first time. I do suspect that post would be ineligible under a hypothetical "no hit pieces under duck typing" rule. I'll refer to posts like this as DTHP to express my view more generally. (I have no comment on whether it "should" have been allowed or not allowed in the past or what the past/current Forum standards are.)

I've not thought much about this, but the direction of my current view is that there are more constructive ways of expression than DTHPs, and here I'll vaguely describe three alternatives that I suspect would be more useful. By useful I mean that these alternatives potentially promote better social outcomes within the community, while hopefully not significantly undermining desirable practical outcomes such as a shift in funding or priorities.

1. If nothing else, add emotional honesty to the framing of a DTHP. A DTHP becomes more constructive and less prone to inspire reader bias when it is introduced with a clear and honest statement of the needs, feelings, and requests of the main author. Maybe two out of three is a good enough bar. I'm inclined to think that the NL DTHP failed spectacularly at this.
2. Post a personal invitation for relevant individuals to learn more. Something like "I believe org X is operating in an undesirable way and would urge funders who might otherwise consider donating to X to consider carefully. If you're in this category, I'm happy to have a one on one call and to share my reasons why I don't encourage donating to X." (And during the one on one you can allude to the mountain of evidence you've gathered, and let someone decide whether they want to see it or not.)
3. Find ways to skirt around what makes a DTHP a DTHP. I think a simple alternative such as posting a DTHP verbatim to one's personal blog, then only sharing or linking to it with people on a personal level is already incrementally less socially harmful than posting it to the forums.

Option 4 is we find some wonderful non-DTHP framework/template for expressing these types of concerns. I don't know what that would look like.

These are suggestions for a potential writer. I haven't attempted to provide community-level suggestions here which could be a thing.

a piece of writing with most of the stereotypical properties of a hit piece, regardless of the intention behind it

Thanks for entertaining my thought experiment, and I'm glad because I better understand your perspective too now, and I think I'm in full agreement with your response.

A shift of topic content here, feel free to not engage if this doesn't interest you.

To share some vague thoughts about how things could be different. I think that posts which are structurally equivalent to a hit piece can be considered against the forum rules, either implicitly already or explicitly. Moderators could intervene before most of the damage is done. I think that policing this isn't as subjective as one might fear, and that certain criteria can be checked even without any assumptions about truthfulness or intentions. Maybe an LLM could work for flagging high-risk posts for moderators to review.

Another angle would be to try and shape discussion norms or attitudes. There might not be a reliable way to influence this space, but one could try for example by providing the right material that would better equip readers to have better online discussions in general as well as recognize unhelpful/manipulative writing. It could become a popular staple much like I think "Replacing Guilt" is very well regarded. Funnily enough, I have been collating a list of green/orange/red flags in online discussions for other educational reasons.

"Attitudes" might be way too subjective/varied to shape, whereas I believe "good discussion norms" can be presented in a concrete way that isn't inflexibly limiting. NVC comes to mind as a concrete framework, and I am of the opinion that the original "sharing information" post can be considered violent communication.

I relate to your write-up on a personal level, as I can easily see myself having the same behavioral preferences as well as modes of imperfection as you if I was in a similar situation.

And with that in mind, there's only one thing that I'm confused about:

A thing I feel particularly bad about is not confronting Sam at any point about the ways he hurt people I care about.

What would that confrontation have looked like? How would you have approached it, even taking into account hindsight wisdom but without being a time-travelling mind-reader?

In that confrontation, what would you be asking for from Sam? (E.g., explanation? reassurance? apology? listening to your concerns?)

What I think I'm hearing from you (and please correct me if I'm not hearing you) is that you feel conflicted by the thought that the efforts of good people with good intentions can so easily be undone, and that you wish there were some concrete ways to prevent this happening to organizations, both individually and systemically. I hear you on thinking about how things could work better as a system/process/community in this context. (My response won't go into this systems level, not because it's not important, but because I don't have anything useful to offer you right now.)

I acknowledge your two examples ("Alice and Chloe almost ruined an organization" and "keeping bad workers anonymous has negative consequences"). I'm not trying to dispute these or convince you that you're wrong. What I am trying to highlight is that there is a way to think about these that doesn't involve requiring us to never make small mistakes with big consequences. I'm talking about a mindset, which isn't a matter of right or wrong, but simply a mental model that one can choose to apply.

I'm asking you to stash away your being right and whatever perspective you think I hold for a moment and do a thought experiment for 60 seconds.
At t=0, it looks like ex-employee A, with some influential help, managed to inspire significant online backlash against organization X led by well-intentioned employer Z.
It could easily look like Z's project is done, their reputation is forever tarnished, their options have been severely constrained. Z might well feel that way themselves.
Z is a person with good intentions, conviction, strong ambitions, interpersonal skills, and a good work ethic.
Suppose that organization X got dismantled at t=1 year. Imagine Z's "default trajectory" extending into t=2 years. What is Z up to now? Do you think they still feel exactly the way they did at t=0?
At t=10, is Z successful? Did the events of t=0 really ruin their potential at the time?
At t=40, what might Z say recalling the events of t=0 and how much that impacted their overall life? Did t=0 define their whole life? Did it definitely lead to a worse career path, or did adaptation lead to something unexpectedly better? Could they definitely say that their overall life and value satisfaction would have been better if t=0 never played out that way?
In the grand scheme of things, how much did t=0 feeling like "Z's life is almost ruined" translate into reality?

If you entertained this thought experiment, thank you for being open to doing so.

To express my opinion plainly: good and bad events are inevitable, and it is inevitable that Z will make mistakes with negative consequences as part of their ambitious journey of life. Is it in Z's best interests to avoid making obvious mistakes? Yes. Is it in their best interests to adopt a robust strategy such that they would never have fallen victim to t=0 events or similarly "bad" events at any other point? I don't think so necessarily, because: we don't know without long-term hindsight whether "traumatic" events like t=0 lead to net positive changes or not; even if Z somehow became mistake-proof-without-being-perfect, that doesn't mean something as significant as t=0 couldn't still happen to them without them making a mistake; and lastly because being that robust is practically impossible for most people.
All this to say, without knowing whether "things like t=0" are "unequivocally bad to ever let happen", I think it's more empowering to be curious about what we can learn from t=0 than to arrive at the conclusion at t<1 that preventing it is both necessary and good.

I'll respond to one aspect you raised that I think might be more significant than you realize. I'll paint a black and white picture just for brevity.

If you're running organizations and do so for several years with dozens of employees across time, you will make poor hiring decisions at one time or another. While making a bad hire seems bad, avoiding this risk at all costs is probably a far inferior strategy. If making a bad hire doesn't get in the way of success and doing good, does it even make sense to fixate on it?

Also, if you're blind to the signs before it happens, then you reap the consequences, learn an expensive lesson, and are less likely to make the same mistake in future, at least for that type of deficit in judgment. Sometimes the signs are obvious after having made an error, though occasionally the signs are so well hidden that anyone with better judgment than you could still have made the same mistake.

The underlying theme I'm getting at is that embracing mistakes and imperfection is instrumental. Although many EAs might wish that we could all just get hard things right the first time all the time, that's not realistic. We're flawed human beings and respecting the fact of our limitations is far more practical than giving in to fear and anxiety about not having ultimate control and predictability. If anything, being willing to make mistakes is both rational and productive compared to other alternatives.
