Cross-posted from Cold Takes
I’ve spent a lot of my career working on wicked problems: problems that are vaguely defined, where there’s no clear goal for exactly what I’m trying to do or how I’ll know when or whether I’ve done it.
In particular, minimal-trust investigations - trying to understand some topic or argument myself (what charity to donate to, whether civilization is declining, whether AI could make this the most important century of all time for humanity), with little reliance on what “the experts” think - tend to have this “wicked” quality:
- I could spend my whole life learning about any subtopic of a subtopic of a subtopic, so learning about a topic is often mostly about deciding how deep I want to go (and what to skip) on each branch.
- There aren’t any stable rules for how to make that kind of decision, and I’m constantly changing my mind about what the goal and scope of the project even are.
This piece will narrate an example of what it’s like to work on this kind of problem, and why I say it is “hard, taxing, exhausting and a bit of a mental health gauntlet.”
My example is from the 2007 edition of GiveWell. It’s an adaptation from a private doc that some other people who work on wicked problems have found cathartic and validating.
It’s particularly focused on what I call the hypothesis rearticulation part of investigating a topic (steps 3 and 6 in my learning by writing process), which is when:
- I have a hypothesis about the topic I’m investigating.
- I realize it doesn’t seem right, and I need a new one.
- Most of the things I can come up with are either “too strong” (it would take too much work to examine them satisfyingly) or “too weak” (they just aren’t that interesting/worth investigating).
- I need to navigate that balance and find a new hypothesis that is (a) coherent; (b) important if true; (c) maybe something I can argue for.
This piece tries to give a sense of what the challenge is like; a future piece will give accumulated tips for navigating it.
Flashback to 2007 GiveWell
Context for those unfamiliar with GiveWell:
- In 2007, I co-founded (with Elie Hassenfeld) an organization that recommends evidence-backed, cost-effective charities to help people do as much good as possible with their donations.
- When we started the project, we initially asked charities to apply for $25,000 grants, and to agree (as part of the process) that we could publish their application materials. This was our strategy for trying to find charities that could provide evidence about how much they were helping people (per dollar).
- This example is from after we had collected information from charities and determined which one we wanted to rank #1, and were now trying to write it all up for our website. Since then, GiveWell has evolved a great deal and is much better than the 2007 edition I’ll be describing here.
- (This example is reconstructed from my memory a long time later, so it’s probably not literally accurate.)
Initial “too strong” hypothesis. Elie (my co-founder at GiveWell) and I met this morning and I was like “I’m going to write a page explaining what GiveWell’s recommendations are and aren’t. Basically, they aren’t trying to evaluate every charity in the world. Instead they’re saying which ones are the most cost-effective.” He nodded and was like “Yeah, that’s cool and helpful, write it.”
Now I’m sitting at my computer trying to write down what I just said in a way that an outsider can read - the “hypothesis articulation” phase.
I write, “GiveWell doesn’t evaluate every charity in the world. Our goal is to save the most lives possible per dollar, not to create a complete ranking or catalogue of charities. Accordingly, our research is oriented around identifying the single charity that can save the most lives per dollar spent,”
Hmm. Did we identify the “single charity that can save the most lives per dollar spent?” Certainly not. For example, I have no idea how to compare these charities to cancer research organizations, which are out of scope. Let me try again:
“GiveWell doesn’t evaluate every charity in the world. Our goal is to save the most lives possible per dollar, not to create a complete ranking or catalogue of charities. Accordingly, our research is oriented around identifying the single charity with the highest demonstrated lives saved per dollar spent - the charity that can prove rigorously that it saved the most” - no, it can’t prove it saved the most lives - “the charity that can prove rigorously that ” - uh -
Do any of our charities prove anything rigorously? Now I’m looking at the page we wrote for our #1 charity and ugh. I mean here are some quotes from our summary on the case for their impact: “All of the reports we've seen are internal reports (i.e., [the charity] - not an external evaluator - conducted them) … Neither [the charity]’s sales figures nor its survey results conclusively demonstrate an impact … It is possible that [the charity] simply uses its subsidized prices to outcompete more expensive sellers of similar materials, and ends up reducing people's costs but not increasing their ownership or utilization of these materials … We cannot have as much confidence in our understanding of [the charity] as in our understanding of [two other charities], whose activities are simpler and more straightforward.”
That’s our #1 charity! We have less confidence in it than our lower-ranked charities … but we ranked it higher anyway because it’s more cost-effective … but it’s not the most cost-effective charity in the world, it’s probably not even the most cost-effective charity we looked at …
Hitting a wall. Well I have no idea what I want to say here.
Rearticulating the hypothesis and going “too weak.” Okay, screw this. I know what the problem was - I was writing based on wishful thinking. We haven’t found the most cost-effective charity, we haven’t found the most proven charity. Let’s just lay it out, no overselling, just the real situation.
“GiveWell doesn’t evaluate every charity in the world, because we didn’t have time to do that this year. Instead, we made a completely arbitrary choice to focus on ‘saving lives in Africa’; then we emailed 107 organizations that seemed relevant to this goal, of which 59 responded; we did a really quick first-round application process in which we asked them to provide evidence of their impact; we chose 12 finalists, analyzed those further, and were most impressed with Population Services International. There is no reason to think that the best charities are the ones that did best in our process, and significant reasons to think the opposite, that the best charities are not the ones putting lots of time into a cold-emailed application from an unfamiliar funder for $25k. Like every other donor in the world, we ended up making an arbitrary, largely aesthetic judgment that we were impressed with Population Services International. Readers who share our aesthetics may wish to donate similarly, and can also purchase photos of Elie and Holden at the following link:”
OK wow. This is what we’ve been working on for a year? Why would anyone want this? Why are we writing this up? I should keep writing this so it’s just DONE but ugh, the thought of finishing this website is almost as bad as the thought of not finishing it.
Hitting a wall.
What do I do, what do I do, what do I do.
Rearticulating the hypothesis and assigning myself more work. OK. I gave up, went to sleep, thought about other stuff for a while, went on a vision quest, etc. I’ve now realized that we can put it this way: our top charities are the ones with verifiable, demonstrated impact and room for more funding, and we rank them by estimated cost-effectiveness. “Verifiable, demonstrated” is something appealing we can say about our top charities and not about others, even though it’s driven by the fact that they responded to our emails and others didn’t. And then we rank the best charities within that. Great.
So I’m sitting down to write this, but I’m kind of thinking to myself: “Is that really quite true? That ‘the charities that participated in our process and did well’ and ‘The charities with verifiable, demonstrated impact’ are the same set? I mean … it seems like it could be true. For years we looked for charities that had evidence of impact and we couldn’t find any. Now we have 2-3. But wouldn’t it be better if I could verify that none of these charities that ignored us have good evidence of impact just sitting around on their websites? I mean, we definitely looked at a lot of websites before but we gave up on it, and didn’t scan the eligible charities comprehensively. Let me try it.”
I take the list of charities that didn’t participate in round 1. That’s not all the charities in the world, but if none of them have a good impact section on their website, we’ve got a pretty plausible claim that the best stuff we saw in the application process is the best that is (now) publicly available, for the “eligible” charities in the cause. (This assumes that if one of the applicants had good stuff sitting around on their website, they would have sent it.)
I start looking at their websites. There are 48 charities, and in the first hour I get through 6, verifying that there’s nothing good on any of those websites. This is looking good: in 8 work hours I’ll be able to defend the claim I’ve decided to make.
Hmm. This water charity has some kind of map of all the wells they’ve built, and some references to academic literature arguing that wells save lives. Does that count? I guess it depends on exactly what the academic literature establishes. Let’s check out some of these papers … huh, a lot of these aren’t papers per se so much as big colorful reports with giant bibliographies. Well, I’ll keep going through these looking for the best evidence I can …
“This will never end.” Did I just spend two weeks reading terrible papers about wells, iron supplementation and community health workers? Ugh and I’ve only gotten through 10 more charities, so I’m only about ⅓ of the way through the list as a whole. I was supposed to be just writing up what we found, I can’t take a 6-week detour!
The over-ambitious deadline. All right, I’ll sprint and get it done in a week. [1 week later] Well, now I’m 60% of the way through the whole list. !@#$
“This is garbage.” What am I even doing anyway? I’m reading all this literature on wells and unilaterally deciding that it doesn’t count as “proof of impact” the way that Population Services International’s surveys count as “proof of impact.” I’m the zillionth person to read these papers; why are we creating a website out of these amateur judgments? Who will, or SHOULD, care what I think? I’m going to spend another who knows how long writing up this stupid page on what our recommendations do and don’t mean, and then another I don’t even want to think about it finishing up all the other pages we said we’d write, and then we’ll put it online and literally no one will read it. Donors won’t care - they will keep going to charities that have lots of nice pictures. Global health professionals will just be like “Well this is amateur hour.”
This is just way out of whack. Every time I try to add enough meat to what we’re doing that it’s worth publishing at all, the timeline expands another 2 months, AND we still aren’t close to having a path to a quality product that will mean something to someone.
What’s going wrong here?
- I have a deep sense that I have something to say that is worth arguing for, but I don’t actually know what I am trying to say. I can express it in conversation to Elie, but every time I start writing it down for a broad audience, I realize that Elie and I had a lot of shared premises that won’t be shared by others. Then I need to decide between arguing the premises (often a huge amount of extra work), weakening my case (often leads to a depressing sense that I haven’t done anything worthwhile), or somehow reframing the exercise (the right answer more often than one would think).
- It often feels like I know what I need to say and now the work is just “writing it down.” But “writing it down” often reveals a lot of missing steps and thus explodes into more tasks - and/or involves long periods of playing Super Meat Boy while I try to figure out whether there’s some version of what I was trying to say that wouldn’t have this property.
- I’m approaching a well-established literature with an idiosyncratic angle, giving me constant impostor syndrome. On any given narrow point, there are a hundred people who each have a hundred times as much knowledge as I do; it’s easy to lose sight of the fact that despite this, I have some sort of value-added to offer (I just need to not overplay what this is, and often I don’t have a really crisp sense of what it is).
- Because of the idiosyncratic angle, I lack a helpful ecosystem of peer reviewers, mentors, etc.
- There’s nothing to stop me from sinking weeks into some impossible and ill-conceived version of my project that I could’ve avoided just by, like, rephrasing one of my sentences. (The above GiveWell example has me trying to do extra work to establish a bunch of points that I ultimately just needed to sidestep, as you can see from the final product. This definitely isn’t always the answer, but it can happen.)
- I’m simultaneously trying to pose my question and answer it. This creates a dizzying feeling of constantly creating work for myself that was actually useless, or skipping work that I needed to do, and never knowing which I’m doing because I can’t even tell you who’s going to be reading this and what they’re going to be looking for.
- There aren’t any well-recognized standards I can make sure I’m meeting, and the scope of the question I’m trying to answer is so large that I generally have a creeping sense that I’m producing something way too shot through with guesswork and subjective judgment to cause anyone to actually change their mind.
All of these things are true, and they’re all part of the picture. But nothing really changes the fact that I’m on my way to having (and publishing) an unusually thoughtful take on an important question. If I can keep my eye on that prize, avoid steps that don’t help with it (though not to an extreme, i.e., it’s good for me to have basic contextual knowledge), and keep reframing my arguments until I capture (without overstating) what’s new about what I’m doing, I will create something valuable, both for my own learning and potentially for others’.
“Valuable” doesn’t at all mean “final.” We’re trying to push the conversation forward a step, not end it. One of the fun things about the GiveWell example is that the final product that came out at the end of that process was actually pretty bad! It had essentially nothing in common with the version of GiveWell that first started feeling satisfying to donors and moving serious money, a few years later. (No overlap in top charities, very little overlap in methodology.)
For me, a huge part of the challenge of working on this kind of problem is just continuing to come back to that. As I bounce between “too weak” hypotheses and “too strong” ones, I need to keep re-aiming at something I can argue that’s worth arguing, and remember that getting there is just one step in my and others’ learning process. A future piece will go through some accumulated tips on pulling that off.
Here's a provocative take on your experience that I don't really endorse, but I'd be interested in hearing your reaction to:
In response to such a comment, I might say that GiveWell actually had much more reason than GWWC did to think AMF was indeed one of the most cost-effective charities, that Peter Singer's recommendations were good but substantially less cost-effective (and that improvement is clearly worth it), and that the above illustration of the wicked problem experience is useful because it applies more strongly in other areas (e.g. AI forecasting). But I'm curious about your response.
Apologies for chiming in so late!
I believe GWWC's recommendation of Against Malaria Foundation was based on GiveWell's (otherwise they might've recommended another bednet charity). And Peter Singer generally did not recommend the charities that GiveWell ranks highly, before GiveWell ranked them highly.
I don't want to deny, though, that for any given research project you might undertake, there's often a much quicker approach that gets you part of the way there. I think the process you described is a fine way to generate some good initial leads (I think GWWC independently recommended Schistosomiasis Control Initiative before GiveWell did, for example). As the stakes of the research rise, though, I think it becomes more valuable and important to get a lot of the details right - partly because so much money rides on it, partly because quicker approaches seem more vulnerable to adversarial behavior/Goodharting of the process.
I'm going to point aspiring researchers who ask me what it's like to work at an EA think tank to this article. This is exactly my experience for many projects where the end result is an article. It's a bit different when the end result is a decision like "what charity to start".
This link is broken.
Fixed, thanks!
I think descriptions like this of the challenges that doing good research poses are really helpful! The description definitely resonates with me.
I might have a suggestion here:
How about writing a draft with imperfect arguments, and then getting a few people from your target audience to read it while you watch them (for example, in a video call with screen sharing), so you can hear their questions/thoughts/pushbacks?
I think about this like "user testing": The pushbacks that people have are often different from what I'd guess myself.
Meta: I'm expecting you to have all sorts of pushbacks to this suggestion, I could explain my thoughts and experiences here over 10 pages, probably. But I'm not going to! I'm going to hope you tell me what you care about and I'll improve only that part
(Big fan btw!)
I think this is often a good approach!
This is the kindest way anyone ever told me that I didn't help ;) <3 <3
If anyone's interested, I just posted about this idea yesterday: https://www.lesswrong.com/posts/8BGexmqqAx5Z2KFjW/how-to-make-your-article-more-persuasive-spoiler-do-user
Would you consider asking representative samples of populations about priority problems, coordinating local experts to develop solutions which best address most of these priorities, and asking these experts to find organizations and individuals who deploy these solutions at the lowest (marginal) cost?
In this way, you are independent of the data that is posted online (which may be non-representative of all entity capacities), work much more efficiently (coordinating experts who spent substantial time learning about their domains), and get much better cost-effectiveness (combining solutions and getting local prices).
Feel free to review my initial reactions to this piece.