
Cross-posted from Cold Takes


I’ve spent a lot of my career working on wicked problems: problems that are vaguely defined, where there’s no clear goal for exactly what I’m trying to do or how I’ll know when or whether I’ve done it.

In particular, minimal-trust investigations - trying to understand some topic or argument myself (what charity to donate to, whether civilization is declining, whether AI could make this the most important century of all time for humanity), with little reliance on what “the experts” think - tend to have this “wicked” quality:

  • I could spend my whole life learning about any subtopic of a subtopic of a subtopic, so learning about a topic is often mostly about deciding how deep I want to go (and what to skip) on each branch.
  • There aren’t any stable rules for how to make that kind of decision, and I’m constantly changing my mind about what the goal and scope of the project even is.

This piece will narrate an example of what it’s like to work on this kind of problem, and why I say it is “hard, taxing, exhausting and a bit of a mental health gauntlet.”

My example is from the 2007 edition of GiveWell. It’s adapted from a private doc that some other people who work on wicked problems have found cathartic and validating.

It’s particularly focused on what I call the hypothesis rearticulation part of investigating a topic (steps 3 and 6 in my learning by writing process), which is when:

  • I have a hypothesis about the topic I’m investigating.
  • I realize it doesn’t seem right, and I need a new one.
  • Most of the things I can come up with are either “too strong” (it would take too much work to examine them satisfyingly) or “too weak” (they just aren’t that interesting/worth investigating).
  • I need to navigate that balance and find a new hypothesis that is (a) coherent; (b) important if true; (c) maybe something I can argue for.

After this piece tries to give a sense for what the challenge is like, a future piece will give accumulated tips for navigating it.

Flashback to 2007 GiveWell

Context for those unfamiliar with GiveWell:

  • In 2007, I co-founded (with Elie Hassenfeld) an organization that recommends evidence-backed, cost-effective charities to help people do as much good as possible with their donations.
  • When we started the project, we initially asked charities to apply for $25,000 grants, and to agree (as part of the process) that we could publish their application materials. This was our strategy for trying to find charities that could provide evidence about how much they were helping people (per dollar).
  • This example is from after we had collected information from charities and determined which one we wanted to rank #1, and were now trying to write it all up for our website. Since then, GiveWell has evolved a great deal and is much better than the 2007 edition I’ll be describing here.
  • (This example is reconstructed from my memory a long time later, so it’s probably not literally accurate.)

Initial “too strong” hypothesis. Elie (my co-founder at GiveWell) and I met this morning and I was like “I’m going to write a page explaining what GiveWell’s recommendations are and aren’t. Basically, they aren’t trying to evaluate every charity in the world. Instead they’re saying which ones are the most cost-effective.” He nodded and was like “Yeah, that’s cool and helpful, write it.”

Now I’m sitting at my computer trying to write down what I just said in a way that an outsider can read - the “hypothesis articulation” phase.

I write, “GiveWell doesn’t evaluate every charity in the world. Our goal is to save the most lives possible per dollar, not to create a complete ranking or catalogue of charities. Accordingly, our research is oriented around identifying the single charity that can save the most lives per dollar spent.”

Hmm. Did we identify the “single charity that can save the most lives per dollar spent”? Certainly not. For example, I have no idea how to compare these charities to cancer research organizations, which are out of scope. Let me try again:

“GiveWell doesn’t evaluate every charity in the world. Our goal is to save the most lives possible per dollar, not to create a complete ranking or catalogue of charities. Accordingly, our research is oriented around identifying the single charity with the highest demonstrated lives saved per dollar spent - the charity that can prove rigorously that it saved the most” - no, it can’t prove it saved the most lives - “the charity that can prove rigorously that ” - uh -

Do any of our charities prove anything rigorously? Now I’m looking at the page we wrote for our #1 charity and ugh. I mean here are some quotes from our summary on the case for their impact: “All of the reports we've seen are internal reports (i.e., [the charity] - not an external evaluator - conducted them) … Neither [the charity]’s sales figures nor its survey results conclusively demonstrate an impact … It is possible that [the charity] simply uses its subsidized prices to outcompete more expensive sellers of similar materials, and ends up reducing people's costs but not increasing their ownership or utilization of these materials … We cannot have as much confidence in our understanding of [the charity] as in our understanding of [two other charities], whose activities are simpler and more straightforward.”

That’s our #1 charity! We have less confidence in it than our lower-ranked charities … but we ranked it higher anyway because it’s more cost-effective … but it’s not the most cost-effective charity in the world, it’s probably not even the most cost-effective charity we looked at …

Hitting a wall. Well I have no idea what I want to say here.

[Image: a montage of repeated deaths in the video game Super Meat Boy]
This image represents me literally playing some video game like Super Meat Boy while failing to articulate what I want to say. I am not actually this bad at Super Meat Boy (certainly not after all the time I’ve spent playing it while failing to articulate a hypothesis), but I thought all the deaths would give a better sense for how the whole situation feels.

Rearticulating the hypothesis and going “too weak.” Okay, screw this. I know what the problem was - I was writing based on wishful thinking. We haven’t found the most cost-effective charity, we haven’t found the most proven charity. Let’s just lay it out, no overselling, just the real situation.

“GiveWell doesn’t evaluate every charity in the world, because we didn’t have time to do that this year. Instead, we made a completely arbitrary choice to focus on ‘saving lives in Africa’; then we emailed 107 organizations that seemed relevant to this goal, of which 59 responded; we did a really quick first-round application process in which we asked them to provide evidence of their impact; we chose 12 finalists, analyzed those further, and were most impressed with Population Services International. There is no reason to think that the best charities are the ones that did best in our process, and significant reasons to think the opposite, that the best charities are not the ones putting lots of time into a cold-emailed application from an unfamiliar funder for $25k. Like every other donor in the world, we ended up making an arbitrary, largely aesthetic judgment that we were impressed with Population Services International. Readers who share our aesthetics may wish to donate similarly, and can also purchase photos of Elie and Holden at the following link:”

OK wow. This is what we’ve been working on for a year? Why would anyone want this? Why are we writing this up? I should keep writing this so it’s just DONE but ugh, the thought of finishing this website is almost as bad as the thought of not finishing it.

Hitting a wall.


What do I do, what do I do, what do I do.

Rearticulating the hypothesis and assigning myself more work. OK. I gave up, went to sleep, thought about other stuff for a while, went on a vision quest, etc. I’ve now realized that we can put it this way: our top charities are the ones with verifiable, demonstrated impact and room for more funding, and we rank them by estimated cost-effectiveness. “Verifiable, demonstrated” is something appealing we can say about our top charities and not about others, even though it’s driven by the fact that they responded to our emails and others didn’t. And then we rank the best charities within that. Great.
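In other words, the hypothesis has a two-step structure: filter charities by a qualitative bar (verifiable, demonstrated impact; room for more funding), then rank whatever survives by estimated cost-effectiveness. Here’s a minimal sketch of that logic in Python - purely an illustration with hypothetical field names, not GiveWell’s actual methodology:

```python
from dataclasses import dataclass

@dataclass
class Charity:
    name: str
    demonstrated_impact: bool      # impact is verifiable from evidence we've seen
    room_for_more_funding: bool    # could productively absorb additional donations
    est_cost_effectiveness: float  # e.g. estimated lives saved per $1,000 (hypothetical metric)

def rank_top_charities(charities: list[Charity]) -> list[Charity]:
    # Step 1: filter to charities that clear the qualitative bar.
    eligible = [c for c in charities
                if c.demonstrated_impact and c.room_for_more_funding]
    # Step 2: rank the survivors by estimated cost-effectiveness.
    return sorted(eligible, key=lambda c: c.est_cost_effectiveness, reverse=True)
```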

So I’m sitting down to write this, but I’m kind of thinking to myself: “Is that really quite true? That ‘the charities that participated in our process and did well’ and ‘the charities with verifiable, demonstrated impact’ are the same set? I mean … it seems like it could be true. For years we looked for charities that had evidence of impact and we couldn’t find any. Now we have 2-3. But wouldn’t it be better if I could verify that none of these charities that ignored us have good evidence of impact just sitting around on their websites? I mean, we definitely looked at a lot of websites before, but we gave up on it, and didn’t scan the eligible charities comprehensively. Let me try it.”

I take the list of charities that didn’t participate in round 1. That’s not all the charities in the world, but if none of them have a good impact section on their website, we’ve got a pretty plausible claim that the best stuff we saw in the application process is the best that is (now) publicly available, for the “eligible” charities in the cause. (This assumes that if one of the applicants had good stuff sitting around on their website, they would have sent it.)

I start looking at their websites. There are 48 charities, and in the first hour I get through 6, verifying that there’s nothing good on any of those websites. This is looking good: in 8 work hours I’ll be able to defend the claim I’ve decided to make.

Hmm. This water charity has some kind of map of all the wells they’ve built, and some references to academic literature arguing that wells save lives. Does that count? I guess it depends on exactly what the academic literature establishes. Let’s check out some of these papers … huh, a lot of these aren’t papers per se so much as big colorful reports with giant bibliographies. Well, I’ll keep going through these looking for the best evidence I can …

“This will never end.” Did I just spend two weeks reading terrible papers about wells, iron supplementation and community health workers? Ugh and I’ve only gotten through 10 more charities, so I’m only about ⅓ of the way through the list as a whole. I was supposed to be just writing up what we found, I can’t take a 6-week detour!

The over-ambitious deadline. All right, I’ll sprint and get it done in a week. [1 week later] Well, now I’m 60% of the way through the whole list. !@#$

“This is garbage.” What am I even doing anyway? I’m reading all this literature on wells and unilaterally deciding that it doesn’t count as “proof of impact” the way that Population Services International’s surveys count as “proof of impact.” I’m the zillionth person to read these papers; why are we creating a website out of these amateur judgments? Who will, or SHOULD, care what I think? I’m going to spend another who-knows-how-long writing up this stupid page on what our recommendations do and don’t mean, and then another stretch I don’t even want to think about finishing up all the other pages we said we’d write, and then we’ll put it online and literally no one will read it. Donors won’t care - they will keep going to charities that have lots of nice pictures. Global health professionals will just be like “Well, this is amateur hour.”[1]

This is just way out of whack. Every time I try to add enough meat to what we’re doing that it’s worth publishing at all, the timeline expands another 2 months, AND we still aren’t close to having a path to a quality product that will mean something to someone.


What’s going wrong here?

  • I have a deep sense that I have something to say that is worth arguing for, but I don’t actually know what I am trying to say. I can express it in conversation to Elie, but every time I start writing it down for a broad audience, I realize that Elie and I had a lot of shared premises that won’t be shared by others. Then I need to decide between arguing the premises (often a huge amount of extra work), weakening my case (often leads to a depressing sense that I haven’t done anything worthwhile), or somehow reframing the exercise (the right answer more often than one would think).
  • It often feels like I know what I need to say and now the work is just “writing it down.” But “writing it down” often reveals a lot of missing steps and thus explodes into more tasks - and/or involves long periods of playing Super Meat Boy while I try to figure out whether there’s some version of what I was trying to say that wouldn’t have this property.
  • I’m approaching a well-established literature with an idiosyncratic angle, giving me constant impostor syndrome. On any given narrow point, there are a hundred people who each have a hundred times as much knowledge as I do; it’s easy to lose sight of the fact that despite this, I have some sort of value-added to offer (I just need to not overplay what this is, and often I don’t have a really crisp sense of what it is).
  • Because of the idiosyncratic angle, I lack a helpful ecosystem of peer reviewers, mentors, etc.
    • There’s nothing to stop me from sinking weeks into some impossible and ill-conceived version of my project that I could’ve avoided just by, like, rephrasing one of my sentences. (The above GiveWell example has me trying to do extra work to establish a bunch of points that I ultimately just needed to sidestep, as you can see from the final product. This definitely isn’t always the answer, but it can happen.)
    • I’m simultaneously trying to pose my question and answer it. This creates a dizzying feeling of constantly creating work for myself that was actually useless, or skipping work that I needed to do, and never knowing which I’m doing because I can’t even tell you who’s going to be reading this and what they’re going to be looking for.
    • There aren’t any well-recognized standards I can make sure I’m meeting, and the scope of the question I’m trying to answer is so large that I generally have a creeping sense that I’m producing something way too shot through with guesswork and subjective judgment to cause anyone to actually change their mind.

All of these things are true, and they’re all part of the picture. But nothing really changes the fact that I’m on my way to having (and publishing) an unusually thoughtful take on an important question. If I can keep my eye on that prize, avoid steps that don’t help with it (though not to an extreme, i.e., it’s good for me to have basic contextual knowledge), and keep reframing my arguments until I capture (without overstating) what’s new about what I’m doing, I will create something valuable, both for my own learning and potentially for others’.

“Valuable” doesn’t at all mean “final.” We’re trying to push the conversation forward a step, not end it. One of the fun things about the GiveWell example is that the final product that came out at the end of that process was actually pretty bad! It had essentially nothing in common with the version of GiveWell that first started feeling satisfying to donors and moving serious money, a few years later. (No overlap in top charities, very little overlap in methodology.)

For me, a huge part of the challenge of working on this kind of problem is just continuing to come back to that. As I bounce between “too weak” hypotheses and “too strong” ones, I need to keep re-aiming at something I can argue that’s worth arguing, and remember that getting there is just one step in my and others’ learning process. A future piece will go through some accumulated tips on pulling that off.

Footnotes


  1. I really enjoyed the “What qualifies you to do this work?” FAQ on the old GiveWell site that I ran into while writing this. 

Comments (10)



Here's a provocative take on your experience that I don't really endorse, but I'd be interested in hearing your reaction to:

Finding unusually cost-effective global health charities isn't actually a wicked problem. You just look into the existing literature on global health prioritization, apply a bunch of quick heuristics to find the top interventions, find charities implementing them, and then see which ones will get more done with more funding. In fact, Giving What We Can independently started recommending the Against Malaria Foundation through a process that was much faster than the above. Peter Singer also came up with donation recommendations that seem not much worse than current GiveWell top recommendations based on fairly limited research.

In response to such a comment, I might say that GiveWell actually had much more reason to think AMF was indeed one of the most cost-effective charities than GWWC, that Peter Singer's recommendations were good but substantially less cost-effective (and that improvement is clearly worth it), and that the above illustration of the wicked problem experience is useful because it applies more strongly in other areas (e.g. AI forecasting). But I'm curious about your response.

Apologies for chiming in so late!

I believe GWWC's recommendation of Against Malaria Foundation was based on GiveWell's (otherwise they might've recommended another bednet charity). And Peter Singer generally did not recommend the charities that GiveWell ranks highly, before GiveWell ranked them highly.

I don't want to deny, though, that for any given research project you might undertake, there's often a much quicker approach that gets you part of the way there. I think the process you described is a fine way to generate some good initial leads (I think GWWC independently recommended Schistosomiasis Control Initiative before GiveWell did, for example). As the stakes of the research rise, though, I think it becomes more valuable and important to get a lot of the details right - partly because so much money rides on it, partly because quicker approaches seem more vulnerable to adversarial behavior/Goodharting of the process.

I'm going to point aspiring researchers who ask me what it's like to work at an EA think tank to this article. This is exactly my experience for many projects where the end result is an article. It's a bit different when the end result is a decision like "what charity to start".  

“Cross-posted from Cold Takes” - this link is broken.

Fixed, thanks!

I think descriptions like this of the challenges doing good research poses are really helpful! The description definitely resonates with me.

I might have a suggestion here:

How about writing a draft with imperfect arguments, and then getting a few people from your target audience to read it while you're watching them (for example, in a video call with share screen), and you'll hear their questions/thoughts/pushbacks?

I think about this like "user testing": The pushbacks that people have are often different from what I'd guess myself.


Meta: I'm expecting you to have all sorts of pushbacks to this suggestion; I could explain my thoughts and experiences here over 10 pages, probably. But I'm not going to! I'm going to hope you tell me what you care about, and I'll improve only that part.


(Big fan btw!)

I think this is often a good approach!

This is the kindest way anyone ever told me that I didn't help ;) <3 <3

If anyone's interested, I just posted about this idea yesterday: https://www.lesswrong.com/posts/8BGexmqqAx5Z2KFjW/how-to-make-your-article-more-persuasive-spoiler-do-user

Would you consider asking representative samples of populations about priority problems, coordinating local experts to develop solutions which best address most of these priorities, and asking these experts to find organizations and individuals who deploy these solutions at the lowest (marginal) cost?

In this way, you are independent of the data that is posted online (which may be non-representative of all entity capacities), work much more efficiently (coordinating experts who spent substantial time learning about their domains), and get much better cost-effectiveness (combining solutions and getting local prices).

Feel free to review my initial reactions to this piece.
