I currently work with CE/AIM-incubated charity ARMoR on research distillation, quantitative modelling, consulting, and general org-boosting to support policy advocacy for market-shaping tools to incentivise innovation and ensure access to antibiotics to help combat AMR.
I previously did AIM's Research Training Program, was supported by an FTX Future Fund regrant and later Open Philanthropy's affected grantees program, and before that I spent 6 years doing data analytics, business intelligence and knowledge + project management in various industries (airlines, e-commerce) and departments (commercial, marketing), after majoring in physics at UCLA and changing my mind about becoming a physicist. I've also initiated some local priorities research efforts, e.g. a charity evaluation initiative with the moonshot aim of reorienting my home country Malaysia's giving landscape towards effectiveness, albeit with mixed results.
I first learned about effective altruism circa 2014 via A Modest Proposal, Scott Alexander's polemic on using dead children as units of currency to force readers to grapple with the opportunity costs of subpar resource allocation under triage. I have never stopped thinking about it since, although my relationship to it has changed quite a bit; I related to Tyler's personal story (which unsurprisingly also references A Modest Proposal as a life-changing polemic):
I thought my own story might be more relatable for friends with a history of devotion – unusual people who’ve found themselves dedicating their lives to a particular moral vision, whether it was (or is) Buddhism, Christianity, social justice, or climate activism. When these visions gobble up all other meaning in the life of their devotees, well, that sucks. I go through my own history of devotion to effective altruism. It’s the story of [wanting to help] turning into [needing to help] turning into [living to help] turning into [wanting to die] turning into [wanting to help again, because helping is part of a rich life].
I'm looking for "decision guidance"-type roles, e.g. applied prioritization research.
Do reach out if any of the above piques your interest :)
How do you personally deal with this difficulty?
I personally dealt with this (in part) by referencing Jack Malde's excellent guided cause prio flowchart (this was a first draft to gauge forum receptivity). Sadly, when asked about updates, he replied that "Interest seemed to be somewhat limited."
This passage from David Roodman's essay Appeal to Me: First Trial of a “Replication Opinion” resonated:
When we draw on research, we vet it in rare depth (as does GiveWell, from which we spun off). I have sometimes spent months replicating and reanalyzing a key study—checking for bugs in the computer code, thinking about how I would run the numbers differently and how I would interpret the results. This interface between research and practice might seem like a picture of harmony, since researchers want their work to guide decision-making for the public good and decision-makers like Open Philanthropy want to receive such guidance.
Yet I have come to see how cultural misunderstandings prevail at this interface. From my side, what the academy does and what I and most of the public think it does are not the same. There are two problems. First, about half the time I reanalyze a study, I find that there are important bugs in the code, or that adding more data makes the mathematical finding go away, or that there’s a compelling alternative explanation for the results. (Caveat: most of my experience is with non-randomized studies.) Second, when I send my critical findings to the journal that peer-reviewed and published the original research, the editors usually don’t seem interested (recent exception). Seeing the ivory tower as a bastion of truth-seeking, I used to be surprised. I understand now that, because of how the academy works, in particular, because of how the individuals within academia respond to incentives beyond their control, we consumers of research are sometimes more truth-seeking than the producers.
I had a similar realisation towards the end of my studies, which was a key factor in persuading me not to pursue academia. Also, I've mentioned this before, but I was surprised by how much more these kinds of details mattered in my industry experience.
Skipping over to his recap of the specific case he looked into:
To recap:
- Two economists performed a quantitative analysis of a clever, novel question.
- It underwent peer review.
- It was published in one of the top journals in economics. Its data and computer code were posted online, per the journal’s policy.
- Another researcher promptly responded that the analysis contains errors (such as computing average daytime temperature with respect to Greenwich time rather than local time), and that it could have been done on a much larger data set (for 1990 to ~2019 instead of 2000–04). These changes make the headline findings go away.
- After behind-the-scenes back and forth among the disputants and editors, the journal published the comment and rejoinder.
- These new articles confused even an expert.
- An outsider (me) delved into the debate and found that it’s actually a pretty easy call.
If you score the journal on whether it successfully illuminated its readership as to the truth, then I think it is kind of 0 for 2. ...
That said, AEJ Applied did support dialogue between economists that eventually brought the truth out. In particular, by requiring public posting of data and code (an area where this journal and its siblings have been pioneers), it facilitated rapid scrutiny.
Still, it bears emphasizing: For quality assurance, the data sharing was much more valuable than the peer review. And, whether for lack of time or reluctance to take sides, the journal’s handling of the dispute obscured the truth.
My purpose in examining this example is not to call down a thunderbolt on anyone, from the Olympian heights of a funding body. It is rather to use a concrete story to illustrate the larger patterns I mentioned earlier. Despite having undergone peer review, many published studies in the social sciences and epidemiology do not withstand close scrutiny. When they are challenged, journal editors have a hard time managing the debate in a way that produces more light than heat.
I have critiqued papers about the impact of foreign aid, microcredit, foreign aid, deworming, malaria eradication, foreign aid, geomagnetic storm risk, incarceration, schooling, more schooling, broadband, foreign aid, malnutrition, …. Many of those critiques I have submitted to journals, usually only to receive polite rejections. I obviously lack objectivity. But it has struck me as strange that, in these instances, we on the outside of academia seem more concerned about getting to the truth than those on the inside.
The part about "what if money were no object?" reminds me of Justin Sandefur's point in his essay PEPFAR and the Costs of Cost-Benefit Analysis that (emphasis mine)
Budgets aren’t fixed
Economists’ standard optimization framework is to start with a fixed budget and allocate money across competing alternatives. At a high-level, this is also how the global development community (specifically OECD donors) tends to operate: foreign aid commitments are made as a proportion of national income, entirely divorced from specific policy goals. PEPFAR started with the goal instead: Set it, persuade key players it can be done, and ask for the money to do it.
Bush didn’t think like an economist. He was apparently allergic to measuring foreign aid in terms of dollars spent. Instead, the White House would start with health targets and solve for a budget, not vice versa. “In the government, it’s usually — here is how much money we think we can find, figure out what you can do with it,” recalled Mark Dybul, a physician who helped design PEPFAR, and later went on to lead it. “We tried that the first time and they came back and said, ‘That’s not what we want...Tell us how much it will cost and we’ll figure out if we can pay for it or not, but don’t start with a cost.’”
Economists are trained to look for trade-offs. This is good intellectual discipline. Pursuing “Investment A” means forgoing “Investment B.” But in many real-world cases, it’s not at all obvious that the realistic alternative to big new spending proposals is similar levels of big new spending on some better program. The realistic counterfactual might be nothing at all.
In retrospect, it seems clear that economists were far too quick to accept the total foreign aid budget envelope as a fixed constraint. The size of that budget, as PEPFAR would demonstrate, was very much up for debate.
When Bush pitched $15 billion over five years in his State of the Union, he noted that $10 billion would be funded by money that had not yet been promised. And indeed, 2003 marked a clear breaking point in the history of American foreign aid. In real-dollar terms, aid spending had been essentially flat for half a century at around $20 billion a year. By the end of Bush’s presidency, between PEPFAR and massive contracts for Iraq reconstruction, that number hovered around $35 billion. And it has stayed there since. (See Figure 2)
Compared to normal development spending, $15 billion may have sounded like a lot, but exactly one sentence after announcing that number in his State of the Union address, Bush pivoted to the case for invading Iraq, a war that would eventually cost America something in the region of $3 trillion — not to mention thousands of American and hundreds of thousands of Iraqi lives. Money was not a real constraint.
A broader lesson here, perhaps, is about getting counterfactuals right. In comparative cost-effectiveness analysis, the counterfactual to AIDS treatment is the best possible alternative use of that money to save lives. In practice, the actual alternative might simply be the status quo, no PEPFAR, and a 0.1% reduction in the fiscal year 2004 federal budget. Economists are often pessimistic about the prospects of big additional spending, not out of any deep knowledge of the budgeting process, but because holding that variable fixed makes analyzing the problem more tractable. In reality, there are lots of free variables.
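A quick back-of-envelope check on the magnitudes in this passage, as a minimal sketch. The PEPFAR and aid figures are the ones quoted above; the FY2004 federal outlays figure (roughly $2.29 trillion) is my own assumption and does not come from the essay, and the variable names are just illustrative.

```python
# Figures quoted in Sandefur's essay
PEPFAR_TOTAL_USD = 15e9        # pitched over five years
PEPFAR_NEW_MONEY_USD = 10e9    # portion not yet promised
PEPFAR_YEARS = 5
AID_BEFORE_USD = 20e9          # approx. annual US aid before 2003 (real dollars)
AID_AFTER_USD = 35e9           # approx. annual US aid by the end of Bush's presidency

# ASSUMPTION (not from the essay): approximate total federal outlays in FY2004
FY2004_FEDERAL_OUTLAYS_USD = 2.29e12

annual_pepfar = PEPFAR_TOTAL_USD / PEPFAR_YEARS
annual_new_money = PEPFAR_NEW_MONEY_USD / PEPFAR_YEARS

# Share of one year's federal budget taken up by one year of PEPFAR
share_total = annual_pepfar / FY2004_FEDERAL_OUTLAYS_USD
share_new = annual_new_money / FY2004_FEDERAL_OUTLAYS_USD

# Relative growth in the overall aid envelope
aid_growth = (AID_AFTER_USD - AID_BEFORE_USD) / AID_BEFORE_USD

if __name__ == "__main__":
    print(f"PEPFAR per year: ${annual_pepfar / 1e9:.1f}B")
    print(f"Share of FY2004 outlays (all PEPFAR): {share_total:.2%}")
    print(f"Share of FY2004 outlays (new money only): {share_new:.2%}")
    print(f"Growth in annual aid spending: {aid_growth:.0%}")
```

Under that assumed budget figure, both shares round to about 0.1%, which is in line with the essay's "0.1% reduction" framing, while the aid envelope itself grew by about three quarters.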
GiveWell did their first "lookbacks" (reviews of past grants) to see if they've met initial expectations and what they could learn from them:
Lookbacks compare what we thought would happen before making a grant to what we think happened after at least some of the grant’s activities have been completed and we’ve conducted follow-up research. While we can’t know everything about a grant’s true impact, we can learn a lot by talking to grantees and external stakeholders, reviewing program data, and updating our research. We then create a new cost-effectiveness analysis with this updated information and compare it to our original estimates.
(While I'm very glad they did so with their usual high quality and rigor, I'm also confused about why they hadn't started doing this earlier, given that "okay, but did we really help as much as we thought we would? Let's check" feels like such a basic M&E / ops-y question. I'm obviously missing something trivial here, but I also find it hard to buy "limited org capacity"-type explanations for GW in particular, given total funding moved, how long they've been operating, their leading role in the grantmaking ecosystem, etc.)
Their lookbacks led to substantial changes vs original estimates, in New Incentives' case driven by large drops in cost per child enrolled ("we think this is due to economies of scale, efficiency efforts by New Incentives, and the devaluation of the Nigerian naira, but we haven’t prioritized a deep assessment of drivers of cost changes") and in HKI's case driven by vitamin A deficiency rates in Nigeria being lower and counterfactual coverage rates higher than originally estimated:
I wonder how they select grants to showcase on that page. They've made grants that are both much larger and more cost-effective than that, e.g. this $71.5M grant in Jan '23 to HKI's vitamin A supplementation program that they estimate would save roughly 49,000 lives at ~$1,450 per life saved after all adjustments (or ~93,000 lives at $770 per life if only adjusting for internal and external validity, or nearly 280k lives at $260 per life saved before any adjustments, i.e. the standard I usually see in most BOTECs claiming to "beat GW top charities"...). Only thing is, this wouldn't be obvious from their original CEA because they tend to input "donation (arbitrary size)" = $100k instead of the actual grant amounts; I had to manually input their grant budget breakdown into a copy of their CEA to get the numbers above (which also means I may have done it wrong, so caveat utilitor...)
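As a rough sanity check on the figures above, cost per life saved is just the grant amount divided by the estimated lives saved. The sketch below uses only the rounded numbers quoted in this comment. The function name and scenario labels are mine, and this is not a reproduction of GiveWell's CEA, which works from the full budget breakdown.

```python
GRANT_USD = 71_500_000  # Jan '23 HKI vitamin A supplementation grant, as quoted above

# Approximate lives saved under each adjustment scenario (rounded figures from the comment)
LIVES_SAVED = {
    "after all adjustments": 49_000,
    "internal + external validity only": 93_000,
    "before any adjustments": 280_000,
}


def cost_per_life(grant_usd: float, lives_saved: float) -> float:
    """Grant size divided by estimated lives saved."""
    if lives_saved <= 0:
        raise ValueError("lives_saved must be positive")
    return grant_usd / lives_saved


COSTS = {label: cost_per_life(GRANT_USD, n) for label, n in LIVES_SAVED.items()}

if __name__ == "__main__":
    for label, cost in COSTS.items():
        print(f"{label}: ~${cost:,.0f} per life saved")
```

The results land within about 2% of the ~$1,450, $770 and $260 figures quoted above. The small gaps are what I'd expect from dividing by rounded lives-saved numbers.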
RP's CEA of LN is done by quarter, so it's quarterly. More pertinently to the part you quoted, row 144 is saying the income of the women reached for counselling is modelled as increasing by $1 for every $1.17 of all-in program cost in Q1 '24, the latter quickly dropping to just $0.25 by Q4 '25. If you're wondering what this corresponds to in terms of % increase in earnings, it's 19%, from Canning and Schultz (2019) with -20% and -40% discounts for internal and external validity respectively.
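To make the "$1 of income per $X of program cost" framing a bit more concrete, here is a minimal sketch that inverts it into income gained per dollar spent, using the two quarterly figures above. It also shows how two validity discounts would combine if they are applied multiplicatively. That is an assumption on my part about how the sheet handles them, and I deliberately don't apply the multiplier to the 19% figure, since whether that figure is pre- or post-discount isn't settled by the text above. Function names are illustrative.

```python
def income_per_dollar(cost_per_income_dollar: float) -> float:
    """Invert 'program cost per $1 of extra income' into income gained per $1 of cost."""
    if cost_per_income_dollar <= 0:
        raise ValueError("cost must be positive")
    return 1.0 / cost_per_income_dollar


def combined_discount_multiplier(*discounts: float) -> float:
    """Combine fractional discounts multiplicatively (ASSUMPTION: not confirmed by the sheet)."""
    multiplier = 1.0
    for d in discounts:
        multiplier *= 1.0 - d
    return multiplier


Q1_2024 = income_per_dollar(1.17)  # $1.17 of all-in cost per $1 of income
Q4_2025 = income_per_dollar(0.25)  # $0.25 of all-in cost per $1 of income
VALIDITY_MULTIPLIER = combined_discount_multiplier(0.20, 0.40)

if __name__ == "__main__":
    print(f"Q1 '24: ${Q1_2024:.2f} of income per $1 of cost")
    print(f"Q4 '25: ${Q4_2025:.2f} of income per $1 of cost")
    print(f"Combined validity multiplier: {VALIDITY_MULTIPLIER:.2f}")
```

So the modelled return goes from a bit under $1 of income per dollar of cost to $4 per dollar over the period, and a -20% and a -40% discount together would leave a bit under half of the raw effect under the multiplicative assumption.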
I'm admittedly confused by this. I suppose when you wrote
... none of these have two ticks in my estimation. However, combined, I think this list represents a threat that is extremely likely to be real and capable of ending a galactic civilisation.
you meant that, combined, they nudge your needle 10%?
I read those comments a while back and I don't think they support your view for relying overwhelmingly on explicit quantitative cost-effectiveness analyses. In particular, the key parts I got out of Isabel's comment weren't what you quoted but instead (emphasis mine, not hers)
Cost-effectiveness is the primary driver of our grantmaking decisions. But, “overall estimated cost-effectiveness of a grant” isn't the same thing as “output of cost-effectiveness analysis spreadsheet.” (This blog post is old and not entirely reflective of our current approach, but it covers a similar topic.)
and
That is, we don’t solely rely on our spreadsheet-based analysis of cost-effectiveness when making grants.
which is in direct contradistinction to your style as I understand it, and aligned with what Holden wrote earlier in that link you quoted (emphasis his this time)
While some people feel that GiveWell puts too much emphasis on the measurable and quantifiable, there are others who go further than we do in quantification, and justify their giving (or other) decisions based on fully explicit expected-value formulas. The latter group tends to critique us – or at least disagree with us – based on our preference for strong evidence over high apparent “expected value,” and based on the heavy role of non-formalized intuition in our decisionmaking. This post is directed at the latter group.
We believe that people in this group are often making a fundamental mistake, one that we have long had intuitive objections to but have recently developed a more formal (though still fairly rough) critique of. The mistake (we believe) is estimating the “expected value” of a donation (or other action) based solely on a fully explicit, quantified formula, many of whose inputs are guesses or very rough estimates. We believe that any estimate along these lines needs to be adjusted using a “Bayesian prior”; that this adjustment can rarely be made (reasonably) using an explicit, formal calculation; and that most attempts to do the latter, even when they seem to be making very conservative downward adjustments to the expected value of an opportunity, are not making nearly large enough downward adjustments to be consistent with the proper Bayesian approach.
This view of ours illustrates why – while we seek to ground our recommendations in relevant facts, calculations and quantifications to the extent possible – every recommendation we make incorporates many different forms of evidence and involves a strong dose of intuition. And we generally prefer to give where we have strong evidence that donations can do a lot of good rather than where we have weak evidence that donations can do far more good – a preference that I believe is inconsistent with the approach of giving based on explicit expected-value formulas (at least those that (a) have significant room for error (b) do not incorporate Bayesian adjustments, which are very rare in these analyses and very difficult to do both formally and reasonably).
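For intuition about the direction of the "Bayesian prior" adjustment described in the quote, here is a toy sketch using a textbook normal-normal update. To be clear, this is not GiveWell's procedure (the quote itself says the adjustment can rarely be made reasonably with an explicit calculation), and every number below is hypothetical. It only illustrates that the noisier an explicit estimate is, the more it gets pulled back towards the prior.

```python
def posterior_mean(prior_mean: float, prior_sd: float,
                   estimate: float, estimate_sd: float) -> float:
    """Normal-normal update: precision-weighted average of prior mean and estimate."""
    if prior_sd <= 0 or estimate_sd <= 0:
        raise ValueError("standard deviations must be positive")
    prior_precision = 1.0 / prior_sd ** 2
    estimate_precision = 1.0 / estimate_sd ** 2
    weighted_sum = prior_mean * prior_precision + estimate * estimate_precision
    return weighted_sum / (prior_precision + estimate_precision)


# HYPOTHETICAL numbers: prior says "about 1x as good as a benchmark" (sd 1),
# while an explicit formula claims 10x with varying amounts of noise.
PRIOR_MEAN, PRIOR_SD, CLAIMED = 1.0, 1.0, 10.0
ADJUSTED = {
    sd: posterior_mean(PRIOR_MEAN, PRIOR_SD, CLAIMED, sd)
    for sd in (1.0, 3.0, 10.0)
}

if __name__ == "__main__":
    for sd, value in ADJUSTED.items():
        print(f"estimate sd = {sd:>4}: adjusted value = {value:.2f}x")
```

With these made-up inputs, a claimed 10x shrinks to 5.5x when the estimate is as precise as the prior, to about 1.9x when it is three times noisier, and to about 1.1x when it is ten times noisier.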
Note that I'm not saying CEAs don't matter, or that CEA-focused approaches are unreliable. I'm a big believer in the measurability of things people often claim can't be measured; I think EV-maxing is almost always correct in principle but can be perilous in practice, and that on the margin people should instead be working a bit more on systematically mapping how different moral conceptions cash out in different recommendations (e.g. RP's work); if a CEA-based case can't be made for a grant I get very skeptical; and I do in fact consider CEAs the main input into my thinking on these kinds of things. I am simply wary of single-parameter optimisation taken to the limit in general (for anything, really, not just for donating), and I see your approach as being willing to go much further along that path than I do (and I'm already further along that path than almost anyone I meet IRL).
But I've seen enough back-and-forth between people in the cluster and sequence camps to have the sense that nobody really ends up changing their mind substantively and I doubt this will happen here either, sorry, so I will respectfully bow out of the conversation.
Nice table from the paper Epic narratives of the Green Revolution in Brazil, China, and India by Lídia Cabral, Poonam Pandey, and Xiuli Xu (2022):