Over the last few weeks I read eleven cost-effectiveness analyses published by Ambitious Impact (formerly Charity Entrepreneurship). Seven were global health and development; four were animal welfare. The earliest was from 2022, the rest from 2024. What I was looking for was not whether any individual CEA had the right answer, but whether the same conceptual parameters were being treated consistently across analyses. They mostly weren't, and three of the inconsistencies seem worth flagging in public.
The short version: AIM's "probability of success" parameter is constructed three completely different ways across the corpus, with values ranging from 0.2 to 1.0 for what is supposed to be the same conceptual quantity. The 2024 GHD template's internal and external validity adjustments are applied inconsistently from one CEA to the next, including one where they're explicitly zeroed out. And template "suggested defaults" are often left at their default values rather than customized, most strikingly in Digital Pulmonary Rehabilitation 2024, which draws on far more template parameters than the other CEAs and ends up with 30 of 55 suggested values left at defaults across the live model. None of these are individual errors. They're patterns in how the template gets used in practice, and I think they might be useful to the people who maintain it.
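To make the three findings concrete, here is a minimal sketch of how a template-style CEA might combine these parameters. The structure, parameter names, and numbers below are my own illustrative assumptions for exposition, not AIM's actual template or its values.

```python
# Minimal sketch of a template-style CEA, under the assumption that the
# parameters combine multiplicatively. Names and numbers are illustrative,
# not AIM's actual template or values.

def cost_effectiveness(
    effect_per_person: float,  # raw study effect, e.g. DALYs averted per person reached
    internal_validity: float,  # discount for bias in the underlying study (0-1)
    external_validity: float,  # discount for transfer to the new context (0-1)
    p_success: float,          # probability the new charity succeeds at all (0-1)
    people_reached: float,     # expected reach of the intervention
    total_cost: float,         # total spend in dollars
) -> float:
    """Expected DALYs averted per dollar."""
    adjusted_effect = effect_per_person * internal_validity * external_validity
    return adjusted_effect * people_reached * p_success / total_cost

# All three findings live in these inputs: p_success constructed three
# different ways (0.2 to 1.0), validity adjustments sometimes disabled
# (in a multiplicative model, "zeroed out" means the multiplier is 1.0),
# and suggested defaults left unchanged.
baseline = cost_effectiveness(0.05, 0.8, 0.7, 0.6, 100_000, 500_000)
no_adjustments = cost_effectiveness(0.05, 1.0, 1.0, 1.0, 100_000, 500_000)
print(f"{no_adjustments / baseline:.1f}x")  # ~3.0x swing from parameter treatment alone
```

If something like this multiplicative structure holds, parameter-construction choices translate directly into multiplicative swings in the bottom line, which is why consistency across analyses matters at least as much as any single value.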

Hi Tsondo! Thanks for a wonderful analysis of structural patterns in AIM's cost-effectiveness analyses. As AIM's Research Director, I'll respond to your findings and suggestions below. Filip Murár and Vicky Cox, Senior Research Managers, contributed to this response.
Finding one: probability of success
We recognize that it is disconcerting to see us changing approach so much year on year. However, we take a fairly pragmatic approach to modelling and have a low bar for implementing changes where we think they're appropriate, so we care less about comparability across years. For the avoidance of doubt, our CEAs are quite specific to the task of forecasting a broad point estimate for a plausible future intervention, where we have very little detail about how it will be implemented. They should not be read as evaluations of a specific intervention or charity, in the way that GiveWell's or Animal Charity Evaluators' assessments are.
However, you also identify wide variation within shorter timeframes. This is more worrying, as we do want good comparability within a round. We now use 100% for direct delivery and custom values for other types of interventions (namely policy, or those relying heavily on persuading other actors). In those cases, we may use different approaches to estimate a best guess, which is what is happening in some of the cases you mention. I agree that we should make this choice more explicit, and we'll take you up on your suggestion to clarify the methodology for selecting p(success). Thanks for the suggestion!
Finding two: validity adjustments
Bang on. We recognize this is an inconsistency. It has largely arisen from our repeated finding that each evidence base requires its own specific treatment, and given that, we haven't worried much about it. However, your feedback has reignited an old debate about standardizing validity discounts in the way you suggest. Candidly, I can't promise any specific changes until we think a bit more about how to approach this, but we are keen to at least establish more guidance and use it in our quality assurance processes. We'll likely draw heavily from this great piece for that.
Finding three: defaults left unchanged
Another good point. We'll start being clearer about active choices vs. defaults in future (in the notes). I think this mostly improves reasoning transparency and is clearly needed.
Other stuff you mention
For the avoidance of doubt: the patterns resonate, and we were aware of most of them. However, your feedback helps us calibrate the relative priority of the different improvements we've noticed. As a small team, this kind of input on what to prioritize is valuable.
We are quite keen to hear more about your suggestion for deeper analysis. I will email you about this next week!
Thanks Morgan! I appreciate the detailed response from you, Filip, and Vicky. Glad the patterns were useful. I'll look out for your email and am happy to dig into whatever direction would be most useful on your end.
Tsondo
Hey! Just checking, did you run this past AIM first? We recommend that you do, out of politeness but also to avoid misunderstandings that would have been easy to dispel.
I sent Vicky an email at the time of publishing rather than sharing a draft in advance. I didn't pre-share for two reasons. First, I don't have an existing relationship with anyone at AIM, so cold-emailing a stranger and asking them to review a draft on a deadline felt like a heavier ask than a heads-up at publication. Second, I tried to write the post so that any factual claim can be verified in a few minutes from the public spreadsheets (the cell references in each finding are there for that purpose). Vasco Grilo's recent methodology posts on AIM CEAs are part of the same public conversation, and Vicky has engaged with those publicly on the Forum, which is part of why the published-with-heads-up route felt appropriate for a contribution in the same vein.
That said, I'm new to posting on the Forum and I'd take your read on the convention seriously. If the norm here is that pre-sharing should happen even when the analysis is grounded in public data, I'd want to know for next time.
Cheers!
The way you've gone about it seems reasonable enough to me (i.e. I don't think you need to do anything to correct it now).
I still think it is ideal to run a critique like this past an organisation first, primarily because of the chance of misunderstandings. I'm being agnostic here about whether a misunderstanding would be your fault or the fault of AIM (for using ambiguous terms, entering the wrong number, etc.). The thing we'd be trying to avoid is spreading a critique of an organisation that is based on a hard-to-correct misunderstanding: many more people would see the critique than the correction.
As I explain in the post, the mod team can help you get some feedback from the org first if you'd rather not do it yourself or don't feel you have a warm enough lead. Just DM me on the Forum if you'd like that help in the future.
PS: given "I'm new to posting on the Forum and I'd take your read on the convention seriously", I'd also add that I think you'd get more readers if you shared the whole post here. When I first saw this, I thought it was the whole post (link-posts and cross-posts are pretty easy to conflate on the Forum). If you'd rather people spend time on your blog, I'd just make sure to put a "continue reading" link at the end. That said, you'll get more comments if you put the full post on the Forum.
Thanks, Toby. Both the specific guidance and the offer are super helpful.
The asymmetric-reach point is something I hadn't considered, but it makes perfect sense. Even when a misunderstanding would be the org's to own, the critique travels further than the correction, so the pre-share is doing real work beyond politeness. I'll treat it as the default going forward, and I'll take you up on the DM route next time rather than trying to find a warm lead myself. That removes my main hesitation.
And good catch on the link-post vs. cross-post distinction. I hadn't picked up on it and somewhat conflated the two. For the next one I'll put the full text on the Forum with a canonical link back to my blog. You're right that the comments are where the methodology actually gets pressure-tested, and honestly, my blog has no commenting enabled, so the Forum is clearly where the conversation should live. Appreciate you taking the time to flag both of these.