GWWC lists StrongMinds as a “top-rated” charity. They do so because Founders Pledge determined StrongMinds to be cost-effective in its report on mental health.
I could say here, “and that report was written in 2019 - either they should update the report or remove the top rating” and we could all go home. In fact, most of what I’m about to say does consist of “the data really isn’t that clear yet”.
I think the strongest statement I can make (which I doubt StrongMinds would disagree with) is:
“StrongMinds have made limited effort to be quantitative in their self-evaluation, haven’t continued monitoring impact after intervention, and haven’t done the research they once claimed they would. They have not been vetted sufficiently to be considered a top charity, and only one independent group has done the work to look into them.”
My key issues are:
- Survey data is notoriously noisy and the data here seems to be especially so
- There are reasons to be especially doubtful about the accuracy of the survey data (StrongMinds have twice updated their level of uncertainty in their numbers due to social-desirability bias (SDB))
- One of the main models is (to my eyes) off by a factor of ~2 based on an unrealistic assumption about depression (medium confidence)
- StrongMinds haven’t continued to publish new data since their trials very early on
- StrongMinds seem to be somewhat deceptive about how they market themselves as “effective” (and EA are playing into that by holding them in such high esteem without scrutiny)
What’s going on with the PHQ-9 scores?
In their last four quarterly reports, StrongMinds have reported PHQ-9 reductions of: -13, -13, -13, -13. In their Phase II report, raw scores dropped by a similar amount:
However, their Phase II analysis reports (emphasis theirs):
As evidenced in Table 5, members in the treatment intervention group, on average, had a 4.5 point reduction in their total PHQ-9 Raw Score over the intervention period, as compared to the control populations. Further, there is also a significant visit effect when controlling for group membership. The PHQ-9 Raw Score decreased on average by 0.86 points for a participant for every two groups she attended. Both of these findings are statistically significant.
Founders Pledge’s cost-effectiveness model uses this 4.5-point reduction figure (and further reduces it for reasons we’ll get into later).
Based on the Phase I and II surveys, it seems to me that a much more cost-effective intervention would be to go around surveying people. I’m not exactly sure what’s going on with the Phase I / Phase II data, but the best I can tell is that in Phase I we had a ~7.5 vs ~5.1 PHQ-9 reduction from “being surveyed” vs “being part of the group”, and in Phase II we had a ~3.0 vs ~7.1 PHQ-9 reduction from “being surveyed” vs “being part of the group”. [An earlier version of this post used the numbers '~5.1 vs ~4.5 PHQ-9'; Natalia pointed out the error in this comment.] For what it’s worth, I don’t believe this is likely to be the case; I think it’s just a strong sign that the survey mechanism being used is inadequate to determine what is going on.
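To make the comparison concrete, here is a rough sketch of that decomposition (my own framing: the figures are the approximate ones above, and the split assumes the control group's reduction reflects the "being surveyed" effect while the treatment-minus-control gap reflects the "being part of the group" effect):

```python
# Rough decomposition of the approximate figures quoted above. Assumes the
# control group's PHQ-9 reduction captures the "being surveyed" effect and
# the treatment-minus-control gap captures the "being part of the group"
# effect. Illustrative only, not exact trial results.
phases = {
    "Phase I":  {"surveyed": 7.5, "group": 5.1},
    "Phase II": {"surveyed": 3.0, "group": 7.1},
}

for name, e in phases.items():
    total = e["surveyed"] + e["group"]
    share = e["surveyed"] / total
    print(f"{name}: total reduction ~{total:.1f} points, "
          f"~{share:.0%} of which comes from the control/'surveyed' effect")
```

The point is not that either split is right; it's that the two phases attribute wildly different shares of the improvement to simply being surveyed, which is hard to square with the surveys measuring a stable treatment effect.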
There are a number of potential reasons we might expect to see such large improvements in the mental health of the control group (as well as the treatment group).
- Mean-reversion - StrongMinds happens to sample people at a low ebb, so their mental health improves of its own accord with the passage of time.
- “People in targeted communities often incorrectly believe that StrongMinds will provide them with cash or material goods and may therefore provide misleading responses when being diagnosed.” (source) That is, potential participants may fake their initial scores in order to get into the program, either because they (mistakenly) think there is some material benefit to being in it, or because they think it makes them more likely to get into a program they believe would have value for them.
What’s going on with the ‘social-desirability bias’?
The Phase I and Phase II trials found that 97% and 99% of their patients, respectively, were “depression-free” after the trial. During the Phase II trial they realised that these numbers were inaccurate. On this basis, they decided to reduce the Phase II figure from 99% to 92%, using the results from two weeks prior to the end.
In their follow-up study of Phases I and II, they then say:
While both the Phase 1 and 2 patients had 95% depression-free rates at the completion of formal sessions, our Impact Evaluation reports and subsequent experience has helped us to understand that those rates were somewhat inflated by social desirability bias, roughly by a factor of approximately ten percentage points. This was due to the fact that their Mental Health Facilitator administered the PHQ-9 at the conclusion of therapy. StrongMinds now uses external data collectors to conduct the post-treatment evaluations. Thus, for effective purposes, StrongMinds believes the actual depression-free rates for Phase 1 and 2 to be more in the range of 85%.
I would agree with StrongMinds that they still had social-desirability bias in their Phase I and II reports, although it’s not clear to me that they have fully removed it now. This also relates to my earlier point about how much improvement we see in the control group. If the pre-treatment scores show levels of depression that are too high and the post-treatment scores show levels that are too low, how confident should we be in the magnitude of these effects?
How bad is depression?
Severe depression has a DALY weighting of 0.66.
(Founders Pledge report, via Global Burden of Disease Disability Weights)
The key section of the Disability Weights table reads as follows:
My understanding (based on the lay descriptions, IANAD etc.) is that “severe depression” is not quite the right way to describe the thing which has a DALY weighting of 0.66; “severe depression during an episode has a DALY weighting of 0.66” would be more accurate.
Assuming linear decline in severity on the PHQ-9 scale.
(Founders Pledge model)
Furthermore, whilst the disability weights are linear between “mild”, “moderate” and “severe”, the threshold for “mild” in PHQ-9 terms is not ~1/3 of the way up the scale. There is therefore a much smaller change in disability weight for a 12-point drop from 12 to 0 than for one from 24 to 12: the former takes you from ~mild to asymptomatic (a change of ~0.15), while the latter takes you from a “severe episode” to a “mild episode” (a change of ~0.51), which is much larger.
This change would roughly halve the effectiveness of the intervention, using the Founders Pledge model.
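A back-of-the-envelope version of that comparison (my own crude mapping of PHQ-9 scores onto GBD-style category weights, following the rough reading above; this is not Founders Pledge's actual model):

```python
# Illustrative only. Approximate GBD-style disability weights: severe
# depressive episode ~0.66, mild episode ~0.15, asymptomatic 0. The mapping
# of PHQ-9 scores onto those categories (24 ~ severe, 12 ~ mild,
# 0 ~ asymptomatic) follows the rough reading in the text and is an assumption.
category_dw = {24: 0.66, 12: 0.15, 0: 0.0}

def linear_dw(phq9, max_score=27, max_dw=0.66):
    # The "linear decline in severity" assumption: every PHQ-9 point is
    # worth the same amount of disability weight.
    return max_dw * phq9 / max_score

for start, end in [(24, 12), (12, 0)]:
    linear = linear_dw(start) - linear_dw(end)
    stepped = category_dw[start] - category_dw[end]
    print(f"PHQ-9 {start} -> {end}: linear change ~{linear:.2f}, category change ~{stepped:.2f}")
# Linear: both 12-point drops are worth ~0.29 of disability weight.
# Category-based: ~0.51 for 24 -> 12 but only ~0.15 for 12 -> 0, so most of
# the disability-weight benefit sits in the top half of the scale.
```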
Lack of data
My biggest gripe with StrongMinds is they haven’t continued to provide follow-up analysis for any of their cohorts (aside from Phase I and II) despite saying they would in their 2017 report:
Looking forward, StrongMinds will continue to strengthen our evaluation efforts and will continue to follow up with patients at 6 or 12 month intervals. We also remain committed to implementing a much more rigorous study, in the form of an externally-led, longitudinal randomized control trial, in the coming years.
As far as I can tell, based on their conversation with GiveWell:
StrongMinds has decided not to pursue a randomized controlled trial (RCT) of its program in the short term, due to:
High costs – Global funding for mental health interventions is highly limited, and StrongMinds estimates that a sufficiently large RCT of its program would cost $750,000 to $1 million.
Sufficient existing evidence – An RCT conducted in 2002 in Uganda found that weekly IPT-G significantly reduced depression among participants in the treatment group. Additionally, in October 2018, StrongMinds initiated a study of its program in Uganda with 200 control group participants (to be compared with program beneficiaries)—which has demonstrated strong program impact. The study is scheduled to conclude in October 2019.
Sufficient credibility of intervention and organization – In 2017, WHO formally recommended IPT-G as first line treatment for depression in low- and middle-income countries. Furthermore, the woman responsible for developing IPT-G and the woman who conducted the 2002 RCT on IPT-G both serve as mental health advisors on StrongMinds' advisory committee.
I don’t agree with any of these bullet points (aside from the first, although even there I think there should be ways to publish more within the context of the data they already collect).
On the bright side(!), as far as I can tell we should be seeing new data soon: StrongMinds and Berk Ozler should have finished collecting the data for a larger RCT on StrongMinds. It’s a shame it’s not a direct comparison between cash transfers and IPT-G (the arms are IPT-G, IPT-G + cash transfers, and no intervention), but it will still be very valuable data for evaluating them.
Misleading?
(from the StrongMinds homepage)
This implies Charity Navigator thinks they are one of the world’s most effective charities. But in fact Charity Navigator haven’t evaluated them for “Impact & Results”.
WHO: There’s no external validation here (afaict). They just use StrongMinds’ own numbers and talk around the charity a bit.
I’m going to leave aside discussing HLI here. Whilst I think they have some of the deepest analysis of StrongMinds, I am still confused by some of their methodology, and it’s not clear to me what their relationship to StrongMinds is. I plan on going into more detail there in future posts. The key thing to understand about the HLI methodology is that it follows the same structure as the Founders Pledge analysis, and so all the problems I mention above regarding data apply just as much to them as to FP.
The “Inciting Altruism” profile, well, read it for yourself.
Founders Pledge is the only independent report I've found - and is discussed throughout this article.
GiveWell staff members’ personal donations:
I plan to give 5% of my total giving to StrongMinds, an organization focused on treating depression in Africa. I have not vetted this organization anywhere nearly as closely as GiveWell’s top charities have been vetted, though I understand that a number of people in the effective altruism community have a positive view of StrongMinds within the cause area of mental health (though I don’t have any reason to think it is more cost-effective than GiveWell’s top charities). Intuitively, I believe mental health is an important cause area for donors to consider, and although we do not have GiveWell recommendations in this space, I would like to learn more about this area by making a relatively small donation to an organization that focuses on it.
This is not external validation.
The EA Forum post is also another HLI piece.
I don’t have access to the Stanford piece, it’s paywalled.
Another example of them being misleading is that in all their reports they present the headline PHQ-9 reduction numbers, but everyone involved knows (I hope) that those aren't really a relevant metric without understanding the counterfactual reduction they actually think is happening. It's either a vanity metric or a bit deceptive.
Conclusion
What I would like to happen is:
- Founders Pledge update or withdraw their recommendation of StrongMinds
- GWWC remove StrongMinds as a top charity
- Ozler's study comes out saying it's super effective
- Everyone reinstates StrongMinds as a top charity, including some evaluators who haven't done so thus far
Hi Simon, thanks for writing this! I’m research director at FP, and have a few bullets to comment here in response, but overall just want to indicate that this post is very valuable. I’m also commenting on my phone and don’t have access to my computer at the moment, but can participate in this conversation more energetically (and provide more detail) when I’m back at work next week.
I basically agree with what I take to be your topline finding here, which is that more data is needed before we can arrive at GiveWell-tier levels of confidence about StrongMinds. I agree that a lack of recent follow-ups is problematic from an evaluator’s standpoint and look forward to updated data.
FP doesn’t generally strive for GW-tier levels of confidence; we’re risk-neutral and our general procedure is to estimate expected cost-effectiveness inclusive of deflators for various kinds of subjective consideration, like social desirability bias.
The 2019 report you link (and the associated CEA) is deprecated (FP hasn’t been resourced to update public-facing materials, a situation that is now changing), but the proviso at the top of the page is accurate: we stand by our recommendation.
The page doesn't say it's deprecated, and GWWC are still linking to it and recommending StrongMinds as a top charity. I do think your statements here should be enough for GWWC to remove them as a top charity.
This is what triggered the whole thing in the first place. I have had doubts about StrongMinds for a long time (I privately shared doubts with many EAs ~a year ago), but I didn't realise it was considered a top charity; I think it's a generally "fine" charity and that we should collect more data in the area. Sam Atis' blog led me to see it was considered a top charity, and that was what finally tipped me over the edge.
“I think my main takeaway is my first one here. GWWC shouldn't be using your recommendations to label things top charities. Would you disagree with that?”
Yes, I think so; I’m not sure why this should be the case. Different evaluators have different standards of evidence, and GWWC is using ours for this particular recommendation. They reviewed our reasoning and (I gather) were satisfied. As someone else said in the comments, the right reference class here is probably deworming: “big if true.”
The message on the report says that some details have changed, but that our overall view is represented. That’s accurate, though there are some details that are more out of date than others. We don’t want to just remove old research, but I’m open to the idea that this warning should be more descriptive.
I’ll have to wait til next week to address more substantive questions but it seems to me that the recommend/don’t recommend question is most cruxy here.
EDIT:
On reflection, it also seems cruxy that our current evaluation isn’t yet public. This seems very fair to me, and I’d be very curious to hear GWWC’s take. We would like to make all evaluation materials public eventually, but this is not as simple as it might seem and especially hard given our orientation toward member giving.
Though this type of interaction is not ideal for me, it seems better for the community. If they can’t be totally public, I’d rather our recs be semi-public and subject to critique than totally private.
I'm afraid that doesn't make me super impressed with GWWC, and it's not easy for non-public reasoning to be debunked. Hopefully you'll publish it and we can see where we disagree.
I think there's a big difference between deworming and StrongMinds.
"big if true" might be a... (read more)
Simon, I loved your post!
But I think this particular point is a bit unfair to GWWC and also just factually inaccurate.
For a start, GWWC do not "recommend" StrongMinds. They very clearly recommend giving to an expert-managed Fund where an expert grantmaker can distribute the money, and they do not recommend giving to StrongMinds (or to Deworm the World, or AMF, etc.). They say that repeatedly across their website, e.g. here. They then also have some charities that they class as "top rated", which they very clearly say are charities that have been "top rated" by another independent organisation that GWWC trusts.
I think this makes sense. Let's consider GWWC's goals here. GWWC exists to serve and grow its community of donors. I expect that maintaining a broad list of charities on their website across cause areas and providing a convenient donation platform for those charities is the right call for GWWC to achieve those goals, even if some of those charities are less proven. Personally, as a GWWC member, I very much appreciate that they have such a broad variety of charities (e.g., this year, I donated to one of ACE's standout charities and it was great to be able to do so on the G... (read more)
I suspect this is a reading comprehension thing which I am failing at (I know I have failed at this in the past) but I think there are roughly two ways in which GWWC is either explicitly or implicitly recommending StrongMinds.
Firstly, by labelling it as a "Top Charity", GWWC ensures that all but the most careful reader (and even a careful reader) will see this as some kind of endorsement or "recommendation", to use words at least somewhat sloppily.
Secondly, it does explicitly recommend StrongMinds:
Their #1 recommendation is "Donate to expert-managed funds" and their #2 recommendation is "Donate to charities recommended by trusted charity evaluators". They say:
Oh dear, no, my bad. I didn't at all realise "top rated" was a label they applied to StrongMinds but not to GiveDirectly and SCI and other listed charities, and thought you were suggesting StrongMinds be delisted from the site. I still think it makes sense for GWWC to (so far) be trusting other research orgs, and I do think they have acted sensibly (although they have room to grow in providing checks and balances). But I also seem to have misunderstood your point somewhat, so sorry about that.
Hi Simon,
I'm back to work and able to reply with a bit more detail now (though also time-constrained as we have a lot of other important work to do this new year :)).
I still do not think any (immediate) action on our part is required. Let me lay out the reasons why:
(1) Our full process and criteria are explained here. As you seem to agree from your comment above, we need clear and simple rules for what is and what isn't included (incl. because we have a very small team and need to prioritize). Currently, a very brief summary of these rules/the process would be: first determine which evaluators to rely on (also note our plans for this year) and then rely on their recommendations. We do not generally have the capacity to review individual charity evaluations, and would only do so, and potentially diverge from a trusted evaluator's recommendation, under exceptional circumstances. (I don't believe we have had such a circumstance this giving season, but may misremember.)
(2) There were no strong reasons to diverge with respect to FP's recommendation of StrongMinds at the time they recommended them - or to do an in-depth review of FP's evaluation ourselves - and I think there still aren... (read more)
This is an excellent response from a transparency standpoint, and increases my confidence in GWWC even though I don't agree with everything in it.
One interesting topic for a different discussion -- although not really relevant to GWWC's work -- is the extent to which recommenders should condition an organization's continued recommendation status on obtaining better data if the organization grows (or even after a suitable period of time). Among other things, I'm concerned that allowing recommendations that were appropriate under criteria appropriate for a small/mid-size organization to be affirmed on the same evidence as an organization grows could disincentivize organizations from commissioning RCTs where appropriate. As relevant here, my take on an organization not having a better RCT is significantly different in the context of an organization with about $2MM a year in room for funding (which was the situation when FP made the recommendation, p. 31 here) than one that is seeking to raise $20MM over the next two years.
Fair enough. I think one important thing to highlight here is that though the details of our analysis have changed since 2019, the broad strokes haven’t — that is to say, the evidence is largely the same and the transformation used (DALY vs WELLBY), for instance, is not super consequential for the rating.
The situation is one, as you say, of GIGO (though we think the input is not garbage) and the main material question is about the estimated effect size. We rely on HLI’s estimate, the methodology for which is public.
I think your (2) is not totally fair to StrongMinds, given the Ozler RCT. No matter how it turns out, it will have a big impact on our next reevaluation of StrongMinds.
Edit: To be clearer, we shared updated reasoning with GWWC but the 2019 report they link, though deprecated, still includes most of the key considerations for critics, as evidenced by your observations here, which remain relevant. That is, if you were skeptical of the primary evidence on SM, our new evaluation would not cause you to update to the other side of the cost-effectiveness bar (though it might mitigate less consequential concerns about e.g. disability weights).
Thanks for this! Useful to get some insight into the FP thought process here.
(emphasis added)
Minor nitpick (I haven't personally read FP's analysis / work on this):
Appendix C (pg 31) details the recruitment process, where they teach locals about what depression is prior to recruitment. The group they sample from are groups engaging in some form of livelihood / microfinance programmes, such as hairdressers. Other groups include churches and people at public health clinic wait areas. It's not clear to me based on that description that we should take at face value that the reason for very very high incoming PHQ-9 scores is that these groups are "severely traumatised" (though it's clearly a possibility!)
RE: priors about low effectiveness of therapeutic interventions - if the group is severely traumatised, then while I agree this might m... (read more)
FP's model doesn't seem to be public, but CEAs are such an uncertain affair that aligning even to 2/3 level is a pretty fair amount of convergence.
Thanks for writing this post!
I feel a little bad linking to a comment I wrote, but the thread is relevant to this post, so I'm sharing in case it's useful for other readers, though there's definitely a decent amount of overlap here.
TL; DR
I personally default to being highly skeptical of any mental health intervention that claims to have a ~95% success rate and a PHQ-9 reduction of 12 points over 12 weeks, as this is a clear outlier among treatments for depression. The effectiveness figures from StrongMinds are also based on studies that are non-randomised and poorly controlled. There are other methodological issues, e.g. around the adjustment for social desirability bias. The topline figure of $170 per head for cost-effectiveness is also possibly an underestimate, because while ~48% of clients were treated through SM partners in 2021, and Q2 results (pg 2) suggest StrongMinds is on track for ~79% of clients treated through partners in 2022, the expenses and operating costs of the partners responsible for these clients were not included in the methodology.
(This mainly came from a cursory review of StrongMinds documents, and not from examining HLI analyses, though I do think "we’re... (read more)
I want to second this! Not a mental health expert, but I have depression and so have spent a fair amount of time looking into treatments / talking to doctors / talking to other depressed people / etc.
I would consider a treatment extremely good if it decreased the amount of depression a typical person experienced by (say) 20%. If a third of people moved from the "depression" to "depression-free" category I would be very, very impressed. Ninety-five percent of people moving from "depressed" to "depression free" sets off a lot of red flags for me, and makes me think the program has not successfully measured mental illness.
(To put this in perspective: 95% of people walking away depression-free would make this far more effective than any mental health intervention I'm aware of, at any price point, in any country. Why isn't anyone using this to make a lot of money among rich American patients?)
I think some adjustment is appropriate to account for the fact that people in the US are generally systematically different from people in (say) Uganda in a huge range of ways which might lead to significant variation in the quality of existing care, or the nature of their problems and their susceptibility to treatment. As a general matter I'm not necessarily surprised if SM can relatively easily achieve results that would be exceptional or impossible among very different demographics.
That said, I don't think these kinds of considerations explain a 95% cure rate, I agree that sounds extreme and intuitively implausible.
Thank you. I'm a little ashamed to admit it, but in an earlier draft I was much more explicit about my doubts about the effectiveness of SM's intervention. I got scared because it rested too much on my general priors about the intervention, and I hadn't finished enough of a review of the literature to be able to call BS. (Although I was comfortable doing so privately, which I guess tells you that I haven't learned from the FTX debacle.)
I also noted the SM partners issue, although I couldn't figure out whether or not it was the case re: costs so I decided to leave it out. I would definitely like to see SM address that concern.
HLI do claim to have seen some private data from SM, so it's plausible (though in my view unlikely) that HLI have enough to justify their confidence, but everyone else is still in the dark.
I'm a researcher at SoGive conducting an independent evaluation of StrongMinds which will be published soon. I think the factual contents of your post here are correct. However, I suspect that after completing the research, I would be willing to defend the inclusion of StrongMinds on the GWWC list, and that the SoGive write-up will probably have a more optimistic tone than your post. Most of our credence comes from the wider academic literature on psychotherapy, rather than direct evidence from StrongMinds (which we agree suffers from problems, as you have outlined).
Regarding HLI's analysis, I think it's a bit confusing to talk about this without going into the details because there are both "estimating the impact" and "reframing how we think about moral-weights" aspects to the research. Ascertaining what the cost and magnitude of therapy's effects are must be considered separately from the "therapy will score well when you use subjective-well-being as the standard by which therapy and cash transfers and malaria nets are graded" issue. As of now I do roughly think that HLI's numbers regarding what the costs and effect sizes of therapy are on patients are in the ri... (read more)
Edit 03-01-23: I have now replied more elaborately here
Hi Simon, thanks for this post! I'm research director at GWWC, and we really appreciate people engaging with our work like this and scrutinising it.
I'm on holiday currently and won't be able to reply much more in the coming few days, but will check this page again next Tuesday at the latest to see if there's anything more I/the GWWC team need to get back on.
For now, I'll just very quickly address your two key claims that GWWC shouldn't have recommended StrongMinds as a top-rated charity and that we should remove it now, both of which I disagree with.
Our process and criteria for making charity recommendations are outlined here. Crucially, note that we generally don't do (and don't have capacity to do) individual charity research: we almost entirely rely on our trusted evaluators - including Founders Pledge - for our recommendations. As a research team, we plan to specialize in providing guidance on which evaluators to rely on rather than in doing individual charity evaluation research.
In the case of StrongMinds, they are a top-rated charity primarily because Founders Pledge recommended them to us, as you highlight. Ther... (read more)
I just want to add my support for GWWC here. I strongly support the way they have made decisions on what to list to date:
That said I would love it if going forw... (read more)
I agree, and I'm not advocating removing StrongMinds from the platform, just removing the label "Top-rated". Some examples of charities on the platform which are not top-rated include: GiveDirectly, SCI, Deworm the World, Happier Lives Institute, Fish Welfare Initiative, Rethink Priorities, Clean Air Task Force...
I'm afraid to say I believe you are mistaken here, as I explained in my other comment. The recommendations section clearly includes top-charities recommended by trusted evaluators and explicitly includes StrongMinds. There is also a two... (read more)
Tbh I think this is a bit unfair: his criticism isn't being disregarded at all. He received a substantial reply from FP's research director Matt Lerner - even while he's on holiday - within a day, and Matt seems very happy to discuss this further when he's back to work.
I should also add that almost all of the relevant work is in fact public, incl. the 2019 report and HLI's analysis this year. I don't think what FP has internally is crucial to interpreting Matt's responses.
I do like the forecasting idea though :).
As I tried to clarify above, this is not a case of secret info having much - if any - bearing on a recommendation. As far as I'm aware, nearly all decision-relevant information is and has been available publicly, and where it isn't Matt has already begun clarifying things and has offered to provide more context next week (see discussion between him and Simon above). I certainly can't think of any secret info that is influencing GWWC's decision here.
FWIW my personal forecast wouldn't be very far from the current market forecast (probably closer to 30%), not because I think the current recommendation decision is wrong but for a variety of reasons, incl. StrongMinds' funding gaps being filled to a certain extent by 2025; new data from the abovementioned RCT; the research community finding even better funding opportunities etc.
I'm fine with the wording: it's technically "top-rated charity" currently but both naming and system may change over the coming years, as we'll hopefully be ramping up research efforts.
meta-comment: If you're going to edit a comment, it would be useful to be specific and say how you edited the comment e.g. in this case, I think you changed the word "disregarded" to something weaker on further reflection.
Reading comments from Matt (FP) and Sjir (GWWC), it sounds like the situation is:
- FP performed a detailed public evaluation of SM, which they published in 2019.
- This was sufficient for FP to recommend giving to SM.
- Because FP is one of GWWC's trusted evaluators, this was sufficient for GWWC to designate SM as top rated.
- The public FP evaluation is now stale, though FP has additional unpublished information that is sufficient for them to still recommend SM. Due to resource constraints they haven't been able to update their public evaluation.
It's not clear to me what FP should have done differently: resource constraints are hard. The note at the top of the evaluation (which predates this post) is a good start, though it would be better if it included something like "As of fall 2022, we have continued to follow StrongMinds and still recommend them. We are planning a full update before the 2023 giving season."
In the case of GWWC, I think one of the requirements they should have for endorsing recommendations from their trusted evaluators is that they be supported by public evaluations, and that those evaluations be current. I think in this case GWWC would ideally have moved S... (read more)
Thanks Jeff, I think your summary is helpful and broadly correct, except for two (somewhat relevant) details:
I understand the reasons for your suggestion w.r.t. GWWC's inclusion criteria - we've seriously considered doing this before - but I explain at length why I still think we shouldn't under (4) here. Would welcome any further comments if you disagree!
I think another thing I'd add with StrongMinds is I think people are forgetting:
(1) generally, cool-sounding charities usually don't work out under more intense scrutiny (let's call this the generalized GiveWellian skeptical prior)
(2) StrongMinds really has not yet received GiveWell-style intense scrutiny
(3) there are additional reasons on priors to be skeptical of StrongMinds given that the effect sizes seem unusually large/cheap compared to the baseline of other mental health interventions (which admittedly are in developed world contexts which is why this is more of a prior than a knockdown argument).
~
Update: Alex Lawsen independently makes a similar argument to me on Twitter. See also Bruce expressing skepticism in the comments here.
Another reason is that Berk Özler had a scathing review of StrongMinds on Twitter (archived, tweets are now deleted).
I had not realized that he was running an RCT on StrongMinds (as mentioned in this post), so possibly had access to insider data on the (lack of) effectiveness.
Here's the full exchange between Özler and Haushofer:
JH: Whenever someone meekly suggests that one might not leave those with the lowest incomes entirely alone with their mental health struggles, the “it’s not that simple” brigade shows up and talks about the therapy-industrial complex and it’s so tiresome.
BO: Thanks, Johannes. That thread & recommendation is outrageous: there's no good evidence that Strong Minds is effective, let alone most effective. It's 20-year old studies combined with pre-post data provided by SM itself. People should pay no attention to this 🧵, whatsoever.
JH: This dismissal seems much too strong to me. I thought HLI's discussion of the evidence here was fair and reasonable: https://www.happierlivesinstitute.org/report/strongminds-cost-effectiveness-analysis
BO: Show me one good published study of the impact of SM on the ground at some decent scale...
JH: My point is not that SM is the best available intervention. My point is that people who get upset at HLI for caring about wellbeing on the grounds that this ignores structural interventions are mistaken.
BO: I have zero problems with your point. It's well taken & that's why I thanked yo... (read more)
Just to clarify, Berk has deleted his entire Twitter profile rather than these specific tweets. It will be interesting to see the results from the upcoming RCT.
I’m belatedly making an overall comment about this post.
I think this was a valuable contribution to the discussion around charity evaluation. We agree that StrongMinds’ figures about their effect on depression are overly optimistic. We erred by not pointing this out in our previous work and not pushing StrongMinds to cite more sensible figures. We have raised this issue with StrongMinds and asked them to clarify which claims are supported by causal evidence.
There are some other issues that Simon raises, like social desirability bias, that I think are potential concerns. The literature we reviewed in our StrongMinds CEA (page 26) doesn’t suggest it’s a large issue, but I only found one study that directly addresses this in a low-income country (Haushofer et al., 2020), so the evidence appears very limited here (but let me know if I’m wrong). I wouldn’t be surprised if more work changed my mind on the extent of this bias. However, I would be very surprised if this alone changed the conclusion of our analysis. As is typically the case, more research is needed.
Having said that, I have a few issues with the post and see it as more of a conversation starter than the end of th... (read more)
Thanks for writing this Simon. I'm always pleased to see people scrutinising StrongMinds because it helps us all to build a better understanding of the most cost-effective ways to address the huge (and severely neglected) burden of disease from mental health conditions.
HLI's researchers are currently enjoying some well-deserved holiday but they'll be back next week and will respond in more detail then. In the meantime, I want to recommend the following resources (and discussion) for people reading this post:
I also want to clarify two things related to the... (read more)
From an outside view, I see Happier Lives Institute as an advocacy organisation for mental health interventions, although I can imagine HLI see themselves as a research organisation working on communicating the effectiveness of mental health interventions. Ultimately, I am not sure there's a lot distinguishing these roles.
GiveWell, however, is primarily a research and donor advisory organisation. Unlike HLI, it does not favour a particular intervention, or pioneer new metrics in support of said interventions.
Some things that HLI does that make me think HLI is an advocacy org:
Edit: Fixed acronym in first paragraph
I agree with all of these reasons. My other reasons for being unclear about the relationship are the (to my eye) cynical timing and the aggressive comparisons published annually during peak giving season.
Last year when this happened I thought it was a coincidence; twice is enemy action.
(Edit: I didn't mean to imply that HLI is an "enemy" in some sense; it's just a turn of phrase.)
Simon,
It's helpful to know why you thought the relationship was unclear.
But I don't think us (HLI) publishing research during the giving season is "cynical timing" any more than you publishing this piece when many people from GWWC, FP, and HLI are on vacation is "cynical timing".
When you're an organization without guaranteed funding, it seems strategic to try to make yourself salient to people when they reach for their pocketbooks. I don't see that as cynical.
FWIW, the explanation is rather mundane: the giving season acts as a hard deadline which pushes us to finish our reports.
To be clear, that's what I meant to imply -- I assumed you published this when you had time, not because the guards were asleep.
Everything is compared to StrongMinds because that's what our models currently say is best. When (and I expect it's only a matter of when) something else takes StrongMinds' place, we will compare the charities we review to that one. The point is to frame the charities we review in terms of how they compare to our current best bet. I guess this is an alternative to putting everything in terms of GiveDirectly cash transfers -- which IMO would generate less heat and light.
GW compares everything to GiveDirectly (which isn't considered their best charity). I like that approach because:
I think for HLI (at their current stage) everything is going to be a moving target (because there's so much uncertainty about the WELLBY effect of every action), but I'd rather have only one moving target than two.
I'm seeing a lot of accusations flying around in this thread (e.g. cynical, aggressive, enemy action, secret info etc.). This doesn't strike me as a 'scout mindset' and I was glad to see Bruce's comment that "it's important to recognise that everyone here does share the same overarching goal of "how do we do good better".
HLI has always been transparent about our goals and future plans. The front page of our website seems clear to me:
Our research agenda is also very clear about our priorities:
And our 2022 charity recommendation post makes it clear that we plan to investigate a wider range of interventions and charities in 2023:
This is helpful to know how we come across. I'd encourage people to disagree or agree with Elliot's comment as a straw poll on how readers' perceptions of HLI accord with that characterization.
p.s. I think you meant to write “HLI” instead of “FHI”.
I agreed with Elliott's comment, but for a somewhat different reason that I thought might be worth sharing. The "Don’t just give well, give WELLBYs" post gave me a clear feeling that HLI was trying to position itself as the Happiness/Well-Being GiveWell, including by promoting StrongMinds as more effective than programs run by classic GW top charities. A skim of HLI's website gives me the same impression, although somewhat less strongly than that post.
The problem as I see it is that when you set GiveWell up as your comparison point, people are likely to expect a GiveWell-type balance in your presentation (and I think that expectation is generally reasonable). For instance, when GiveWell had deworming programs as a top charity option, it was pretty clear to me within a few minutes of reading their material that the evidence base for this intervention had some issues and its top-charity status was based on a huge potential upside-for-cost. When GiveWell had standout charities, it was very clear that the depth of research and investigation behind those programs was roughly an order of magnitude or so less than for the top charities. Although I didn't read everything on HLI's web... (read more)
To clarify, the bar I am suggesting here is something like: "After engaging with the recommender's donor-facing materials about the recommended charity for 7-10 minutes, most potential donors should have a solid understanding of the quality of evidence and degree of uncertainty behind the recommendation; this will often include at least a brief mention of any major technical issues that might significantly alter the decision of a significant number of donors."
Information in a CEA does not affect my evaluation against this bar very much. To qualify in my mind as "primarily a research and donor advisory organisation" (to use Elliot's terminology), the organization should be communicating balanced information about evidence quality and degree of uncertainty fairly early in the donor-communication process. It's not enough that the underlying information can be found somewhere in 77 pages of the CEAs you linked.
To analogize, if I were looking for information about a prescription drug, and visited a website I thought was patient-advisory rather than advocacy, I would expect to see a fair discussion of major risks and downsides within the first ten minutes of patient-friendly mat... (read more)
I read this comment as implying that HLI's reasoning transparency is currently better than GiveWell's, and think that this is both:
- False.
- Not the sort of thing it is reasonable to bring up before immediately hiding behind "that's just my opinion and I don't want to get into a debate about it here".
I therefore downvoted, as well as disagree voting. I don't think downvotes always need comments, but this one seemed worth explaining as the comment contains several statements people might reasonably disagree with.
TL;DR
I think an outsider may reasonably get the impression that HLI thinks its value is correlated with their ability to showcase the effectiveness of mental health charities, or of WELLBYs as an alternate metric to cause prioritisation. It might also be the case that HLI believes this, based on their published approach, which seems to assume that 1) happiness is what ultimately matters and 2) subjective wellbeing scores are the best way of measuring this. But I don't personally think this is the case - I think the main value of an organisation like HLI is to help the GH research community work out the extent to which SWB scores are valuable in cause prioritisation, and how we best integrate these with existing measures (or indeed, replace them if appropriate). In a world where HLI works out that WELLBYs actually aren't the best way of measuring SWB, or that actually we should weigh DALYs to SWB at a 1:5 ratio or a 4:1 ratio instead of replacing existing measures wholesale or disregarding them entirely, I'd still see these research conclusions as highly valuable (even if the money shifted metric might not be similarly high). And I think these should be possibilities that HLI remain... (read more)
Hi Simon, I'm one of the authors of HLI's cost-effectiveness analysis of psychotherapy and StrongMinds. I'll be able to engage more when I return from vacation next week.
I see why there could be some confusion there. Regarding the two specifications of WELLBYs, the latter was unique to that appendix, and we consider the first specification to be conventional. In an attempt to avoid this confusion, we denoted all the effects as changes in 'SDs' or 'SD-years' of subjective wellbeing / affective mental health in all the reports (1,2,3,4,5) that were direct results in the intervention comparison.
Regarding whether these changes are "meaningful at all": it's unclear what you mean. Which of the following are you concerned with?
- That standard deviation differences (i.e., Cohen's d or Hedges' g effect sizes) are reasonable ways to do meta-analyses?
- Or is your concern more that even if SDs are reasonable for meta-analyses, they aren’t appropriate for comparing the effectiveness of interventions? We flag some possible concerns in Section 7 of the psychotherapy report. But we haven’t found sufficient evidence after several shallow dives to change our minds.
- Or, you may be con
Thanks for writing this. I have to admit to confirmation bias here, but SM's effects are so stupidly large that I just don't believe they are possible. I hadn't seen that the control group also had a sharp decline, but that raises even more alarm bells.
This is also very important for organizations trying to follow SM's footsteps, like the recently incubated Vida Plena.
I anticipate that SM could enter a similar space as deworming now, where the evidence is shaky but the potential impacts are so high and the cost of delivery so low that it might be recommended/worth doing anyway.
Thanks for this, Simon! I have an additional concern which it would be interesting to get other people's views on. While I’m sympathetic to the importance of subjective well-being, I have additional concerns about how spillovers are sometimes incorporated into the cost-effectiveness comparisons between StrongMinds and GiveWell (like in this comparison with deworming). Specifically, I can see plausible cases where GiveWell-type improvements in health/income allow an individual to make choices that sacrifice some of their own subjective well-being in service of their family/relatives. These could include:
- Migrating to a city or urban area for job opportunities. For the migrant, the move may lead to more social isolation and loss of community. But those receiving remittances could benefit substantially.
- Choosing to work in manufacturing rather than e.g. subsistence agriculture, and so having a better security net (for oneself and one's family) but sacrificing day-to-day autonomy.
- Similarly, choosing a long commute for a better opportunity
- Any long-term investments in e.g. children’s education, specifically if these investments are ‘lumpy’ (the sacrifice is only
I'm also pretty skeptical about the astronomical success rate SM professes, particularly because of some pretty serious methodology issues. Very significant confounding due to the recruitment method is, I think, the most important (recruitment from microfinance and employment training programs, to me, means that their sample would be predisposed to having improvements in depression symptoms because of improvement, or even the possibility of improvement, in material conditions), but the lackluster follow-through with control groups and long-term assessment is also significant. I would love for them to have a qualitative study with participants to understand the mechanisms of improvement and what the participants feel has been significant in alleviating their depressive symptoms.
That being said, I think it's worth mentioning that SM is not the first to try this method of treatment, and that there are a considerable number of studies with similar results (their methods also leave something to be desired, in my opinion, but not so much so that I think they should be disregarded). Meta-analyses for IPT have found that IPT is effective in treating depression and notewor... (read more)
If SM's intervention is as effective as it reports, then presumably that effect would be demonstrated not only on the PHQ-9 but also on more "objective" measures like double-blinded observer ratings of psychomotor agitation/retardation between treatment and control groups. Although psychomotor effects are only a fairly small part of the disease burden of depression, their improvement or non-improvement vs. controls would update my assessment of the methodological concerns expressed in this post. Same would be true of tests of concentration, etc.
SoGive is working on a review of StrongMinds. Our motivations for working on this included the expectation that the community might benefit from having more in-depth, independent scrutiny on the StrongMinds recommendation -- an expectation which appears to be validated by this post.
I'm sorry we're not in a position to provide substantive comment at this stage -- this is partly because the main staff member working on this is on holiday right now, and also because our work is not finished yet.
We will likely publish more updates within the next 2-3 months.
[This is a more well-thought-out version of the argument I made on Twitter yesterday.]
I think the Phase II numbers were not meant to be interpreted quite that way. For context, this is a line chart of scores over time for Phase I, and this is the corresponding chart for Phase II. We can see that in the Phase II chart, the difference between the control and treatment groups is much larger than that in the Phase I chart. Eyeballing, it looks like the difference between the control and treatment groups in Phase II eventually reaches ~10 points, not 4.5.
The quote from the Phase II report in your post says:
What this seems to be saying is they ran a ... (read more)
For anyone who wants to bet on what action will happen here, this market has $90 of liquidity, which is a lot by Manifold standards. If you think the market is wrong, correct it and make mana that you can give to charity!
As promised, I am returning here with some more detail. I will break this (very long) comment into sections for the sake of clarity.
My overview of this discussion
It seems clear to me that what is going on here is that there are conflicting interpretations of the evidence on StrongMinds' effectiveness. In particular, the key question here is what our estimate of the effect size of SM's programs should be. There are other uncertainties and disagreements, but in my view, this is the essential crux of the conversation. I will give my own (personal) interpretation below, but I cannot stress enough that the vast majority of the relevant evidence is public—compiled very nicely in HLI's report—and that neither FP's nor GWWC's recommendation hinges on "secret" information. As I indicate below, there are some materials that can't be made public, but they are simply not critical elements of the evaluation, just quotes from private communications and things of that nature.
We are all looking at more or less the same evidence and coming to different conclusions.
I also think there is an important subtext to this conversation, which is the idea that both GWWC and FP should not recommend things for... (read more)
During the re-evaluation, it would be great if FP could also check the partnership programme by StrongMinds - e.g. whether this is an additional source of revenue for them, and what the operational costs of the partners who help treat additional patients for them are. At the moment these costs are not incorporated into HLI's CEA, but partners were responsible for ~50 and ~80% of the clients treated in 2021 and 2022 respectively. For example, if we crudely assume costs of treatment per client are constant regardless of whether it's treated by StrongMinds or by a StrongMinds partner, then:
- Starting with 5x GiveDirectly, and using 2021 figures, if >~60% of the observed effect is due to bias, it will be at <1x GiveDirectly.
- Starting with 5x GiveDirectly, and using 2022 figures, if >~0% of the observed effect is due to bias, it will be at <1x GiveDirectly.
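For concreteness, a sketch of that arithmetic (my own framing; the 5x GiveDirectly starting point, the partner shares, and the crude constant-cost-per-client assumption are the ones stated above, and the function below is just illustrative):

```python
# Sketch of the adjustment described above. Assumes cost per client is the
# same whether StrongMinds or a partner delivers the treatment, and that
# partners' costs are currently excluded from the cost-effectiveness figure.
def adjusted_multiple(baseline_multiple, partner_share, bias_share):
    # If partners treat `partner_share` of clients but their costs are not
    # counted, the true cost per client is ~1 / (1 - partner_share) times the
    # stated cost. `bias_share` is the fraction of the observed effect
    # attributed to bias (social desirability, gaming the intake, etc.).
    cost_inflation = 1 / (1 - partner_share)
    return baseline_multiple * (1 - bias_share) / cost_inflation

for year, partner_share in [("2021", 0.48), ("2022", 0.79)]:
    for bias in (0.0, 0.6):
        multiple = adjusted_multiple(5, partner_share, bias)
        print(f"{year}, {bias:.0%} of effect from bias: ~{multiple:.1f}x GiveDirectly")
```

With these inputs the 2021 figures land at roughly 1x GiveDirectly once ~60% of the effect is attributed to bias, and the 2022 figures start at roughly 1x before any bias adjustment at all.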
(Thanks again for all your work, looking forward to the re-evaluation!)
This post has made me realize that it's pretty hard to quickly find information about recommended charities that includes the number of interventions assessed, the sample size, and a note on the evidence quality: something like "this comes from an RCT that was carried out well" or "this was pre-/post- data with no randomization". I'd expect this in a summary or overview-type presentation, but I'm not sure how valuable it would be for everyone. At least for me personally it is valuable, and it's something I would use to be more tentative about giving, or to give less, where evidence is limited.
Thanks so much for this
Like I've said before, I really like StrongMinds, but we need an adequately powered RCT vs. cash. This should be a priority, not just a down-the-line thing to do. That their current RCT doesn't have a purely cash arm is borderline negligence; I could barely believe it when I read the protocol. I wonder how the StrongMinds team justified this, especially when the study involves cash anyway.
And the cash transfer should be about as much as the therapy costs (100-150 dollars)
An RCT with both HLI-approved subjective wellbeing measures and a couple of other traditional measures would surely answer this question to the level that we would have a very good indication of just how highly to rate StrongMinds.
I think posts of this kind are incredibly useful, and I'm also impressed by the discussion in the comments. Discussions like this are a key part of what EA is about. I'm curating the post.[1]
Assorted things I appreciated:
Note: I don't want to say that I endorse all of the post's conclusions. I don't think I'm qualified to say that with confidence, and I'm worried that people might defer to me thinking that I am in fact confident.
Personally, I have been confused about how to understand the various reports that were coming out about StrongMinds, and the discussion here (both in the post and in the comments) has helped me with this.
I think the discussion in these comments has been impressively direct, productive, and polite. I've enjoyed following it and want to give props to everyone involved. Y'all make me proud to be part of this community.
I noticed there's no reference for this quote. Where did you find it? What is the evidence for this claim?
Sean Mayberry, Founder and Executive Director, and Rasa Dawson, Director of Development, on May 23, 2019.
From the GiveWell discussion linked.
Re:
As anecdotal evidence, I've been tracking my mental health with a similar inventory (the Beck Depression Inventory, which has 21 items rather than 9) for a few years now, and this tracks.
On your comment about what exactly the 0.66 QALY means, there is extensive public discussion about how to assign a QALY weighting to moderate-to-severe depression in the NICE guidance on esketamine
https://www.nice.org.uk/guidance/ta854/history
(Download the 'Committee Papers' published on 28th Jan 2020)
I'm not sure if any of that is helpful, but it might give some useful upper and lower bounds
Thank you! It's 876 pages long - could you provide a page reference too, please?