Aidan Whitfield🔸

Thank you for the very thoughtful comment! As mentioned in the report, we’ve really appreciated the open and thoughtful way you have engaged with us throughout this evaluation. We look forward to seeing you implement your planned improvements and to re-evaluating HLI’s work in the future!

GWWC’s 2023–2024 impact evaluation (executive summary)

Aidan Whitfield🔸7mo7

Hi Vasco, thanks for your response! Sorry for my delay in getting back to you, I have just got back from leave. I have tried to leave responses to your main points below, but if I have missed anything please let me know.

Re: our marginal multiplier: This is not something we have explicitly tried to model. The most relevant information I can provide here is that our bar for undertaking new pledge acquisition activities is that they must at least exceed 5x in expectation, but this is still not the same as our marginal multiplier for a number of reasons. While we hope to publish an estimate of our average multiplier for 2025 in early 2026, I currently don’t expect to try to explicitly estimate our marginal multiplier. One key reason for this is that, unless we have identified a very scalable method for growing pledges, our marginal multiplier estimate would change quite quickly as we receive more funding and so may only be relevant for a brief period.

Re: how we report the number of pledgers on the website: Thanks for sharing your thoughts here! I continue to think that the statement on our website is accurate and that it isn’t misleading to use the terms ‘community’ and ‘pledging’ here. Simply, these are the numbers of people in our community who have taken pledges with GWWC. I don’t believe the text makes a claim about the number of pledgers who are reporting their donations (which is not a requirement of the pledge) or the number who are fulfilling their pledge (which we don’t have a reliable estimate of). It isn’t clear to me why we should think that the median person who reads the statement would assume that 90% of pledgers are donating.

Re: modelling pledge value for different cohorts: This kind of regression modelling will be something we continue to consider implementing in future evaluations, but currently it isn’t clear enough to me that these models will be better predictors of future cohort pledge donations than the 'average of recent years' method we currently use. The trends to date have simply been too noisy for me to feel confident in any given mathematical model. I also think these models involve some tradeoffs in terms of time investment and legibility and that we also need to factor in these considerations when selecting our approach.

Re: Trial Pledges: We have not estimated the fraction of impact we attribute to the 🔸10% Pledge that was caused by the 🔹Trial Pledge, but I would roughly guess for recent cohorts it is somewhere in the vicinity of 5–20%. It is difficult to come up with a precise estimate because we don’t know how causally responsible the 🔹Trial Pledge is for the 🔸10% Pledge in these cases (as you note).

GWWC’s 2023–2024 impact evaluation (executive summary)

Aidan Whitfield🔸7mo9

Hi Ramiro, thanks for sharing your concerns. In my response to Vasco’s comment, I explain why I don’t think our communications around the number of people who have signed the pledge is misleading. As for whether we take more credit than is due for pledge donations, I want to flag two important ways we try to ensure we aren’t overestimating our impact (among others):

We do not take credit for donations not recorded on the platform unless we have evidence from our surveys that additional pledge donations occurred (see our recording coefficient).
We only take credit for pledge donations that our survey of pledgers indicates we are causally responsible for (see our counterfactuality coefficient). In this impact evaluation, we estimated that we caused 33% of pledge donations.

If this is still your preference, you can resign from your pledge using this form. I hope I have understood your concerns, but please let me know if you have any further questions or concerns.

GWWC’s 2023–2024 impact evaluation (executive summary)

Aidan Whitfield🔸7mo13

Hi Vasco, thanks for your engagement! I have put together some responses to your questions/comments below. Please let me know if I missed anything or you have further questions.

> The Centre for Exploratory Altruism Research (CEARCH) estimated GWWC's marginal multiplier to be 17.6 % (= 2.18*10^6/(12.4*10^6)) of GWWC's multiplier. This suggests GWWC's marginal multiplier from 2023 to 2024 was 1.06 (= 0.176*6), such that donating to GWWC over that period was roughly as cost-effective as to GiveWell's top charities. A marginal multiplier of 1 may look bad, but is actually optimal in the sense GWWC should spend more (less) for a marginal multiplier above (below) 1.

I would actually expect our marginal multiplier to be much closer to our average multiplier than the CEARCH method implies. Most importantly, I expect most of our marginal resources are dedicated to identifying and executing on scalable pledge growth strategies. I think this work, in expectation, provides a pretty strong multiplier. By comparison our average multiplier includes some major fixed costs (e.g., related to running our donation platform).

It's also worth noting that pledge growth accelerated between 2023 and 2024, such that our average multiplier for 2024 was roughly 50% higher than that for 2023. In 2025, pledge growth is currently exceeding 2024 growth (by this time in 2024 we had ~280 new 🔸10% Pledges, so far in 2025 we have ~370 new 🔸10% Pledges), although our costs are also higher.

> So I wonder whether the information below on GWWC's website is somewhat misleading.

I don't think I agree that the information on the website is misleading seeing as it just states the number of people who have taken the pledge. I think it’s important to bear in mind that the pledge has never required that pledgers record their donations with GWWC and we know that many of our most engaged pledgers do not record their donations.

> I guess pledges starting in later years are less valuable, such that you are overestimating your impact by not controlling for the year the pledge started.

The regression you suggest is something we have considered, but don’t think it is an obvious improvement over our approach of taking the mean over the most recent pledge years. While there might be an effect of the year the pledge started on average first-year pledge donations, we do not think this trend is linear. For instance, the 2021 cohort had the second highest average first-year donations across all cohorts and the five cohorts with the lowest average first-year donations were 2010, 2017, 2018, 2016 and 2012. Ultimately, this is an empirical question and my prediction is that our average method will be more predictive of the first year of pledge donations for the 2024 cohort than the regression. If you are interested in performing this analysis yourself, you can find total inflation-adjusted pledge donations by pledge cohort and year of pledge in a table in this document.

> Have you considered retiring The Trial Pledge? You estimated 96 % of your impact came from The 10 % Pledge.

We currently aren’t considering retiring the 🔹Trial Pledge. While in terms of direct donation value the 🔹Trial Pledge contributes a relatively small fraction of our pledge impact, we believe the main value add of the 🔹Trial Pledge comes from 🔹Trial pledgers ‘upgrading’ to 🔸10% Pledges. For example, roughly 10% of those who have taken a 🔹Trial Pledge are now 🔸10% Pledges and we are currently exploring ways to improve conversion rates even more. We have also seen some evidence that retention may be stronger for 🔸10% Pledges that follow 🔹Trial Pledges than for other 🔸10% Pledges.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸1y3

Hi Rosie, thanks for sharing your thoughts on this! It’s great to get the chance to clarify our decision-making process so it’s more transparent, in particular so readers can make their own judgement as to whether or not they agree with our reasoning about FP GHDF. Some one my thoughts on each of the points you raise:

We agree there is a positive track record for some of FP GHDF’s grants and this is one of the key countervailing considerations against our decision not to rely on FP GHDF in the report. Ultimately, we concluded that the instances of ‘hits’ we were aware of were not sufficient to conclude that we should rely on FP GHDF into the future. Some of our key reasons for this included:
1. These ‘hits’ seemed to fall into clusters for which we expect there is a limited supply of opportunities, e.g., several that went on to be supported by GiveWell were AIM-incubated charities. This means, we expect these opportunities to be less likely to be the kinds of opportunities that FP GHDF would fund on the margin with additional funding
2. We were not convinced that these successes would be replicated in the future under the new senior researcher (see our crux relating to consistency of the fund).
Ultimately, what we are trying to do is establish where the next dollar can be best spent by a donor. We agree it might not be worth it for a researcher to spend as much time on small grants, but this by itself should not be a justification for us to recommend small grants over large ones (agree point 3 can be a relevant consideration here though).
We agree that the relative value donors place on supporting early stage and riskier opportunities compared to more established orgs could be a crux here. However, we still needed a bar against which we could assess FP GHDF (i.e., we couldn’t have justifiably relied on FP GHDF on the basis of this difference in worldview, independent of the quality of FP GHDF’s grantmaking). As such, we tried to assess whether FP GHDF grant evaluations convincingly demonstrated that opportunities met their self-stated bar. As we have acknowledged in the report, just because we don’t think the grant evaluations convincingly show opportunities meet the bar, doesn’t mean they really don’t (e.g., the researcher may have considered information not included in the grant evaluation report). However, we can only assess on the basis of the information we reviewed.
Regarding our focus on the BOTECs potentially being misplaced, I want to be clear that we did review all of these grant evaluations in full, not just the BOTECs. If we thought the issues we identified in the BOTECs were sufficiently compensated for by reasoning included in the grant evaluations more generally this would have played a part in our decision-making. I think assessing how well the BOTECs demonstrate opportunities surpass Founders Pledge’s stated bar was a reasonable evaluation strategy because: a) As mentioned above, these BOTECs were highly decision relevant — grants were only made if BOTECs showed opportunities to surpass 10x GiveDirectly and we know of no instances where an opportunity scored above 10x GiveDirectly and would not have been eligible for FP GHDF funding. b) The BOTECs are where many of the researcher’s judgements are made explicit and so can be assessed. At least for the three evaluations we reviewed in detail, a significant fraction of the work in the grant evaluation was justifying inputs to the BOTECs. On the other point raised here, it is true that not all of the concerns we had with the BOTECs were errors. Some of our concerns related to inputs that seemed (to us) optimistic and were, in our view, insufficiently justified considering the decision-relevant effect they had on the overall BOTEC. While not errors, these made it more difficult for us to justifiably conclude that the FP GHDF grants were in expectation competitive with GiveWell.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸1y3

Thanks for the comment! While we think it could be correct that the quality of evaluations differs between our recommendations in different cause areas, my view is that the evaluating evaluators project applies pressure to increase the strength of evaluations across all cause areas. In our evaluations we communicate areas where we think evaluators can improve. Because we are evaluating multiple options in each cause area, if in future evaluations we find one of our evaluators has improved and another has not, then the latter evaluator is less likely to be recommended in future, which provides an incentive for both evaluators to improve their processes over time.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸1y3

Thanks for your comment, Huw! I think Michael has done a great job explaining GWWC’s position on this, but please let us know if we can offer any clarifications.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸1y3

Thanks so much for your comment, Karolina! We are looking forward to re-evaluating AWF next year.

GWWC's 2024 evaluations of evaluators

Aidan Whitfield🔸1y5

Thanks for the comment, Matt! We are very grateful for the transparent and constructive engagement we have received from you and Rosie throughout our evaluation process.

I did want to flag that you are correct in anticipating that we do not agree that with the three differences in perspectives that you note here nor do we think our approach implies we do agree:

1) We do not think a grant can only be identified as cost-effective in expectation if a lot of time is spent making an unbiased, precise estimate of cost-effectiveness. As mentioned in the report, we think a rougher approach to BOTECing intended to demonstrate opportunities meet a certain bar under conservative assumptions is consistent with a GWWC recommendation. When comparing the depth of GiveWell’s and FP’s BOTECs we explicitly address this:

[This difference] is also consistent with FP’s stated approach to creating conservative BOTECs with the minimum function of demonstrating opportunities to be robustly 10x GiveDirectly cost-effectiveness. As such, this did not negatively affect our view of the usefulness of FPs BOTECs for their evaluations.

Our concern is that, based on our three spot checks, it is not clear that FP GHDF BOTECs do demonstrate that marginal grants in expectation surpass 10x GiveDirectly under conservative assumptions.

2) We would not claim that CEAs should be the singular determinant of whether a grant is made. However, considering that CEAs seem decisive in GHDF grant decisions (in that grants are only made from the GHDF when they are shown by BOTEC to be >10x GiveDirectly in expectation), we think it is fair to assess these as important decision-making tools for the FP GHDF as we have done here.

3) We do not think maximising calculated EV in the case of each grant is the only way to maximise cost-effectiveness over the span of a grantmaking program. We agree some risk-neutral grantmaking strategies should be tolerant to some errors and ‘misses’, which is why we checked three grants, rather than only checking one. Even after finding issues in the first grant we were still open to relying on FP GHDF if these seemed likely to be only occurring to a limited extent, but in our view their frequency across the three grants we checked was too high to currently justify a recommendation.

I hope these clarifications make it clear that we do think evaluators other than GiveWell (including FP GHDF) could pass our bar, without requiring GiveWell levels of certainty about individual grants.

Aidan Whitfield🔸

Posts 5

Comments19

Posts
5

Comments
19