We would like to extend our gratitude to Giving What We Can (GWWC) for conducting the "Evaluating the Evaluators" exercise for a second consecutive year. We value the constructive dialogue with GWWC and their insights into our work. While we are disappointed that GWWC has decided not to defer to our charity recommendations this year, we are thrilled that they have recognized our Movement Grants program as an effective giving opportunity alongside the EA Animal Welfare Fund.
Movement Grants
After reflecting on GWWC’s 2023 evaluation of our Movement Grants (MG) program we made several adjustments, all of which are noted in GWWC’s 2024 report. We’re delighted to see that the refinements we made to our program this year have led to grantmaking decisions that meet GWWC’s bar for marginal cost-effectiveness and that they will recommend our MG program on their platform and allocate half of their Effective Animal Advocacy Fund to Movement Grants.
As noted by GWWC, ACE’s MG program is unique in its aims to fund underserved segments of the global animal advocacy movement and address two key limitations to effectiveness within the movement:
- Limited evidence about which interventions are effective and in which contexts
- Disproportionate attention devoted to some regions and animal groups
For impact-focused donors seeking opportunities to build an evidence-based and resilient global animal advocacy movement, our Movement Grants program is an effective giving opportunity which supports brilliant animal advocates all over the world.
Alongside their recommendation of our MG program, GWWC has outlined several areas for improvement that we are grateful for and will reflect on.
We agree with these suggestions by GWWC:
- Improving the MG model to reflect our movement-building strategy—we have already started to revise our theory of change for the MG program to explicitly separate the pathways to impact for grants that are made primarily on the basis of movement-building from those that are made on a more ‘direct impact’ basis. This will help us better account for movement building in our model and make future assessments about whether we want to maintain this approach to our grantmaking.
- Revising our use of Impact Potential (IP) scores in the MG model—the IP scores played a minimal role in our final grant decisions this year. Before the GWWC evaluation, we had already decided to revise our use of these scores because they do not model scope in a sufficiently useful way and they run the risk of combining parameters in our model in such a way that clouds the decision-relevant information.
- Better integrating scope comparisons—while our current MG model has scope baked into some of the factors (e.g. theory of change, long-term impact), we agree with GWWC that a more robust approach would be to make scope a factor that our grant reviewers score separately. We intend to make this adjustment in our next round of Movement Grants and we appreciate the specific logistical suggestions from GWWC on how we might do this.
- Improving the documentation of our reasoning for making grant decisions—this is mainly related to our internal processes that don’t have any bearing on our grant decisions; however, we agree with GWWC that despite our diligent record keeping for how our thinking evolves through the grant review process, we need a better record that summarizes, in one place, the main rationale and cruxes for each grant decision. This is something we intend to implement in our next granting round.
We also want to note the following challenge:
GWWC recommends that we introduce a clearer framework for prioritizing between interventions. We agree with this recommendation—of the possible interventions available to help animals, some are already excluded from applying or rejected at an early stage from our grantmaking based on the scope of impact. However, while we intend to make further improvements in scope comparison between interventions, there remain challenges due to the many externalities that affect intervention effectiveness and our ability to estimate them. GWWC notes in the report that we appear resistant to doing this because it would be unhelpfully speculative. We want to clarify that we are willing to make speculations where we think they will be useful while highlighting the challenges. There is a difference between the more speculative forward-looking cost-effectiveness analyses (CEAs) we would be undertaking as a grantmaker compared with the CEAs the Charity Evaluations team undertakes on completed work. To overcome this, we may also consider comparing the known cost-effectiveness of the most similar organizations’ previous work or leveraging CEAs that have used the total available information on an intervention (e.g. this estimate). However, we remain cautious about spending time and resources on trying to find comparable cost estimates between interventions when doing so might not sufficiently increase the overall marginal cost-effectiveness of our grant decisions. We also want to note that this is a challenge for any animal advocacy funder, not just ACE. This is an area where we expect to continue to try different approaches and improve year-on-year, balancing available information and our team’s capacity.
We are grateful for the rare opportunity to reflect deeply on our work and to learn from GWWC’s perspectives so that we can award grants that are the most impactful for animals. We are especially thankful too for the larger GWWC and EA community that is willing to support highly promising projects to help some of the most neglected individuals who suffer greatly.
Charity Evaluations
On the other hand, we are disappointed that GWWC does not find our Charity Evaluations program justifiably competitive with MG (and the EA Animal Welfare Fund) and believe that donors might miss out on some of the most impactful donation opportunities because of GWWC’s decision. We will elaborate on the relationship between our Charity Evaluations and Movement Grants below, but first address some points specific to Charity Evaluations.
We agree with some of GWWC’s conclusions and suggestions for improvement, which appear in their 2024 report. We think that focusing on these will improve the quality of our recommendations moving forward:
- Devote more resources to quantitatively modeling cost-effectiveness, including better defining upper and lower bounds and exploring new ways to assess speculative programs (such as using a mix of qualitative and quantitative arguments).
- Explicitly assess charities’ strategic prioritization, such as their own focus on cost-effectiveness. While we develop a sense of this internally as we evaluate charities, we think it could be helpful to take it into account more systematically.
- Explore other methods for more comprehensively capturing the magnitude of impact and limiting factors. This might mean adapting the theory of change analysis to include systematically assessing the weakest-seeming and most fundamental assumptions.
- Reconsider the minimum and maximum sizes of grants we’re willing to make through the Recommended Charity Fund (RCF), for example by potentially reconsidering the safeguards in the disbursement model that lead to more consistent grant amounts over time.
However, there are also areas where we disagree with GWWC’s conclusions. While we acknowledge these parts of our methods have room for improvement, we think the changes that they suggest may not make a meaningful difference to the quality of our recommendations:
- Include all decision-relevant factors in the published charity reviews. GWWC has highlighted their concerns based on our public reviews, such as recommendation decisions that seemed to insufficiently take into account scope (e.g., comparing ÇHKD and Sinergia’s cost-effectiveness results, arguments for longer-term paths to impact, etc.). While we agree that it’d be ideal to more clearly publicly demonstrate and formalize cross-charity comparisons and the other thinking that leads to our decisions, we also want to note that we already consider much of what GWWC suggests in our internal decision-making. We’ve made significant adjustments to our charity reviews in the past year and will continue to refine them to ensure they are transparent and informative while remaining accessible and encouraging effective giving. We think it’s possible that the Charity Evaluations program may have been disadvantaged in this evaluation by our in-depth charity reviews since GWWC’s comments seem to be informed more by what we published rather than the quality of our internal decisions.
- Update the decision-making process so that it directly compares all recommended charities on marginal cost-effectiveness. Our basis for deciding whether to add a Recommended Charity is whether we think it would lead to more animals being helped on the margin (compared to having a smaller number of Recommended Charities), which is conceptually different from ranking charities. Given the types of uncertainty currently faced by the animal advocacy movement when it comes to calculating cost-effectiveness, we decide whether a charity should be recommended based on a range of decision criteria rather than scoring and ranking charities based on our sense of their relative marginal cost-effectiveness. In the future, if we had sufficiently robust evidence to form reliable cost-effectiveness estimates, including evidence or good proxies for speculative work with complex long-term theories of change, it’s possible we would move more toward the kind of ranking approach that GWWC suggests. Additionally, we consider relative cost-effectiveness during each Recommended Charity Fund distribution, where we adjust the size of each grant depending on the most up-to-date plans that charities share with us.
- Increase emphasis on the marginal dollar. We think our current methods adequately track where influenced funding would go. With the exception of granting restricted funding, we already implement GWWC’s suggestions. For example, we consider how charities’ programs are likely to change with gaining or losing an ACE recommendation and we also take into account charities’ strategic prioritization (although we don’t assess it systematically, as noted above, and some of this information is kept confidential so it doesn’t appear in our charity reviews). Additionally, this consideration only comes into play relatively infrequently; in most cases, ACE-influenced funding does not go toward novel programs but toward expansions of current programs, investments in the charity as a whole, or funding gaps.
- Give more consideration to cost-effectiveness analyses (CEAs) in decisions not to recommend charities. Similar to GWWC’s evaluations of evaluators, ACE looks for evidence to justifiably conclude that a charity should be recommended, and we do not recommend charities in the absence of that evidence. When we say that a CEA did not play a meaningful role in our decision not to recommend a charity, we mean that it was insufficient to justify a recommendation. From that perspective, CEAs play just as much of a role to recommend as to not recommend a particular charity. While there are improvements we can make to our CEAs so that they are more indicative of cost-effectiveness (or lack thereof), we don’t see this as an indicator of a lack of scope sensitivity.
GWWC also suggests some broader strategic shifts in our programs that we plan to consider as a part of upcoming strategic planning. While these changes would help align our Charity Evaluations program more closely with GWWC’s criteria, we’re currently unsure if they would do the most good for the animal advocacy movement and for animals. These include:
- Making restricted recommendations to specific programs, granting restricted funding, and removing the constraint that the fund grants to all current Recommended Charities each round. We need to carefully consider any trade-offs between the benefits of this idea and downstream considerations such as the needs of our donor base and losing the benefits of allowing Recommended Charities the flexibility to pursue what they need the most to maximize their impact. Also, restricting funding likely introduces some degree of funging, reducing any cost-effectiveness gains.
- Allowing Recommended Charities to apply for Movement Grants. Theoretically, this would mean Recommended Charities could apply for targeted Movement Grants funding for their marginal programs (which we don’t currently permit). However, this might risk negatively impacting the movement-building goals of the Movement Grants program.
- Changing the evaluation cycle from two years to one year. In theory, this would allow for a more precise assessment of charities’ upcoming programs, which is why ACE used a one-year cycle until 2021. However, we changed to a two-year cycle largely based on feedback from the charities we evaluated that annual evaluations were overly demanding/time-consuming. We’ll carefully consider the balance between timely evaluations, the burden on charities that we evaluate, and our team capacity (for comparison, ACE has an evaluations team of 5.5 FTE compared to GiveWell’s ~39 researchers).
Moving forward, we will continue evolving the Charity Evaluations program to find the organizations that can do the most good with additional donations and we thank GWWC for critically engaging with our work. We also appreciate that they acknowledge the difficulties of our work and the inherent differences between evaluators and funders in the animal advocacy space. However, given those difficulties, we’re currently not sure whether ACE or any other evaluator that recommends whole charities in this space would be seen by GWWC as competitive with charitable funds that give restricted grants. Because of this, we’re not sure whether we think that GWWC’s current approach leads to the best outcomes for animals.
How ACE views Movement Grants vs. Charity Recommendations
We want to acknowledge that the language GWWC uses implies that they see our Movement Grants (MGs) and Charity Evaluations programs as “competitive” with each other. This is not a view we share—we see them as complementary.
Although there’s some overlap between charities that are a fit for each program, they serve different purposes:
- The Recommended Charity Fund (RCF)
  - provides high-impact evidence-based donation opportunities, and
  - ensures critical organizations can do their underfunded and neglected work.
- The Movement Grants program
  - funds highly promising projects,
  - is more likely to fund hits-based opportunities, and
  - focuses on movement-building.
The programs are complementary and supplement each other:
- MGs support charities that may go on to become Recommended Charities (RCs)—this is evidenced by the fact that the majority of our current RCs are former MG recipients.
- Some charities are ineligible for our Charity Evaluations because RCs need to be able to absorb more funding and have more of a track record, and ACE needs to have confidence in them over a longer time horizon (the entire two-year recommendation cycle). This means that the RCF misses some of the highest-impact funding opportunities because it doesn't fund independent projects, brand-new charities, or only specific projects at a charity that might do a mixture of more and less cost-effective work.
- We don’t give MGs to RCs for the duration of their recommendation, so the MG program also misses some of the highest-impact funding opportunities.
- MGs can support former RCs if they re-enter an exploratory phase and need project- or program-specific funding to test out their ideas.
They also serve different donors:
- In our view neither clearly has a higher expected value than the other. However, since the MG program funds projects with a smaller track record or projects that can have a longer-term impact in building the movement, which leads to more uncertainty and variance in outcomes, it is more suited to donors with a higher risk tolerance. On the other hand, the RCF includes more established, highly cost-effective organizations with a strong track record and a clear path to impact and appeals to more risk-averse donors.
- Funds work well for individual donors but some users of ACE’s and GWWC’s services, such as certain Effective Giving Initiatives and other third parties, cannot collect donations for regranting funds. They must either set up their own fund composed of established, well-researched charities that can take in large amounts of unrestricted additional donations, and/or direct donations to specific charities.
- The effective animal advocacy space is still deeply neglected and mostly funded by existing animal advocates. Therefore, we see ACE’s unique role as critical in securing counterfactually new donations to high-impact charities. We aim to engage audiences who are currently unfamiliar with both effective giving and EAA, and who might never change their individual behavior (for instance by changing their diet). To convert them to support farmed and wild animal welfare requires more accessible and simple solutions. Individual Recommended Charities are more intuitive and easy to understand and may be more appealing to this audience.
We are proud to support all of our current Recommended Charities and Movement Grantees, and would like to take this opportunity to celebrate the impactful work they do to help make the world a kinder place for animals.
- ACE Team
FP Research Director here.
I think Aidan and the GWWC team did a very thorough job on their evaluation, and in some respects I think the report serves a valuable function in pushing us towards various kinds of process improvements.
I also understand why GWWC came to the decision they did: to not recommend GHDF as competitive with GiveWell. But I'm also skeptical that any organization other than GiveWell could pass this bar in GHD, since it seems that in the context of the evaluation GiveWell constitutes not just a benchmark for point-estimate CEAs but also a benchmark for various kinds of evaluation practices and levels of certainty.
I think this comes through in three key differences in perspective:
My claim is that, although I'm fairly sure GWWC would not explicitly say "yes" to each of these questions, the implication of their approach suggests otherwise. FP, meanwhile, thinks the answer to each is clearly "no." I should say that GWWC has been quite open in saying that they think GHDF could pass the bar or might even pass it today, but I share other commenters' skepticism that this could be true by GWWC's lights in the context of the report! Obviously, though, we at FP think the GHDF is >10x.
The GHDF is risk-neutral. Consequently, we think that spending time reducing uncertainty about small grants is not worthwhile: it trades off against time that could be spent evaluating and making more plausibly high-EV grants. As Rosie notes in her comment, a principal function of the GHDF has been to provide urgent stopgap funding to organizations that quite often end up actually receiving funding from GW. Spending GW-tier effort getting more certain about $50k-$200k grants literally means that we don't spend that time evaluating new high-EV opportunities. If these organizations die or fail to grow quickly, we miss out on potentially huge upside of the kind that we see in other orgs of which FP has been an early supporter. Rosie lists several such organizations in her comment.
The time and effort that we don't spend matching GiveWell's time expenditure results in higher variance around our EV estimates, and one component of that variance is indeed human error. We should reduce that error rate — but the existence of mistakes isn't prima facie evidence of lack of rigor. In our view, the rigor lies in optimizing our processes to maximize EV over the long-term. This is why we have, for instance, guidelines for time expenditure based on the counterfactual value of researcher time. This programme entails some tolerance for error. I don't think this is special pleading: you can look at GHDF's list of grantees and find a good number that we identified as cost-effective before having that analysis corroborated by later analysis from GiveWell or other donors. This historical giving record, in combination with GWWC's analysis, is what I think prospective GHDF donors should use to decide whether or not to give to the Fund.
Finally - a common (and IMO reasonable) criticism of EA-aligned or EA-adjacent organizations is an undue focus on quantification: "looking under the lamppost." We want to avoid this without becoming detached from the base numeric truth, so one particular way we want to avoid it is by allowing difficult-to-quantify considerations to tilt us toward or away from a prospective grant. We do CEAs in nearly every case, but for the GHDF they serve an indicative purpose (as they often do at e.g. Open Phil) rather than a determinative one (as they often do at e.g. GiveWell). Non-quantitative considerations are elaborated and assessed in our internal recommendation template, which GWWC had access to but which I feel they somewhat underweighted in their analysis. These kinds of considerations find their way into our CEAs as well, particularly in the form of subjective inputs that GWWC, for their part, found unjustified.
[highly speculative]
It seems plausible to me that the existence of higher degrees of random error could inflate a more error-tolerant evaluator's CEAs for funded grants as a class. Someone could probably quantify that intuition a whole lot better, but here's one thought experiment:
Suppose ResourceHeavy and QuickMover [which are not intended to be GiveWell and FP!] are evaluating a pool of 100 grant opportunities and have room to fund 16 of them. Each has a policy of selecting the grants that score highest on cost-effectiveness. ResourceHeavy spends a ton of resources and determines the precise cost-effectiveness of each grant opportunity. To keep the hypo simple, let's suppose that all 100 have a true cost effectiveness of 10.00-10.09 Units, and ResourceHeavy nails it on each candidate. QuickMover's results, in contrast, include a normally-distributed error with a mean of 0 and a standard deviation of 3.
In this hypothetical, QuickMover is the more efficient operator because the underlying opportunities were ~indistinguishable anyway. However, QuickMover will erroneously claim that its selected projects have a cost-effectiveness of ~13+ Units because it unknowingly selected the 16 projects with the highest positive error terms (i.e., those with an error of +1 SD or above). Moreover, the random distribution of error determined which grants got funded and which did not -- which is OK here since all candidates were ~indistinguishable but will be problematic in real-world situations.
While the hypo is unrealistic in some ways, it seems that given a significant error term, which grants clear a 10-Unit bar may be strongly influenced by random error, and that might undermine confidence in QuickMover's selections. Moreover, significant error could result in inflated CEAs on funded grants as a class (as opposed to all evaluated grants as a class) because the error is in some ways a one-way ratchet -- grants with significant negative error terms generally don't get funded.
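The hypo above can be roughed out as a small simulation. This is only a sketch under the stated assumptions (100 candidates, 16 funded, true values spread uniformly over 10.00-10.09 Units, estimation error normal with mean 0 and SD 3); the function names and the uniform spread are illustrative choices, not anything taken from a real evaluator's process.

```python
import random
import statistics


def simulate_selection(n_candidates=100, n_funded=16, error_sd=3.0, seed=0):
    """One round of QuickMover-style selection under the hypo's assumptions."""
    rng = random.Random(seed)
    # Assumption: true cost-effectiveness is spread uniformly over 10.00-10.09 Units.
    true_values = [10.0 + rng.uniform(0.0, 0.09) for _ in range(n_candidates)]
    # QuickMover observes each true value plus normally distributed error.
    estimates = [v + rng.gauss(0.0, error_sd) for v in true_values]
    # Fund the candidates with the highest *estimated* cost-effectiveness.
    ranked = sorted(range(n_candidates), key=lambda i: estimates[i], reverse=True)
    funded = ranked[:n_funded]
    return {
        "claimed": statistics.mean(estimates[i] for i in funded),
        "actual": statistics.mean(true_values[i] for i in funded),
    }


def average_over_runs(n_runs=500, **kwargs):
    """Average claimed vs. actual cost-effectiveness of funded grants over many rounds."""
    runs = [simulate_selection(seed=s, **kwargs) for s in range(n_runs)]
    return {
        "claimed": statistics.mean(r["claimed"] for r in runs),
        "actual": statistics.mean(r["actual"] for r in runs),
    }


if __name__ == "__main__":
    result = average_over_runs()
    print(f"claimed cost-effectiveness of funded grants: {result['claimed']:.2f}")
    print(f"actual cost-effectiveness of funded grants:  {result['actual']:.2f}")
```

The gap between the claimed and actual averages is the inflation described above: it comes entirely from selecting on noisy estimates. Setting `error_sd=0.0` reproduces ResourceHeavy, for which the two averages coincide.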
I'm sure someone with better quant skills than I could emulate a grant pool with variable cost-effectiveness in addition to a variable error term. And maybe these kinds of issues, even if they exist outside of thought experiments, could be too small in practice to matter much?
It's definitely true that all else equal, uncertainty inflates CEAs of funded grants, for the reasons you identify. (This is an example of the optimizer's curse.) However:
Thanks for the comment, Matt! We are very grateful for the transparent and constructive engagement we have received from you and Rosie throughout our evaluation process.
I did want to flag that you are correct in anticipating that we do not agree with the three differences in perspective that you note here, nor do we think our approach implies that we do:
1) We do not think a grant can only be identified as cost-effective in expectation if a lot of time is spent making an unbiased, precise estimate of cost-effectiveness. As mentioned in the report, we think a rougher approach to BOTECing intended to demonstrate opportunities meet a certain bar under conservative assumptions is consistent with a GWWC recommendation. When comparing the depth of GiveWell’s and FP’s BOTECs we explicitly address this:
[This difference] is also consistent with FP’s stated approach to creating conservative BOTECs with the minimum function of demonstrating opportunities to be robustly 10x GiveDirectly cost-effectiveness. As such, this did not negatively affect our view of the usefulness of FPs BOTECs for their evaluations.
Our concern is that, based on our three spot checks, it is not clear that FP GHDF BOTECs do demonstrate that marginal grants in expectation surpass 10x GiveDirectly under conservative assumptions.
2) We would not claim that CEAs should be the singular determinant of whether a grant is made. However, considering that CEAs seem decisive in GHDF grant decisions (in that grants are only made from the GHDF when they are shown by BOTEC to be >10x GiveDirectly in expectation), we think it is fair to assess these as important decision-making tools for the FP GHDF as we have done here.
3) We do not think maximising calculated EV in the case of each grant is the only way to maximise cost-effectiveness over the span of a grantmaking program. We agree some risk-neutral grantmaking strategies should be tolerant to some errors and ‘misses’, which is why we checked three grants, rather than only checking one. Even after finding issues in the first grant we were still open to relying on FP GHDF if these seemed likely to be only occurring to a limited extent, but in our view their frequency across the three grants we checked was too high to currently justify a recommendation.
I hope these clarifications make it clear that we do think evaluators other than GiveWell (including FP GHDF) could pass our bar, without requiring GiveWell levels of certainty about individual grants.
Hey Aidan,
I want to acknowledge my potential biases for any new comment thread readers (I used to be the senior researcher running the fund at FP, most or all of the errors highlighted in the report are mine, and I now work at GiveWell). These are personal views.
I think getting further scrutiny and engagement into key grantmaking cruxes is really valuable. I also think the discussion this has prompted is cool. A few points from my perspective:
As Matt’s comment points out, there is a historical track record for many of these grants. Some have gone on to be GiveWell supported, or (imo) have otherwise demonstrated success in a way that suggests they were a ‘hit’. In fact, with the caveat that there are a good number of recent ones where it’s too early to tell, there hasn’t yet been one that I consider a ‘miss’. Is it correct to update primarily from 3 spot checks of early stage BOTECs (my read of this report) versus updating from what actually happened after the grant was made? Is this risking goodharting?
Is this really comparing like for like? In my view, small grants shouldn’t require as strong an evidence base as like, a multimillion grant, mainly due to the time expenditure reasons that Matt points out. I am concerned about whether this report is getting us further to a point where (due to the level of rigour and therefore time expenditure required) the incentives for grantmaking orgs are to only make really large grants. I think this systematically disadvantages smaller orgs, and I think this is a negative thing (I guess your view here partially depends on your view on point ‘3’ below.)
In my view, a really crucial crux here is really about the value of supporting early stage stuff, alongside other potentially riskier items, such as advocacy and giving multipliers. I am genuinely uncertain, and think that smart and reasonable people can disagree here. But I agree with Matt’s point- that there’s significant upside through potentially generating large future room for funding at high cost effectiveness. This kind of long term optionality benefit isn’t typically included in an early stage BOTEC (because doing a full VOI is time consuming) and I think it’s somewhat underweighted in this report.
I no longer have access to the BOTECs to check (since I’m no longer at FP) and again I think the focus on BOTECs is a bit misplaced. I do want to briefly acknowledge though that I’m not sure that all of these are actually errors (but I still think it’s true that there are likely some BOTEC errors, and I think this would be true for many/ most orgs making small grants).
Hi Rosie, thanks for sharing your thoughts on this! It’s great to get the chance to clarify our decision-making process so it’s more transparent, in particular so readers can make their own judgement as to whether or not they agree with our reasoning about FP GHDF. Some of my thoughts on each of the points you raise: