The problem
In the 1990s, the World Health Organization (WHO) had an important function: calculating, estimating, and publishing the number of deaths caused by different diseases. These numbers influenced many things, from government spending on treatment programs to the public perception of progress on different issues. However, even though the people doing the calculations were well-meaning and generally competent, there was a big problem. There was no oversight and the process lacked consistency: each WHO group used its own methods, calculations, and assumptions. As a result, a single death could be counted two or three times across the estimates.
This mis-estimation was potentially fatal, because funding and intellectual resources were devoted to certain diseases over other, more important areas. A concerned WHO staff member noticed the problem when he discovered that, in one lower-income country, the estimated deaths from the four biggest killers (malaria, diarrhea, TB, and measles) added up to more than 100% of all deaths in that country, before counting any other cause of death. When he raised the concern with coworkers and management, it was largely dismissed. Admitting or addressing such a large mistake would have looked bad, both for the individual groups and for the WHO as a whole. Even after he triple-checked his work and strengthened it through deeper research, it went unheard. The unspoken rule was: don’t embarrass the higher-ups.
The end result was the founding of a completely new project outside the WHO, the Global Burden of Disease Study. It measured impact correctly, did not double- or triple-count deaths, and is still used today by groups like GiveWell, the Gates Foundation, and many others.
This is a true story, paraphrased from Epic Measures, and it highlights one of my biggest concerns about the EA movement. Calculating counterfactual impact is very hard. Much as with the WHO numbers, each EA organization uses a different system, and each also has an incentive to publish high-impact results. Impact is even harder to calculate correctly than deaths, because it is often plausible that five different people or organizations were all required for an action to happen. Sadly, if each of the five takes 100% of the credit, the EA movement as a whole ends up taking 500% credit for that action.
This can also happen with donations. It would be very easy for someone to find out about EA from Charity Science, read blog posts from both GWWC and TLYCS, sign up for both pledges, and then donate directly to GiveWell (which would count this impact again). This person would be counted four times in EA, with each organization using their donations as impact to justify its work. The problem is that, at the end of the day, if the person donated $1000, TLYCS, GWWC, GiveWell, and Charity Science may each have spent $500 on programs to bring this person into the movement and get them donating. Each organization would proudly report a 2:1 ratio and give itself a pat on the back, when really the EA movement as a whole spent $2000 for $1000 worth of donations.
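To make the arithmetic explicit, here is a small Python sketch using the figures from the donation example (one $1000 donor, four organizations that each spent $500 and each claim the full $1000). The function names are illustrative only. This is not any organization's actual evaluation code.

```python
# Figures from the donation example: one donor gives $1000, and four
# organizations each spent $500 and each claim the full $1000.
DONATION = 1000
SPEND_PER_ORG = 500
ORGS = ["TLYCS", "GWWC", "GiveWell", "Charity Science"]


def reported_ratio(claimed: float, spent: float) -> float:
    """Dollars moved per dollar spent, as a single organization would report it."""
    return claimed / spent


def movement_ratio(total_moved: float, spend_by_org: dict[str, float]) -> float:
    """Dollars actually moved per dollar spent across the whole movement."""
    return total_moved / sum(spend_by_org.values())


spend = {org: SPEND_PER_ORG for org in ORGS}
for org in ORGS:
    print(f"{org}: reports {reported_ratio(DONATION, spend[org]):.1f}:1")
print(f"Movement as a whole: {movement_ratio(DONATION, spend):.1f}:1")
```

Each organization reports 2.0:1, but the movement-wide ratio is 0.5:1. The same $1000 appears in four numerators, while the four $500 costs add up to $2000 in a single denominator.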
The previous example used donations because they make it easy and clear-cut to show that this is the wrong move without getting into more difficult issues, but the problem generalizes to talent as well. For example, Fortify Health was recently founded. Clearly the founders deserve 100% of the credit for its impact: without them, the project certainly would not have happened. But wait a second: both of them think that without Charity Science’s support, the project would definitely not have happened either. So, technically, Charity Science could also take 100% credit. (From our perspective, if we had not helped Fortify Health, it would not exist, so it counts as a project 100% counterfactually caused by Charity Science.) But wait a second, what about the donors who funded the project early on because of Charity Science’s recommendation? Surely they deserve some credit for the impact as well! And what about the fact that without the EA movement, it would have been much less likely for Charity Science and Fortify Health to connect?
With multiple organizations and individuals involved, you can very easily attribute far more impact than actually occurred. A project’s evaluation could easily create the perception of four times the impact it really had. This is even more likely if it's unclear where people are taking their credit from. For example, I might publish a report on Charity Science's overall impact that lists “supporting new charities” without specifying the exact help I gave or how many others were involved. And this is before any deliberate rounding up or naive overestimation of the project’s value.
Sadly, all these issues occur even when everyone is trying to be as honest and careful as they can. To return to the donation example, you can imagine Charity Science, GWWC, and TLYCS not knowing exactly how much the $1000 donor is actually donating. That would lead to different and often over-optimistic estimates across the organizations.
The solutions
Sadly, I cannot think of a silver bullet solution. Thankfully, though, I think there are some things that can really help.
Transparent sharing of data regarding impact and the methodology to calculate impact
Mistakes like this are much more likely to happen the less clear and transparent the causal chain of impact is. Many organizations have internal counterfactual calculations, but it’s hard for donors or other organizations to make sense of the bottom-line numbers without knowing how they were estimated. Obviously, not all data is going to be shareable (e.g. the names of donors). However, the process for calculating impact can be shared and compared. That, in turn, allows open discussion of these issues (e.g. how to disaggregate the impact of two organizations taking similar actions). It also gives the community a chance to sanity-check each other’s numbers. If Charity Science were massively overestimating something, it would be hard for external observers to point out the flaw without a high level of transparency.
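One sanity check that shared methodology makes possible is the same one that exposed the WHO problem: add up every claim on a single outcome and see whether the total exceeds 100%. The Python sketch below assumes, hypothetically, that each organization publishes the share of one outcome it attributes to itself. It uses the five-contributors-each-claiming-100% case from earlier, with placeholder names. Following this post's framing, a total above 100% only flags the outcome for a closer look. It does not say which claim is wrong.

```python
# Hypothetical published claims: the share of ONE outcome that each
# organization attributes to itself, as a fraction from 0 to 1.
# Placeholder names; five contributors each claiming 100% of the credit.
claims = {
    "Org A": 1.0,
    "Org B": 1.0,
    "Org C": 1.0,
    "Org D": 1.0,
    "Org E": 1.0,
}


def total_claimed(shares: dict[str, float]) -> float:
    """Sum of the credit shares claimed for a single outcome."""
    return sum(shares.values())


def over_attributed(shares: dict[str, float], tolerance: float = 1e-9) -> bool:
    """Flag an outcome whose claimed credit adds up to more than 100%.

    Analogous to the WHO check: cause-specific death estimates that sum to
    more than total deaths must be double counting somewhere.
    """
    return total_claimed(shares) > 1.0 + tolerance


print(f"Total credit claimed: {total_claimed(claims):.0%}")
print("Over-attributed" if over_attributed(claims) else "Within 100%")
```

With these inputs the script reports 500% total credit and flags the outcome.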
Efforts towards a consistent evaluation process between organizations
The more similar the processes organizations use, the easier it will be to take their bottom-line numbers seriously. Something like this could be coordinated on the EA Forum and could clear up a lot of confusion about impact evaluation (for example, if I hire someone to work at Charity Science, does that count as a career change?). I think current organizations have very different intuitions and processes, and thus very different bottom-line numbers. I also think that, to increase consistency, donors should insist on seeing the data before donating to an organization.
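As a purely illustrative example of what a shared rule might look like, the Python sketch below assumes one possible convention that this post does not propose. Each contributor to an outcome reports a raw contribution weight, and the common process rescales those weights so that credit for the outcome sums to exactly 100%. The contributor labels and weights are invented for illustration and are not real estimates for Fortify Health or anyone else.

```python
from fractions import Fraction

# Hypothetical raw weights each contributor might report for ONE outcome.
# These numbers are invented for illustration; they are not real estimates.
raw_weights = {
    "founders": Fraction(1),
    "incubator": Fraction(1),
    "early donors": Fraction(1, 2),
}


def normalize_credit(weights: dict[str, Fraction]) -> dict[str, Fraction]:
    """Rescale contribution weights so the shares sum to exactly 1 (100%).

    One possible shared convention, assumed for illustration: it keeps the
    relative weights each party reports but caps total credit at 100%.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one contributor needs a positive weight")
    return {name: w / total for name, w in weights.items()}


shares = normalize_credit(raw_weights)
for name, share in shares.items():
    print(f"{name}: {float(share):.0%}")
```

Here the founders and the incubator each get 40% and the early donors 20%. Exact fractions avoid floating-point drift in the total. Normalization is only one candidate rule. The benefit of agreeing on any rule is that every organization applies the same one, so their bottom-line numbers become comparable.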
Independent unbiased external impact analysis
The solution to the WHO problem was not just more interdepartmental coordination and transparency; it was independent external analysis. Although I think this is “the solution”, it's easily the hardest to execute well. An analysis like this would:
a) be very sensitive to the evaluators’ values (e.g. if they valued one cause much more than another, it would be hard to generalize from their results);
b) be very time-consuming (I expect it would take many hours to get a strong understanding of all the aspects of an organization; likely months to years of full-time work); and
c) require a fairly unprecedented level of transparency in the charity world.
Work like this is possible. I think GiveWell’s external reviews of poverty charities are a good example of something pretty close to the ideal. I think this kind of review would allow much stronger evaluation and accountability when considering and comparing the impacts of different organizations.
Here are my less rushed thoughts on why this line of thought is mistaken. It would have been better to post this as a comment in the first place; sorry about that.
This is a shorter and less rushed version of the argument I made in an earlier post on counterfactual impact, which could have been better in a few ways. Hopefully, people will find this version clearer and more convincing.
Suppose that we are assessing the total lifetime impact of two agents: Darren, a GWWC member who gives $1m to effective charities over the course of his life; and GWWC, which, let’s assume in this example, moves only Darren’s money to effective charities. If Darren had not heard of GWWC, he would have had zero impact, and if GWWC had not had Darren as a member it would have had zero impact.
When we ask how much lifetime counterfactual impact someone had, we are asking how much impact they had compared to the world in which they did not exist. On this approach, when we are assessing Darren’s impact, we compare two worlds:
Actual world: Darren gives $1m to GWWC recommended charities.
Counterfactual world D: Darren does not exist and GWWC acts as it would have if Darren did not exist.
In the actual world, an additional $1m is given to effective charities compared to Counterfactual world D. Therefore, Darren’s lifetime counterfactual impact is $1m. Similarly, when we are assessing GWWC’s counterfactual impact, we compare two worlds:
Actual world: GWWC recruits Darren, ensuring that $1m goes to effective charities.
Counterfactual world G: GWWC does not exist and Darren acts as he would have done if GWWC did not exist.
In the actual world, an additional $1m is given to effective charities compared to Counterfactual world G. Therefore, GWWC’s lifetime counterfactual impact is $1m.
This seems to give rise to the paradoxical conclusion that the combined lifetime counterfactual impact of GWWC and Darren is $2m. That is absurd, because it exceeds the total benefit produced. We would assess the lifetime counterfactual impact of Darren and GWWC collectively by comparing two worlds:
Actual world: GWWC recruits Darren, ensuring that $1m goes to effective charities.
Counterfactual world D&G: Neither Darren nor GWWC exists.
The difference between the Actual world and Counterfactual world D&G is $1m, not $2m. So, the argument goes, the earlier method of calculating counterfactual impact must be wrong. The hidden premise here is:
Premise. The sum of the counterfactual impacts of any two agents, A and B, taken individually, must equal the counterfactual impact of A and B taken collectively.
In spite of its apparent plausibility, this premise is false. It implies that the conjunction of the counterfactual worlds we use to assess the counterfactual impact of two agents, taken individually, must be the same as the counterfactual world we use to assess the counterfactual impact of two agents, taken collectively. But this is not so. The conjunction of the counterfactual worlds we use to assess the impact of Darren and GWWC, taken individually, is:
Counterfactual world D+G: GWWC does not exist and Darren acts as he would have done if GWWC did not exist; and Darren does not exist and GWWC acts as it would have done if Darren did not exist.
This world is not equivalent to Counterfactual world D&G. Indeed, this world is incoherent: in it, Darren both does not exist and acts as he would have done had GWWC not existed. But if GWWC had not existed, Darren would, ex hypothesi, still have existed. Therefore, this is not a description of the relevant counterfactual world that determines the counterfactual impact of both Darren and GWWC. This shows that you cannot unproblematically aggregate counterfactual worlds; it does not show that we assessed the counterfactual impact of Darren or GWWC in the wrong way.
To reiterate this point, when we assess Darren’s lifetime counterfactual impact, we ask: “what would have happened if Darren alone hadn’t existed?” When we assess Darren and GWWC’s lifetime counterfactual impact, we ask: “what would have happened if Darren and GWWC hadn’t existed?” These questions inevitably produce different answers about what GWWC would have done. In one case, we ask what GWWC would have done if Darren hadn’t existed; in the other, we are assuming GWWC doesn’t even exist. This is why we get surprising answers when we mistakenly try to aggregate the counterfactual impact of multiple agents.
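The argument can be sketched as a toy Python model. It makes the example's simplifying assumption that $1m reaches effective charities only if both Darren and GWWC exist. The function names are hypothetical. Each impact is computed as the actual world minus its own comparison world, in which only the named agents are absent. That is why the individual figures need not add up to the collective one.

```python
# Toy model of the Darren/GWWC example, assuming (as the example does) that
# $1m reaches effective charities only if BOTH agents exist.
AGENTS = frozenset({"Darren", "GWWC"})


def money_moved(existing: frozenset[str]) -> int:
    """Dollars given to effective charities in a world containing `existing`."""
    return 1_000_000 if {"Darren", "GWWC"} <= existing else 0


def counterfactual_impact(removed: frozenset[str]) -> int:
    """Actual world minus the world where only the agents in `removed` are absent."""
    return money_moved(AGENTS) - money_moved(AGENTS - removed)


darren = counterfactual_impact(frozenset({"Darren"}))
gwwc = counterfactual_impact(frozenset({"GWWC"}))
together = counterfactual_impact(AGENTS)

print(f"Darren alone: ${darren:,}")
print(f"GWWC alone:   ${gwwc:,}")
print(f"Sum of the two individual impacts: ${darren + gwwc:,}")
print(f"Darren and GWWC taken collectively: ${together:,}")
```

Each individual impact comes out at $1m and the collective impact at $1m. Their sum ($2m) therefore differs from the collective figure without either individual calculation being mistaken.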
I agree with you that impact is importantly relative to a particular comparison world, and so you can't straightforwardly sum different people's impacts. But my impression is that Joey's argument is actually that it's important for us to try to work collectively rather than individually. Consider a case of three people:
Anna and Bob each have $600 to donate, and want to donate as effectively as possible. Anna is deciding between donating to TLYCS and AMF, Bob between GWWC and AMF. Casey is currently not planning to donate, but if introduced to EA by TLYCS ...