In terms of things that would have helped when I was younger, I'm pretty on board with GWWC's new community strategy,^{[1]} and Grace's thoughts on why a gap opened up in this space. I was routinely working 60-70 hour weeks at the time, so doing something like an EA fellowship would have been an implausibly large ask and a lot of related things seem vibed in a way I would have found very offputting. My actual starting contact points with the EA community consisted of no-obligation low-effort socials and prior versions of EA Global.

In terms of things now, it's complicated. I suspect anything that prompts people to talk about how much they are giving and/or where is pretty powerful; knowing other traders who were donating 65+% was a real motivation to challenge myself on why I couldn't do the same or at least get closer, and I suspect I've had similar impacts on some others. Obviously, this kind of pressure can go wrong, but when it's mostly self-directed - 'why can't I?' rather than 'why don't you?' - and bouncing around very high-earning circles I think it nets out pretty positive. Seeing people find constructive things to do with their money also helps counter "Funding Overhang" memes.

Others' mileage may vary on how much these generalise.

[1] Since my wife is involved with the GWWC London group and I have given a lot of money to GWWC since their reboot, I can't really claim to be unbiased here.

Thanks for this, pretty interesting analysis.

Every time I come across an old post in the EA forum I wonder if the karma score is low because people did not get any value from it or if people really liked it and it only got a lower score because fewer people were around to upvote it at that time.

The other thing going on here is that the karma system got an overhaul when forum 2.0 launched in late 2018, giving some users 2x voting power and also introducing strong upvotes. Before that, one vote was one karma. I don't remember exactly when the new system came in, but I'd guess this is the cause of the sharp rise on your graph around December 2018. AFAIK, old votes were never re-weighted, which is why if you go back through comments on old posts you'll see a lot of things with e.g. +13 karma and 13 total votes, a pattern I don't recall ever seeing since.

Partly as a result, most of the karma on old posts will have come from people going back and upvoting them after the new system was implemented. For example, from memory, my post from your list sat at around +10 for most of its life and has drifted to its current +59 over the past couple of years.

This jumps out to me because I'm pretty sure that post was not a particularly high-engagement post even at the time it was written, but it's the second-highest 2015 post on your list. I think this is because it's been linked back to a fair amount and so can partially benefit from the karma inflation.

(None of which is meant to take away from the work you've done here, just providing some possibly-helpful context.)

So taking a step back for a second, I think the primary point of collaborative written or spoken communication is to take the picture or conceptual map in my head and put it in your head, as accurately as possible. Use of *any* terms should, in my view, be assessed against whether those terms are likely to create the right picture in a reader's or listener's head. I appreciate this is a somewhat extreme position.

If every time you use the term heavy-tailed (and it's used a lot - a quick CTRL + F tells me it's in the OP 25 times) I have to guess from context whether you mean the mathematical or commonsense definition, it's more difficult to parse what you actually mean in any given sentence. If someone is reading and doesn't even know that those definitions substantially differ, they'll probably come away with bad conclusions.

This isn't a hypothetical corner case - I keep seeing people come to bad (or at least unsupported) conclusions in exactly this way, while thinking that their reasoning is mathematically sound and thus nigh-incontrovertible. To quote myself above:

The above, in my opinion, highlights the folly of ever thinking 'well, log-normal distributions are heavy-tailed, and this should be log-normal because things got multiplied together, so the top 1% must be at least a few percent of the overall value'.

If I noticed that use of terms like 'linear growth' or 'exponential growth' were similarly leading to bad conclusions, e.g. by being extrapolated too far beyond the range of data in the sample, I would be similarly opposed to their use. But I don't, so I'm not.

If I noticed that engineers at firms I have worked for were obsessed with replacing exponential algorithms with polynomial algorithms because they are better in some limit case, but worse in the actual use cases, I would point this out and suggest they stop thinking in those terms. But this hasn't happened, so I haven't ever done so.

I do notice that use of the term heavy-tailed (as a binary) in EA, especially with reference to the log-normal distribution, is causing people to make claims about how we should expect this to be 'a heavy-tailed distribution' and how important it therefore is to attract the top 1%, and so...you get the idea.

Still, a full taboo is unrealistic and was intended as an aside; closer to 'in my ideal world' or 'this is what I aim for in my own writing', rather than a practical suggestion to others. As I said, I think the actual suggestions made in this summary are good - replacing the question 'is this heavy-tailed or not' with '*how* heavy-tailed is this' should do the trick - and hope to see them become more widely adopted.

Briefly on this, I think my issue becomes clearer if you look at the full section.

If we agree that log-normal is more likely than normal, and log-normal distributions are heavy-tailed, then saying 'By contrast, [performance in these jobs] is thin-tailed' is just incorrect? Assuming you meant the mathematical senses of heavy-tailed and thin-tailed here, which I guess I'm not sure if you did.

This uncertainty and resulting inability to assess whether this section is true or false obviously loops back to why I would prefer not to use the term 'heavy-tailed' at all, which I will address in more detail in my reply to your other comment.

Ex-post performance appears ‘heavy-tailed’ in many relevant domains, but with very large differences in *how* heavy-tailed: the top 1% account for between 4% to over 80% of the total. For instance, we find ‘heavy-tailed’ distributions (e.g. log-normal, power law) of scientific citations, startup valuations, income, and media sales. By contrast, a large meta-analysis reports ‘thin-tailed’ (Gaussian) distributions for ex-post performance in less complex jobs such as cook or mail carrier

Hi Max and Ben, a few related thoughts below. Many of these are mentioned in various places in the doc, so seem to have been understood, but nonetheless have implications for your summary and qualitative commentary, which I sometimes think misses the mark.

- Many distributions are heavy-tailed mathematically, but not in the common use of that term, which I think is closer to 'how concentrated is the thing into the top 0.1%/1%/etc.', and thus 'how important is it I find top performers' or 'how important is it to attract the top performers'. For example, you write the following:

What share of total output should we expect to come from the small fraction of people we’re most optimistic about (say, the top 1% or top 0.1%) – that is, how *heavy-tailed* is the distribution of ex-ante performance?

- Often, you can't derive this directly from the distribution's mathematical type. In particular, you cannot derive it from whether a distribution is heavy-tailed in the mathematical sense.
- Log-normal distributions are particularly common and are a particular offender here, because they tend to occur whenever lots of independent factors are multiplied together. But here is the approximate* fraction of value that comes from the top 1% in a few different log-normal distributions:

EXP(N(0,0.0001)) -> 1.02%

EXP(N(0,0.001)) -> 1.08%

EXP(N(0,0.01)) -> 1.28%

EXP(N(0,0.1)) -> 2.22%

EXP(N(0,1)) -> 9.5%

- For a real-world example, geometric Brownian motion is the most common model of stock prices, and produces a log-normal distribution of prices, but models based on GBM actually produce pretty thin tails in the commonsense use, which are in turn much thinner than the tails in real stock markets, as (in?)famously chronicled in Taleb's Black Swan among others. Since I'm a finance person who came of age right as that book was written, I'm particularly used to thinking of the log-normal distribution as 'the stupidly-thin-tailed one', and have a brief moment of confusion every time it is referred to as 'heavy-tailed'.
- The above, in my opinion, highlights the folly of ever thinking 'well, log-normal distributions are heavy-tailed, and this should be log-normal because things got multiplied together, so the top 1% must be at least a few percent of the overall value'. **Log-normal distributions with low variance are practically indistinguishable from normal distributions**. In fact, as I understand it many oft-used examples of normal distributions, such as height and other biological properties, are actually believed to follow a log-normal distribution.
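
For anyone who wants to check the log-normal figures above without simulating, here is a minimal sketch of the closed-form calculation. It assumes the second argument in EXP(N(0, v)) is the variance (so sigma = sqrt(v)), which is an interpretation on my part that happens to match the simulated numbers. The function name `lognormal_top_share` is hypothetical.

```python
from statistics import NormalDist


def lognormal_top_share(variance: float, top_fraction: float = 0.01) -> float:
    """Share of the total held by the top `top_fraction` of EXP(N(0, variance)).

    Assumption: the second parameter is the variance of the underlying normal.
    """
    sigma = variance ** 0.5
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    # Cutoff (in standard-normal units) above which the top fraction sits.
    z = std_normal.inv_cdf(1.0 - top_fraction)
    # For X = exp(sigma * Z): E[X; Z > z] / E[X] = 1 - Phi(z - sigma) = Phi(sigma - z)
    return std_normal.cdf(sigma - z)


if __name__ == "__main__":
    for variance in (0.0001, 0.001, 0.01, 0.1, 1.0):
        share = lognormal_top_share(variance)
        print(f"EXP(N(0,{variance})) -> {share:.2%}")
```

The exact values come out close to the simulated ones; the largest gap is for EXP(N(0,1)), where the closed form gives roughly 9.2% rather than 9.5%, which doesn't change the point.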

***

I'd guess we agree on the above, though if not I'd welcome a correction. But I'll go ahead and flag bits of your summary that look weird to me assuming we agree on the mathematical facts:

By contrast, a large meta-analysis reports ‘thin-tailed’ (Gaussian) distributions for ex-post performance in less complex jobs such as cook or mail carrier [1]: the top 1% account for 3-3.7% of the total.

I haven't read the meta-analysis, but I'd tentatively bet that much like biological properties these jobs actually follow log-normal distributions and they just couldn't tell (and weren't trying to tell) the difference.

These figures illustrate that the difference between ‘thin-tailed’ and ‘heavy-tailed’ distributions can be modest in the range that matters in practice

I agree with the direction of this statement, but it's actually worse than that: depending on the tail of interest "heavy-tailed distributions" can have thinner tails than "thin-tailed distributions"! For example, compare my numbers for the top 1% of various log-normal distributions to the right-hand-side of a standard N(0,1) normal distribution where we cut off negative values (~3.5% in top 1%).
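
The half-normal figure can also be checked in closed form. This is a minimal sketch, assuming 'cut off negative values' means conditioning a standard normal on being positive (i.e. a half-normal); the function name `half_normal_top_share` is hypothetical.

```python
import math
from statistics import NormalDist


def half_normal_top_share(top_fraction: float = 0.01) -> float:
    """Share of the total held by the top `top_fraction` of |Z|, with Z ~ N(0,1)."""
    # Cutoff t such that P(|Z| > t) = top_fraction, i.e. P(Z > t) = top_fraction / 2.
    t = NormalDist().inv_cdf(1.0 - top_fraction / 2.0)
    # E[|Z|; |Z| > t] / E[|Z|] = 2*phi(t) / (2*phi(0)) = exp(-t^2 / 2)
    return math.exp(-t * t / 2.0)


if __name__ == "__main__":
    print(f"Half-normal top 1% share: {half_normal_top_share():.2%}")
```

This lands in the mid-3% range, in line with the ~3.5% above, and well above the top 1% share of the lower-variance log-normal examples.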

It's also somewhat common to see comments like this from 80k staff (This from Ben Todd elsewhere in this thread):

You can get heavy tailed outcomes if performance is the product of two normally distributed factors (e.g. intelligence x effort).

You indeed can, but like the log-normal distribution this will *tend to* have pretty thin tails in the common use of the term. For example, multiplying two N(100,225) distributions together, chosen because this is roughly the distribution of IQ, gets you a distribution where the top 1% account for 1.6% of the total. Looping back to my above thought, I'd also guess that performance on jobs like cook and mail-carrier looks very close to this, and empirically was observed to have similarly thin tails (aptitude x intelligence x effort might in fact be the right framing for these jobs).
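
A minimal simulation sketch of the product-of-two-normals example, for anyone who wants to reproduce it. It assumes N(100,225) means mean 100 and variance 225 (standard deviation 15) and that the two factors are independent; the function names, sample size and seed are arbitrary choices of mine.

```python
import random


def top_share(values, top_fraction=0.01):
    """Fraction of the total contributed by the largest `top_fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)


def simulate_product_share(n=100_000, mean=100.0, sd=15.0, seed=0):
    """Top 1% share for the product of two independent N(mean, sd^2) draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    products = [rng.gauss(mean, sd) * rng.gauss(mean, sd) for _ in range(n)]
    return top_share(products)


if __name__ == "__main__":
    print(f"Top 1% share of the product: {simulate_product_share():.2%}")
```

Being a simulation, the exact figure varies a little with the seed and sample size, but it should sit near the 1.6% quoted above.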

***

Ultimately, the recommendation I would give is much the same as the bottom line presented, which I was very happy to see. Indeed, I'm mostly grumbling because I want to discourage anything which treats heavy-tailed as a binary property**, as parts of the summary/commentary tend to, see above.

Some advice for how to work with these concepts in practice:

- In practice, don’t treat ‘heavy-tailed’ as a binary property. Instead, ask *how* heavy the tails of some quantity of interest are, for instance by identifying the frequency of outliers you’re interested in (e.g. top 1%, top 0.1%, …) and comparing them to the median or looking at their share of the total. [2]
- Carefully choose the underlying population and the metric for performance, in a way that’s tailored to the purpose of your analysis. In particular, be mindful of whether you’re looking at the full distribution or some tail (e.g. wealth of all citizens vs. wealth of billionaires).

*Approximate because I was lazy and just simulated 10000 values to get these and other quoted numbers. AFAIK the true values are not sufficiently different to affect the point I'm making.

**If it were up to me, I'd taboo the term 'heavy-tailed' entirely, because having an oft-used term whose mathematical and commonsense notions differ is an obvious recipe for miscommunication in a STEM-heavy community like this one.

I want to push back against a possible interpretation of this moderately strongly.

If the charity you are considering starting has a 40% chance of being 2x better than what is currently being done on the margin, and a 60% chance of doing nothing, I very likely want you to start it, naive 0.8x EV be damned. I could imagine wanting you to start it at much lower numbers than 0.8x, depending on the upside case. The key is to be able to monitor whether you are in the latter case, and stop if you are. Then you absorb a lot more money in the 40% case, and the actual EV becomes positive even if all the money comes from EAs.
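
To make the arithmetic concrete, here is a rough sketch with made-up numbers. It assumes the failure case is spotted after spending 1 unit, while the success case goes on to absorb 10 units; both figures, and the function name `value_per_dollar`, are hypothetical and only there to illustrate the mechanism.

```python
def value_per_dollar(p_success, multiplier, spend_if_success, spend_if_failure):
    """Expected value per expected dollar spent, relative to the marginal alternative (1.0)."""
    # Only the success case produces value; the failure case produces nothing.
    expected_value = p_success * multiplier * spend_if_success
    expected_spend = (
        p_success * spend_if_success + (1.0 - p_success) * spend_if_failure
    )
    return expected_value / expected_spend


if __name__ == "__main__":
    # No monitoring: the same amount is spent whether or not the charity works.
    naive = value_per_dollar(0.4, 2.0, spend_if_success=1.0, spend_if_failure=1.0)
    # Monitoring: stop early on failure, keep absorbing money on success.
    monitored = value_per_dollar(0.4, 2.0, spend_if_success=10.0, spend_if_failure=1.0)
    print(f"naive: {naive:.2f}x, with monitoring: {monitored:.2f}x")
```

Under these assumptions, the venture beats the marginal alternative whenever the success case absorbs more than 1.5 times what the failure case burns before being stopped.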

If monitoring is basically impossible and your EV estimate is never going to get more refined, I think the case for not starting becomes clearer. I just think that's actually pretty rare?

From the donor side in areas and at times where I've been active, I've generally been very happy to give 'risky' money to things where I trust the founders to monitor and stop or switch as appropriate, and much more conservative (usually just not giving) if I don't. I hope and somewhat expect other donors are willing to do the same, but if they aren't that seems like a serious failure of the funding landscape.

CEA has now confirmed that Miri was correct to understand their budget - not EVF's budget - as around $30m.