This brings to mind the assumption of normal distributions when using frequentist parametric statistical tests (t-test, ANOVA, etc.). If plots 1-3 represented random samples from three groups, an ANOVA would indicate no significant difference between the group means, which would usually be reported as there being no significant difference between the groups (even though there is clearly a difference between them). In practice, this can come up when comparing a treatment whose population is a mix of non-responders and strong responders with a treatment where the whole population has an intermediate response. This is easily overlooked in a paper if the data are shown only as mean and standard deviation, and although better statistical practices are starting to address this, my experience is that even experienced biomedical researchers often don't notice the problem. I suspect that many studies have failed to identify that a group is composed of subgroups that respond differently, because averaging has hidden them in this way.
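A minimal simulation of the non-responder scenario, assuming normally distributed responses; the group sizes, means, and seed below are hypothetical choices for illustration. It computes the one-way ANOVA F statistic by hand with Python's standard library (scipy.stats.f_oneway does the same and adds a p-value) for two groups with near-identical means but clearly different shapes.

```python
import random
import statistics


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(values)
    k, n_total = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        group_mean = statistics.fmean(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


rng = random.Random(0)  # fixed seed so the simulation is reproducible
n = 200
# Hypothetical treatment A: half non-responders (around 0), half strong responders (around 10)
mixed = [rng.gauss(0, 1) for _ in range(n // 2)] + [rng.gauss(10, 1) for _ in range(n // 2)]
# Hypothetical treatment B: everyone has an intermediate response (around 5)
intermediate = [rng.gauss(5, 1) for _ in range(n)]

f_stat = one_way_anova_f(mixed, intermediate)
print(f"means: {statistics.fmean(mixed):.2f} vs {statistics.fmean(intermediate):.2f}")
print(f"ANOVA F = {f_stat:.3f}; the 5% critical value of F(1, 398) is about 3.86")
```

For two groups this F equals the square of the pooled two-sample t statistic, so a t-test would reach the same (non-significant) conclusion.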
The usual way of dealing with non-normal distributions is to test each group's data for normality (e.g. the Shapiro-Wilk test) and switch to a non-parametric test (e.g. the Mann-Whitney, Kruskal-Wallis, or Friedman test) if that fails for one or more groups. But those tests are essentially comparing medians (strictly, whether values from one group tend to be larger than values from another), so I think they would probably still indicate no significant difference between (the median values of) these plots. Testing for a difference between distributions is possible (e.g. the Kolmogorov–Smirnov test), but my experience is that it seems over-powered and will almost always report a significant difference between two moderately sized (~50+ samples) groups. And the result is only that the distributions differ significantly, not what that difference actually represents (e.g. differing means, standard deviations, kurtosis, skewness, long tails, complete non-normality, etc.).
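A companion sketch under the same hypothetical setup (simulated normal responses; seed and sizes chosen only for illustration). The rank quantity behind the Mann-Whitney test, U divided by the number of pairs, estimates the probability that a value from one group exceeds a value from the other; the two-sample Kolmogorov–Smirnov D is the largest gap between the two empirical CDFs. Both are written out with the standard library here; scipy.stats.mannwhitneyu and scipy.stats.ks_2samp provide the full tests with p-values.

```python
import bisect
import random


def prob_superiority(a, b):
    """Mann-Whitney U / (len(a) * len(b)): estimated P(A > B), counting ties as 1/2."""
    wins = 0.0
    for x in a:
        for y in b:
            wins += 1.0 if x > y else 0.5 if x == y else 0.0
    return wins / (len(a) * len(b))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the largest gap always occurs at one of the observed values
        gap = bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(gap))
    return d


rng = random.Random(0)
n = 200
# Hypothetical groups: bimodal (non-responders + strong responders) vs intermediate responders
mixed = [rng.gauss(0, 1) for _ in range(n // 2)] + [rng.gauss(10, 1) for _ in range(n // 2)]
intermediate = [rng.gauss(5, 1) for _ in range(n)]

ps = prob_superiority(mixed, intermediate)
d = ks_statistic(mixed, intermediate)
d_crit = 1.36 * (2 / n) ** 0.5  # large-sample 5% critical value for two samples of size n
print(f"P(mixed > intermediate) = {ps:.3f} (0.5 means no tendency either way)")
print(f"KS D = {d:.3f}; 5% critical value is about {d_crit:.3f}")
```

With these settings the rank statistic lands near 0.5, so a Mann-Whitney test would not flag the groups, while D far exceeds its critical value. D on its own still does not say whether the groups differ in location, spread, or shape.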

Nice post! Here's an illustrative example in which the distribution of p matters for expected utility.
Say you and your friend are deciding whether to meet up but there's a risk that you have a nasty, transmissible disease. For each of you, there's the same probability p that you have the disease. Assume that, given p, whether you have the disease is independent of whether your friend has it. You're not sure if p has a beta(0.1,0.1) distribution or a beta(20,20) distribution, but you know that the expected value of p is 0.5.
If you meet up, you get +1 utility. If you meet up and one of you has the disease, you'll transmit it to the other person, and you get -3 utility. (If you both have the disease, then there's no counterfactual transmission, so meeting up is just worth +1.) If you don't meet up, you get 0 utility.
It makes a difference which distribution p has. Here's an intuitive explanation. In the first case, it's really unlikely that one of you has it but not the other. Most likely, either (i) you both have it, so meeting up will do no additional harm, or (ii) neither of you has it, so meeting up is harmless. In the second case, it's relatively likely that one of you has the disease but not the other, so you're more likely to end up with the bad outcome.
If you crunch the numbers, you can see that it's worth meeting up in the first case, but not in the second. For this to be true, we have to assume conditional independence: that you and your friend having the disease are independent events, conditional on the probability of an arbitrary person having the disease being p. It doesn't work if we assume unconditional independence, because then the chance that exactly one of you has it is 2E[p](1 - E[p]) = 0.5 under either distribution, so only the mean of p matters. I think conditional independence makes more sense.
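A sketch of why conditional independence matters, using the standard Beta moment formula (an addition here, not part of the original comment). Given p, the chance that exactly one of you is infected is 2p(1-p), so averaging over f(p) brings in E[p²], which differs between the two distributions even though E[p] = 1/2 for both.

```latex
\begin{aligned}
P(\text{both}) &= \int_0^1 p^2 f(p)\,dp = \mathbb{E}[p^2] \\
P(\text{exactly one}) &= \int_0^1 2p(1-p)\, f(p)\,dp = 2\left(\mathbb{E}[p] - \mathbb{E}[p^2]\right) \\
\mathbb{E}[p^2] &= \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} \quad \text{for } p \sim \mathrm{Beta}(\alpha,\beta) \\
\mathrm{Beta}(0.1,0.1)\text{:}\ \ P(\text{exactly one}) &= 2\left(\tfrac{1}{2} - \tfrac{0.1\cdot 1.1}{0.2\cdot 1.2}\right) = 2\left(\tfrac{1}{2} - \tfrac{11}{24}\right) = \tfrac{1}{12} \approx 0.08 \\
\mathrm{Beta}(20,20)\text{:}\ \ P(\text{exactly one}) &= 2\left(\tfrac{1}{2} - \tfrac{20\cdot 21}{40\cdot 41}\right) = 2\left(\tfrac{1}{2} - \tfrac{21}{82}\right) = \tfrac{20}{41} \approx 0.49 \\
\text{unconditional independence:}\ \ P(\text{exactly one}) &= 2\,\mathbb{E}[p]\,(1-\mathbb{E}[p]) = \tfrac{1}{2} \text{ under either distribution}
\end{aligned}
```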
The calculation is a bit long-winded to write up here, but I'm happy to do so if anyone is interested in seeing or checking it. The gist is to write the probability of a state as the integral with respect to p of the probability of that state conditional on p, multiplied by the pdf of p (i.e. P(s1,s2) = ∫P(s1,s2|p)f(p)dp). Separate the states using conditional independence (i.e. P(s1,s2|p) = P(s1|p)P(s2|p)), plug in values (e.g. P(you have it|p) = p), and integrate. Here's the calculation of the probability that you both have it, assuming the beta(0.1,0.1) distribution. Then calculate the expected utility of meeting up as normal, using the utilities above and the probabilities calculated in this way. If I haven't messed up, you should find that the expected utility is positive in the beta(0.1,0.1) case (i.e. better to meet up) and negative in the beta(20,20) case (i.e. better not to meet up).
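The calculation described above can be sketched in a few lines of Python. Two assumptions are mine rather than the original comment's: the integrals are evaluated exactly with the Beta moment formula E[p^k] = B(α+k, β)/B(α, β) instead of numerically, and "-3 utility" is read as the total payoff of meeting when the disease is transmitted (reading it as +1 - 3 = -2 leaves both signs unchanged). The function and parameter names (beta_moment, expected_utility_of_meeting, u_safe, u_transmit) are hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_moment(alpha, beta, k):
    """E[p^k] for p ~ Beta(alpha, beta): the integral of p^k f(p) dp = B(alpha + k, beta) / B(alpha, beta)."""
    return math.exp(log_beta(alpha + k, beta) - log_beta(alpha, beta))


def expected_utility_of_meeting(alpha, beta, u_safe=1.0, u_transmit=-3.0):
    """Expected utility of meeting, assuming the two infections are independent given p."""
    e_p = beta_moment(alpha, beta, 1)
    e_p2 = beta_moment(alpha, beta, 2)
    p_both = e_p2                    # integral of p * p * f(p) dp
    p_neither = 1 - 2 * e_p + e_p2   # integral of (1 - p)^2 * f(p) dp
    p_one = 2 * (e_p - e_p2)         # integral of 2p(1 - p) * f(p) dp
    return (p_both + p_neither) * u_safe + p_one * u_transmit


for a, b in [(0.1, 0.1), (20, 20)]:
    eu = expected_utility_of_meeting(a, b)
    print(f"beta({a}, {b}): EU(meet) = {eu:+.3f}, EU(don't meet) = 0")
```

Under these assumptions the expected utility of meeting works out to 2/3 for beta(0.1,0.1) and -39/41 (about -0.95) for beta(20,20), matching the signs predicted above.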
Reflecting on this example and your x-risk questions, it highlights the fact that in the beta(0.1,0.1) case we're either very likely fine or really screwed, whereas the beta(20,20) case is similar to a fair coin toss. So it feels easier to me to get motivated to work on mitigating the second one. I don't think that says much about which is the higher priority to work on, though, because reducing the risk in the first case could be super valuable. The value of information that narrows our uncertainty does seem much higher in the first case.
Nice example; I see where you're going with that.
I share the intuition that the second case would be easier to get people motivated for, as it represents more of a confirmed loss.
However, as your example shows, the first case could actually lead to an 'in it together' effect on co-ordination, assuming the information is taken seriously. That is hard because, in advance, this kind of situation could encourage a 'roll the dice' mentality.