
The Unjournal commissioned two evaluations of "Population ethical intuitions" (link: ungated PDF) by Caviola, Althaus, Mogensen, and Goodwin, from experts with complementary backgrounds. (Also see the authors' post in this forum.)

Overview and my take

This work is among the first empirical papers considering attitudes towards population ethics. My (David Reinstein) synthesis: This is a strong start and the authors did a lot right. But it warrants further follow-up if we want to have a reasonably confident take on general population intuitions, including:

  1. Further data analysis/reporting on this survey, which should be easy to do (evaluators suggest specific methods for addressing heterogeneity and aggregation, visualizing distributions, and reporting medians and confidence intervals or other direct characterisations of uncertainty); see the sketch after this list.
  2. Further surveys/replications on a range of global samples and with adjusted question framings, for robustness.
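
To make suggestion 1 concrete, here is a minimal sketch, in Python and using simulated placeholder data (not the authors' dataset), of the kind of reporting the evaluators ask for: a median alongside the mean, bootstrap confidence intervals, and a plot of the full response distribution.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical 1-7 scale responses standing in for one survey item;
# a real analysis would load the authors' published data instead.
responses = rng.integers(1, 8, size=157).astype(float)

# Point estimates: mean and median
print(f"mean = {responses.mean():.2f}, median = {np.median(responses):.2f}")

# Bootstrap 95% confidence intervals for both statistics
boot_means = [rng.choice(responses, size=responses.size, replace=True).mean()
              for _ in range(10_000)]
boot_medians = [np.median(rng.choice(responses, size=responses.size, replace=True))
                for _ in range(10_000)]
print("95% CI (mean):  ", np.percentile(boot_means, [2.5, 97.5]))
print("95% CI (median):", np.percentile(boot_medians, [2.5, 97.5]))

# Show the full response distribution rather than reporting only a mean
plt.hist(responses, bins=np.arange(0.5, 8.5, 1), edgecolor="black")
plt.xlabel("Response (1-7 scale)")
plt.ylabel("Count")
plt.title("Distribution of responses (hypothetical data)")
plt.show()
```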

While the second (anonymous) evaluator raised concerns that might make any such exercise less meaningful, I suspect that follow-up work could address some of these, or at least provide evidence on the extent to which these concerns matter.

Update: the usual search for forward citations, an elicit.com search, and ChatGPT deep research did not find anything following up on this work or this question. Please share if you know otherwise.

My impression is that the "publish a lot and quickly" academic incentives in Psychology (and Philosophy?) make it challenging to do long-term, expensive, detailed continuing projects. This is something The Unjournal is hoping to help change, by providing rigorous quantified "post-publication" evaluation of these projects.

 

From our evaluation summary abstract:

The first evaluator (Bruers, with expertise in welfare economics and normative ethics) rates the paper highly, while E2 (an ~experimental economist) is moderately favorable. Both see a contribution (“Highly policy relevant”, “valuable empirical insights”). 

Both offer some critiques and provide suggestions for robustness checks and ambitious future work. 

Bruers expresses “weak confidence” in the paper’s “main result” that “people do not hold the neutrality and procreation asymmetry intuitions”, loosely suggesting we should “prioritize existential risk reduction”. E2 criticizes the paper’s fundamental approach as: (1) unable to accommodate non-utilitarian beliefs/behavior, (2) implying an unrealistic “hedonic arithmetic” and (3) relying on “choices between populations” that may not reflect inherent axiological preferences. 

E2 also raises doubts about underpowered null results, limited characterisations of uncertainty, and the authors’ approach to aggregating participant responses into a “people believe” statement.
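
For context on the "underpowered null results" point, a standard power calculation shows the smallest standardized effect a one-sample test of roughly this size can reliably detect. The n = 157 below is taken from the Study 2a statistic quoted in the claim table further down; everything else is a textbook calculation, not something stated in the evaluations.

```python
from statsmodels.stats.power import TTestPower

# Smallest standardized effect (Cohen's d) detectable with 80% power
# in a two-sided one-sample t-test with n = 157 (the Study 2a sample size).
analysis = TTestPower()
mde = analysis.solve_power(nobs=157, alpha=0.05, power=0.80, alternative="two-sided")
print(f"Minimum detectable effect at 80% power: d = {mde:.2f}")  # roughly d = 0.22

# Conversely, the power to detect a small effect of d = 0.10 with n = 157:
power = analysis.solve_power(effect_size=0.10, nobs=157, alpha=0.05,
                             alternative="two-sided")
print(f"Power to detect d = 0.10: {power:.2f}")  # well below conventional thresholds
```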

 

From our 'claim evaluation'

Evaluator 1 (Stijn Bruers)

Main research claim: People do not hold the neutrality and procreation asymmetry intuitions. They believe that adding a happy person is good and [adding a happy person] is as good as adding an equally intense unhappy person is bad.

Belief in claim: I have weak confidence in the result: I expect the neutrality and (a)symmetry views strongly depend on the context (e.g. on the choice set: the possible populations that one could choose). In some contexts, such as situations where one could avoid a repugnant conclusion, people may hold the asymmetry view more strongly (especially after reflection). In future research, one could investigate such possible context or choice-set dependence.

Suggested robustness checks: A survey that measures people’s population ethical judgments in choice-set-dependent contexts and under more reflection. E.g., do people still prefer population A over population B when population C becomes an option, and when they learn about e.g. repugnant or sadistic conclusions that could arise in such contexts or choice sets?

Evaluator 2 (Anonymous)

Main research claim (see the computational sketch after this table): “Next, a one-sample t-test against the midpoint 4 revealed that participants on average judged it as an improvement to add one neutral person into the world, (M = 4.23, SD = 0.67), t(156) = 4.40, p < .001, d = 0.35. This suggests the existence of a weak general preference to create a new person, even if their happiness level is neutral.” (p. 9)

Belief in claim: Credible interval: [0.8, 1]

Suggested robustness checks: Study 2a should be globally replicated; whether a new neutral person is seen as beneficial to this world should be correlated with life conditions in each country that is studied.
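
To unpack the statistic E2 quotes, here is a minimal sketch of how the one-sample t-test and Cohen's d are computed. The data are simulated to roughly match the reported summary statistics (M ≈ 4.23, SD ≈ 0.67, n = 157); they are not the authors' Study 2a responses.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated ratings roughly matching the reported M ~= 4.23, SD ~= 0.67;
# placeholder data, not the authors' Study 2a responses.
ratings = rng.normal(loc=4.23, scale=0.67, size=157)

midpoint = 4.0  # scale midpoint = "neither an improvement nor a worsening"
t_stat, p_value = stats.ttest_1samp(ratings, popmean=midpoint)
cohens_d = (ratings.mean() - midpoint) / ratings.std(ddof=1)

print(f"t({ratings.size - 1}) = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")
```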

 

Highlighted comments from evaluator 1 (Bruers) 

I wrote: “people may be slightly negative utilitarians.” I think that, based on studies 1a to 1c, the authors are too quick to draw the conclusion that the respondents weigh suffering more heavily than happiness.

The framing of the survey question may create a bias.

 

Bruers suggests considering reversing the question "what percentage of happy and unhappy people would there have to be for you to think that this world is overall positive" to instead ask about the percentage necessary "for you to think that this world is overall negative".

 The present survey results are consistent with such a symmetric weighing of positive and negative welfare. The authors do not discuss this possibility in their paper (though they very briefly touch on this issue in section 16.5 “Limitations”). In a future survey, researchers could use both positive and negative framings of the questions that were used in surveys 1a, 1b and 1c, to test this symmetry. If the apparent asymmetry of the results in surveys 1a, 1b and 1c is due to a framing effect, the apparent contradiction with the observed symmetry in survey questions 2a and 2b could be resolved.
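
One way such a symmetry test could be analyzed, assuming a between-subjects design in which each respondent answers either the positively or the negatively framed threshold question; all names and numbers below are hypothetical placeholders, not a description of the authors' design.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical implied thresholds: the percentage of happy people a respondent
# requires before calling the world net-positive. One framing per group, with
# the negative framing recoded onto the same scale (e.g. 100 minus the stated
# percentage of unhappy people). Numbers are placeholders, not real data.
positive_framing = rng.normal(loc=65, scale=12, size=200)
negative_framing = rng.normal(loc=60, scale=12, size=200)

t_stat, p_value = stats.ttest_ind(positive_framing, negative_framing)
gap = positive_framing.mean() - negative_framing.mean()
print(f"framing gap = {gap:.1f} percentage points, t = {t_stat:.2f}, p = {p_value:.3f}")
# If the apparent asymmetry in studies 1a-1c were purely a framing effect,
# this gap (rather than a genuine suffering premium) would account for it.
```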

 

Highlighted comments from evaluator 2 (anonymous)

As well as specific methodological concerns, the second evaluator raised fundamental concerns with the approach, the inferences one might make from the data, and whether and how it should inform public policy. (I think these objections have some overlap with David Hugh-Jones's evaluation of "Ends versus Means: Kantians, Utilitarians, and Moral Decisions".)

"Forcing" participants into the utilitarian framework?

More fundamentally, as discussed above, the question is whether participants’ responses represent meaningful moral intuitions or merely random behavior when forced into the utilitarian framework.

The introduction of the happiness scale pins down subjects’ preferences under the assumption that the scale is accepted by subjects. However, in study 1b, we do not know whether subjects actually accept the scale. The problem is that behavior that does not fit neatly within the utilitarian framework is nonetheless identified and analyzed as meaningful. This problem is not present in later studies, where subjects had to accept such underlying assumptions before proceeding. But such an acceptance requirement is a double-edged sword. On the one hand, it ensures some basic validity of responses. On the other hand, one selects away those respondents who are not ready to accept researcher suppositions, perhaps with a good reason for not doing so.

Evidence of misunderstanding or random behavior?

The evidence suggests participants struggled not with understanding the scenarios but with their mathematical implications. In Study 3b, fewer than half of participants correctly identified that two populations had equal total happiness, with only 35% in happiness conditions and 26% in unhappiness conditions accurately calculating total welfare levels.
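
For readers unfamiliar with these comparisons, here is an invented example (not taken from Study 3b) of how two populations of different sizes can have equal total welfare while differing in average welfare, which is the calculation many participants appear to have gotten wrong.

```python
# Illustrative only: an equal-total comparison of the kind participants had to judge.
# Population A: 100 people, each at happiness level +2.
# Population B: 200 people, each at happiness level +1.
pop_a = {"size": 100, "per_person_welfare": 2}
pop_b = {"size": 200, "per_person_welfare": 1}

total_a = pop_a["size"] * pop_a["per_person_welfare"]
total_b = pop_b["size"] * pop_b["per_person_welfare"]

print(total_a, total_b, total_a == total_b)  # 200 200 True
# Average welfare still differs (2 vs. 1), which is exactly the kind of
# total-vs-average distinction participants appear to have found difficult.
```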


Are "choices between populations" meaningful?

Secondly, this whole research rests on the implicit assumption that choices between populations are somehow legitimate (see also the next section in this evaluation). The experimental subjects did not have any such experience with choosing between populations, nor did these experiments in any way contain a variable payment for choosing correctly. What then, do these experiments measure? They might measure intuitions, but they could also measure social norms, or what respondents deemed appropriate. It is important to note that this issue exists in all treatments and hence does not affect the identification of differences between treatments. But it does affect baseline values, which are heavily discussed throughout the whole paper.

The researchers’ happiness scale implies a kind of hedonic arithmetic that simply doesn’t correspond to lived human experience. If a person at +100 happiness will likely adapt downward and a person at -50 will likely adapt upward, what exactly are we measuring when we ask people to choose between populations with different distributions of these ephemeral states?

 

Implications for governance

 The authors note that understanding population ethics has “direct implications for decision-making” and “global priority setting.”

The experiments inadvertently demonstrate why judgments based on abstract aggregate calculations contain utmost danger. In Study 3d, participants exhibited what the authors call “averagist” tendencies even in cases that lead to the “Sadistic Conclusion.” That ordinary people might endorse such conclusions when thinking abstractly should give us pause about using these frameworks to guide real policy.
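
For readers unfamiliar with the term, a small worked example, with invented numbers, of why an averagist ranking can imply the Sadistic Conclusion: adding one person with negative welfare to a well-off population can leave the average higher than adding many people with low but positive welfare.

```python
# Illustrative numbers only.
base = [10] * 10                 # 10 existing people, each at welfare +10

option_a = base + [-1]           # add one person with negative welfare
option_b = base + [1] * 100      # add 100 people with low but positive welfare

avg_a = sum(option_a) / len(option_a)   # ~9.0
avg_b = sum(option_b) / len(option_b)   # ~1.8

print(f"average with one unhappy addition: {avg_a:.2f}")
print(f"average with many mildly happy additions: {avg_b:.2f}")
# An averagist ranking prefers option A, even though it adds a life of
# negative welfare -- the Sadistic Conclusion.
```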

I do not believe, however, that these studies tell us much about current vs. future populations (no discount rates), or that they could inform deeply normative issues (is–ought problem).

 

Aggregation issues

When the authors report that “people believe” a certain trade-off ratio between happiness and suffering, etc., they are committing a statistical fallacy. The average of heterogeneous moral positions does not represent any actual person’s view, much less “people’s” views.
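
A small invented illustration of this aggregation worry: if 60% of respondents require 1 unit of happiness to offset 1 unit of suffering and 40% require 10 units, the mean requirement of 4.6 describes no actual respondent, whereas the median at least coincides with a position someone holds.

```python
import numpy as np

# Invented example: units of happiness each respondent requires to offset
# one unit of suffering.
ratios = np.array([1.0] * 60 + [10.0] * 40)

print(f"mean ratio:   {ratios.mean():.1f}")      # 4.6 -- held by no respondent
print(f"median ratio: {np.median(ratios):.1f}")  # 1.0 -- the majority position
print("share of respondents whose ratio equals the mean:",
      float(np.mean(ratios == ratios.mean())))   # 0.0
```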
