The Moral Two Envelopes Problem and the Moral Weights Project

Derek Shiller

This post is an overview of the Moral Two Envelopes Problem and a take on whether it applies to Rethink Priorities' Moral Weights Project. It is a product of discussions between myself, Michael St. Jules, Hayley Clatterbuck, Marcus Davis, Bob Fischer, and Arvo Muñoz Morán, but need not reflect anyone's views but my own. Thanks to Brian Tomasik, Carl Shulman, and Michael St. Jules for comments. Michael St. Jules discusses many of these issues in depth in a forum post earlier this year.

Introduction

When deciding how to prioritize interventions aimed at improving the welfare of different species of animals, it is critical to have some sense of their relative capacities for wellbeing. Animals’ welfare capacities are, for the most part, deeply uncertain. Our uncertainty stems both from the gaps in our understanding of the cognitive faculties and the behaviors of different species, which constitute the external evidence of consciousness and sentience, and from our limited grasp of the bearing of that evidence on what we should think.

Rethink Priorities’ Moral Weights Project produced estimates of relative moral significance of some intensively farmed species that reflected our uncertainties. It assumed that moral significance depends on capacities for welfare. In order to respect uncertainty and differences of opinion about methodology, it adopted a Monte Carlo approach. Potential sources of evidence were combined with different theories of their evidential significance to produce a range of predictions. Those predictions were aggregated to create overall estimates of relative welfare capacity and hence moral significance.

The project addresses a complex issue and our methodology is open to criticism. Most notably, in anticipation of work like the Moral Weights Project, Brian Tomasik explored complications to naive aggregation of expected value across different theories^[1]. He directed his criticisms at attempts to aggregate welfare estimates in light of different theories of the significance of brain size, but similar criticisms could be developed for the other proxies that the Moral Weights Project considered. If the issues Tomasik identified did apply to the project's methodology, then the conclusions of the report would be compromised.

It is my view that while there are important lessons to be drawn from Tomasik’s work and Tomasik is right that certain forms of aggregation would be inappropriate, the concerns he posed do not apply to the project's methodology. This document explains why.

Moral Two Envelopes Problem

The Two Envelopes Problem is a venerable issue in decision theory that has generated a lot of scholarly discussion. At its heart, it poses a challenge in understanding how to apply expected value reasoning to making decisions. The challenge depends on the Two Envelopes Case, a thought experiment in which some amounts of money are placed into two envelopes. A subject chooses between them and keeps the money inside their chosen envelope. They know that one envelope contains exactly twice as much money as the other but they don’t know which contains more. Before they’ve seen how much is inside their chosen envelope, they are allowed to switch to the other. Expected value calculations appear to support switching.

The argument for switching is as follows. There is some exact but unknown amount of money inside the chosen envelope. Let’s call that amount ‘CEM’. The subject knows that the unchosen envelope is equally likely to contain ½ CEM and 2 CEM. The expected value of switching to the other envelope is therefore ½ * ½ CEM + ½ * 2 CEM = 1.25 CEM. The expected value of not switching is 1 CEM.

Given the symmetry of the situation, there is obviously no reason to switch. The Two Envelopes Problem is to explain where the expected value reasoning goes wrong.

The problem Tomasik posed bears a resemblance. Tomasik illustrates it with his Two Elephants Case: Suppose we’re uncertain between two theories, one of which values species equally and one which values species linearly by neuron count. Consider two prospects, one that benefits one human to some degree and one that benefits two elephants to the same degree. Assume that elephants have a quarter the neuron count of humans. Then on one of the theories, helping the elephants is twice as good as helping humans. On the other, it is half as good.

As with the standard Two Envelopes Case, we can argue from the expected value for helping either. Suppose we’ve elected to help the human. Call the amount of value we produce HEV (Human Expected Value). The expected value of switching to help the elephants is 1.25 HEV. Therefore, we should switch. But we could just as easily make the same argument in reverse. Call the amount of value produced by helping the elephants 1 EEV. Then helping the human has an expected value of 1.25 EEV.

Tomasik argues that applying expected value reasoning to evaluating prospects in this way is inappropriate.

In contrast with the Two Envelopes Problem, he sees little challenge in explaining what makes that reasoning inappropriate. For Tomasik, at least some of the fundamental issues at play in the Two Envelopes Case and the Two Elephants Case are different. He claims that the Two Envelopes Problem is solvable, but the Moral Two Envelopes Problem is not. The difference is that the Moral Two Envelopes Problem involves expected value calculations that aggregate the value of each species according to different normative theories.

According to Tomasik, distinct utility functions are incomparable. The numbers that each theory attributes only make sense within the theory; they specify the tradeoffs sanctioned by those theories. In order to aggregate them, we would need to find some common unit in which they both make sense. But there is no common unit.

Three Related Issues

There is a straightforward explanation for why the Moral Two Envelopes Problem is not a problem for the Moral Weights Project: the Moral Weights Project does not attempt to aggregate across normative theories. Instead, it assumes a single normative theory and aggregates across assumptions about how some behavioral and neurological proxies relate to the cognitive traits that matter to that theory. I’ll explore this idea more below. But before that, Tomasik’s observation of the similarities with the Two Envelopes Problem raises other issues that should be addressed; I will describe three challenges that warrant attention.

The Pinning Problem

One line of response to the traditional Two Envelopes Problem asserts that the problematic reasoning rests on a subtle equivocation.^[2] The traditional thought experiment assumes that any amount of money may be in the two envelopes, but the equivocation is easier to see if we assume specific amounts. Suppose that one envelope contains $10 and the other $20. The expected value of switching from the chosen envelope to the other is then 1.25 * CEM. However ‘CEM’ is tied by definition to the specific amount of money in one envelope and we are uncertain which of the two values that specific amount is. If ‘CEM’ refers to the higher value, then that value is $20 and the cost of switching is $10. If ‘CEM’ refers to the lower value, then it is $10 and the gain in switching is $10. It is only because ‘CEM’ refers to different values in the scenarios the subject is uncertain between that switching can have an expected value of 1.25 CEM, staying can have an expected value of 1 CEM, and switching can still not be worthwhile.^[3]

There is an issue here, which I’ll call the ‘Pinning Problem’. Sometimes we point to an uncertain value and give it a name, then use units of that value for expected value calculations. That can be ok. But it can also be problematic when our evidence bears on what that unit must actually be in different epistemically possible situations. ‘CEM’ might refer to $10 or it might refer to $20. Part of the Pinning Problem concerns knowing when it is acceptable to use fixed terms to represent potentially varying units of value in expected value calculations – it isn’t always wrong. Part of it concerns knowing when our terms could represent different levels of value in the first place.

This could be a problem for a Moral Weights Project if the welfare capacities were specified in a unit whose significance we were uncertain about and whose value would be different under different possible scenarios, considered as actual. This problem isn’t tied to normativity, so presents a different issue than the one that Tomasik focused on.

The Ratio Incorporation Problem

Suppose that the oracle (who states only truths) asserts that the relative value of a $1000 donation to The Tuberculosis Initiative (TI) and of the same amount to Delead The World (DW) is 10:1. We had previously thought it was 5:1. There are multiple ways of adjusting our beliefs. We might take this as evidence that TI’s work was more effective than we initially thought and raise that assessment while holding our regard for DW fixed. Alternatively, we might lower our assessment for DW’s work while holding TI’s fixed. Finally, we become both more optimistic about TI and more pessimistic about DW. The oracle’s revelation does not tell us which way to go.

Sometimes we are uncertain about the ratio of values between different prospects. The proper way to reason about our prospects given such uncertainty may depend on how various possible ratios would be incorporated if we were to decide that they were true.

Suppose that the oracle informs us that the ratio of value in donations to TI and to DW is either 10:1 or 1:2 and we are equally uncertain between them. This might seem to suggest that we should give to TI rather than DW. But really it depends on how we would choose to incorporate these ratios into our absolute estimates of effectiveness. Suppose that we would incorporate information about the 10 to 1 ratio, if confirmed, by lowering our estimate of DW, but incorporate the 2 to 1 ratio by upping our estimate of DW (in both cases holding TI fixed). In that case, the expected value of DW would actually be higher, even though the more favorable ratio is on TI’s side. On the other hand, if we instead held DW fixed, the expected value of TI would be higher.

The Ratio Incorporation Problem means that we need to know more than just the ratios according to different theories, we need to know how to adjust our prior expectations about value in light of those ratios. Or, alternatively, we need to know how to gauge the significance of some ratios against others.

The Metric Locality Problem

Finally, different metrics may specify different relative values such that there is no way to compare values across metrics.

This problem contains several subproblems. One subproblem is making sense of the idea of the same amount of value within different normative theories. We might think the theories are fundamentally about different things, so no translation between the two metrics is possible. Consider: how many euros is equivalent to 80 degrees Fahrenheit? The question doesn’t make sense. We might think the same thing is true about the utility measures of different normative theories. Kantian deontologists, for example, just think about value in such a different way than utilitarians that trying to fit their valuations on one scale looks a bit like trying to put money and temperature on one scale.

Another subproblem concerns how to actually compare across metrics, assuming it is intelligible that one value in each could be equivalent. Compare: is Nigel Richards better at Scrabble than Magnus Carlsen is at chess? It doesn’t sound like an incoherent question in quite the way that asking about the value of a currency in temperature is, but it is also not obvious that there is an answer. The numbers representing the values of options in a utility calculus traditionally reflect an ordering of prospects and the relative sizes of differences between values, but not absolute values. Specific values can be transformed with linear functions (such as by multiplying each assignment by two) without altering the ordering.

In order to fix a scale that we might use to compare the values assigned by two theories, we just need to know how to translate the value of two assignments from one to the other. From the translations of two specific values, we can derive a general understanding of how normative distance translates between the two scales. And with a relative measure of distance and any shared value, we can calibrate how far from that shared value any other value is. To translate between Celsius and Fahrenheit, you only need to know that the metrics are the same at -40°, and that the difference between 0° and 100° in Celsius is the same as the difference between 32° and 212° in Fahrenheit.

It is plausible that the value represented with zero may be granted special significance that is assumed to be the same across measures. This reduces our problem to finding one other translation between the metrics. Without another translation to fix a scale, we can’t assume that numbers mean the same thing in different metrics. (We assume too that the proxies do not imply non-linear differences in value. This is a non-trivial assumption^[4], but consistent with our rather minimalistic interpretation of the proxies.)

If different metrics are incomparable and the numbers actually assigned are only accurate up to linear transformations, then there is no way to aggregate the results from different metrics.

The Metric Locality Problem may be thought of as a reason for skepticism about the possibility of resolving the Pinning Problem or the Ratio Incorporation Problem in normative cases. In order for it to be possible to know how to identify one unit of value across normative theories, it would need to be coherent to identify value across theories. In order for it to be possible to incorporate different ratios relative to one another, there would have to be a way of fitting them on the same scale. The Metric Locality Problem says there is not.

Avoiding These Issues

These are real challenges. Each could be a problem for a project like the Moral Weights Project. However, we think the Moral Weights Project follows a methodology that avoids these challenges. Several different factors contribute to the explanation of why the Moral Weights Project does not face the Moral Two Envelopes Problem.

Pinning to Humans

The Moral Weights Project aims to assign numerical values to represent the moral significance of species. The methodology involves assigning numerical values according to different theories and then aggregating those values. In each of the theories, it is assumed that humans always have a moral weight of 1 and so any variance implied by those theories is applied to other species.

Assuming a constant value for the welfare of humans allows us to fix a scale across different measures. The key to this is that we also assume assignments of 0 to have a fixed meaning in the metric for each theory. In each metric, 0 is the value of not existing as a welfare subject. The significance of non-existence is uncontroversial. Plausibly, it is the default against which all other changes are assessed. So the meaning of this number may be assumed identical between theories.

We assume 1 to reflect the amount of value provided by the welfare capacity of human beings. Since all other numerical assignments are interpretable relative to these values, and since the numbers for these two absolute amounts of value are the same in each metric we consider, all other numerical assignments can be interpreted as equally significant.

The Pinning Problem is solved by representing everything in human units. We do not have to worry about the meanings of our terms shifting in different contexts, because the value of human units is introspectively fixed. (More on this below.) The Ratio Incorporation Problem is solved by the constraint to always hold the value of humans fixed. The capacity of welfare for humans is assumed to be the same no matter which approach to inferring welfare from physiological and behavioral traits is correct, so any information about the ratio of human and non-human welfare is incorporated by adjusting the value assigned to non-human welfare.

The justification for assuming a consistent meaning to the assignments to humans is that we, as humans, have special^[5] introspective access to our capacity for pleasure and pain. We know how badly pain hurts. We know how good pleasure feels. Our interest in behavioral and neurological proxies relates to what it tells us about the extent to which other animals feel the same as we do.

Our grasp on our own welfare levels is independent of theory. Prick your finger: that’s how bad a pricked finger feels. That is how bad it feels no matter how it is that the number of neurons you have in your cortex relates to your capacity to suffer^[6]. This is the best access we can have to our own capacities for suffering. If you’re suffering from a migraine, learning about the true ratio of suffering in humans and chickens shouldn’t make you feel any better or worse about your present situation^[7].

Consider again the Two Elephants Case. Under one theory, two elephants are worth twice as much as one human: 2 HEV. Under another theory, two elephants are worth half as much as one human: ½ HEV. Symmetrically, the two theories have humans coming out worth half and twice as much as the two elephants respectively. Suppose that we pin the value of humans to be identical across the theories: 1 HEV refers to the same amount no matter which theory is true. Then it follows that in the second framing, although humans are worth half as much as the elephants in one theory, and twice as much in the other, the value of the two elephants shifts between the two theories, and calculating the expected value in elephantine units is inappropriate.

There are two caveats to this.

First, we aren’t assuming that humans have the same value according to every normative theory. We are only assuming that humans have a certain level of welfare, as determined by their valenced experiences, no matter what the proxies say. It is only because we assume that valenced experiences determine moral significance that we can infer that humans have the same level of moral significance.

Second, we are assuming that we have direct access to the range of our own capacity for suffering (at least to an extent that is independent of the question of which proxies are correct). The direct access we have to our phenomenal states is somewhat mysterious and open to doubt: we might struggle to explain how we reliably know about the extent of our own welfare states. Nevertheless, we think that there is sufficient consensus that we have such access which is an acceptable assumption for this project.

Assuming a Normative Theory

According to Tomasik’s assessment, the Moral Two Envelopes Problem is a problem specifically for aggregations of value across the utility functions that result from different normative theories. So, for instance, if we are equally unsure between a deontological theory that assigns a value of -5 to murder and a utilitarian theory that assigns a value of -1 to murder, we can’t average them to get an expected value of -3. The problem is supposed to be that the numbers in these functions are incomparable. In contrast, disagreement about factual matters within a theory is supposed to be unproblematic.

The theories that we assign variable degrees of probability within the Moral Weights Project are (for the most part^[8]) not normative theories. Instead, our assessments assume hedonism: valenced experiences are what matters, and so calibrating moral weights involves assessing species for their capacity for pleasure and pain. The theories over which we are uncertain relate the relationships between physiological and behavioral proxies for valenced experiences. One theory suggests that brain size is a proxy for suffering. Another theory suggests that aversive behavior is a proxy for suffering. Our uncertainty is not about what matters directly, but about how what we know about the world relates to what matters.

The question of how proxies relate to valenced experiences is a factual question, not a question of what to value^[9]. There are possible questions of what to value that might be confused for the question of the relevance of proxies. For instance, it might be that we assign higher moral weights to humans because we think more complex cognitive states matter more or because we think that the amount of neural mass involved in a feeling matters beyond how it feels. However, these were not the kinds of theories over which we try to aggregate. For the proxies in which cognitive complexity is taken as indicative of a wider range, the assumption is that cognitive complexity correlates (perhaps as the result of evolutionary pressures) with the determinants of that welfare range, not that cognitive complexity constitutes the thing of value.

The fact that we treat proxies as proxies of something further that cannot be directly studied may call some of the methodological choices into question. In particular, some may be skeptical that our proxies provide the same evidence across the phylogenetic tree: similarities in behavior indicate similarities in underlying mental faculties in creatures who share our neuroanatomy and evolutionary heritage. For instance, play behavior may be taken to provide more evidence for welfare capacity in chickens than in fruit flies. The nuances here are more difficult to formalize and study in a rigorous and consistent manner. In the interest of making progress, the Moral Weight Project adopted an approach that smooths over such complexities. That said, readers should be cautious about naively accepting the results of the project for very distant species and I would not endorse straightforwardly extending the methodology to non-biological systems.

Assuming a single normative theory is also potentially problematic. It is especially problematic if that theory, like hedonism, is widely regarded as false. However, the Moral Weights Project team thought hedonism’s tractability and its proximity to the truth were enough to justify its use. While hedonism is not widely accepted, the numerical values produced by the project are still informative and speak to an important component of moral value.

It might be objected that there aren’t any plausible correlates that underlie welfare capacities that are independent from the sorts of proxies we chose.^[10] On that view, our uncertainty about how welfare capacities relate to proxies would not be addressed by any facts we don’t know about the true underlying nature of welfare. The question of how to think about chicken suffering or shrimp suffering is then not what it really feels like to be a chicken or to be a shrimp, but instead how we want to categorize their states. This amounts to a rejection of realism about the notion of welfare ranges. Doubts about the validity of the concept of welfare ranges would reduce the value of the project, but that shouldn’t come as a surprise. It would suggest issues with the foundations and aims of the project rather than its methodology.

Calibrating Through Shared Assessments

Finally, I believe that it is possible to aggregate across the metrics for different normative theories so long as those metrics are properly calibrated. Proper calibration is possible for normative theories that are not too different from one another.

Tomasik considers and rejects one calibration scheme according to which all theories are put on a single scale according to the best possible outcomes in those theories. I agree with Tomasik that this approach would be problematic. Many normative theories place no upper limit on the amount of value, and there is no reason we can see to think normative theories must all assume the same overall stakes.

I think that a more promising strategy involves using some shared assessments across normative theories to create a common currency.

First, consider a particularly easy case: suppose we have hedonism and hedonism+. Hedonism+ shares hedonism’s verdicts on all things except that it also attributes some value to personal autonomy, and is therefore willing to pay costs in experience for greater levels of autonomy. Let hedonism and hedonism+ share not just their assessments of the value of pleasures and pains, but the reasons for those assessments: the explanations they provide for their value and our epistemic access to that value are identical. Given this, it is reasonable to equate the value each assigns to pleasure and pain in the two theories. The value of autonomy in hedonism+ can then be inferred from the tradeoffs that hedonism+ warrants with pleasure and pain.

We can generalize from this special case. Common reasoning about sources of value may let us calibrate across normative theories. Insofar as two theories place an amount of value on the same prospect for the exact same reasons, we can assume it is the same amount of value.^[11]

This strategy might be applied to other flavors of hedonism that value different kinds of experiences. Consider a flavor of hedonism that attributes greater wellbeing to more complex varieties of pleasure and pain. Human capacities for moral value on this theory might be greater than on theories that treat complex pleasure and pain the same as simple pleasures and pain. But if each flavor of hedonism agrees about the value of simple pleasure and pain, and complex hedonism sees additional reasons to value complex pleasure or pain, then we can calibrate between the theories using the shared assessments of simple pleasures and pains.^[12]

Conclusion

Although the Moral Two Envelopes Problem presents real difficulties, the Moral Weights Project methodology largely avoids them by assuming a single normative theory and pinning its value units to humans.

^{^}
See also Holden Karnofsky’s update on cause prioritization at Open Philanthropy.
^{^}
I don’t claim that this solves the Two Envelope Problem, or even that the problem is solvable in full generality. I think it is solvable in every finite case, where our uncertainties don’t permit just any amount of money in the two envelopes, for the reasons expressed here. The Moral Two Envelopes Problem doesn’t seem to rely on problematic infinities, so it is appropriate to focus on just the finite case.
^{^}
Whether this expected value is action-relevant depends in part on our preferences. Suppose that Spencer and Trish are both allowed to choose an envelope (the same or different) and get the amount of money inside. Trish chooses one envelope. Spencer doesn’t care about exactly how much money he gets, he only cares about how much money he gets relative to Trish. He’d rather get $2 if Trish gets $1 than $500 if Trish gets $1000. What he really cares about is expected value in Trish-based units (He’d sell his kidney for $2 if Trish gets $1, but wouldn’t sell it for $500 if Trish gets $1000). Spencer should pick the envelope Trish does not pick, not because it has higher expected value in absolute monetary terms, but because it has a higher expected value in the Trish-based units.

^{^}

Thanks to Michael St. Jules for stressing this point. If we took the proxies as corresponding to theories about what makes mental states valuable, this could be a significant issue. Instead, we see the proxies not as identifying physical bases of normatively-significant features but just as possible sources of evidence regarding normatively-significant features that are assumed to be the same no matter which proxies actually provide the best evidence.

^{^}

The introspective access provides two advantages. First, we can examine our internal mental life in a way that would be very difficult to replicate from a third-person perspective with other animals. We get some insight into what we’re paying attention to, what properties are represented, what aspects of our experiences we find motivating, etc. In principle, we might be able to do this from a third-person perspective with any other animal, but it would take a lot of neuroscientific work. Second, introspection of valenced experiences provides a target for fixing a unit that we can in part recall through memory and imagination. We could define a unit of suffering by indexing to any noxious event in an individual of any species, but the significance of that unit would remain mysterious to us insofar as we couldn’t imagine what it is like to experience it and couldn’t use our internal cognitive machinery to assess tradeoffs with other things we care about.

^{^}

This doesn’t mean that which theory of welfare is true doesn’t have a bearing on how good or bad our lives are. It may be that our lives would have been worse than they are if there was (counterfactually) a linear relationship between brain size and suffering. This is a strange counterfactual to entertain because I take that it is not possible that the proxies should be different than they are, at least conditional on certain plausible assumptions: the true nature of consciousness and the forces of evolution in our ancestors’ environment require the proxies to be roughly whatever they are. However, we don’t need to worry about this. For our purposes, it makes sense to just aggregate across the value of different theories considered as actual.

^{^}

Michael St. Jules makes a similar argument.

^{^}

We did include one set of proxies – our ‘higher / lower pleasures’ model – that we gave an explicitly normative explanation. Removing this model wouldn’t significantly change the results. Furthermore, the proxies relate to markers for intelligence that would also fit naturally with non-normative rationales.

^{^}

Carl Shulman and Brian Tomasik both suggested a view according to which the relevant facts underdetermine the factual question. On my understanding, they think that suffering is a normatively loaded concept, and so the question about which states count as suffering is itself normative. Given that the physical facts don’t force an answer, the precise delineation of suffering vs non-suffering is normative. This view seems like it makes more sense for the kind of uncertainty we have over insects than the kind of uncertainty we have over chickens; we can be reasonably confident that insects don’t share the robust natural kind of cognitive state that underlies our consciousness. In any case, another plausible response to factual underdetermination is to reflect that indeterminacy in welfare ranges. Such complexities were beyond the scope of the project as planned, which aimed to apply a concrete (albeit rough) methodology to generate precise moral weights.

^{^}

Thanks to Carl Shulman for making this point.

^{^}

Michael St. Jules discusses similar ideas. See also Constructivism about Intertheoretic Comparisons.

^{^}

This may suggest that there is much more value at stake according to expansive normative views. If we adopt a meta-normative principle in which we should favor expected choice-worthiness, this would give us reason to be maximalists. It isn’t obvious to me that that conclusion is wrong, but if we find it disagreeable we can also reject the maximization of expected choice-worthiness.

Show all footnotes

92 Reactions

More posts like this

Comments20

Sorted by

New & upvoted

Click to highlight new comments since: Today at 6:11 PM

titotalOct 15 202413

So in the two elephants problem, by pinning to humans are you affirming that switching from the 1 human EV to 1 elephant EV, when you are unsure about the HEV to EEV conversion, actually is the correct thing to do?

Like, option 1 is 0.25 HEV better than option 2, but option 2 is 0.25 EEV better than option 1, but you should pick option 1?

what if instead of an elephant, we were talking about a sentient alien? Wouldn't they respond to this with an objection like "hey, why are you picking the HEV as the basis, you human-centric chauvinist?"

Derek ShillerOct 16 20242

I think you should make the conversion because you know what human experience is like. You don't know what elephant or alien experience is like. Elephants or aliens may make different choices than you do, but they are responding to different evidence than you have, so that isn't that weird.

Lukas FinnvedenOct 16 202412

The alien will use the same reasoning and conclude that humans are more valuable (in expectation) than aliens. That's weird.

Different phrasing: Consider a point in time when someone hasn't yet received introspective evidence about what human or alien welfare is like, but they're soon about to. (Perhaps they are a human who has recently lost all their memories, and so don't remember what pain or pleasure or anything else of-value is like.) They face a two envelope problem about whether to benefit an alien, who they think is either twice as valuable as a human, equally valuable as a human, or half as valuable as a human. At this point they have no evidence about what either human or alien experience is like, so they ought to be indifferent between switching or not. So they could be convinced to switch to benefitting humans for a penny. Then they will go have experiences, and regardless of what they experience, if they then choose to "pin" the EV-calculation to their own experience, the EV of switching to benefitting non-humans will be positive. So they'll pay 2 pennies to switch back again. So they 100% predictably lost a penny. This is irrational.

Michael St Jules 🔸Oct 16 202414

One important difference is that we're never in this situation if and because we've already committed to human-based units, so there's no risk of such a money pump or such irrational behaviour.

And there's good reason for this. We have direct access to our own experiences, and understand, study and conceptualize consciousness, suffering, desires, preferences and other kinds of welfare in reference to our own and via conservative projections, e.g. assuming typical humans are similar to each other.

To be in the kind of position this thought experiment requires, I think you'd need to study and conceptualize welfare third-personally and fairly independently of human experiences, the only cases we have direct access to and the ones we're most confident in.

Probably no human has ever started conceptualizing consciousness and welfare in the third-person without first experiencing welfare-relevant states themselves. Luke Muehlhauser also illustrated how he understood animal pain in reference to his own in his report for Open Phil:

I sprain my ankle while playing soccer, don’t notice it for 5 seconds, and then feel a “rush of pain” suddenly “flood” my conscious experience, and I think “Gosh, well, whatever this is, I sure hope nothing like it happens to fish!” And then I reflect on what was happening prior to my conscious experience of the pain, and I think “But if that is all that happens when a fish is physically injured, then I’m not sure I care.” And so on.

It might be possible to conceptualize consciousness and welfare entirely third-personally, but it's not clear we'd even be talking about the same things anymore. That also seems to be throwing out or underusing important information: our direct impressions from our own experiences. That might be epistemically irrational.

I also discuss this thought experiment here, here (the section that immediately follows) and in the comments on that post with Owen.

FWIW, I could imagine an AI in the position of your thought experiment, though, and then it could use a moral parliament or some other approach to moral uncertainty that doesn't depend on common units or intertheoretic comparisons. But we humans are starting from somewhere else.

Also, notably, in chickens vs humans, say, a factory farmed chicken doesn't actually hold a human-favouring position, like the alien does. We could imagine a hypothetical rational moral agent with hedonic states and felt desires like a chicken, although their specific reasoned desires and preferences wouldn't be found in chickens. And this is also very weird.

titotalOct 17 202410

If we make reasoning about chickens that is correct, it should also be able to scale up to aliens without causing problems. If your framework doesn't work for aliens, that's an indication that something is wrong with it.

Chickens don't hold a human-favouring position because they are not hedonic utilitarians, and aren't intelligent enough to grasp the concept. But your framework explicitly does not weight the worth of beings by their intelligence, only their capacity to feel pain.

I think it's simply wrong to switch in the case of the human vs alien tradeoff, because of the inherent symmetry of the situation. And if it's wrong in that case, what is it about the elephant case that has changed?

Derek ShillerOct 16 20248

The alien will use the same reasoning and conclude that humans are more valuable (in expectation) than aliens. That's weird.

Granted, it is a bit weird.

At this point they have no evidence about what either human or alien experience is like, so they ought to be indifferent between switching or not. So they could be convinced to switch to benefitting humans for a penny. Then they will go have experiences, and regardless of what they experience, if they then choose to "pin" the EV-calculation to their own experience, the EV of switching to benefitting non-humans will be positive. So they'll pay 2 pennies to switch back again. So they 100% predictably lost a penny. This is irrational.

I think it is helpful to work this argument out within a Bayesian framework. Doing so will require thinking in some ways that I'm not completely comfortable with (e.g. having a prior over how much pain hurts for humans), but I think formal regimentation reveals aspects of the situation that make the conclusion easier to swallow.

In order to represent yourself as learning how good human experiences are and incorporating that information into your evidence, you will need to assign priors that allow for each possible value human experiences might have. You will also need to have priors for each possible value alien experiences might have. To make your predictable loss argument go through, you will still need to treat alien experiences as either half as good or twice as good with equal probabilities no matter how good human experiences turn out to be. (Otherwise, your predictable loss argument needs to account for what the particular experience you feel tells you about the probabilities that the alien's experiences are higher or lower, this can give you evidence that contradicts the assumption that the alien's value is equally likely to be half or twice.) This isn't straightforwardly easy. If you think that human experience might be either worth N or N/2 and you think alien experience might be either N/2 or N, then learning that human experience is N will tell you that the alien experience is worth N/2.

There are a few ways to set up the priors to get the conclusion that you should favor the alien after learning how good human experience is (no matter how good that is). One way is to assume off the bat that aliens are likely to have a higher probability of higher experiential values. Suppose, to simplify things a bit, you thought that the highest value of experience an human could have is N. (More realistically, the values should trail off with ever lower probabilities, but the basic point I'm making would still go through -- alien's possible experience values couldn't decline at the same rate as humans without violating the equal probability constraint.) Then, to allow that you could still infer that alien experience is as likely to be twice as good as any value you could discover, the highest value an alien could have would have to be 2*N. It makes sense given these priors that you should give preference to the alien even before learning how good your experiences are: your priors are asymmetric and favor them.

Alternatively, we can make the logic work by assigning a 0 probability to every possible value of human experience (and a 0 to every possible value of alien experience.) This allows that you could discover that human experience had any level of value, and, conditional on however good that was, the alien was likely to have half or twice as good experiences. However, this prior means that in learning what human experience is like, you will learn something to which you previously assigned a probability of 0. Learning propositions to which you assigned a 0 is notoriously problematic and will lead to predictable losses if you try to maximize expected utility for reasons completely separate from the two envelopes problem.

titotalOct 17 20245

I think switching has to be wrong, for symmetry based reasons.

Let's imagine you and a friend fly out on a spaceship, and run into an alien spaceship from an another civilisation that seems roughly as advanced as you. You and your buddy have just met the alien and their buddy but haven't learnt each others languages, when an accident occurs: your buddy and their buddy go flying off in different directions and you collectively can only save one of them. The human is slightly closer and a rescue attempt is slightly more likely to be successful as a result: based solely on hedonic utilitarianism, do you save the alien instead?

We'll make it even easier and say that our moral worth is strictly proportional to number of neurons in the brain, which is an actual, physical quantity.

I can imagine being an EA-style reasoner, and reasoning as follows: obviously I should anchor that the alien and humans have equal neuron counts, at level N. But obviously there's a lot of uncertainty here. Let's approximate a lognormal style system and say theres a 50% chance the alien is also level N, a 25% chance they have N/10 neurons, and a 25% chance they have 10N neurons. So the expected number of neurons in the alien is 0.25*N/10 + 0.5*N + 0.25*(10N) = 3.025N. Therefore, the alien is worth 3 times as much a human in expectation, so we should obviously save it over the human.

Meanwhile, by pure happenstance, the alien is also a hedonic EA-style reasoner with the same assumptions, with neuron count P. They also do the calculation, and reason that the human is worth 3.025P, so we should save the human.

Clearly, this reasoning is wrong. The cases of the alien and human are entirely symmetric: both should realise this and rate each other equally, and just save whoevers closer.

If your reasoning gives the wrong answer when you scale it up to aliens, it's probably also giving the wrong answer for chickens and elephants.

Derek ShillerOct 17 20246

Clearly, this reasoning is wrong. The cases of the alien and human are entirely symmetric: both should realise this and rate each other equally, and just save whoevers closer.

I don’t think it is clearly wrong. You each have separate introspective evidence and you don’t know what the other’s evidence is, so I don’t think you should take each other as being in the same evidential position (I think this is the gist of Michael St. Jules’ comment). Perhaps you think that if they do have 10N neurons, then the depth and quality of their internal experiences, combined with whatever caused you to assign that possibility a 25% chance, should lead them to assign that hypothesis a higher probability. You need not think that they are responding correctly to their introspective evidence just because they came to a symmetric conclusion. Maybe the fact that they came to a symmetric conclusion is good evidence that you actually have the same neuron count.

Your proposal of treating them equally is also super weird. Suppose that I offer you a bet with a 25% chance of a payout of $0.1, a 50% chance of $1, and a 25% chance of $10. It costs $1. Do you accept? Now I say, I will make the payout (in dollars) dependent on whether humans or aliens have more neurons. Your credences haven’t changed. Do you change your mind about the attractiveness of this monetary bet? What if I raise the costs and payout to amounts of money on the scale of a human life? What if I make the payout be constituted by saving one random alien life and the cost be the amount of money equal to a human life? What if the costs and payouts are alien and human lives? If you want to say that you should think the human and alien life are equally valuable in expectation, despite the ground facts about probabilities of neuron counts and assumed valuation schema, you’re going to have to say something uncomfortable at some point about when your expected values come apart from probabilities of utilities.

titotalOct 18 20242

This seems to me like an attempt to run away from the premise of the thought experiment. I'm seeing lot's of "maybes" and "mights" here, but we can just explain them away with more stipulations: You've only seen the outside of their ship, you're both wearing spacesuits that you can't see into, you've done studies and found that neuron count and moral reasoning skills are mostly uncorrelated, and that spacefilight can be done with more or less neurons, etc.

None of these avert the main problem: The reasoning really is symmetrical, so both perspectives should be valid. The EV of saving the alien is 2N, where N is the human number of neurons, and the EV of saving the human from the alien perspective is 2P, where P is the is alien number of neurons. There is no way to declare one perspective the winner over the other, without knowing both N and P. Remember in the original two envelopes problem, you knew both the units, and the numerical value in your own envelope: this was not enough to avert the paradox.

See, the thing that's confusing me here is that there are many solutions to the two envelope problem, but none of them say "switching actually is good". They are all about how to explain why the EV reasoning is wrong and switching is actually bad. So in any EV problem which can be reduced to the two envelope problem, you shouldn't switch. I don't think this is confined to alien vs human things either: perhaps any situation where you are unsure about a conversion ratio might run into two envelopy problems, but I'll have to think about it.

Derek ShillerOct 18 20244

See, the thing that's confusing me here is that there are many solutions to the two envelope problem, but none of them say "switching actually is good".

What I've been suggesting is that when looking inside the envelope, it might subsequently make sense to switch depending upon what you see: when assessing human/alien tradeoffs, it might make sense to prefer helping the aliens depending on what it is like to be human. (It follows that it could have turned out that it didn't make sense to switch given certain human experiences -- I take this to play out in the moral weights context with the assumption that given certain counterfactual qualities of human experience, we might have preferred different schemes relating the behavioral/neurological indicators to the levels of welfare.)

This is not at all a rare view among academic discussions, particularly given the assumption that your prior probabilities should not be equally distributed over an infinite number of possibilities about what each of your experiences will be like (which would be absurd in the human/alien case).

Michael St Jules 🔸Oct 17 20246

The humans and aliens have (at least slightly) different concepts for the things they're valuing, each being informed by and partly based on their own direct experiences, which differ. So they can disagree on the basis of caring about different things and having different views.

This is like one being hedonistic utilitarian and the other being preference utilitarian. They're placing value on different concepts. It's not problematic for them to disagree.

Lukas FinnvedenOct 19 20242

I agree that having a prior and doing a bayesian update makes the problem go away. But if that's your approach, you need to have a prior and do a bayesian update — or at least do some informal reasoning about where you think that would lead you. I've never seen anyone do this. (E.g. I don't think this appeared in the top-level post?)

E.g.: Given this approach, I would've expected some section that encouraged the reader to reflect on their prior over how (dis)valuable conscious experience could be, and asked them to compare that with their own conscious experience. And if they were positively surprised by their own conscious experience (which they ought to have a 50% chance of being, with a calibrated prior) — then they should treat that as crucial evidence that humans are relatively more important compared to animals. And maybe some reflection on what the author finds when they try this experiment.

I've never seen anyone attempt this. My explanation for why is that this doesn't really make any sense. Similar to Tomasik, I think questions about "how much to value humans vs. animals having various experiences" comes down to questions of values & ethics, and I don't think that these have common units that it makes sense to have a prior over.

Michael St Jules 🔸Oct 17 20242

Then they will go have experiences, and regardless of what they experience, if they then choose to "pin" the EV-calculation to their own experience, the EV of switching to benefitting non-humans will be positive. So they'll pay 2 pennies to switch back again. So they 100% predictably lost a penny. This is irrational.

You're assuming they will definitely have a human experience (e.g. because they are human) and so switch to benefitting non-humans. If you're assuming that, but not allowing them to assume that themselves, then they're being exploited through asymmetric information or their priors not matching the situation at hand, not necessarily irrationality.

If they assume they're human, then they can pin to what they'd expect to experience and believe as a human (even if they haven't experienced it yet themself), and then they'd just prioritize non-humans from the start and never switch.

But you can instead assume it's actually 50-50 whether you end up as a human or an alien, and you have these two options:

at an extra cost of 1 penny, get the human experience, or get the alien experience, 50% probability each, pin to it, and help the other beings.
at no extra cost, flip a coin, with heads for helping humans and tails for helping aliens, and then commit to following through on that, regardless of whether you end up having human experience or alien experience.

I think there's a question of which is actually better. Does 2 stochastically dominate 1? You find something out in 1, and then help the beings you will come to believe it's best to help (although this doesn't seem like a proper Bayesian update from a prior). In 2, if you end up pinning to your own experience, you'll regret prioritizing humans if your experience is human, and you'll regret prioritizing aliens if your experience is alien.