quila

63 karma · Joined · www.lesswrong.com/users/quila

Bio

autistic, thinking about alignment foundations, suffering-focused-altruist

Comments
39

summary of my comment: in anthropics problems where {what your sense of 'probability' yields} is unclear, instead of focusing on 'what the probability is', focus on fully comprehending the situation and then derive what action you prefer to take.[1]

 

imv this is making a common mistake ("conflating logical and indexical uncertainty"). here's what i think the correct reasoning is. it can be done without ever updating our probability about what the rate is.

i write two versions of this for two cases: 

  • case 1, where something like many-worlds is true, in which case as long as the rate of omnideath events is below ~100%, there will always be observers like you.

  • case 2, where there is only one small world, in which case if the rate of an omnideath event occurring at least once were high there might be 'no observers' or 'no instantiated copies of you'.

i try to show these are actually symmetrical.

finally i show symmetry to a case 3, where our priors are 50/50 between two worlds, in one of which we certainly (rather than probably) would not exist; we do not need to update our priors (in response to our existence), even in that case, to choose what to do.

     

case 1: where something like many-worlds is true.

  1. we're uncertain about rate of omnideath events.
    • simplifying premise: the rate is discretely either high or very low, and we start off with 50% credence on each
  2. we're instantiated ≥1 times in either case, so our existence cannot rule either case out, even probabilistically.
  3. we could act as if the rate is low, or act as if it is high. these have different ramifications:
    • if we act as if the rate is low, this has expected value equal to: prior(low rate) × {value-of-action per world, conditional on low rate} × {amount of worlds we're in, conditional on low rate}
    • if we act as if the rate is high, this has expected value equal to: prior(high rate) × {value-of-action per world, conditional on high rate} × {amount of worlds we're in, conditional on high rate}
  4. (if we assume similar conditional 'value-of-action per world' in each), then acting as if the rate is low has higher EV (on those 50/50 priors), because the amount of worlds we're in is much higher if the rate of omnideath events is low. (a small numerical sketch of this follows the list.)
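
(minimal numerical sketch of steps 3-4, with made-up numbers for the amount of worlds we're in under each rate; note that no update of the 50/50 prior happens anywhere:)

    # EV sketch for case 1 (many-worlds); the world counts are hypothetical
    prior_low, prior_high = 0.5, 0.5           # 50/50 priors over the rate
    value_per_world = 1.0                      # assume similar value-of-action per world
    worlds_if_low, worlds_if_high = 1000, 10   # made up: far more surviving worlds if the rate is low

    ev_act_as_if_low  = prior_low  * value_per_world * worlds_if_low   # 500.0
    ev_act_as_if_high = prior_high * value_per_world * worlds_if_high  # 5.0
    print(ev_act_as_if_low, ev_act_as_if_high)  # acting as if the rate is low has higher EV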

note that there is no step here where the probability is updated away from 50/50.

probability is the name we give to a variable in an algorithm, and this above algorithm goes through all the relevant steps without using that term. to focus on 'what the probability becomes' is just a question of definitions: for instance, if defined as {more copies of me in a conditional = (by definition) 'higher probability' assigned to that conditional}, then that sense of probability would 'assign more to a low rate' (a restatement of the end of step 4).

 

onto case 2: single world

(i expect the symmetry to be unintuitive. i imagine a reader having had a thought process like this:

"if i imagine a counterfactual where the likelihood of omnideath events per some unit of time is high, then because [in this case 2] there is only one world, it could be the case that my conditionals look like this: 'there is one copy of me if the rate is low, but none at all if the rate is high'. that would be probabilistic evidence against the possibility it's high, right? because i'm conditioning on my existence, which i know for sure occurs at least once already." (and i see this indicated in another commenter's, "I also went on quite a long digression trying to figure out if it was possible to rescue Anthropic Shadow by appealing to the fact that there might be large numbers of other worlds containing life")

i don't know if i can make the symmetry intuitive, so i may have to abandon some readers here.)

  1. prior: uncertain about rate of omnideath events.
    • (same simplifying premise)
  2. i'm instantiated exactly once.
    • i'm statistically more probable to be instantiated exactly once if the rate is low.
      • though, a world with a low rate of omnideath events could still have one happen, in which case i wouldn't exist conditional on that world.
    • i'm statistically more probable to be instantiated exactly zero times if the rate is high.
      • though, a world with a high rate of omnideath events could still have one not happen, in which case i would exist conditional on that world.
  3. (again then going right to the EV calculation):

    we could act as if the rate is low, or act as if it is high. these have different ramifications:

    • if we act as if the rate is low, this has expected value equal to: prior(low rate) × {value-of-action in the sole world, conditional on low rate} × {statistical probability we are alive, conditional on low rate}
    • if we act as if the rate is high, this has expected value equal to: prior(high rate) × {value-of-action in the sole world, conditional on high rate} × {statistical probability we are alive, conditional on high rate}

      (these probabilities are conditional on either rate already being true, i.e p(alive|some-rate), but we're not updating the probabilities of either rate themselves. this would violate bayes' formula if we conditioned on 'alive', but we're intentionally avoiding that here.

      in a sense we're avoiding it to rescue the symmetry of these three cases. i suspect {not conditioning on one's own existence, rather preferring to act as if one does exist per EV calc} could help with other anthropics problems too.)

  4. (if we assume similar conditional 'value-of-action' in each sole world), then acting as if the rate is low has higher EV (on those 50/50 priors), because 'statistical probability we are alive' is much higher if the rate of omnideath events is low. (the same numerical sketch carries over; see below.)
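
(the case-1 sketch carries over with p(alive | rate) in place of the amount of worlds; again the numbers are made up, and again the prior is never updated:)

    # EV sketch for case 2 (single world); identical structure to case 1
    prior_low, prior_high = 0.5, 0.5               # 50/50 priors over the rate
    value_in_sole_world = 1.0                      # assume similar value-of-action either way
    p_alive_if_low, p_alive_if_high = 0.99, 0.01   # made up: p(alive | rate), never conditioned on

    ev_act_as_if_low  = prior_low  * value_in_sole_world * p_alive_if_low    # 0.495
    ev_act_as_if_high = prior_high * value_in_sole_world * p_alive_if_high   # 0.005
    print(ev_act_as_if_low, ev_act_as_if_high)  # again, acting as if the rate is low has higher EV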

(i notice i could use the same sentences with only a few changes from case 1, so maybe this symmetry will be intuitive)

(to be clear, in reality our prior might not be 50/50 (/fully uncertain); e.g., maybe we're looking at the equations of physics and we see that it looks like there should be chain reactions which destroy the universe happening frequently, but which are never 100% likely; or maybe we instead see that they suggest world-destroying chain reactions are impossible.)

main takeaway: in anthropics problems where {what your sense of 'probability' yields} is unclear, instead of focusing on 'what the probability is', focus on fully comprehending the situation and then derive what action you prefer to take.

 

(case 3) i'll close with a further, more fundamental-feeling insight: this symmetry holds even when we consider a rate of 100% rather than just 'high'. as in, you can run that through this same structure and it will give you the correct output.

the human worded version of that looks like, "i am a fixed mathematical function. if there are no copies of me in the world[2], then the output of this very function is irrelevant to what happens in the world. if there are copies of me in the world, then it is relevant / directly corresponds to what effects those copies have. therefore, because i am a mathematical function which cares about what happens in worlds (as opposed to what the very function itself does), i will choose my outputs as if there are copies of me in the world, even though there is a 50% chance that there are none."

  1. ^

    (as in the 'sleeping beauty problem', once you have all the relevant variables, you can run simulations / predict what choice you'd prefer to take (such as if offered to bet that the day is tuesday) without ever 'assigning a probability') 

  2. ^

    (note: 'the world' is underdefined here; 'the real world' is not a fundamental mathematical entity that can be pointed to in any formal systems we know of, though we can fix this by having the paragraph, instead of referring to 'the world', refer to some specific function which the one in question is uncertain whether they themselves are contained in.

    we seem to conceptualize the 'world' we are in as some particular 'real world' and not just some abstract mathematical function of infinitely many. i encourage thinking more about this, though it's unrelated to the main topic of this comment.)

That is interesting.

I think that most beings[1] who I've become very close with (I estimate this reference class is ≤10) have gone vegan in a way that feels like it was because of our discussions, though it's possible they would have eventually otherwise, or that our discussions were just the last straw.

In retrospect, I wonder if I had first emanated a framing of taking morality seriously (such that "X is bad" -> "I will {not do X} or try to stop X"). I think I also tended to become that close with beings who do already, and who are more intelligent/willing to reflect than what seems normal.

  1. ^

    (I write 'being' because some are otherkin)

already in the thread we've got examples of people considering whether murdering someone who eats meat isn't immoral

If the question were about humans who cause an equivalent amount of harm to other humans, I would not expect to see objection to the question merely being asked or considered. When humans are at risk, this question is asked even when the price is killing (a lower number of) humans who are not causing the harm. It is true that present human culture applies such a double standard to humans versus to members of other species, but this is not morally relevant and should not influence what moral questions one allows themself to consider (though it still does, empirically. This is relevant to a principle introduced below).

I think that this question is both intuitive to ask and would be important in a neartermist frame given the animal lives at stake. It has also been discussed in at least one published philosophy paper.[1] That paper concludes (on this question) that in the current world, it is a much less effective way of reducing animal torture compared to other ways, and so shouldn't be done in order to avoid ending up arrested and unable to help animals in far more effective ways, but that it would likely reduce more suffering than it causes.[2] That is my belief too, by which I mean that seems to be the way the world is, not that I like that the world is this way. (This is a core rationalist principle, which I believe is also violated by other of your points in the 'pattern matches to dangerous beliefs' section.)

The Litany of Tarski is a template to remind oneself that beliefs should stem from reality, from what actually is, as opposed to what we want, or what would be convenient. For any statement X, the litany takes the form "If X, I desire to believe that X".

 

I think there are other instances of different standards being applied to how we treat extreme harm of humans versus extreme harm of members of other species throughout your comment.[3]

For example, I think that if one believes factory farming is a moral catastrophe (as you do), and if discomfort originated from that one's morality alone, then the use of 'meat' over 'animals' or 'animal bodies' would cause more discomfort than the use of 'meat eater' over 'meat eating'.

That's not to say I think 'eating' isn't better than 'eater', or, more generally, that language should keep words that refer to a being by one of their malleable behaviors or attitudes. I might favor such a general linguistic change.

However, if this were a discussion of great ongoing harm being caused to humans, such as through abuse or murder, I would not expect to find comments objecting to referring to the humans causing that harm as 'abusers' or 'murderers' on the basis that they might stop in the future.[4] (I'm solely commenting on the perceived double standard here.)

 

There are other examples (in section three), but I can (in general) only find words matching my thoughts very slowly (this took me almost two hours to write and revise) so I'm choosing to stop here.

  1. ^

    https://journalofcontroversialideas.org/article/2/2/206

    In the Journal of Controversial Ideas, co-founded by Peter Singer. (wikipedia)

  2. ^

    I think this also shows that this question is importantly two questions:

    1. Is it right to kill someone who would otherwise continually cause animals to be harmed and killed, in isolation, i.e. in a hypothetical thought-experiment-world where there's no better way to stop this, and doing so will not prevent you from preventing greater amounts of harm?
      • In this case, 'yes' feels like the obvious answer to me.
      • I also think it would feel like an obvious answer for most people if present biases towards members of other species were removed, for most would say 'yes' to the version of this question about a human creating and killing humans.
    2. Is it right to kill someone who would otherwise continually cause animals to be harmed and killed, in the current world, where this will lead to you being imprisoned?
      • In this case, 'no' feels like the obvious answer to me, because you could do more good just by causing two humans to go vegan for life, and even more good by following EA principles.
  3. ^

    (To preclude certain objections: These are different standards which would not be justified merely by members of a given species experiencing less suffering from 0-2 years of psychological desperation and physical torture than humans would from that same situation).

  4. ^

    (Relatedly, after reading your comment, one thing I tried was to read it again with reference to {people eating animals} mentally replaced with reference to {people enacting moral catastrophes that are now widely opposed}, to isolate the 'currently still supported' variable, to see if anything in my perception or your comment would be unexpected if that variable were different, despite it not being a morally relevant variable. This, I think, is a good technique for avoiding/noticing bias.)

This is so interesting to me.

I introduced this topic and wrote more about it in this shortform. I wanted to give the topic its own thread and see if others might have responses.

I don't want to be screwed by tail outcomes. I want to hedge against them.

I do this too, even despite the world's size making my choices mostly only affect value on the linear parts of my value function! Because tail outcomes are often large. (Maybe I mean something like: kelly-betting/risk-aversion is often useful for fulfilling instrumental subgoals too).

(Edit: and I think 'correctly accounting for tail outcomes' is just the correct way to deal with them).

saying you have a concave utility function means you genuinely place lower value on additional lives given the presence of many lives

Yes, though it's not because additional lives are less intrinsically valuable, but because I have other values which are non-quantitative (narrative) and almost maxxed out way before there are very large numbers of lives.

A different way to say it would be that I value multiple things, but many of them don't scale indefinitely with lives, so the overall function goes up faster at the start of the lives graph.

Are your values about the world, or the effects of your actions on the world?

An agent who values the world will want to affect the world, of course. These have no difference in effect if they're both linear, but if they're concave...

Then there is a difference.[1]

If an agent has a concave value function which they use to pick each individual action: √L, where L is the amount of lives saved by the action, then that agent would prefer a 90% chance of saving 1 life (for √1 × .9 = .9 utility), over a 50% chance of saving 3 lives (for √3 × .5 = .87 utility). The agent would have this preference each time they were offered the choice.

This would be odd to me, partly because it would imply that if they were presented this choice enough times, they would appear to overall prefer an x% chance of saving n lives to an x% chance of saving >n lives. (Or rather, the probability-distribution version of that statement instead of the discrete version.)

For example, after taking the first option 10 times, the probability distribution over the amount of lives saved looks like this (on the left side). If they had instead taken the second option 10 times, it would look like this (right side)

(Note: Claude 3.5 Sonnet wrote the code to display this and to calculate the expected utility, so I'm not certain it's correct. Calculation output and code in footnote[2])

Now if we prompted the agent to choose between each of these probability distributions, they would assign an average utility of 3.00 to the one on the left, and 3.82 to the one on the right, which from the outside looks like contradicting their earlier sequence of choices.[3]

We can generalize this beyond this example to say that, in situations like this, the agent's best action is to precommit to take the second option repeatedly.[4]

We can also generalize further and say that for an agent with a concave function used to pick individual actions, the initial action which scores the highest would be to self-modify into (or commit to taking the actions of) an agent with a concave utility function over the contents of the world proper.[5]

I wrote this after having a discussion (starts ~here at the second quote) with someone who seemed to endorse following concave utility functions over the possible effects of individual actions.[6] I think they were drawn to this as a formalization of 'risk aversion', though, so I'd guess that if they find the content of this text true, they'd want to continue acting in a risk-averse-feeling way, but may search for a different formalization.

My motive for writing this though was mostly intrigue. I wasn't expecting someone to have a value function like that, and I wanted to see if others would too. I wondered if I might have just been mind-projecting this whole time, and if actually this might be common in others, and if that might help explain certain kinds of 'risk averse' behavior that I would consider suboptimal at fulfilling one's actual values[7] (this is discussed more extensively in my linked comment).

  1. ^

    Image from 'All About Concave and Convex Agents'.

    For discussion of the actual values of some humans, I recommend 'Value Theory'

  2. ^
    Calculation for Option 1:
    Lives  Probability  Utility Prob * Utility
    ------------------------------------------
        0       0.0000   0.0000         0.0000
        1       0.0000   1.0000         0.0000
        2       0.0000   1.4142         0.0000
        3       0.0000   1.7321         0.0000
        4       0.0001   2.0000         0.0003
        5       0.0015   2.2361         0.0033
        6       0.0112   2.4495         0.0273
        7       0.0574   2.6458         0.1519
        8       0.1937   2.8284         0.5479
        9       0.3874   3.0000         1.1623
       10       0.3487   3.1623         1.1026
    ------------------------------------------
        Total expected utility:         2.9956
    
    Calculation for Option 2:
    Lives  Probability  Utility Prob * Utility
    ------------------------------------------
        0       0.0010   0.0000         0.0000
        3       0.0098   1.7321         0.0169
        6       0.0439   2.4495         0.1076
        9       0.1172   3.0000         0.3516
       12       0.2051   3.4641         0.7104
       15       0.2461   3.8730         0.9531
       18       0.2051   4.2426         0.8701
       21       0.1172   4.5826         0.5370
       24       0.0439   4.8990         0.2153
       27       0.0098   5.1962         0.0507
       30       0.0010   5.4772         0.0053
    ------------------------------------------
        Total expected utility:         3.8181

    code: 

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import binom
    
    def calculate_utility(lives_saved):
        return np.sqrt(lives_saved)
    
    def plot_distribution(prob_success, lives_saved, n_choices, option_name):
        x = np.arange(n_choices + 1) * lives_saved
        y = binom.pmf(np.arange(n_choices + 1), n_choices, prob_success)
        
        bars = plt.bar(x, y, alpha=0.8, label=option_name)
        plt.xlabel('Number of lives saved')
        plt.ylabel('Probability')
        plt.title(f'Probability Distribution for {option_name}')
        plt.xticks(x)
        plt.legend()
        
        for bar in bars:
            height = bar.get_height()
            plt.text(bar.get_x() + bar.get_width()/2., height/2,
                     f'{height:.2%}',  # Changed to percentage format
                     ha='center', va='center', rotation=90, color='white')
    
    def calculate_and_print_details(prob_success, lives_saved, n_choices, option_name):
        x = np.arange(n_choices + 1) * lives_saved
        p = binom.pmf(np.arange(n_choices + 1), n_choices, prob_success)
        
        print(f"\nDetailed calculation for {option_name}:")
        print(f"{'Lives':>5} {'Probability':>12} {'Utility':>8} {'Prob * Utility':>14}")
        print("-" * 42)
        
        total_utility = 0
        for lives, prob in zip(x, p):
            utility = calculate_utility(lives)
            weighted_utility = prob * utility
            total_utility += weighted_utility
            print(f"{lives:5d} {prob:12.4f} {utility:8.4f} {weighted_utility:14.4f}")
        
        print("-" * 42)
        print(f"{'Total expected utility:':>27} {total_utility:14.4f}")
        
        return total_utility
    
    # Parameters
    n_choices = 10
    prob_1, lives_1 = 0.9, 1
    prob_2, lives_2 = 0.5, 3
    
    # Calculate and print details
    print("Calculation for Option 1:")
    eu_1 = calculate_and_print_details(prob_1, lives_1, n_choices, "Option 1")
    print("\nCalculation for Option 2:")
    eu_2 = calculate_and_print_details(prob_2, lives_2, n_choices, "Option 2")
    
    # Plot distributions
    plt.figure(figsize=(15, 6))
    
    plt.subplot(1, 2, 1)
    plot_distribution(prob_1, lives_1, n_choices, "Option 1 (90% chance of 1 life)")
    plt.subplot(1, 2, 2)
    plot_distribution(prob_2, lives_2, n_choices, "Option 2 (50% chance of 3 lives)")
    
    plt.tight_layout()
    plt.show()
    
    print(f"\nFinal Results:")
    print(f"Expected utility for Option 1: {eu_1:.4f}")
    print(f"Expected utility for Option 2: {eu_2:.4f}")
  3. ^

    (of course, if it happened it wouldn't really be a contradiction, it would just be a program being run according to what it says)

  4. ^

    (Though, if one accepts that, I have a nascent intuition that the same logic forces one to accept what I was writing about Kelly betting in the discussion this came from.)

  5. ^

    Recall that actions are picked only individually, not according to the utility the current function would assign to future choices made under the new utility function.

    (That would instead have its own exploits, namely looping between many small positive actions and one big negative 'undoing' action whose negative utility is square-rooted)

  6. ^

    (I initially thought they meant over the total effects of all their actions throughout their past and future, rather than per action.)

  7. ^

    I'll claim that if one doesn't reflectively endorse optimally fulfilling some values, then those are not their actual values, but maybe are a simplified version of them.

Could you describe your intuitions? 'valuing {amount of good lives saved by one's own effect} rather than {amount of good lives per se}' is really unintuitive to me.

The parenthetical isn't why it's unexpected, but clarifying how it's actually different.

As an attempt at building intuition for why it matters, consider if an agent applied the 'square root of lives saved by me' function newly to each action instead of keeping track of how many lives they've saved over their existence. Then this agent would gain more utility by taking four separate actions, each of which certainly saves 1 life (for 1 utility each), than from one lone action that certainly saves 15 lives (for √15 ≈ 3.87 utility). Then generalize this example to the case where they do keep track, and progress just 'resets' for new clones of them. Or the real-world case where there's multiple agents with similar values.
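
(A tiny check of that arithmetic, under the per-action √-of-lives-saved function:)

    # per-action sqrt utility: four certain 1-life saves vs one certain 15-life save
    import math
    four_small_saves = 4 * math.sqrt(1)    # 4.0
    one_big_save = math.sqrt(15)           # ~3.873
    print(four_small_saves, one_big_save)  # the per-action scorer prefers the four small saves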

Why does the size of the world net of your decision determine the optimal decision?

I describe this starting from 6 paragraphs up in my edited long comment. I'm not sure if you read it pre- or post-edit.

It sounds like we agree about what risk aversion is! The term I use that includes your example of valuing the square root of lives saved is a 'concave utility function'. I have one of these, sort of; it goes up quickly for the first x lives (I'm not sure how large x is exactly), then becomes more linear.

But it's unexpected to me for other EAs to value {amount of good lives saved by one's own effect} rather than {amount of good lives per se}. I tried to indicate in my comment that I think this might be the crux, given the size of the world.

(In your example of valuing the square root of lives saved (or lives per se), if there's 1,000 good lives already, then preventing 16 deaths has a utility of 4 under the former, and √1016 − √1000 ≈ 0.25 under the latter; and preventing 64 is twice as valuable under the former, but ~4x as valuable under the latter)
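
(A quick computation of those numbers, assuming the square-root forms as stated:)

    # 'sqrt of lives saved by me' (former) vs 'sqrt of total good lives' (latter),
    # in a world that already has 1,000 good lives
    import math
    existing = 1000
    for saved in (16, 64):
        former = math.sqrt(saved)
        latter = math.sqrt(existing + saved) - math.sqrt(existing)
        print(saved, round(former, 2), round(latter, 3))
    # 16 lives: 4.0 vs ~0.252;  64 lives: 8.0 (2x) vs ~0.996 (~4x)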

Your impact matrix places all its weight on the view that animals have a high enough moral value that donating to humans is net negative

If by weight you meant probability, then placing 100% of that in anything is not implied by a discrete matrix, which must use expected values (i.e. the average of {probability × impact conditional on probability}). One could mentally replace each number with a range for which the original number is the average.

(It is the case that my comment premises a certain weighting, and humans should not update on implied premises, except in case of beliefs about what may be good to investigate, to avoid outside-view cascades.)

If you have a lot of uncertainty and you are risk averse

I think beliefs about risk-aversion are probably where the crux between us is. 

Uncertainty alone does not imply one should act in proportion to their probabilities.[1]

I don't know what is meant by 'risk averse' in this context. More precisely, I claim risk aversion must either (i) follow instrumentally from one's values, or (ii) not be the most good option under one's own values.[2]

  • Example of (i), where acting in a way that looks risk-averse is instrumental to fulfilling ones actual values: The Kelly criterion.

    In a simple positive-EV bet, like at 1:2-odds on a fair coinflip, if one continually bets all of their resources, the probability they eventually lose everything approaches 1 as all their gains are concentrated into an unlikely series of events, resulting in many possible worlds where they have nothing and one where they have a huge amount of resources. The average resources had across all possible worlds is highest in this case.

    Under my values, that set of outcomes is actually much worse than available alternatives (due to diminishing value of additional resources in a single possible world). To avoid that, we can apply something called the Kelly criterion, or in general bet with sums that are substantially smaller than the full amount of currently had resources. (A rough simulation sketch is included at the end of this example.)

    This lets us choose the distribution of resources over possible worlds that our values want to result from resource-positive-EV bets; we can accept a lower average for a more even distribution.

    Similarly, if presented with a series of positive-EV bets about things you find morally valuable in themselves, I claim that if you Kelly bet in that situation, it is actually because your values are more complex than {linearly valuing those things} alone.

    As an example, I would prefer {a 90% chance of saving 500 good lives} to {a certainty of saving 400} in a world that already had many lives, but if those 500 lives were all the lives that exist, I would switch to preferring the latter - a certainty of only 100 of the 500 dying - even if the resulting quantities then became the eternal maximum (no creation of new minds possible, so we can't say it actually results in a higher expected amount).

    This is because I have other values that require just some amount of lives to be satisfied, including vaguely 'the unfolding of a story', and 'the light of life/curiosity/intelligence continuing to make progress in understanding metaphysics until no more is possible'.

    Another way to say this would be to say that our values are effectively concave over the thing in question, and we're distributing them across possible futures.

    This is importantly not what we do when we make a choice in an already large world, and we're not affecting all of it - then we're choosing between, e.g., {90%: 1,000,500, 10%: 1,000,000} and {100%: 1,000,400}. (And notably, we are in a very large world, even beyond Earth.)

    At least my own values are over worlds per se, rather than the local effects of my actions per se. Maybe the framing of the latter leads to mistaken Kelly-like tradeoffs[3], and acting as if one assigns value to the fact of being net-positive itself.

    (I expanded on this section about Kelly in a footnote at first, then had it replace example (i) in the main post. I think it might make the underlying principle clear enough to make example (ii) unnecessary, so I've moved (ii) to a footnote instead.)[4]
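
    (The rough Monte Carlo sketch of the Kelly point above; the bet parameters are assumed: a fair coin that pays 2x the stake on a win, for which the Kelly fraction is 0.25. This is only an illustration, not a claim about any particular real bet.)

        # all-in betting vs Kelly-fraction betting on a fair coin paying 2x the stake on a win
        import random
        random.seed(0)
        ROUNDS, TRIALS = 10, 100_000
        B, P = 2.0, 0.5              # payout multiple and win probability (assumed)
        KELLY = P - (1 - P) / B      # = 0.25 for these odds

        def simulate(bet_fraction):
            finals = []
            for _ in range(TRIALS):
                wealth = 1.0
                for _ in range(ROUNDS):
                    stake = wealth * bet_fraction
                    wealth += stake * B if random.random() < P else -stake
                finals.append(wealth)
            finals.sort()
            return sum(finals) / TRIALS, finals[TRIALS // 2]  # (mean, median) final wealth

        print("all-in:", simulate(1.0))    # highest mean, but the median run ends with nothing
        print("kelly :", simulate(KELLY))  # lower mean, but the typical (median) run grows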

  1. ^

    There are two relevant posts from Yudkowsky's sequences that come to mind here. I could only find one of them, 'Circular Altruism'. The other was about a study wherein people bet on multiple outcomes at once in proportion to the probability of each outcome, rather than placing their full bet on the most probable outcome, in a simple scenario where the latter was incentivized.

  2. ^

    (Not including edge-cases where an agent values being risk-averse)

  3. ^

    It just struck me that some technical term should be used instead of 'risk aversion' here, because the latter in everyday language includes things like taking a moment to check if you forgot anything before leaving home.

  4. ^

    Example of (ii), where I seem to act risk-unaverse

    I'm offered the option to press a dubious button. This example ended up very long, because there is more implied uncertainty than just the innate chances of the button being of either kind, but maybe the extra detail will help show what I mean / be more surface for a cruxy disagreement to be exposed.

    I think (66%) it's a magic artifact my friends have been looking for, in which case it {saves 1 vegan[5] who would have died} when pressed. But I'm not sure; it might also be (33%) a cursed decoy, in which case it {causes 1 vegan[5] who would not have died to die} when pressed instead.

    • I can't gain evidence about which possible button it is. I have only my memories and reasoning with which to make the choice of how many times to press it.
    • Simplifying assumptions to try to make it closer to a platonic ideal than a real-world case (can be skipped):
      • The people it might save or kill will all have equal counterfactual moral impact (including their own life) in the time which would be added or taken from their life
      • Each death has an equal impact on those around them
      • The button can't kill the presser

        These are unrealistic, but they mean I don't have to reason about how at-risk vegans are less likely to be alignment researchers than non-at-risk vegans who I risk killing, or how I might be saving people who don't want to live, or how those at risk of death would have more prepared families, or how my death could cut short a series of bad presses, anything like that.

    In this case, I first wonder what it means to 'save a life', and reason it must mean preventing a death that would otherwise occur. I notice that if no one is going to die, then no additional lives can be saved. I notice that there is some true quantity of vegans who will die absent any action, and I would like to just press the button exactly that many times, but I don't know that true quantity, so I have to reason about it under uncertainty.

    So, I try to reason about what that quantity is by estimating an amount of lives at various levels of at-risk; and though my estimates are very uncertain (I don't know what portion of the population is vegan, nor how likely different ones are to die), I still try.

    In the end I have a wide probability distribution that is not very concentrated at any particular point, and which is not the one an ideal reasoner would produce, and because I cannot do any better, I press the button exactly as many times as there are deaths in the distribution's average[6].

    More specifically, I stop once it has a ≤ 50% chance of {saving an additional life conditional on it already being a life-saving button}, because anything less, when multiplied by the 66% chance of it being a life-saving button, would be an under 33% total chance of saving a life compared to a 33% chance of certainly ending one. The last press will have only a very slightly positive EV, and one press further would have a very slightly negative EV. (A small numeric check of this threshold is included at the end of this footnote.)

    Someone following a 'risk averse principle' might stop pressing once their distribution says an additional press scores less than 60% on that conditional, or something. They may reason, "Pressing it only so many times seems likely to do good across the vast majority of worldviews in the probability distribution," and that would be true.

    In my view, that's just accepting the opposite trade: declining a 60% chance of preventing a death in return for a 40% chance of preventing a death.

    I don't see why this simple case would not generalize to reasoning about real-world actions under uncertainties about different things like how bad the experience would be as a factory farmed animal. But it would be positive for me to learn of such reasons if I'm missing something.
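
    (The numeric check of the stopping threshold, using the 66%/33% split from the setup:)

        # EV of one additional press, given 66% saver-button / 33% decoy (decoy kills one per press)
        def ev_of_press(p_saves_given_saver):
            return 0.66 * p_saves_given_saver - 0.33 * 1.0

        for p in (0.6, 0.51, 0.5, 0.49):
            print(p, round(ev_of_press(p), 3))
        # positive above 0.5, ~zero at 0.5, negative below; that is where the pressing stops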

  5. ^

    (To avoid, in the thought experiment, the very problem this post is about)

  6. ^

    (given the setup's simplifying assumptions. in reality, there might be a huge average number that mostly comes from tail-worlds, let alone probable environment hackers)

So a portfolio of global health and animal welfare donations seems likely to do good across both worldviews.

Yes, sufficient donation to animal welfare can make it net positive. But it doesn't sound so good when one draws out the impact matrix:

  • Donate only to animal charities: +100
  • Donate only to human charities: -10
    • (not -100, because animal welfare is neglected, so proactive work on it does more good than creating more passive animal-eaters does bad)
  • Donate half to both: +45

    (edited to add: If one thinks the animal eater problem's conclusions are more likely true, such that these numbers could represent an average of one's probability distribution. It looks like there's disagreement in values about whether it's valuable in itself for an action to be {at least positive} per se; I write about this extensively below)

 

That has the same structure as this:

  • Donate only to EA charities: +100
  • Donate only to bad-thing-x: -10
    • Not naming a particular bad-thing, because it's unnecessary, but you can imagine.
  • Donate half to both: +45

If these feel different, a relevant factor may be how uplifting humans isn't the kind of thing that narratively should be bad. It's a central example of something many feel is supposed to be good - improving or saving lives. And in a different world, it would be good, because humans are not inherently evil.

I would point to 'purchase fuzzies and utilons separately' in situations like this. If one believes the median human life has a net-negative impact, and also cares a lot about humans, and feels compelled to 'act on' their care for humans somehow, there are various less-harmful or net-positive ways they could. Here are some ways from brainstorming:

  • Find effective positive-net-impact things to do that involve 'helping humans' instrumentally
  • Help humans in small-scale but emotionally resonant ways
    • Helping a community or friends, separately from EA focus
  • Find some human charities that are maybe less effective at saving humans, but also a lot less likely to be net-negative
  • Help humanity become more morally reflective rather than trying to help lives directly
  • Work on alignment to help humans and other animals in the long-term future (!)