Welcome to the EA Forum bot site. If you are trying to access the Forum programmatically (either by scraping or via the API), please use this site rather than forum.effectivealtruism.org.

This site has the same content as the main site, but is run in a separate environment to avoid bots overloading the main site and affecting performance for human users.

Quick takes


In my opinion, one of the main things that EA / rationality / AI safety communities have going for them is that they’re extremely non-elitist about ideas. If you have a “good idea” and you write about it on one of the many public forums, it’s extremely likely to be read by someone very influential. And insofar as it’s actually a good idea, I think it’s quite likely to be taken up and implemented, without all the usual status games that might get in the way in other fields.

— calebp

Musings on non-consequentialist altruism under deep unawareness

(This is a reply to a comment by Magnus Vinding, which ended up seeming like it was worth a standalone Quick Take.)

 

From Magnus:

For example, if we walk past a complete stranger who is enduring torment and is in need of urgent help, we would rightly take action to help this person, even if we cannot say whether this action reduces total suffering or otherwise improves the world overall. I think that's a reasonable practical stance, and I think the spirit of this stance applies to many ways in which we can and do benefit strangers, not just to rare emergencies.

The intuition here seems to be, "trying to actively do good in some restricted domain is morally right (e.g., virtuous), even when we're not justified in thinking this will have net-positive consequences[1] according to impartial altruism". Let's call this intuition Local Altruism is Right (LAR). I'm definitely sympathetic to this. I just think we should be cautious about extending LAR beyond fairly mundane "common sense" cases, especially to longtermist work.

For one, the reason most of us bothered with EA interventions was to do good "on net" in some sense. We weren't explicitly weighing up all the consequences, of course, but we didn't think we were literally ignoring some consequences — we took ourselves to be accounting for them with some combination of coarse-grained EV reasoning, heuristics, "symmetry" principles, discounting speculative stuff, etc. So it's suspiciously convenient if, once we realize that that reason was confused, we still come to the same practical conclusions.

Second, for me the LAR intuition goes away upon reflection unless at least the following hold (caveat in footnote):[2]

  1. The "restricted domain" isn't too contrived in some sense, rather it's some natural-seeming category of moral patients or welfare-relevant outcome.
    1. (How we delineate "contrived" vs. "not contrived" is of course rather subjective, which is exactly why I'm suspicious of LAR as an impartial altruistic principle. I'm just taking the intuition on its own terms.)
  2. I'm at least justified in (i) expecting my intervention to do good overall in that domain, and (ii) expecting not to have large off-target effects of indeterminate net sign in domains of similar "speculativeness" (see "implementation robustness").
    1. ("Speculativeness", too, is subjective. And while I definitely find it intuitive that our impacts on more robustly foreseeable moral patients are privileged,[3] I haven't found a satisfying way of making sense of this intuition. But if we want to respect this intuition, condition (ii) seems necessary. If the set of moral patients A seems no less robustly foreseeable than set B, why would I morally privilege B? Cf. my discussion here.)

Some examples:

  • Trying to reduce farmed animal suffering: Depends on the intervention.
    • I have the LAR intuition for things like donating to humane invertebrate slaughter research, which doesn't seem to have large backfire risks on other animals in the near term. (Even if it does plausibly have large backfire risks for future digital minds, say.)
    • For me LAR is considerably weaker for things like vegan advocacy, which has a lot of ambiguous effects on wild animal suffering (which don't seem more "speculative", given the fairly robust-seeming arguments Tomasik has written about).
  • Trying to prevent digital suffering (even pre-space colonization): All the interventions I'm aware of are so apparently non-implementation-robust that I don't have the LAR intuition here. Example here.
    • If we instead said something like "this intervention is implementation-robust w.r.t. helping some specific subset of digital minds," that subset feels contrived, so I still wouldn't have the intuition in favor of the intervention.
  • Trying to prevent extinction: (Assuming a non-suffering-focused view for the sake of argument, because lots of EAs seem to think trying to prevent extinction is common-sensically good/right.) As I've argued here and here, the interventions EAs have proposed to reduce extinction risk don't seem to satisfy condition (i) of implementation robustness above. Even if they did, off-target effects on animal suffering arguably undermine condition (ii) (see e.g. this post).

 

None of which is to say I have a fleshed-out theory! I'm keen to think more about what non-consequentialist altruism under unawareness might look like.

  1. ^

    I mean to include Clifton's Option 3 as a possible operationalization of "net-positive consequences according to impartial altruism".

  2. ^

    In the definition of LAR, "trying to actively do good" is the key phrase. I find it pretty intuitive that we don't need conditions nearly as strong as (1)+(2) below when we're asking, "Should you refrain from doing [intuitively evil thing]?"

  3. ^

    Maybe the most promising angle is to show that it's normatively relevant that our beliefs about the more distant moral patients are (qualitatively?) less grounded in good reasons (see Clifton).


Distillation for Robust Unlearning Paper (https://arxiv.org/abs/2506.06278) makes me re-interested in the idea of using distillation to absorb the benefits of a Control Protocol (https://arxiv.org/abs/2312.06942).

I thought that was a natural "Distillation and Amplification" next step for control anyway, but the empirical results for unlearning make me excited about how this might work for control again.

Like, I guess I am just saying that if you are actually in a regime where you are using the trusted model some nontrivial fraction of the time, you might be able to distill off of that.

I relate it to the idea of iterated amplification and distillation; the control protocol is the scaffold/amplification. Plus, it seems natural that your most troubling outputs would receive special attention from bot/human/cyborg overseers, and therefore high-quality training feedback.
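
To make the "distill off the protocol" idea concrete, here's a minimal toy sketch of the kind of pipeline I have in mind, assuming a simple trusted-monitoring setup with a fallback to the trusted model. Every class, monitor, and threshold below is a hypothetical placeholder, not an API from the linked papers:

```python
# Hypothetical sketch: distilling a student model from the transcripts a
# control protocol actually ships. All classes, scores, and thresholds are
# illustrative stand-ins, not anything from the cited papers.

import random
from dataclasses import dataclass


@dataclass
class Transcript:
    task: str
    output: str
    source: str  # "untrusted" or "trusted": who produced the shipped output


class UntrustedModel:
    """Stand-in for the capable-but-possibly-scheming model."""
    def generate(self, task: str) -> str:
        return f"untrusted answer to: {task}"


class TrustedModel:
    """Stand-in for the weaker model we trust."""
    def generate(self, task: str) -> str:
        return f"trusted answer to: {task}"

    def suspicion_score(self, task: str, output: str) -> float:
        # Placeholder monitor; a real protocol would use a learned or
        # prompted classifier over the untrusted model's output.
        return random.random()


def run_control_protocol(tasks, untrusted, trusted, threshold=0.9):
    """Trusted monitoring with a fallback: ship the untrusted output unless
    the monitor flags it, in which case ship the trusted model's output
    (or, in practice, escalate to human/cyborg review)."""
    transcripts = []
    for task in tasks:
        candidate = untrusted.generate(task)
        if trusted.suspicion_score(task, candidate) > threshold:
            transcripts.append(Transcript(task, trusted.generate(task), "trusted"))
        else:
            transcripts.append(Transcript(task, candidate, "untrusted"))
    return transcripts


def build_distillation_dataset(transcripts):
    """The distillation step: the student imitates whatever the protocol
    actually shipped, so the scaffold plays the 'amplification' role and
    fine-tuning plays the 'distillation' role."""
    return [(t.task, t.output) for t in transcripts]


if __name__ == "__main__":
    tasks = [f"task {i}" for i in range(5)]
    transcripts = run_control_protocol(tasks, UntrustedModel(), TrustedModel())
    dataset = build_distillation_dataset(transcripts)
    for task, output in dataset:
        print(task, "->", output)
    # `dataset` would then feed an ordinary supervised fine-tuning run to
    # produce the "control protocol distill" discussed here.
```

The open question, of course, is whether a student trained on that dataset actually inherits anything from the oversight, or just inherits whatever the untrusted model wanted to put there, which is the worry below.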

Training off of control might make no sense at all if you then think of that model as just one brain playing a game with itself, a game it can always rig or fake easily. And since a lot of the concern is scheming, this might make the "control protocol distill" dead on arrival: any worthwhile distill would still need to be smart enough that it might be sneak-attacking us for roughly the same reasons the original model was, and even extremely harmless training data doesn't help us with that.

Seems good to make the model tend to be more cool and less sketchy even if it would only be ~"trusted model level good" at some stuff. Idk though, I am divided here.


The Job Market is rough right now.

I am reaching out to my EA network - the job market is super tough right now and I know a few of us are looking for new roles. Anyone know of any People Operations / HR roles for someone based in London, looking to work remotely/globally? TYSM!

