Owen Cotton-Barratt

Thanks, I agree with your mathematics and think this framework is helpful for letting us zoom in on possible disagreements.

There are two places where I find myself sceptical of the framing in your comment:

  1. You're framing it as something like a haircut on the value of working on a particular topic. But I think that if you're targeting just some set of worlds where we don't get meaningful automated assistance for certain kinds of strategy work, it might be important to explicitly condition on that, rather than just think of it as a haircut (I sketch the distinction after this list).
    • We might then also ask about whether we have more or less leverage on these worlds. Some takes:
      • Of course it will depend a bunch on the specifics of whichever question we're wondering whether we can punt on.
      • Broadly I think worlds where we don't get a bunch of automated assistance for strategy in a timely fashion look significantly worse than worlds where we do get this assistance.
        • This is compatible with either being higher-leverage.
        • An important type of leverage we may have is the possibility of moving worlds from the first bucket to the second.
      • I'm actually pretty unsure which worlds we have more leverage over (there seem to be quite a lot of considerations pointing each way), and suspect that this question is more to-the-point than thinking of it as a haircut.
  2. You say that it's hard to reach or exceed a 90% chance. I find myself dubious of this. If we see something like a technological singularity, there will be an enormous amount of progress on very many dimensions. Some of the things that actors will eventually want to do will be tremendously harder than others. I certainly don't want to assume that people will get around to doing things that are a good idea and are possible as early as it's feasible for them to do so; but I think that if you go somewhat past that to the point where it's very easy to get the good thing, it's not hard for it to be a >90% safe bet to think it will happen.
    • As an analogy, I imagine people before the industrial revolution might reasonably have predicted that there would be a lot more capacity for thinking about space strategy before anyone went to the moon.
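
To make the haircut-vs-conditioning distinction in point 1 concrete, here's a minimal sketch of how I'd put it; the symbols p, v, and the leverage terms are my own illustrative labels, not anything from your framework:

```latex
% Let p = P(we don't get timely automated assistance for strategy work),
%     v = value of the work conditional on landing in those worlds.
% Haircut framing: discount the work's value by p,
\[ V_{\text{haircut}} \approx p \cdot v. \]
% Conditioning framing: condition on each bucket of worlds and compare leverage,
\[ p \cdot \ell_{\text{no-assist}} \quad \text{vs.} \quad (1-p) \cdot \ell_{\text{assist}}, \]
% which surfaces the leverage comparison that a flat haircut hides.
```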

Maybe there's a common theme here: I have the impression that I'm more imagining a default world where we get these upgrades to strategic capacity in a timely fashion, and then considering deviations from that; and you're more saying "well maybe things look like that, but maybe they look quite different", and less privileging the hypothesis.

I guess I do just think it's appropriate to privilege this hypothesis. We've written about how even current or near-term AI could serve to power tools which advance our strategic understanding. I think that this is a sufficiently obvious set of things to build, and there will be sufficient appetite to build them, that it's fair to think it will likely be getting in gear (in some form or another) before most radically transformative impacts hit. I wouldn't want to bet everything on this hypothesis, but I do think it's worth exploring what betting on it properly would look like, and then committing a chunk of our portfolio to that (if it's not actively bad on other perspectives).

You discuss the idea of clauses that allow for later escape from poorly-conceived deals as a guardrail. This feels like a powerful possibility which might add a significant amount of robustness. 

But I'm wondering if the idea might be more broadly applicable than that. If we have the kind of machinery that allows us to add that kind of clause, maybe we could use it for the whole essence of the deal? Rather than specify up front what you wish to exchange, just specify the general principles of exchange -- and trust the smarter and wiser actors of the future to interpret them in a fair and benevolent manner.

In general, reading this article, I find that I have some sympathy for the central claim that there could be useful deals to strike early (that it isn't possible to strike later); however I find myself feeling quite sceptical of the frameworks for thinking about different types of deals etc. -- I don't see why we should think that we have done more here than scrape the surface of the universe of possibilities, and my best guess is that actually-wise deals would look quite different from anything you're outlining. Curious what you make of this -- does this feel too radically sceptical or something?

I am sympathetic to this.

Like many people, I’ve been following this thread with dismay. I think that Frances’s experiences sound terrible, and seem very unnecessary.

I have hesitated to weigh in on this thread. But I agree that the answers can’t just be at the policy level; and I’m keen to see further discussion about cultural dynamics which may contribute to the issues[1]. At this point I’ve given this question a good amount of thought (though I could definitely still be wrong), so I wanted to highlight a couple of things people might want to consider:

Focus on intent

I’m glad Frances calls this out as a problem, as I think it’s underappreciated as a contributing factor to problematic dynamics. I actually think it has more issues beyond what she lists.

A focus on intent:

  • Gets in the way of straightforwardly talking about what kinds of behaviour are good or bad
  • Moves attention from “person X had bad experiences; what happened there and how could they have been avoided?” to “is person Y bad?” (which is liable to lead to people coming to Y’s defence, in a way that risks invalidating X’s experiences)
  • Sends the (potentially dangerous) message “if your intentions are good, there’s nothing to worry about”

Distrust of moral intuitions

(caveat: not sure I’m naming the truest version of this; but I’m pretty sure there’s something in this vicinity)

I think EA teaches people that it's important to think through the implications of our actions, rather than relying on unconsidered moral intuitions. Which is correct! But I worry that sometimes people can take this lesson too far, and stop paying attention to their own moral intuitions when they don't have explicit arguments for them[2].

A friend put it to me as “I think sometimes EA accidentally encourages a lack of groundedness”.

Anyway it’s pure speculation on my part to imagine this at play in CEA’s (in)actions. But rather than imagine that the people reading Riley’s document didn’t feel any discomfort, I find it easier to imagine them feeling a little uncomfortable about it but not trusting the discomfort, or orienting in a locally-consequentialist way and guessing that it would ultimately create more costs and be worse (possibly including worse-for-Frances) to escalate it rather than leave it be.


TBC, I don’t think that the right amount of focus on intent or distrust of our own moral intuitions is zero! And I absolutely think that it’s possible to do these in ways that are healthy. But if I'm right, then I kind of want people to be tracking the potential vulnerabilities from going too far in these directions; so wanted to share. I'll default to not posting more on this thread.

  1. ^

    For the removal of any ambiguity, I'm not trying to disclaim personal responsibility for my own past mistakes! But when things go wrong to the degree of causing harm, I think they've often gone wrong at several levels at once; it's useful to look at all of these.

  2. ^

    Or further: discount their own sense of right and wrong in order to defer to people who’ve thought about things more.

I disagree that 5 barely matters and is beside the point. I think doing 5 in an earnest way (as especially Holden's post is doing) is a move towards having the company act with integrity in a forward-looking way. Maybe that move won't stick, but it really does feel meaningfully better to me to be finding somewhere solid to stand now rather than trying to paper things over.

And it makes sense that people want to discuss 1-4 (I'm not entirely endorsing your descriptions here, but I don't think that's important); I just think it's better for everyone if it's clear that the thing they're upset about is 1-4 rather than 5.

I feel better about Anthropic as a result of this change, although I understand if people feel worse. But I think that the proper target of their upset should be past-Anthropic declaring that it would hold to kind of confused/dubious standards (which I worry may have been corrosive for people's ability to think clearly about what is needed), rather than current Anthropic correcting that.

(I previously felt that the RSP commitments were kind of "off" somehow, and reading the new things feels like fresh air, people taking a more serious look and engaging with the world for real. I don't think I should get any credit for this feeling! Indeed despite feeling that they were "off", I didn't super engage or even manage to get to the bottom of why they felt off. I'm just expressing my feelings as this reaction seemed like a missing mood in the conversation.)

Yep, I guess I'm into people trying to figure out what they think and which arguments seem convincing, and I think that it's good to highlight sources of perspectives that people might find helpful-according-to-their-own-judgement for that. I do think I have found Drexler's writing on AI singularly helpful for my own inside-view judgements.

That said: absolutely seems good for you to offer counterarguments! Not trying to dismiss that (but I did want to explain why the counterargument wasn't landing for me).

On Dichotomy:

  • Because you've picked a particularly strong form of Maxipok to argue against, you're pushed into choosing a particularly strong form of Dichotomy that would be necessary to support it
  • But I think that this strong form of Dichotomy is relatively implausible to start with
    • And I would guess that Bostrom at the time of writing the article would not have supported it; certainly the passages you quote feel to me like they're supporting something weaker
  • Here's a weaker form of Dichotomy that I feel much more intuitive sympathy for:
    • Most things that could be "locked in" such that they have predictable long-term effects on the total value of our future civilization, and move us away from the best outcomes, actually constrain us to worlds which are <10% as good as the worlds without any such lock-in (and would therefore count as existential catastrophes in their own right)
  • The word "most" is doing work there, and I definitely don't think it's absolute (e.g. as you point out, the idea of dividing the universe up 50/50 between a civilization that will do good things with it and one that won't); but it could plausibly still be enough to guide a lot of our actions

Looking at the full article:

  • OK I think I much more strongly object to the frame in this forum post than in the research article -- in particular, the research article is clear that it's substituting in a precisification you call Maxipok for the original principle
  • But I'm not sure what to make of this substitution! Even when I would have described myself as generally bought into Maxipok, I'm not sure if I would have been willing to sign up to this "precisification", which it seems to me is much stronger
    • In particular, your version is a claim about the existence of actions which are (close to) the best in various ways; whereas in order to discard Maxipok I would have wanted not just an existence proof, but practical guidelines for finding better things
  • You do provide some suggestions for finding better things (which is great), but you don't directly argue that trying to pursue those would be better in expectation than trying to follow Maxipok (or argue about in which cases it would be better)
    • This makes me feel that there's a bit of a motte-and-bailey: you've set up a particularly strong precisification of Maxipok (that it's not clear to me e.g. Bostrom would have believed at the time of writing the paper you are critiquing); then you argue somewhat compellingly against it; then you conclude that it would be better if people did {a thing you like but haven't really argued for} instead

(Having just read the forum summary so far) I think there's a bunch of good exploration of arguments here, but I'm a bit uncomfortable with the framing. You talk about "if Maxipok is false", but this seems to me like a type error. Maxipok, as I understand it, is a heuristic: it's never going to give the right answer 100% of the time, and the right lens for evaluating it is how often it gives good answers, especially compared to other heuristics the relevant actors might reasonably have adopted.

Quoting from the Bostrom article you link:

At best, maxipok is a rule of thumb or a prima facie suggestion.

It seems to me like when you talk about maxipok being false, you are really positing something like:

Strong maxipok: The domain of applicability of maxipok is broad, so that pretty much all impartial consequentialist actors should adopt it as a guiding principle

Whereas maxipok is a heuristic (which can't have truth values), strong maxipok (as I'm defining it here) is a normative claim, and can have truth values. I take it that this is what you are mostly arguing against -- but I'd be interested in your takes; maybe it's something subtly different.

I do think that this isn't a totally unreasonable move on your part. I think Bostrom writes in some ways in support of strong maxipok, and sometimes others have invoked it as though in the strong form. But I care about our collectively being able to have conversations about the heuristic, which is one that I think may have a good amount of value even if strong maxipok is false, and I worry that in conflating them you make it harder for people to hold or talk about those distinctions.

(FWIW I've also previously argued against strong maxipok, even while roughly accepting Dichotomy, on the basis that other heuristics may be more effective.)

I think Eric has been strong at making reasoned arguments about the shape of possible future technologies, and at helping people to look at things for themselves. I wouldn't have thought of him (even before looking at this link[1]) as particularly good at making quantitative estimates about timelines, which in any case is something he doesn't seem to do much of.

Ultimately I am not suggesting that you defer to Drexler. I am suggesting that you may find reading his material a good time investment for spurring your own thoughts. This is something you can test for yourself (I'm sure that it won't be a good fit for everyone).

  1. ^

    And while I do think it's interesting, I'm wary of drawing overly strong conclusions from it for a couple of reasons:

    1. If, say, all this stuff now happened in the next 30 years, so that he was in some sense just off by a factor of two, how would you think his predictions had done? It seems to me this would be mostly a win for him; and I do think that it's quite plausible that it will mostly happen within 30 years (and more likely still within 60).
    2. That was 30 years ago; I'm sure that he is in some ways a different person now.