Lukas Finnveden

1588 karma


Research analyst at Open Philanthropy. All opinions are my own.


Project ideas for making transformative AI go well, other than by working on alignment



And here’s the full list of the 57 speakers we featured on our website

That's not right: You listed these people as special guests — many of them didn't do a talk. Importantly, Hanania didn't. (According to the schedule.)

I just noticed this. And it makes me feel like "if someone rudely seeks out controversy, don't list them as a special guest" is such a big improvement over the status quo.

  • Hanania was already not a speaker. (And Nathan Young suggests that last year, this was partly a conscious decision rather than him just not feeling like giving a talk.)
  • If you just had open ticket sales and allowed Hanania to buy a ticket (or not) just like everyone else, then I think that would be a lot better in the eyes of most people who don't like that Hanania is listed as a special guest (including me). My guess would be that it's a common conference policy to "Have open ticket sales, and only refuse people if you think they might actively break-norms-and-harm-people during the events (not based on their views on twitter)". (Though I could be off-base here — I haven't actually read many conferences' policies.)
  • I think people who are concerned about preserving the "open expression of ideas" should basically not care who gets to be listed as a "special guest". This has roughly no consequence on their ability to express their ideas. It's just a symbolic gesture of "we think this person is cool, and we think that you should choose whether to go to our event partly based on whether you also think this person is cool". It's just so reasonable to exclude someone from a list like that even just on the basis of "this person is rude and unnecessarily seeks out controversy and angering people". (Which I think basically everyone agrees is true for e.g. Hanania.)

Here's one line of argument:

  • Positive argument in favor of humans: It seems pretty likely that whatever I'd value on-reflection will be represented in a human future, since I'm a human. (And accordingly, I'm similar to many other humans along many dimensions.)
    • If AI values were sampled ~randomly (whatever that means), I think that the above argument would be basically enough to carry the day in favor of humans.
  • But here's a salient positive argument in favor of why AIs' values will be similar to mine: People will be training AIs to be nice and helpful, which will surely push them towards better values.
    • However, I also expect people to be training AIs for obedience and, in particular, training them to not disempower humanity. So if we condition on a future where AIs disempower humanity, we evidently didn't have that much control over their values. This significantly weakens the strength of the argument "they'll be nice because we'll train them to be nice".
      • In addition: human disempowerment is more likely to succeed if AIs are willing to egregiously violate norms, such as by lying, stealing, and killing. So conditioning on human disempowerment also updates me somewhat towards egregiously norm-violating AI. That makes me feel less good about their values.
    • Another argument is that, in the near term, we'll train AIs to act nicely on short-horizon tasks, but we won't particularly train them to deliberate and reflect on their values well. So even if "AIs' best-guess stated values" are similar to "my best-guess stated values", there's less reason to believe that "AIs' on-reflection values" are similar to "my on-reflection values". (Whereas the basic argument of my being similar to humans still works OK: "my on-reflection values" vs. "other humans' on-reflection values".)

Edit: Oops, I accidentally switched to talking about "my on-reflection values" rather than "total utilitarian values". The former is ultimately what I care more about, though, so it is what I'm more interested in. But sorry for the switch.

There might not be any real disagreement. I'm just saying that there's no direct conflict between "present people having material wealth beyond what they could possibly spend on themselves" and "virtually all resources are used in the way that totalist axiologies would recommend".

What's the argument for why an AI future will create lots of value by total utilitarian lights?

At least for hedonistic total utilitarianism, I expect that a large majority of expected-hedonistic-value (from our current epistemic state) will be created by people who are at least partially sympathetic to hedonistic utilitarianism or other value systems that value a similar type of happiness in a scope-sensitive fashion. And I'd guess that humans are more likely to have such values than AI systems. (At least conditional on my thinking that such values are a good idea, on reflection.)

Objective-list theories of welfare seem even less likely to be endorsed by AIs. (Since they seem pretty niche to human values.)

There are certainly some values you could have that would mainly be concerned that we got any old world with a large civilization. Or that would think it morally appropriate to be happy that someone got to use the universe for what they wanted, and morally inappropriate to be too opinionated about who that should be. But I don't think that looks like utilitarianism.

I find it plausible that future humans will choose to create far fewer minds than they could. But I don't think that "selfishly desiring high material welfare" will require this. The Milky Way alone has enough stars for every currently alive human to get an entire solar system. Meanwhile, intergalactic colonization is probably possible (see here), and I think the stars in our own galaxy are less than 1-in-a-billion of all reachable stars. (Most of which are also very far away, which further contributes to them not being very interesting to use for selfish purposes.)
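A rough back-of-the-envelope check of these figures might look like the sketch below. The star count and population are assumed round numbers that do not come from the comment itself (about 100 billion stars as a low-end Milky Way estimate, about 8 billion people alive today); only the 1-in-a-billion share is taken from the text, and it is treated here as an upper bound.

```python
# Assumed round figures (NOT from the original comment):
MILKY_WAY_STARS = 100e9   # low-end estimate; commonly cited range is ~100-400 billion
HUMANS_ALIVE = 8e9        # approximate current world population

# Figure taken from the comment: the Milky Way's stars are
# less than 1-in-a-billion of all reachable stars.
GALAXY_SHARE_OF_REACHABLE = 1e-9

# Stars available per living person within the Milky Way alone.
stars_per_person = MILKY_WAY_STARS / HUMANS_ALIVE

# Implied lower bound on the total number of reachable stars.
reachable_stars_lower_bound = MILKY_WAY_STARS / GALAXY_SHARE_OF_REACHABLE

# Upper bound on the share of reachable stars used if every living
# person claimed one star system for themselves.
share_used_upper_bound = HUMANS_ALIVE / reachable_stars_lower_bound

print(f"Stars per person in the Milky Way: {stars_per_person:.1f}")
print(f"Reachable stars (lower bound): {reachable_stars_lower_bound:.1e}")
print(f"Share used by one system per person (upper bound): {share_used_upper_bound:.1e}")
```

Under these assumptions, giving every living person a full star system would use well under a billionth of the reachable stars, which is the sense in which high personal material welfare and a universe mostly used for other ends do not directly conflict.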

When we're talking about levels of consumption that are greater than a solar system, and that will only take place millions of years in the future, it seems like the relevant kind of human preferences to be looking at is something like "aesthetic" preference. And so I think the relevant analogies are less that of present humans optimizing for their material welfare, but perhaps more something like "people preferring the aesthetics of a clean and untouched universe (or something else: like the aesthetics of a universe used for mostly non-sentient art) over the aesthetics of a universe which is packed with joy".

I think your point "We may seek to rationalise the former [I personally don’t want to live in a large mediocre world, for self-interested reasons] as the more noble-seeming latter [desire for high average welfare]" is the kind of thing that might influence this aesthetic choice. Where "I personally don’t want to live in a large mediocre world, for self-interested reasons" would split into (i) "it feels bad to create a very unequal world where I have lots more resources than everyone else", and (ii) "it feels bad to massively reduce the amount of resources that I personally have, to that of the average resident in a universe packed full with life".

compared to MIRI people, or even someone like Christiano, you, or Joe Carlsmith probably have "low" estimates

Christiano says ~22% ("but you should treat these numbers as having 0.5 significant figures") without a time-bound; and Carlsmith says ">10%" (see bottom of abstract) by 2070. So no big difference there.

I'll hopefully soon make a follow-up post with somewhat more concrete projects that I think could be good. That might be helpful.

Are you more concerned that research won't have any important implications for anyone's actions, or that the people whose decisions ought to change as a result won't care about the research?

Similarly, 'Politics is the Mind-Killer' might be the rationalist idea that has aged worst - especially for its influences on EA.

What influence are you thinking about? The position argued in the essay seems pretty measured.

Politics is an important domain to which we should individually apply our rationality—but it’s a terrible domain in which to learn rationality, or discuss rationality, unless all the discussants are already rational. [...]

I’m not saying that I think we should be apolitical, or even that we should adopt Wikipedia’s ideal of the Neutral Point of View. But try to resist getting in those good, solid digs if you can possibly avoid it. If your topic legitimately relates to attempts to ban evolution in school curricula, then go ahead and talk about it—but don’t blame it explicitly on the whole Republican Party; some of your readers may be Republicans, and they may feel that the problem is a few rogues, not the entire party.

I liked this recent interview with Mark Dybul who worked on PEPFAR from the start: https://www.statecraft.pub/p/saving-twenty-million-lives

One interesting contrast with the conclusion in this post is that Dybul thinks that PEPFAR's success was a direct consequence of how it didn't involve too many people and departments early on — because the negotiations would have been too drawn out and too many parties would have tried to get pieces of control. So maybe a transparent process that embraced complexity wouldn't have achieved much, in practice.

(At other points in the process he leaned farther towards transparency than was standard, sharing a ton of information with Congress.)
