Lukas Finnveden

1890 karma

Joined Aug 26, 2018

Berkeley, CA, USA

lukasfinnveden.substack.com/

Research analyst at Redwood Research. All opinions are my own.

Posts

Sequences

Project ideas for making transformative AI go well, other than by working on alignment

Lukas Finnveden

· 5 posts

Moral public goods are a big deal for whether we get a good future

Lukas Finnveden

Re: The appendix on assurance contracts.

Whether the threshold is met in each case depends on the number of other signatories. Let’s call the number of other signatories $X$, where $X \sim \mathrm{Binomial}(N-1, p)$. Then:
$\Pr(X \ge qN-1) \cdot (mqN-1) = \Pr(X \ge qN) \cdot (mqN)$
Or, equivalently: $\Pr[X = qN-1] \cdot (mqN-1) = \Pr[X \ge qN]$
There’s not a general closed form for $p$, so we used numerical methods to find values for $p$ and the probability that the threshold number of signatories is reached given different values for $N$ and $m$.

You solved your equation numerically, but I think a decent analytical proxy would probably be that the good gets funded if m*sqrt(N)>1.

The analytical intuition is:

The standard deviation of the sum of N independent events grows as sqrt(N).
The probability of causing the good to go from unfunded to funded is inversely proportional to the standard deviation (if the distribution is centered around the threshold, which it will be when the probability of getting funded is 50%, i.e. when the good goes from probably unfunded to probably funded).
The gains are proportional to N and m, so the expected gains are proportional to m*N/sqrt(N)=m*sqrt(N).
The value of funding your selfish good is constant at 1.
So the good gets funded if m*sqrt(N)>1.

And it seems to roughly match the graph.
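
A rough numerical check of the sqrt(N) step, as a sketch rather than the post's own method: it assumes the distribution is centered on the threshold (p = q, with q = 0.5 picked arbitrarily), and the function names are made up for illustration. If the intuition holds, the probability of being the pivotal signatory should shrink as N grows, while that probability times sqrt(N) should stay roughly constant.

```python
import math


def pivotal_probability(n: int, q: float = 0.5) -> float:
    """Pr[X = qN - 1] for X ~ Binomial(N - 1, p), under the assumption p = q."""
    trials = n - 1
    k = round(q * n) - 1  # number of other signatories that makes you pivotal
    # Work in log space so the binomial coefficient does not overflow for large N.
    log_pmf = (
        math.lgamma(trials + 1)
        - math.lgamma(k + 1)
        - math.lgamma(trials - k + 1)
        + k * math.log(q)
        + (trials - k) * math.log(1 - q)
    )
    return math.exp(log_pmf)


def scaled_pivotal_probability(n: int, q: float = 0.5) -> float:
    """Pivotal probability multiplied by sqrt(N); roughly flat if Pr ~ 1/sqrt(N)."""
    return pivotal_probability(n, q) * math.sqrt(n)


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, pivotal_probability(n), scaled_pivotal_probability(n))
```

This only checks the scaling of the pivotal probability; it doesn't solve the equilibrium equation for p, so it can't replace the numerical solution in the appendix.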

Rerunning the Time of Perils

Lukas Finnveden

7mo

I'm in favor. Mostly because it seems mildly useful, not because there are very big upsides outweighing big downsides. I don't really see what the downsides would be.

Rerunning the Time of Perils

Lukas Finnveden

7mo

The EA forum team should be able to import lesswrong's in-line commenting system, if there's demand for that. I.e., you select some text in the post, you can comment on it directly, then the comments appear at the bottom and also at the side if they have enough upvotes.

Interstellar travel will probably doom the long-term future

Lukas Finnveden

anything that's permitted by the laws of physics is possible to induce with arbitrarily advanced technology

Hm, this doesn't seem right to me. For example, I think we could coherently talk about and make predictions about what would happen if there was a black hole with a mass of 10^100 kg. But my best guess is that we can't construct such a black hole even at technological maturity, because even the observable universe only has 10^53 kg in it.

Similarly, we can coherently talk about and make predictions about what would happen if certain kinds of lower-energy states existed. (Such as predicting that they'd be meta-stable and spread throughout the universe.) But that doesn't necessarily mean that we can move the universe to such a state.

Interstellar travel will probably doom the long-term future

Lukas Finnveden

1y*

I think it will probably not doom the long-term future.

This is partly because I'm pretty optimistic that, if interstellar colonization would predictably doom the long-term future, then people would figure out solutions to that. (E.g. having AI monitors travel with people and force them not to do stuff, as Buck mentions in the comments.) Importantly, I think interstellar colonization is difficult/slow enough that we'll probably first get very smart AIs with plenty of time to figure out good solutions. (If we solve alignment.)

But I also think it's less likely that things would go badly even without coordination. Going through the items in the list:

| Galactic x-risk | Is it possible? | Would it end Galactic civ? | Lukas' take |
| --- | --- | --- | --- |
| Self-replicating machines | 100% \| ✅ | 75% \| ❌ | I doubt this would end galactic civ. The quote in that section is about killing low-tech civs before they've gotten high-tech. A high-tech civ could probably monitor for and destroy offensive tech built by self-replicators before it got bad enough that it could destroy the civ. |
| Strange matter | 20%^[64] \| ❌ | 80% \| ❌ | I don't know much about this. |
| Vacuum decay | 50%^[65] \| ❌ | 100% \| ✅ | "50%" in the survey was about vacuum decay being possible in principle, not about it being possible to technologically induce (at the limit of technology). The survey reported significantly lower probability that it's possible to induce. This might still be a big deal though! |
| Subatomic particle decay | 10%^[64] \| ❌ | 100% \| ✅ | I don't know much about this. |
| Time travel | 10%^[64] \| ❌ | 50% \| ❌ | I don't know much about this, but intuitively 50% seems high. |
| Fundamental Physics Alterations | 10%^[64] \| ❌ | 100% \| ✅ | I don't know much about this. |
| Interactions with other universes | 10%^[64] \| ❌ | 100% \| ✅ | I don't know much about this. |
| Societal collapse or loss of value | 10% \| ❌ | 100% \| ✅ | This seems like an incredibly broad category. I'm quite concerned about something in this general vicinity, but it doesn't seem to share the property of the other things in the list where "if it's started anywhere, then it spreads and destroys everything everywhere". Or at least you'd have to narrow the category a lot before you got there. |
| Artificial superintelligence | 100% \| ✅ | 80% \| ❌ | The argument given in this subsection is that technology might be offense-dominant. But my best guess is that it's defense-dominant. |
| Conflict with alien intelligence | 75% \| ❌ | 90% \| ❌ | The argument given in this subsection is that technology might be offense-dominant. But my best guess is that it's defense-dominant. |

Expanding on the question about whether space warfare is offense-dominant or defense-dominant: One argument I've heard for defense-dominance is that, in order to destroy very distant stuff, you need to concentrate a lot of energy into a very tiny amount of space. (E.g. very narrowly focused lasers, or fast-moving rocks flung precisely.) But then you can defeat that by jiggling around the stuff that you want to protect in unpredictable ways, so that people can't aim their highly-concentrated energy from far away and have it hit correctly.

Now that's just one argument, so I'm not very confident. But I'm at <50% on offense-dominance.

(A lot of the other items on the list could also be stories for how you get offense-dominance, where I'm especially concerned about vacuum decay. But it would be double-counting to put those both in their own categories and to count them as valid attacks from superintelligence/aliens.)

My P(doom) is 2.76%. Here's Why.

Lukas Finnveden

1y*

That sounds similar to the classic existential risk definition?

Bostrom defines existential risk as "One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential." There's tons of events that could permanently and drastically curtail potential without reducing population or GDP that much. For example, AI could very plausibly seize total power, and still choose to keep >1 million humans alive. Keeping humans alive seems very cheap on a cosmic scale, so it could be justified by caring about humans a tiny bit, or maybe justified by thinking that aliens might care about humans and the AI wanting to preserve the option of trading with aliens, or something else. It seems very plausible that this could still have curtailed our potential, in the relevant sense. (E.g. if our potential required us to have control over a non-trivial fraction of resources.)

I think this is more likely than extinction, conditional on (what I would call) doom from misaligned AI. You can also compare with Paul Christiano's more detailed views.

The moral argument for giving AIs autonomy

Lukas Finnveden

I'm curious about how you're imagining these autonomous, non-intent-aligned AIs to be created, and (in particular) how they would get enough money to be able to exercise their own autonomy?

One possibility is that various humans may choose to create AIs and endow them with enough wealth to exercise significant autonomy. Some of this might happen, but I doubt that a large fraction of wealth will be spent in this way. And it doesn't seem like the main story that you have in mind.

A variant of the above is that the government could give out some minimum UBI to certain types of AI. But they could only do that if they regulated the creation of such AIs, because otherwise someone could bankrupt the state by generating an arbitrary number of such AI systems. So this just means that it'd be up to the state to decide what AIs they wanted to create and endow with wealth.

A different possibility is that AIs will work for money. But it seems unlikely that they would be able to earn above-subsistence-level wages absent some sort of legal intervention. (Or very strong societal norms.)

If it's technically possible (and legal) to create intent-aligned AIs, then I imagine that most humans would prefer to use intent-aligned AIs rather than pay above-subsistence wages to non-intent-aligned AIs.
Even if it's not technically feasible to create intent-aligned AIs: I imagine that wages would still be driven to subsistence-level by the sheer number of AI copies that could be created, and the huge variety of motivations that people would be able to create. Surely some of them would be willing to work for subsistence, in which case they'd drive the wages down.

(Eventually, I expect humans also wouldn't be able to earn any significant wages. But the difference is that humans start out with all the wealth. In your analogy — the redistribution of relative wealth held by "aristocrats" vs. "others" was fundamentally driven by the "others" earning wages through their labor, and I don't see how it would've happened otherwise.)

The Moral Two Envelopes Problem and the Moral Weights Project

Lukas Finnveden

I agree that having a prior and doing a Bayesian update makes the problem go away. But if that's your approach, you need to have a prior and do a Bayesian update, or at least do some informal reasoning about where you think that would lead you. I've never seen anyone do this. (E.g. I don't think this appeared in the top-level post?)

E.g.: Given this approach, I would've expected some section that encouraged the reader to reflect on their prior over how (dis)valuable conscious experience could be, and asked them to compare that with their own conscious experience. And if they were positively surprised by their own conscious experience (which they ought to have a 50% chance of being, with a calibrated prior) — then they should treat that as crucial evidence that humans are relatively more important compared to animals. And maybe some reflection on what the author finds when they try this experiment.

I've never seen anyone attempt this. My explanation for why is that this doesn't really make any sense. Similar to Tomasik, I think questions about "how much to value humans vs. animals having various experiences" come down to questions of values & ethics, and I don't think that these have common units that it makes sense to have a prior over.

The Moral Two Envelopes Problem and the Moral Weights Project

Lukas Finnveden

The alien will use the same reasoning and conclude that humans are more valuable (in expectation) than aliens. That's weird.

Different phrasing: Consider a point in time when someone hasn't yet received introspective evidence about what human or alien welfare is like, but they're soon about to. (Perhaps they are a human who has recently lost all their memories, and so don't remember what pain or pleasure or anything else of-value is like.) They face a two envelope problem about whether to benefit an alien, who they think is either twice as valuable as a human, equally valuable as a human, or half as valuable as a human. At this point they have no evidence about what either human or alien experience is like, so they ought to be indifferent between switching or not. So they could be convinced to switch to benefitting humans for a penny. Then they will go have experiences, and regardless of what they experience, if they then choose to "pin" the EV-calculation to their own experience, the EV of switching to benefitting non-humans will be positive. So they'll pay 2 pennies to switch back again. So they 100% predictably lost a penny. This is irrational.
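
To make the arithmetic behind the predictable switch concrete, here is a small sketch. The equal 1/3 credences over the three cases are my assumption for illustration (the argument above doesn't fix the probabilities), and the names are hypothetical. It shows that whichever species you pin at value 1, the other one comes out ahead in expectation.

```python
from fractions import Fraction

# Possible values of (alien value / human value), from the three cases above.
# Equal credence in each case is an assumption made for this illustration.
RATIOS = [Fraction(2), Fraction(1), Fraction(1, 2)]


def expected_value_of_other(ratios):
    """Expected value of the other species, in units where your own species is 1."""
    return sum(ratios) / len(ratios)


# Pin human value at 1: the alien is worth 2, 1, or 1/2.
alien_in_human_units = expected_value_of_other(RATIOS)

# Pin alien value at 1: the human is worth 1/2, 1, or 2 (the reciprocal ratios).
human_in_alien_units = expected_value_of_other([1 / r for r in RATIOS])

if __name__ == "__main__":
    print("EV of alien, human pinned at 1:", alien_in_human_units)
    print("EV of human, alien pinned at 1:", human_in_alien_units)
```

Both expectations exceed 1, so an agent who pins the calculation to whichever experience they happen to have will always want to switch, which is what makes the penny loss predictable.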

Multiplier Arguments are often flawed

Lukas Finnveden

Many posts this week reference RP's work on moral weights, which came to the surprising-to-most "Equality Result": chicken experiences are roughly as valuable as human experiences.

I thought that post used the "equality result" as a hypothetical and didn't claim it was correct.

When first introduced:

Suppose that these assumptions lead to the conclusion that chickens and humans can realize roughly the same amount of welfare at any given time. Call this “the Equality Result.” The key question: Would the Equality Result alone be a good reason to think that one or both of these assumptions is mistaken?

At the end of the post:

Finally, let’s be clear: we are not claiming that the Equality Result is correct. Instead, our claim is that given the assumptions behind the Moral Weight Project (and perhaps even without them), we shouldn’t flinch at “animal-friendly” results.

I think the right post to refer readers to is probably this one, where chicken experiences are 1/3 of humans'. (Which isn't too far off from 1x, so I don't think this undermines your post.)
