Maxime Riché

Research Engineer
73 karma · Working (6-15 years) · Brighton, UK

Comments (10)

I somewhat agree with your points. Here are some contributions and pushbacks:

I get that there's been a lot of work on this and that we can make progress on it (I know, I'm an astrobiologist), but I'm sure there are so many unknown unknowns associated with the origin of life, development of sentience, and spacefaring civilisation that we just aren't there yet. The universe is so enormous and bonkers and our brains are so small - we can make numerical estimates sure, but creating a number doesn't necessarily mean we have more certainty.

Something interesting about these hypotheses and implications is that they get stronger the more uncertainty we have, as long as one uses some form of EDT (e.g., CDT + exact copies). The less we know about how conditioning on Humanity's ancestry impacts utility production, the closer the Civ-Similarity Hypothesis is to being correct. The broader our distribution over the density of SFCs in the universe, the closer the Civ-Saturation Hypothesis is to being correct. This holds as long as you account for the impact of correlated agents (e.g., exact copies) and they exist. For the Civ-Similarity Hypothesis, this comes from applying the Mediocrity Principle. For the Civ-Saturation Hypothesis, it comes from the fact that we have orders of magnitude more exact copies in saturated worlds than in empty worlds.

I think you're posing a post-understanding-of-consciousness question. Consciousness might be very special or it might be an emergent property of anything that synthesises information; we just don't know. But it's possible to imagine aliens with complex behaviour similar to us but without evolving the consciousness aspect, much as superintelligent AI will probably be. For now, the safe assumption is that we're the only conscious life, and I think it's very important that we act like it until proven otherwise.

Consciousness is indeed one of the arguments pushing the Civ-Similarity Hypothesis toward lower values (humanity being more important), and I am eager to discuss its potential impact. Here are several reasons why the update from consciousness may not be that large:

  • Consciousness may not be binary. In that case, we don't know whether humans have low, medium, or high consciousness; I only know that I am not at zero. We should then likely assume we are average. The relevant comparison is then no longer between P(humanity is "conscious") and P(aliens creating SFCs are "conscious"), but between P(humanity's consciousness > 0) and P(aliens-creating-SFCs' consciousness > 0).
  • If human consciousness is a random fluke with no impact on behavior (if it did affect behavior, it could be selected for or against), then we have no reason to think that aliens will create more or less conscious descendants than we will. Consciousness needs to have a significant impact on behavior to change the chance that (artificial) descendants are conscious. But the larger the effect of consciousness on behavior, the more likely consciousness is to be a result of evolution/selection.
  • We don't understand much about how the consciousness of SFC creators would influence the consciousness of (artificial) SFC descendants. Even if Humans are abnormal in being conscious, it is very uncertain how much that changes how likely our (artificial) descendants are to be conscious.

I am very happy to get pushback and to debate the strength of the "consciousness argument" on Humanity's expected utility.

What's the difference between "P(Alignment | Humanity creates an SFC)" and "P(Alignment AND Humanity creates an SFC)"? 

I will try to explain it more clearly. Thanks for asking.

P(Alignment AND Humanity creates an SFC) = P(Alignment | Humanity creates an SFC) x P(Humanity creates an SFC)

So the difference is that when you optimize for P(Alignment | Humanity creates an SFC), you no longer optimize for the term P(Humanity creates an SFC), which was included in the conjunctive probability.
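
To make this concrete, here is a minimal numeric sketch (the probabilities are invented purely for illustration): an intervention that only raises P(Humanity creates an SFC) increases the conjunction, but leaves the conditional target untouched.

```python
# Toy numbers, purely illustrative.
p_sfc_before, p_sfc_after = 0.5, 0.7  # P(Humanity creates an SFC), before/after an intervention
p_align_given_sfc = 0.2               # P(Alignment | Humanity creates an SFC), unchanged

joint_before = p_align_given_sfc * p_sfc_before  # P(Alignment AND SFC) = 0.10
joint_after = p_align_given_sfc * p_sfc_after    # P(Alignment AND SFC) = 0.14

print(joint_before, joint_after)  # the conjunction increased...
print(p_align_given_sfc)          # ...but P(Alignment | SFC) did not move
```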
 

Can you maybe run us through 2 worked examples for bullet point 2? Like what is someone currently doing (or planning to do) that you think should be deprioritised? And presumably, there might be something that you think should be prioritised instead? 

Bullet point 2 is: (ii) Deprioritizing, to some degree, AI Safety agendas that mostly increase P(Humanity creates an SFC) but do not much increase P(Alignment | Humanity creates an SFC).

Here are some speculative examples. The degree to which their priority should be updated is up for debate; I only claim that it may need updating conditional on the hypotheses being significantly correct.

  • AI Misuse reduction: If the paths to impact (PTIs) are (a) to prevent extinction through misuse and chaos, (b) to prevent the loss of alignment power resulting from a more chaotic world, and (c) to provide more time for Alignment research, then it is plausible that PTI (a) would become less impactful.
  • Misaligned AI control: If the PTIs are (c) as above, (d) to prevent extinction through controlling early misaligned AIs trying to take over, (e) to control misaligned early AIs to make them work on Alignment research, and (f) to create fire alarms (note: this somewhat contradicts path (b) above), then it is plausible that PTI (d) would be less impactful, since these early misaligned AIs may have a higher chance of not creating an SFC after taking over (e.g., they don't survive destroying humanity or don't care about space colonization).
    • Here is another, vaguer, diluted effect: if an intervention, like AI control, increases P(Humanity creates an SFC | Early Misalignment), then this intervention may need to be discounted more than if it only increased P(Humanity creates an SFC). Changing P(Humanity creates an SFC) may have no impact when the hypotheses are significantly correct, but increasing P(Humanity creates an SFC | Misalignment) is net negative, and Early Misalignment and (Late) Misalignment may be strongly correlated (see the sketch after this list).
  • AI evaluations: The reduced impact of paths (a) and (d) may also reduce the overall importance of this agenda.
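
To illustrate the "diluted effect" mentioned above, here is a minimal sketch (all numbers are invented, and Early and Late Misalignment are collapsed into a single variable for simplicity). It shows how an intervention that only raises P(Humanity creates an SFC | Misalignment) lowers P(Alignment | Humanity creates an SFC), which is the quantity that matters most if the hypotheses are significantly correct.

```python
# Hypothetical numbers; the structure of the calculation, not the values, is the point.
p_align = 0.5                      # P(Alignment)
p_sfc_given_align = 0.8            # P(SFC | Alignment)
p_sfc_given_misalign = 0.2         # P(SFC | Misalignment), before the intervention
p_sfc_given_misalign_after = 0.4   # ... after an intervention (e.g., AI control) that helps a
                                   # misaligned AI's civilization still create an SFC

def p_align_given_sfc(p_sfc_misalign: float) -> float:
    """Bayes: P(Alignment | Humanity creates an SFC)."""
    aligned_sfc_worlds = p_align * p_sfc_given_align
    misaligned_sfc_worlds = (1 - p_align) * p_sfc_misalign
    return aligned_sfc_worlds / (aligned_sfc_worlds + misaligned_sfc_worlds)

print(p_align_given_sfc(p_sfc_given_misalign))        # 0.80
print(p_align_given_sfc(p_sfc_given_misalign_after))  # ~0.67: P(Alignment | SFC) decreased
```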

These updates are, at the moment, speculative.

Sorry if that's not clear.

Do the reformulations in the initial summary help? The second bullet point is the most relevant.
 

  • (i) Significantly deprioritizing extinction risks, such as nuclear weapon and bioweapon risks.
  • (ii) Deprioritizing, to some degree, AI Safety agendas that mostly increase P(Humanity creates an SFC) but do not much increase P(Alignment | Humanity creates an SFC).
  • (iii) Giving more weight to previously neglected AI Safety agendas. E.g., a "Plan B AI Safety" agenda that would focus on decreasing P(Humanity creates an SFC | Misalignment), for example, by implementing (active & corrigible) preferences against space colonization in early AI systems.

Interesting and nice to read!

Do you think the following is right?

The larger the Upside-focused Colonist Curse, the fewer resources agents who care about suffering will control overall, and thus the smaller the risk of conflicts causing S-risks?

This may balance out the effect that the larger the Upside-focused Colonist Curse, the more neglected S-risks become.

A high Upside-focused Colonist Curse would produce fewer S-risks while at the same time making them more neglected.

Thanks for your response! 

Yet I am still not convinced that my reading doesn't make sense. Here are some comments:

  • "respondents were very uncertain"
    This seems to be both a reason why you could want to diversify your portfolio of interventions for reducing X-risks and a reason why someone could want to improve such estimates (of P(Nth scenario | X-risk)). But it doesn't seem to be a strong reason to discard the conclusions of the survey (it would be if we had more reliable information elsewhere).
  • "there's overlap between the scenarios":
    I am unsure, but the overlaps don't seem that big overall. In particular, the overlap between scenarios {1,2,3} and {4,5} doesn't seem huge. (I also wonder whether these overlaps illustrate that you could reduce X-risks using a broader range of interventions than just "AI alignment" and "AI governance".)
  1. The “Superintelligence” scenario (Bostrom, 2014)
  2. Part 2 of “What failure looks like” (Christiano, 2019)
  3. Part 1 of “What failure looks like” (Christiano, 2019)
  4. War (Dafoe, 2018)
  5. Misuse (Karnofsky, 2016)
  6. Other existential catastrophe scenarios.
  • "no 1-1 mapping between "fields" and risk scenarios"
    Sure, this would benefit from having a more precise model.
  • "Priority comparison of interventions is better than high-level comparisons"
    Right. But high-level comparisons are so much cheaper to do that it seems worth staying at that level for now.


The point I am especially curious about is the following:
- Is this survey pointing toward the importance of working on "Technical AI alignment", "AI governance", "Cooperative AI", and "Misuse limitation" all being within one order of magnitude (OOM) of each other?
By importance here I mean importance as in 80,000 Hours' ITN framework, not the overall priority, which would also include neglectedness, tractability, and looking at object-level interventions.

I am confused by this survey. Taken at face value, working on improving Cooperation would be only about 2x less impactful than working on hard AI alignment (looking only at the importance of the problem), and working on partial/naive alignment would be as impactful as working on AI alignment (again, looking only at importance).
Does that make sense?

(I make a bunch of assumptions to come up with these values. The starting point is the likelihood of the 5-6 X-risk scenarios. I then associate each scenario with a field (AI alignment, naive AI alignment, Cooperation) that reduces its likelihood. From that I produce the values above, and they stay similar even if I assume a 2-step model where some scenarios happen before others. Google sheet)
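
For concreteness, here is a rough sketch of the kind of calculation behind these numbers (the scenario probabilities and the scenario-to-field mapping below are placeholders I made up for this comment, not the survey's estimates nor the exact content of the Google sheet):

```python
# Placeholder scenario probabilities, NOT the survey's estimates.
p_scenario = {
    "Superintelligence (Bostrom, 2014)": 0.25,
    "What failure looks like, part 2 (Christiano, 2019)": 0.20,
    "What failure looks like, part 1 (Christiano, 2019)": 0.20,
    "War (Dafoe, 2018)": 0.15,
    "Misuse (Karnofsky, 2016)": 0.10,
    "Other existential catastrophe scenarios": 0.10,
}

# Rough guess at which field would mostly reduce each scenario's likelihood.
field_of = {
    "Superintelligence (Bostrom, 2014)": "hard AI alignment",
    "What failure looks like, part 2 (Christiano, 2019)": "hard AI alignment",
    "What failure looks like, part 1 (Christiano, 2019)": "partial/naive alignment",
    "War (Dafoe, 2018)": "Cooperative AI",
    "Misuse (Karnofsky, 2016)": "misuse limitation",
    "Other existential catastrophe scenarios": "other",
}

# "Importance" of a field = total probability of the scenarios it addresses.
importance: dict[str, float] = {}
for scenario, p in p_scenario.items():
    importance[field_of[scenario]] = importance.get(field_of[scenario], 0.0) + p

for field, imp in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{field}: {imp:.2f}")
# With these placeholder numbers, the fields all land within one order of
# magnitude of each other, which is the pattern the question above asks about.
```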

Thanks for this clarification! I guess the "capability increase over time around and after reaching human level" matters more than the "GDP increase over time" when assessing how hard alignment is. That's likely why I assumed takeoff meant the former. Now I wonder whether there is a term for "capability increase over time around and after reaching human level"...

Reading Eli's piece/writing this review persuaded me to be more sceptical of Paul style continuous takeoff[6] and more open to discontinuous takeoff; AI may simply not transform the economy much until it's capable of taking over the world[7].

From the post we don't get information about the acceleration rate of AI capabilities, only about its impact on the economy. This argument is thus against a slow takeoff with economic consequences, but not against a slow takeoff without much economic consequence.

So updating from that toward a discontinuous takeoff doesn't seem right. You should instead be updating from a slow takeoff with economic consequences toward a slow takeoff without economic consequences.

Does that make sense?