Looking back, it appears that much of my intellectual output could be described as legibilizing work, or trying to make certain problems in AI risk more legible to myself and others. I've organized the relevant posts and comments into the following list, which can also serve as a partial guide to problems that may need to be further legibilized, especially beyond LW/rationalists, to AI researchers, funders, company leaders, government policymakers, their advisors (including future AI advisors), and the general public.

  1. Philosophical problems
    1. Probability theory
    2. Decision theory
    3. Beyond astronomical waste (possibility of influencing vastly larger universes beyond our own)
    4. Interaction between bargaining and logical uncertainty
    5. Metaethics
    6. Metaphilosophy: 1, 2
  2. Problems with specific philosophical and alignment ideas
    1. Utilitarianism: 1, 2
    2. Solomonoff induction
    3. "Provable" safety
    4. CEV
    5. Corrigibility
    6. IDA (and many scattered comments)
    7. UDASSA
    8. UDT
  3. Human-AI safety (x- and s-risks arising from the interaction between human nature and AI design)
    1. Value differences/conflicts between humans
    2. “Morality is scary” (human morality is often the result of status games amplifying random aspects of human value, with frightening results)
    3. Positional/zero-sum human values, e.g. status
    4. Distributional shifts as a source of human safety problems
      1. Power corrupts (or reveals) (AI-granted power, e.g., over future space colonies or vast virtual environments, corrupting human values, or perhaps revealing a dismaying true nature)
      2. Intentional and unintentional manipulation of / adversarial attacks on humans by AI
  4. Meta / strategy
    1. AI risks being highly disjunctive, potentially causing increasing marginal return from time in AI pause/slowdown (or in other words, surprisingly low value from short pauses/slowdowns compared to longer ones)
    2. Risks from post-AGI economics/dynamics, specifically high coordination ability leading to increased economy of scale and concentration of resources/power
    3. Difficulty of winning AI race while being constrained by x-safety considerations
    4. Likely offense dominance devaluing “defense accelerationism”
    5. Human tendency to neglect risks while trying to do good
    6. The necessity of AI philosophical competence for AI-assisted safety research and for avoiding catastrophic post-AGI philosophical errors
    7. The problem of illegible problems

Having written all this down in one place, it's hard not to feel some hopelessness about whether all of these problems can be made legible to the relevant people, even with a maximum plausible effort. Perhaps one source of hope is that they can be made legible to future AI advisors. As many of these problems are philosophical in nature, this seems to come back to the issue of AI philosophical competence that I've often talked about recently, which itself seems largely still illegible and hence neglected.

Perhaps it's worth concluding with a point from a discussion between @WillPetillo and myself under the previous post: a potentially more impactful approach (compared to trying to make illegible problems more legible) is to make key decisionmakers realize that important safety problems illegible to themselves (and even to their advisors) probably exist, and that it is therefore very risky to make highly consequential decisions (such as about AI development or deployment) based only on the status of legible safety problems.

Comments

I very much agree with this, and have been struggling with a similar problem in terms of achieving high-value futures versus mediocre ones.

I think there may be some sort of “Fragile Future Value Hypothesis,” somewhat related to Will MacAskill’s “No Easy Eutopia” (and the essay that follows it in the series) and somewhat isomorphic to the “Vulnerable World Hypothesis.” The idea is that there may be many path dependencies, potentially leading to many low- and medium-value attractor states we could end up in, because in expectation we are somewhat clueless about which crucial considerations matter, and if we act wrongly on any of them, we could lose most or even nearly all future value.

I also agree that making the decisionmakers working on AI highly aware of this could be an important solution. The problem isn’t so much that people at the labs don’t care about future value; they are often quite explicitly utopian. Rather, it seems to me that they don’t have much awareness that near-best futures might actually be highly contingent and very difficult to achieve, and the illegibility of this fact means they are not really trying to be careful about which path they set us on.

I also agree that getting advanced AI working on these types of issues as soon as it is able to meaningfully assist could be an important solution, and I intend to start working on this as one of my main objectives. I’ve been a bit more focused on macrostrategy than philosophy, because I think macrostrategy might be more feasible for current or near-future AI, and if we get into the right strategic position, that could then position us to figure out the philosophy, which I think is going to be a lot harder for AI.
