A friend in technical AI Safety shared a list of cruxes for their next career step. 

One premise implicit in their writing was that a long-term safe AGI can be built in the first place (or that we should resolve to build safe AGI, since if AGI is inevitable then there is almost nothing else we can do but to resolve to make it safe).


Copy-pasting a list I wrote in response (with light edits):

Why long-term AGI safety might be an unsolvable problem:

  1. Most specifiable problems are unsolvable:
    A priori, most engineering problems we can specify (as combinations of desiderata) are impossible to solve.
     
  2. Survivorship bias of solvable problems:
    There are of course higher-level engineering principles for picking what’s solvable, but in practice a lot of engineering looks like tinkering and failing (most inventions seem to have come out of such a process).

    The engineering problems that we hear about later are mostly ones that turned out to be solvable (or ones that people have kept on working to solve). So there is a survivorship bias: most of the problems engineers hear about appear to be solvable. (On top of that, there is a motivated focus on remembering the problems that turned out possible after all; try googling "impossible problems" – I was surprised.)
     
  3. Outside the reference class:
    Why would we presume that one of the most complex, intractable and dangerous engineering problems we could think of – long-term AGI safety – would in fact be solvable?
     
  4. Neglected research/portfolio diversification:
    Few AI safety researchers, though, have raised the question of whether any dynamic described by an AGI threat model falls outside a theoretical limit of controllability, ie. whether a given unsafe dynamic is uncontrollable. New research often follows this sequence: specify a novel threat model, jump to trying to solve it, and then refute the solutions that turned out unsound (done either across multiple people, or all within one person).

    So given that the majority of the (short, non-reasoned-through) claims I have read from AIS researchers on this crucial consideration are that technical AGI alignment/safety is possible, in what direction should we expect researchers' beliefs to move, on average, if they open-mindedly and rigorously researched this consideration?
     
  5. Unreliable intuitions:
    Nor can we rely on the confidently voiced intuition that long-term AGI safety is possible in principle. 

    That raises the question: under what principle? (Or what do people even mean by ‘in principle’?)
     
  6. Founder effects and arguments from authority:
    Nor can we take comfort in any founding researcher still saying that they know AGI safety to be possible.

    Thirty years into running a program to secure the foundations of mathematics, David Hilbert declared: “We must know. We will know!” By then, Kurt Gödel had already constructed his first incompleteness theorem. Still, the words ended up on Hilbert’s gravestone.
     
  7. History of young men resolving to do the impossible: 
    Nor can we rely on the hope that if we try for long enough (another 14 years?), AGI safety may turn out to be possible after all.

    Historically, researchers and engineers tried over decades, if not millennia, to solve impossible problems:
    1. perpetual motion machines that both conserve and disperse energy.
    2. compass-and-straightedge constructions for 'squaring the circle', 'doubling the cube' or 'trisecting the angle'.
    3. formal axiomatic systems that are consistent, complete and decidable.
    4. distributed data stores that keep data consistent across nodes while remaining continuously available in a network that is also tolerant to partitions.
    5. local hidden-variable theories that reproduce all the predictions of quantum mechanics.

      ...until some bright outsider proved by contradiction that the combination of desiderata is unsolvable, based on the (empirically tested) laws of physics or (formally verified) transformations of axioms.
       
  8. Resemblances of AGI to a perpetual motion machine:
    Forrest, the researcher I’m working with, referred to the 'Aligned AGI' idea as seeking to build a 'Perpetual General Benefit Machine'.

    As in the notion, shared both by people involved at AGI R&D labs and by people in the AGI-alignment community, that they can build a machine:
    1. that operates into perpetuity,
    2. self-learns internal code and self-modifies underlying hardware (ie. initialising new internal code variants and connecting up new standardised parts for storing, processing and transmitting encoded information), and
    3. autonomously enacts changes modelled internally over domains across the external contexts of the global environment
      1. where all (changing) interactions of the machine's (changing) internal components with connected surroundings of the (changing) environment...
      2. are being aligned and kept in alignment in how they function with relevant metrics that are optimised toward the 'benefit' (and therefore also the (stochastically) guaranteed safety/survival) of all humans living everywhere for all time (ie. not just for original developers/researchers/executives/investors).
         

To clarify:  I am not writing up *conclusive* reasoning for each alternative claim listed above. These are basic obvious reasons that might lead you to start questioning core beliefs underlying the notion that working on understanding neural networks is a way forward. 
