
[Epistemic status: Shallow dive into research questions, backed by some years of on-and-off thinking about this kind of plan.]

Introduction

There is some chance that civilization will cease to function before we hit an intelligence explosion. If it does, it would be good to preserve existing alignment research for future generations who might rebuild advanced technology, and ideally have safe havens ready for current and future researchers to spend their lives adding to that pool of knowledge.

This might delay capabilities research by many decades, centuries, or longer while allowing basic theoretical alignment research to continue, and so be a potential Yudkowskian positive model violation for which we should prepare.

Setting this infrastructure up is a massively scalable intervention, and one that should likely be tackled by people who are not already on the researcher career path. It would have been good to get started some years ago given recent events, but now is the second best time to plant a tree.[1]

Preserving alignment knowledge through a global catastrophe

What data do we want to store?

Thankfully, the EleutherAI people are working on a dataset of all alignment research[2]. It's still a WIP[3] and contributions to the scripts that collect it are welcome, so if you're a programmer looking for a shovel-ready way to help with this, consider submitting a PR[4].

How do we want to store it?

My shallow dive into this uncovered these options:

  • We could print it out on paper
    • Lifetime: 500+ years in good conditions (might depend significantly on paper and ink quality, more research needed). Vacuum sealing it with low humidity seems like it would help significantly.
    • Pros: Totally human readable.
  • Microsoft's Project Silica is the longest lasting option I could find
    • Lifetime: 10000+ years
    • Cons: Would require high levels of technology to read it back. I'm not seeing an option to buy the machines required to write new archives and expect them to be very advanced/expensive, so this would be limited to storing pre-collapse research.
  • CDs could be a minimalist option
    • Lifetime: Maybe 50 years if stored in good conditions
    • Pros: Researchers can easily explore the information on computers while those last.
    • Cons: It's very plausible that a severe GCR[5] would set us back far enough that we'd not regain CD reading technology before they decayed, so they aren't a full solution.
  • The Arctic World Archive seems worth including in the portfolio
    • Lifetime: 1000+ years
    • Pros: It's a pretty straightforward case of turning money into archives
    • Cons: Not very accessible in the meantime
  • The DOTS system (a highly stable tape-based storage medium) might be a strong candidate, if it is buyable.[6]
    • Lifetime: 200-2000+ years
    • Pros: Human readable or digital archives, possibly usable for some time after collapse.

Each has advantages, so some combination of them might be ideal.
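As a rough illustration of how the portfolio could be compared, here is a minimal sketch that filters the options above by how long recovery might take. The lifetimes are the lower bounds quoted in the list (DOTS uses the low end of its 200-2000+ range). The dictionary and function names are hypothetical, and the sketch ignores readability, cost, and whether the reading technology still exists.

```python
# Lower-bound lifetimes in years, copied from the list above.
# These are the post's rough figures, not measured values.
MEDIA_LIFETIME_YEARS = {
    "Paper": 500,
    "Project Silica": 10_000,
    "CD": 50,
    "Arctic World Archive": 1_000,
    "DOTS": 200,  # low end of the quoted 200-2000+ range
}


def media_surviving(recovery_years):
    """Return the media whose lower-bound lifetime covers the gap, longest-lived first."""
    survivors = [
        (name, lifetime)
        for name, lifetime in MEDIA_LIFETIME_YEARS.items()
        if lifetime >= recovery_years
    ]
    survivors.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in survivors]


if __name__ == "__main__":
    for gap in (30, 300, 3000):
        print(f"{gap} years until recovery: {media_surviving(gap)}")
```

Under these assumptions only some of the options cover a multi-century gap, which is one way of seeing why a combination is attractive.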

Where do we store it?

Having many redundant backups seems advisable, preferably protected by communities which can last centuries or placed in locations which will not be disturbed for a very long time. Producing "alignment backup kits" to send out and offering microgrants to people all around the world to place them in secure locations would achieve this. We'd likely want basic (just pre-collapse work) and advanced (capable of adding archives for a long time post-collapse) options.
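To make the case for redundancy concrete, here is a small sketch of the arithmetic. It assumes each kit survives independently with the same probability, which is a simplification (a single catastrophe could destroy many kits at once), and the probabilities used are purely illustrative, not estimates. The function names are hypothetical.

```python
def prob_any_survives(n_copies, p_single):
    """Probability that at least one of n independent copies survives."""
    if n_copies < 0:
        raise ValueError("n_copies must be non-negative")
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    # Complement of the event that every copy is lost.
    return 1.0 - (1.0 - p_single) ** n_copies


def copies_needed(p_single, target):
    """Smallest number of independent copies reaching the target survival probability."""
    if not 0.0 < p_single <= 1.0:
        raise ValueError("p_single must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 0
    while prob_any_survives(n, p_single) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative only: a 10% per-kit survival chance is an assumption, not data.
    print(copies_needed(0.10, 0.90))
```

Even if each individual kit is unlikely to last, a few dozen independently placed kits can make it likely that at least one does, provided their failure modes really are independent. That is an argument for spreading them across different regions and storage types.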

If you'd like to take on the challenge of preparing these kits, storing an archive, or coordinating things, please join the Alignment After A GCR Discord (AAAG). I'm happy to collaborate and give some seed funding. If you want to help collect and improve the archive files, #accelerating-alignment on EAI is the place to go.

Continuing alignment research after a global catastrophe

It is obviously best if as many people survive the GCR as possible, and supporting the work of organizations like the Alliance to Feed the Earth in Disasters seems extremely valuable. However, a targeted intervention to focus on allowing alignment researchers to continue their work in the wake of a disaster might be an especially cost-effective way to improve the long-term future of humanity.

Evacuation plans

A list of which researchers to prioritize would need to be drawn up.[7] They would need instructions on how to get to the haven, and ideally someone with reliable transport to take them there. At moments of extreme risk, they would be encouraged to preemptively (and hopefully temporarily) move to the haven.

Designing havens

The locations would need to be bought, funded, and partially populated before the GCR.[8] I have some ideas about which other subcultures might be good to draw from, with the Authentic Relating community top of the list.[9]

The havens would need to be well-stocked to weather the initial crisis and recover after. They should be located in places where farming or fishing could produce a surplus in the long term to allow some of the people living there to spend much of their time making research progress. Being relatively far from centers of population seems beneficial, but close enough to major hubs that transport is practical. There are many considerations, and talking to ALLFED to get their models of how to survive GCRs seems like an obvious first step to plan this.

Avoiding the failure mode of allowing so many people to join that the whole group goes under would be both challenging and necessary. Clear rules would have to be agreed on for who could join.

The culture would need to be set up to be conducive to supporting research in the long term while being mostly self-sufficient. This would be an interesting challenge in community design. People with the skills to produce food and other necessities would need to be part of the team.

Call to action

Even more than archiving, this needs some people to make it their primary project in order for it to happen. That could include you! I would be happy to provide advice, mentorship, connections, and some seed funding to a founder or team who wants to take this project on.[10] Message me here or @A_donor on the Discord.

This project could also benefit from volunteers for various roles. If you or someone you know would like to help by

  • Searching for locations
  • Potentially moving to a haven early and helping set up
  • Researching questions
  • Putting us in contact with people who might make this work (e.g. people with experience in self-sufficient community building)
  • Doing other tasks to increase the chances that we recover from GCRs with a strong base of alignment theory

Please join the Discord and introduce yourself, specifically indicating that you'd like to help with havens so I know to add you to those channels.

I can fund the very early stages of both projects, but in order to scale them to something really valuable we would need major funders on board. If you are, or have access to, a major funder and want to offer advice or encouragement to apply, that would increase the chances that this goes somewhere.

It's quite likely that I won't post public updates about the havens part of this project even if it's going relatively well, as having lots of attention on it seems net-negative, so don't be surprised if you don't hear anything more.

  1. ^

    "The best time to plant a tree is twenty years ago. The second best time is now." - Quote

  2. ^

    They want to use it to train language models to help with alignment research, but it aims to contain exactly what we'd want.

  3. ^

    Work In Progress

  4. ^

    Pull Request - A way of suggesting changes to a repository using version control, usually used in programming.

  5. ^

    Global Catastrophic Risk - An event which causes massive global disruption, such as a severe pandemic or nuclear war.

  6. ^

    The website is unclear on whether it's immediately available.

  7. ^

    If you're a researcher and want to be on the list, feel free to contact me with your location and I'll keep track of everyone's requests. We might possibly use Alignment EigenKarma as an unbiased metric to prioritize if that exists in time.

  8. ^

    Unless anyone knows of good places which might be joinable already. If you do, please message me!

  9. ^

    They are compatible with Rationalist/EA culture, more likely than most to be able to create stable communities, and some of them like the idea of building strong community for the benefit of all of humanity.

  10. ^

    I have a reasonably strong track record as a Mentor/Manager/Mysterious Old Wizard/Funder package deal. If you're enthusiastic and bright don't worry if the task seems overwhelming, I can help you pick up the skills and decompose tasks.


Comments (11)



This is a good idea. And I think it can be extrapolated to preserving relevant records of our civilization / species so eventual successors won't have so much trouble thinking about Fermi paradox and great filters. Idk, maybe a vault on the moon? Mars? (I submitted an entry to the FTX contest on that, btw. Now I realize how this idea first came to my mind: Cixin Liu's Death's End)

[As posted in the Discord]. An MVP of this might be making offline copies of the AI Alignment Forum, EA Forum and LessWrong available using an app like Kiwix, and encouraging EAs to download them. Bonus if they are automatically updated every month or so. Next step for resilience would be burying old phones with copies of the content on them.

As an avid user of Kiwix, I'd be very interested in any of those.

Agreed that civilization restart manuals would be good, would be happy to have the alignment archives stored alongside those. Would prefer not to hold up getting a MVP of this much smaller and easier archive in place waiting for that to come together though.

The purpose of preserving alignment is not to get back to AI as quickly as possible, but to make it more likely that when we eventually do climb the tech tree we are more likely to be able to align advanced AIs. Even if we have to reinvent a large number of technologies, having alignment research ready represents a (slightly non-standard) form of differential technological development rather than simply speeding up the recovery overall.

Can someone port it to Kiwix (or similar, for offline reading on a phone)? (I'm happy to fund this)

Maybe also worth considering stone tablets?

My guess is these are great for longevity, but maybe prohibitively expensive[1] if you want to print out e.g. the entire alignment forum plus other papers.

Could be good for a smaller selected key insights collection, if that exists somewhere?

  1. ^

    Likely reference class is gravestones. I'm getting numbers like: "Extra characters are approximately $10 thereafter" and "It costs around £1.95 per letter or character". Even with a bulk discount that's going to add up.

Now I'm imagining a friendly AGI etching the Textbook From The Future on stone tablets. But it would be an interesting exercise to try and condense the key insights made to date into 1k or 10k characters.
