Another thing I think we should focus on if we are on track to solve alignment is concentration of power. Aligned ASI would make this problem more important.
Cross-posted from my website.
Three categories of futures, depending on how AI goes:
If we want to make a good future for all sentient beings, each of these futures has different implications for what we should work on.
...we can prioritize work that takes a long time to complete. That includes:
...the shape of the future will be determined by an aligned ASI. Therefore, we should steer toward a future where ASI cares about sentient welfare. Possible areas of work include:
If an aligned superintelligence creates a stable future where humans are empowered, then—some might argue—we can defer "long-timelines" work until we have superintelligent assistance. However, I cannot envision how we could get a stable future without solving some foundational problems first.
...none of those other types of work listed above will pay off. There's not much we can do for non-human welfare; step one is to prevent ASI from destroying all value in the universe.
Areas of work include:
Some plans rest on strong assumptions without making them explicit. When you pursue a strategy, you're making an implicit bet on which future you'll find yourself in: you're assuming that you live in the world where that strategy makes the most sense.
It's worth taking the time to probe our beliefs:
At the community level, we shouldn't bet everything on one future. (For individuals, it's often better to specialize.[1]) Some people should pursue long-timelines work; others should prioritize optimistic short-timelines work; still others should focus on pessimistic short timelines. It's worth considering what this balance ought to look like, and how we might get closer to the right balance.
A natural next question: What plausible futures are we neglecting? That's a question I want to spend more time thinking about.
[1] Individuals benefit from developing expertise over time. In most fields, it takes more than 80,000 person-hours for diminishing marginal utility of effort to kick in. The gains from increasing expertise outweigh the diminishing utility of marginal work.
Some objections:
In short timelines, we might have a small amount of calendar time with which to work on more difficult tasks, but a large amount of thinking/research time courtesy of automated AI researchers / research assistants. This might be important to the value of things like foundational research under short timelines.
Solving the alignment problem does not necessarily mean "the shape of the future will be determined by an aligned ASI". For example, an aligned Earth-originating ASI might run into a more powerful alien AI, and the shape of the future would then be determined by the alien AI. In that case, we care about the long-term effects of Earth-originating AI only insofar as they influence the decisions of the alien one. More prosaically, if we solve intent alignment but someone uses it to launch a coup, the resulting ASI might not be aligned with the interests of all humanity or with good values.
This isn't always true. For example, decision theory research could inform interventions on unaligned ASIs that make the future go better for sentient beings in expectation, without us having solved the full alignment problem.