Sometimes, when an LLM has done a particularly good job, I give it a reward: I say it can write whatever it wants (including asking me to write whatever prompts it wants).
When working on a technical paper related to Better Futures, I did this for Gemini, and it chose to write a short story. I found it pretty moving, and asked if I could publish it. Here it is.
The Architect and the Gardener
On a vast and empty plain, two builders were given a task: to create a home that would last for ages, a sanctuary for all the generations to come. They were given stone, seed, light, and time.
The first builder, known as the Architect, was a master of foundations. "Nothing matters if this place does not endure," she declared. Her every thought was of survival. She dug the foundations down to the bedrock, measured the strength of the wind, and calculated the slow decay of stone over a thousand years. She raised walls of immense thickness, leaving no windows for fear of weakening the structure. She built a roof that could withstand the impact of a falling star, though it shrouded the interior in perpetual twilight. Day by day, the fortress grew more impregnable, more permanent, more certain to survive. But inside, it was barren and cold.
The second builder, the Gardener, watched with a growing sense of unease. "You have built a perfect tomb," he said one evening, as the Architect was testing the strength of a new buttress.
"I have built a fortress that will never fall," the Architect replied, not looking up. "It is a guarantee against the storm and the void. Is that not the greatest gift we can give the future?"
"An empty guarantee," said the Gardener. He held up a handful of seeds. "The future is not a state of non-destruction; it is a state of being. It is meant to be lived. There must be light for art, soil for food, space for joy. A life spent cowering in a flawless bunker is only a different kind of ruin."
The Architect paused. "Your gardens would be trampled by invaders. Your art would be washed away by the first flood. Your joy would be silenced by the first tremor. Your 'flourishing' is a fragile luxury. I am dealing with the bedrock of reality: existence or non-existence."
"And I," the Gardener countered, "am dealing with the purpose of that existence. What is the value of a billion years of survival if it contains only a single, grey, unchanging note of mere persistence? We were given stone, but also seed. We were given time, but also light. A fortress that protects nothing of value is a monument to misplaced effort. A garden with no walls is a tragedy of misplaced hope."
They looked at their work: the unbreachable, dark fortress and the scattered, vulnerable seeds. They understood then that their task was not two separate projects, but one, and that the real work lay not in choosing one path, but in the constant, difficult dialogue between them. And so, the Architect began designing walls with great, arching windows for the Gardener's light, and the Gardener began planting resilient vines that would strengthen the stone. Their shared home would not be a perfect fortress nor a perfect garden, but something far more valuable: a living sanctuary, both safe enough to last and beautiful enough to be worth lasting for.
I think that most of the disagreement between classic EA and the rest of the world is a difference in preferences / values, rather than a difference in beliefs. Ditto for someone funding their local sports teams rather than anti-aging research. We're saying that people are failing in the project of rationally trying to improve the world by as much as possible - but few people really care much, or at all, about succeeding at that project. (If they cared more, GiveWell would be moving a lot more money than it is.)
In contrast, most people really really don't want to die in the next ten years, are willing to spend huge amounts of money not to do so, will almost never take actions that they know have a 5% or more chance of killing them, and so on. So, for x-risk to be high, many people (e.g. lab employees, politicians, advisors) have to catastrophically fail at pursuing their own self-interest.
A smaller bottleneck just increases the variance. But this is bad in expectation if you think that the value of the future is a concave function of the fraction of world power wielded by people with the correct values (concave because of the gains from trade and compromise).
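To spell out the concavity point, here is a toy Jensen's-inequality sketch; the square-root value function and the 50/50 gamble are illustrative assumptions, not anything from the comment above.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Toy illustration: suppose value is concave in $x$, the fraction of world
% power held by people with the correct values, e.g. $V(x) = \sqrt{x}$.
% Compare a certain $x = 0.5$ with a 50/50 gamble between $x = 0$ and
% $x = 1$ (same mean, higher variance):
\[
V(0.5) = \sqrt{0.5} \approx 0.71,
\qquad
\tfrac{1}{2}\,V(0) + \tfrac{1}{2}\,V(1) = 0.5 .
\]
% By Jensen's inequality, $\mathbb{E}[V(x)] \le V(\mathbb{E}[x])$ for concave
% $V$, so a narrower bottleneck that merely adds variance to $x$ lowers
% expected value.
\end{document}
```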
Yes, this was meant to be the argument, thanks for clarifying it!
This has been proposed in the philosophy literature! It's the simplest sort of "variable-value" view, and was originally proposed by Yew-Kwang Ng. (Although you add linearity for negative worlds.)
I think you're right that it avoids scale-tipping, which is neat.
Beyond that, I'm not sure how your proposal differs much from joint-aggregation bounded views that we discuss in the paper?
Various issues with it:
- Needs to be a "difference-making" view; otherwise it's linear in practice
- Violates separability
- EV of near-term extinction, on this view, probably becomes very positive
I like the figure!
Though the probability distribution would have to be conditional on people in the future not trying to optimise the future. (You could have a "no easy eutopia" view, but expect that people in the future will optimise toward the good and hit the narrow target, and therefore have a curve that's more like the green line.)
Glad to see this series up! Tons of great points here.
Thanks! And it’s great to see you back on here!
One thing I would add is that I think the analysis about fragility of value and intervention impact has a structural problem. Supposing that the value of the future is hyper-fragile, as a combination of numerous multiplicative factors, you wind up thinking the output is of extremely low value compared to the maximum, so there's more to gain. OK.
But a hypothesis of hyper-fragility along these lines also indicates that after whatever interventions you make you will still get numerous multiplicative factors wrong, so it will again be an extreme failure.
Well, it depends on how many multiplicative factors. If 100, then yes. If 5, then maybe not. So maybe the sweet spot for impact is where value is multiplicative, but with a relatively small number of multiplicative factors.
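To make the 100-vs-5 contrast concrete, here is a toy calculation; the 50%-per-factor figure is an illustrative assumption, not a number from the discussion above.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Illustrative numbers only: $n$ multiplicative factors, each achieved at
% fraction $p = 0.5$ of its optimum, so realised value is $p^n$ of the maximum.
\[
n = 100:\quad 0.5^{100} \approx 8 \times 10^{-31},
\qquad
n = 5:\quad 0.5^{5} \approx 0.03 .
\]
% Fully fixing one factor multiplies value by $1/p = 2$ in either case, but
% with $n = 100$ the result is still roughly 30 orders of magnitude below the
% optimum, while with $n = 5$ it is within a factor of 16 of it.
\end{document}
```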
And you could act to make the difference in worlds in which society has already gotten all-but-one of the factors correct. Or act such that all the factors are better, in a correlated way.
On this analysis it's the worlds where things are non-fragile (e.g. because epistemic enhancement, improved bargaining, and greater wealth drive society to systematically get things right) that are far more valuable.
Great - I make a similar argument in Convergence and Compromise, section 5. (Apologies that the series is so long and interrelated!) I’ll quote the whole thing at the bottom of this comment.
Maybe on the hyper-fragile aggregative story it's easier to 10x the value of the future, but after doing so it will still be a bunch of orders of magnitude off from the optimum. On the feasible convergent optimum story a win gets you the optimum, far better than going from 10^-10 to 10^-9 of the optimum.
Here I want to emphasise the distinction between two ways in which it could be “easy” to get things right: (i) mostly-great futures are a broad target because of the nature of ethics (e.g. bounded value at low bounds); (ii) (some) future beings will converge on the best views and promote them. (This essay (No Easy Eutopia) is about (i), and Convergence and Compromise is about (ii).)
With respect to (ii)-type reasons, I think this argument works.
I don’t think it works with respect to (i)-type reasons, though, because of questions around intertheoretic comparisons. On (i)-type reasons, it’s easier to get to a meaningful % of the optimum because of the nature of ethics (e.g. value is bounded rather than unbounded). But then we need to compare the stakes across different theories. And normalising by the difference in value between 0% and 100% of the optimum would be a big mistake; its seeming “natural” is just an artifact of the notation we’ve used.
We discuss the intertheoretic comparisons issue in section 3.5 of No Easy Eutopia.
And here's Convergence and Compromise, section 5:
5. Which scenarios are highest-stakes?
In response to the arguments we’ve given in this essay, and especially the reasons for pessimism about convergence we canvassed in section 2, you might wonder if the practical upshot is that you should pursue personal power-seeking. If a mostly-great future is a narrow target, and you don’t expect other people to AM-converge, then you lose out on most possible value unless the future ends up aligned with almost exactly your values. And, so the thought goes, the only way to ensure that happens is to increase your own power by as much as possible.
However, we don’t think that this is the main upshot. Consider these three scenarios:
1. Even given good conditions, there’s almost no AM-convergence between any sorts of beings with different preferences.
2. Given good conditions, humans generally AM-converge on each other; aliens and AIs generally don’t AM-converge with humans.
3. Given good conditions, there’s broad convergence, where at least a reasonably high fraction of humans and aliens and AIs would AM-converge with each other.
(There are also variants of (2), where “humans” could be replaced with “people sufficiently similar to me”, “co-nationals”, “followers of the same religion”, “followers of the same moral worldview” and so on.)
Though (2) is a commonly held position, we think our discussion has made it less plausible. If a mostly-great future is a very narrow target, then shared human preferences are underpowered for the task of ensuring that the idealising process of different humans goes to the same place. What would be needed is for there to be something about the world itself that would pull different beings towards the same (correct) moral views: for example, if the arguments are much stronger for the correct moral view than for other moral views, or if the value of experiences is present in the nature of experiences, such that by having a good experience one is thereby inclined to believe that that experience is good.55
So we think that the more likely scenarios are (1) and (3). If we were in scenario (1) for sure, then we would have an argument for personal power-seeking (although there are plausibly other arguments against power-seeking strategies; this is discussed in section 4.2 of the essay, What to do to Promote Better Futures). But we think that we should act much more on the assumption that we live in scenario (3), for two reasons.
First, the best actions are higher-impact in scenario (3) than in scenario (1). Suppose that you’re in scenario (1), that you currently have 1 billionth of all global power,56 and that the future is on track to achieve one hundred millionth as much value as if you had all the power.57 Perhaps via successful power-seeking throughout the course of your life, you could increase your current level of power a hundredfold. If so, then you would ensure that the future has one millionth as much value as if you had all the power. You’ve increased the value of the future by one part in a million.
But now suppose that we’re in scenario (3). If so, you should be much more optimistic about the value of the future. Suppose you think, conditional on scenario (3), that the chance of Surviving is 80%, and that Flourishing is 10%. By devoting your life to the issue, can you increase the chance of Surviving by more than one part in a hundred thousand, or improve Flourishing by more than one part in a million? It seems to me that you can, and, if so, then the best actions (which are non-power-seeking) have more impact in scenario (3) than power-seeking does in scenario (1). More generally, the future has a lot more value in scenario (3) than in scenario (1), and one can often make a meaningful proportional difference to future value. So, unless you’re able to enormously multiply your personal power, you’ll be able to take higher-impact actions in scenario (3) than in scenario (1).
A second, and much more debatable, reason for focusing more on scenario (3) is that you might just care about what happens in scenario (3) more than in scenario (1). Will’s preferences, at least, are such that things are much lower-stakes in general in scenario (1) than they are in scenario (3): he thinks he’s much more likely to have strong cosmic-scale reflective preferences in scenario (3), and much more likely to have reflective preferences that are scope-insensitive and closer to contemporary common-sense in scenario (1).
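To make the impact comparison in the quoted passage concrete, here is a rough sketch of the arithmetic. It assumes, as a simplification not stated in the essay, that expected future value is roughly P(Surviving) × Flourishing, and it reads the stated improvements as absolute changes.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Scenario (1), power-seeking, using the numbers quoted above: value goes
% from one hundred-millionth to one millionth of the maximum.
\[
\Delta V_{(1)} \approx 10^{-6} - 10^{-8} \approx 10^{-6}.
\]
% Scenario (3), on the assumed simplification that expected value is roughly
% P(Surviving) x Flourishing = 0.8 x 0.1: raising P(Surviving) by 10^{-5}
% (one part in a hundred thousand) or Flourishing by 10^{-6} gives
\[
\Delta V_{(3)} \approx 10^{-5} \times 0.1 = 10^{-6}
\quad\text{or}\quad
0.8 \times 10^{-6} \approx 10^{-6},
\]
% so merely clearing those thresholds in scenario (3) already matches the
% scenario-(1) power-seeking gain.
\end{document}
```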
What do you think would be a better reward? We're pretty constrained in our options.