David Johnston

671 karma


There are several AGI pills one can swallow. I think the prospects for a treaty would be very bright if CCP and USG were both uncontrollability-pilled. If uncontrollability is true, strong cases for it are valuable.

On the other hand, if uncontrollability is false, Aschenbrenner's position seems stronger (I don't mean that it necessarily becomes correct, just that it gets stronger).

Looks like someone should attempt a pivotal act. If you think you might be the right person for the job - you probably are!

I would take the proposal to be AI -> growth -> climate change or other negative side effects of growth.

I can see how this gets you  for each item , but not . One of the advantages Ozzie raises is the possibility to keep track of correlations in value estimates, which requires more than the marginal expectations.

So constructing a value ratio table means estimating a joint distribution of values from a subset of pairwise comparisons, then sampling from the distribution to fill out the table?

In that case, I think estimating the distribution is the hard part. Your example is straightforward because it features independent estimates, or simple functional relationships.
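A rough sketch of the "sample from a joint distribution, then fill out the table" step, to make the question concrete. The item names, the log-normal model and all its parameters are made up for illustration; this is not taken from the post. Items "a" and "b" share a common factor, so their ratio is tight even though each value is very uncertain, which is exactly the information that marginal expectations alone would lose.

```python
import math
import random
import statistics

ITEMS = ["a", "b", "c"]  # hypothetical items


def sample_values(rng):
    """Draw one joint sample of item values (assumed toy model).

    'a' and 'b' share a log-normal factor, so they are strongly
    correlated; 'c' is independent of both.
    """
    shared = rng.gauss(0.0, 1.0)
    return {
        "a": math.exp(shared + rng.gauss(0.0, 0.1)),
        "b": math.exp(shared + rng.gauss(0.0, 0.1) + math.log(2.0)),
        "c": math.exp(rng.gauss(0.0, 1.0)),
    }


def ratio_table_samples(n, seed=0):
    """Fill every cell (i, j) with n samples of value(i) / value(j)."""
    rng = random.Random(seed)
    table = {(i, j): [] for i in ITEMS for j in ITEMS}
    for _ in range(n):
        values = sample_values(rng)
        for i in ITEMS:
            for j in ITEMS:
                table[(i, j)].append(values[i] / values[j])
    return table


def log_spread(samples):
    """Standard deviation of the log-ratio, a scale-free uncertainty measure."""
    return statistics.stdev([math.log(s) for s in samples])


table = ratio_table_samples(5000)
print("spread of log(b/a):", log_spread(table[("b", "a")]))
print("spread of log(c/a):", log_spread(table[("c", "a")]))
```

The sampling step is trivial once the joint distribution exists; the sketch simply assumes `sample_values` is given, and that assumption is the hard part.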

The only piece of literature I had in mind was von Neumann and Morgenstern’s representation theorem. It says: if you have a set of probability distributions over a set of outcomes, and for each pair of distributions you have a preference (one is better than the other, or they are equal), and if this relation satisfies the additional requirements of transitivity, continuity and independence, then you can represent the preferences by the expected value of a utility function, which is unique up to positive affine transformation.
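In symbols, a standard way to state the conclusion is as follows. The notation is mine: \(p\) and \(q\) are distributions over a finite set of outcomes \(x\), \(\succeq\) is the preference relation, and \(u\) is the utility function.

```latex
p \succeq q
\quad\Longleftrightarrow\quad
\sum_{x} p(x)\, u(x) \;\ge\; \sum_{x} q(x)\, u(x),
\qquad
u' \text{ represents the same preferences}
\;\Longleftrightarrow\;
u' = a\,u + b \ \text{ for some } a > 0,\ b \in \mathbb{R}.
```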

Given that this is a foundational result for expected utility theory, I don’t think it is unusual to think of a utility function as a representation of a preference relation.

Do you envision your value ratio table to be underwritten by a unique utility function? That is, could we assign a single number \(u(x)\) to every outcome \(x\) such that the table cell corresponding to the outcome pair \((x, y)\) is always equal to \(u(x)/u(y)\)? These utilities could be treated as noisy estimates, which allows for correlations between \(u(x)\) and \(u(y)\) for some pairs.

My remarks concern what a value ratio table might be if it is more than just a “visualisation” of a utility function.

Because we are more likely to see no big changes than to see another big change.

if the risk is usually quite low (e.g. 0.001 % per century), but sometimes jumps to a high value (e.g. 1 % per century), the cumulative risk (over all time) may still be significantly below 100 % (e.g. 90 %) if the magnitude of the jumps decreases quickly, and risk does not stay high for long.

I would call this model “transient deviation” rather than “random walk” or “regular oscillation”

We can still get H4 if the amplitude of the oscillation or random walk decreases over time, right?

The average needs to fall, not the amplitude. If we're looking at risk in percentage points (rather than, say, logits, which might be a better parametrisation), small average implies small amplitude, but small amplitude does not imply small average.
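A quick numerical sketch of the average-versus-amplitude point, with made-up numbers. Cumulative risk over a horizon is one minus the product of the per-century survival probabilities. A small oscillation around a constant 0.1% average still drives cumulative risk towards 100%, whereas a per-century risk that starts at 1% and halves each century keeps it around 2%.

```python
def cumulative_risk(per_century_risks):
    """Probability of at least one catastrophe across all the given centuries."""
    survival = 1.0
    for risk in per_century_risks:
        survival *= 1.0 - risk
    return 1.0 - survival


HORIZON = 10_000  # centuries; an arbitrary long horizon for illustration

# Small amplitude, constant average: 0.1% per century, wobbling by +/- 0.05%.
oscillating = [0.001 + 0.0005 * (-1) ** t for t in range(HORIZON)]

# Falling average: starts at 1% per century and halves every century.
decaying = []
risk = 0.01
for _ in range(HORIZON):
    decaying.append(risk)
    risk *= 0.5

print("oscillating, constant average:", cumulative_risk(oscillating))
print("decaying average:", cumulative_risk(decaying))
```

The decaying series has total per-century risk summing to about 2%, which is why its cumulative risk stays bounded; the oscillating series has a sum that grows without limit as the horizon extends.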

Only if the sudden change has a sufficiently large magnitude, right?

The large magnitude is an observation - we have seen risk go from quite low to quite high over a short period of time. If we expect such large magnitude changes to be rare, then we might expect the present conditions to persist.

FWIW I think the general kind of model underlying what I’ve written is a joint distribution that models value something like

Thought about this some more. This isn't a summary of your work, it's an attempt to understand it in my terms. Here's how I see it right now: we can use pairwise comparisons of outcomes to elicit preferences, and people often do, but they typically choose to insist that each outcome has a value representable as a single number and use the pairwise comparisons to decide which number to assign each outcome. Insisting that each outcome has a value is a constraint on preferences that can allow us to compute which outcome is preferred between two outcomes for which we do not have direct data.
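A minimal sketch of the "one number per outcome" approach, to show how that constraint lets us infer comparisons we have no direct data for. Everything here is a hypothetical illustration: the items, the elicited ratios, and the choice of a least-squares fit on log-ratios (solved by simple coordinate descent) are my assumptions, not something from the post.

```python
import math


def fit_log_values(items, comparisons, reference, sweeps=200):
    """Least-squares fit of one log-value per item by coordinate descent.

    comparisons: list of (i, j, ratio), read as value(i) / value(j) ~ ratio.
    The reference item is pinned at log-value 0 to fix the overall scale.
    """
    log_value = {item: 0.0 for item in items}
    for _ in range(sweeps):
        for item in items:
            if item == reference:
                continue
            # Each comparison involving `item` suggests a target log-value.
            targets = []
            for i, j, ratio in comparisons:
                if i == item:
                    targets.append(log_value[j] + math.log(ratio))
                elif j == item:
                    targets.append(log_value[i] - math.log(ratio))
            if targets:
                # The mean of the targets minimises the squared error for this item.
                log_value[item] = sum(targets) / len(targets)
    return log_value


items = ["a", "b", "c"]
# Direct data only for (b, a) and (c, b); nothing compares c with a.
comparisons = [("b", "a", 2.0), ("c", "b", 3.0)]

fitted = fit_log_values(items, comparisons, reference="a")
implied_c_over_a = math.exp(fitted["c"] - fitted["a"])
print("implied value(c) / value(a):", implied_c_over_a)
```

The implied c-to-a ratio comes entirely from the single-value constraint; a bare table of pairwise preferences would leave that cell empty.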

I see this post as arguing that we should instead represent preferences as a table of value ratios. This is not about eliciting preferences, but representing them. Why would we want to represent them like this? At first glance:

  • If the important thing is that we represent preferences as a table, then we can capture every important comparison with a table of binary preferences.
  • If we want to impose additional constraints so that we can extrapolate preferences, preference ratios seem to push us back to assigning one or more values to every outcome.

What makes value ratios different from other schemes with multiple valuation functions is that value ratios give us a value function for each outcome we investigate. That is, there is a one-to-one correspondence between outcomes and value functions.

Here is a theory of why that might be useful: When we talk about the value of outcomes (such as "$5"), we are actually talking about that outcome in some context (such as "$5 for me now" or "$5 for someone who is very poor, now"). Preference relations can and do treat these outcomes as different depending on the context - $5 for me is worth less than $5 for someone who is very poor. Because of this, a value scale based on "$5-equivalents" will be different depending on the context of the $5.

A key proposition to motivate value ratios, Proposition 1: every outcome which we consider comes with a unique implied mixture of contexts. That is, if I say "the value of $5", I mean \(V(\$5 \mid C_{\$5})\), where \(C_{\$5}\) is the mixture of contexts implied by my having said "$5".

This means, if I want to compare "the value of $10m" to "the value of saving a child's life", I have two options: I can compare \(V(\$10\text{m} \mid C_{\$10\text{m}})\) to \(V(\text{life} \mid C_{\$10\text{m}})\), or I can compare \(V(\$10\text{m} \mid C_{\text{life}})\) to \(V(\text{life} \mid C_{\text{life}})\). These might give me different answers, and the correct comparison depends on which applied context I am considering these options in.

A value ratio table could therefore be considered a table where each column is a context and each row specifies the relative value of the given item in that context. Note that, under this interpretation, we should not expect \(V(A \mid C_A)/V(B \mid C_A) = V(A \mid C_B)/V(B \mid C_B)\), unless \(C_A = C_B\). This is because items have different values in different contexts.
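A toy sketch of the column-as-context reading, with entirely made-up numbers. Here `value_in_context[c][x]` stands for the value of item `x` judged under the mixture of contexts implied by item `c` (my notation for this sketch only). Each cell uses the context of its column, so the cell for (life, $10m) and the cell for ($10m, life) need not be reciprocals.

```python
# Hypothetical values: outer key = item whose implied context is used,
# inner key = item being valued in that context.
value_in_context = {
    "$10m": {"$10m": 1.0, "life": 0.5},
    "life": {"$10m": 1.0, "life": 4.0},
}


def ratio_cell(row, col):
    """Value of `row` relative to `col`, judged in the context implied by `col`."""
    context = value_in_context[col]
    return context[row] / context[col]


life_vs_money = ratio_cell("life", "$10m")  # judged in the "$10m" context
money_vs_life = ratio_cell("$10m", "life")  # judged in the "life" context

# In a table backed by a single utility function this product would be 1.
product = life_vs_money * money_vs_life
print("life vs $10m:", life_vs_money)
print("$10m vs life:", money_vs_life)
print("product of the two cells:", product)
```

If both columns used the same context, the product would be exactly 1 and the table would collapse back to a single value function.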

This can be extended to distributions over value ratios, in which case perhaps each sample comes with a context sampled from the distribution of contexts for that column of the table (I'm not entirely sure that works, but maybe it does). This can allow us to represent within-column correlations if we know that one outcome is \(k\) times better than another, regardless of context.

I don't think proposition 1 is plausible if we interpret it strictly. I'm pretty sure at different times people talk about the value of $5 with different implied contexts, and at other times I think people probably make some effort to consider the value of quite different outcomes in a common context. However, I think there still might be something to it. Whenever you're weighing up different outcomes, you definitely have an implicit context in mind. Furthermore, there probably is a substantial correlation between the context and the outcome - if two different people are considering the value of saving a child's life then there probably is substantial overlap between the contexts they're considering. Moreover, it's plausible that context sensitivity is an issue for the kinds of value comparisons that EAs want to make.
