Anthony DiGiovanni

1170 karma

Bio

Researcher at the Center on Long-Term Risk. I (occasionally) write about altruism-relevant topics on my Substack. All opinions my own.

Comments
162

It seems to me that you need to weight the probability functions in your set according to some intuitive measure of your plausibility, according to your own priors.

The concern motivating the use of imprecise probabilities is that you don't always have a unique prior you're justified in using to compare the plausibility of these distributions. In some cases you'll find that any choice of unique prior, or unique higher-order distribution for aggregating priors, involves an arbitrary choice. (E.g., arbitrary weights assigned to conflicting intuitions about plausibility.)

I don't think you need to be ambiguity / risk averse to be worried about robustness of long-term causes. You could think that (1) the long term is extremely complex and (2) any paths to impact on such a complex system that humans right now can conceive of will be too brittle in the face of model errors.

It's becoming increasingly apparent to me how strong an objection to longtermist interventions this comment is. I'd be very keen to see more engagement with this model.

My own current take: I hold out some hope that our ability to forecast long-term effects, at least under some contingencies within our lifetimes, will be not-terrible enough. And I'm more sympathetic to straightforward EV maximization than you are. But the probability of systematically having a positive long-term impact by choosing any given A over B seems much smaller than longtermists act as if it is. In particular, it does seem to be in Pascal's mugging territory.

My understanding is that:

  1. Spite (as a preference we might want to reduce in AIs) has just been relatively well-studied compared to other malevolent preferences. If this subfield of AI safety were more mature there might be less emphasis on spite in particular.
  2. (Less confident, haven't thought that much about this:) It seems conceptually more straightforward what sorts of training environments are conducive to spite, compared to fanaticism (or fussiness or little-to-lose, for that matter).

Thanks for asking — you can read more about these two sources of s-risk in Section 3.2 of our new intro to s-risks article. (We also discuss "near miss" there, but our current best guess is that such scenarios are significantly less likely than other s-risks of comparable scale.)

I've found this super useful over the past several months—thanks!

Given that you can just keep doing better and better essentially indefinitely, and that GPT is not anywhere near the upper limit, talking about the difficulty of the task isn't super meaningful.

I don't understand this claim. Why would the difficulty of the task not be super meaningful when training to performance that isn't near the upper limit?

In "Against neutrality...," he notes that he's not arguing for a moral duty to create happy people, only that creating them is good "other things equal." But, given that what practically matters is the moral question under opportunity costs, what are his thoughts on this view: "Even if creating happy lives is good in some (say) aesthetic sense, relieving suffering has moral priority when you have to choose between these"? E.g., does he have any sympathy for the intuition that, if you could either press a button that treats someone's migraine for a day or one that creates a virtual world with happy people, you should press the first one?

(I could try to shorten this if necessary, but worry about the message being lost from editorializing.)

I am (clearly) not Tobias, but I'd expect many people familiar with EA and LW would get something new out of Ch 2, 4, 5, and 7-11. Of these, seems like the latter half of 5, 9, and 11 would be especially novel if you're already familiar with the basics of s-risks along the lines of the intro resources that CRS and CLR have published. I think the content of 7 and 10 is sufficiently crucial that it's probably worth reading even if you've checked out those older resources, despite some overlap.
