However, I think you mean that the expected welfare range will be significant (for example, at least 1 % of that of humans) as long as there is one plausible model (for example, which gets 10 % weight) which predicts a significant welfare range (for example, 10 % of that of humans).
Yeah, basically.
I wonder whether electromagnetic (EM) field theories of consciousness could shed some light on it. I assume the maximum intensity of the EM fields generated by brain activity depends on the number of neurons, at least when assessed across species (there is little variance in the number of neurons across humans, so the maximum intensity of their EM fields may not vary much).
This makes me think we're inclined towards different basic perspectives on the determinants of valence. This kind of sounds like you're thinking of pain as a sort of physical magnitude, like weight or charge. Then it is reasonable to think it is likely to scale with size, so that much smaller brains are likely to have much smaller magnitudes. I'm more inclined towards functionalist interpretations of welfare, on which something like relative functional significance determines welfare levels. E.g. something's attention-grabbing capacity helps to determine its welfare significance. In that case, you might be deeply skeptical that small animals have the right functional role at all, but once you grant that they do, it is much more plausible that their welfare ranges are similar to those of humans. However, for my point to be right, I think you just need to treat these kinds of functionalist views as in the running. You don't have to be confident that they're true.
For the conclusion to follow, (ii) needs to be: evidence of a welfare range that is not so insignificant as to crucially undermine the importance of their large numbers.
I disagree with this. You don't need evidence of a welfare range that is not too insignificant -- you need a presumption of a reasonable probability of a welfare range that is not too small and no significant evidence against it. Without a theory of how you get from neurons to welfare, it seems to me to be very unreasonable to be confident that simpler brains most likely have much smaller welfare ranges. And we don't have a good theory of how to get from neurons to welfare.
One thing I've found for myself: it is easy to think of lifestyle changes across the board, but different areas have very different impacts on your budget. Opting for the slightly nicer apartment may end up costing way more than switching to the fancy vegan yogurt. Insofar as lifestyle creep is reasonable (it is probably healthy to be increasing your quality of life a bit over time), it is better to focus on the little things than the big ones.
AIs could have negligible welfare (in expectation) even if they are conscious. They may not be sentient even if they are conscious, or have negligible welfare even if they are sentient. I would say the (expected) total welfare of a group (individual welfare times population) matters much more for its moral consideration than the probability of consciousness of its individuals. Do you have any plans to compare the individual (expected hedonistic) welfare of AIs, animals, and humans? You do not mention this in the section "What’s next".
This is an important caveat. While our motivation for looking at consciousness largely stems from its relation to moral status, we don't think that establishing that AIs are conscious would entail that they have morally significant states that count strongly one way or the other for our treatment of them, nor would establishing that they aren't conscious entail that we should feel free to treat them however we like.
We think that estimates of consciousness still play an important practical role. Work on AI consciousness may help us to achieve consensus on reasonable precautionary measures and motivate future research directions with a more direct upshot. That said, I don't think the results of this model can be directly plugged into any kind of BOTEC; they should be treated with care.
Do you have any ideas for how to decide on the priors for the probability of sentience? I agree decisions about priors are often very arbitrary, and I worry they will have significantly different implications.
We favored a 1/6 prior for consciousness relative to every stance and we chose that fairly early in the process. To some extent, you can check the prior against what you update to on the basis of your evidence. Given an assignment of evidence strength and an opinion about what it should say about something that satisfies all of the indicators, you can backwards infer the prior needed to update to the right posterior. That prior is basically implicit in your choices about evidential strength. We didn't explicitly set our prior this way, but we would probably have reconsidered our choice of 1/6 if it was giving really implausible results for humans, chickens, and ELIZA across the board.
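The backwards inference mentioned above can be made concrete by inverting Bayes' rule in odds form. A minimal sketch, where both the target posterior and the Bayes factor are hypothetical numbers chosen for illustration (not the values used in our report):

```python
def implied_prior(target_posterior, bayes_factor):
    """Invert Bayes' rule in odds form:
    posterior_odds = prior_odds * bayes_factor,
    so prior_odds = posterior_odds / bayes_factor.
    Returns the prior probability that would yield
    `target_posterior` given evidence of strength `bayes_factor`.
    """
    posterior_odds = target_posterior / (1 - target_posterior)
    prior_odds = posterior_odds / bayes_factor
    return prior_odds / (1 + prior_odds)

# E.g. if you think a system satisfying all the indicators should end up
# at a posterior of 0.9, and you judge the combined evidence to have a
# Bayes factor of 45 (both hypothetical), the implied prior is 1/6:
print(implied_prior(0.9, 45))  # ≈ 0.1667
```

The point is just that choices about evidential strength and target posteriors jointly pin down a prior, so the prior can be sanity-checked against them rather than picked in isolation.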
The right conclusion would be that the respondents have no idea about the right exponent, or about how to weight the various models, because they would not be able to adequately justify their picks.
There is a tension here between producing probabilities we think are right and producing probabilities which could reasonably act as a consensus conclusion. I have my own favorite stance, and I think I have good reason for it, but I didn't try to convince anyone to give it more weight in our aggregation. Insofar as we're aiming in the direction of something that could achieve broad agreement, we don't want to give too much weight to our own views (even if we think we're right). Unfortunately, among people with significant expertise in this area, there is broad and fairly fundamental disagreement. We think that it is still valuable to shoot for consensus, even if that means everyone will think it is flawed (by giving too much weight to other stances).
This last part carries a lot of weight; a simulacrum, when dormant in the superposition from which it can be sampled, is nonexistent. A simulacrum only exists during the discrete processing event which correlates with its sampling.
There seems to me to be a sensible view on which a simulacrum exists to the extent that computations relevant to making decisions on its behalf are carried out, regardless of what the token sampler chooses. This would suggest that there could conceivably be vast numbers of different simulacra instantiated even in a single forward pass.
One odd upshot of requiring the token sampler is that in contexts in which no tokens get sampled (prefill, training) you can get all of the same model computations but have no simulacra at all.
I find this distinction kind of odd. If we care about what digital minds we produce in the future, what should we be doing now?
I expect that what minds we build in large numbers in the future will largely depend on how we answer a political question. The best way to prepare now for influencing how we as a society answer that question (in a positive way) is to build up a community with a reputation for good research, figure out the most important cruxes and what we should say about them, create a better understanding of what we should actually be aiming for, initiate valuable relationships with potential stakeholders based on mutual respect and trust, create basic norms about human-AI relationships, and so on. To me, that looks like engaging with whether near-future AIs are conscious (or have other morally important traits) and working with stakeholders to figure out what policies make sense at what times.
Though I would have thought the posts you highlighted as work you're more optimistic about fit squarely within that project, so maybe I'm misunderstanding you.
I think this is basically right (I don't think the upshot is that incomparability implies nihilism, but rather the moral irrelevance of most choices). I don't really understand why this is a reason to reject incomparability. If values are incomparable, it turns out that the moral implications are quite different from what we thought. Why change your values rather than your downstream beliefs about morally appropriate action?
Thanks for the suggestion. I'm interested in the issue of dealing with threats in bargaining.
I don't think we ever published anything specifically on the defaults issue.
We were focused on allocating a budget that respects the priorities of different worldviews. The central problem we encountered was this: we started by taking the default to be the allocation you get by giving everyone their own slice of the total budget to spend as they want. Since there are often options that are well-suited to each different worldview, there is then no way to get good compromises. Everyone is happier with the default than with any adjustment of it. (More here.) On the other hand, if you switch the default to be some sort of neutral zero value (assuming that can be defined), then you will get compromises, but many bargainers would rather just be given their own slice of the total budget to allocate.
I think the importance of defaults comes through just by playing around with some numbers. Consider the difference between setting the default to be the status quo trajectory we're currently on and setting the default to be the worst possible outcome. Suppose we have two worldviews, one of which cares linearly about suffering in all other people, and the other of which is very locally focused and doesn't care about immense suffering elsewhere. Relative to the status quo, option A might give (worldview 1: 2, worldview 2: 10) value and option B might give (4, 6). Against this default, option B has the higher Nash product (24 vs 20) and is preferred by Nash bargaining. Relative to the worst-possible-outcome default, however, option A might give (10,002, 12) and option B (10,004, 8), in which case option A would be preferred to option B (~120k vs ~80k).
We implemented a Nash bargaining solution in our moral parliament, and I came away with the impression that the results of Nash bargaining are very sensitive to your choice of defaults, and that for plausible defaults true bargains can be pretty rare. Anyone who is happy with the default gets disproportionate bargaining power. One default might be 'no future at all', but that's going to make it hard to find any bargain with the anti-natalists. Another default might be 'just more of the same', but again, someone might like that and oppose any bargain that deviates much. Have you given much thought to picking the right default against which to measure people's preferences? (Or is the thought that you would just exclude obstinate minorities?)
I'm skeptical of functionalism about consciousness (though I don't know any alternative that fares better). But functionalism about valence seems much harder to avoid. Maybe if you have a benevolent God? Or some sort of dualism? Otherwise, it seems to me that you're going to be hard-pressed to explain why the functional role of valence aligns with whatever properties constitute the fact that valence matters. Why is it that pain is bad and we avoid it, or that pleasure is good and we seek it, if it is not just the case that the things we're inclined to avoid count as pain, and the things we're inclined to seek count as pleasure (very roughly)?