
Derek Shiller

Researcher @ Rethink Priorities
2665 karma · Joined · Derekshiller.com

Posts: 26

Comments: 163

AIs could have negligible welfare (in expectation) even if they are conscious. They may not be sentient even if they are conscious, or have negligible welfare even if they are sentient. I would say the (expected) total welfare of a group (individual welfare times population) matters much more for its moral consideration than the probability of consciousness of its individuals. Do you have any plans to compare the individual (expected hedonistic) welfare of AIs, animals, and humans? You do not mention this in the section "What’s next".

This is an important caveat. While our motivation for looking at consciousness comes largely from its relation to moral status, we don't think that establishing that AIs are conscious would entail that they have significant states that count strongly one way or the other for our treatment of them, nor would establishing that they aren't conscious entail that we should feel free to treat them however we like.

We think that estimates of consciousness still play an important practical role. Work on AI consciousness may help us to achieve consensus on reasonable precautionary measures and motivate future research directions with a more direct upshot. I don't think the results of this model can be directly plugged into any kind of BOTEC; they should be treated with care.

Do you have any ideas for how to decide on the priors for the probability of sentience? I agree decisions about priors are often very arbitrary, and I worry they will have significantly different implications.

We favored a 1/6 prior for consciousness relative to every stance, and we chose that fairly early in the process. To some extent, you can check the prior against what you update to on the basis of your evidence. Given an assignment of evidence strength and an opinion about where something that satisfies all of the indicators ought to end up, you can work backwards to infer the prior needed to reach that posterior. That prior is basically implicit in your choices about evidential strength. We didn't explicitly set our prior this way, but we would probably have reconsidered our choice of 1/6 if it had been giving really implausible results for humans, chickens, and ELIZA across the board.
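To make the backwards inference concrete, here is a minimal sketch in Python. The particular numbers (a combined Bayes factor of 25 for satisfying all of the indicators and a target posterior of 0.9) are hypothetical and purely for illustration, not figures from our report:

```python
def implied_prior(target_posterior: float, bayes_factor: float) -> float:
    """Return the prior that a given combined Bayes factor (evidence strength)
    would require in order to land on the target posterior.

    Works in odds form: posterior_odds = prior_odds * bayes_factor,
    so prior_odds = posterior_odds / bayes_factor.
    """
    posterior_odds = target_posterior / (1 - target_posterior)
    prior_odds = posterior_odds / bayes_factor
    return prior_odds / (1 + prior_odds)

# If satisfying all of the indicators amounted to a combined Bayes factor of 25,
# and you think such a system should end up at ~0.9 probability of consciousness,
# the implied prior is about 0.26.
print(implied_prior(0.9, 25))  # ~0.26
```

If the prior implied by that kind of check had come out wildly different from 1/6, that would have been a reason to revisit our choice.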

The right conclusion would be that the respondents have no idea about the right exponent, or how to weight the various models, because they would not be able to adequately justify their picks.

There is a tension here between producing probabilities we think are right and producing probabilities that could reasonably act as a consensus conclusion. I have my own favorite stance, and I think I have good reason for it, but I didn't try to convince anyone to give it more weight in our aggregation. Insofar as we're aiming in the direction of something that could achieve broad agreement, we don't want to give too much weight to our own views (even if we think we're right). Unfortunately, among people with significant expertise in this area, there is broad and fairly fundamental disagreement. We think that it is still valuable to shoot for consensus, even if that means everyone will think it is flawed (by giving too much weight to different stances).

This last part carries a lot of weight; a simulacrum, when dormant in the superposition from which it can be sampled, is nonexistent. A simulacrum only exists during the discrete processing event which correlates with its sampling.

There seems to me to be a sensible view on which a simulacrum exists to the extent that computations relevant to making decisions on its behalf are carried out, regardless of what the token sampler chooses. This would suggest that there could conceivably be vast numbers of different simulacra instantiated even in a single forward pass.

One odd upshot of requiring the token sampler is that in contexts in which no tokens get sampled (prefill, training), you can get all of the same model computations but have no simulacra at all.
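To put the point in code, here's a minimal sketch using the Hugging Face transformers library and GPT-2, chosen purely as an example. The forward pass below carries out all of the internal computations a sampled generation would, but no token is ever drawn:

```python
# Minimal illustration: a forward pass ("prefill") computes everything a
# sampler would use, but sampling itself is a separate, optional step.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)        # full forward pass, no sampling
logits = outputs.logits[:, -1, :]    # next-token distribution is ready

# Every internal computation has now happened, yet no token was sampled.
# Sampling would be an extra step on top of these logits, e.g.:
# next_token = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)
```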

I find this distinction kind of odd. If we care about what digital minds we produce in the future, what should we be doing now?

I expect that what minds we build in large numbers in the future will largely depend on how we answer a political question. The best way to prepare now for influencing how we as a society answer that question (in a positive way) is to build up a community with a reputation for good research, figure out the most important cruxes and what we should say about them, create a better understanding of what we should actually be aiming for, initiate valuable relationships with potential stakeholders based on mutual respect and trust, create basic norms about human-AI relationships, and so on. To me, that looks like engaging with whether near-future AIs are conscious (or have other morally important traits) and working with stakeholders to figure out what policies make sense at what times.

Though I would have thought the posts you highlighted as work you're more optimistic about fit squarely within that project, so maybe I'm misunderstanding you.

I think this is basically right (I don't think the upshot is that incomparability implies nihilism, but rather the moral irrelevance of most choices). I don't really understand why this is a reason to reject incomparability. If values are incomparable, it turns out that the moral implications are quite different from what we thought. Why change your values rather than your downstream beliefs about morally appropriate action?

Thanks for the suggestion. I'm interested in the issue of dealing with threats in bargaining.

I don't think we ever published anything specifically on the defaults issue.

We were focused on allocating a budget in a way that respects the priorities of different worldviews. The central thing we encountered was this: we started by taking the default to be the allocation you get by giving everyone their own slice of the total budget to spend as they want. Since there are often options that are well-suited to each different worldview, that default leaves no way to get good compromises: everyone is happier with the default than with any adjustment to it. (More here.) On the other hand, if you switch the default to some sort of neutral zero value (assuming that can be defined), then you will get compromises, but many bargainers would rather just be given their own slice of the total budget to allocate.

I think the importance of defaults comes through just by playing around with some numbers. Consider the difference between setting the default to be the status quo trajectory we're currently on and setting it to be the worst possible outcome. Suppose we have two worldviews: one cares linearly about suffering in all other people, and the other is very locally focused and doesn't care about immense suffering elsewhere. Relative to the status quo, option A might give (worldview 1: 2, worldview 2: 10) value and option B might give (4, 6). Against this default, option B has the higher product (24 vs. 20) and is preferred by Nash bargaining. Relative to the worst-possible-outcome default, however, option A might give (10,002, 12) and option B (10,004, 8), in which case option A would be preferred to option B (~120k vs. ~80k).
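Here is the same arithmetic as a minimal sketch in Python; the payoff numbers are the illustrative ones above, not outputs from our moral parliament tool:

```python
# Nash bargaining picks the option with the highest product of gains over the
# default (the disagreement point). The same two options flip depending on
# which default the gains are measured against.
from math import prod

# Gains for (worldview 1, worldview 2), measured relative to each default.
options_vs_status_quo = {"A": (2, 10), "B": (4, 6)}
options_vs_worst_case = {"A": (10_002, 12), "B": (10_004, 8)}

def nash_choice(options):
    products = {name: prod(gains) for name, gains in options.items()}
    return max(products, key=products.get), products

print(nash_choice(options_vs_status_quo))  # ('B', {'A': 20, 'B': 24})
print(nash_choice(options_vs_worst_case))  # ('A', {'A': 120024, 'B': 80032})
```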

We implemented a Nash bargaining solution in our moral parliament, and I came away with the impression that the results of Nash bargaining are very sensitive to your choice of defaults, and that for plausible defaults true bargains can be pretty rare. Anyone who is happy with the default gets disproportionate bargaining power. One default might be 'no future at all', but that's going to make it hard to find any bargain with the anti-natalists. Another default might be 'just more of the same', but again, someone might like that and oppose any bargain that deviates much from it. Have you given much thought to picking the right default against which to measure people's preferences? (Or is the thought that you would just exclude obstinate minorities?)

Keeping the world around probably does that, so you should donate to Longtermist charities (especially because they potentially increase the number of people ever born, thus giving more people a chance of getting into heaven).

I often get the sense that people into fanaticism think that it doesn't much change what they actually should support. That seems implausible to me. Maybe you should support longtermist causes. (You probably have to contort yourself to justify giving any money to shrimp welfare.) But I would think the longtermist causes you should support will also be fairly different from 'mainstream' causes, and look rather weird close up. You don't really care if the species colonizes the stars and the future is full of happy people living great lives. If some sort of stable totalitarian hellscape offers a marginally better (but still vanishingly small) chance of producing infinite value, that is where you should put your money.

Maybe the best expected value would be to tile the universe with computers trying to figure out the best way to produce infinite value under every conceivable metaphysical scheme consistent with what we know, and to run them all until the heat death of the universe before trying to act. Given that most people are almost certainly not going to do that, you might think that we shouldn't be looking to build an aligned AI; we should want to build a fanatical AI.

Has your fanaticism changed your mind much about what is worth supporting?

But even a 10% chance that fish feel pain—and that we annually painfully slaughter a population roughly ten times the number of humans who have ever lived—is enough to make it a serious issue. Given the mind-bending scale of the harm we inflict on fish, even a modest chance that they feel pain is enough.

Completely in agreement here.

And while it’s possible that evolution produced some kind of non-conscious signal that produces identical behavior to pain, such a thing is unlikely. If a creature didn’t feel pain, it’s unlikely it would respond to analgesics, seek out analgesic drugs, and get distracted by bodily damage.

This is where I would disagree. I expect moderately complicated creatures would develop traits like these under evolutionary pressures (except seeking out analgesic drugs). The question then is how likely it is that the best / only / easiest-to-evolve way to produce this slate of behaviors involves having a conscious experience with the relevant pain profile.

We know that human brains have undergone massive changes since our most recent common ancestor with fish, that terrestrial environments place very different demands on our bodies, that human beings have an unparalleled behavioral flexibility to address injuries, etc., so it is plausible that we do have fairly different nociceptive faculties. It seems to me like a pretty open question precisely how neurologically or algorithmically similar our faculties are and how similar they would need to be for fish to qualify as having pain. The fact that we can't even tell how important the cortex is for pain in humans seems like strong evidence that we shouldn't be too confident about attributing pain to fish. We just know so little. Of course, we shouldn't be confident about denying it to them either, but much confidence either way seems unjustifiable.

I would think the trend would also need to be evenly distributed. If some groups have higher-than-replacement birth rates, they will simply come to dominate over time.

I think of moral naturalism as a position where moral language is supposed to represent things, and it represents certain natural things. The view I favor is a lot closer to inferentialism: the meaning of moral language is constituted by the way it is used, not what it is about. (But I also don't think inferentialism is quite right, since I'm not into realism about meaning either.)

I guess I don't quite see what your puzzlement is with morality. There are moral norms that govern what people should do. Now, you might deny that there are in fact such things, but I don't see what's so mysterious.

Another angle on the mystery: it is possible that there are epistemic norms, moral norms, prudential norms, and that's it. But if you're a realist, it seems like it should also be possible that there are hundreds of other kinds of norms that we're completely unaware of, such that we act in all sorts of wrong ways all the time. Maybe there are special norms governing how you should brush your teeth (that have nothing to do with hygiene or our interests), or how to daydream. Maybe these norms hold more weight than moral norms, in something like the way moral norms may hold more weight than prudential norms. If you're a non-naturalist, then apart from trust in a loving God, I'm not sure how you address this possibility. But it also seems absurd that I should have to worry about such things.
