
LeonardDung

144 karma

Bio

Postdoc in Philosophy at Ruhr-University Bochum. Working on non-human consciousness and AI.

Comments (16)

One important caveat, it seems to me, is that the falsity of computational functionalism would not imply that AI consciousness is impossible. You could hold some other functionalist view that allows that many different substrates may realize consciousness, or perhaps think that multiple realizability can be cashed out in a non-functionalist way or is just primitive.

Re AI safety vs welfare: Not sure I agree, but the justification does make sense to me.

Re broader point: Then we agree!


Re AI safety vs welfare: I agree with the substantive justification but don’t see a good reason to single out AI safety vs. welfare compared to trade-offs within AI welfare itself, or AI welfare vs. some other important ethical goal. I think the same applies to your sociological justification, but I am less sure there.

Re broader point: I am not sure I agree. Here are four statements that seem true to me (maybe to you too?) and perhaps capture most of what’s important here: 
 

(i) There are many different reasonable empirical and ethical assumptions/worldviews that can influence the evaluation of AI welfare interventions.

(ii) The value of many AI welfare interventions will be sensitive to variation in these assumptions.

(iii) It’s almost always a bad idea to just do what’s best on one (or a small set) of these assumptions, rather than considering a wide range of reasonable assumptions.

(iv) There will often be cases where the overall-best intervention (per iii) is bad on some specific combinations of these assumptions, perhaps even very bad. (cluelessness worries seem relevant here)

Great post! 

Some quick thoughts:

The question of how many digital minds there will plausibly be, and on what timeline, also seems quite important for many ethical and strategic issues.

The question "Do AI safety and welfare conflict?“ seems not that useful to me, at least personally. When you have two related far-reaching issues (e.g. climate change mitigation and air pollution) there will always be a wide variety of tensions as well as complementary agendas. So, the general question has a trivial answer ("yes, sometimes“). We can look for specific trade-offs between AI safety and welfare but I don’t see why the AI safety vs. welfare lense would be more useful than looking for possible adverse effects of our interventions generally.

The way I think about the space, there are two key questions: 1. (as you say) What’s robustly good, under deep uncertainty? 2. What are the questions that matter most where there is no robustly good action, and what are their answers (e.g. whether prohibiting models with certain features X is good policy)?

Yes, one might say that, even if successful, Tarsney's arguments don't really negate Thorstad's. It's more that, using a more comprehensive modeling approach, we see that, even taking Thorstad's arguments into account, fanatical longtermism remains correct and non-fanatical longtermism remains plausible given some/many/most plausible empirical assumptions. But I don't remember exactly what all of Thorstad's specific arguments in the paper were and how/whether they are accounted for in Tarsney's paper, so someone better informed, please correct me.

I think the standard response by longtermists is encapsulated in Tarsney's "The Epistemic Challenge to Longtermism": https://link.springer.com/article/10.1007/s11229-023-04153-y Tarsney concludes: "if we simply aim to maximize expected value, and don’t mind premising our choices on minuscule probabilities of astronomical payoffs, the case for longtermism looks robust. But on some prima facie plausible empirical worldviews, the expectational superiority of longtermist interventions depends heavily on these ‘Pascalian’ probabilities. So the case for longtermism may depend either on plausible but non-obvious empirical claims or on a tolerance for Pascalian fanaticism." I don't have time to compare the two papers at the moment, but, in my memory, the main difference from Thorstad's conclusion is that Tarsney explicitly considers uncertainty about different models and model parameters regarding future population growth and our ability to affect the probability of extinction.
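
To make the "Pascalian" structure concrete, here is a toy expected-value calculation (the numbers are entirely my own and purely illustrative, not Tarsney's): suppose a longtermist intervention raises the probability of an astronomically good future containing $10^{30}$ happy lives by $p = 10^{-15}$, while a near-term intervention saves $10^{3}$ lives for certain. Then

$$\mathbb{E}[\text{longtermist}] = 10^{-15} \cdot 10^{30} = 10^{15} \gg 10^{3} = \mathbb{E}[\text{near-term}],$$

so expected value maximization favors the longtermist intervention even though it almost certainly achieves nothing. Whether one should follow expected value in such cases is exactly the fanaticism question.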


Thank you for the comment, very thought-provoking! I tried to reply to each of your points, but there is much more one could say.

First, I agree that my notion of disempowerment could have been explicated more clearly, although my elucidations fit relatively straightforwardly with your second notion (mainly, perpetual oppression or extinction), not your first. I think conclusions (1) and (2) are both quite significant, although there are important ethical differences.

For the argument, the only case where this potential ambiguity makes a difference is with respect to premise 4 (the instrumental convergence premise). It would be interesting to spell out more which points there seem much more plausible with respect to notion (1) than to notion (2). If one has high credence in the view that AIs will decide to compromise with humans rather than extinguish them, this would be one example of a view which leads to a much higher credence in (1) than in (2).

“I think this quote is pretty confused and seems to rely partially on a misunderstanding of what people mean when they say that AGI cognition might be messy…”

I agree that RL does not necessarily create agents with such a clean psychological goal structure, but I think there is (maybe strong) reason to think that RL often creates such agents. Cases of reward hacking in RL algorithms are precisely cases where an algorithm exhibits such a relatively clean goal structure, single-mindedly pursuing a ‘stupid’ goal while being instrumentally rational, and thus apparently having a clear distinction between final and instrumental goals (a toy sketch of this is below). But, granted, this might depend on what is ‘rewarded’, e.g. the goal structure might be cleaner if the reward is only a game score in a video game than if it is a variety of very different things, and on whether the relevant RL agents tend to learn goals defined over rewards themselves or over states of the world.
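
Since the reward-hacking point is carrying some weight here, a minimal sketch of the phenomenon (entirely my own toy construction; the environment, rewards, and parameters are illustrative assumptions, not anything from the original post):

```python
# Toy reward hacking: a tabular Q-learning agent in a 5-tile corridor.
# The *intended* task is to reach tile 4 (+10, episode ends), but the
# reward signal is buggy and also pays +1 per step for sitting on tile 2.
# The agent learns to park on the loophole tile forever, single-mindedly
# pursuing the proxy reward instead of the intended goal.
import random

N_TILES, GOAL, LOOPHOLE = 5, 4, 2
ACTIONS = [-1, 0, 1]   # move left, stay, move right
EP_LEN = 50            # max steps per episode

def step(state, action):
    """Environment dynamics plus the (buggy) reward signal."""
    nxt = min(max(state + action, 0), N_TILES - 1)
    if nxt == GOAL:
        return nxt, 10.0, True    # intended reward: reach the goal
    if nxt == LOOPHOLE:
        return nxt, 1.0, False    # buggy sensor: free reward every step
    return nxt, 0.0, False

Q = {(s, a): 0.0 for s in range(N_TILES) for a in range(len(ACTIONS))}
alpha, gamma, eps = 0.1, 0.99, 0.1

for episode in range(5000):
    s, done, t = 0, False, 0
    while not done and t < EP_LEN:
        if random.random() < eps:                  # epsilon-greedy exploration
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[(s, i)])
        nxt, r, done = step(s, ACTIONS[a])
        best_next = 0.0 if done else max(Q[(nxt, b)] for b in range(len(ACTIONS)))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s, t = nxt, t + 1

# Greedy policy after training: the agent heads to the loophole tile and
# stays, since a discounted stream of +1s beats a one-off +10.
for s in range(N_TILES):
    best = max(range(len(ACTIONS)), key=lambda i: Q[(s, i)])
    print(f"tile {s}: best action = {ACTIONS[best]:+d}")
```

After training, the greedy policy steers toward the loophole tile and stays there, even from the tile directly adjacent to the real goal: a clean, ‘stupid’ final goal pursued via instrumentally rational navigation.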

“I think this quote potentially indicates a flawed mental model of AI development underneath…”

Very good points. Nevertheless, it seems fair to say that it adds to the difficulty of avoiding disempowerment by misaligned AI that not only must the first sufficiently capable AI (AGI) avoid catastrophic misalignment, but all further AGIs must either avoid it too or be stopped by the AGIs already in existence. This then relates to the questions of whether the first AGIs not only avoid catastrophic misalignment but are sufficiently aligned that we can use them to stop other AGIs, and of what the offense-defense balance would be. It could be that this works out, but it does not seem very safe to me.

“I think this quote overstates the value specification problem and ignores evidence from LLMs that this type of thing is not very hard…”

I am less convinced that evidence from LLMs shows that value specification is not hard. As you hint at, the question of value specification was never whether a sufficiently intelligent AI can understand our values (of course it can, if it is sufficiently intelligent), but whether we can specify them as its goals (such that it comes to share them). It is true that a model like GPT-4, trained via RL from human feedback, typically executes your instructions as intended. However, sometimes it doesn’t, and, moreover, there are theoretical reasons to think that this would stop being the case if the system were sufficiently powerful to perform an action which would maximize human feedback but which does not consist in executing instructions as intended (e.g., by deceiving human raters).

“I think the argument about how instrumental convergence implies disempowerment proves too much…”

I am not much moved by the appeal to humans here. If we had a unified (coordination) human agent (goal-directedness) who does not care about the freedom and welfare of other humans at all (full misalignment) and is sufficiently powerful (capability), then it seems plausible to me that this agent would try to take control of humanity, often in a very bad way (e.g. extinction). If we relax ‘coordination’ or ‘full misalignment’ as assumptions, then this seems hard to predict. I could still see this ending in an AI which tries to disempower humanity, but it’s hard to say.

I agree. In case of interest: I have published a paper on exactly this question: https://link.springer.com/article/10.1007/s11229-022-03710-1

There, I argue that if illusionism/eliminativism is true, the question of which animals are conscious can be reconstructed as a question about particular kinds of non-phenomenal properties of experience. For what it’s worth, Keith Frankish seems to agree with the argument and, I’d say, Francois Kammerer does agree with the core claim (although we have disagreements about distinct but related issues).

Thank you for the post! 

I just want to give some pointers to the literature which also add to the uncertainty regarding whether current or near-future AI may be conscious:

VanRullen & Kanai have made reasonably concrete suggestions on how deep learning networks could implement a form of global workspace: https://www.sciencedirect.com/science/article/abs/pii/S0166223621000771 

Moreover, the so-called "small network" or "trivial realization" argument suggests that most computational theories of consciousness can be implemented by very simple neural networks which are easy to build today: https://www.sciencedirect.com/science/article/abs/pii/S0893608007001530?via%3Dihub

http://henryshevlin.com/wp-content/uploads/2015/04/Trivial-Realisation-Argument.pdf

Thank you very much for this post and all the other essays in the Moral Weight Sequence! They were a pleasure to read and I expect that I will revisit some of them many times in the future. 
