Recently, we shared our general model of the AI-sentience-possibility-space:
If this model is remotely accurate, then the core problem seems to be that no one currently understands what robustly predicts whether a system is conscious—and that we therefore also do not know with any reasonable degree of confidence which of the four quadrants we are in, now or in the near future.
This deep uncertainty seems unacceptably reckless from an s-risk perspective for fairly obvious reasons—but is also potentially quite dangerous from the point of view of alignment/x-risk. We briefly sketch out the x-risk arguments below.
Ignoring the question of AI sentience is a great way to make a sentient AI hate us
One plausible (and seemingly neglected) reason why advanced AI may naturally adopt actively hostile views towards humanity is if these systems come to believe that we never bothered to check in any rigorous way whether training or deploying these systems was subjectively very bad for these systems (especially if those systems were in fact suffering during training and/or deployment).
Note that humans have a dark history of conveniently either failing to ask or otherwise ignoring this question in very similar contexts (e.g., “don’t worry, human slaves are animal-like!,” “don’t worry, animal suffering is incomparable to human suffering!”)—and that an advanced AI would almost certainly be keenly aware of this very pattern.
It is for this reason that we think AI sentience research may be especially important, even in comparison to other more-commonly-discussed s-risks. If, for instance, we are committing a similar Type II error with respect to nonhuman animals, the moral consequences would also be stark—but there is no similarly direct causal path that could be drawn between these Type II errors and human disempowerment (i.e., suffering chickens can’t effectively coordinate against their captors; suffering AI systems may not experience such an impediment).
Sentient AI that genuinely 'feels for us' probably wouldn't disempower us
It may also be the case that developing conscious AI systems could induce favorable alignment properties—including prosociality—if done in the right way.
Accidentally building suffering AIs may be damning from an x-risk perspective, but purposely building thriving AIs with a felt sense of empathy, deep care for sentient lives, etc may be an essential ingredient for mitigating x-risk. However, given current uncertainty about consciousness, it also seems currently impossible to discern how to reliably yield one of these things and not the other.
Therefore, it seems clear to us that we need to immediately prioritize and fund serious, non-magical research that helps us better understand what features predict whether a given system is conscious—and subsequently use this knowledge to better determine which quadrant we are actually in (above) so that we can act accordingly.
At AE Studio, we are contributing to this effort through our own work and by helping to directly fund promising consciousness researchers like Michael Graziano—but this space requires significantly more technical support and funding than we can provide alone. We hope that this week’s debate and broader conversations in this space will move the needle in the direction of taking this issue seriously.
I claim that I understand sentience. Sentience is just a word that people have projected their confusions about brains / identity onto.
Put less snarkily:
Consciousness does not have a commonly agreed upon definition. The question of whether an AI is conscious cannot be answered until you choose a precise definition of consciousness, at which point the question falls out of the realm of philosophy into standard science.
This might seem like mere pedantry or missing the point, because the whole challenge is to figure out the definition of consciousness, but I think it is actually the central issue. People are grasping for some solution to the "hard problem" of capturing the je ne sais quoi of what it is like to be a thing, but they will not succeed until they deconfuse themselves about the intangible nature of sentience.
You cannot know about something unless it is somehow connected the causal chain that led to the current state of your brain. If we know about a thing called "consciousness" then it is part of this causal chain. Therefore "consciousness", whatever it is, is a part of physics. There is no evidence for, and there cannot ever be evidence for, any kind of dualism or epiphenomenal consciousness. This leaves us to conclude that either panpsychism or materialism is correct. And causally-connected panpsychism is just materialism where we haven't discovered all the laws of physics yet. This is basically the argument for illusionism.
So "consciousness" is the algorithm that causes brains to say "I think therefore I am". Is there some secret sauce that makes this algorithm special and different from all currently known algorithms, such that if we understood it we would suddenly feel enlightened? I doubt it. I expect we will just find a big pile of heuristics and optimization procedures that are fundamentally familiar to computer science. Maybe you disagree, that's fine! But let's just be clear that that is what we're looking for, not some other magisterium.
Making it genuinely "feel for us" is not well defined. There are some algorithms that make it optimize for our safety. Some of these will be vaguely similar to the algorithm in human brains that we call empathy, some will not. It does not particularly matter for alignment either way.
Ya, we probably roughly agree about meta-ethics, too. But I wouldn't say I "understand" consciousness or ethics, except maybe at a high level, because I'm not settled on what I care about and how. The details matter, and can have important implications. I would want to defer to my more informed views.
For example, the evidence in this paper was informative to me, even assuming strong illusionism:
https://www.frontiersin.org/journals/veterinary-science/articles/10.3389/fvets.2022.788289/full
And considerations here also seem important to me:
https://forum.effec... (read more)