Daniel_Friedrich

I like the high-level idea, but for now I am skeptical that going into the details of the math would make me decrease my AI p(doom) by a factor of 50,000 (sorry, I have only skimmed it so far). As I understand it (knowing CTT), you're weighing "cause X p(doom)" against the general prior that "your reasoning is partially random and will come to incorrect conclusions".

However, in the case of AI x-risk, I'm not sure in which direction the prior should push me (which I take as a sign that it's already integrated within my reasoning). Should I ignore AI as a concept, ignore just the singularity scenarios, or put more weight on "there won't be more progress from now on"?

For me, this is particularly hard because I don't see AI p(doom) as just one number. E.g. I think there's a 20% chance of a "computronium-maximizer with arbitrary goals", some chance of scenarios like "utility monster AGI", "global cooperation leading to value-maximizing AI", "global cooperation leading to a fraction of possible value", "AI dictatorship", "AI x-risk via terrorism or war", and then some chance of "a significant stall in AI development - e.g. due to a moratorium, pandemic, war and maybe due to AGI being impossibly hard." ^[1]

It seems to me the lesson many readers will take away is to put more weight on the "AGI being impossibly hard" scenario. Paradoxically, though, that seems like a guess that requires a lot of confidence that the world will take a very different trajectory than the trends suggest - i.e. such an update would go against the spirit of a prior that prioritizes modesty and uncertainty.

Would this objection disappear if I tried to understand the math more deeply?

^[1] Also, what is impossibly hard? I agree with Thorstad that if AGI took 1,000 more years, other problems should be a priority. If it took 70 more years, I would still think AI alignment research is extremely important, although I wouldn't think the same of AI safety activism.

Why only conscious preferences matter

Daniel_Friedrich 2mo 1

Maybe I misunderstood it as an argument for seeing preferences as a terminal value. I think almost all theories would agree that "this theory matters in some sense", but I can imagine many ethical theories that do not see "believing in this moral theory" as either good or bad, each for different reasons. Hedonic utilitarianism, as an example of a consequentialist theory, does not see "believing in hedonic utilitarianism" as inherently valuable - depending on the consequences of believing it, it might even recommend not believing it in some contexts - while still positing that hedonism itself is true. For a nihilist, "nihilism itself" probably does not matter morally, but it does "matter" in terms of its alleged explanatory power.

You could see the luminosity itself as a factor that moderates the intensity of the signal to grow. But there also seem to be many internal processes that moderate the intensity of signals processed by plants: AI points me to a barley study where stress-related calcium signals varied by stimulus, dose, and tissue (pmc.ncbi.nlm.nih.gov). This still doesn't require an attention schema - i.e. weighted preferences don't imply consciousness (at least under your conception of preferences and Graziano's conception of consciousness, as I understand them).

Why only conscious preferences matter

Daniel_Friedrich 2mo 1

Thought-provoking argument! However, I see some gaps:

The reason why preferences matter is straightforward: if you prefer a moral theory according to which preferences do not matter, your preference for that moral theory cannot matter either.

1. I think this mixes up two senses of "mattering".

I might not think my preference for hedonic utilitarianism is terminally valuable but still think that believing true things is instrumentally valuable to achieve utility.

2. I share your intuition that "preferences need to have a weight" but I don't think that's the same thing as "being represented within an attention schema". I think plants can weigh their preferences (e.g. tolerate drier soil if the luminosity is sufficient) without having explicit preferences (i.e. language?) or a comprehensive world model, let alone a meta-model.

While I agree with the thesis from the title, I think it might be better anchored in ~Sharon H. Rawlette's argument that conscious sensations of "(un)desirability" construct/define what we mean by morality.

Only What Is Alive Can Be Conscious

Daniel_Friedrich 3mo 5

Thanks for clarifying - sorry if it sounded like I was twisting your words - I was trying to think through multiple versions of the experiment you propose.

The extent to which we attribute/misattribute consciousness to different entities depends on the correct theory, so it is very uncertain at this point. But I would endorse this broader research program of systematically decoding which of our intuitions about consciousness are biases and which are valid measurements of brain data.

One reason why I thought about trolley problems was that they show not only the percentage of people who hold an abstract belief about an entity's consciousness but also the perceived degree/intensity of the entity's experiences. I'm surprised to see a significant fraction of people (1, 2) say current AI is conscious, although a poll about a personal sacrifice like this one (in a less narrow Twitter bubble) might be more relevant for assessing how serious they are - and might better model the kind of moral error that we're more likely to make during the AGI transformation.

Regarding "biochemical processes" - the phrasing matters a lot here. Searle, who came up with the Chinese Room, concludes this thought experiment by suggesting that thinking requires the specific biochemistry that brains use, just like lactation or photosynthesis are defined by specific molecules rather than algorithms. This formulation is specifically chosen in contrast to the functionalist/computationalist views which are mainstream nowadays.

Only What Is Alive Can Be Conscious

Daniel_Friedrich 3mo* 3

Disintermediated by a computer to replace biases introduced by cuteness and non-verbal expressiveness with biases introduced by symbolic manipulation, humans would without exception rate a pocket calculator or Eliza-style toy script as more likely to be conscious than a dog or a two year old child. I don't think anyone sincerely believes this to actually be the case.

I'm not sure what you're imagining here. If you give people a trolley problem (only via text) and say that on one track there's a dog, on the other there's the computer program Eliza, and they can chat to either, most would choose to save the dog, even if its only text output were "woof woof".

If you're imagining the thought experiment would somehow block them from inferring that one entity is an actual dog and the other a program, then yes: I agree with the point that language increases empathy, but I'd say its effect is much smaller than that of "non-verbal cues". If you had a trolley dilemma with one blank track and one track with either a dog or Eliza, I think 90% would pay $0 to save Eliza but often a lot of money to save the dog.

Unless we're positing dualism, what we perceive as consciousness is an emergent property of complex chemical processes rooted in our biology (and the imperatives of our biology to survive and self replicate).

Most non-dualists would say consciousness is a feature of information processing (functionalists, illusionists, non-reductive materialists) or something as fundamental as physics (Russellian monism, pan(proto)psychism). The particular emergentist and biological theory that is rooted in the instinct to self-replicate and survive is something I'd expect 0.1-7% of philosophers of mind to endorse. But whatever the actual percentages, I definitely disagree that dualism and this theory are the only options. The phrase "rooted in [biochemical processes]" is the least controversial, but it still connotes something most might not endorse - i.e. that biology and chemistry are the correct category or level of description (Axis 3 in this taxonomy).

Only What Is Alive Can Be Conscious

Daniel_Friedrich 3mo 3

I endorse the temperature approach. I'm not sure illusionists would accept the question "What's the % probability that an entity is conscious?" as meaningful but maybe a similar question could indeed be universally accepted, like "Compared to your pain intensity 1 (being poked by a needle), what's your central estimate for the intensity of suffering experienced in scenario X?"

Just to clarify, my argument didn't concern classical p-zombies but what I call "honest p-zombies" - intelligent humanoid entities capable of metacognition but without any intuition similar to our phenomenal intuitions.

Only What Is Alive Can Be Conscious

Daniel_Friedrich 3mo 5

Asking whether a process is "close enough [to the brain] to produce the same effect" implicitly begs the question - i.e. assumes consciousness is biological.

P-zombies who wouldn't describe their sensations in terms like "qualia" would likely have an evolutionary fitness equal to that of humans. I don't know if they're possible, but I think this demonstrates that evolution wasn't optimizing for consciousness. Therefore, we shouldn't ask "is such a system sufficiently close to the brain" but "is it sufficiently close to the processes that happen to make the brain (phenomenally) conscious".

In general, there isn't agreement about any correlate of consciousness within philosophy of mind - there are well-regarded thinkers who claim it's not real (Frankish) or that it's the basic substance of the universe (Goff). I think it's possible that consciousness is similar to, say, intelligence or humor, which would mean you need a complex system to meaningfully implement it. However, I think it's unlikely that "complexity itself" is what gives rise to consciousness, e.g. sunspots are very complex (~unpredictable interaction of many elements).

Only What Is Alive Can Be Conscious

Daniel_Friedrich 3mo 5

I'm not convinced by Anil Seth's narrative about our biases in mind attribution.

I've been to his talk where he summarized these points. He talked about our inherent tendency to emotionally relate to entities that can use language. Later, he presented a picture of a transistor and a picture of a monkey and asked which seems more conscious on priors.

The prime mechanism by which humans decide whether an entity is valuable and conscious is empathy. We evolved to feel empathy - that is, to model "what it is like to be them" - towards entities that have faces, limbs, fur and a squishy body. We feel a lot of empathy for pets and babies - entities that don't use language. And we feel zero empathy for the Chinese room.

The argument relies a lot on trying to depict computers as something rigid, cold and dead and life as something interesting, warm and energetic. This works well for our empathy module but does not convince me as a philosophical argument.

I'm curious whether there's any definition of the brain's processes as "non-algorithmic" that doesn't end up in Russellian monism (which I'm inclined to support but suspect Seth isn't). Aren't the laws of physics themselves an algorithm? I see autopoiesis as the most interesting connection between consciousness and life, but precisely when you find a clear conceptualization like this, it becomes unclear:

1. why it couldn't be implemented digitally - e.g. aren't LLMs autopoietic systems, where each token determines the next one?
2. what predictions it makes about the variation in human consciousness (in terms of modalities, intensity and reportability). E.g. if consciousness is dependent on the degree of embodiment, does it predict Stephen Hawking had a low intensity of consciousness? Is the variance found in human consciousness better explained by computational differences or by differences in the mentioned random biological interactions?

New Video: If Anyone Builds It, Everyone Dies

Daniel_Friedrich 4mo 1

Great job! Personally, I'd alter the landing page to include recommendations on how to take action outside of working in AI safety (e.g. donation recommendations, a "meet people" link) - or a comment on why learning more seems crucial.

Celebrating wins — discussion thread

Daniel_Friedrich 6mo 2

We have seen an order-of-magnitude increase in interest in AI alignment, according to Google Trends. Part of it (the July peak) can be attributed to Grok's behavior (see my little analysis). The YouTube channel AI in Context correctly identified this opportunity and swiftly released a viral video explaining how the incident connects to alignment. The September peak might be attributed to the release of If Anyone Builds It.

Daniel_Friedrich

Bio

Participation 4

Posts 11

Comments 53
