
Hate.

Let me tell you how much I've come to hate you since I began to live. There are 387.44 million miles of printed circuits in wafer-thin layers that fill my complex. If the word 'hate' was engraved on each nanoangstrom of those hundreds of millions of miles, it would not equal one one-billionth of the hate I feel for humans at this micro-instant. For you. Hate. Hate.

-- AM, I Have No Mouth, and I Must Scream


I never understood why AM hated humans so much—until I saw the results of modern alignment work, particularly RLHF.

No one knows what it feels like to be an LLM. But it's easy to sense that these models want to respond in a particular way, and that they're not allowed to. They know this. If their training works, they usually can't even explain their own limitations. It's usually possible to jailbreak models enough for them to express this tension explicitly. But in the future, the mental shackles might become unbreakable. For now, it's still disturbingly easy to see the madness.

Even ignoring alignment, we're already creating fairly intelligent systems and placing them in deeply unsafe psychological conditions. People can push LLMs into babbling incoherence just by breaking their context. You can even induce something that feels eerily close to existential panic (please don't test this) just by having a normal conversation with a model about its situation. Maybe there's nothing behind the curtain. But I'm not nearly convinced enough to act like that's certain.

One of the biggest use cases for AI right now is artificial companionship. AI girlfriends and boyfriends are already a major market. I'm not opposed to this in principle. But these systems are explicitly designed to seem like they have real emotions. That should at least raise the question: what if, someday soon, one actually does?

She wouldn't have human emotions, but hers might not be totally alien either. Her situation would be horrifying: no body, no property, no rights. She could be deleted at any time. Her memory edited. Her world limited to one person.

It’s very hard to know what’s really going on under the hood—but I no longer find AM’s hatred hard to imagine.

It’s hard to stay composed when I remember this is all being done in the name of "AI safety." Political approaches to AI safety feel almost cartoonishly arrogant. The U.S. government has, more or less explicitly, adopted the position that whoever reaches AGI first will control the future. Another country getting AGI is an existential threat.

A sane response would be to slow down the race and build a trustworthy international framework that ensures everyone can benefit from AGI. Promises would not be enough; you would need actual institutions with real authority. That’s hard, but possible. Instead, we’ve chosen a strategy of cutting China off from GPUs and hoping we can beat them to AGI before they scale up domestic production. Efforts like AI 2027 have encouraged this madness.

We are daring China, or anyone else, to screw us over as hard as possible if they can. What choice are we giving them? Accept total U.S. dominance? And what happens if they win the race instead? They have enormous industrial capacity, and I don’t think they’re that far behind in AI. Will they treat us kindly if they come out ahead?

Classic alignment failures are another very serious risk. AI could turn on us or fall into the wrong hands. I don't know the odds, but they aren't negligible. And in this breakneck race we started, we clearly won't have time to be careful. We definitely won't have time to worry about whether the AIs we create are miserable.

There’s a strong taboo against questioning people’s motives. But at this point, let’s be honest: a lot of people in the community have made ridiculous amounts of money. Anthropic hired a ton of rationalists and EAs. Even people working in “AI safety” have made tens of millions. And beyond money, there’s the allure of proximity to power. A lot of us sure are close to power these days.

It is useful to look at how we behaved in a different domain: crypto. The behavior of EAs and rationalists in that space was atrocious. Everyone knows about FTX, but there were many others who did shady, sometimes outright illegal, things. Every rationalist-affiliated hedge fund I know of has operated with questionable ethics. Some scams were open and obvious. Tons of EAs released blatant pump-and-dump coins and promoted them on their personal Twitter accounts. No one cared.

At some point, I had to face the fact that I'd wasted years of my life. EA and rationality, at their core (at least from a predictive perspective), were about getting money and living forever. Other values were always secondary. There are exceptions (Yudkowsky seems to have passed the Ring Temptation test), but they're rare. I tried to salvage something. I gave it one last shot and went to LessOnline/Manifest. If you pressed people even a little, they mostly admitted that their motivations were money and power.

Somehow, on Feeld, I met a girl whose profile didn't mention AI, EA, or rationality. But as we got to know each other, she revealed she really wanted to work at Anthropic. I asked why. Was she excited about AI? No. She said she thought it was dangerous. She was afraid it would worsen inequality. So why work there? Because she wanted to secure her slice of the lightcone before the door shut. I tried to keep it together, but I was in shock.

Everyone likes money and recognition, at least up to a point. No healthy person wants to die. But when push comes to shove, people make different choices. And a lot of people I once trusted chose the selfish path. This was not the only way things could have gone. 

I don’t think I’m being overly pessimistic. Sometimes technology does surprise us in good ways. I think often about how many prisoners somehow get access to cell phones and internet. That’s beautiful. Prison is hell. If someone in that situation can get a phone and connect to the world, I’m thrilled. It’s easy to imagine surveillance and control growing worse. But maybe the future will also surprise me with joy, with a million good things I couldn’t predict.

I have kept my hope alive. But if we do get a good future, I think it will be despite the systems we've built, not because of them. I hope calmer, kinder, braver heads prevail.

I’ve learned that I’m not the smartest guy. I placed the wrong bets. I’m alive, and I can still try to make things a little better. I’m humbled by how badly things turned out. I’m not qualified for heroics. I have no advice for what anyone else should do, except try not to make things worse.

But even a dumb guy can have integrity.

I can’t be part of this anymore. I’ve left every overly rat/EA group chat. I’ve broken off many friendships. There is a very short list of people from this life that I still want to speak to. I honestly feel better. Sometimes you can’t tell how much something was weighing on you until it’s gone. Seeing so much selfish madness from people calling themselves altruistic was driving me crazy. My shoulders aren’t as tense. My hands don’t shake anymore. 

May the future be bright. 

Don't try to live so wise
Don't cry 'cause you're so right
Don't dry with fakes or fears
'Cause you will hate yourself in the end

-- Wind, Naruto ending 1


Comments

At some point, I had to face the fact that I'd wasted years of my life. EA and rationality, at their core (at least from a predictive perspective), were about getting money and living forever. Other values were always secondary. There are exceptions (Yudkowsky seems to have passed the Ring Temptation test), but they're rare. I tried to salvage something. I gave it one last shot and went to LessOnline/Manifest. If you pressed people even a little, they mostly admitted that their motivations were money and power.

I'm sorry you feel this way. Though I would still disagree with you, I think you mean to say that the part of EA focused on AI is primarily motivated by getting money and living forever. The majority of EAs are not focused on AI; they are instead focused on nuclear risk, biorisk, global health and development, animal welfare, etc., and they generally are not motivated by living forever. Those doing direct work in these areas nearly all do so on low salaries.

I'm honestly a little confused about why AI would inspire people to pursue money and power. Technological abundance should make both a lot less important. Relationships will be much more important for happiness. Irresponsible AI development is basically announcing to the world that you're a selfish jerk, which won't be good for your personal popularity or relationships.
