Tom Shlomi


There's an important distinction here between predicting the next token in a piece of text and predicting the next action in a causal chain. If you have a computation that is represented by a causal graph, and you train a predictor to predict nodes conditional on previous nodes, then it's true that the predictor won't end up being able to do better than the original computational process. But text is not ordered that way! Texts often describe outcomes before describing the details of the events which generated them. If you train on texts like those, you get something more powerful than an imitator. If you train a good enough next-token predictor on chess games where the winner is mentioned before the list of moves, you can get superhuman play by prepending "This is a game which white/black wins:". If you train a good enough next-token predictor on texts that have the outputs of circuits listed before the inputs, you get an NP-oracle. You're almost certainly not going to get an NP-oracle from GPT-9, but that's because of the limitations of the training processes and architectures that this universe can support; it's not a limitation of the loss function.
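To make the circuit case concrete, here is a minimal toy sketch, not anything from a real training setup. The `circuit` function, the 8-bit input size, and the count-table "predictor" are all assumptions made for illustration. The count table stands in for an idealized predictor of texts that list the output first and the input second.

```python
from collections import Counter, defaultdict


def circuit(x: int) -> int:
    """Hypothetical stand-in for a circuit: 8-bit input, 4-bit output."""
    return ((x * 37) ^ (x >> 3)) & 0xF


def build_corpus(n_bits: int = 8):
    """Each 'text' is (output, input): the output is written before the input."""
    return [(circuit(x), x) for x in range(2 ** n_bits)]


def train_predictor(corpus):
    """Count-based stand-in for an ideal predictor of P(input | output)."""
    counts = defaultdict(Counter)
    for output, x in corpus:
        counts[output][x] += 1
    return counts


def complete(predictor, output: int):
    """Most likely input to follow the given output, or None if the output was never seen."""
    if output not in predictor:
        return None
    return predictor[output].most_common(1)[0][0]


predictor = train_predictor(build_corpus())
target = circuit(200)
guess = complete(predictor, target)
# Continuing the prompt "output = target" yields an input that the circuit maps to target.
assert circuit(guess) == target
```

The table only works because the corpus enumerates every input, which costs exponential time in the number of input bits. That cost sits in the training process, which is the point above: the loss function rewards inversion, and whether a real predictor can achieve it is a separate question.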

I'd find this post much more valuable if it argued that some parts of the EA community were bad, rather than arguing that they're cultish. Cultishness is an imperfect proxy for badness. Sure, cults are bad, and something being a thing which cults do is weak evidence of its badness (see Reversed Stupidity Is Not Intelligence). Is, say, advertising EA too aggressively bad? Probably! But it is bad for specific reasons, not because it is also a thing cults do.

A particular way cultishness could be bad, which would make it directly bad for EA to be cultish, is if cults are an attractor in the space of organizations. This would mean that organizations with some properties of cults would feel pressure to gain more and more properties of cults. Still, I currently don't think this is the case, and so I think direct criticisms are much more valuable than insinuations of cultishness.

I do think 'catastrophic suffering risk' is an odd one, because it's really not intuitive that a 'catastrophic suffering risk' is less bad than a 'suffering risk'. I guess I just find it weird that something as bad as a genuine s-risk has such a pedestrian name, compared to 'existential risk', which I think is an intuitive and evocative name that gets across the level of bad-ness pretty well. 

I think what happens in my head is that 's-risk' denotes a similarity to x-risks while 'catastrophic suffering risk' denotes a similarity to catastrophic risks, making the former feel more severe than the latter, but I agree this is odd.

One quick question - when you say an s-risk creates a future with negative value, does that make it worse than an x-risk? As in, the imagined future is SO awful that the extinction of humanity would be preferable?

Yep, for me that feels like a natural place to put the bar for an s-risk.

Really great post! As a person who subscribes to hedonistic utilitarianism (modulo a decent amount of moral uncertainty), this is the most compelling criticism I've come across.

I do want to assuage a few of your worries, though. Firstly, as Richard brought up, I respect normative uncertainty enough to at least think for a long time before turning the universe into any sort of hedonium. Furthermore, I see myself as being on a joint mission with all other longtermists to bring about a glorious future, and I wouldn't defect against them by bringing about a future they would reject, even if I was completely certain in my morality.

Also, my current best-guess vision of the maximum-hedonium future doesn't look like what you've described. I agree that it will probably not look like a bunch of simulated happy people living in a way anything like people on Earth. But hedonistic utilitarianism (as I interpret it) doesn't say that "pleasure", in the way the word is commonly used, is what matters, but rather that mental states are what matter. The highest utility states are probably not base pleasures, but rich experiences. I expect that the hedonistic utility a mind can experience scales superlinearly in the computational resources used by that mind. As such, the utopia I imagine is not a bunch of isolated minds stuck repeating some base pleasure, but interconnected minds growing and becoming capable of immensely rich experiences. Nick Bostrom's Letter from Utopia is the best description of this vision that I'm aware of.

Possibly this still sounds terrible to you, and that's entirely fair. Hedonistic utilitarianism does in fact entail many weird conclusions in the limit. 

It's a perfectly good question! I've done research focused on reducing s-risks, and I still don't have a perfectly clear definition for them. 

I generally use the term for suffering that occurs on an astronomical scale and is enough to make the value of the future negative. So for the alien factory farming, I'd probably call it an s-risk once the suffering of the aliens outweighs the positive value from other future beings. If it was significant, but didn't rise to that level, I'd call it something like 'catastrophic suffering risk'. 'Astronomical waste' is also a term that works, though I usually use that for positive things we fail to do, rather than negative things we do. 

Overall, I wouldn't worry too much. There isn't standard terminology for 'undefined amount of suffering that deserves consideration', and you should be fine using whatever terms seem best to you as long as you're clear what you mean by them. The demarcation between existential and merely catastrophic risks is important, because there is a sharp discontinuity once a risk becomes so severe that we can never recover from it. There isn't anything like that with s-risks; a risk that falls just under the bar for being an s-risk should be treated the same as a risk that just passes it.

I hope that answered your question! I'd be happy to clarify if any of that was unclear, or if you have further questions.

"no other literal X-risk" seems too strong. There are certainly some potential ways that nuclear war or a bioweapon could cause human extinction. They're not just catastrophic risks.

In addition, catastrophic risks don't just involve massive immediate suffering. They drastically change global circumstances in a way which will have knock-on effects on whether, when, and how we build AGI.

All that said, I directionally agree with you, and I think that probably all longtermists should have a model of the effects their work has on the potentiality of aligned AGI, and that they should seriously consider switching to working more directly on AI, even if their competencies appear to lie elsewhere. I just think that your post takes this point too far.

I think this is a bit too strong of a claim. It is true that the overwhelming majority of value in the future is determined by whether, when, and how we build AGI. I think it is also true that a longtermist trying to maximize impact should, in some sense, be doing something which affects whether, when, or how we build AGI.

However, I think your post is too dismissive of working on other existential risks. Reducing the chance that we all die before building AGI increases the chance that we build AGI. While there probably won't be a nuclear war before AGI, it is quite possible that a person very well-suited to working on nuclear issues could reduce x-risk more by working to reduce nuclear x-risk than they could by working more directly on AI.
