I don't deny that my "unlimited time, ink, and paper" caveat is doing a lot of work in my argument. But we started with a thought experiment that is impossible to implement in practice (simulating a modern digital computer with a pen and paper) so I don't see why my reply can't do the same thing (even if it might require a lot more resources).
I think it's very unlikely that the human brain requires infinite time and memory to simulate. Even if its dynamics are continuous, you could probably simulate them to arbitrary accuracy with a fine-grained enough discrete approximation. And the Bekenstein bound suggests there is a finite limit to the amount of information that can exist within a given volume.
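As a rough illustration of that last point, here is a back-of-the-envelope Bekenstein-bound calculation. The radius and mass below are my own ballpark assumptions for a human head, so treat the result as an order of magnitude at best:

```python
import math

# Bekenstein bound: I <= 2*pi*R*E / (hbar * c * ln 2), in bits.
# The radius and mass are illustrative guesses, not measured values.
hbar = 1.054571817e-34  # reduced Planck constant, J*s
c = 2.99792458e8        # speed of light, m/s

radius = 0.1            # metres (assumed rough radius of a human head)
mass = 1.5              # kg (assumed rough mass of a brain)
energy = mass * c**2    # rest-mass energy, J

bits = 2 * math.pi * radius * energy / (hbar * c * math.log(2))
print(f"{bits:.1e}")    # on the order of 10^42 bits
```

However generous you are about what simulating a brain requires, this suggests the information content is finite, which is all the argument needs.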
As for whether my speed analogy works, I still think it does. Sure, if you pick a frame of reference in which you are stationary, then you continue to have experiences at the normal rate. But that wasn't the frame of reference I was using. I was working in the frame of reference of someone back on Earth, which is an equally valid frame of reference. In those coordinates, every physical process in your brain is getting slowed down (electrical impulses are travelling slower from one side of your brain to the other, chemical reactions are slowing down, etc) and you are having experiences at a slower rate.
If the human brain operates according to the known laws of physics, then in principle your brain could be simulated with a pen and paper (at least given unlimited time, ink, and paper), and it would behave identically to the real thing (it would talk and think like you and have all your opinions).
Suppose this simulation was all that existed of you, and your real brain had never existed. Would that mean that you never existed as a conscious being, despite all your thoughts and utterances still being a part of the world? That seems like a much more counterintuitive conclusion to me than biting the bullet on pen+paper simulations having the potential for consciousness.
I don't get why the "moment of experience taking a thousand years" thing is supposed to be so weird. If we slowed down all the processes in your brain, then moments of experience would take longer in physical time. That's not an argument against your consciousness being real. And this isn't hypothetical: we can literally do it by sending you on a spaceship travelling close to the speed of light, and that's exactly what would happen!
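For concreteness, the dilation factor the spaceship scenario relies on is easy to compute from the standard Lorentz formula. The specific speeds here are my own illustrative choices:

```python
import math

# Lorentz factor: in the Earth frame, one second of shipboard time
# takes gamma seconds of Earth time.
def gamma(v_over_c):
    """Lorentz factor for a speed expressed as a fraction of c."""
    return 1.0 / math.sqrt(1.0 - v_over_c**2)

print(round(gamma(0.999), 1))  # 22.4

# For one subjective second to take a thousand Earth years, gamma must
# equal the number of seconds in a thousand years. The required speed is
# so close to c that double-precision floats can't represent it directly,
# so we use the first-order expansion 1 - v/c ~ 1 / (2 * gamma**2).
target_gamma = 1000 * 365.25 * 24 * 3600  # seconds in a thousand years
gap = 1.0 / (2 * target_gamma**2)
print(gap)  # the speed falls short of c by only ~5e-22
```

So the "thousand years per moment" version needs an absurdly extreme speed, but nothing about the physics changes in kind, only in degree.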
That makes a lot of sense, thanks.
I'm sorry to hear you regret this engagement, since I've found your comments helpful (the link to AISLE's OpenSSL zero-days has shifted my view on this a fair bit).
I guess this whole discussion does just feel like a classic example of "All debates are bravery debates".
Thanks for the detailed reply, I think I understand your point clearly now!
But $20,000 for *all* of the OpenBSD bugs (not just the published ones) doesn't sound like that much to spend on inference compute to me. If AISLE could have spent the same and made an equally impressive announcement, unearthing enough bugs at once that government ministers around the world start issuing statements about it, then shouldn't they have been able to find the investors to fund that? That would have been incredible publicity for them.
The crux for me seems to be whether they have made equally impressive announcements, as you suggest they might have done. Maybe they're just worse at marketing. I don't know enough to evaluate that claim properly, but that does seem the relevant question here: have Anthropic been able to use Mythos to go significantly beyond what the best harnesses could already achieve with existing models for the same inference spend? I thought the answer was a clear yes, and I didn't find the original linked AISLE write-up very convincing at all. Your comment has made me more uncertain, but has still not convinced me, and I'd be really interested to read something more in-depth on that question. (Maybe we would also disagree about what the word 'significantly' means here, since I guess you are acknowledging it probably represents some improvement.)
(Also, I'd push back a bit on your characterization of AI progress. I agree the scaffolding is extremely important, but in my experience, over the two and a half years I've been working with these systems, the "paradigm shifts" in capability have come from the models themselves.)
(And a further comment: the fact that cybersecurity capabilities might not imply an imminent superintelligence takeoff seems an entirely independent point, and one I don't necessarily disagree with.)
On the take by AISLE, maybe I'm missing something here, but if their headline claim was correct (that the harness is more important than the model), shouldn't they have been able to find the vulnerabilities that Anthropic hasn't published? Or find hundreds more similarly impactful ones?
Re-discovering the ones Anthropic had already published seems much less impressive, because there are lots of ways to cheat, and from their write-up it sounded to me like they were essentially admitting that they had cheated.
Of course Anthropic could be lying about the existence or significance of the vulnerabilities they haven't published. But they have committed in advance to what those vulnerabilities are (I think they have already made some kind of cryptographic commitment to their unpublished write-ups?), which seems impressive to me.
Either they have used the new model to find significant vulnerabilities in every major OS and browser that are too dangerous to be released, or they haven't. If they have, it seems genuinely scary and impressive (not just marketing hype), because I'm not aware that people working on fancy harnesses have had similar results (or have they?). And if they haven't, then it's a very weird marketing ploy, because they're going to get found out very quickly!
I think this misunderstands what people mean when they compare arguments about the importance of AI safety to a Pascal's wager.
Pascal's wager refers to situations where a tiny probability of enormous value seemingly leads to ridiculous conclusions if you try to do naive expected value calculations with it. When people say that strong longtermism is a Pascal's wager, the "small probability" they are talking about is not the probability of extinction, which, as you point out, is significant. It is the probability that the future will contain "septillions of future sapients". And that probability gets even smaller if the probability of extinction soon is high! So a large probability of extinction this century makes the Pascal's wager comparison more relevant as a critique of strong longtermism, not less. It is multiplying this small probability by the value of those septillions of potential "sapients" that gives you the astronomical value which says existential risk reduction should almost automatically dominate our concerns.
I think you're completely right to point out that people should care a lot about things which might carry a 10% chance of causing human extinction, even ignoring their stance on longtermism. But some people believe that existential risk has astronomically more value than just the impact it will have on the next few generations, and that therefore tiny changes in the probability of existential risk almost automatically trump any other concern, however small those changes are. When people talk about Pascal's wager in the context of strong longtermism or AI safety, I think it is this claim that they are challenging, not the claim that we should care about extinction at all. And that criticism is just as valid, actually more valid, if the probability of extinction from AI is high (though I of course agree that if there are people who use the Pascal's Wager argument to dismiss all work on AI risk then they are making a serious mistake).
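To make the structure of that worry concrete, here is a toy expected-value calculation. Every number in it is invented purely for illustration, and the point is only the shape of the arithmetic:

```python
# Toy expected-value comparison behind the "Pascal's wager" worry.
# All numbers are made up for illustration, not estimates.
p_vast_future = 1e-9    # assumed tiny chance of "septillions of sapients"
future_lives = 1e24     # a septillion
p_reduction = 1e-6      # assumed tiny reduction in extinction risk

# Expected lives saved by the tiny risk reduction
ev_xrisk = p_vast_future * future_lives * p_reduction

near_term_lives = 1e6   # a very large near-term intervention, for comparison

print(ev_xrisk)                    # ~1e9
print(ev_xrisk > near_term_lives)  # True
```

Even with a one-in-a-billion chance of the vast future and a one-in-a-million risk reduction, the astronomical stakes make the product dominate, which is exactly the pattern the Pascal's wager critique is aimed at.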
I agree with your title, but I don't think negative utilitarianism is the answer. I like Toby Ord's essay on this, "Why I'm Not a Negative Utilitarian": https://www.amirrorclear.net/academic/ideas/negative-utilitarianism/
On your argument about tradeoffs, people make choices all the time where they accept some very small risk of some very severe suffering in order to increase their happiness by a modest amount. For example, cycling along a busy road to visit a friend. If you say that no amount of happiness can make up for the trauma of being involved in a serious accident, then it seems like you are forced to say that this choice is wrong. That seems like a strange conclusion to me.
Sorry for the very delayed reply to this. I meant to reply at the time and then it slipped my mind!
Yes, you've summarised my position perfectly, I like those diagrams!
I guess my deeper point was that I wasn't sure there was any meaningful way to say something like "X is twice as painful as Y" without defining it via choices among gambles or durations. You say for humans it seems real, but does it? I can definitely introspect and discover that X is more painful than Y, but I'm not sure I can introspect and discover that it is N times as painful. Where does that number come from?
Although as I was thinking more about how to justify this, I started thinking about other sensory experiences, like sound. Is it meaningful to say that "X feels twice as loud as Y", in a sense that doesn't have to line up with the intensity of the physical sound wave? And then I remembered my physics lessons from way back, and realised the answer might be yes. I was definitely taught that the reason we measure sound volume on a log scale (decibels) is that it lines up better with our sensory perception of it (you have to square the intensity ratio of the sound wave in order to double the perceived intensity). But if this is true then it means there is some sense in which we can introspect and say "X sounds twice as loud as Y", even though the underlying sound wave might not be twice as intense. And if that is the case then maybe this should also be true for pain.
I'm still very uncertain about this though. If I listened to different sounds and tried to place them on a numerical scale, I'm not really sure what it is that I'd actually be doing.
Something seems especially weird about offsetting your purchase of non-BCC chicken by donating to campaigns to get supermarkets to adopt the BCC.
I think there's one important consideration missing here: supermarkets respond to campaigners by saying that customers want to buy non-BCC chicken, and they are just doing what their customers want. If you buy non-BCC chicken from them, you make that argument stronger, and the campaigners' argument weaker.
And I don't think this is necessarily a negligible concern in comparison to the other effects being discussed here, since the mechanism for how your small donation is supposed to help chickens is also by tipping the scales on some corporate campaign and getting a company like a supermarket to make a big change.