I have a question. Why do all the AI safety companies seem to do the opposite of AI safety? Anthropic keeps publicly releasing models (which means they can be accessed by billions of people), and the same goes for OpenAI. These models are unlikely to cause major problems, but if you're releasing a product that is going to be used by billions of people, you should make sure it is around 99.9999% failure proof. Anthropic themselves have said "AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities" when referring to Mythos. Now sure, Fable is claimed to be "safe for general use", and maybe it is, but why take the risk? Especially after only around 2-3 months of safety testing? I would want a company that claims to be for AI safety to always err on the side of caution, and this frankly seems quite reckless.
If I had to be more specific, by "significantly reduce existential risk" I mean "reducing the probability of all humanity (and only humanity) dying within a few days or weeks from 50% to 10%".
Also, I disagree with your methods. X risks aren't especially bad because of all the utility lost (and "negative utility" created); they're bad because after they happen there's never any utility again. Unless apes re-evolve into humans and reestablish all of civilization all over again, but we're getting too hypothetical. What's 100, or even 1,000 years of death and suffering compared to 10,000 years of utopia? If stalling or slowing down technological progress for 1,000 years made P(Doom) go from 50% to 1%, I would definitely take it. Unless of course you think utopia is gonna be some short-lived thing, but I seriously doubt that.
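To make that trade concrete, here's a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration, not a real estimate: utility is counted as +1 per year of utopia and -1 per year of delay ("death and suffering"), utopia lasts the 10,000 years mentioned above, the delay is paid up front whether or not doom follows, and extinction is worth 0 forever after. The function name `expected_utility` and its parameters are made up for this example.

```python
def expected_utility(p_doom, delay_years,
                     utopia_years=10_000,
                     delay_utility_per_year=-1.0,   # assumed: each year of delay costs 1 unit
                     utopia_utility_per_year=1.0):  # assumed: each year of utopia is worth 1 unit
    """Expected utility of a path: pay the delay first, then maybe reach utopia."""
    # The delay is experienced regardless of how things turn out afterwards.
    delay_cost = delay_years * delay_utility_per_year
    # Utopia only happens if extinction is avoided; extinction contributes 0.
    utopia_payoff = (1.0 - p_doom) * utopia_years * utopia_utility_per_year
    return delay_cost + utopia_payoff


# No slowdown: P(Doom) = 50%.
rush = expected_utility(p_doom=0.50, delay_years=0)
# 1,000-year slowdown: P(Doom) = 1%.
wait = expected_utility(p_doom=0.01, delay_years=1_000)

print(f"rush: {rush:.0f}, wait: {wait:.0f}")
```

Under these toy numbers the slow path comes out ahead (roughly 8,900 vs 5,000), and the gap only grows if utopia lasts longer than 10,000 years. It flips only if you assume utopia is short-lived or the delay years are far worse than -1 per year.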
That's fair, but I imagine X risks and S risks are very heavily correlated, especially with regard to "speed of progress". Accelerationism will, in my view, obviously increase X risks (safety research takes time, so the more time you have, the more research gets done, and the lower the risk). It will also increase S risks (this is more personal opinion, but I don't think the current leaders of AI innovation have stuff like animal welfare in mind. If we just keep chugging along, the first ASI might not care about animals at all).
Wrote a post about it, but the TL;DR is that extinction is THE worst-case scenario. It is the end of all utility and completely irreversible, whereas progress can always be made at a later date.
I wanted to make this poll to see how the community views the speed/x-risk tradeoff. I'm personally 99% x-risk and 1% speed, so I would hard agree. My prediction is most people will agree, maybe a 70/30 split, but I'm curious to see.
I don't know, maybe eventually it could help, but with these "cutting edge" coding models doesn't it seem irresponsible? What if the safeguards don't work? Shouldn't you release the model publicly only after you've exhaustively patched every single possible jailbreak? (Even then I would argue it's still better not to release it, since billions of people means hundreds of thousands of bad actors, and again, as an AI safety company with "cutting edge" models I wouldn't take any risks.)