Some thoughts on this comment:
On this part:
I responded well to Richard's call for More Co-operative AI Safety Strategies, and I like the call toward more sociopolitical thinking, since the Alignment problem really is a sociological one at heart (always has been). Things which help the community think along these lines are good imo, and I hope to share some of my own writing on this topic in the future.
I don't think it was always a large sociological problem, but yeah, I've updated more towards the sociological aspect of alignment being important (especially as the technical problem has turned out to be easier than circa 2008-2016 views assumed it would be).
Whether or not I agree with Richard's personal politics is kinda beside the point of this as a message. Richard's allowed to have his own views on things, and other people are allowed to criticise them (I think David Mathers' comment is directionally where I lean too). I will say that not appreciating arguments from open-source advocates, who are very concerned about the concentration of power from powerful AI, has led to a completely unnecessary polarisation against the AI Safety community from them. I think, while some tensions do exist, it wasn't inevitable that it'd get as bad as it is now, and in the end it was a particularly self-defeating one. Again, by doing the kind of thinking Richard is advocating for (you don't have to co-sign his solutions; he's even calling for criticism in the post!), we can hopefully avoid these failures in the future.
I do genuinely believe that concentration of power is a huge risk factor, and in particular I'm deeply worried about the incentives of a capitalist post-AGI company where a few hold basically all of the rent/money, given both stronger incentives to expropriate property from people (similar to how humans routinely expropriate property from animals) and weak to non-existent forces working against that expropriation.
That said, I think the piece on open-source AI being a defense against concentration of power, and more generally a good thing akin to the Enlightenment, unfortunately rests on some quite bad analogies. Giving everyone AI is nothing like education or voting: at the high end, it is basically enough to create entire very large economies on its own, and at the lower end it can immensely help with, or automate, the process of making biological weapons for common citizens. More importantly, the impacts of education and voting fundamentally require coordination to get large things done, and super-powerful AIs can remove that requirement.
More generally, I think one of the largest cruxes between reasonable open-source people and EAs in general is how much AI can make advanced biology accessible to the masses, and how offense-dominant that technology is. Here I defer to the biorisk experts, including EAs, who generally think that biorisk is a wildly offense-advantaged domain that is very dangerous to democratize, over open-source people, at least for the next several years.
On Sam Altman's firing:
On the bounties, the one that really interests me is the OpenAI board one. I feel like I've been living in a bizarro-world with EAs/AI Safety people ever since it happened, because it seemed such a colossal failure, either of legitimacy or of strategy (most likely both), and it's a key example of the "un-cooperative strategy" that Richard is concerned about, imo. The combination of extreme action and ~0 justification, either externally or internally, remains completely bemusing to me and was a big wake-up call for my own perception of 'AI Safety' as a brand. I don't think people should underestimate the second-order effect this had on both 'AI Safety' and EA, coming about a year after FTX.
I'll be blunt and say it: I think it was mildly good, or at worst neutral, to use the uncooperative strategy to fire Sam Altman, because Sam Altman was going to gain all control by default (and probably with better PR if the firing hadn't happened), and more importantly he was aiming to disempower the safety people basically totally, which leads to at least a mild increase in existential risk. They realized they would have been manipulated out of acting if they waited, so they had to go for broke.
The main EA mistake was in acting too early, before things got notably weird.
That doesn't mean society will react or that it's likely to react, but I basically agree with Veaulans here:
I agree that, conditional on escapes/rogue internal deployments like this scenario by Buck, with a lot of contributors it leads to much larger disasters, and if the AI is unaligned, then unless we have an aligned AI with somewhat similar capabilities, we lose.
My point is more that you are way overestimating how many chances the AI has to overthrow us before it is aligned.
https://www.lesswrong.com/posts/ceBpLHJDdCt3xfEok/ai-catastrophes-and-rogue-deployments
But the crux might be that I don't think we need that much reliability for catching AIs, because as long as the AI is caught with non-negligible probability, it has to be very cautious: being caught is a much worse outcome for the AI than either succeeding or the game simply ending.
In essence, we probably only need to catch the AI once, or at most a few times, to completely change the game in ways that massively favor alignment success.
You do not need to reliably catch the AI if you can make the costs of being caught very painful, and I do suspect we will probably be able to do this (perhaps using trusted AI labor, to use Ryan Greenblatt and Buck's term).
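To make the payoff asymmetry concrete, here is a minimal expected-value sketch in Python; every number in it (the value of success, the cost of being caught, the candidate catch probabilities) is an assumption chosen purely for illustration, not an estimate of anything:

```python
# Illustrative expected-value sketch (all numbers are assumptions, not estimates).
# The AI weighs attempting a takeover against waiting, given some chance of being caught.

def expected_value_of_attempt(p_catch: float,
                              value_success: float = 1.0,
                              value_caught: float = -100.0) -> float:
    """Expected payoff to the AI of attempting, assuming being caught is far more
    costly than success is valuable (the asymmetry the comment appeals to)."""
    return (1 - p_catch) * value_success + p_catch * value_caught

value_of_waiting = 0.0  # baseline: the game just ends, nothing gained or lost

for p_catch in [0.001, 0.01, 0.05, 0.10]:
    ev = expected_value_of_attempt(p_catch)
    print(f"p_catch={p_catch:.3f}: EV of attempt = {ev:+.2f} "
          f"({'attempt' if ev > value_of_waiting else 'stay cautious'})")
```

Under these assumed numbers, even a ~1% chance of being caught makes attempting worse than waiting, which is the sense in which a non-negligible catch probability forces caution.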
While finm made a general comment in response to you, I want to specifically focus on the footnote, because I think it's a central crux in why a lot of EAs are way less doomy than you.
Quote below:
We need at least 13 9s of safety for ASI, and the best current alignment techniques aren't even getting 3 9s...
I think the 13 9s can, at the very least, be reduced to something requiring closer to 1-2 9s, and there are 2 reasons for this:
1. https://x.com/gcolbourn/status/1762224406955216897
2. If we manage to catch an AI doing bad stuff, then it's much, much harder for the AI to escape, because there are a lot of techniques that can be applied to make the issue go away, like proliferating the escape example so it can be studied and trained against.
More from Ryan Greenblatt here:
https://www.lesswrong.com/posts/i2nmBfCXnadeGmhzW/catching-ais-red-handed
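As a rough illustration of how these two points interact with the "13 9s" figure, here is a small sketch; the number of chances, the per-chance failure rates, and the fraction of failures that are caught rather than unrecoverable are all made-up assumptions for the example:

```python
# Rough sketch of why the required "nines" depends on how many unrecoverable
# chances the AI gets, and on whether a failure is catastrophic or merely a
# recoverable "catch". All numbers are illustrative assumptions.

def p_catastrophe(p_fail_per_chance: float, n_chances: int,
                  p_unrecoverable_given_fail: float = 1.0) -> float:
    """P(at least one unrecoverable failure) across n independent chances."""
    p_bad = p_fail_per_chance * p_unrecoverable_given_fail
    return 1 - (1 - p_bad) ** n_chances

# If every failure is unrecoverable and the AI gets ~10^10 chances,
# you need roughly 13 nines per chance to keep total risk around 1e-3:
print(p_catastrophe(1e-13, 10**10))

# If the AI only gets a handful of real chances before it is aligned,
# and most failures are caught and turned into recoverable incidents,
# then 1-2 nines per chance gives a similar total risk:
print(p_catastrophe(1e-2, 5, p_unrecoverable_given_fail=0.1))
```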
I definitely think alignment needs to be reliable, but I do not think it needs to be so reliable that we cannot achieve it, or that doom is very likely and we can't change the probabilities.
I'd certainly say it's quite scary, but I do think there's a reasonable hope of surviving and going on to thrive, such that alignment investment is worth the money.
I basically agree with this, with one particular caveat: the EA and LW communities might eventually need to fight/block open-source efforts due to issues like bioweapons, and it's very plausible that the open-source community refuses to stop open-sourcing models even if there is clear evidence that they can immensely help with or automate biorisk. So while I think the fight happened too early, I think the fighty/uncooperative parts of making AI safe might eventually matter more than is recognized today.
To respond to a local point here:
- Also, I am suspicious of framing "opposition to geoengineering" as bad -- this, to me, is a red flag that someone has not done their homework on uncertainties in the responses of the climate system to large-scale interventions like albedo modification. Geoengineering the planet wrong is absolutely an X-risk.
While I can definitely buy that geoengineering is net-negative, I don't yet see how geoengineering gone wrong could actually result in X-risk, though I don't currently understand the issues that well.
It doesn't speak well of him that he frames opposition to geoengineering as automatically bad (even if I assume the current arguments against geoengineering are quite bad).
This is roughly my take, with the caveat that I'd replace CEV with instruction following, and I wouldn't be so sure that alignment is easy (though I do think we can replace that assumption with the assumption that solving the AI alignment problem is highly incentivized and that the problem is actually solvable).
Crossposting this comment from LW, because I think there is some value here:
The main points are that value alignment will be way more necessary for ordinary people to survive, no matter the institutions adopted; that the world hasn't yet weighed in that much on AI safety and plausibly never will, but we do need to prepare for a future in which AI safety becomes mainstream; that Bayesianism is fine, actually; and many more points in the full comment.
The big reason I lean towards disagreeing nowadays is that I've come to expect the AI control/alignment problem to be much less neglected, and less important to solve, than I used to think, and more generally I've come to doubt the assumption that worlds in which we survive are worlds in which we achieve very large value (under my own value set), such that reducing existential risk is automatically good.
Late comment: I basically agree with the point being made here that we should avoid the lump-of-labor fallacy of assuming the amount of work to be done is constant, but I don't think this weakens the argument that human work will be totally replaced by AI work, for 2 reasons:
1. In a world where AI labor can be copied extremely cheaply, wages fall for the same reason prices fall when more goods are supplied. In particular, humans have a biological minimum wage of 20-100 watts that fundamentally makes them unemployable once AIs can be run for less than this, and human wages are likely to fall below subsistence if AIs are copied at scale (see the sketch after this list).
2. While more work will appear as the economy grows, it is still better to invest in AIs to do that work than to invest in humans, and thus even as total labor grows, human labor specifically can fall to essentially zero, so the automation hypothesis is at least an economically consistent hypothesis to hold.
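Here is a back-of-the-envelope sketch of the "biological minimum wage" point from reason 1; the electricity price and the AI cost per hour are assumptions picked only to illustrate the comparison, not estimates of real costs:

```python
# Illustrative arithmetic for the "biological minimum wage" point above.
# All inputs are assumptions chosen for the sketch, not estimates.

human_power_watts = 100           # upper end of the 20-100 W metabolic range
electricity_price_per_kwh = 0.10  # assumed energy price, in $/kWh

# Energy floor for keeping a human "running" for one hour of work.
# Real subsistence wages are far higher (food, housing, etc.); this is
# the absolute physical lower bound the argument appeals to.
human_energy_floor_per_hour = human_power_watts / 1000 * electricity_price_per_kwh

# Assumed cost of an AI system doing the same hour of work, once copying
# trained models is cheap and only inference has to be paid for.
ai_cost_per_hour = 0.005

print(f"Human energy floor: ${human_energy_floor_per_hour:.3f}/hour")
print(f"Assumed AI cost:    ${ai_cost_per_hour:.3f}/hour")
print("Human labor priced out:", ai_cost_per_hour < human_energy_floor_per_hour)
```

The point is only directional: once an hour of AI work costs less than even the physical energy floor of a human (let alone real subsistence costs), there is no wage at which human labor stays competitive.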
The prediction that many moral perspectives care more about averting downsides than producing upsides is well explained if we live in a morally relativist multiverse, where there are infinitely many correct moral systems and the one you arrive at is path-dependent and starting-point-dependent, but where many moral perspectives nonetheless share an instrumental goal of avoiding extinction/disempowerment, because extinction means that morality loses out in the competition/battle for survival/dominance.
cf @quinn's positive vs negative longtermism framework:
https://forum.effectivealtruism.org/posts/r5GbSZ7dcb6nbuWch/quinn-s-shortform?commentId=pvXtqvGfjATkJq7N2