Hot Take: Securing AI Labs could actually make things worse
There's a consensus view that stronger security at leading AI labs would be a good thing. It's not at all clear to me that this is the case.
Consider the extremes:
In a maximally insecure world, where anyone can easily steal any model that gets trained, there's no profit or strategic/military advantage to be gained from doing the training, so nobody's incentivised to invest much to do it. We'd only get AGI if some sufficiently-well-resourced group believed it would be good for everyone to have an AGI, and were willing to fund its development as philanthropy.
In a maximally secure world, where stealing trained models is impossible, whichever company/country got to AGI first could essentially dominate everyone else. In this world there's huge incentive to invest and to race.
Of course, our world lies somewhere between these two. State actors almost certainly could steal models from any of the big 3; potentially organised cybercriminals/rival companies too, but most private individuals could not. Still, it seems that marginal steps towards a higher security world make investment and racing more appealing as the number of actors able to steal the products of your investment and compete with you for profits/power falls.
But I notice I am confused. The above reasoning predicts that nobody should be willing to make significant investments in developing AGI with current levels of cyber-security, since if they succeeded their AGI would immediately be stolen by multiple gouvenments (and possibly rival companies/cybercriminals), which would probably nullify any return on the investment. What I observe is OpenAI raising $40 billion in their last funding round, with the explicit goal of building AGI.
So now I have a question: given current levels of cybersecurity, why are investors willing to pour so much cash into building AGI?
...maybe it's the same reason various actors are willing to invest into building open
I recently created a simple workflow to allow people to write to the Attorneys General of California and Delaware to share thoughts + encourage scrutiny of the upcoming OpenAI nonprofit conversion attempt.
Write a letter to the CA and DE Attorneys General
I think this might be a high-leverage opportunity for outreach. Both AG offices have already begun investigations, and AGs are elected officials who are primarily tasked with protecting the public interest, so they should care what the public thinks and prioritizes. Unlike e.g. congresspeople, I don't AGs often receive grassroots outreach (I found ~0 examples of this in the past), and an influx of polite and thoughtful letters may have some influence — especially from CA and DE residents, although I think anyone impacted by their decision should feel comfortable contacting them.
Personally I don't expect the conversion to be blocked, but I do think the value and nature of the eventual deal might be significantly influenced by the degree of scrutiny on the transaction.
Please consider writing a short letter — even a few sentences is fine. Our partner handles the actual delivery, so all you need to do is submit the form. If you want to write one on your own and can't find contact info, feel free to dm me.
So, I have two possible projects for AI alignment work that I'm debating between focusing on. Am curious for input into how worthwhile they'd be to pursue or follow up on.
The first is a mechanistic interpretability project. I have previously explored things like truth probes by reproducing the Marks and Tegmark paper and extending it to test whether a cosine similarity based linear classifier works as well. It does, but not any better or worse than the difference of means method from that paper. Unlike difference of means, however, it can be extended to multi-class situations (though logistic regression can be as well). I was thinking of extending the idea to try to create an activation vector based "mind reader" that calculates the cosine similarity with various words embedded in the model's activation space. This would, if it works, allow you to get a bag of words that the model is "thinking" about at any given time.
The second project is a less common game theoretic approach. Earlier, I created a variant of the Iterated Prisoner's Dilemma as a simulation that includes death, asymmetric power, and aggressor reputation. I found, interestingly, that cooperative "nice" strategies banding together against aggressive "nasty" strategies produced an equilibrium where the cooperative strategies win out in the long run, generally outnumbering the aggressive ones considerably by the end. Although this simulation probably requires more analysis and testing in more complex environments, it seems to point to the idea that being consistently nice to weaker nice agents acts as a signal to more powerful nice agents and allows coordination that increases the chance of survival of all the nice agents, whereas being nasty leads to a winner-takes-all highlander situation, which from an alignment perspective could be a kind of infoblessing that an AGI or ASI could be persuaded to spare humanity for these game theoretic reasons.
I'm not sure how to word this properly, and I'm uncertain about the best approach to this issue, but I feel it's important to get this take out there.
Yesterday, Mechanize was announced, a startup focused on developing virtual work environments, benchmarks, and training data to fully automate the economy. The founders include Matthew Barnett, Tamay Besiroglu, and Ege Erdil, who are leaving (or have left) Epoch AI to start this company.
I'm very concerned we might be witnessing another situation like Anthropic, where people with EA connections start a company that ultimately increases AI capabilities rather than safeguarding humanity's future. But this time, we have a real opportunity for impact before it's too late. I believe this project could potentially accelerate capabilities, increasing the odds of an existential catastrophe.
I've already reached out to the founders on X, but perhaps there are people more qualified than me who could speak with them about these concerns. In my tweets to them, I expressed worry about how this project could speed up AI development timelines, asked for a detailed write-up explaining why they believe this approach is net positive and low risk, and suggested an open debate on the EA Forum. While their vision of abundance sounds appealing, rushing toward it might increase the chance we never reach it due to misaligned systems.
I personally don't have a lot of energy or capacity to work on this right now, nor do I think I have the required expertise, so I hope that others will pick up the slack. It's important we approach this constructively and avoid attacking the three founders personally. The goal should be productive dialogue, not confrontation.
Does anyone have thoughts on how to productively engage with the Mechanize team? Or am I overreacting to what might actually be a beneficial project?
The current US administration is attempting an authoritarian takeover. This takes years and might not be successful. My manifold question puts an attempt to seize power if they lose legitimate elections at 30% (n=37). I put it much higher.[1]
Not only is this concerning by itself, this also incentivizes them to achieve a strategic decisive advantage via superintelligence over pro-democracy factions. As a consequence, they may be willing to rush and cut corners on safety.
Crucially, this relies on them believing superintelligence can be achieved before a transfer of power.
I don't know how much the belief in superintelligence has spread into the administration. I don't think Trump is 'AGI-pilled' yet, but maybe JD Vance is? He made an accelerationist speech. Making them more AGI-pilled and advocating for nationalization (like Ashenbrenner did last year) could be very dangerous.
1. ^
So far, my pessimism about US Democracy has put me in #2 on the Manifold topic, with a big lead over other traders. I'm not a Superforecaster though.
Mildly against the Longtermism --> GCR shift
Epistemic status: Pretty uncertain, somewhat rambly
TL;DR replacing longtermism with GCRs might get more resources to longtermist causes, but at the expense of non-GCR longtermist interventions and broader community epistemics
Over the last ~6 months I've noticed a general shift amongst EA orgs to focus less on reducing risks from AI, Bio, nukes, etc based on the logic of longtermism, and more based on Global Catastrophic Risks (GCRs) directly. Some data points on this:
* Open Phil renaming it's EA Community Growth (Longtermism) Team to GCR Capacity Building
* This post from Claire Zabel (OP)
* Giving What We Can's new Cause Area Fund being named "Risk and Resilience," with the goal of "Reducing Global Catastrophic Risks"
* Longview-GWWC's Longtermism Fund being renamed the "Emerging Challenges Fund"
* Anecdotal data from conversations with people working on GCRs / X-risk / Longtermist causes
My guess is these changes are (almost entirely) driven by PR concerns about longtermism. I would also guess these changes increase the number of people donation / working on GCRs, which is (by longtermist lights) a positive thing. After all, no-one wants a GCR, even if only thinking about people alive today.
Yet, I can't help but feel something is off about this framing. Some concerns (no particular ordering):
1. From a longtermist (~totalist classical utilitarian) perspective, there's a huge difference between ~99% and 100% of the population dying, if humanity recovers in the former case, but not the latter. Just looking at GCRs on their own mostly misses this nuance.
* (see Parfit Reasons and Persons for the full thought experiment)
2. From a longtermist (~totalist classical utilitarian) perspective, preventing a GCR doesn't differentiate between "humanity prevents GCRs and realises 1% of it's potential" and "humanity prevents GCRs realises 99% of its potential"
* Preventing an extinction-level GCR might move u
Both Sam and Dario saying that they now believe they know how to build AGI seems like an underrated development to me. To my knowledge, they only started saying this recently. I suspect they are overconfident, but still seems like a more significant indicator than many people seem to be tracking.
AI Safety Needs To Get Serious About Chinese Political Culture
I worry that Leopold Aschenbrenner's "China will use AI to install a global dystopia" take is based on crudely analogising the CCP to the USSR, or perhaps even to American cultural imperialism / expansionism, and isn't based on an even superficially informed analysis of either how China is currently actually thinking about AI, or what China's long term political goals or values are.
I'm no more of an expert myself, but my impression is that China is much more interested in its own national security interests and its own ideological notions of the ethnic Chinese people and Chinese territory, so that beyond e.g. Taiwan there isn't an interest in global domination except to the extent that it prevents them being threatened by other expansionist powers.
This or a number of other heuristics / judgements / perspectives could change substantially how we think about whether China would race for AGI, and/or be receptive to an argument that AGI development is dangerous and should be suppressed. China clearly has a lot to gain from harnessing AGI, but they have a lot to lose too, just like the West.
Currently, this is a pretty superficial impression of mine, so I don't think it would be fair to write an article yet. I need to do my homework first:
* I need to actually read Leopold's own writing about this, instead of making impressions based on summaries of it,
* I've been recommended to look into what CSET and Brian Tse have written about China,
* Perhaps there are other things I should hear about this, feel free to make recommendations.
Alternatively, as always, I'd be really happy for someone who's already done the homework to write about this, particularly anyone specifically with expertise in Chinese political culture or international relations. Even if I write the article, all it'll really be able to be is an appeal to listen to experts in the field, or for one or more of those experts to step forwar