AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

8 · 10h · 2
I do not believe Anthropic as a company has a coherent and defensible view on policy. It is known that they said things they did not hold while hiring people (they claim to have had good internal reasons for changing their minds, but people went to work for them because of impressions Anthropic created and then decided not to live up to). It is known among policy circles that Anthropic's lobbyists are similar to OpenAI's.

From Jack Clark, a billionaire co-founder of Anthropic and its head of policy, today: Dario is talking about a country of geniuses in a datacenter in the context of competition with China and a 10-25% chance that everyone will literally die, while Jack Clark is basically saying, "But what if we're wrong about betting on short AI timelines? Security measures and pre-deployment testing will be very annoying, and we might regret them. We'll have slower technological progress!"

This is not invalid in isolation, but Anthropic is a company that was built on the idea of not fueling the race. Do you know what would stop the race? Getting policymakers to clearly understand the threat models that many of Anthropic's employees share. It's ridiculous and insane that, instead, Anthropic is arguing against regulation because it might slow down technological progress.
80 · 2mo · 1
I recently created a simple workflow to allow people to write to the Attorneys General of California and Delaware to share thoughts and encourage scrutiny of the upcoming OpenAI nonprofit conversion attempt: Write a letter to the CA and DE Attorneys General.

I think this might be a high-leverage opportunity for outreach. Both AG offices have already begun investigations, and AGs are elected officials who are primarily tasked with protecting the public interest, so they should care what the public thinks and prioritizes. Unlike e.g. congresspeople, I don't think AGs often receive grassroots outreach (I found ~0 examples of this in the past), and an influx of polite and thoughtful letters may have some influence — especially from CA and DE residents, although I think anyone impacted by their decision should feel comfortable contacting them.

Personally I don't expect the conversion to be blocked, but I do think the value and nature of the eventual deal might be significantly influenced by the degree of scrutiny on the transaction. Please consider writing a short letter — even a few sentences is fine. Our partner handles the actual delivery, so all you need to do is submit the form. If you want to write one on your own and can't find contact info, feel free to DM me.
64 · 2mo · 5
Notes on some of my AI-related confusions[1]

It's hard for me to get a sense of things like "how quickly are we moving towards the kind of AI that I'm really worried about?" I think this stems partly from (1) a conflation of different types of "crazy powerful AI", and (2) the way that benchmarks and other measures of "AI progress" decouple from actual progress towards the relevant things. Trying to represent these things graphically helps me orient and think.

First, it seems useful to distinguish the breadth or generality of state-of-the-art AI models from how capable they are on some relevant set of abilities. Once I separate these out, I can plot roughly where some definitions of "crazy powerful AI" apparently lie on these axes. (I think there are too many definitions of "AGI" at this point. Many people would make that area much narrower, but possibly in different ways.)

Visualizing things this way also makes it easier for me[2] to ask: Where do various threat models kick in? Where do we get "transformative" effects? (Where does "TAI" lie?)

Another question I keep thinking about is something like: "What are key narrow (sets of) capabilities such that the risks from models grow roughly linearly as they improve on those capabilities?" Or maybe: "What is the narrowest set of capabilities for which we capture basically all the relevant information by turning the axes above into something like 'average ability on that set' and 'coverage of those abilities', and then plotting how risk changes as we move the frontier?" The most plausible sets of abilities like this might be something like:

* Everything necessary for AI R&D[3]
* Long-horizon planning and technical skills?

If I try the former, how does risk from different AI systems change? We could try drawing some curves that represent our guesses about how the risk changes as we make progress on a narrow set of AI capabilities on the x-axis. This is very hard; I worry that companies focus on benchmarks in ways that…
61 · 2mo · 4
Holden Karnofsky has joined Anthropic (LinkedIn profile). I haven't been able to find more information.
23 · 18d · 10
The U.S. State Department will reportedly use AI tools to trawl social media accounts, in order to detect pro-Hamas sentiment to be used as grounds for visa revocations (per Axios). Regardless of your views on the matter, regardless of whether you trust the same government that at best had a 40% hit rate on ‘woke science’ to do this: They are clearly charging ahead on this stuff. The kind of thoughtful consideration of the risks that we’d like is clearly not happening here. So why would we expect it to happen when it comes to existential risks, or a capability race with a foreign power?
31 · 1mo
If you can get a better score than our human subjects did on any of METR's RE-Bench evals, send it to me and we will fly you out for an onsite interview.

Caveats:
1. You're employable (we can sponsor visas from most but not all countries).
2. You use the same hardware.
3. Honor system that you didn't take more time than our human subjects (8 hours). If you took more time, still send it to me; we will probably still be interested in talking.

(Crossposted from Twitter.)
11 · 9d · 1
Random thought: does the idea of an explosive takeoff of intelligence assume that alignment is solvable? If the alignment problem isn't solvable, then an AGI, in creating ASI, would face the same dilemma as humans: the ASI wouldn't necessarily share its goals, would disempower the AGI, instrumental convergence, all the usual stuff. I suppose one counterargument is that the AGI rationally shouldn't create ASI for these reasons but, like humans, might do so anyway due to competitive/racing dynamics. Whichever AGI doesn't create ASI will be left behind, etc.
43 · 2mo · 2
Both Sam and Dario saying that they now believe they know how to build AGI seems like an underrated development to me. To my knowledge, they only started saying this recently. I suspect they are overconfident, but it still seems like a more significant indicator than many people seem to be tracking.