AI safety
AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

80
2mo
1
I recently created a simple workflow to allow people to write to the Attorneys General of California and Delaware to share thoughts + encourage scrutiny of the upcoming OpenAI nonprofit conversion attempt. Write a letter to the CA and DE Attorneys General I think this might be a high-leverage opportunity for outreach. Both AG offices have already begun investigations, and AGs are elected officials who are primarily tasked with protecting the public interest, so they should care what the public thinks and prioritizes. Unlike e.g. congresspeople, I don't AGs often receive grassroots outreach (I found ~0 examples of this in the past), and an influx of polite and thoughtful letters may have some influence — especially from CA and DE residents, although I think anyone impacted by their decision should feel comfortable contacting them. Personally I don't expect the conversion to be blocked, but I do think the value and nature of the eventual deal might be significantly influenced by the degree of scrutiny on the transaction. Please consider writing a short letter — even a few sentences is fine. Our partner handles the actual delivery, so all you need to do is submit the form. If you want to write one on your own and can't find contact info, feel free to dm me.
64
2mo
5
Notes on some of my AI-related confusions[1] It’s hard for me to get a sense for stuff like “how quickly are we moving towards the kind of AI that I’m really worried about?” I think this stems partly from (1) a conflation of different types of “crazy powerful AI”, and (2) the way that benchmarks and other measures of “AI progress” de-couple from actual progress towards the relevant things. Trying to represent these things graphically helps me orient/think.  First, it seems useful to distinguish the breadth or generality of state-of-the-art AI models and how able they are on some relevant capabilities. Once I separate these out, I can plot roughly where some definitions of "crazy powerful AI" apparently lie on these axes:  (I think there are too many definitions of "AGI" at this point. Many people would make that area much narrower, but possibly in different ways.) Visualizing things this way also makes it easier for me[2] to ask: Where do various threat models kick in? Where do we get “transformative” effects? (Where does “TAI” lie?) Another question that I keep thinking about is something like: “what are key narrow (sets of) capabilities such that the risks from models grow ~linearly as they improve on those capabilities?” Or maybe “What is the narrowest set of capabilities for which we capture basically all the relevant info by turning the axes above into something like ‘average ability on that set’ and ‘coverage of those abilities’, and then plotting how risk changes as we move the frontier?” The most plausible sets of abilities like this might be something like:  * Everything necessary for AI R&D[3] * Long-horizon planning and technical skills? If I try the former, how does risk from different AI systems change?  And we could try drawing some curves that represent our  guesses about how the risk changes as we make progress on a narrow set of AI capabilities on the x-axis. This is very hard; I worry that companies focus on benchmarks in ways that
61
2mo
4
Holden Karnofsky has joined Anthropic (LinkedIn profile). I haven't been able to find more information.
23
17d
10
The U.S. State Department will reportedly use AI tools to trawl social media accounts, in order to detect pro-Hamas sentiment to be used as grounds for visa revocations (per Axios). Regardless of your views on the matter, regardless of whether you trust the same government that at best had a 40% hit rate on ‘woke science’ to do this: They are clearly charging ahead on this stuff. The kind of thoughtful consideration of the risks that we’d like is clearly not happening here. So why would we expect it to happen when it comes to existential risks, or a capability race with a foreign power?
11
7d
1
Random thought: does the idea of explosive takeoff of intelligence assume the alignment is solvable? If the alignment problem isn’t solvable, then an AGI, in creating ASI, would face the same dilemma as humans: The ASI wouldn’t necessarily have the same goals, would disempower the AGI, instrumental convergence, all the usual stuff. I suppose one counter argument is that the AGI rationally shoudn’t create ASI, for these reasons, but, similar to humans, might do so anyway due to competitive/racing dynamics. Whichever AGI doesn’t creates ASI will be left behind, etc.
31
1mo
If you can get a better score than our human subjects did on any of METR's RE-Bench evals, send it to me and we will fly you out for an onsite interview Caveats: 1. you're employable (we can sponsor visas from most but not all countries) 2. use same hardware 3. honor system that you didn't take more time than our human subjects (8 hours). If you take more still send it to me and we probably will still be interested in talking (Crossposted from twitter.)
2
1d
The recent pivot by 80 000 hours to focus on AI seems (potentially) justified, but the lack of transparency and input makes me feel wary. https://forum.effectivealtruism.org/posts/4ZE3pfwDKqRRNRggL/80-000-hours-is-shifting-its-strategic-approach-to-focus   TLDR; 80 000 hours, a once cause-agnostic broad scope introductory resource (with career guides, career coaching, online blogs, podcasts) has decided to focus on upskilling and producing content focused on AGI risk, AI alignment and an AI-transformed world. ---------------------------------------- According to their post, they will still host the backlog of content on non AGI causes, but may not promote or feature it. They also say a rough 80% of new podcasts and content will be AGI focused, and other cause areas such as Nuclear Risk and Biosecurity may have to be scoped by other organisations. Whilst I cannot claim to have in depth knowledge of robust norms in such shifts, or in AI specifically, I would set aside the actual claims for the shift, and instead focus on the potential friction in how the change was communicated. To my knowledge, (please correct me), no public information or consultation was made beforehand, and I had no prewarning of this change. Organisations such as 80 000 hours may not owe this amount of openness, but since it is a value heavily emphasises in EA, it seems slightly alienating. Furthermore, the actual change may not be so dramatic, but it has left me grappling with the thought that other mass organisations could just as quickly pivot. This isn't necessarily inherently bad, and has advantageous signalling of being 'with the times' and 'putting our money where our mouth is' in terms of cause area risks. However, in an evidence based framework, surely at least some heads up would go a long way in reducing short-term confusion or gaps.   Many introductory programs and fellowships utilise 80k resources, and sometimes as embeds rather than as standalone resources. Despite claimi
43
2mo
2
Both Sam and Dario saying that they now believe they know how to build AGI seems like an underrated development to me. To my knowledge, they only started saying this recently. I suspect they are overconfident, but still seems like a more significant indicator than many people seem to be tracking.
Load more (8/162)