
In particular, how many hours of study would this take for someone with minimal background in computer science and AI? I'm also curious which courses or books would get me to a first-principles understanding fastest.

By first principles, I mean being able to break the arguments down into their most basic parts, so that I can describe the risk at a high level in simple terms, and then break those terms down into their constituent parts, so that I could reconstruct ideas like machine learning, gradient descent, and interpretability satisfactorily and completely without any outside help.

I want to be able to picture clearly how x-risk might play out, and to manipulate in my mind the variables that lead to different possible scenarios.

Again, I'm especially interested in resources that would help me do this.




The "Modeling Transformative AI Risk" project, which I assisted with, is intended to explain exactly this, and we have a fairly extensive (though not fully comprehensive) report on the conceptual models we think are critical, online here. (A less edited and polished version is on the Alignment Forum here.)

I think you could probably read through the report itself in a week, going slowly and thinking through the issues, but doing so requires background in many of the questions discussed on Robert Miles' YouTube channel and in the AGI Safety Fundamentals resources that others recommended. Assuming a fairly basic understanding of machine learning and optimization, which probably requires the equivalent of an undergraduate degree in a related field, studying the linked AI safety material plus that report should get you to a fairly good gears-level understanding. I'd expect 3 months of research and reading to be sufficient for someone with a fairly strong undergraduate background to build an overall gears-level model of the different aspects of the risk, or closer to a year for someone starting from scratch.

That said, I'll note that contributing to solving the problems requires considerably more investment in skill-building; depending on what you plan to do to address the risks, this could be equivalent to an advanced degree in mathematics, machine learning, policy, or international relations.

Thank you, this is very much what I was looking for.

JakubK
Here's the most up-to-date version of the AGI Safety Fundamentals curriculum. Be sure to check out Richard Ngo's "AGI safety from first principles" report. There's also a "Further resources" section at the bottom linking to pages like "Lots of links" from AI Safety Support.

Great question; I'd love to know too. The one thing I can recommend is Robert Miles' YouTube channel, although he hasn't uploaded in a while.

If I may add to your post: I'd also like to know what is currently missing for reducing the risk. Is it a lack of creativity, a lack of people, or a lack of something else? Thanks!


I don't have a great sense of how long what you're describing would take, but here is a collection of relevant resources.

Thanks! I'm already taking this course.
