I recently completed the AGI Safety Fundamentals course (on AI alignment) and really enjoyed it.

I imagine lots of people on the forum have taken the course, and now seems like a good time to share reflections. I’m hoping this thread will be particularly useful for people considering taking it in future.

I’d be interested to hear many different kinds of reflections, but here are some prompts:

  1. What do you think you’ve learnt? How has it impacted your plans, if it has?
  2. Did you think it was worth the time?
  3. Who would you recommend it to?
  4. Do you wish you had done something differently, or known something in advance? (Any other advice?)

I’ll leave my own reflections in the comments.

Comments (7)

  • As a result of AGI SF readings and other sporadic AI safety readings… 
    • … I feel more confident asking questions of people who know more than I do
      • I feel like I know the vocabulary, main threat scenarios, and rough approaches to solving the problem, such that I can situate new facts into existing taxonomies 
    • … I’m better able to tell when prominent people disagree / things have more texture
  • Some (self-)critiques 
    • Honestly thought content for some of the weeks was a bit weak if you just wanted an overview of the alignment problem (e.g., adversarial techniques for scalable oversight probably isn’t what you need to understand if you’re trying to assess the risk) 
    • I wish I’d set up strong accountability
    • Wish it had more counter-arguments to classic AI risk. 
      • Apparently Stanford AI group modified curriculum to have more of these? 
    • Wish I’d been more active in discussion or questions beforehand: give yourself a chance to be wrong!
  • Tips I’d recommend for learning more about AI risk
    • Just start! Always feels daunting to dive into but just find a few explainer articles and dive in
    • Spaced repetition for learning really does go hard
    • Talk to knowledgeable people who give you space to be wrong
    • Write up your thoughts and have knowledgeable people poke holes/ show you where you’re missing something 
  • A few resources that I keep coming back to (not from AGI SF): 

Thank you for making this thread Clifford, and we're really grateful for all feedback! We're working hard as a team to improve the course and the infrastructure we have for hosting other courses, and everyone's feedback has been incredibly valuable on our journey thus far :) 

I feel like I have a much better sense of what the current approaches to alignment are, what people are working on and how underdeveloped the field is. In general, it’s been a while since I’ve spent time studying anything so it felt fun just to dedicate time to learning. It also felt empowering to take a field that I’ve heard a lot about at a high level and make it clearer in my mind.

I think doing the Week 0 readings is an easy win for anyone who wants to demystify some of what is going on in ML systems, which I think should be interesting to anyone, even if you’re not interested in alignment.

I became much more motivated to work on making AI go well over the period of the course, I think mainly because it made the problem more concrete, but probably also just from spending more time thinking about it. That said, it’s hard to disentangle this increased motivation from recent events and other factors.

For anyone who is considering the course: TYPE III AUDIO is making audio narrations for the Alignment and Governance courses. The series is due to launch later this month, but some 50+ episodes are already available.

I'm quite glad I took the course! 

Quick takes: 

  1. The main types of value I was getting from the course were: 
    1. Accountability for doing the readings
    2. The chance to use wrong terminology / say things that don't make sense (either when I'm trying to explain something, or when I'm asking a question), and then get corrected. This helped me develop a more coherent model of what's going on and catch unknown unknowns (at least by transforming them into known unknowns).
    3. Other resources: links to other readings and explanations
    4. Corrections and clarifications during the sessions
  2. Personal lessons for next time: 
    1. Set aside time to do the readings >24 hours in advance, and send in questions early
      1. (Agree with the facilitator about doing this!)
    2. Use Claude[1] as a personal tutor from the start
      1. This was great. The thing I'd do was, if I was reading something that I was having a hard time following, I'd give Claude some context, then say something like, "I'll now explain this in terms that I understand, or with an analogy or visualization that makes sense to me. Please correct me where I'm misusing terms or saying something wrong." Claude would generally be over-positive and would miss some things, but I'd often get a more technical restatement of what I was trying to say, and this helped me a lot. This was relatively introductory material that wasn't specific to AI safety, so I think Claude was actually performing pretty well. 
    3. Set up a notes and questions doc for myself from the beginning
      1. I now have a messy doc with notes from the different weeks, further readings that we were recommended, and short summaries — I find it quite useful, but the first few weeks are missing because I just hadn't set it up, and was just scribbling down some questions as I read. Having a doc in advance would have been great and would have prompted me to use it.
    4. Actually read the little intro notes for each week on the AGISF website
      1. I think they would have been helpful context for the readings, but I was often skipping them (especially at the beginning), as I didn't really see them as part of the reading. 
    5. I spent a fair amount of time on this and endorse doing that.
      1.  I think the more work I was putting in, the more value I was getting out of the course (there weren't diminishing returns). I was spending probably around 2-5 hours of time before the session each week (there might have been a week when I spent less than 2 hours, and the median time was probably closer to something like 2-3) — in higher-effort weeks, I would explore related writing or referenced texts, and try to get to the point where I could notice the potential weaknesses and confusions I had about the assigned readings. I should flag that I wouldn't always do all the readings very carefully. 
  3. Having some amount of context before starting the course could be useful. If you're not familiar with the overall shape of the argument for existential risk from AI, I imagine that the course structure might be a bit jarring; I moved cohorts a bit in the beginning, but I did feel a bit like it began somewhat abruptly.
  4. Minor points/asides: 
    1. It was just pretty fun.
    2. I have notes on readings I found more and less helpful, and will try to pass those on at some point. 
    3. Facilitators probably matter a lot. 
    4. I enjoyed the readings for some weeks a lot more than I enjoyed them for other weeks. 
    5. I was worried that approaches seemed outdated, in some cases.
    6. I still have pretty broad confusions that I hope to resolve. 
  1. ^ I was using Claude for dumb reasons; I don't have a strong sense for how it compares to GPT-4 on this.

I definitely feel it was worth the time for me personally. It was great for learning about the field of AI alignment (problems and proposed solutions). I was hoping the course would spend more time on arguments for and against AI being an x-risk, but unfortunately there was little of that, so it didn't change my mind much.

I think the value I got from the course is that my interactions with AI content are richer. It’s like trying to understand an ecosystem after getting a better prescription for my glasses.
