I recently completed the AGI Safety Fundamentals course (on AI alignment) and really enjoyed it.

I imagine lots of people on the forum have taken the course and now would be a good time to share reflections. I’m hoping this thread might be particularly useful for people considering taking it in future. 

I’d be interested to hear many different kinds of reflections, but here are some prompts:

  1. What do you think you’ve learnt? How has it impacted your plans, if it has?
  2. Did you think it was worth the time?
  3. Who would you recommend it to?
  4. Do you wish you had done something differently, or known something in advance? (Any other advice?)

I’ll leave my own reflections in the comments.

Comments (7)

  • As a result of AGI SF readings and other sporadic AI safety readings… 
    • … I feel more confident asking questions of people who know more than I do
      • I feel like I know the vocabulary, main threat scenarios, and rough approaches to solving the problem, such that I can situate new facts into existing taxonomies 
    • … I’m better able to tell when prominent people disagree / things have more texture
  • Some (self-)critiques 
    • Honestly thought content for some of the weeks was a bit weak if you just wanted an overview of the alignment problem (e.g., adversarial techniques for scalable oversight probably isn’t what you need to understand if you’re trying to assess the risk) 
    • I wish I’d set up strong accountability
    • Wish it had more counter-arguments to classic AI risk. 
      • Apparently Stanford AI group modified curriculum to have more of these? 
    • Wish I’d been more active in discussion or questions beforehand: give yourself a chance to be wrong!
  • Tips I’d recommend for learning more about AI risk
    • Just start! Always feels daunting to dive into but just find a few explainer articles and dive in
    • Spaced repetition for learning really does go hard
    • Talk to knowledgeable people who give you space to be wrong
    • Write up your thoughts and have knowledgeable people poke holes or show you where you’re missing something
  • A few resources that I keep coming back to (not from AGI SF): 

Thank you for making this thread Clifford, and we're really grateful for all feedback! We're working hard as a team to improve the course and the infrastructure we have for hosting other courses, and everyone's feedback has been incredibly valuable on our journey thus far :) 

I feel like I have a much better sense of what the current approaches to alignment are, what people are working on and how underdeveloped the field is. In general, it’s been a while since I’ve spent time studying anything so it felt fun just to dedicate time to learning. It also felt empowering to take a field that I’ve heard a lot about at a high level and make it clearer in my mind.

I think doing the Week 0 readings is an easy win for anyone who wants to demystify some of what is going on in ML systems, which I think should be interesting to anyone, even if you’re not interested in alignment.

I became much more motivated to work on making AI go well over the period of the course, I think mainly because it made the problem more concrete, but likely also just from spending more time thinking about it. That said, it’s hard to disentangle this increased motivation from recent events and other factors.

For anyone who is considering the course: TYPE III AUDIO is making audio narrations for the Alignment and Governance courses. The series is due to launch later this month, but some 50+ episodes are already available.

I'm quite glad I took the course! 

Quick takes: 

  1. The main types of value I was getting from the course were: 
    1. Accountability for doing the readings
    2. The chance to use wrong terminology / say things that don't make sense (either when I'm trying to explain something, or when I'm asking a question), and then get corrected (This helped me to develop a more coherent model of what's going on and catch unknown unknowns (at least by transforming them into known unknowns).)
    3. Other resources: links to other readings and explanations
    4. Corrections and clarifications during the sessions
  2. Personal lessons for next time: 
    1. Set aside time to do the readings >24 hours in advance, and send in questions early
      1. (Agree with the facilitator about doing this!)
    2. Use Claude[1] as a personal tutor from the start
      1. This was great. The thing I'd do was, if I was reading something that I was having a hard time following, I'd give Claude some context, then say something like, "I'll now explain this in terms that I understand, or with an analogy or visualization that makes sense to me. Please correct me where I'm misusing terms or saying something wrong." Claude would generally be over-positive and would miss some things, but I'd often get a more technical restatement of what I was trying to say, and this helped me a lot. This was relatively introductory material that wasn't specific to AI safety, so I think Claude was actually performing pretty well. 
    3. Set up a notes and questions doc for myself from the beginning
      1. I now have a messy doc with notes from the different weeks, further readings that we were recommended, and short summaries — I find it quite useful, but the first few weeks are missing because I just hadn't set it up, and was just scribbling down some questions as I read. Having a doc in advance would have been great and would have prompted me to use it.
    4. Actually read the little intro notes for each week on the AGISF website
      1. I think they would have been helpful context for the readings, but I was often skipping them (especially at the beginning), as I didn't really see them as part of the reading. 
    5. I spent a fair amount of time on this and endorse doing that.
      1.  I think the more work I was putting in, the more value I was getting out of the course (there weren't diminishing returns). I was spending probably around 2-5 hours of time before the session each week (there might have been a week when I spent less than 2 hours, and the median time was probably closer to something like 2-3) — in higher-effort weeks, I would explore related writing or referenced texts, and try to get to the point where I could notice the potential weaknesses and confusions I had about the assigned readings. I should flag that I wouldn't always do all the readings very carefully. 
  3. Having some amount of context before starting the course could be useful. If you're not familiar with the overall shape of the argument for existential risk from AI, I imagine that the course structure might be a bit jarring; I moved cohorts a bit in the beginning, but I did feel a bit like it began somewhat abruptly.
  4. Minor points/asides: 
    1. It was just pretty fun.
    2. I have notes on readings I found more and less helpful, and will try to pass those on at some point. 
    3. Facilitators probably matter a lot. 
    4. I enjoyed the readings for some weeks a lot more than I enjoyed them for other weeks. 
    5. I was worried that approaches seemed outdated, in some cases.
    6. I still have pretty broad confusions that I hope to resolve. 
[1] I was using Claude for dumb reasons; I don't have a strong sense for how it compares to GPT-4 on this.

I definitely feel it was worth the time for me personally. It was great for learning about the field of AI alignment (problems and proposed solutions). I was hoping the course would spend more time on arguments for and against AI being an x-risk, but unfortunately there was little of that, so it didn't change my mind much.

I think the value I got from the course is that my interactions with AI content are richer. It’s like trying to understand an ecosystem after getting a better prescription for my glasses.
