
Recently, many AI safety movement-building programs have been criticized for attempting to grow the field too rapidly and thus:

  1. Producing more aspiring alignment researchers than there are jobs or training pipelines;
  2. Driving the wheel of AI hype and progress by encouraging talent that ends up furthering capabilities;
  3. Unnecessarily diluting the field’s epistemics by introducing too many naive or overly deferential viewpoints.

At MATS, we think these are real and important concerns and we support efforts to mitigate them. Here is how we currently address them.

Claim 1: There are not enough jobs/funding for all alumni to get hired/otherwise contribute to alignment

How we address this:

  • Some of our alumni’s projects are attracting funding and hiring further researchers. Three of our alumni have started alignment teams/organizations that absorb talent (Vivek’s MIRI team, Leap Labs, Apollo Research), and more are planned (e.g., a Paris alignment hub).
  • With the elevated interest in AI and alignment, we expect more organizations and funders to enter the ecosystem. We believe it is important to install competent, aligned safety researchers at new organizations early, and our program is positioned to help capture and upskill interested talent.
  • It is sometimes hard to identify truly promising researchers within two months, hence our four-month extension program. We likely provide more benefit by accelerating researchers than can be seen in the immediate hiring of alumni.
  • Alumni who return to academia or industry are still a success for the program if they do more alignment-relevant work or acquire skills for later hiring into alignment roles.

Claim 2: Our program gets more people working in AI/ML who would not otherwise be doing so, and this is bad as it furthers capabilities research and AI hype

How we address this:

  • Considering that the median MATS scholar is a Ph.D./Master’s student in ML, CS, maths, or physics and only 10% are undergraduates, we believe most of our scholars would have ended up working in AI/ML regardless of their involvement with the program. In general, mentors select highly technically capable scholars who are already involved in AI/ML; the rest are outliers.
  • Our outreach and selection processes are designed to attract applicants who are motivated by reducing global catastrophic risk from AI. We principally advertise via word-of-mouth, AI safety Slack workspaces, AGI Safety Fundamentals and 80,000 Hours job boards, and LessWrong/EA Forum. As seen in the figure below, our scholars generally come from AI safety and EA communities.

[Figure: MATS Summer 2023 interest form, “How did you hear about us?” (381 responses)]

  • We also make our program less attractive than comparable AI industry programs by introducing barriers to entry. Our grant amounts are significantly lower than what our median scholar could earn from an industry internship (though substantially higher than comparable academic salaries), and the application process requires earnest engagement with complex AI safety questions. We further require scholars to have background knowledge at the level of AGI Safety Fundamentals, a barrier to entry that, e.g., MLAB did not require.
  • We think that ~1 additional median MATS scholar focused on AI safety is worth 5-10 additional median capabilities researchers (because most capabilities researchers work on relatively unimportant applications like image generation, and there is more low-hanging fruit in safety). Even if we do output 1-5 median capabilities researchers per cohort (which seems very unlikely), we likely produce far more benefit to alignment with the remaining scholars; a rough illustration of this arithmetic is sketched below.
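
As a rough, purely illustrative back-of-envelope sketch of this trade-off: the cohort size N = 60 below is a hypothetical placeholder rather than a MATS statistic; C = 5 takes the upper end of the capabilities-output range above, and w = 5 the lower end of the claimed exchange rate.

```latex
% Illustrative Fermi estimate only; N is an assumption, not MATS data.
% N = cohort size (assumed 60)
% C = scholars who end up doing capabilities work (upper bound of 5 from above)
% w = value of one median safety scholar, measured in median capabilities
%     researchers (lower bound of 5 from above)
\[
\text{net impact}
\;=\; \underbrace{w\,(N - C)}_{\text{safety benefit}} \;-\; \underbrace{C}_{\text{capabilities cost}}
\;=\; 5 \times (60 - 5) - 5 \;=\; 270
\]
```

Even under these deliberately conservative assumptions, the estimated net effect remains strongly positive, which is the point of the bullet above.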

Claim 3: Scholars might defer to their mentors and fail to critically analyze important assumptions, decreasing the average epistemic integrity of the field

How we address this:

  • Our scholars are encouraged to “own” their research project and not unnecessarily defer to their mentor or other “experts.” Scholars have far more contact with their peers than mentors, which encourages an atmosphere of examining assumptions and absorbing diverse models from divergent streams. Several scholars have switched mentors during the program when their research interests diverged.
  • We require scholars to submit “Scholar Research Plans” one month into the in-person phase of the program, detailing a threat model they are targeting, their research project’s theory of change, and a concrete plan of action (including planned outputs and deadlines).
  • We encourage an atmosphere of friendly disagreement, curiosity, and academic rigor at our office, seminars, workshops, and networking events. Including a diverse portfolio of alignment agendas and researchers from a variety of backgrounds allows for earnest disagreement among the cohort. We tell scholars to “download but don’t defer” regarding their mentors’ models. Our seminar program includes diverse and opposing alignment viewpoints. While scholars generally share an office room with their research stream, we try to split streams up for accommodation and Alignment 201 discussion groups to encourage intermingling between streams.

MATS is committed to growing the alignment field in a safe and impactful way, and we welcome feedback on all of the above and on our methods more generally. More posts are incoming!

Comments (4)

[anonymous]:

Glad to see this write-up & excited for more posts.

I think these are three areas that MATS has handled well. I'd be especially excited to hear more about areas where MATS thinks it's struggling, MATS is uncertain, or where MATS feels like it has a lot of room to grow. Potential candidates include:

  • How is MATS going about talent selection and advertising for the next cohort, especially given the recent wave of interest in AI/AI safety?
  • How does MATS intend to foster (or recruit) the kinds of qualities that strong researchers often possess?
  • How does MATS define "good" alignment research? 

Other things I'd be curious about:

  • Which work from previous MATS scholars is the MATS team most excited about? What are MATS's biggest wins? Which individuals or research outputs is MATS most proud of?
  • Most people's timelines have shortened a lot since MATS was established. Does this substantially reduce the value of MATS (relative to worlds with longer timelines)?
  • Does MATS plan to try to attract senior researchers who are becoming interested in AI Safety (e.g., professors, people with 10+ years of experience in industry)? Or will MATS continue to recruit primarily from the (largely younger and less experienced) EA/LW communities?

Reply from the MATS team:

  • We broadened our advertising approach for the Summer 2023 Cohort, including a Twitter post and a shout-out on Rob Miles' YouTube and TikTok channels. We expected some lowering of average applicant quality as a result but have yet to see a massive influx of applicants from these sources. We additionally focused more on targeted advertising to AI safety student groups, given their recent growth. We will publish updated applicant statistics after our applications close.
  • In addition to applicant selection and curriculum elements, our Scholar Support staff, introduced in the Winter 2022-23 Cohort, supplement the mentorship experience by providing 1-1 research strategy and unblocking support for scholars. This program feature aims to:
    • Supplement and augment mentorship with 1-1 debugging, planning, and unblocking;
    • Allow air-gapping of evaluation and support, improving scholar outcomes by resolving issues they would not take to their mentor;
    • Solve scholars’ problems, giving more time for research.
  • Defining "good alignment research" is very complicated and merits a post of its own (or two, if you also include the theories of change that MATS endorses). We are currently developing scholar research ability through curriculum elements focused on breadth, depth, and epistemology (the "T-model of research"):
  • Our Alumni Spotlight includes an incomplete list of projects we highlight. Many more past scholar projects seem promising to us but have yet to meet our criteria for inclusion here. Watch this space.
  • Since Summer 2022, MATS has explicitly been trying to parallelize the field of AI safety as much as is prudent, given the available mentorship and scholarly talent. In longer-timeline worlds, more careful serial research seems wise, as growing the field rapidly carries the risks outlined in the article above. We believe that MATS' goals have become more important as timelines have shortened (though MATS management has not updated much on timelines, as our estimates were already fairly short).
  • MATS would love to support senior research talent interested in transitioning into AI safety! Postdocs generally make up about 10% of our scholars, and we would like this number to rise. Currently, our advertising strategy relies on the broader AI safety community adequately reaching these populations (which does not currently seem to be the case) and might change for future cohorts.

Joseph: Just to make this a little more accessible to people who aren't familiar with SERI-MATS, MATS is the Machine Learning Alignment Theory Scholars Program, a training program for young researchers who want to contribute to AI alignment research.

Reply from the MATS team:

Thanks Joseph! Adding to this, our ideal applicant has:

  • an understanding of the AI alignment research landscape equivalent to having completed the AGI Safety Fundamentals course;
  • previous experience with technical research (e.g. ML, CS, maths, physics, neuroscience, etc.), ideally at a postgraduate level;
  • strong motivation to pursue a career in AI alignment research, particularly to reduce global catastrophic risk.

MATS alumni have gone on to publish safety research (LW posts here), join alignment research teams (including at Anthropic and MIRI), and found alignment research organizations (including a MIRI team, Leap Labs, and Apollo Research). Our alumni spotlight is here.
