Michaël Trazzi

Here are some reasons why I think it might still make sense to post short-form content (I'm not trying to convince you; I just think these arguments are worth mentioning for anyone reading this):

Even if there are more people we want to reach who watch long-form than short-form (or even who read LessWrong), what actually matters is whether short-form content is neglected, and whether the people who watch short-form would end up watching long-form anyway. I think there's a case for it being neglected, but I agree that a lot of potentially impactful people who watch TikTok probably also watch Youtube.
The super-agentic people who have developed substantial "cog sec" and manage to not look at any social media at all would probably only be reachable via LessWrong / arXiv papers, which is an argument that undermines most AI Safety comms, not just short-form. To that I'd say:
- I remember Dwarkesh saying somewhere that 30% of his podcast growth comes from short-form. This hints at short-form bringing in potential long-form viewers / listeners, and Dwarkesh's listeners are people we'd want to reach.
- Youtube pushes short-form aggressively, and on platforms like Instagram it's even harder to ignore.
  - It's possible to not use Instagram at all and to disable short-form recommendations on Youtube, but every time you add a "cog sec" criterion you filter out even more people. (A substantial share of my short-form views come from posting on YT Shorts, and I'm planning to extend to Instagram soon.)
Similarly to what @Cameron Holmes argues below, broad public awareness is also a nice externality, not just getting more AI Safety talent.
You could also imagine reaching people indirectly (think of a friend who does watch short-form content telling you at lunch about what they've learned).
When I actually look at the data on who watches my short-form content, it's mostly older people (> 24yo, even > 34yo) from high-income countries like the US. Surprisingly, it's not younger people (who you might expect to have shorter attention spans / be less agentic).

AI Safety Field Growth Analysis 2025

Michaël Trazzi, 19d ago

That's useful data, thanks!

How confident are you that an exponential is a good fit here? The 2025 datapoints make the Research org & FTE curves look more like S-curves to me.
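To see why a single late datapoint can matter so much here: an exponential and a logistic (S-curve) with the same starting value and growth rate are almost indistinguishable early on, and only diverge once the logistic approaches its ceiling. A minimal Python sketch; the parameters N0, R and K are hypothetical, chosen only for illustration and not fitted to the field-growth data:

```python
import math


def exponential(t: float, n0: float, r: float) -> float:
    """Pure exponential growth: n0 * e^(r t)."""
    return n0 * math.exp(r * t)


def logistic(t: float, n0: float, r: float, k: float) -> float:
    """Logistic (S-curve) growth starting at n0, with rate r and ceiling k."""
    return k / (1 + (k / n0 - 1) * math.exp(-r * t))


# Hypothetical parameters, for illustration only (not fitted to any real data).
N0, R, K = 10.0, 0.5, 1000.0

for t in range(0, 16, 3):
    e = exponential(t, N0, R)
    s = logistic(t, N0, R, K)
    # ratio stays close to 1 early on, then collapses as the logistic saturates
    print(f"t={t:2d}  exp={e:9.1f}  logistic={s:7.1f}  ratio={s / e:.2f}")
```

With only early points, both models fit about equally well; it is the points near the bend that discriminate between them.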

Rethinking The Impact Of AI Safety Videos: Extending Austin & Marcus' framework

Michaël Trazzi, 1mo ago

Thanks! Just want to add some counterpoints and disclaimers to that:
1. I want to flag that although I've filmed & edited ~20 short-form clips in the past (e.g. from June 2022 to July 2025) around things like AI Policy and protests, most of the content I've been posting recently has just been clips from other interviews. So I think it would also be unfair to compare my clips with original content (both short-form and long-form), which is why I wrote this post. (I started doing this because I ran out of footage for short-form videos while trying to publish one TikTok a day; these clips ended up reaching way more people than what I was doing before, so I transitioned to doing that.)
2. Regarding comparisons to high-production videos: I don't want to come across as saying we shouldn't compare work of different lengths or budgets. I think Marcus and Austin's attempt is honorable. Also, using a large budget well to make a high-production video that reaches as many people as many lower-budget videos do requires a lot of skill; though once you have that level of skill, the time you spend making a video really good leads to exponential results in views (if you make something that is 10% better, Youtube will push it much more than 10% more).

What's going on in video in AI Safety these days? (A list)

Michaël Trazzi, 1mo ago

Glad you're working with some of the people I recommended to you, I'm very proud of that SB-1047 documentary team.

I would add Suzy Shepherd, who made Writing Doom, to the list. I believe she will be starting another film relatively soon. I wrote more about her work here.

How cost-effective are AI safety YouTubers?

Michaël Trazzi, 1mo ago

For context, you asked me for data for something you were planning (at the time) to publish that same day. There's no easy way to get watchtime on TikTok (which is why I had to add things up manually on a computer), and I wasn't on my laptop, so I couldn't do it when you messaged me. You didn't follow up to clarify that watchtime was actually the key metric in your system and that you actually needed that number.
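A minimal sketch of what that kind of manual tally amounts to, assuming each video's view count and average watch time are read off its individual analytics page: multiply the two per video and sum. The `VideoStats` type and all numbers below are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class VideoStats:
    """Hypothetical per-video stats copied by hand from the analytics page."""
    views: int
    avg_watch_seconds: float  # average watch time per view


def total_viewer_minutes(videos: list[VideoStats]) -> float:
    """Sum views * average watch time over all videos, converted to minutes."""
    return sum(v.views * v.avg_watch_seconds for v in videos) / 60


# Hypothetical numbers, for illustration only.
videos = [
    VideoStats(views=120_000, avg_watch_seconds=45.0),
    VideoStats(views=30_000, avg_watch_seconds=20.0),
    VideoStats(views=8_000, avg_watch_seconds=90.0),
]
print(f"{total_viewer_minutes(videos):,.0f} viewer minutes")
```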

Good to know that the 50 people were 4 Safety people and 46 people who hang out at Mox and Taco Tuesday. I understand you're trying to reach the MIT graduate working in AI who might somehow transition to AI Safety work at a lab / Constellation. I know that Dwarkesh & Nathan are quite popular with that crowd, and I have a lot of respect for what Aric (& co) did, so the data you collected makes a lot of sense to me. I think I'm starting to understand why you gave a lower score to Rational Animations or other stuff like AIRN.

I'm now modeling you as trying to answer something like "how do we cost-effectively feed AI Safety ideas to the kind of people who walk into Taco Tuesday and have the potential to be good AI Safety researchers?". Given that, I now better understand how you ended up giving higher scores to Cognitive Revolution and Robert Miles.

How cost-effective are AI safety YouTubers?

Michaël Trazzi, 1mo ago (edited)

1) Feel free to use $26k. My main issue was that you didn't ~~ask me for my viewer minutes for TikTok~~ (EDIT: didn't follow up to make sure I give you the viewer minutes for TikTok) and instead used a number that is off by a factor of 10. Please use a correct number in future analysis. For June 15 - Sep 10, that's 4,150,000 minutes, meaning a VM/$ of 160 instead of 18 (details here).
A) Your screenshots of Google Sheets say "FLI podcast", but you ran your script on the entire channel. And you say that the budget is $500k. Can you confirm what you're trying to measure here? All of FLI's video work? Just the podcast? If you're measuring the entire channel, is the budget really $500k for the entire thing? I'm confused.
B) If you use accurate numbers for some things and estimates for others, I'd communicate explicitly which ones are which. Even then, when you compare estimates with real numbers, there's a risk that your estimates are off by a huge factor (as happened with my TikTok numbers), which makes me question the value of the comparisons.
C) Let me try to be clearer regarding paid advertising:
- If some of the watchtime estimates you got from people are (views * 33% of length), and they pay $X per view (the fixed cost of ads on Youtube), then the VM/$ will be [nb_views * (33% length)] / total_cost = [nb_views * 33% length] / [nb_views * X] = [33% length] / X. This is what I mean when I say it's basically measuring the cost of ads. (Note: I didn't include organic views above because I'm assuming they're negligible compared to inorganic ones. If you want examples of videos where I see mostly inorganic views, I can send them by DM.)
- For the cases where you got actual watchtime numbers instead of multiplying the length by a constant or using a script (say, someone tells you they have Y hours total on their channel), or where the ads lead to real organic views, your reasoning around ads makes sense, though I'd still argue that in terms of impact the engagement is low (pretty disastrous in some cases) and does not translate into things we care about (like people taking action).
3. I think the questions "who is your favourite AI safety creator" or "Which AI safety YouTubers did/do you watch" are heavily biased towards Robert Miles, as he is (and has basically been for the past 8 years) the only "AI Safety Youtuber" (i.e. someone making purely talking-head videos about AI Safety; in comparison, RA is a team). So based on these questions I think it's quite likely he'd be mentioned, though I agree that 50 people saying his name first is important data that needs to be taken into account.
- That said, I'm trying to wrap my head around how to go from your definition of "quality of audience" to "Robert Miles was chosen by 50 different people as their favorite Youtuber, the first person mentioned". My interpretation is that you're saying: 1) you've spoken to 50 people who work in AI Safety, 2) they all mentioned Rob as the canonical Youtuber, so "therefore" A) Rob has the highest-quality audience? (cf. you wrote in OP "This led me to make the “audience quality” category and rate his audience much higher.")
  - My model for how this claim could be true is that 1) you asked 50 people whom you all considered a "high quality" audience, 2) they all mentioned Rob and nobody else (or rarely anyone else), so 3) you inferred "high quality audience => watches Rob" and therefore 4) also inferred "watches Rob => high quality"?
4. Regarding weights, also respectfully, I did indeed look at them individually. You can check my analysis for what I think the TikTok individual weights should be here. For Youtube see here. Regarding your points:
- In my TikTok analysis I've posted a bunch of datapoints you probably don't have showing that my audience is mostly older people from high-income countries, which is unusually good for TikTok. That's why I put 3 instead of your 2.
- "you're just posting clips of other podcasts and such and this just doesn't do a great job of getting a message across" -> the clips that make up the majority of viewer minutes are actually quite high-fidelity, since they're quite long (2-4m) and get the message across more crisply than the average podcast minute. Anyway, once you look at my TikTok analysis you'll see that I ended up dividing everything by 2 so that the max-fidelity TikTok has 0.5 (same as Cognitive Revolution), which gives me Qf = 0.45 in the end (instead of your 0.1), just to be coherent with the rest of your numbers.
- Qm: that's subjective, but FWIW I myself only rate my alignment with my TikTok content at 0.75, not 1 (see analysis).
- "Again, most of the quality factor is being done by audience quality and yes, shorts just have a far lower audience quality." --> again, respectfully, from looking at your tables I think this is false. You rate the fidelity of TikTok at 0.1, which is 5x less than 4 other channels. No channel other than mine (TikTok & YT) has less than 0.3. In comparison, if you set aside Rob's row, audience quality varies only by 3x between my TikTok Qa and the rest. So no, the quality factor is not mainly driven by audience quality.
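The arithmetic behind points 1), C) and 4) can be checked in a few lines of Python. The TikTok figures (4,150,000 minutes, $26k) and the fidelity (0.5 vs 0.1) and audience (6 vs 2) weights are the ones quoted in this thread; the 10-minute video length and $0.05 cost per view in the ads-only case are hypothetical, chosen only to show that nb_views cancels out.

```python
def vm_per_dollar(viewer_minutes: float, cost_usd: float) -> float:
    """Viewer minutes (VM) per dollar spent."""
    return viewer_minutes / cost_usd


# Point 1): TikTok, June 15 - Sep 10, with the $26k budget.
tiktok_vm_per_dollar = vm_per_dollar(4_150_000, 26_000)
print(f"TikTok VM/$ ~ {tiktok_vm_per_dollar:.0f}")


# Point C): watchtime estimated as views * 33% of video length, and every view
# bought at a fixed cost X, so nb_views appears in numerator and denominator.
def ads_only_vm_per_dollar(nb_views: int, length_minutes: float, cost_per_view: float) -> float:
    watchtime = nb_views * 0.33 * length_minutes
    total_cost = nb_views * cost_per_view
    return vm_per_dollar(watchtime, total_cost)


# Hypothetical 10-minute video at $0.05 per view: same VM/$ at any view count.
for n in (10_000, 1_000_000):
    print(n, round(ads_only_vm_per_dollar(n, 10.0, 0.05), 2))

# Point 4): since Q = Qa * Qf * Qm, compare the size of each factor's gap.
fidelity_gap = 0.5 / 0.1  # Qf: Cognitive Revolution vs TikTok
audience_gap = 6 / 2      # Qa: technical podcasts vs TikTok
print(f"fidelity gap {fidelity_gap:.0f}x vs audience gap {audience_gap:.0f}x")
```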

How cost-effective are AI safety YouTubers?

Michaël Trazzi, 1mo ago (edited)

Agreed about the need to include Suzy Shepherd and Siliconversations.

Before Marcus messaged me, I was in the process of filling in another Google Sheet (link) to measure the impact of content creators (which I sent him), which also had three key criteria (production value, usefulness of audience, accuracy).

I think Suzy & Siliconversations are great examples of effectiveness because:

I think Suzy made her film really cheaply (less than $20k). If you included her time you'd probably get a larger amount, but in terms of actual $ spent and the impact it got (400k views on long-form content), I think it's pretty great, and quite educational. In particular, it provides another angle for explaining things through original content. In comparison, a lot of the AI 2027 content has amplified an idea that was already out in the world and covered by a bunch of people. I'm not sure how to compare the two, but it's worth noting they're different.
Siliconversations is, I think, an even more powerful example, and one of the reasons why I wanted to make that Google Sheet. According to folks I talked to at ControlAI, his videos about emailing representatives led to many more emails being sent than another ControlAI x Rational Animations collaboration, even though RA is a much bigger channel than Siliconversations (especially at the time Siliconversations first posted his video). The ratio of CTAs per view is at least 2.5%, given the 2000 emails from 80k views (source).
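The two efficiency figures above, as a quick Python computation using only the numbers quoted here (a budget under $20k for 400k views, and 2000 emails from 80k views). The helper names are illustrative, and the Writing Doom figure is an upper bound since it excludes Suzy's time.

```python
def cost_per_view(budget_usd: float, views: int) -> float:
    """Dollars spent per view."""
    return budget_usd / views


def cta_rate(actions: int, views: int) -> float:
    """Share of views that led to the call-to-action being taken."""
    return actions / views


# Writing Doom: budget under $20k (excluding Suzy's time), 400k long-form views.
writing_doom = cost_per_view(20_000, 400_000)
# Siliconversations: 2000 emails to representatives from 80k views.
siliconversations = cta_rate(2_000, 80_000)

print(f"Writing Doom: at most ${writing_doom:.2f} per view")
print(f"Siliconversations: {siliconversations:.1%} of views led to an email")
```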

The thing I wanted to measure (which is probably much harder than estimating things with weights and multiplying by minutes of watchtime) is "what kind of content leads more people to take action, as Siliconversations' video did", and I'm not sure how to measure that unless everyone had CTAs they tracked so we could compare the ratios.

I think Siliconversations' video led to so many emails because he was relentless about sending emails throughout the video, and that was its entire point, instead of talking about AI risk in general and putting a link in the comments.

I think this is also why the RA x ControlAI collab got fewer emails, but it also got way more views, which could eventually lead a bunch of people to do a lot of useful things in the world, though that's hard to measure.

I know that 80k's AI In Context has a full section at the end on "What to do" that points to the links in the description. Maybe Chana Messinger has data on how many people clicked through / how much traffic was redirected from YT to 80k.

How cost-effective are AI safety YouTubers?

Michaël Trazzi, 1mo ago (edited)

Update: after looking at Marcus' weights, I ended up dividing all my intermediary Qf values by 2, so that they match Marcus' weights where Cognitive Revolution = 0.5. Dividing by 2 caps the best TikTok minute at the level of the average Cognitive Revolution minute. Neel was correct to say that 0.9 was way too high.

===

My model is that most of the viewer minutes come from people who watch the whole thing, and a decent fraction of them end up following, which means they'll engage more with AI-Safety-related content in the future as I post more.

Looking at my most viewed TikTok:

TikTok says 15.5% of viewers (i.e. 0.155 * 1400000 = 217000) watched the entire thing, and most people who watch the first half end up watching until the end (retention is 18% at the halfway point, and 10% at the end).

And then assuming the 11k who followed came from those 217000 who watched the whole thing, we can say that's 11000/217000 = 5% of the people who finished the video that end up deciding to see more stuff like that in the future.

So yes, if a significant fraction (15.5%) watch the full thing, and 0.155 * 0.05 ≈ 0.8% of the total end up following, I'd say that's "engaging properly".
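The funnel above in a few lines of Python, using the numbers from this TikTok (1.4M views, 15.5% completion, 11k new followers) and the same assumption that every follower came from someone who finished the video:

```python
total_views = 1_400_000
completion_rate = 0.155  # share of viewers who watched the entire video
new_followers = 11_000

# Assumption (as in the text): all new followers finished the video.
completers = completion_rate * total_views
follow_rate_among_completers = new_followers / completers
follow_rate_overall = new_followers / total_views

print(f"completers: {completers:,.0f}")
print(f"followers / completers: {follow_rate_among_completers:.1%}")
print(f"followers / all views: {follow_rate_overall:.2%}")
```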

And most importantly, most of the viewer-minutes on TikTok do come from these long videos that are 1-4 minutes long (especially ones that are > 2 minutes long):

The short / low-fidelity takes that are 10-20s long don't get picked up by the new TikTok algorithm and don't get many views, so they didn't end up in the "TikTok Qa & Qf" sheet of top 10 videos (and the ones that did didn't contribute much to the total minutes, and hence to the final Qf).
To show that the Eric Schmidt example above is not cherry-picked, here is a Google Doc with similar screenshots of stats for the top 10 videos I use to compute Qf. Of these 10 videos, 6 are more than 1m long, and 4 are more than 2m long. The precise distribution is:
- 0m-1m: 4 videos
- 1m-2m: 2 videos
- 2m-3m: 2 videos
- 3m-4m: 2 videos

Happy for others to come up with different numbers / models for this, or play with my model through the "TikTok Qa & Qf" sheet here, using different intermediary numbers.

Update: as I said at the top, I was actually wrong to have initially said Qf=0.9 given the other values. I now claim that Qf should be closer to 0.45. Neel was right to make that comment.

How cost-effective are AI safety YouTubers?

Michaël Trazzi, 1mo ago (edited)

This comment is answering "TikTok I expect is pretty awful, so 0.1 might be reasonable there". For my previous estimate on the quality of my Youtube long-form stuff, see this comment.

tl;dr: I now estimate the quality of my TikTok content to be Q = 0.75 * 0.45 * 3 ≈ 1

The Inside View (TikTok) - Alignment = 0.75 & Fidelity = 0.45

To estimate fidelity of message (Qf) and alignment of message (Qm) in a systematic way, I compiled my top 10 best-performing TikToks and rated their individual Qf and Qm (see the tab called "TikTok Qa & Qf" here, which contains the reasoning for each individual number).

Update Sep 14: I've realized that my numbers about fidelity used 1 as the maximum, but now that I've looked at Marcus' weights for other stuff, I think I should use 0.5 because that's the number he gives to a podcast like Cognitive Revolution, and I don't want to claim that a long tiktok clip is more high-fidelity than the average Cognitive Revolution podcast. So I divided everything by 2 so my maximum fidelity is now 0.5 to match Marcus' other weights.

Then, by taking a minute-weighted average of the individual Qf and Qm values, I get:

Qf(The Inside View TikTok) = 0.45
Qm(The Inside View TikTok) = 0.75
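A minimal sketch of a minute-weighted average like the one described above, in Python. The per-video Qf values and minute counts are hypothetical placeholders, not the numbers from the "TikTok Qa & Qf" sheet; the point is only that videos with more viewer minutes pull the average toward their score.

```python
def minute_weighted_average(scores: list[float], minutes: list[float]) -> float:
    """Average per-video scores, weighting each video by its viewer minutes."""
    if len(scores) != len(minutes) or not minutes:
        raise ValueError("need one non-empty minute count per score")
    total = sum(minutes)
    return sum(s * m for s, m in zip(scores, minutes)) / total


# Hypothetical values (not the real sheet): the long clips carry most minutes.
qf = [0.5, 0.5, 0.45, 0.25, 0.2]
minutes = [900_000, 600_000, 400_000, 50_000, 20_000]

weighted = minute_weighted_average(qf, minutes)
unweighted = sum(qf) / len(qf)
print(f"minute-weighted Qf = {weighted:.3f}, unweighted mean = {unweighted:.3f}")
```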

What this means:

Since I'm editing clips, the message is already high-fidelity (it comes from the source, most of the time). The question is whether people get a long high-fidelity explanation, or something short but potentially compressed. When weighting things by minutes, I end up with 0.9 (0.45 after dividing by 2), meaning that most of the watchtime minutes come from the high-fidelity content.
I am not always fully aligned with the clips that I post, but I am mostly aligned with them.

The Inside View (TikTok) - Quality of Audience = 3

I believe the original reasoning for Qa = 2 is that people watching short-form by default would be young and / or have short attention spans, and therefore be less of a high-quality audience.

However, most of my high-performing TikTok clips (that represent most of the watch time) are quite long (2m-3m30s long), which makes me think the kind of audience who watch these until the end are not as different from Youtube.

On top of that, my audience a) skews towards the US (33%) and other high-income countries (more than half are in the US / Australia / UK etc.), and b) is 88% over 25, with 61% over 35. (Data here.)

Therefore, in terms of quality of audience, I don't see why the audience would be worse in quality than people who watch AI Species / AI Risk Network.

Which is why I'm estimating: Qa(The Inside View TikTok) = 3.

Conclusion

If we multiply these three numbers, we get Q = 0.75 * 0.45 * 3 ≈ 1
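Because Q is a plain product, each factor scales it proportionally; for instance, halving Qf (the Sep 14 update, from the initial 0.9 to 0.45) halves Q. A quick Python check using only the numbers from these comments:

```python
def quality(qa: float, qf: float, qm: float) -> float:
    """Overall quality factor: Q = Qa * Qf * Qm."""
    return qa * qf * qm


q_now = quality(qa=3, qf=0.45, qm=0.75)    # after dividing Qf by 2
q_before = quality(qa=3, qf=0.9, qm=0.75)  # initial Qf = 0.9, before the update
print(f"Q now = {q_now:.4f}, Q before the update = {q_before:.4f}")
```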

How cost-effective are AI safety YouTubers?

Michaël Trazzi, 1mo ago (edited)

Agreed that the quality of audience is definitely higher for my (niche) AI Safety content on Youtube, and I'd expect Q to be higher for (longform) Youtube than Tiktok.

In particular, I estimate Q(The Inside View Youtube) = 2.7, instead of 0.2, with (Qa, Qf, Qm) = (6, 0.45, 1), though I acknowledge that Qm is (by definition) the most subjective.

To make this easier to read & reply to, I'll post my analysis for Q(The Inside View Tiktok) in another comment, which I'll link to when it's up. EDIT: link for TikTok analysis here.

The Inside View (Youtube) - Qa = 6

In light of @Drew Spartz's comment (saying one way to quantify the quality of audience would be to look at the CPM ^[1]), I've compiled my CPM Youtube data and my average Playback-based CPM is $14.8, which according to this website ^[2] would put my CPM above the 97.5 percentile in the UK, and close to the 97.5 percentile in the US.

Now, this is anecdotal rather than data-based evidence, but over the years I've met quite a few people (from programs like MATS, or working at AI Safety orgs) who've told me they discovered AI Safety through my Inside View podcast. And I expect the SB-1047 documentary to have attracted a niche audience interested in AI regulation.

Given the above, I think it would make sense to have the Qa(Youtube) be between 6 (same as other technical podcasts) and 12 (Robert Miles). For the sake of giving a concrete number, I'll say 6 to be on par with other podcasts like FLI and CR.

The Inside View (Youtube) - Qf = 0.45

In the paragraph below I'll say Qf_M for the Qf that Marcus assigns to other creators.

For the fidelity of message, I think it's a bit of a mixed bag. As I said previously, I expect the podcasts Nathan would be willing to crosspost to be on par with his channel's quality, so in that sense I'd expect the fidelity of message of these technical episodes (Owain Evans, Evan Hubinger) to be on par with CR (Qf_M = 0.5). Some of my non-technical interviews are probably closer to discussions we could find on Doom Debates (Qf_M = 0.4), though there are fewer of them. My SB-1047 documentary is probably similar in fidelity of message to AI In Context (Qf_M = 0.5), and this fictional scenario is very similar to Drew's content (Qf_M = 0.5). I've also posted video explainers ranging from low effort (Qf around 0.4?) to very high effort (Qf around 0.5?).

Given all of the above, I'd say the Qf for the entire channel is probably around 0.45.

The Inside View (Youtube) - Qm = 1

As you say, for the alignment of message, this is probably the most subjective. I think by definition the content I post is the message that aligns the most with my values (at least for my Youtube content) so I'd say 1 here.

The Inside View (Youtube) - Q = 2.7

Multiplying these numbers, I get Q = 2.7. As a sanity check, this seems about the same as Cognitive Revolution, which doesn't seem crazy given we've interviewed similar people & the cross-post arguments I made before.

(Obviously, if I were to modify all of these Qa, Qf, Qm numbers for all channels, I'd probably end up with different quality comparisons.)
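Since the Qa estimate above spans 6 (other technical podcasts) to 12 (Robert Miles), here is a quick Python check of how Q moves across that range with Qf = 0.45 and Qm = 1 held fixed:

```python
def quality(qa: float, qf: float, qm: float) -> float:
    """Overall quality factor: Q = Qa * Qf * Qm."""
    return qa * qf * qm


QF, QM = 0.45, 1.0
for qa in (6, 12):  # low end: other technical podcasts; high end: Robert Miles
    print(f"Qa = {qa:2d} -> Q = {quality(qa, QF, QM):.1f}")
```

Picking Qa = 6 therefore gives the conservative end of the range; Qa = 12 would double Q.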

^[1] CPM means Cost Per Mille. In YT Studio it's defined as "How much advertisers pay every thousand times your Watch Page content is viewed with ads."
^[2] I haven't done extended research here and expect I'd probably get different results looking at different websites. This one was the first one I found on google so not cherry-picked.
