Marcus Abramovitch 🔸


It's included now in the unweighted numbers. I'm going to try to do some analysis on just the quality adjustments.

Hi Rob, I messaged you on Twitter and got a mail error on an email. I am happy to message you wherever you wish. Feel free to DM me.

On 1, with your permission, I'd ask if I could share a screenshot of me asking you in DMs, directly, for viewer minutes. You gave me views, so I multiplied views by the average TikTok length and by a factor for % watched.
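For readers, a minimal sketch of that kind of estimate (the length and %-watched constants are illustrative placeholders made up for the example, not the figures actually used):

```python
# Rough viewer-minute estimate when a creator shares only view counts.
# Both constants are illustrative placeholders, not the actual analysis figures.
AVG_SHORT_LENGTH_MIN = 0.75    # assumed average TikTok/short length, in minutes
AVG_FRACTION_WATCHED = 0.6     # assumed average share of each video actually watched

def estimate_viewer_minutes(views: int) -> float:
    """views x average video length x average fraction watched."""
    return views * AVG_SHORT_LENGTH_MIN * AVG_FRACTION_WATCHED

print(estimate_viewer_minutes(1_000_000))  # -> 450000.0 viewer minutes
```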

On A, yes, the FLI Podcast was perhaps the data point that required the most estimating, for a variety of reasons I explained before.

On B, I think you can, in fact, tell which numbers are and aren't estimates, though I do understand how it's not clear. We considered ways of doing this without being messy. I'll try to make it clearer.

On C, how much you pay for a view is not a constant, though. It depends a lot on organic views. And I think boosting videos is a sensible strategy, since you put $ into both production costs (time, equipment, etc.) and advertising. Figuring out how to spend that money efficiently is important.

On 3, many other people were mentioned. In fact, I found a couple of creators this way. But yes, it was extremely striking and thus suggested that this was a very important factor in the analysis. I want to stress that I do, in fact, think this matters a lot. When Austin and I were speaking and relying on comparisons, we thought his quality numbers should be much higher; in fact, we toned them down, though maybe we shouldn't have.

To give clarity, I didn't seek out people who worked in AI safety. Here's what I did, to the best of my recollection.

Over the course of 3 days, I asked anyone I saw in Mox who seemed friendly enough, as well as people at Taco Tuesday, and sent a few DMs to acquaintances. The DMs I sent were to people who work in AI safety, but there were only 4 of those. So ~46 responses came from people hanging out around Mox and Taco Tuesday.

I will grant that this lends itself to an SF/AI safety bias. Now, Rob Miles' audience comes heavily from Computerphile and similar channels, whose audience is largely young people interested in STEM who like to grapple with interesting academic-y problems in their spare time (outside of school). In other words, this is an audience that we care a lot about reaching.

It's hard to overstate the possible variance in audience "quality". For example, Jane Street pays millions to advertise in front of potential traders on channels like Stand-Up Maths or the Dwarkesh Podcast. These channels don't actually get that many views compared to others, but they have a very high "audience quality", clearly, based on how much trading firms are willing to pay to advertise there. We actually thought a decent, though imperfect, metric for audience quality would just be a person's income compared to the world average of ~$12k. This would mean the average American has an audience quality of about 7. Austin and I thought this might be a bit too controversial and doesn't capture exactly what we mean (we care about attracting a poor MIT CS student more than a mid-level real estate developer in Miami), but it's a decent approximation.
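To make that income proxy concrete, here's a toy version. Only the ~$12k world-average baseline comes from the discussion above; the ~$84k audience-income figure is just an illustrative assumption to show how the ~7 comes out:

```python
WORLD_AVG_INCOME = 12_000  # rough world-average income used as the baseline

def audience_quality_proxy(avg_audience_income: float) -> float:
    """Income-based audience-quality proxy: audience income / world average."""
    return avg_audience_income / WORLD_AVG_INCOME

# An average-American audience (~$84k is an illustrative assumption) comes out around 7.
print(audience_quality_proxy(84_000))  # -> 7.0
```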

Audience quality is roughly something like "the people we care most about reaching," and thus "people who can go on to work on technical AI safety" seems very important.

Rob wasn't the only one mentioned; the next most popular were Cognitive Revolution and AI in Context (people often said "Aric"), since I asked people to just name anyone they listen to / would consider an AI safety YouTuber, etc.

On 4, I greatly encourage people to input their own weights; I specifically put that in the doc, and part of the reason for doing this project was to get people to talk about cost-effectiveness in AI safety.

On my bias:
Like all human beings, I'm flawed and have biases, but I did my best to just look at the data objectively, in what I thought was the best way possible. I appreciate that you talked to others regarding my intentions.

I'll happily link to the comments of mine on Manifund (1, 2, 3) that you may be referring to, so people can see them in full, and perhaps emphasize some points I wrote there.


@ I want to quickly note that it's a bit unfair for me to call you out specifically on this; rather, this is a thing I find with many AI safety projects. It just came up high on Manifund when I logged on for other reasons and I saw donations from people I respect.


FWIW, I don't want to single you out; I have this kind of critique of many, many people doing AI safety work, but this just seems like a striking example of it.


I didn't mean my comments to say "you should return this money". I consider lots of grants/spending in the EA ecosystem to be wasteful, ineffective, etc. And again, apologies for singling you out over a gripe I have with EA funding.

Many people can tell you that I have a problem with the free-spending, lavish, and often wasteful approach on the longtermist side of EA. I think I made it pretty clear that I was using this RFP as an example because other regrantors gave to it.

This project with Austin was planned to happen before you posted your RFP on Manifund (I can provide proof if you'd like).

I wasn't playing around with the weights to make you come out lower. I assure you, my bias is usually against projects I perceive to be "free-spending". 

I think it's good/natural to try to create separation between evaluators/projects though.

Thanks Chana.

Yes, there is lots to consider and I don't want to suggest that my analysis is comprehensive or should be used as the basis for all future funding decisions for AI safety communications.

Very excited for the next AI in Context video.

I expect there to be lots of experimentation that naturally occurs with people doing what they feel is best and getting out the messages they find important. I am also slightly worried about goodharting and such, for obvious reasons. I think the analysis should be taken with a grain of salt. It's a first pass at this.

Agree on a lot of the points on video production.

I answered Michael directly on the parent. Hopefully, that gives some colour.

Hi Michael, sorry for coming back a bit late. I was flying out of SF.

  1. For costs, I'm going to stand strongly by my number here. In fact, I think it should be $26k. I treated everyone the same and counted the value of their time, at their suggested rate, for the amount of time they were doing the work. This affected everyone, and I think it is a much more accurate way to measure things. It affected AI Species and Doom Debates quite severely as well, more so than you, and it affected others too. I briefly touched on this in the post, but I'll expand here. The goal of this exercise isn't to measure money spent vs. output, but rather cost-effectiveness per resource put in. If time is put in unpaid, this should be accounted for, since it isn't going to be unpaid forever. Otherwise, cost-effectiveness can be "gamed": you can increase your numbers for a time by not paying yourself. Even if you never planned to take funding, you could have spent your time doing other things, and thus there is still a tradeoff. It's natural/normal for projects to be unpaid at the start and apply for funding later, and for a few months of work to go unpaid. For example, I did this work unpaid, but if I am going to continue to do CEAs for longtermist stuff, I will expect to be paid eventually.

    In your case, your first TikTok was made on June 15, and given that you post ~1/day, I assume you basically started making shorts that same day. Given that I made your calculations on Sept 9/10, that's 13 weeks. In your Manifund post, you are asking for $2k/week, so I take that to be your actual cost of doing the work (there's a small sketch of this calculation at the end of this comment). I'm not simply measuring your "work done on the grant" and treating the time you put in beforehand as free.

    2. I'm happy to take data corrections. Undoubtedly, I have made some mistakes, since not everyone responded to me, the data is a bit messy, etc.

    A) For the FLI podcast, I ran numbers for the whole channel, not just the podcast. That means their viewer minutes are calculated over the whole channel. They haven't gotten back to me yet so I hope to update their numbers when they do. I agree that their metrics have a wide error bar.

    B) I was in the process of coming up with better estimates of viewer minutes based on video/podcast length but I stopped because people were responding to me, and I thought it better to just use accurate numbers. I stand by this decision, though I acknowledge the tradeoff.

    C) If a video has "inflated" views due to paid advertising, that's fine; it shows up in the cost part of cost-effectiveness. For example, Cognitive Revolution does boost their videos / pay for advertising, and that's part of their costs. I don't think it's a problem that some viewers are paid for; maybe they see a video they otherwise wouldn't have. That's fine. I also acknowledge that others may feel differently about how this translates to impact. That said, no, this won't reduce QAVM/$ to simply the cost of ads. Ads just don't work very well without organic views.

    3. For Rational Animations showing up low on the list, the primary reason is that they spend a boatload of money; nobody else comes close. I'm not saying that's bad. It's just a fact. They spend more than everyone else combined. Since I am dividing by dollars, they get a lower VM/$ and thus a lower QAVM/$.

    If you wish, you can simply look at VM/$. They score low here too (8th, same as adjusted).

    As for giving Robert Miles a high ranking, this came about because Austin really thought Dwarkesh was an AI safety YouTuber, so I asked ~50 people different variants of the question: "Who is your favourite AI safety creator?", "Which AI safety YouTubers did/do you watch?", etc. It's hard to overstate this: Robert Miles was the first person EVERYONE mentioned. I found this surprising since, well, his view counts don't bear that out. Furthermore, 3 people told me that they work in AI safety because of his videos. I think there is a good case that his adjustment factors should be far HIGHER, not lower.

    4. Regarding your weights: I encourage people (and did so in the post) to give their own weights to channels. For this exercise, I watched a lot of AI safety content to get a sense of what is out there. My quality weights were based on this (and on discussions with Austin and others). I encourage you to consider each weight separately; Austin added the "total" quality factor at the end, and I kind of didn't want it, since I thought it could lead to exactly this (there's a sketch of how the factors combine at the end of this comment). For audience quality, I looked at things like TikTok viewership vs. YouTube/Apple Podcasts. For message fidelity, respectfully, you're just posting clips of other podcasts and such, and this just doesn't do a great job of getting a message across; everyone but Rob Miles got <0.5, since I am comparing to Rob, and with a different reference video I would get different results. For Qm, your number is very similar to others', but even still, I found the message not to be the best.

    Again, most of the work in the quality factor is being done by audience quality, and yes, shorts just have a far lower audience quality.

    On the data errors, as expressed above, I don't think I made data errors. I get the sense, while reading this, that you feel I was "out to get you" or something and was being willfully biased. I want to assure you that this wasn't the case. Lots of different creators have issues with how I decided to do this analysis, and in general, they wanted the analysis to be done in a way that would give them better numbers. I think that's partially human nature, and also that they likely made their content with their own assumptions in mind. In the end, I settled on this process as what I (and Austin) found most reasonable, taking everything we learned into account. I am not saying my analysis is the be-all and end-all that should dictate where Open Phil money goes tomorrow until further analysis is done.

    I hope that explains/answers all your points. I am happy to engage further.
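To make points 1 and 4 above concrete, here is a minimal sketch of the cost and QAVM/$ mechanics as I've described them. Apart from the 13 weeks × $2k/week example, every number and weight below is an illustrative placeholder rather than a value from the actual spreadsheet:

```python
# Illustrative sketch of the cost and QAVM/$ mechanics described above.
# Everything except the 13 weeks x $2k/week example is a placeholder.

def time_valued_cost(weeks_worked: float, weekly_rate: float) -> float:
    """Project cost, valuing all time at the creator's own stated rate,
    whether or not that time was actually paid."""
    return weeks_worked * weekly_rate

def qavm_per_dollar(viewer_minutes: float, cost: float,
                    audience_quality: float, message_fidelity: float) -> float:
    """Quality-adjusted viewer minutes per dollar:
    viewer minutes, scaled by the quality factors, divided by cost."""
    return viewer_minutes * audience_quality * message_fidelity / cost

cost = time_valued_cost(weeks_worked=13, weekly_rate=2_000)
print(cost)  # -> 26000, the $26k figure from point 1

# Placeholder channel stats, purely to show the shape of the calculation:
print(qavm_per_dollar(viewer_minutes=500_000, cost=cost,
                      audience_quality=1.5, message_fidelity=0.5))
```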


Yes, lots to consider. I talked to a lot of people about how to measure impact, and yes, it's hard. This is, AFAIK, the first public attempt at cost-effectiveness for this stuff.

I disagree on things like log(minutes). Short-form content is consumed with incredibly low engagement and gets scrolled through extremely passively, for hours at a time, just like long-form content.

In terms of preaching to the converted, I think it takes a lot of engagement time to get people to take action. It seems to often take people 1-3 years of engagement with EA content to make significant career shifts, etc.

I'm measuring cost-effectiveness thus far. Some people may overperform expectations, and some may underperform.

As for measuring channel growth, I expect lots of people to make cases for why their channel will grow more than others', and this would introduce a ton of bias. The fairest thing to do is to measure past impact. More importantly, when we compare to other places where we use CEAs, we measure impact that has already happened; we don't just speculate (even with good assumptions) about the impact that will occur in the future. Small grants/attempts are made, and the ones that work get scaled up.

Michael, when possible, I used the raw data from the creators for viewer minutes. The script was only used for people who didn't send data. I considered doing a lot more data analysis on the percentage of a video watched vs. the length of the video, but as people started to respond to me, this felt unnecessary. I'm going to keep editing the documents and this post as new data comes in. I think it's probably a good norm to set that evaluators get creators to send them data, as happens in the GHW space.

On the thought that impact is strongly sublinear per minute of video, I'd ask you to consider: when have you ever taken action due to a 0.1-1 minute video? Compare this to a 10-minute video and a 100-minute podcast, and now compare those to a book that takes ~1,000 minutes to read.

Viewer minutes is a proxy for engagement. It's imperfect, and I expect further CEAs to go deeper, but I think viewer minutes scale slightly sublinearly, not strongly.
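As a toy illustration of the difference between a log discount and the mild sublinearity I have in mind (the 0.9 exponent is purely an assumption for the example, not a number from the analysis):

```python
import math

def log_value(minutes: float) -> float:
    """Strongly sublinear: value grows with log(minutes)."""
    return math.log(minutes)

def mildly_sublinear_value(minutes: float) -> float:
    """Mildly sublinear: value grows roughly like minutes**0.9 (illustrative)."""
    return minutes ** 0.9

for m in (1, 10, 100, 1_000):
    print(m, round(log_value(m), 2), round(mildly_sublinear_value(m), 1))

# Under the log rule, a ~1,000-minute book is worth only ~3x a 10-minute video;
# under the milder rule it is worth ~60x.
```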
