
In Human Compatible, Stuart Russell makes an argument that I have heard him make repeatedly (I believe on the 80K podcast and the FLI conversation with Steven Pinker). He suggests a pretty bold and surprising claim:

[C]onsider how content-selection algorithms function on social media... Typically, such algorithms are designed to maximize click-through, that is, the probability that the user clicks on presented items. The solution is simply to present items that the user likes to click on, right? Wrong. The solution is to change the user's preferences so that they become more predictable. A more predictable user can be fed items that they are likely to click on, thereby generating more revenue. People with more extreme political views tend to be more predictable in which items they will click on... Like any rational entity, the algorithm learns how to modify the state of its environment—in this case, the user's mind—in order to maximize its own reward. The consequences include the resurgence of fascism, the dissolution of the social contract that underpins democracies around the world, and potentially the end of the European Union and NATO. Not bad for a few lines of code, even if it had a helping hand from some humans. Now imagine what a really intelligent algorithm would be able to do.

I don't doubt that in principle this can and must happen in a sufficiently sophisticated system. What I'm surprised by is the claim that it is happening now. In particular, I would think that modifying human behavior to make people more predictable is pretty hard to do, so that any gains in predictive accuracy for algorithms available today would be swamped by (a) noise and (b) the gains from presenting the content that someone is more likely to click on given their present preferences.

To be clear, I also don't doubt that there might be pieces of information algorithms can show people to make their behavior more predictable. Introducing someone to a new YouTube channel they have not encountered might make them more likely to click its follow-up videos, so that an algorithm has an incentive to introduce people to channels that lead predictably to their wanting to watch a number of other videos. But this is not the same as changing preferences. He seems to be claiming, or at least very heavily implying, that the algorithms change what people want, holding the environment (including information) constant.

Is there evidence for this (especially empirical evidence)? If so, where could I find it?






Facebook has at least experimented with using deep reinforcement learning to adjust its notifications, according to https://arxiv.org/pdf/1811.00260.pdf. Depending on which exact features they used for the state space (i.e. whether they are causally connected to preferences), the trained agent would at least theoretically have an incentive to change users' preferences.

The fact that they use DQN rather than a bandit algorithm seems to suggest that what they are doing involves at least some short-term planning, but the paper does not seem to analyze the experiments in much detail, so it is unclear whether they could have used a myopic bandit algorithm instead. Either way, seeing this made me update quite a bit towards being more concerned about the effect of recommender systems on preferences.

 Is The YouTube Algorithm Radicalizing You? It’s Complicated.

Recently, there's been significant interest among the EA community in investigating short-term social and political risks of AI systems. I'd like to recommend this video (and Jordan Harrod's channel as a whole) as a starting point for understanding the empirical evidence on these issues.

From reading the summary in this post, it doesn't look like the YouTube video discussed bears on the question of whether the algorithm is radicalizing people 'intentionally,' which I take to be the interesting part of Russell's claim.

I'm curious what you think would count as a current ML model 'intentionally' doing something? It's not clear to me that any currently deployed ML models can be said to have goals.

To give a bit more context on what I'm confused about: the model that gets deployed is the one that does best at minimising the loss function during training. Isn't Russell's claim that a good strategy for minimising the loss function is to change users' preferences? Then, whether or not the model is 'intentionally' radicalising people is beside the point

(I find talk about the goals of AI systems pretty confusing, so I could easily be misunderstanding, or wrong about something)

Eli Rose
Yeah, I agree this is unclear. But, staying away from the word 'intention' entirely, I think we can & should still ask: what is the best explanation for why this model is the one that minimizes the loss function during training? Does that explanation involve this argument about changing user preferences, or not? One concrete experiment that could feed into this: if it were the case that feeding users extreme political content did not cause their views to become more predictable, would training select a model that didn't feed people as much extreme political content? I'd guess training would select the same model anyway, because extreme political content gets clicks in the short-term too. (But I might be wrong.)

There's a lot of anecdotal evidence that news organizations essentially change users' preferences. The fundamental story is quite similar. It's not clear how intentional this is, but there seem to be many cases of people becoming extremized after watching/reading the news (now that I think about it, this seems like a major factor in most of these situations).

I vaguely recall Matt Taibbi complaining about this in the book Hate Inc. 


Here are a few related links:


If it turns out that the news channels change preferences, it seems like a small leap to suggest that recommender algorithms that get people onto news programs lead to changing their preferences. Of course, one should have evidence as to the magnitude and so on.

Thanks. I'm aware of this sort of argument, though I think most of what's out there relies on anecdotes, and it's unclear exactly what the effect is (since there is likely some level of confounding here).

I guess there are still two things holding me up here. (1) It's not clear that the media is changing preferences or just offering [mis/dis]information. (2) I'm not sure it's a small leap. News channels' effects on preferences likely involve prolonged exposure, not a one-time sitting. For an algorithm to expose someone in a prolonged way, it has to either r...

Ozzie Gooen
(1) The difference between preferences and information seems like a thin line to me. When groups are divided about abortion, for example, which cluster would that fall into? It feels fairly clear to me that the media facilitates political differences, as I'm not sure how else these could be relayed to the extent they are (direct friends/family is another option, but that wouldn't explain quick and correlated changes in political parties).

(2) The specific issue of prolonged involvement doesn't seem hard to believe. People spend lots of time on YouTube. I've definitely gotten lots of recommendations to the same clusters of videos. There are only so many clusters out there.

All that said, my story above is fairly different from Stuart's. I think his is more of "these algorithms are a fundamentally new force with novel mechanisms of preference changes". My claim is that media sources naturally change the preferences of individuals, so of course if algorithms have control in directing people to media sources, this will be influential in preference modification. This is where "preference modification" basically means, "I didn't used to be an intense anarcho-capitalist, but then I watched a bunch of the videos, and now tie in strongly to the movement."

However, the issue of "how much do news organizations actively optimize preference modification for the purposes of increasing engagement, either intentionally or unintentionally?" is more vague.

Good question. I'm not sure why you'd privilege Russell's explanation over the explanation "people click on extreme political content, so the click-maximizing algorithm feeds them extreme political content."

Right. I mean, I privilege this simpler explanation you mention. He seems to have reason to think it's not the right explanation, but I can't figure out why.

I think the "so that they become more predictable [to the recommender algorithm]" is crucial in Russell's argument. IF human preferences were malleable in this way, and IF recommender algorithms are strong enough to detect that malleability, then the pressure towards the behaviour that Russell suggests is strong and we have a lot of reasons to expect it. I think the answer to both IFs is likely to be yes.

I just don't think we've seen anything that favors the hypothesis "algorithm 'intentionally' radicalizes people in order to get more clicks from them in the long run" over the hypothesis "algorithm shows people what they will click on the most (which is often extreme political content, and this causes them to become more radical, in a self-reinforcing cycle)."

BTW, I am interested in studying this question if anyone is interested in partnering up. I'm not entirely sure how to study it, as (given the post) I suspect the result may be a null, which is only interesting if we have access to one of the algorithms he is talking about and data on the scale such an algorithm would typically have.

My general approach would be an online experiment where I expose one group of people to a recommender system and don't expose another. Then place both groups in the same environment and observe whether the first group is now more predictable. (This does not account for the issue of information, though.)
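One way the "more predictable" comparison could be made concrete is sketched below. This is only an illustration under assumptions that are not in the proposal itself: predictability is taken to be the Shannon entropy of each person's clicks across content categories (lower entropy means more predictable), and the names `click_entropy`, `mean_entropy`, and the tiny click logs are all hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def click_entropy(clicks):
    """Shannon entropy (in bits) of one user's empirical click distribution.

    Assumption: lower entropy is used here as a stand-in for "more predictable".
    """
    if not clicks:
        raise ValueError("need at least one click")
    counts = Counter(clicks)
    total = len(clicks)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_entropy(group):
    """Average per-user click entropy across a group of users."""
    return mean(click_entropy(user_clicks) for user_clicks in group)


# Hypothetical click logs collected in the shared environment, one list per user.
exposed = [
    ["politics", "politics", "politics", "politics"],
    ["politics", "politics", "politics", "sports"],
]
control = [
    ["politics", "sports", "music", "news"],
    ["politics", "politics", "sports", "sports"],
]

# A positive difference would mean the exposed group is more predictable.
difference = mean_entropy(control) - mean_entropy(exposed)
print(f"control minus exposed entropy: {difference:.3f} bits")
```

Entropy is just one candidate measure; held-out accuracy of a click-prediction model would be another, and as noted above neither separates preference change from information effects.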

I think that experiment wouldn't prove anything about the algorithm's "intentions," which seem to be the interesting part of the claim. One experiment that maybe would (I have no idea if this is practical) is giving the algorithm the chance to recommend two pieces of content: a) high likelihood of being clicked on, b) lower likelihood of being clicked on, but makes the people who do click on it more polarized. Not sure if a natural example of such a piece of content exists.
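A toy version of that two-item choice can be worked through numerically. The sketch below is not a model of any real recommender; every number and name in it is a made-up assumption. Item a is clicked with probability 0.6 and changes nothing, item b is clicked with probability 0.5 and a click on it moves the user into a "polarized" state where later items are clicked with probability 0.9. The function computes which item maximizes expected total clicks for a given planning horizon.

```python
# All values below are hypothetical, chosen only to illustrate the trade-off.
P_CLICK_A = 0.6          # item a: higher immediate click probability, no side effect
P_CLICK_B = 0.5          # item b: lower immediate click probability, polarizes on click
P_CLICK_POLARIZED = 0.9  # click probability per step once the user is polarized


def best_first_item(horizon):
    """Return (item, expected_total_clicks) for an unpolarized user.

    Uses backward induction over the number of remaining steps.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    value_unpolarized = 0.0  # expected clicks with zero steps left
    best = None
    for steps_left in range(1, horizon + 1):
        # Once polarized, every remaining step yields P_CLICK_POLARIZED clicks.
        value_polarized_after = P_CLICK_POLARIZED * (steps_left - 1)
        value_a = P_CLICK_A + value_unpolarized
        value_b = (
            P_CLICK_B
            + P_CLICK_B * value_polarized_after          # clicked: user polarizes
            + (1 - P_CLICK_B) * value_unpolarized        # not clicked: no change
        )
        best = ("a", value_a) if value_a >= value_b else ("b", value_b)
        value_unpolarized = best[1]
    return best


for horizon in (1, 2, 5):
    item, value = best_first_item(horizon)
    print(f"horizon={horizon}: show item {item} first (expected clicks {value:.2f})")
```

Under these assumed numbers, a purely myopic selector (horizon 1) shows item a, while any selector looking two or more steps ahead shows item b. That difference in choice is the kind of signature the proposed experiment would be looking for.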


IIRC Tristan Harris has also made this claim. Maybe his 80k podcast or The Social Dilemma has some clues.

Edit: maybe he just said something like 'YouTube's algorithm is trained to send users down rabbit holes'
