In Human Compatible, Stuart Russell makes an argument that I have heard him make repeatedly (I believe on the 80K podcast and in the FLI conversation with Steven Pinker). The claim is pretty bold and surprising:
[C]onsider how content-selection algorithms function on social media... Typically, such algorithms are designed to maximize click-through, that is, the probability that the user clicks on presented items. The solution is simply to present items that the user likes to click on, right? Wrong. The solution is to change the user's preferences so that they become more predictable. A more predictable user can be fed items that they are likely to click on, thereby generating more revenue. People with more extreme political views tend to be more predictable in which items they will click on... Like any rational entity, the algorithm learns how to modify the state of its environment—in this case, the user's mind—in order to maximize its own reward. The consequences include the resurgence of fascism, the dissolution of the social contract that underpins democracies around the world, and potentially the end of the European Union and NATO. Not bad for a few lines of code, even if it had a helping hand from some humans. Now imagine what a really intelligent algorithm would be able to do.
I don't doubt that in principle this can, and in a sufficiently sophisticated system must, happen. What surprises me is the claim that it is happening now. In particular, I would think that modifying human behavior to make people more predictable is pretty hard to do, so that any resulting gains in predictive accuracy for today's algorithms would be swamped by (a) noise and (b) the gains from simply presenting the content someone is already likely to click on, given their present preferences.
To be clear, I also don't doubt that there are pieces of information algorithms can show people that make their behavior more predictable. Introducing someone to a YouTube channel they have not encountered before might make them more likely to click on its follow-up videos, so an algorithm has an incentive to introduce people to channels that lead predictably to their wanting to watch many other videos. But this is not the same as changing preferences. Russell seems to be claiming, or at least very heavily implying, that the algorithms change what people want, holding the environment (including the information available to the user) constant.
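To pin down what I mean, here is a toy sketch (entirely my own construction; the "drift" dynamics and all the numbers are invented for illustration, not a model of any real platform). It only shows that if clicking on an item did nudge the user's preference toward it, then a recommender that models that nudge and plans ahead would have an incentive to steer users toward "sticky" content, whereas a purely myopic click-maximizer would not:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two items; all numbers are invented purely for illustration.
logits0 = np.array([0.5, 0.3])   # item 0 is currently the more appealing one
drift   = np.array([0.0, 0.4])   # item 1 is "sticky": each click raises its own logit
HORIZON = 50

def myopic_total(logits, horizon):
    """A myopic click-maximizer: always shows the item with the highest
    current click probability, treating preferences as fixed."""
    x = logits.astype(float).copy()
    total = 0.0
    for _ in range(horizon):
        item = int(np.argmax(x))       # sigmoid is monotone, so argmax over logits suffices
        p = sigmoid(x[item])
        total += p
        x[item] += p * drift[item]     # the user still drifts; the algorithm just ignores it
    return total

def committed_total(item, logits, horizon):
    """Expected clicks if the recommender commits to showing `item` every step,
    under the assumed drift dynamics (expected-value rollout)."""
    x = logits.astype(float).copy()
    total = 0.0
    for _ in range(horizon):
        p = sigmoid(x[item])
        total += p
        x[item] += p * drift[item]
    return total

myopic = myopic_total(logits0, HORIZON)
shaping = max(committed_total(i, logits0, HORIZON) for i in range(len(logits0)))
print(f"myopic recommender:      ~{myopic:.1f} expected clicks over {HORIZON} steps")
print(f"drift-aware recommender: ~{shaping:.1f} expected clicks over {HORIZON} steps")
```

The crux, then, is whether anything like that drift term exists at a meaningful size for real users, which is what I'm asking about below.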
Is there evidence, especially empirical evidence, that this kind of preference change is actually happening? If so, where could I find it?
Thanks. I'm aware of this sort of argument, though I think most of what's out there relies on anecdotes, and it's unclear exactly what the effect is (since there is likely some level of confounding here).
I guess there are still two things holding me up here. (1) It's not clear whether the media is changing preferences or just offering [mis/dis]information. (2) I'm not sure it's a small leap. News channels' effects on preferences likely involve prolonged exposure, not a one-time sitting. For an algorithm to expose someone in a prolonged way, it has to either repeatedly recommend videos or recommend one video that leads to their watching many, many others. The latter strikes me as unlikely; again, behavior is malleable, but not that malleable. In the former case, I would think the direct contribution to the reward function from all of those individually recommended and clicked videos has to be far larger than the contribution from any change in the person's behavior after seeing them. If my reasoning here is wrong, I would find that quite scary, because it would be evidence that we are substantially more vulnerable to current algorithms than I previously thought.
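To put very rough numbers on that last comparison (using the same invented per-click "drift" assumption as the toy sketch in my question, with all parameters made up and only orders of magnitude intended): here is how many extra clicks a per-click preference shift of a given size would buy over 100 recommendations of the same item, next to the ordinary statistical noise in the click count itself:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def expected_clicks(logit0, alpha, horizon):
    """Expected total clicks over `horizon` recommendations of one item, when each
    (expected) click shifts that item's logit by `alpha` (the invented drift)."""
    x, total = float(logit0), 0.0
    for _ in range(horizon):
        p = sigmoid(x)
        total += p
        x += p * alpha
    return total

HORIZON = 100
LOGIT0 = 0.0                                # 50% baseline click probability
baseline = expected_clicks(LOGIT0, 0.0, HORIZON)
noise_sd = np.sqrt(HORIZON * 0.5 * 0.5)     # rough 1-sd noise in a sum of Bernoulli(0.5) clicks

for alpha in (0.001, 0.01, 0.1):
    gain = expected_clicks(LOGIT0, alpha, HORIZON) - baseline
    print(f"per-click shift alpha={alpha:<5}: extra clicks from shaping ~{gain:5.2f} "
          f"(vs ~{noise_sd:.0f} clicks of noise)")
```

On these made-up numbers, the per-click shift has to be fairly large before the extra clicks from shaping stand out from ordinary click noise for a single user over a single stretch of viewing, which is the intuition I'm trying to check against actual evidence.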