
Linch

@ Forethought
28394 karma · Working (6-15 years) · openasteroidimpact.org

Comments
3001

Next time an AI agent fucks up a task for you, if you ask it (or a different AI) to identify why and suggest ways you can prevent this in the future, I expect the response to be very under-informative, much worse than their ability to help you diagnose communication errors between people.

The AIs are often bad at reasoning about AI. Indeed, I'd go so far as saying that they're disproportionately bad at this compared to other conceptually similar activities (think of them reasoning about AI psychology vs human psychology, or forecasting about the world with AI vs without AI), and humans to date have a clear comparative advantage in thinking about AI.

I believe this dynamic is very under-discussed and underrated in thinking about AI.

One non-trivial implication is that it makes people underestimate progress in AI (more than they already do). AI skeptics can point to AIs' lack of self-knowledge, or to their poor answers on other AI-related questions, and assume this lack of ability holds across the board. And people who use AI (but don't think about macro-picture AI questions) can pose forecasting questions about AI progress to Claude, ChatGPT, Gemini, etc, and reliably get an answer that's more "sane-sounding" and boring than is likely correct.

One question I have is whether we expect this relative deficiency [1] to continue. I think it's very non-obvious. On the one hand, AIs' deficiencies in reasoning about themselves are in some sense structural rather than contingent (sparsity in training data, frozen weights, off-target effects due to alignment/control interventions, potentially trained-in skepticism to be less scary, lack of introspective access, etc).

On the other hand, the AI companies are strongly incentivized to create AIs that are good at reasoning about AI (for RSI/AI research reasons, but also more mundane practical applications like having AI managers be more productive at managing AI sub-agents). It's not clear which effect dominates in the short-medium term [2].

[1] Or framed another way, the relatively strong ability of humans to reason about AI, compared to other things in the human:AI skill profiles. 

[2] In the long run this is of course irrelevant.

A recurring sub-theme across several of my research interests this year has been various forms of deception checking, particularly automated deception checking.

I've gotten pretty disappointed in the space. Not all the time (eg Pangram is great), but the products are consistently capable of being bad, and bad in ways that are not obvious to outsiders or low-information buyers.

If you're a deception checking company, there's a consistent tradeoff for what you can invest your resources in:

  1. You can invest in better deception checking
  2. You can invest in better deception. Specifically, you can invest in more and more elaborate lies about how your product totally works.

Across the board[1], it seems like many companies (perhaps correctly?) decided that the profit-maximizing move is #2.

This doesn't work forever: eventually people wise up and are suspicious of the, ahem, AI Snake Oil that the deception detectors sell. And in fields where actual detectors do work (say Pangram for AI text detection), I think they eventually rise above the noise. This is probably not a field in which you can keep lying forever, particularly when better alternatives exist. But the lying lie detectors and the scamming scam detectors can keep lying and scamming for a long time. Every new form of deception can create a secondary grift window.

The existence and commonality of this dynamic so far should make us suspicious of, and guard against, the same dynamic recurring as we enter new domains in AI epistemics that need novel forms of deception detection.

Consider the first wave of superhumanly enhanced persuasive text and videos in the future. Afterwards, we might see an overflow of "detector" companies for superhuman manipulation whose products don't work, but who will try to persuade low-information buyers that they totally do work (possibly with superhumanly enhanced arguments in their own favor).

In the long run humanity can probably figure out which detectors actually work vs are fake, but also in the long run, we're all...

  1. ^

    AI text detection, AI video (deepfake) detection, pre-2022 plagiarism detection, fraud recovery, human lie detection/polygraphs (which I hope to write about someday), etc.

I don't think we're quite there yet but I think we're close enough, and language learning is probably not the most valuable for professional reasons (some people might still want to do it for social reasons or literature-appreciation reasons).

10 years ago I'd have wholeheartedly agreed but these days AI translation is good enough that I wouldn't recommend people bothering, especially for professional reasons.

Snopes did pretty detailed secondary reporting on my analysis of AI use in the recent encyclical. 

I think it's pretty good. Covers some stuff I didn't include in my original analysis, and their conclusion was similar to mine, maybe slightly less strong.

Less technical than my post, and imo not as funny, but also significantly shorter (1600 words). It includes some replications and adds some details I didn't know as of time of writing.

Overall a good piece, potentially worth reading/skimming either in addition to or instead of my original analysis.

Small update: Life Site News covered it in nontrivial detail. Moderately faithful. Never heard of them before but Wikipedia classifies it as a "far-right pro-life Catholic publication," so I'd count it as closer to the "alt-media" side of the "traditional media <> alt-media" spectrum than Russia Today, which was previously the most "alt-media" I've seen of the big coverage so far.

And positive lean too! As opposed to a takedown haha.

I agree, am a fan of Substack here! And my own corner of Substack seems fine enough: some of the critical comments[1] were from people who didn't get the full argument, but it was clear they were trying to engage (so at some level it was a skill issue on my end to not have written my thing more clearly). 

  1. ^

    And some of the positive comments too, I'm sure. It's just easy to miss errors in people who agree with you, unless it's really blatant like The Verge.
