Non-EA interests include chess and TikTok (@benthamite). We are probably hiring: https://metr.org/hiring
Feedback always appreciated; feel free to email/DM me or use this link if you prefer to be anonymous.
I agree that the authors encourage this misreading of the data by, e.g., saying "95% of organizations are getting zero return" and failing to note the caveats listed in my comment. If you believe that statement references a different data set than the one I was quoting, one which doesn't have those caveats, I'd be interested to hear which.
95% of the time, AI fails to generate significant revenue for businesses that adopt it
I think this is a misreading of the study, though the article you link seems to make the same mistake. Here's the relevant graph:
The finding is that 5% of all companies (not just those that have adopted AI) had an executive who reported "a marked and sustained productivity and/or P&L impact" from a task-specific GenAI tool.
I think a more accurate summary of the paper is something like "80% of LLM pilots are reported as successful by executives."[1]
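To make the denominator point concrete (illustrative arithmetic only, assuming both figures refer to the same pool of companies and tools):

$$\frac{\text{companies with a reported impact}}{\text{all companies}} \;=\; \frac{\text{successful pilots}}{\text{pilots}} \times \frac{\text{pilots}}{\text{all companies}}, \qquad 0.05 \;=\; 0.80 \times p \;\Rightarrow\; p \approx 6\%.$$

So the headline "5% of all companies" is compatible with a high pilot success rate if only a small fraction of companies ran a pilot in the first place.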
Assuming that all successful implementations were preceded by a pilot; the paper doesn't seem to say whether that's the case.
+1, this seems more like a Task Y problem.
My impression is that if OP did want to write specialist blog posts etc., they would be able to do so (and would probably be even better placed than a younger person, given their experience). (And conversely, 18-year-olds who don't want to do specialist work or get involved in a social scene don't have that many points of attachment.)
Thanks for writing this up - I think "you don't need to worry about reward hacking in powerful AI because solving reward hacking will be necessary for developing powerful AI" is an important topic. (Although your frame is more "we will fail to solve reward hacking and therefore fail to develop powerful AI," IIUC.)
I would find it helpful if you engaged more with the existing literature. E.g. I don't think anyone disagrees with your high-level point that it's hard to accurately supervise models, particularly as they get more capable, but we also have empirical evidence that weak models can successfully supervise stronger models, and that the stronger model won't just naively copy the weak supervisor's mistakes to maximize its reward. Is your objection that you don't think these techniques will scale to more powerful AI, that even if they do scale they won't be good enough, or something else?
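(For readers unfamiliar with that literature, here is a toy sketch of the weak-to-strong supervision setup it studies, with sklearn classifiers standing in for language models; the task, models, and numbers are illustrative, not taken from any particular paper.)

```python
# Toy sketch of weak-to-strong supervision: a "weak" model trained on a little
# ground truth labels data for a larger "strong" student, and we check whether
# the student merely copies the supervisor's errors or generalizes beyond them.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=6000, n_features=40, n_informative=10, random_state=0)
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, train_size=1000, random_state=0)
X_student, X_test, y_student, y_test = train_test_split(X_rest, y_rest, test_size=2000, random_state=0)
# (y_student is ground truth the student never sees; it is only held for reference.)

# Weak supervisor: a small model fit on a small amount of ground truth.
weak = LogisticRegression(max_iter=1000).fit(X_weak, y_weak)

# Strong student: a bigger model trained only on the weak supervisor's labels.
weak_labels = weak.predict(X_student)
strong = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=300, random_state=0).fit(X_student, weak_labels)

# Evaluate both against held-out ground truth: accuracy near the supervisor's
# suggests naive copying; accuracy above it is the "weak-to-strong" effect.
print("weak supervisor accuracy:", weak.score(X_test, y_test))
print("strong student accuracy: ", strong.score(X_test, y_test))
```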
In his examples (lexically ordered pairs and the hyperreals) there is no "most intense suffering which can be outweighed" (or "least intense suffering which can't be outweighed"). E.g. in the hyperreals there is no largest finite and no smallest infinite number: no matter how large a finite number or how small a (positive) infinite one you pick, there are always further numbers strictly between them.
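For concreteness, the relevant hyperreal fact (standard nonstandard-analysis arithmetic, not specific to his paper) is:

$$\text{for any finite } x,\ x+1 \text{ is finite and } x+1 > x; \qquad \text{for any positive infinite } y,\ y/2 \text{ is infinite and } y/2 < y.$$

So the finite numbers have no maximum and the positive infinite numbers have no minimum, which is what rules out a "boundary" level of suffering.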
S* is only a tiny bit worse than S
In his examples, between any S which can't be outweighed and S* which can, there are uncountably many additional levels of suffering! So I don't think it's correct to say it's only a tiny bit worse.
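For concreteness, one way to make this precise (my construction, not necessarily the exact representation from his examples): take levels of suffering to be pairs $(a, b)$ with $a \in \{0, 1\}$ and $b \in \mathbb{R}$, ordered lexically, where $a = 1$ marks the non-outweighable tier. Then for $S^* = (0, b)$ and $S = (1, c)$,

$$\{(0, x) : x > b\} \,\cup\, \{(1, y) : y < c\}$$

is an uncountable set of levels strictly between $S^*$ and $S$.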
I'm saying that the authors summarized their findings without caveats, and that those caveats would dramatically change how most people interpret the results.
(Note that, despite the "MIT" name being attached, this isn't an academic paper, and doesn't seem to be trying to hold itself to those standards.)