Bio

Non-EA interests include chess and TikTok (@benthamite). We are probably hiring: https://metr.org/hiring 

How others can help me

Feedback always appreciated; feel free to email/DM me or use this link if you prefer to be anonymous.

Sequences (3)

AI Pause Debate Week
EA Hiring
EA Retention

Comments (1150)

Topic contributions (6)

When I worked at CEA, a standard question I would ask people was "what keeps you engaged with EA?" A surprisingly common answer was memes/shitposts.

This content has obvious downsides, but does solve many of the problems in OP (low time commitment, ~anyone can contribute, etc.).

+1, this seems more like a Task Y problem.

My impression is that if the OP did want to write specialist blog posts etc., they would be able to do that (and are probably even better placed than a younger person, given their experience). (And conversely, 18-year-olds who don't want to do specialist work or get involved in a social scene don't have that many points of attachment.)

I use DoneThat and like it, thanks for building it!

Thanks for writing this up - I think "you don't need to worry about reward hacking in powerful AI because solving reward hacking will be necessary for developing powerful AI" is an important topic. (Although your frame is more "we will fail to solve reward hacking and therefore fail to develop powerful AI," IIUC.)

I would find it helpful if you engaged more with the existing literature. E.g. I don't think anyone disagrees with your high-level point that it's hard to accurately supervise models, particularly as they get more capable, but we also have empirical evidence that weak models can successfully supervise stronger models, and that the stronger model won't just naively copy the mistakes of the weak supervisor to maximize its reward. Is your objection that you don't think these techniques will scale to more powerful AI, or that even if they do scale they won't be good enough, or something else?

I interpret OP's point about asymptotes to mean that he indeed bites this bullet and believes that the "compensation schedule" is massively higher even when the "instrument" only feels slightly worse?

In his examples (the hyperreals, or pairs ordered lexically) there is no "most intense suffering which can be outweighed" (or "least intense suffering which can't be outweighed"). E.g. in the hyperreals, the lexically prior kind of suffering outweighs the lesser kind no matter how small the former or how large the latter.

"S* is only a tiny bit worse than S"

In his examples, between any S which can't be outweighed and S* which can, there are uncountably many additional levels of suffering! So I don't think it's correct to say it's only a tiny bit worse.
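
To make that concrete, here is a minimal sketch of a lexicographic construction in my own notation (not necessarily the OP's exact examples): suffering levels are modelled as pairs, with the first coordinate tracking the lexically prior kind of suffering.

```latex
% Sketch (my notation, not the OP's): suffering levels as pairs
% $(a, b) \in \mathbb{R}_{\ge 0}^2$ under the lexicographic order.
\[
  (a_1, b_1) \succ (a_2, b_2)
  \;\iff\;
  a_1 > a_2 \ \text{ or } \ \big( a_1 = a_2 \text{ and } b_1 > b_2 \big)
\]
% Then for every $n$, however large,
\[
  (1, 0) \succ (0, n),
\]
% so any positive amount of the lexically prior suffering outweighs arbitrarily
% much of the lesser kind; and between $(0, x)$ and $(1, 0)$ there are
% uncountably many intermediate levels $(0, y)$ with $y > x$, so there is no
% "least intense suffering which can't be outweighed".
```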

Thanks for writing this Seth! I agree it's possible that we will not see transformative effects from AI for a long time, if ever, and I think it's reasonable for people to make plans which only pay off on the assumption that this is true. More specifically: projects which pay off under an assumption of short timelines often have other downsides, such as being more speculative, which means that the expected value of the long timeline plans can end up being higher even after you discount them for only working on long timelines.[1]

That being said, I think your post is underestimating how transformative truly transformative AI would be. As I said in a reply to Lewis Bollard, who made a somewhat similar point:

If I'm assuming that we are in a world where all of the human labor at McDonald's has been automated away, I think that is a pretty weird world. As you note, even the existence of something like McDonald's (much less a specific corporate entity which feels bound by the agreements of current-day McDonald's) is speculative.

But even if we grant its existence: a ~40% increase in egg prices is currently enough cover for companies to feel justified in abandoning their cage-free pledges. Surely "the entire global order has been upended and the new corporate management is robots" is an even better excuse?

And even if we somehow hold McDonald's to their pledge, I find it hard to believe that a world where McDonald’s can be run without humans does not quickly lead to a world where something more profitable than battery cage farming can be found. And, as a result, the cage-free pledge is irrelevant because McDonald’s isn’t going to use cages anyway. (Of course, this new farming method may be even more cruel than battery cages, illustrating one of the downsides of trying to lock in a specific policy change before we understand what the future will be like.)

  1. ^

    Although I would encourage people to actually try to estimate this and to pressure-test the assumption that there isn't a way their work could pay off on a shorter timeline.

Thanks Jesse. Is there a way we could actually do this? E.g. choose some F(X) which is unknown to both of us but guaranteed to be between 0 and 1; if it's less than 1/2 I pay you a dollar, and if it's greater than 1/2 you pay me some large amount of money.

I feel pretty confident I would take that bet if the selection of F was not obviously antagonistic towards me, but maybe I'm not understanding the types of scenarios you are imagining.
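
For illustration, a minimal sketch of how the payoffs work out under one hypothetical, non-antagonistic assumption: F(X) is drawn uniformly from [0, 1], and the "large amount" is $100. Neither number comes from the thread; they're just placeholders.

```python
import random

# Hypothetical payoffs for the proposed bet (the $100 "large amount" is an
# illustrative placeholder, not a figure from the thread).
SMALL_PAYMENT = 1    # I pay you this if F(X) < 1/2
LARGE_PAYMENT = 100  # you pay me this if F(X) > 1/2

def simulate_bet(trials: int = 100_000) -> float:
    """My average payoff per bet, assuming F(X) is uniform on [0, 1]
    (i.e. not chosen antagonistically)."""
    total = 0.0
    for _ in range(trials):
        f = random.random()  # stand-in for the unknown F(X)
        if f < 0.5:
            total -= SMALL_PAYMENT
        elif f > 0.5:
            total += LARGE_PAYMENT
        # f == 0.5 is a measure-zero tie; treated as a push
    return total / trials

print(simulate_bet())  # ≈ (LARGE_PAYMENT - SMALL_PAYMENT) / 2 = 49.5
```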

Yes, I think that's a good summary!

I personally am also often annoyed at EAs preferring the status/pay/comfort of frontier labs over projects that I think are more impactful. But it nonetheless seems to me like EAs are very disproportionately the ones doing the scrappy and unglamorous work. E.g. frontier lab Trust and Safety teams usually seem to be <25% EAs, but the scrappiest/least glamorous AI safety projects I've worked on were >80% EAs.

I'm curious if your experience is different?
