Bio

Non-EA interests include chess and TikTok (@benthamite). We are probably hiring: https://metr.org/hiring 

How others can help me

Feedback always appreciated; feel free to email/DM me or use this link if you prefer to be anonymous.

Sequences

AI Pause Debate Week
EA Hiring
EA Retention

Comments

Thanks for writing this up - I think "you don't need to worry about reward hacking in powerful AI because solving reward hacking will be necessary for developing powerful AI" is an important topic. (Although your frame is more "we will fail to solve reward hacking and therefore fail to develop powerful AI," IIUC.)

I would find it helpful if you engaged more with the existing literature. E.g. I don't think anyone disagrees with your high-level point that it's hard to accurately supervise models, particularly as they get more capable, but we also have empirical evidence that weak models can successfully supervise stronger models, and that the stronger model won't just naively copy the mistakes of the weak supervisor to maximize its reward. Is your objection that you don't think these techniques will scale to more powerful AI, or that even if they do scale they won't be good enough, or something else?
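(To gesture at the kind of empirical setup I have in mind, here is a toy sketch of my own, not the actual experiments from the literature: a low-capacity "weak supervisor" is trained on a small ground-truth set, its imperfect labels are used to train a higher-capacity "strong student," and we then check whether the student ends up more accurate than its supervisor on held-out ground truth.)

```python
# Toy weak-to-strong supervision sketch (my own construction, not the published setup):
# a weak model labels data for a stronger model, and both are compared against
# held-out ground truth.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=6000, n_features=40, n_informative=10, random_state=0)
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, train_size=500, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X_rest, y_rest, test_size=1500, random_state=0)

# Weak supervisor: a low-capacity model trained on a small ground-truth set.
weak = LogisticRegression(max_iter=1000).fit(X_weak, y_weak)

# Strong student: a higher-capacity model trained only on the weak model's labels.
# (y_pool, the ground truth for the pool, is never shown to the student.)
strong = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
strong.fit(X_pool, weak.predict(X_pool))

print("weak supervisor accuracy:", accuracy_score(y_test, weak.predict(X_test)))
print("strong student accuracy: ", accuracy_score(y_test, strong.predict(X_test)))
```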

I interpret OP's point about asymptotes to mean that he indeed bites this bullet and believes that the "compensation schedule" is massively higher even when the "instrument" only feels slightly worse?

In his examples (lexically ordered values and the hyperreals) there is no "most intense suffering which can be outweighed" (or "least intense suffering which can't be outweighed"). E.g. in the hyperreals, if $\varepsilon$ is infinitesimal then $n\varepsilon < \delta$ no matter how small the real $\delta$ or how large the natural number $n$.

S* is only a tiny bit worse than S

In his examples, between any S which can't be outweighed and S* which can, there are uncountably many additional levels of suffering! So I don't think it's correct to say it's only a tiny bit worse.
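(To spell that out, with notation I'm supplying rather than quoting: if outweighable suffering gets a finite disvalue and non-outweighable suffering gets an infinite disvalue such as $\omega$, then every value $\omega - r$ for real $r > 0$ lies strictly between any finite value and $\omega$, and each is still infinite, hence still not outweighable. So uncountably many distinct levels separate any outweighable level from any non-outweighable one; the step is never "a tiny bit.")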

Thanks for writing this Seth! I agree it's possible that we will not see transformative effects from AI for a long time, if ever, and I think it's reasonable for people to make plans which only pay off on the assumption that this is true. More specifically: projects which pay off under an assumption of short timelines often have other downsides, such as being more speculative, which means that the expected value of the long timeline plans can end up being higher even after you discount them for only working on long timelines.[1]

That being said, I think your post is underestimating how transformative truly transformative AI would be. As I said in a reply to Lewis Bollard, who made a somewhat similar point:

If I'm assuming that we are in a world where all of the human labor at McDonald's has been automated away, I think that is a pretty weird world. As you note, even the existence of something like McDonald's (much less a specific corporate entity which feels bound by the agreements of current-day McDonald's) is speculative.

But even if we grant its existence: a ~40% egg price increase is currently enough that companies feel they have cover to abandon their cage-free pledges. Surely "the entire global order has been upended and the new corporate management is robots" is an even better excuse?

And even if we somehow hold McDonald's to their pledge, I find it hard to believe that a world where McDonald’s can be run without humans does not quickly lead to a world where something more profitable than battery cage farming can be found. And, as a result, the cage-free pledge is irrelevant because McDonald’s isn’t going to use cages anyway. (Of course, this new farming method may be even more cruel than battery cages, illustrating one of the downsides of trying to lock in a specific policy change before we understand what the future will be like.)

  1. ^

    Although I would encourage people to actually try to estimate this and pressure test the assumption that there isn't actually a way that their work can pay off on a shorter timeline. 
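    As a toy version of that estimate (numbers entirely made up): a long-timelines project with a 50% chance of mattering conditional on long timelines beats a speculative short-timelines project with a 10% chance of mattering conditional on short timelines whenever $P(\text{long}) \times 0.5 > P(\text{short}) \times 0.1$, i.e. whenever $P(\text{long}) > 0.2 \times P(\text{short})$.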

Thanks, Jesse. Is there a way that we could actually do this? Like, choose some F(X) which is unknown to both of us but guaranteed to be between 0 and 1; if it's less than 1/2 I pay you a dollar, and if it's greater than 1/2 you pay me some large amount of money.

I feel pretty confident I would take that bet if the selection of F was not obviously antagonistic towards me, but maybe I'm not understanding the types of scenarios you are imagining.
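(Putting rough numbers on it, with my own framing rather than anything Jesse specified: if $p$ is my credence that $F(X) < 1/2$ and the large payoff is $M$ dollars, the bet is worth $-p + (1-p)M$ to me in expectation, which is positive whenever $p < M/(M+1)$. So for large $M$, declining only makes sense if I think $F$ was chosen so that $F(X) < 1/2$ is nearly certain.)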

Yes, I think that's a good summary!

I personally am also often annoyed at EAs preferring the status/pay/comfort of frontier labs over projects that I think are more impactful. But it nonetheless seems to me like EAs are very disproportionately the ones doing the scrappy and unglamorous work. E.g. frontier lab Trust and Safety teams usually seem like <25% EAs, but the scrappiest/least glamorous AI safety projects I've worked on were >80% EAs.

I'm curious if your experience is different?

Ok, so your claim is something like "while I haven't rigorously evaluated it, it seems likely that there are ways the money currently being donated by billionaires could be spent more effectively (by EA values)"? (But you make no further claim like "...and therefore, improving the way billionaires spend their money is likely to be an intervention that scores well according to traditional EA frameworks like ITN.")

Thanks for clarifying, Bob. 

I'm not sure I understand the first point - doesn't (1) straightforwardly imply a lower (relative) importance in an ITN framework?

And re (2): the third option you think we should consider is something like a citizen's assembly? If so, am I correct in understanding that no evidence that these outperform billionaire philanthropy was presented? (Perhaps you plan to do this in a future post, or something?)

Thanks for writing this, Bob. I feel a bit confused about what your argument is. You cite "defenders"[1] of billionaire philanthropy as giving two defenses:

  1. Billionaire philanthropy is very small compared to governmental budgets
  2. Billionaires may donate in ways that are more effective than government spending

You don't seem to address the first defense, except to cite some statistics supporting it. The second you explicitly state that you haven't argued against:

I’ve talked a lot about the drawbacks of billionaire philanthropy, but I haven’t spent any time defending government programs. It could be that billionaire donors are still better because state institutions are even worse.

I can't tell if you are intending to critique (1) and (2) and I am misunderstanding, or if your view is something like "it's true that billionaire philanthropy plausibly outperforms current government spending, but there is a third thing which outperforms both". 

  1. ^

    I expect that Scott would object to being labeled a "defender" of billionaire philanthropy - his post was titled "Against against billionaire philanthropy", not "Pro billionaire philanthropy," I suspect for exactly this reason. But I will stick with your terminology.
