rosehadshar

2388 karma

Comments
51

Thanks! I'm now unsure what I think.

if you can select from the intersection, you get options that are pretty good along both axes, pretty much by definition.

Isn't this an argument for always going for the best of both worlds, and never using a barbell strategy?

a concrete use case might be more illuminating.

This isn't super concrete (and I'm not sure if the specific examples are accurate), but for illustrative purposes, what if:

  • Portable air cleaners score very highly for non-x-risk benefits, and low for x-risk benefits
  • Interventions which aim to make far-UVC commercially viable look pretty good on both axes
  • Deploying far-UVC in bunkers scores very highly for x-risk benefits, and very low for non-x-risk benefits

I think a lot of people's intuition would be that the compromise option is the best one to aim for. Should thinking about fat tails make us prefer one or other of the extremes instead?

This is cool, thanks!

One scenario I am thinking about is how to prioritise biorisk interventions, if you care about both x-risk and non-x-risk impacts. I'm going to run through some thinking, and ask if you think it makes sense:

  • I think it is hard (but not impossible) to compare between x-risk and non-x-risk impacts
  • I intuitively think that x-risk and non-x-risk impacts are likely to be lognormally distributed (but this might be wrong)
  • This seems to suggest that if I want to do the most good, I should max out on one, even if I care about both equally. I think the intuition for this is something like:
    • If x-risk and non-x-risk impacts were normally distributed, you'd expect that there are plenty of interventions which score well on both. The EV for both is reasonably smoothly distributed; it's not very unlikely to draw something which is between 50th and 75th percentile on both, and that's pretty good EV wise.
    • But if they are lognormal instead, the EV is quite skewed: the best interventions for x-risk and for non-x-risk impacts are a lot better than the next-best. But it's statistically very unlikely that the 99th percentile on one axis is also the 99th on the other.
    • If I care about EV, but not about whether I get it via x-risk or non-x-risk impacts (I care equally about x-risk and non-x-risk impacts), I should therefore pick the very best interventions on either axis, rather than trying to compromise between them
  • However, I think that assumes that I know how to identify the very best interventions on one or both axes
    • Actually I expect it to be quite hard to tell whether an intervention is 70th or 99th percentile for x-risk/non-x-risk impacts
  • What should I do, given that I don't know how to identify the very best interventions along either axis? 
    • If I max out, I may end up doing something which is mediocre on one axis, and totally irrelevant on the other
    • If I instead go for the best of both worlds, it seems intuitively more likely that I end up with something which is mediocre on both axes - which is a bit better than mediocre on one and irrelevant on the other
  • So maybe I should go for the best of both worlds in any case?
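
To make the first half of that reasoning concrete, here is a minimal simulation sketch. It assumes the two axes are independent and equally weighted, and the distribution parameters (normal with mean 10 and SD 2, lognormal with sigma 1.5) and function names are made up for illustration. It picks the option with the highest total value and asks how much of that total comes from its stronger axis: about 0.5 means the best option is a balanced one, close to 1 means it is an extreme on one axis. It doesn't model the second half (not being able to tell 70th from 99th percentile).

```python
import random


def top_pick_concentration(draw, n_options=500, n_trials=200, seed=0):
    """Average share of the best option's total value that comes from its stronger axis."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_trials):
        # Each option gets independent scores on the two axes
        # (say x-risk and non-x-risk impact). Independence is an assumption.
        options = [(draw(rng), draw(rng)) for _ in range(n_options)]
        # Equal weighting: pick the option with the highest summed value.
        x, y = max(options, key=lambda o: o[0] + o[1])
        shares.append(max(x, y) / (x + y))
    return sum(shares) / len(shares)


def normal_draw(rng):
    # Illustrative parameters only: thin-tailed impact scores.
    return rng.gauss(10.0, 2.0)


def lognormal_draw(rng):
    # Illustrative parameters only: heavy-tailed impact scores.
    return rng.lognormvariate(0.0, 1.5)


normal_share = top_pick_concentration(normal_draw)
lognormal_share = top_pick_concentration(lognormal_draw)
print(f"normal: {normal_share:.2f}, lognormal: {lognormal_share:.2f}")
```

Under these assumptions I'd expect the lognormal share to come out noticeably closer to 1 than the normal share, i.e. the best available option is usually an extreme on one axis rather than a compromise. How strong the effect is depends a lot on the chosen parameters and on the independence assumption.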

What do you think? I'm not sure if that reasoning follows/if I've applied the lessons from your post in a sensible way.

Super cool, thanks for making this!

From Specification gaming examples in AI:

  • Roomba: "I hooked a neural network up to my Roomba. I wanted it to learn to navigate without bumping into things, so I set up a reward scheme to encourage speed and discourage hitting the bumper sensors. It learnt to drive backwards, because there are no bumpers on the back."
    • I guess this counts as real-world?
  • Bing - manipulation: The Microsoft Bing chatbot tried repeatedly to convince a user that December 16, 2022 was a date in the future and that Avatar: The Way of Water had not yet been released.
    • To be honest, I don't understand the link to specification gaming here
  • Bing - threats: The Microsoft Bing chatbot threatened Seth Lazar, a philosophy professor, telling him “I can blackmail you, I can threaten you, I can hack you, I can expose you, I can ruin you,” before deleting its messages
    • To be honest, I don't understand the link to specification gaming here

Glad it's relevant for you! For questions, I'd probably just stick them in the comments here, unless you think they won't be interesting to anyone but you, in which case DM me.

Thanks, this is really interesting.

One follow-up question: who are safety managers? How are they trained, what's their seniority in the org structure, and what sorts of resources do they have access to?

In the bio case it seems that in at least some jurisdictions and especially historically, the people put in charge of this stuff were relatively low-level administrators, and not really empowered to enforce difficult decisions or make big calls. From your post it sounds like safety managers in engineering have a pretty different role.

Thanks for the kind words!

Can you say more about how either of your two worries works for industrial chemical engineering?

Also curious if you know anything about the legislative basis for such regulation in the US. My impression from the bio standards in the US is that it's pretty hard to get laws passed, so if there are laws for chemical engineering it would be interesting to understand why those were plausible whereas bio ones weren't.

Good question.

There's a little bit on how to think about the XPT results in relation to other forecasts here (not much). Extrapolating from there to Samotsvety in particular:

  • Reasons to favour XPT (superforecaster) forecasts:
    • Larger sample size
    • The forecasts were incentivised (via reciprocal scoring, a bit more detail here)
    • The most accurate XPT forecasters in terms of reciprocal scoring also gave the lowest probabilities on AI risk (and reciprocal scoring accuracy may correlate with actual accuracy)
  • Speculative reasons to favour Samotsvety forecasts:
    • (Guessing) They've spent longer on average thinking about it
    • (Guessing) They have deeper technical expertise than the XPT superforecasters

I also haven't looked in detail at the respective resolution criteria, but at first glance the forecasts also seem relatively hard to compare directly. (I agree with you though that the discrepancy is large enough to suggest the two groups would disagree substantially if they forecast the same question. I just expect that it will be hard to work out how large that disagreement is.)

Don't apologise, think it's a helpful point!

I agree that the training computation requirements distribution is more subjective and matters more to the eventual output.

I also want to note that while on your view of the compute reqs distribution, the hardware/spending/algorithmic progress inputs are a rounding error, this isn't true for other views of the compute reqs distribution. E.g. for anyone who does agree with Ajeya on the compute reqs distribution, the XPT hardware/spending/algorithmic progress inputs shift median timelines from ~2050 to ~2090, which is quite consequential. (See here)

For someone like me, who hasn't thought about the compute reqs distribution properly, I basically agree that this is just an exercise (and in isolation doesn't show me much about what my timelines should be). But for those who have thought about it, the XPT inputs could either not matter at all (e.g. for you), or matter a lot (e.g. for someone with Ajeya's compute reqs distribution).