rosehadshar

Thanks Lizka!

Some misc personal reflections:

  • Working at Forethought has been my favourite job ever, by a decent margin
  • I spent a couple of years doing AI governance research independently/collaborating with others in an ad hoc way before joining Forethought. I think the quality of my work has been way higher since joining (because I've been working on more important questions than I was able to make headway on solo), and it's also been just a huge win in terms of productivity and attention (the costs of tracking my time, hustling for new projects, managing competing projects, etc. were pretty huge for me and made it really hard to do proper thinking)

One minor addition from me on why or why not to work at Forethought: I think the people working at Forethought care pretty seriously about things going well, and are really trying to make a contribution.

I think this is both a really special strength, and something that has pitfalls:

  • It's a privilege to work with people who care in this way, and it cuts a lot of the crap that you'd get in organisations that were more oriented towards short-term outcomes, status, etc.
  • On the other hand, I sometimes worry about Forethought leaning a bit too heavily on EA-style 'do what's most impactful' vibes. I think this can kill curiosity, and also easily degrades into trying to try/people trying to meet their own psychological needs to make an impact, instead of really facing up to the reality we seem to be living in.
    • Other people at Forethought think that we're not leaning into this enough though: most work on AI futures stuff is low quality and won't matter at all, and it's very easy to fill all your time with interesting and pointless stuff. I agree on those failure modes, but disagree about where the right place on the spectrum is.

And then a few notes on the sorts of people I'd be really excited to have apply:

  • People who are thinking for themselves and building their own models of what's going on. I think this is rare and sorely needed. Some particular sub-groups I want to call out:
    • Really smart independent thinkers who want to work on AI macrostrategy stuff but haven't yet had a lot of surface area with the topic or done a lot of research. I think Forethought could be a great place for someone to soak up a lot of the existing thinking on these topics, en route to developing their own agenda.
    • Researchers with deep world models on the AI stuff, who think that Forethought is kind of wrong/a lot less good than it could be. The high-level aspiration for Forethought is something like, get the world to sensibly navigate the transition to superintelligence. We are currently 6 researchers, with fairly correlated views: of course we are totally failing to achieve this aspiration right now. But it's a good aspiration, and to the extent that someone has views on how to better address it, I'd love for them to apply.
      • If I got to choose one type of researcher to hire, it would be this one.
      • My hope would be that for many people in this category, Forethought would be able to 'get out of the way': give the person free rein, not entangle them in organisational stuff where they don't want that, and engage with them intellectually to the extent that it's mutually productive.
      • I agree with Lizka that people who think Forethought sucks probably won't want to apply/get hired/enjoy working at Forethought.
  • People who are working on this stuff already, but hamstrung by not having [a salary/colleagues/an institutional home/enough freedom for research at their current place of work/a manager to support them/etc]. I'd hope that Forethought could be a big win for people in this position, and allow them to unlock a bunch more of their potential.

Sorry for the slow response here! Agree that diffusion is an important issue. A few thoughts:

  • Some forms of diffusion might be actively good, for reducing concentration of power. So it's not clear that we want to straightforwardly prevent tech diffusion
  • Ways you could reduce tech diffusion within something like Intelsat:
    • Limited membership helps
    • You could do things like: require companies it contracts with to comply with strong infosec; require members not to allow frontier development without strong infosec; and require member governments to provide gov-level infosec to frontier developers in their countries
    • Intelsat for satellites involved sharing all the technical information. For AGI, it could involve sharing only some forms of information (e.g. weights don't get shared with everyone, but encrypted chunks of the weights are distributed among founder members; see the sketch after this list)
    • h/t Will: having many countries part of the multilateral project removes their incentives to try to develop frontier AI themselves (and potentially to open-source it)
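
On the 'encrypted chunks of the weights' idea: one very simple primitive that fits is secret sharing, where each founder member holds a random-looking share and the weights can only be reconstructed by combining all of them. Here's a minimal XOR-based sketch (my own illustration of the idea, not a proposal detail; in practice you'd likely want a threshold scheme like Shamir's rather than all-or-nothing):

```python
import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split(secret: bytes, n_members: int) -> list[bytes]:
    """Split `secret` into n_members shares. All shares together reconstruct it;
    any smaller subset is statistically independent of the secret."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(n_members - 1)]
    return shares + [reduce(xor_bytes, shares, secret)]

def reconstruct(shares: list[bytes]) -> bytes:
    return reduce(xor_bytes, shares)

# Toy usage: these bytes stand in for a serialized model checkpoint.
weights = b"frontier model weights (placeholder bytes)"
shares = split(weights, n_members=5)
assert reconstruct(shares) == weights      # all five members together recover the weights
assert reconstruct(shares[:4]) != weights  # any four members alone see only noise
```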

I agree that it's not necessarily true that centralising would speed up US development!

(I don't think we overlook this: we say "The US might slow down for other reasons. It’s not clear how the speedup from compute amalgamation nets out with other factors which might slow the US down:

  • Bureaucracy. A centralised project would probably be more bureaucratic.
  • Reduced innovation. Reducing the number of projects could reduce innovation.")

Interesting take that it's more likely to slow things down than speed things up. I tentatively agree, but I haven't thought deeply about just how much more compute a central project would have access to, and could imagine changing my mind if it were lots more.

Thanks, I think these points are good.

  • Learning may be bottlenecked by serial thinking time past a certain point, after which adding more parallel copies won't help. This could make the conclusion much less extreme.

Do you have any examples in mind of domains where we might expect this? I've heard people say things like 'some maths problems require serial thinking time', but I still feel pretty vague about this and don't have much intuition about how strongly to expect it to bite.
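
One way I find it helpful to make the intuition quantitative (this is just Amdahl's-law-style arithmetic on my part, not something from your comment): if some fraction of the learning process is irreducibly serial, that fraction caps the benefit of parallel copies, no matter how many you add.

```python
def max_speedup(serial_fraction: float, n_copies: int) -> float:
    """Amdahl-style bound: speedup from n parallel copies when a fixed
    fraction of the work has to happen serially."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_copies)

# Even with a million copies, a 10% serial core caps the speedup at ~10x.
for s in (0.01, 0.1, 0.5):
    print(f"serial fraction {s}: max speedup ~ {max_speedup(s, 1_000_000):.1f}x")
```

How large that serial fraction actually is for learning in any given domain is exactly the thing I don't have intuitions about.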

Thanks! I'm now unsure what I think.

"if you can select from the intersection, you get options that are pretty good along both axes, pretty much by definition."

Isn't this an argument for always going for the best of both worlds, and never using a barbell strategy?

"a concrete use case might be more illuminating."

This isn't super concrete (and I'm not sure if the specific examples are accurate), but for illustrative purposes, what if:

  • Portable air cleaners score very highly for non-x-risk benefits, and low for x-risk benefits
  • Interventions which aim to make far-UVC commercially viable look pretty good on both axes
  • Deploying far-UVC in bunkers scores very highly for x-risk benefits, and very low for non-x-risk benefits

I think a lot of people's intuition would be that the compromise option is the best one to aim for. Should thinking about fat tails make us prefer one or other of the extremes instead?

This is cool, thanks!

One scenario I am thinking about is how to prioritise biorisk interventions, if you care about both x-risk and non-x-risk impacts. I'm going to run through some thinking, and ask if you think it makes sense:

  • I think it is hard (but not impossible) to compare between x-risk and non-x-risk impacts
  • I intuitively think that x-risk and non-x-risk impacts are likely to be lognormally distributed (but this might be wrong)
  • This seems to suggest that if I want to do the most good, I should max out on one, even if I care about both equally. I think the intuition for this is something like:
    • If x-risk and non-x-risk impacts were normally distributed, you'd expect that there are plenty of interventions which score well on both. The EV for both is reasonably smoothly distributed; it's not very unlikely to draw something which is between 50th and 75th percentile on both, and that's pretty good EV-wise.
    • But if they are lognormal instead, the EV is quite skewed: the best interventions for x-risk and for non-x-risk impacts are a lot better than the next-best. But it's statistically very unlikely that the 99th percentile on one axis is also the 99th on the other
    • If I care about EV, but not about whether I get it via x-risk or non-x-risk impacts (I care equally about x-risk and non-x-risk impacts), I should therefore pick the very best interventions on either axis, rather than trying to compromise between them
  • However, I think that assumes that I know how to identify the very best interventions on one or both axes
    • Actually I expect it to be quite hard to tell whether an intervention is 70th or 99th percentile for x-risk/non-x-risk impacts
  • What should I do, given that I don't know how to identify the very best interventions along either axis? 
    • If I max out, I may end up doing something which is mediocre on one axis, and totally irrelevant on the other
    • If I instead go for the best of both worlds, it seems intuitively more likely that I end up with something which is mediocre on both axes - which is a bit better than mediocre on one and irrelevant on the other
  • So maybe I should go for the best of both worlds in any case? (See the toy simulation sketched just below.)
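
Here's a minimal Monte Carlo sketch of that question (my own toy model, with arbitrary parameters: both axes independently lognormal, estimates corrupted by multiplicative lognormal noise). It compares "max out on one axis" against a "good on both axes" compromise rule, scoring each by the true total impact:

```python
import numpy as np

rng = np.random.default_rng(0)

def compare_rules(n_interventions=200, sigma=1.5, noise_sd=1.0, n_trials=20_000):
    """Toy model: each intervention has independent lognormal impacts on two
    axes (x-risk and non-x-risk), and we only see noisy estimates of them.
    Returns the average *true* total impact (axis 1 + axis 2) of two rules:
      'max_one'    - pick whatever looks best on axis 1 alone (max out)
      'compromise' - pick whatever has the best worst-axis estimate (good on both)
    """
    totals = {"max_one": 0.0, "compromise": 0.0}
    for _ in range(n_trials):
        true = rng.lognormal(mean=0.0, sigma=sigma, size=(n_interventions, 2))
        est = true * rng.lognormal(mean=0.0, sigma=noise_sd, size=true.shape)
        totals["max_one"] += true[np.argmax(est[:, 0])].sum()
        totals["compromise"] += true[np.argmax(est.min(axis=1))].sum()
    return {rule: total / n_trials for rule, total in totals.items()}

# How does the ranking change as estimates get noisier (it gets harder to tell
# 70th from 99th percentile), or as the tails get fatter (larger sigma)?
for noise_sd in (0.0, 0.5, 1.5):
    print(f"noise_sd={noise_sd}:", compare_rules(noise_sd=noise_sd))
```

I haven't tried to tune this, so I wouldn't read much into any particular numbers; the point is just that the max-out vs. compromise question turns on how fat the tails are relative to how noisy our estimates are.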

What do you think? I'm not sure if that reasoning follows/if I've applied the lessons from your post in a sensible way.

Super cool, thanks for making this!

From Specification gaming examples in AI:

  • Roomba: "I hooked a neural network up to my Roomba. I wanted it to learn to navigate without bumping into things, so I set up a reward scheme to encourage speed and discourage hitting the bumper sensors. It learnt to drive backwards, because there are no bumpers on the back."
    • I guess this counts as real-world? (There's a minimal sketch of the described reward set-up after this list.)
  • Bing - manipulation: The Microsoft Bing chatbot tried repeatedly to convince a user that December 16, 2022 was a date in the future and that Avatar: The Way of Water had not yet been released.
    • To be honest, I don't understand the link to specification gaming here
  • Bing - threats: The Microsoft Bing chatbot threatened Seth Lazar, a philosophy professor, telling him “I can blackmail you, I can threaten you, I can hack you, I can expose you, I can ruin you,” before deleting its messages
    • To be honest, I don't understand the link to specification gaming here
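
On the Roomba example: the failure is easy to see if you write the described reward down (a hypothetical sketch of the set-up as quoted; the penalty size is made up):

```python
def reward(speed: float, front_bumper_hit: bool) -> float:
    """Reward as described in the anecdote: encourage speed, discourage
    hitting the (front-only) bumper sensors. Driving backwards collects the
    speed term while never triggering the penalty term, which is exactly the
    behaviour the Roomba learned."""
    return speed - (10.0 if front_bumper_hit else 0.0)
```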