rosehadshar

Intelsat as a Model for International AGI Governance

rosehadshar 4mo 2

Sorry for the slow response here! Agree that diffusion is an important issue. A few thoughts:

Some forms of diffusion might be actively good, for reducing concentration of power. So it's not clear that we want to straightforwardly prevent tech diffusion.
Ways you could reduce tech diffusion within something like Intelsat:
- Limited membership helps
- You could do things like require companies it contracts with to comply with strong infosec, require members not to allow frontier development without strong infosec, require member governments to provide gov-level infosec to frontier developers in their countries
- Intelsat for satellites involved sharing all the technical information. For AGI, it could involve sharing only some forms of information (e.g. weights don't get shared with everyone, but encrypted chunks of the weights are distributed among founder members)
- h/t Will: having many countries part of the multilateral project removes their incentives to try to develop frontier AI themselves (and potentially open-source)

Should there be just one western AGI project?

rosehadshar 7mo 5

I agree that it's not necessarily true that centralising would speed up US development!

(I don't think we overlook this: we say "The US might slow down for other reasons. It’s not clear how the speedup from compute amalgamation nets out with other factors which might slow the US down:

Bureaucracy. A centralised project would probably be more bureaucratic.
Reduced innovation. Reducing the number of projects could reduce innovation.")

Interesting take that it's more likely to slow things down than speed things up. I tentatively agree, but I haven't thought deeply about just how much more compute a central project would have access to, and could imagine changing my mind if it were lots more.

How much is 1.8 million years of work?

rosehadshar 9mo 2

Thanks, I think these points are good.

"Learning may be bottlenecked by serial thinking time past a certain point, after which adding more parallel copies won't help. This could make the conclusion much less extreme."

Do you have any examples in mind of domains where we might expect this? I've heard people say things like 'some maths problems require serial thinking time', but I still feel pretty vague about this and don't have much intuition about how strongly to expect it to bite.
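One rough way to get a feel for how hard a serial bottleneck could bite is Amdahl's law, which gives the overall speedup when only part of a task can be split across parallel copies. The sketch below is illustrative only: the function name and the serial fractions are placeholders, not estimates for any real domain.

```python
def amdahl_speedup(serial_fraction: float, n_copies: int) -> float:
    """Amdahl's law: speedup from n_copies when a fixed fraction of the
    work must be done serially and the rest parallelises perfectly."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_copies)


# Placeholder serial fractions, chosen only to show the shape of the curve.
for s in (0.0, 0.01, 0.1, 0.5):
    for n in (10, 1_000, 1_000_000):
        print(f"serial={s:<5} copies={n:<9} speedup={amdahl_speedup(s, n):.1f}")
```

Under this model the speedup can never exceed 1 / serial_fraction, however many copies are added, so how strongly the bottleneck bites depends almost entirely on how large the serial fraction is.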

Fat Tails Discourage Compromise

rosehadshar 1y 5

Thanks! I'm now unsure what I think.

"if you can select from the intersection, you get options that are pretty good along both axes, pretty much by definition."

Isn't this an argument for always going for the best of both worlds, and never using a barbell strategy?

"a concrete use case might be more illuminating."

This isn't super concrete (and I'm not sure if the specific examples are accurate), but for illustrative purposes, what if:

Portable air cleaners score very highly for non-x-risk benefits, and low for x-risk benefits
Interventions which aim to make far-UVC commercially viable look pretty good on both axes
Deploying far-UVC in bunkers scores very highly for x-risk benefits, and very low for non-x-risk benefits

I think a lot of people's intuition would be that the compromise option is the best one to aim for. Should thinking about fat tails make us prefer one or other of the extremes instead?

Fat Tails Discourage Compromise

rosehadshar 1y 3

This is cool, thanks!

One scenario I am thinking about is how to prioritise biorisk interventions, if you care about both x-risk and non-x-risk impacts. I'm going to run through some thinking, and ask if you think it makes sense:

I think it is hard (but not impossible) to compare between x-risk and non-x-risk impacts
I intuitively think that x-risk and non-x-risk impacts are likely to be lognormally distributed (but this might be wrong)
This seems to suggest that if I want to do the most good, I should max out on one, even if I care about both equally. I think the intuition for this is something like:
- If x-risk and non-x-risk impacts were normally distributed, you'd expect that there are plenty of interventions which score well on both. The EV for both is reasonably smoothly distributed; it's not very unlikely to draw something which is between 50th and 75th percentile on both, and that's pretty good EV-wise.
- But if they are lognormal instead, the EV is quite skewed: the best interventions for x-risk and for non-x-risk impacts are a lot better than the next-best. But it's statistically very unlikely that the 99th percentile on one axis is also the 99th on the other.
- If I care about EV, but not about whether I get it via x-risk or non-x-risk impacts (I care equally about x-risk and non-x-risk impacts), I should therefore pick the very best interventions on either axis, rather than trying to compromise between them
However, I think that assumes that I know how to identify the very best interventions on one or both axes
- Actually I expect it to be quite hard to tell whether an intervention is 70th or 99th percentile for x-risk/non-x-risk impacts
What should I do, given that I don't know how to identify the very best interventions along either axis?
- If I max out, I may end up doing something which is mediocre on one axis, and totally irrelevant on the other
- If I instead go for the best of both worlds, it seems intuitively more likely that I end up with something which is mediocre on both axes - which is a bit better than mediocre on one and irrelevant on the other
So maybe I should go for the best of both worlds in any case?

What do you think? I'm not sure if that reasoning follows/if I've applied the lessons from your post in a sensible way.
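As a rough illustration of the intuition in the list above, here is a minimal simulation sketch. Everything in it is an assumption made for illustration, not something from the post: impacts on the two axes are drawn independently, the distribution parameters are arbitrary, and `share_from_larger_axis`, `thin_tailed` and `fat_tailed` are hypothetical names. For the intervention with the highest combined impact, it measures what fraction of that total comes from its stronger axis.

```python
import random


def share_from_larger_axis(draw, n_interventions=1000, n_trials=200, seed=0):
    """For the intervention with the highest x + y, return the average
    fraction of that total contributed by its larger axis."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_trials):
        # Each intervention gets independent impacts on the two axes.
        pairs = [(draw(rng), draw(rng)) for _ in range(n_interventions)]
        x, y = max(pairs, key=lambda p: p[0] + p[1])
        shares.append(max(x, y) / (x + y))
    return sum(shares) / len(shares)


def thin_tailed(rng):
    # Normal impacts, clipped at zero (mean and sd are arbitrary choices).
    return max(0.0, rng.gauss(10.0, 2.0))


def fat_tailed(rng):
    # Lognormal impacts (sigma = 2 is an arbitrary "fat-tailed" choice).
    return rng.lognormvariate(0.0, 2.0)


thin_share = share_from_larger_axis(thin_tailed)
fat_share = share_from_larger_axis(fat_tailed)
print(f"normal:    {thin_share:.2f} of the best option's value comes from one axis")
print(f"lognormal: {fat_share:.2f} of the best option's value comes from one axis")
```

A share near 0.5 means the best option is balanced across both axes; a share near 1.0 means it is extreme on one axis. The sketch assumes each intervention's impact is known exactly, so it says nothing about the case where it is hard to tell a 70th percentile intervention from a 99th percentile one.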

Strongest real-world examples supporting AI risk claims?

rosehadshar 2y 2

Thanks, really helpful!

What happens on the average day?

rosehadshar 2y 2

Super cool, thanks for making this!

Strongest real-world examples supporting AI risk claims?

Answer by rosehadshar, Sep 05, 2023 7

From Specification gaming examples in AI:

Roomba: "I hooked a neural network up to my Roomba. I wanted it to learn to navigate without bumping into things, so I set up a reward scheme to encourage speed and discourage hitting the bumper sensors. It learnt to drive backwards, because there are no bumpers on the back."
- I guess this counts as real-world?
Bing - manipulation: The Microsoft Bing chatbot tried repeatedly to convince a user that December 16, 2022 was a date in the future and that Avatar: The Way of Water had not yet been released.
- To be honest, I don't understand the link to specification gaming here
Bing - threats: The Microsoft Bing chatbot threatened Seth Lazar, a philosophy professor, telling him “I can blackmail you, I can threaten you, I can hack you, I can expose you, I can ruin you,” before deleting its messages.
- To be honest, I don't understand the link to specification gaming here

An overview of standards in biosafety and biosecurity

rosehadshar 2y 2

Glad it's relevant for you! For questions, I'd probably just stick them in the comments here, unless you think they won't be interesting to anyone but you, in which case DM me.

rosehadshar

Posts 35

Comments 55