Owen Cotton-Barratt

These are in the same category because:

  • I'm talking about game-changing improvements to our capabilities (mostly via more cognitive labour; not requiring superintelligence)
  • These are the capacities that we need to help everyone to recognize the situation we're in and come together to do something about it (and they are partial substitutes: the better everyone's epistemics are, the less need for a big lift on coordination which has to cover people seeing the world very differently) 

I'm not actually making a claim about alignment difficulty -- beyond that I do think systems in the vein of today's, and their near-successors, look pretty safe.

I think that getting people to pause AI research would be a bigger lift than any nonproliferation treaty we've had in the past (not that such treaties have always been effective!). This isn't just a military technology; it's a massively valuable economic technology. Given the incentives, and the importance of having treaties actually followed, I do think this would be a more difficult challenge than any past nonproliferation work. I don't think that means it's impossible, but I do think it's way more likely if something shifts -- hence my 1-3 below.

(Or, if you were asking why I say "out of reach now" in the quoted sentence: it's because I'm literally talking about "much better coordination" as a capability, not about what could or couldn't be achieved with a certain level of coordination.)

I agree there are some possible attitudes that society could have towards AI development which could put us in a much safer position.

I think that the degree of consensus you'd need for the position that you're outlining here is practically infeasible, absent some big shift in the basic dynamics. I think that the possible shifts which might get you there are roughly:

  1. Scientific ~consensus -- people look to scientists for thought leadership on this stuff. Plausibly you could have a scientist-driven moratorium (this still feels like a stretch, but less than just switching the way society sees AI without having the scientists leading that)
  2. Freak-out about everyday implications of AI -- sufficiently advanced AI would not just pose unprecedented risks, but also represent a fundamental change in the human condition. This could drive a tide of strong sentiment that doesn't rely on abstract arguments about danger.
  3. Much better epistemics and/or coordination -- out of reach now, but potentially obtainable with stronger tech.

I think there's potentially something to each of these. But I think the GDM paper is (in expectation) actively helpful for 1 and probably 3, and doesn't move the needle much either way on 2.

(My own view is that 3 is the most likely route to succeed. There's some discussion of the pragmatics of this route in AI Tools for Existential Security or AI for AI Safety (both of which also discuss automation of safety research, which is another potential success route), and relevant background views on the big-picture strategic situation in the Choice Transition. But I also feel positive about people exploring routes 1 and 2.)

I agree that there could be an effect that keeps people from speaking out about AI danger. But:

  • I think that such political incentives can occur whenever anyone is dealing with external power-structures, and in practice my impression is that these are a bigger deal for people who want jobs in AI policy than for people engaged with frontier AI companies
  • This argument has most force in arguing that some EAs should keep professional and social distance from frontier AI companies, not that everyone should
  • Working at a frontier AI company (or having worked at one) can give people a better platform to talk about these issues!
    • Both because of giving people deeper expertise (so they are actually more informed on key questions), but also because of making that legible to the outside world
    • For instance, I feel better about GDM publishing their recent content on safety and security than not, and I think the paper would have had much less impact on public discourse if it had come from an unaffiliated group

I downvoted this (but have upvoted some of your comments).

I think this advice is at minimum overstated, and likely wrong and harmful (at least if taken literally). And it's presented with rhetorical force, so that it seems to mostly be trying to push people's views towards a position that is (IMO) harmful, rather than mostly providing them with information to help them come to their own conclusions.

TBC:

  • I think you probably have things to add here, and in particular I feel quite curious about what's led you to the view that people here inevitably get corrupted (which doesn't match my impression), or how you think that corruption manifests
  • I'm in favour of people having access to the "henchman of a supervillain" perspective (which could help them to notice things they might otherwise overlook); the thing I'm objecting to is rhetorically projecting it as the deep truth of the situation (which I think it isn't)

Which applications to focus on: I agree that epistemic tools and coordination-enabling tools will eventually have markets, and so will get built at some point absent intervention. But this doesn't feel like a very strong argument -- the whole point is that we may care about accelerating these applications even if the acceleration isn't by a long period. And I don't think that these will obviously be among the most profitable applications people could make (especially if you can start specializing in the most high-leverage epistemic and coordination tools).

Also, we could make a similar argument that "automated safety" research won't get dropped, since it's so obviously in the interests of whoever's winning the race. 

UI and complementary technologies: I'm sort of confused about your claim about comparative advantage. Are you saying that there aren't people in this community whose comparative advantage might be designing UI? That would seem surprising.

More broadly, though:

  • I'm not sure how much "we can just outsource this" really cuts against the core of our argument (how to get something done is a question of tactics, and it could still be a strategic priority even if we just wanted to spend a lot of money on it)
  • I guess I feel, though, that you're saying this won't be a big bottleneck
    • I think that that may be true if you're considering automated alignment research in particular. But I'm not on board with that being the clear priority here

Compute allocation: mostly I think that "get people to care more" does count as the type of thing we were talking about. But I think that it's not just caring about safety, but also being aware ahead-of-time of the role that automated research may have to play in this, and when it may be appropriate to hit the gas and allocate a lot of compute to particular areas.

Training data: I agree that the stuff you're pointing to seems worthwhile. But I feel like you've latched onto a particular type of training data, and you're missing important categories, e.g.:

  • Epistemics stuff -- there are lots of super smart people earnestly trying to figure out very hard questions, and I think that if you could access their thinking, there would be a lot there which would compare favourably to a lot of the data that would be collected from people in this community. It wouldn't be so targeted in terms of the questions it addressed (e.g. "AI strategy"), but learning good epistemics may be valuable and transfer over
  • International negotiation, and high-stakes bargaining in general -- potentially very important, but not something I think our community has any particular advantage at
  • Synthetic data -- a bunch of things may be unlocked more by working out how to enable "self-play" (or the appropriate analogue), rather than just collecting more data the hard way

It seems like "what can we actually do to make the future better (if we have a future)?" is a question that keeps on coming up for people in the debate week.

I've thought about some things related to this, and thought it might be worth pulling some of those threads together (with apologies for leaving it kind of abstract). Roughly speaking, I think that:


There are some other activities which might help make the future better without doing so much to increase the chance of having a future, e.g.:

  • Try to propagate "good" values (I first wrote "enlightenment" instead of "good", since I think the truth-seeking element is especially important for ending up somewhere good; but others may differ), to make it more likely that they're well-represented in whatever entities end up steering
  • Work to anticipate and reduce the risk of worst-case futures (e.g. by cutting off the types of process that might lead there)

However, these activities don't (to me) seem as high leverage for improving the future as the more mixed-purpose activities.

Ughh ... baking judgements about what's morally valuable into the question somehow doesn't seem ideal. Like I think it's an OK way to go for moral ~realists, but among anti-realists you might have people persistently disagreeing about what counts as extinction.

Also: what if you have a world which is like the one you describe as an extinction scenario, but there's a small amount of moral value in some subcomponent of that AI system? Does that mean it no longer counts as an extinction scenario?

I'd kind of propose instead using the typology Will proposed here, and making the debate between (1) + (4) on the one hand vs (2) + (3) on the other.
