Oliver Sourbut

Technical staff (Autonomous Systems) @ UK AI Safety Institute (AISI)
454 karma · Working (6-15 years) · Pursuing a doctoral degree (e.g. PhD) · London, UK
www.oliversourbut.net

Bio


  • Autonomous Systems @ UK AI Safety Institute (AISI)
  • DPhil AI Safety @ Oxford (Hertford college, CS dept, AIMS CDT)
  • Former senior data scientist and software engineer + SERI MATS

I'm particularly interested in sustainable collaboration and the long-term future of value. I'd love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, s-risks.

I enjoy meeting new perspectives and growing my understanding of the world and the people in it. I also love to read - let me know your suggestions! In no particular order, here are some I've enjoyed recently:

  • Ord - The Precipice
  • Pearl - The Book of Why
  • Bostrom - Superintelligence
  • McCall Smith - The No. 1 Ladies' Detective Agency (and series)
  • Melville - Moby-Dick
  • Abelson & Sussman - Structure and Interpretation of Computer Programs
  • Stross - Accelerando
  • Simsion - The Rosie Project (and trilogy)

Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites:

  • Hanabi (can't recommend enough; try it out!)
  • Pandemic (ironic at time of writing...)
  • Dungeons and Dragons (I DM a bit and it keeps me on my creative toes)
  • Overcooked (my partner and I enjoy the foodie themes and frantic real-time coordination playing this)

People who've got to know me only recently are sometimes surprised to learn that I'm a pretty handy trumpeter and hornist.

Comments

Basically +1 here. I guess some relevant considerations are the extent to which a tool can act as an antidote to its own (or related) misuse - and under what conditions of effort, attention, compute, etc. If that can be arranged, then 'simply' making sure that access is somewhat distributed is a help. On the other hand, it's conceivable that compute advantages or structural advantages could make misuse of a given tech harder to block, in which case we'd want to know that (without, perhaps, broadcasting it indiscriminately) and develop responses. Plausibly those dynamics might change nonlinearly with the introduction of epistemic/coordination tech of other kinds at different times.

In theory, it's often cheaper and easier to verify the properties of a proposal ('does it concentrate power?') than to generate one satisfying given properties, which gives an advantage to a defender if proposals and activity are mostly visible. But subtlety and obfuscation and misdirection can mean that knowing what properties to check for is itself a difficult task, tilting the other way.
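To illustrate the general shape of that verify-vs-generate asymmetry with a toy example of my own choosing (subset-sum, nothing to do with the governance case itself): checking a proposed solution is cheap, while generating one by brute force blows up exponentially in the worst case.

```python
import itertools

# Toy illustration of the verify-vs-generate asymmetry, using subset-sum.
# Verifying a proposed subset is O(n); brute-force generation is O(2^n).

def verify(numbers, subset, target):
    """Cheap check: does this proposed subset of indices sum to the target?"""
    return sum(numbers[i] for i in subset) == target

def generate(numbers, target):
    """Expensive search: find *some* subset of indices summing to the target."""
    for r in range(len(numbers) + 1):
        for subset in itertools.combinations(range(len(numbers)), r):
            if verify(numbers, subset, target):
                return subset
    return None

numbers = [3, 34, 4, 12, 5, 2]
print(verify(numbers, (2, 3, 5), 18))  # True - instant to check
print(generate(numbers, 18))           # (2, 3, 5) - found only by exhaustive search
```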

Likewise, narrowly facilitating coordination might produce novel collusion with substantial negative externalities on outsiders. But then ex hypothesi those outsiders have an outsized incentive to block that collusion, if only they can foresee it and coordinate in turn.

It's confusing.

A nit

lifestyle supports the planet, rather than taking from it

appeals to me, and I'm sure to some others, but (I sense) it could come across with a particular political-tribal flavour, which you might want to try neutralising. (Or not, if that'd detract from the net appeal!)

On point 1 (space colonization), I think it's hard and slow! So the same issue as with bio risks might apply: AGI doesn't get you this robustness quickly for free. See my other comment on this post.

I like your point 2 about chancy vs merely uncertain. I guess a related point is that when the 'runs' of the risks are in some way correlated, having survived once is evidence that survivability is higher. (Up to and including the fully correlated 'merely uncertain' extreme?)
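To make that concrete with a toy Bayesian sketch (the two hypothesised survival rates and the 50/50 prior are invented purely for illustration): if the per-period survival probability is fixed but unknown, each observed survival shifts credence towards the high-survivability hypothesis, so the predicted chance of surviving the next period climbs; in the fully 'chancy' case of a single known probability, it never moves.

```python
# Toy Bayesian sketch of 'merely uncertain' vs 'chancy' survival.
# Hypotheses: the per-period survival probability is either 0.6 or 0.99,
# with a 50/50 prior (numbers invented purely for illustration).

beliefs = {0.60: 0.5, 0.99: 0.5}

def update_on_survival(beliefs):
    """Condition the credences on having survived one more period."""
    unnormalised = {p: w * p for p, w in beliefs.items()}
    total = sum(unnormalised.values())
    return {p: w / total for p, w in unnormalised.items()}

def predicted_survival(beliefs):
    """Predictive probability of surviving the next period."""
    return sum(p * w for p, w in beliefs.items())

for period in range(1, 6):
    beliefs = update_on_survival(beliefs)
    print(f"after {period} observed survival(s): "
          f"P(survive next period) = {predicted_survival(beliefs):.3f}")

# The prediction climbs towards 0.99: having survived is evidence that
# survivability is high. With a single known probability ('chancy'), the
# prediction would stay constant no matter how many survivals we observe.
```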

For clarity, you're using 'important' here in something like an importance x tractability x neglectedness factoring? So yes, more important (but there might be reasons to think it's less tractable or neglected)?

I've been meaning to write something about 'revisiting the alignment strategy'. Section 5 here ('Won't AGI make post-AGI catastrophes essentially irrelevant?') makes the point very clearly:

On this view, a post-AGI world is nearly binary—utopia or extinction—leaving little room for Sisyphean scenarios.

But I think this is too optimistic about the speed and completeness of the transition to globally deployed, robustly aligned "guardian" systems.

but without making much of a case for it. I'd be interested in Will's and the reviewers' sense of the space and literature here.

Yep - for me, 'big civ setbacks are really bad' was definitely already baked in, from the POV of their setting bad context for pre-AGI transition(s) (as well as their direct badness). But while I'd already agreed with Will about post-AGI not being an 'end of history' (in the sense that much remains uncertain re safety), I hadn't thought through the implication that setbacks could force a rerun of the most perilous transition(s), which does add some extra concern.

A small aside: some put forth interplanetary civilisation as a partial defence against either total destruction or 'setback'. But reaching the milestone of having a really robustly interplanetary civ might itself take quite a long time after AGI - especially if (like me) you think digital uploading is nontrivial.

(This abstractly echoes the suggestion in this piece that bio defence might take a long time, which I agree with.)

Some gestures which didn't make the cut as they're too woolly or not quite the right shape:

  • adversarial exponentials might force exponential expense per gain
    • e.g. combatting replicators
    • e.g. brute forcing passwords
  • many empirical 'learning curve' effects appear to consume exponential observations per increment
    • Wright's Law (which is the more general cousin of Moore's Law) requires exponentially many production iterations per incremental efficiency gain
    • Deep learning scaling laws appear to consume exponential inputs per incremental gain
    • AlphaCode and AlphaZero appear to make uniform gains per runtime compute doubling
    • OpenAI's o-series 'reasoning models' appear to improve accuracy on many benchmarks with logarithmic returns to more 'test time' compute
    • (in all of these examples, there's some choice of what scale to represent 'output' on, which affects whether the gains look uniform or not, so the thesis rests on whether the choices made are 'natural' in some way; see the toy sketch below)
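A minimal numerical sketch of that 'exponential inputs per incremental gain' shape, with a made-up power-law exponent and constant rather than any published scaling-law fit:

```python
import math

# Toy power-law 'scaling law': loss(C) = a * C**(-b), constants invented.
# Each doubling of compute multiplies the loss by the same factor 2**(-b),
# so gains look uniform per doubling on a log scale - equivalently, a fixed
# additive improvement in log-loss costs exponentially more compute.

a, b = 10.0, 0.05

def loss(compute):
    return a * compute ** (-b)

for doublings in range(6):
    compute = 2 ** doublings
    print(f"compute x{compute:>2}: loss = {loss(compute):.3f}, "
          f"log-loss = {math.log(loss(compute)):.3f}")

# log-loss falls by the same amount (b * ln 2, roughly 0.035) per doubling,
# which is the sense in which the gains can look 'uniform' per compute
# doubling while the underlying input grows exponentially.
```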

This is lovely, thank you!

My main concern would be that it takes the same very approximating stance as much other writing in the area, conflating all kinds of algorithmic progress into a single scalar 'quality of the algorithms'.

You do moderately well here, noting that the most direct interpretation of your model regards speed or runtime compute efficiency, yielding 'copies that can be run' as the immediate downstream consequence (and discussing in a footnote the relationship to 'intelligence'[1] and the distinction between 'inference' and training compute).

I worry that many readers don't track those (important!) distinctions and tend to conflate the concepts. For what it's worth, keeping them distinct has led me to the (tentative) conclusion that a speed/compute-efficiency explosion is plausible (though not guaranteed), while an 'intelligence' explosion in software alone is less likely, except as a downstream effect of running faster (which might be nontrivial if pouring more effective compute into training and runtime yields meaningful gains).


  1. Of course, 'intelligence' is also very many-dimensional! I think the most important factor for takeoff in discussions like these is 'sample efficiency', since it's quite generalisable and feeds into most downstream applications of more generic 'intelligence' resources. This is relevant to R&D because sample efficiency affects how quickly you can accrue research taste, which controls the stable level of your exploration quality. Domain knowledge and taste are obviously less generalisable, and harder to get in silico alone. ↩︎
