I'm a researcher at Forethought; before that, I ran the non-engineering side of the EA Forum (this platform), ran the EA Newsletter, and worked on some other content-related tasks at CEA. [More about the Forum/CEA Online job.]
Selected posts
Background
I finished my undergraduate studies with a double major in mathematics and comparative literature in 2021. I was a research fellow at Rethink Priorities in the summer of 2021 and was then hired by the Events Team at CEA. I later switched to the Online Team. In the past, I've also done some (math) research and worked at Canada/USA Mathcamp.
Quick sketch of what I mean (and again I think others at Forethought may disagree with me):
I also want to caveat that:
(And thanks for the nice meta note!)
I've been struggling to articulate this well, but I've recently been feeling like, for instance, proposals on making deals with "early [potential] schemers" implicitly(?) rely on a bunch of assumptions about the anatomy of AI entities we'd get at relevant stages.
More generally, I've been feeling pretty iffy about using game-theoretic reasoning about "AIs" (as in "they'll be incentivized to..." or similar), because I sort of expect it to fail in ways somewhat similar to what you get if you try to do this with states or large bureaucracies -- iirc the fourth paper here discussed this kind of thing, although in general there's a lot of content on this. Similar concerns apply to e.g. reasoning about the "goals" etc. of AI entities at different points in time without clarifying a bunch of background assumptions (related, iirc).
Ah, @Gregory Lewis🔸 says some of the above better. Quoting his comment:
- [...]
- So ~everything is ultimately an S-curve. Yet although 'this trend will start capping out somewhere' is a very safe bet, 'calling the inflection point' before you've passed it is known to be extremely hard. Sigmoid curves in their early days are essentially indistinguishable from exponential ones, and the extra parameter which ~guarantees they can better (over?)fit the points on the graph than a simple exponential gives very unstable estimates of the putative ceiling the trend will 'cap out' at. (cf. 1, 2.)
- Many important things turn on (e.g.) 'scaling is hitting the wall ~now' vs. 'scaling will hit the wall roughly at the point of the first Dyson sphere data center'. As the universe is a small place on a log scale, this range is easily spanned by different analysis choices on how you project forward.
- Without strong priors on 'inflecting soon' vs. 'inflecting late', forecasts tend to be volatile: is this small blip above or below trend really a blip, or a sign we're entering a faster/slower regime?
- [...]
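To make the sigmoid-vs-exponential point in that quote concrete, here's a minimal numerical sketch. The setup is my own toy example (a logistic with a true ceiling of 100, multiplicative noise, and scipy's curve_fit), not anything from the quoted comment: on pre-inflection data, a simple exponential fit recovers the growth rate fine, while the logistic's extra ceiling parameter is left nearly unidentified, so the fitted "cap" depends heavily on where you start the optimizer.

```python
# Toy sketch (assumptions mine): fit an exponential and a logistic to the *early*
# portion of a logistic curve, and look at how well each is pinned down.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    return K / (1.0 + np.exp(-r * (t - t0)))

rng = np.random.default_rng(0)
true_K, true_r, true_t0 = 100.0, 0.5, 20.0
t_early = np.arange(0, 10, dtype=float)  # well before the inflection point at t0 = 20
y = logistic(t_early, true_K, true_r, true_t0) * rng.normal(1.0, 0.03, t_early.size)

# Exponential fit (via a linear fit to log(y)): the growth rate is well constrained.
slope, intercept = np.polyfit(t_early, np.log(y), 1)
print(f"exponential fit: growth rate ~{slope:.2f} per unit time (true early-regime rate ~{true_r})")

# Logistic fit: the fitted ceiling swings with the optimizer's starting guess,
# even though the resulting fits to the observed points are about equally good.
for K0 in (20.0, 100.0, 1000.0):
    try:
        popt, _ = curve_fit(logistic, t_early, y, p0=[K0, 0.3, 15.0], maxfev=50000)
        sse = float(np.sum((logistic(t_early, *popt) - y) ** 2))
        print(f"start K0={K0:6.0f} -> fitted ceiling ~{popt[0]:10.1f}, SSE={sse:.2e}")
    except RuntimeError:
        print(f"start K0={K0:6.0f} -> fit did not converge (ceiling effectively unidentified)")
```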
I tried to clarify things a bit in this reply to titotal: https://forum.effectivealtruism.org/posts/iJSYZJJrLMigJsBeK/lizka-s-shortform?commentId=uewYatQz4dxJPXPiv
In particular, I'm not trying to make a strong claim about exponentials specifically, or that things will line up perfectly, etc.
(Fwiw, though, it does seem possible that if we zoom out, recent/near-term population growth slow-downs might be functionally a ~blip if humanity or something like it leaves the Earth. Although at some point you'd still hit physical limits.)
Oh, apologies: I'm not actually trying to claim that things will be <<exactly exponential>>. We should expect some amount of ~variation in progress/growth (these are rough models, we shouldn't be too confident about how things will go, etc.), and what's actually going on is (probably a lot) more complicated than a simple/neat progression of new s-curves, etc.
The thing I'm trying to say is more like:
(Apologies if what I'd written earlier was unclear about what I believe — I'm not sure if we still notably disagree given the clarification?)
A different way to think about this might be something like:
Something like this seems to help explain why views like "the curve we're observing will (basically) just continue" have seemed surprisingly successful, even when the people holding those "curve go up" views justified their conclusions via apparently incorrect reasoning about the specific drivers of progress. (And so IMO people should place non-trivial weight on stuff like "rough, somewhat naive-seeming extrapolation of the general trends we're observing[2]."[3])
[See also a classic post on the general topic, and some related discussion here, IIRC: https://www.alignmentforum.org/posts/aNAFrGbzXddQBMDqh/moore-s-law-ai-and-the-pace-of-progress ]
Caveat: I'd add "...on a big range / the scale we care about"; at some point, ~any progress would start hitting ~physical limits. But if that point is after the curve reshapes ~everything we care about, then I'm basically ignoring that consideration for now.
Obviously there are caveats. E.g.:
- the metrics we use for such observations can lead us astray in some situations (in particular they might not ~linearly relate to "the true thing we care about")
- we often have limited data, we shouldn't be confident that we're predicting/measuring the right thing, things can in fact change over time and we should also not forget that, etc.
(I think there were nice notes on this here, although I've only skimmed it and haven't re-read it: https://arxiv.org/pdf/2205.15011 )
Also, sometimes we do know what
Replying quickly, speaking only for myself[1]:
I.e. I'm not speaking for the Online/Mod teams here, and didn't run this comment by anyone.
(I vaguely remember making and linking a public version of this doc somewhere at some point, but couldn't quickly find that.)
In fact it looks like I can no longer add or remove the Community tag from posts. I'm still in the Slack; a few people sometimes flag questions about edge cases there.
I sometimes see people say stuff like:
Those forecasts were misguided. If they ended up with good answers, that's accidental; the trends they extrapolated from have hit limits... (Skeptics get Bayes points.)
But IMO it's not a fluke that the "that curve is going up, who knows why" POV has done well.
A sketch of what I think happens:
There’s a general dynamic here that goes something like:
And then in *some* sense the bottlenecks crowd turns out to be right (the specific driver/paradigm peters out, there’s literally no more space for more transistors, companies run low on easily accessible/high-quality training data, etc.)…
…but then a "surprise new thing" pops up and fills the gap, such that the “true” thing we cared about (whether or not it’s what we were originally measuring) *does* actually continue roughly as the apparently naive original prediction said it would
(and it turned out that the curve consists of a stack of s-curves...)
We can go too far with this kind of reasoning; some “true things we care about” (e.g. spread of a disease) *are* in fact s-curves, bounded, etc., and only locally look like ~exponentials. (So: no, we shouldn't expect the baby to weigh trillions of pounds by age 10...)
But I think the more granular, gears-oriented view — which considers how long specific drivers of the progress we're seeing could continue, etc. — often underrates the extent to which *other forces* can (and often do) jump in when earlier drivers lose momentum.
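As a toy illustration of the "stack of s-curves" picture (my own sketch, with made-up parameters): a sequence of logistic "paradigms", each with a ~10x higher ceiling than the last and each saturating in turn, can sum to an aggregate that hugs a straight line on a log plot -- i.e. the aggregate keeps looking ~exponential long after any individual driver has capped out.

```python
# Toy "stack of s-curves" sketch; all parameters are invented for illustration.
import numpy as np

def logistic(t, K, r, t0):
    return K / (1.0 + np.exp(-r * (t - t0)))

t = np.linspace(5, 40, 351)
# Paradigm i has a ~10x higher ceiling than the previous one and kicks in ~8 time units later.
stack = sum(logistic(t, K=10.0**i, r=0.8, t0=8.0 * i) for i in range(1, 6))

# Fit a straight line to log10(aggregate) and check how far the curve strays from it.
log_stack = np.log10(stack)
slope, intercept = np.polyfit(t, log_stack, 1)
max_dev = float(np.max(np.abs(log_stack - (slope * t + intercept))))
print(f"aggregate grows ~10x every {1.0/slope:.1f} time units; "
      f"max deviation from the straight-line (log-scale) trend: {max_dev:.2f} orders of magnitude")
```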
"The Bypass Principle: How AI flows around obstacles" from Eric Drexler is a very related (and IMO good) post. Quote (bold mine):
While shallow assessments focus on visible obstacles — the difficulties of matching human capabilities, of overcoming regulatory barriers, and of restructuring organizations — AI-enabled developments will often find paths that bypass rather than overcome apparent barriers. Existing obstacles are concrete and obvious in a way that alternatives are not. Skewed judgment follows.
(This stuff isn't new; many people have pointed out these kinds of dynamics. But I feel like I'm still seeing them a fair bit — and this came up recently — so I wanted to write this note.)
Yeah, this sort of thing is partly why I tend to feel better about BOTECs like (writing very quickly, tbc!):
What could we actually accomplish if we (e.g.) doubled (the total stock/flow of) investment in ~technical AIS work (specifically the stuff focused on catastrophic risks, in this general worldview)? (You could broaden if you wanted to, obviously.)
Well, let's see:
- That might look like:
- adding maybe ~400(??) FTEs similar (in ~aggregate) to the folks working here now, distributed roughly in proportion to current efforts / profiles — plus the funding/AIS-specific infrastructure (e.g. institutional homes) needed to accommodate them
- E.g. across intent alignment stuff, interpretability, evals, AI control, ~safeguarded AI, AI-for-AIS, etc., across non-profit/private/govt (but in fact aimed at loss of control stuff).
- How good would this be?
- Maybe (per year of doubling) we'd then get something like a similar-ish value from this as we do from a year of the current space (or something like 2x less, if we want to eyeball diminishing returns)
- Then maybe we can look at what this space has accomplished in the past year and see how much we'd pay for that / how valuable that seems...
- (What other ~costs might we be missing here?)
You might also decide that you have much better intuitions for how much we'd accomplish (and how valuable that'd be) on a different scale (e.g. adding one project like Redwood/Goodfire/Safeguarded AI/..., i.e. more like 30 FTEs than 400 — although you'd probably want to account for considerations like "for each 'successful' project we'd likely need to invest in a bunch of attempts/ surrounding infrastructure..."), or intuitions about what amount of investment is required to get to some particular desired outcome...
Or if you took the more ITN-style approach, you could try to approach the BOTEC via something like (1) how much investment has there been so far in this broad ~POV / portfolio, (2 (option a)) how much value/progress has this portfolio made + something like "how much of it has been made in the second half?" (to get a sense of how much we're facing diminishing returns at the moment — fwiw, without thinking too much about it, I think "not super diminishing returns at the mo"), or (2 (option b)) what fraction of the overall "AI safety problem" is "this-sort-of-safety-work-affectable" (i.e. something like "if we scaled up this kind of work — and only this kind of work — to an insane degree, how much of the problem would be fixed?") + how big/important the problem is overall... Etc. (Again, for all of this my main question is often "what are the sources of signal or taste / heuristics / etc. that you're happier basing your estimates on?")
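For what it's worth, the first version of that BOTEC is just a few lines of arithmetic. The ~400 FTE figure and the eyeballed 2x diminishing-returns factor come from above; the rest (the cost per FTE, the "current-field-year" normalization) are placeholders I'm inventing for illustration, so the structure is the point, not the outputs.

```python
# Deliberately crude BOTEC sketch; numbers are placeholders, not estimates.
current_ftes = 400                 # rough size of the current technical-AIS-for-x-risk field (assumption)
added_ftes = 400                   # the "doubling" scenario
cost_per_fte_per_year = 300_000    # $/yr incl. salary + supporting infrastructure (placeholder)

diminishing_returns_factor = 0.5   # marginal work ~2x less valuable than the current average (eyeballed above)
value_of_current_field_year = 1.0  # normalize: 1 unit = value of one year of the current field's output

added_value_per_year = value_of_current_field_year * (added_ftes / current_ftes) * diminishing_returns_factor
added_cost_per_year = added_ftes * cost_per_fte_per_year

print(f"Added value per year: ~{added_value_per_year:.2f} 'current-field-years'")
print(f"Added cost per year:  ~${added_cost_per_year:,.0f}")
# The remaining question is how many dollars you'd pay for one 'current-field-year',
# e.g. by looking at what the field actually accomplished last year.
```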
Thank you! I used Procreate for these (on an iPad).[1]
(I also love Excalidraw for quick diagrams, have used & liked Whimsical before, and now also semi-grudgingly appreciate Canva.)
Relatedly, I wrote a quick summary of the post in a Twitter thread a few days ago and added two extra sketches there. Posting here too in case anyone finds them useful:
(And a meme generator for the memes.)
Yeah actually I think @Habryka [Deactivated] discusses these kinds of dynamics here: https://www.lesswrong.com/posts/4NFDwQRhHBB2Ad4ZY/the-filan-cabinet-podcast-with-oliver-habryka-transcript
Excerpt (bold mine, Habryka speaking):
One of the core things that I was always thinking about with LessWrong, and that was my kind of primary analysis of what went wrong with previous LessWrong revivals, was [kind of] an iterated, [the term] “prisoner's dilemma” is overused, but a bit of an iterated prisoner's dilemma or something where, like, people needed to have the trust on an ongoing basis that the maintainers and the people who run it will actually stick with it. And there's a large amount of trust that the people need to have that, if they invest in a site and start writing content on it, that the maintainers and the people who run it actually will put the effort into making that content be shepherded well. And the people who want to shepherd it only want to do that if the maintainers actually...
And so, one of the key things that I was thinking about, was trying to figure out how to guarantee reliability. This meant, to a lot of the core contributors of the site, I made a promise when I started it, that was basically, I'm going to be making sure that LessWrong is healthy and keeps running for five years from the time I started. Which was a huge commitment - five years is a hugely long time. But my sense at the time was that type of commitment is exactly the most important thing. Because the most usual thing that I get when I talk [in] user interviews to authors and commenters is that they don't want to contribute because they expect the thing to decline in the future. So reliability was a huge part of that.
And then I also think, signaling that there was real investment here was definitely a good chunk of it. I think UI is important, and readability of the site is important. And I think I made a lot of improvements there that I'm quite happy with. But I think a lot of it was also just a costly signal that somebody cares.
I don't know how I feel about that in retrospect. But I think that was a huge effect, where I think people looked on the site, and when [they] looked at LessWrong 2.0, there was just a very concrete sense that I could see in user interviews that they were like, "Oh, this is a site that is being taken care of. This is a thing that people are paying attention to and that is being kept up well." In a similar [sense] to how, I don't know, a clean house has the same symbol. I don't really know. I think a lot of it was, they were like, wow, a lot of stuff is changing. And the fact that a lot of work is being put into this, the work itself is doing a lot of valuable signaling.
Yeah, I guess I don't want to say that it'd be better if the team had people who are (already) strongly attached to various specific perspectives (like the "AI as a normal technology" worldview --- maybe especially that one?[1]). And I agree that having shared foundations is useful / constantly relitigating foundational issues would be frustrating. I also really do think the points I listed under "who I think would be a good fit" — willingness to try on and ditch conceptual models, high openness without losing track of taste, & flexibility — matter, and probably clash somewhat with central examples of "person attached to a specific perspective."
= rambly comment, written quickly, sorry! =
But in my opinion we should not just all (always) be going off of some central AI-safety-style worldviews. And I think that some of the divergence I would like to see more of could go pretty deep - e.g. possibly somewhere in the grey area between what you listed as "basic prerequisites" and "particular topics like AI timelines...". (As one example, I think accepting terminology or the way people in this space normally talk about stuff like "alignment" or "an AI" might basically bake in a bunch of assumptions that I would like Forethought's work to not always rely on.)
One way to get closer to that might be to just defer less or more carefully, maybe. And another is to have a team that includes people who better understand rarer-in-this-space perspectives, which diverge earlier on (or people who are by default inclined to thinking about this stuff in ways that are different from others' defaults), as this could help us start noticing assumptions we didn't even realize we were making, translate between frames, etc.
So maybe my view is that I'd like it if (1) there were more ~independent worldview formation/exploration going on, and (2) the (soft) deferral that is happening (because some deferral feels basically inevitable) were less overlapping.
(I expect we don't really disagree, but still hope this helps to clarify things. And also, people at Forethought might still disagree with me.)
In particular:
If this perspective involves a strong belief that AI will not change the world much, then IMO that's just one of the (few?) things that are ~fully out of scope for Forethought. I.e. my guess is that projects with that as a foundational assumption wouldn't really make much sense to do here. (Although IMO even if, say, I believed that this conclusion was likely right, I might nevertheless be a good fit for Forethought if I were willing to view my work as a bet on the worlds in which AI is transformative.)
But I don't really remember what the "AI as a normal technology" position is, and could imagine that it's somewhat different — e.g. more in the direction of "automation is the wrong frame for understanding the most likely scenarios" / something like this. In that case my take would be that someone exploring this at Forethought could make sense (I haven't thought about this one much), and generally being willing to consider this perspective at least seems good, but I'd still be less excited about people who'd come with the explicit goal of pursuing that worldview & no intention of updating or whatever.
--
(Obviously if the "AI will not be a big deal" view is correct, I'd want us to be able to come to that conclusion -- and change Forethought's mission or something. So I wouldn't e.g. avoid interacting with this view or its proponents, and agree that e.g. inviting people with this POV as visitors could be great.)