I'm a researcher at Forethought; before that, I ran the non-engineering side of the EA Forum (this platform), ran the EA Newsletter, and worked on some other content-related tasks at CEA. [More about the Forum/CEA Online job.]
Selected posts
Background
I finished my undergraduate studies with a double major in mathematics and comparative literature in 2021. I was a research fellow at Rethink Priorities in the summer of 2021 and was then hired by the Events Team at CEA. I later switched to the Online Team. In the past, I've also done some (math) research and worked at Canada/USA Mathcamp.
I tried to clarify things a bit in this reply to titotal: https://forum.effectivealtruism.org/posts/iJSYZJJrLMigJsBeK/lizka-s-shortform?commentId=uewYatQz4dxJPXPiv
In particular, I'm not trying to make a strong claim about exponentials specifically, or that things will line up perfectly, etc.
(Fwiw, though, it does seem possible that if we zoom out, recent/near-term population growth slow-downs might be functionally a ~blip if humanity or something like it leaves the Earth. Although at some point you'd still hit physical limits.)
Oh, apologies: I'm not actually trying to claim that things will be *exactly* exponential. We should expect some amount of ~variation in progress/growth (these are rough models, we shouldn't be too confident about how things will go, etc.), and what's actually going on is (probably a lot) more complicated than a simple/neat progression of new s-curves.
The thing I'm trying to say is more like:
(Apologies if what I'd written earlier was unclear about what I believe — I'm not sure if we still notably disagree given the clarification?)
A different way to think about this might be something like:
Something like this seems to help explain why views like "the curve we're observing will (basically) just continue" have seemed surprisingly successful, even when the people holding those "curve go up" views justified their conclusions via apparently incorrect reasoning about the specific drivers of progress. (And so IMO people should place non-trivial weight on stuff like "rough, somewhat naive-seeming extrapolation of the general trends we're observing[2]."[3])
[See also a classic post on the general topic, and some related discussion here, IIRC: https://www.alignmentforum.org/posts/aNAFrGbzXddQBMDqh/moore-s-law-ai-and-the-pace-of-progress ]
Caveat: I'd add "...over a big range / the scale we care about"; at some point, ~any progress would start hitting ~physical limits. But if that point comes after the curve reshapes ~everything we care about, then I'm basically ignoring that consideration for now.
Obviously there are caveats. E.g.:
- the metrics we use for such observations can lead us astray in some situations (in particular, they might not relate ~linearly to "the true thing we care about")
- we often have limited data, we shouldn't be confident that we're predicting/measuring the right thing, things can in fact change over time (and we shouldn't forget that), etc.
(I think there were nice notes on this here, although I've only skimmed it and didn't re-read it: https://arxiv.org/pdf/2205.15011 )
Also, sometimes we do know what
Replying quickly, speaking only for myself[1]:
I.e. I'm not speaking for the Online/Mod teams here, and didn't run this comment by anyone.
(I vaguely remember making and linking a public version of this doc somewhere at some point, but couldn't quickly find that.)
In fact it looks like I can no longer add or remove the Community tag from posts. I'm still in the Slack; a few people sometimes flag questions about edge cases there.
I sometimes see people say stuff like:
Those forecasts were misguided. If they ended up with good answers, that's accidental; the trends they extrapolated from have hit limits... (Skeptics get Bayes points.)
But IMO it's not a fluke that the "that curve is going up, who knows why" POV has done well.
A sketch of what I think happens:
There’s a general dynamic here that goes something like:
And then in *some* sense the "bottlenecks" crowd turns out to be right (the specific driver/paradigm peters out, there’s literally no more space for more transistors, companies run low on easily accessible/high-quality training data, etc.)…
…but then a "surprise new thing" pops up and fills the gap, such that the “true” thing we cared about (whether or not it’s what we were originally measuring) *does* actually continue as people had originally (apparently naively) predicted
(and it turned out that the curve consists of a stack of s-curves..)
We can go too far with this kind of reasoning; some “true things we care about” (e.g. spread of a disease) *are* in fact s-curves, bounded, etc., and only locally look like ~exponentials. (So: no, we shouldn't expect the baby to weigh trillions of pounds by age 10...)
But I think the more granular, gears-oriented view — which considers how long specific drivers of the progress we're seeing could continue, etc. — often underrates the extent to which *other forces* can (and often do) jump in when earlier drivers lose momentum.
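A minimal toy sketch of the "stack of s-curves" picture, in case it's useful (this is just my own illustration with made-up parameters, not anything from the posts above): summing a few staggered logistic curves, each with a higher ceiling, gives a total that looks roughly exponential over a wide window, even though each component flattens out.

```python
import numpy as np

# Toy illustration only: a "stack" of staggered s-curves (logistics), each
# saturating at a ceiling a few times higher than the last, can look roughly
# exponential overall even though every individual curve flattens out.
# All parameters below are arbitrary.

def logistic(t, ceiling, midpoint, steepness=1.0):
    """One s-curve: starts near 0, saturates at `ceiling`."""
    return ceiling / (1.0 + np.exp(-steepness * (t - midpoint)))

t = np.linspace(5, 37, 321)

# Five successive "paradigms": each new one has ~3x the previous ceiling and
# takes off about 8 time units later.
stacked = sum(logistic(t, ceiling=10.0 * 3.0**k, midpoint=5.0 + 8.0 * k)
              for k in range(5))

# Fit a straight line to log(stacked): a good fit means the stack is roughly
# exponential over this window, despite being built out of bounded s-curves.
slope, intercept = np.polyfit(t, np.log(stacked), 1)
residuals = np.log(stacked) - (slope * t + intercept)
print(f"fitted growth rate ~{slope:.2f}/unit; "
      f"log-space wiggle within +/-{np.abs(residuals).max():.2f}")
```

(The print statement just reports how closely the summed curve tracks a single exponential over that window.)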
"The Bypass Principle: How AI flows around obstacles" from Eric Drexler is a very related (and IMO good) post. Quote (bold mine):
While shallow assessments focus on visible obstacles — the difficulties of matching human capabilities, of overcoming regulatory barriers, and of restructuring organizations — AI-enabled developments will often find paths that bypass rather than overcome apparent barriers. Existing obstacles are concrete and obvious in a way that alternatives are not. Skewed judgment follows.
(This stuff isn't new; many people have pointed out these kinds of dynamics. But I feel like I'm still seeing them a fair bit — and this came up recently — so I wanted to write this note.)
Yeah, this sort of thing is partly why I tend to feel better about BOTECs like (writing very quickly, tbc!):
What could we actually accomplish if we (e.g.) doubled (the total stock/flow of) investment in ~technical AIS work (specifically the stuff focused on catastrophic risks, in this general worldview)? (You could broaden this if you wanted to, obviously.)
Well, let's see:
- That might look like:
- adding maybe ~400(??) FTEs similar (in ~aggregate) to the folks working here now, distributed roughly in proportion to current efforts/profiles — plus the funding/AIS-specific infrastructure (e.g. institutional homes) needed to accommodate them
- E.g. across intent alignment work, interpretability, evals, AI control, ~safeguarded AI, AI-for-AIS, etc., and across non-profit/private/govt (but in fact aimed at loss-of-control stuff).
- How good would this be?
- Maybe (per year of doubling) we'd then get something like a similar-ish value from this as we do from a year of the current space (or something like 2x less, if we want to eyeball diminishing returns)
- Then maybe we can look at what this space has accomplished in the past year and see how much we'd pay for that / how valuable that seems...
- (What other ~costs might we be missing here? See the toy numeric sketch just below.)
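Here's what the shape of that arithmetic might look like as a runnable sketch; every number in it (field size, cost per FTE, value of a year of current work) is a placeholder I've made up purely for illustration, not an estimate anyone has endorsed:

```python
# Toy BOTEC sketch; all numbers are invented placeholders, not real estimates.

current_ftes = 400               # hypothetical size of the current ~technical AIS field
added_ftes = current_ftes        # the "doubling" scenario
cost_per_fte_per_year = 500_000  # hypothetical fully loaded cost, USD/year

# Hypothetical value of one year of the *current* field's output, in whatever
# common unit you prefer (here: dollars-equivalent).
value_of_current_field_year = 1_000_000_000

# Heuristic from the comment: a doubling-year buys similar-ish value to a
# current-year, possibly discounted ~2x for diminishing returns.
diminishing_returns_discount = 2.0
value_of_doubling_year = value_of_current_field_year / diminishing_returns_discount

cost_of_doubling_year = added_ftes * cost_per_fte_per_year

print(f"cost of one doubling-year:  ${cost_of_doubling_year:,.0f}")
print(f"value of one doubling-year: ${value_of_doubling_year:,.0f}")
print(f"rough value/cost ratio:     {value_of_doubling_year / cost_of_doubling_year:.1f}x")
```

(The point is the structure, not the outputs: swap in whatever inputs you actually trust.)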
You might also decide that you have much better intuitions for how much we'd accomplish (and how valuable that'd be) on a different scale (e.g. adding one project like Redwood/Goodfire/Safeguarded AI/..., i.e. more like 30 FTEs than 400 — although you'd probably want to account for considerations like "for each 'successful' project we'd likely need to invest in a bunch of attempts/surrounding infrastructure..."), or intuitions about what amount of investment is required to get to some particular desired outcome...
Or if you took the more ITN-style approach, you could try to approach the BOTEC via something like (1) how much investment has there been so far in this broad ~POV / portfolio, (2 (option a)) how much value/progress has this portfolio made + something like "how much of that was made in the second half?" (to get a sense of how much we're facing diminishing returns at the moment — fwiw, without thinking too much about it, I think "not super diminishing returns at the mo"), or (2 (option b)) what fraction of the overall "AI safety problem" is "this-sort-of-safety-work-affectable" (i.e. something like "if we scaled up this kind of work — and only this kind of work — to an insane degree, how much of the problem would be fixed?") + how big/important the problem is overall... Etc. (Again, for all of this my main question is often "what are the sources of signal or taste / heuristics / etc. that you're happier basing your estimates on?")
Thank you! I used Procreate for these (on an iPad).[1]
(I also love Excalidraw for quick diagrams, have used & liked Whimsical before, and now also semi-grudgingly appreciate Canva.)
Relatedly, I wrote a quick summary of the post in a Twitter thread a few days ago and added two extra sketches there. Posting here too in case anyone finds them useful:
(And a meme generator for the memes.)
Yeah actually I think @Habryka [Deactivated] discusses these kinds of dynamics here: https://www.lesswrong.com/posts/4NFDwQRhHBB2Ad4ZY/the-filan-cabinet-podcast-with-oliver-habryka-transcript
Excerpt (bold mine, Habryka speaking):
One of the core things that I was always thinking about with LessWrong, and that was my kind of primary analysis of what went wrong with previous LessWrong revivals, was [kind of] an iterated, [the term] “prisoner's dilemma” is overused, but a bit of an iterated prisoner's dilemma or something where, like, people needed to have the trust on an ongoing basis that the maintainers and the people who run it will actually stick with it. And there's a large amount of trust that the people need to have that, if they invest in a site and start writing content on it, that the maintainers and the people who run it actually will put the effort into making that content be shepherded well. And the people who want to shepherd it only want to do that if the maintainers actually...
And so, one of the key things that I was thinking about, was trying to figure out how to guarantee reliability. This meant, to a lot of the core contributors of the site, I made a promise when I started it, that was basically, I'm going to be making sure that LessWrong is healthy and keeps running for five years from the time I started. Which was a huge commitment - five years is a hugely long time. But my sense at the time was that type of commitment is exactly the most important thing. Because the most usual thing that I get when I talk [in] user interviews to authors and commenters is that they don't want to contribute because they expect the thing to decline in the future. So reliability was a huge part of that.
And then I also think, signaling that there was real investment here was definitely a good chunk of it. I think UI is important, and readability of the site is important. And I think I made a lot of improvements there to decide that I'm quite happy with. But I think a lot of it was also just a costly signal that somebody cares.
I don't know how I feel about that in retrospect. But I think that was a huge effect, where I think people looked on the site, and when [they] looked at LessWrong 2.0, there was just a very concrete sense that I could see in user interviews that they were like, "Oh, this is a site that is being taken care of. This is a thing that people are paying attention to and that is being kept up well." In a similar [sense] to how, I don't know, a clean house has the same symbol. I don't really know. I think a lot of it was, they were like, wow, a lot of stuff is changing. And the fact that a lot of work is being put into this, the work itself is doing a lot of valuable signaling.
Re not necessarily "optimizing" for the Forum, I guess my frame is:
The Online Team is the current custodian of an important shared resource (the Forum). If the team can't actually commit to fulfilling its "Forum custodian" duties, e.g. because the priorities of CEA might change, then it should probably start trying to (responsibly) hand that role off to another person/group.
(TBC this doesn't mean that Online should be putting all of its efforts into the Forum, just like a parent has no duty to spend all their energy caring for their child. And it's not necessarily clear what the bar for responsibly fulfilling Forum custodian duties actually is — maybe moderation and bug fixes are core charges, but "no new engineering work" is fine, I'm not sure.)
I would view this somewhat differently if it were possible for another group to compete with or step in for CEA / the Online Team if it seemed that the team is not investing [enough] in the Forum (or investing poorly). But that's not actually possible — in fact, even if the Online Team stopped ~everything, by default no one else would be able to take over. I'd also feel somewhat differently if the broader community hadn't invested so much in the Forum, and if I didn't think that a baseline ~trust in (and therefore clear commitment from) the team was so important for the Forum's fate (which I believe for reasons loosely outlined in the memo, IIRC).
...
Btw, I very much agree that staring into the abyss (occasionally) is really useful. And I really appreciate you posting this on the Forum, and also engaging deeply/openly in the replies.
I think going for Option 2 ("A bulletin board") or 3 ("Shut down") would be a pretty serious mistake, fwiw. (I have fewer/weaker opinions on 1, although I suspect I'm more pessimistic about it than ~most others.)
...
An internal memo I wrote in early 2023 (during my time on the Online Team) seems relevant, so I've made a public copy: Vision / models / principles for the Forum and the Forum+ team[1]
I probably no longer believe some of what I wrote there, but still endorse the broad points/models, which lead me to think, among other things:
Other notes (adding to what the models I'd described in the memo lead me to believe, or articulating nearby/more specific versions of the claims I make there, etc.):
(also called "Equilibrium curves & being stewards of the special thing")
The memo outlines some of how I was thinking about how the Forum works, especially an "equilibrium curves" model and a view that trust is a key thing to track/build towards. It also discusses the value of the Online Team's work, theories of change, and when (if ever) "closing" the Forum would make sense and how that could work/play out.
(I know Sarah's read this, but figured I'd share it in case others are interested, and because I'm about to reference stuff from there.)
Note: At least two fairly important parts of my current models seem missing from the doc (and I suspect I'd think of several more if I thought about it for more than the time it took to skim the doc and write this comment): (1) the Forum as a "two-sided marketplace" ("writers / content producers" and "readers"), and (2) creation of (object-level) common knowledge as an important thing the Forum does sometimes.
Ah, @Gregory Lewis🔸 says some of the above better. Quoting his comment: