Toby_Ord

Comments

That's an interesting way to connect these. I suppose one way to view your model is as making clear the point that you can't cost-effectively use models on tasks that are much longer than their 50% horizons (even if you are willing to try multiple times), and that the trend of dramatic price improvements over time isn't enough to help with this. Instead you need the continuation of the METR trend of exponentially growing horizons. Moreover, you give a nice intuitive explanation of why that is.

One thing to watch out for is Gus Hamilton's recent study suggesting that there isn't a constant hazard rate. I share my thoughts on it here, but my basic conclusion is that he is probably right. In particular, he has a functional form estimating how a model's success probability declines with task length. You could add this to your model (it is basically 1 minus the CDF of a Weibull distribution with K = 0.6). This survival function has a heavier tail than an exponential (a stretched exponential, since K < 1), making the 'just run it heaps of times' approach slightly more tenable. It may mean that it is the cost of human verification that gets you, rather than it being untenable even on AI costs alone.
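
A minimal sketch of how that functional form could slot into a cost model like yours. The parameterisation via the 50% horizon and the helper names are my own illustration; only the K = 0.6 shape is from Gus's fit as I understand it:

```python
import math

def success_prob(task_len: float, h50: float, k: float = 0.6) -> float:
    """Weibull survival function, parameterised so success is 50% at task_len = h50.

    S(t) = exp(-ln(2) * (t / h50)**k); k = 1 recovers the constant-hazard model.
    """
    return math.exp(-math.log(2) * (task_len / h50) ** k)

def expected_attempts(task_len: float, h50: float, k: float = 0.6) -> float:
    """Expected number of independent runs until one succeeds (1 / p)."""
    return 1.0 / success_prob(task_len, h50, k)

# Example: a task 16x longer than the model's 50% horizon.
for k in (1.0, 0.6):
    print(k, round(expected_attempts(16.0, 1.0, k), 1))
# Constant hazard (k=1) needs ~65,500 expected attempts; k=0.6 needs only ~39.
```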

And one of the authors of the METR timelines paper has his own helpful critique/clarifications of their results.

Good points. I'm basically taking METR's results at face value and showing that people are often implicitly treating costs (or cost per 'hour') as constant, especially when extrapolating them, when in fact these costs appear to be growing substantially.

Re the quality / generalisability of the METR timelines, there is quite a powerful critique of it by Nathan Witkin. I wouldn't go as far as he does, but he's got some solid points. 

Thanks Basil! That's an interesting idea. The constant hazard rate model is just comparing two uses of the same model over different task lengths, so if you use it to work out the 99% time horizon, a task at that horizon should cost about 1/70th as much ($1.43), since under a constant hazard rate the 99% horizon is only about 1/70th as long as the 50% horizon. Over time, I think these 99% tasks should rise in cost in roughly the same way as the 50%-horizon ones (as they are both increasing in length in proportion). But estimating how that will change in practice is especially dicey as there is too little data.
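
For concreteness, here is the arithmetic behind that figure, a minimal sketch assuming (as in the constant hazard rate model) that success probability on a task of length t is 2^(-t/h50) and that cost scales roughly linearly with task length; the $100 baseline is just an illustrative assumption consistent with the $1.43:

```python
import math

H50_TASK_COST = 100.0  # assumed (illustrative) cost of a task at the 50% horizon

# Constant hazard rate: P(success on a task of length t) = 2**(-t / h50).
# Solving 2**(-t99 / h50) = 0.99 gives the 99% horizon as a fraction of the 50% one.
length_ratio = math.log2(1 / 0.99)
print(round(length_ratio, 4), round(1 / length_ratio, 1))  # ~0.0145, i.e. ~1/69
print(round(H50_TASK_COST * length_ratio, 2))  # ~$1.45 (the $1.43 above uses exactly 1/70)
```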

Also, note that Gus Hamilton has written a great essay that takes the survival analysis angle I used in my constant hazard rates piece and extends it to show pretty convincingly that the hazard rates are actually decreasing. I explain it in more detail here. One upshot is that it gives a different function for estimating the 99% horizon lengths. He also shows that these are poorly constrained by the data: his model disagrees with METR's by a factor of 20 on how long they are, with even more disagreement for even shorter horizon lengths.

Some great new analysis by Gus Hamilton shows that AI agents probably don't obey a constant hazard rate / half-life after all. Instead their hazard rates systematically decline as the task goes on.

This means that their success rates on tasks beyond their 50% horizon are better than the simple model suggests, but those for tasks shorter than the 50% horizon are worse.

I had suggested a constant hazard rate was a good starting assumption for how their success rate at tasks decays with longer durations. It is the simplest model and fits the data OK. But Gus used the standard second-simplest model from survival analysis (the Weibull distribution rather than the exponential distribution). It has a second parameter, K, which represents how the hazard rate changes with time (if at all). If K=1, there is a constant hazard rate, so the exponential distribution is a special case of the Weibull. But if K<1, then hazard decreases over time (like the Lindy effect), and if it is greater, hazard increases (like aging). 
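
For reference, this is the survival model being described, in standard textbook notation (scale λ, shape K); I haven't checked Gus's exact parameterisation, so take this as the generic form rather than his:

```latex
% Weibull survival and hazard functions (scale \lambda, shape K)
S(t) = \exp\!\left[-\left(\tfrac{t}{\lambda}\right)^{K}\right], \qquad
h(t) = \frac{K}{\lambda}\left(\tfrac{t}{\lambda}\right)^{K-1}
% K = 1: constant hazard (the exponential special case)
% K < 1: hazard falls with time;  K > 1: hazard rises with time
```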

Gus found that the estimated values for K were below 1 for all the models, showing that *all* of them had decreasing hazard rates. 

A distribution that generalises another is always going to fit the data at least as well as the special case (here, my exponential), so improved fit alone wouldn't be decisive. But the fact that every single model has K statistically significantly below 1 convinces me he is right.

So what does this mean?

One thing is that it gives very different estimated success rates for tasks much shorter or longer than the 50% horizon (which METR focuses on because it is easier to reliably estimate), e.g. if you use the Weibull to estimate the 99% horizon (or the 10% horizon).
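
As a rough illustration of how much the tail behaviour matters, here is the horizon formula that falls out of such a Weibull (parameterised by the 50% horizon; K = 0.6 is used as a typical fitted value, and the specific numbers are illustrative rather than Gus's or METR's estimates):

```python
import math

def horizon(p: float, h50: float, k: float) -> float:
    """Task length at which success probability is p, for a Weibull with S(h50) = 0.5."""
    # Solve exp(-ln(2) * (t / h50)**k) = p  =>  t = h50 * (ln(1/p) / ln(2))**(1/k)
    return h50 * (math.log(1 / p) / math.log(2)) ** (1 / k)

for k in (1.0, 0.6):  # constant hazard vs an illustrative decreasing-hazard shape
    print(k, round(horizon(0.99, 1.0, k), 5), round(horizon(0.10, 1.0, k), 2))
# With the 50% horizon set to 1: the 99% horizon shrinks from ~0.0145 (K=1) to ~0.0009 (K=0.6),
# while the 10% horizon grows from ~3.3 to ~7.4.
```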

Another thing is that the AI agents mainly have a K of about 0.6, while the human value of K is significantly lower, at about 0.4. This means even if they have the same 50% horizon, humans can do better on really long tasks (and worse on really short ones).

As this comparison shows, for a fixed 50% horizon length, it isn't clearly better or worse to have a lower value of K. Lower values are better on really long tasks (where success rates are low anyway), but worse at high-reliability thresholds on shorter tasks.
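
A quick numerical illustration, holding the 50% horizon fixed at 1 and using the rough K values above (0.6 for agents, 0.4 for humans); the task lengths are arbitrary multiples of that horizon:

```python
import math

def success(t: float, k: float) -> float:
    """Success probability for a Weibull whose 50% horizon is 1 (t in units of that horizon)."""
    return math.exp(-math.log(2) * t ** k)

for t in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(t, round(success(t, 0.6), 3), round(success(t, 0.4), 3))
# At 1/100th of the horizon: ~0.96 (K=0.6) vs ~0.90 (K=0.4) -- lower K hurts high reliability.
# At 10x the horizon:        ~0.06 (K=0.6) vs ~0.18 (K=0.4) -- lower K helps on very long tasks.
```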

As a word of warning, I was quite sure before METR released its Opus 4.5 results that it was going to have a more human-like value of K, since it had a great 50% horizon but only an average showing at the 80% horizon. But the estimates are that its value of K is similar to the other models'. I'm not sure why that is, but it might be because there isn't much data to go on here and things are quite noisy for any individual model.

So, from Gus's results, it still looks like there is some important gap between how human success rates drop off at longer tasks versus how AI agents do.

Gus also compares his two-parameter Weibull model of the data to METR's two-parameter log-logistic model. He finds that they are similar, but with the log-logistic fitting slightly better. So it isn't clear which of these to use if you have the choice. They differ quite a lot in the tails of the distribution (i.e. in estimated success rates for very short or very long tasks). e.g. the Weibull says the 99% horizon is 1/20th as long as the log-logistic predicts. That's a big deal and the data doesn't tell us which to favour! I'd slightly favour the Weibull, on the grounds that it is more plausible ex ante. But maybe the bigger lesson is that it is unknown which is right, and thus the 99% horizons (necessary for much useful work) are deeply uncertain.
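
To give a feel for how far apart the tails can get, here is a comparison with purely illustrative shape parameters (K = 0.6 for the Weibull as above and a slope of 1 for the log-logistic; these are not the fitted values from either analysis), both pinned to the same 50% horizon:

```python
import math

def weibull_horizon(p: float, k: float) -> float:
    """p-success horizon as a multiple of the 50% horizon, Weibull shape k."""
    return (math.log(1 / p) / math.log(2)) ** (1 / k)

def loglogistic_horizon(p: float, beta: float) -> float:
    """p-success horizon as a multiple of the 50% horizon, log-logistic slope beta.

    S(t) = 1 / (1 + t**beta), with t in units of the 50% horizon.
    """
    return ((1 - p) / p) ** (1 / beta)

w, ll = weibull_horizon(0.99, 0.6), loglogistic_horizon(0.99, 1.0)
print(round(w, 5), round(ll, 5), round(ll / w, 1))
# With these illustrative parameters the log-logistic 99% horizon is ~12x the Weibull's;
# with the actual fitted shapes, the gap Gus reports is ~20x.
```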

I agree: a bunch of the arguments read like marketing that greatly simplifies the real picture, without much interest in digging deeper once a convenient story was found.

That's a good summary and pretty in-line with my own thoughts on the overall upshots. I'd say that absent new scaling approaches the strong tailwind to AI progress from compute increases will soon weaken substantially. But it wouldn't completely disappear, there may be new scaling approaches, and there remains progress via AI research. Overall, I'd say it lengthens timelines somewhat, makes raw compute/finances less of an overwhelming advantage, and may require different approaches to compute governance.

A few points to clarify my overarching view:

  1. All kinds of compute scaling are quite inefficient on most standard metrics. There are steady gains, but they are coming from exponentially increasing inputs. These can't continue forever, so all these kinds of gains from compute scaling are naturally time-limited. The exponential growth in inputs may also be masking fundamental deficiencies in the learning algorithms.
  2. By 'compute scaling' I'm generally referring to the strategy of adding more GPUs to get more practically useful capabilities. I think this is running out of steam for pretraining and will soon start running out of steam for RL and inference scaling. This is possible even if the official 'scaling laws' of pretraining continue to hold (I'm generally neutral on whether they will).
  3. It is possible that there will always be a new paradigm of compute scaling to take over when old ones run out of steam. If so, then like Moore's Law, the longterm upward trend might be made out of a series of stacked S-curves. I'm mainly pointing out the limits of the current scaling paradigms, not denying the possibility of future ones.
  4. I don't think that companies are likely to completely stop scaling any of these forms of compute scaling. The maths tends to recommend balancing the shares of compute that go to all of them in proportion to how much they improve capabilities per doubling of compute, e.g. perhaps a 3:1:2 ratio between pretraining, RL, and inference (though I expect the 3 to decline due to running out of high-quality text for pretraining). A toy version of this allocation maths is sketched just after this list.
  5. But given that we don't know if there will always be more paradigms delivered on time to save scaling, the limits on the current approaches should increase our credence that the practical process of scaling will provide less of a tailwind to AI progress going forward. Overall, my view is something like: the strength of this tailwind that has driven much of AI progress since 2020 will halve. (So it would still be important, but no longer the main determinant of who is in front, or of the pace of progress.)
  6. As well as implications for the pace of progress, changes in what determines progress have implications for strategy and governance of AI. For example, AI researchers will be comparatively more important than in recent years, and if inference scaling becomes the main form of scaling, that has big implications for compute governance and for the business model of AI companies.
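
Here is the toy allocation maths referred to in point 4: a minimal sketch assuming capabilities are (locally) a sum of logarithmic terms in each kind of compute, which is the assumption under which 'share in proportion to gains per doubling' comes out as exactly optimal. The 3:1:2 numbers are the illustrative ratio from that point, not estimates of mine.

```python
import math

# Assumed capability gains per doubling of compute for each kind of scaling.
gains_per_doubling = {"pretraining": 3.0, "rl": 1.0, "inference": 2.0}
budget = 1.0  # total compute, arbitrary units

# Optimal shares are simply proportional to the per-doubling gains.
total = sum(gains_per_doubling.values())
shares = {name: g / total for name, g in gains_per_doubling.items()}
print(shares)  # pretraining 1/2, rl 1/6, inference 1/3

# Check: with capability = sum_i a_i * log2(C_i), the marginal capability per extra unit
# of compute, a_i / (C_i * ln 2), is equal across all three uses at this split.
for name, a in gains_per_doubling.items():
    c = shares[name] * budget
    print(name, round(a / (c * math.log(2)), 3))  # the same value (~8.66) each time
```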

That is quite a surprising graph: the annual tripling and the correlation between compute and revenue are far tighter than I think anyone would have expected. Indeed, they are so tight that I'm a bit skeptical about what is going on.

One thing to note is that it isn't clear what the compute graph is of (e.g. is it inference + training compute, but not R&D?). Another is that it compares year-end figures with full-year totals on the right, but since both are exponentials with the same doubling time (just in different units), that isn't a big deal.

There are a number of things I disagree with in the post. The main one relevant to this graph is the implication that the graph on the left causes the graph on the right. That would be genuinely surprising. We've seen that the slope on the famous scaling-law graphs is about -0.05 for compute, so you need to double compute 20 times to get log-loss to halve (a quick check of this arithmetic is sketched just after the list below). Whereas this story of 3x compute leading to 3x the revenue implies that the exponent for a putative scaling law of compute vs revenue is extremely close to 1.0, and that it remains flukishly close to that magic number despite the transition from pretraining scaling to RL + inference scaling. I could believe a power-law exponent of 1.0 for some things that are quite mathematical or physical, but not for the extremely messy relationship of compute to total revenue, which depends on details of:

  • the changing relationship between compute and intelligence,
  • the utility of more intelligence to people,
  • the market dynamics between competitors,
  • running out of new customers and having to shift to more revenue per customer,
  • the change from a big upfront cost (training compute) to mostly per-customer charges (inference compute).
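
To spell out the arithmetic behind the 20-doublings check mentioned above (the -0.05 slope is the figure quoted here; the revenue exponent is just the implied relationship, not a fitted one):

```python
import math

# Pretraining scaling law: loss ~ C**(-0.05). How many doublings of compute halve the loss?
slope = -0.05
doublings_to_halve_loss = math.log(0.5) / (slope * math.log(2))
print(round(doublings_to_halve_loss, 1))  # 20.0

# By contrast, "3x compute -> 3x revenue" would mean revenue ~ C**1.0:
implied_revenue_exponent = math.log(3) / math.log(3)
print(implied_revenue_exponent)  # 1.0
```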

More likely is something like reverse causation: the growth in revenue is driving the amount of compute they can afford. Or it could be that the prices they need to charge scale with the investment they received to buy compute, so they are charging the minimum they can while keeping revenue growth in line with investment growth.

Overall, I'd say that I believe these are real numbers, but I don't believe the implied model. e.g. I don't believe this trend will continue in the long run, and I don't think that if they had been able to 10x compute in one of those years, the revenue would have also jumped by 10x (unless they are effectively choosing how much revenue to take, trading market growth for revenue, in order to make this graph work and convince investors).

Comparing AI scaling laws to Wright's law is an interesting idea. Wright's law is still a power law rather than logarithmic returns, but it is usefully comparable to both the pretraining and inference scaling behaviours.
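
For readers comparing the two, these are the textbook functional forms I have in mind (symbols chosen here for illustration):

```latex
% Wright's law: unit cost falls as a power of cumulative production N
C(N) = C_1 \, N^{-b}

% Pretraining scaling law: loss falls as a power of training compute C
L(C) = L_0 \, C^{-\alpha}

% Both are power laws; 'logarithmic returns' would instead look like
% \text{capability} \propto \log C
```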
