Tom Davidson from Forethought Research and I have a new paper responding to some recent skeptical takes on the Singularity Hypothesis (e.g. this one). Roughly half the paper is philosophical and the other half is empirical. Both halves argue that we should take the Singularity Hypothesis more seriously than many people have been taking it of late. I'm sharing it here because (i) I think it will be of general interest, and (ii) it's a project I'm still working on, and I would appreciate feedback from the community.

Here's the abstract:

The singularity hypothesis posits a period of rapid technological progress following the point at which AI systems become able to contribute to AI research. Recent philosophical criticisms of the singularity hypothesis offer a range of theoretical and empirical arguments against the possibility or likelihood of such a period of rapid progress. We explore two strategies for defending the singularity hypothesis from these criticisms. First, we distinguish between weak and strong versions of the singularity hypothesis and show that, while the weak version is nearly as worrisome as the strong version from the perspective of AI safety, the arguments for it are considerably more forceful and the objections to it are significantly less compelling. Second, we discuss empirical evidence that points to the plausibility of strong growth assumptions for progress in machine learning and develop a novel mathematical model of the conditions under which strong growth can be expected to occur. We conclude that the singularity hypothesis in both its weak and strong forms continues to demand serious attention in discussions of the future dynamics of growth in AI capabilities.

Note that this is an academic paper, and the goal is to have it published in a journal. So, for example, our ability to go into detail on some topics is limited by space constraints, and we can't introduce arguments that rely on priors we have made up without support from published empirical evidence.

Comments

More generally, I take issue with the idea that the number of "AI researchers" scales linearly with effective compute (gamma = 1 is put forward as your default hypothesis), and that these "AI researchers" can be assumed to have the same attributes as human researchers, such as their beta value.
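
To spell out the assumption I'm questioning, in notation that may not match the paper's exactly: if $N(C)$ is the number of parallel automated researchers you get from effective compute $C$, the default hypothesis amounts to

$$N(C) \propto C^{\gamma}, \qquad \gamma = 1,$$

i.e. doubling effective compute doubles the number of researcher-equivalents, with each of them assumed to contribute like a human researcher.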

If you double ChatGPT's thinking time, or its training time, do you get results that are twice as good? Empirically, no. By OpenAI's own account, you need exponential increases in compute to get linear gains in accuracy (the sketch below illustrates the shape of relationship I have in mind).
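
Here is a minimal sketch of that shape, assuming a stylized power-law scaling curve; the constants are made up for illustration and are not taken from any published OpenAI results:

```python
# Stylized scaling-law-shaped curve: error falls as a power law in compute,
# so each additional doubling of compute buys a similar-or-smaller gain in
# accuracy. All constants here are illustrative assumptions, not fitted values.
def accuracy(compute, irreducible_error=0.05, scale=0.5, exponent=0.2):
    return 1.0 - irreducible_error - scale * compute ** (-exponent)

prev = accuracy(1)
for doublings in range(1, 7):
    acc = accuracy(2 ** doublings)
    print(f"compute x{2 ** doublings:>3}: accuracy ~ {acc:.3f} (gain {acc - prev:+.3f})")
    prev = acc
```

Each successive doubling of compute buys at most a similar absolute gain in accuracy, which is the sense in which exponential input growth yields only (at best) linear output growth.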

Running two AI systems in parallel is just not the same as hiring two different researchers. Each researcher brings new ideas, training, and background, while each AI instance is an identical clone. If you think this will change in the future, that's fine, but it's a pretty big assumption imo.

Again, thanks for this! I think this is an important issue which it might be worth addressing more directly in the paper. Two comments, and I'm interested in what you think about them.

Comment 1: I'm not sure that the analogy to the relationship between compute and accuracy is apt here. When we duplicate an automated AI researcher, we are not trying to improve our accuracy on a single task; we are working on multiple tasks in parallel.

Comment 2: I do think the analogy to cloning is apt. Consider some talented ML researcher at a top lab; call her Ava. We can ask: if we had a duplication machine that could make any number of copies of Ava, how would the total quality-weighted research effort contributed by n copies compare with the total quality-weighted research effort contributed by instead hiring n additional engineers? It is correct that the copies of Ava will not come into the world with new ideas, training, or background. But I imagine this would not be such a huge limitation for them, since engineers can and do retrain and come up with new ideas. On the other hand, hiring n additional engineers means sampling without replacement from a fixed pool of talent, so we should expect the additional hires to be, on average, increasingly inferior to Ava as we hire more of them.
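
Here is a rough simulation of that intuition. Everything below (the talent distribution, the quality weights, the cutoff that stands in for "Ava") is an assumption chosen purely for illustration; none of it comes from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical talent pool: quality-weighted research output per engineer.
# The lognormal spread and pool size are illustrative assumptions.
pool = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)
ava = np.quantile(pool, 0.999)  # treat Ava as a top-0.1% researcher

def effort_from_copies(n):
    # n identical copies of Ava, each working on a separate task in parallel.
    return n * ava

def effort_from_hires(n):
    # Hire the n best remaining engineers: sampling without replacement
    # from a fixed pool, so average quality falls as n grows.
    return np.sort(pool)[::-1][:n].sum()

for n in (1, 10, 100, 1000):
    print(f"n={n:>4}: copies of Ava {effort_from_copies(n):8.1f} "
          f"vs best {n} hires {effort_from_hires(n):8.1f}")
```

The exact crossover point depends entirely on the assumed distribution, but the qualitative pattern is the one I have in mind: the copies scale linearly, while the marginal hire gets weaker as the pool is drawn down.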

So in sum, I suppose I agree with you that setting gamma to 1 is not strictly speaking a conceptual truth, but given the cognitive flexibility we are assuming when we talk about a human-level digital machine learning researcher, I feel confident that gamma is approximately 1.

No worries, I'm glad you find these critiques helpful!

I think the identical clone thing is an interesting thought experiment, and one that perhaps reveals some differences in worldview. I think duplicating Ava a couple of times would lead to a roughly linear increase in output, sure, but if you kept duplicating you'd run into diminishing returns. A large software company whose engineers were entirely replaced with Avas would be a literal groupthink factory: all of Ava's blind spots and biases would be completely entrenched, making the whole enterprise brittle.

I think the push and pull of different personalities is essential to creative production in science. If you look at the history of scientific developments, progress is rarely the work of a single genius; more typically it is driven by collaborations and fierce disagreements.

With regard to Comment 1: yeah, "accuracy" is an imperfect proxy, but I think it makes more sense than "number of tasks done" as a measure of algorithmic progress. This seems like an area where quality matters more than quantity. If I'm using ChatGPT to generate ideas for a research project, will running five different instances make the final ideas five times as good?

I feel like there's a hidden assumption here that AI will at some point switch from acting the way LLMs actually act to acting like a "little guy in the computer". I don't think this is the case; I think AI may end up having different advantages and disadvantages compared to human researchers.

Interesting work!

However, I would advise rechecking some of your numbers. I checked your citation of Erdil (2024) on page 23, and I think there are errors there. Comparing against page 21 of Erdil, you say r is "1.51 in the case of linear programming", whereas if I'm reading Table 8 correctly, the actual 50th percentile value is 1.077. Also, you say the 90% confidence interval is ".082 to 2.42"; I think it's actually .82 to 2.42.

Also, are you sure Erdil is using the same model as yours for his estimation of r? In that section he cites back to an earlier section describing the "Feller diffusion model", which has some similarities to your semi-endogenous model but has extra terms. Is it valid to use his beta as yours?

Really glad to hear from you, since I greatly appreciated your work on the AI 2027 material! 

You're right that there are two errors here that need to be corrected. One is that it should be .82 rather than .082. The other is that I intended to use the numbers from the "naive estimate" column of Table 8 on page 21 of the Erdil paper, which are calculated using a simple procedure that (to my mind) is less likely to be subject to errors introduced by model choice; but the 1.58 is the 50% estimate from their more complex model, whereas the naive estimate is 1.66. The Feller diffusion model is relevant to their more complex calculations, about which I am a little suspicious, but not to their naive calculations.
