Extended version of a short paper accepted at DPFM, ICLR'24. Authored by Vishaal Udandarao, Ameya Prabhu, Adhiraj Ghosh, Yash Sharma, Philip H.S. Torr, Adel Bibi, Samuel Albanie, and Matthias Bethge.


Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted during "zero-shot" evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and five standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample-inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets, and when testing on purely synthetic data distributions. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test set as the "Let it Wag!" benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data, which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.
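The log-linear trend described in the abstract can be illustrated with a minimal sketch. The data below is hypothetical (it is not from the paper), and the simple least-squares fit stands in for the paper's actual analysis: accuracy is modeled as a linear function of log10(concept frequency), so each 10x increase in pretraining data buys only a roughly constant additive gain.

```python
import math

# Hypothetical concept frequencies (log-spaced) and downstream "zero-shot"
# accuracies that improve by a roughly constant amount per 10x more data --
# the shape of the log-linear trend the paper reports.
frequencies = [1e2, 1e3, 1e4, 1e5, 1e6]
accuracies = [0.12, 0.21, 0.33, 0.42, 0.53]

def fit_log_linear(freqs, accs):
    """Least-squares fit of: accuracy = slope * log10(freq) + intercept."""
    xs = [math.log10(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accs) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accs)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

slope, intercept = fit_log_linear(frequencies, accuracies)
# Under a log-linear trend, closing a fixed accuracy gap requires
# exponentially more data: each extra ~`slope` of accuracy costs 10x samples.
print(f"accuracy gained per decade of data: {slope:.3f}")
```

The key consequence: inverting the fitted relationship, reaching a target accuracy `t` requires on the order of `10 ** ((t - intercept) / slope)` samples of the concept, which grows exponentially in `t`.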




For people who are confused by the title, there is a nice overview of the paper on Computerphile, titled "Has Generative AI Already Peaked?".

If I'm interpreting it right, the authors' experiments indicate that shoving in more compute and data, as OpenAI is doing with GPT, will hit diminishing returns and will not lead to significant general reasoning beyond the training distribution. Or in EA terms: AGI is unlikely to arrive without significant algorithmic breakthroughs.

Thanks for sharing. Based on this paper, the paper arguing that many emergent abilities of LLMs are a mirage, and Epoch's work on data scaling laws, I have greatly revised downward my estimates of how quickly AI risks and benefits will arrive.
