I currently lead EA Funds.
Before that, I worked on improving epistemics in the EA community at CEA (as a contractor), as a research assistant at the Global Priorities Institute, on community building, and on global health policy.
Unless explicitly stated otherwise, opinions are my own, not my employer's.
You can give me positive and negative feedback here.
I'm not directly working on an AI pause, but if I were, I would think that timelines were very strategically relevant:
Discussion of AI timelines is helpful for answering many of the questions above, as are the endline forecasts. I suspect I believe that working on AI pauses is much less robust to timelines than you think it is, and that discussion of timelines is much more helpful for developing models of AI development and risk (models which are highly decision-relevant).
Thanks for the link, I haven't come across that report before.
I think Yann has pretty atypical views for people working on LMs. For example, if you take the reference classes of AI-related Turing Award winners or chief-scientist types at AI labs, most are far more bullish on LMs (e.g. Hinton, Bengio, Ilya Sutskever, Jared Kaplan, John Schulman).
Why should it matter whether new models have been released after the reveal of ARC-AGI-2? If models have to be specifically fine-tuned for these tasks, doesn’t that show they are lacking in the capability to generalize to novel problems?
The main reason is that the benchmark has been pretty adversarially selected, so it's not clear that it's pointing at a significant gap in LM capabilities. I agree that it's weak evidence that they can't generalise to novel problems, but basically all of that update is priced in from just interacting with systems and noticing that they are better in some domains than others.
For one, it tells you that current frontier models lack the general intelligence or “fluid intelligence” to solve simple puzzles that pretty much any person can solve. Why is that? Isn’t that interesting?
I disagree that ARC-AGI is strong evidence that LMs lack "fluid intelligence". I agree that was the intention of the benchmark, but I think it's only weak evidence.
Another “benchmark” I mused about is the ability of AI systems to generate profit for their users by displacing human labour. It seems like improvement on that “benchmark” has been much, much slower than Moore’s law, but, then again, I don’t know if anyone’s been able to accurately measure that.
Has this been a lot slower than Moore's law? I think OpenAI's revenue has, on average, grown faster than Moore's law (which implies roughly a doubling every two years). I'd guess that LM ability to automate intellectual work is also improving faster than Moore's law, but it started from a very low baseline, so it's hard to see. Subjectively, LMs feel like they should be having a larger impact on the economy than they currently are. I think this is more related to horizon length than fluid intelligence, but 🤷♂️.
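To make the comparison concrete, here's a minimal sketch of what "faster than Moore's law" means in growth-rate terms. The revenue series is a purely illustrative placeholder, not actual figures:

```python
# Moore's law: transistor counts roughly double every two years
moore_doubling_years = 2.0
moore_annual_growth = 2 ** (1 / moore_doubling_years) - 1  # ~41% per year

# Hypothetical revenue series (illustrative numbers only, not real figures)
revenue_by_year = {2022: 1.0, 2023: 2.5, 2024: 6.0}  # arbitrary units

years = sorted(revenue_by_year)
total_growth = revenue_by_year[years[-1]] / revenue_by_year[years[0]]
annual_growth = total_growth ** (1 / (years[-1] - years[0])) - 1

print(f"Moore's law implies roughly {moore_annual_growth:.0%} growth per year")
print(f"The hypothetical revenue series grows roughly {annual_growth:.0%} per year")
```

Any series that more than doubles each year clears the ~41%/year bar implied by a two-year doubling time by a wide margin.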
The bigger picture is that LLMs have extremely meagre capabilities in many cognitive domains and I haven’t seen signs of anything but modest improvement over the last ~2.5 years. I also don’t see many people trying to quantify those things.
I'm curious about examples here, particularly if they're the kinds of things that LMs have affordances for, are intellectual tasks, and are at least moderately economically valuable (so that someone has actually tried to solve them).
In my view, there are many good reasons to work at an AI company, including:
* productively steering an AI lab during crunch time
* doing well-resourced AI safety research
* increasing the ability of safety-conscious people to blow the whistle to governments
* learning about the AI frontier from the best people in the field
* giving to effective charities
* influencing the views of other employees
* influencing how powerful AI systems are deployed and what they are used for once deployed
I don't think these necessarily outweigh the costs of working at an AI company, but the altruistic benefits are sometimes large, and it seems good for people to consider the option thoughtfully.
My impression is that ARC-AGI-1 is close to being solved, which is why they brought out ARC-AGI-2 a few weeks ago.
Benchmarks are often adversarially selected so that they take longer to saturate, so I don't think little progress on ARC-AGI-2 a few weeks after its release (and, iirc, before any major model releases since then) tells us much at all.
- Most AI experts are skeptical that scaling up LLMs could lead to AGI.
I don't think this is true. Do you have a source? My guess is that I wouldn't consider many of the people "experts".
- It seems like there are deep, fundamental scientific discoveries and breakthroughs that would need to be made for building AGI to become possible. There is no evidence we're on the cusp of those happening and it seems like they could easily take many decades.
I think this is a pretty strange take. It seems like basically all progress in AI has involved approximately zero "deep, fundamental scientific discoveries", so I think you need some argument for why that trend will change. Alternatively, if you think we have made lots of such discoveries and that they explain AI progress so far, then you need an argument for why these discoveries will stop. Or, if you think we have made little AI progress since ~2010, then I think most readers would strongly disagree with you.
- If you look at easy benchmarks like ARC-AGI and ARC-AGI-2 that are easy for humans to solve and intentionally designed to be a low bar for AI to clear, the weaknesses of frontier AI models are starkly revealed.
I don't think they are designed to be a low bar to clear. They seem very adversarially selected, though I agree that LMs do poorly on them relative to subjectively more difficult tasks like coding. It seems pretty hard to make a timelines update from ARC-AGI unless you are very confident in the importance of abstract shape rotation problems for much more concrete problems, or you care about some notion of "intelligence" much more than automating intellectual labour.
Hi Markus,
For context, I run EA Funds, which includes the EAIF (though the EAIF is chaired by Max Daniel, not me). We are still paying out grants to our grantees, though we have been slower than usual (particularly for large grants). We are also still evaluating applications and communicating decisions to applicants (though this is also slower than usual).
We have communicated this to the majority of our grantees, but if you or anyone else reading this urgently needs a funding decision (in the next two weeks), please email caleb [at] effectivealtruismfunds [dot] org with URGENT in the subject line, and I will see what I can do. Please also include:
You can also apply to one of Open Phil's programs; in particular, Open Philanthropy's program for grantees affected by the collapse of the FTX Future Fund may be especially relevant to people applying to EA Funds because of the FTX crash.