Kinda pro-pluralist, kinda anti-Bay EA.
I have come here to extend the principle of charity to bad criticisms of EA and kick ass. And I'm all out of charity.
(my opinions are fully my own, and do not represent the views of any close associates or the company I work for)
warning - mildly spicy take
In the wake of the release, I was a bit perplexed by how much of Tech Twitter (I've answered my own question there) really thought this was a major advance.
But in actuality a lot of the demo was, shall we say, not consistently candid about Gemini's capabilities (see here for discussion and here for the original).
At the moment, all Google have released is a model inferior to GPT-4 (though the multi-modality does look cool), along with an I.O.U. for a totally-superior-model-trust-me-bro to come out some time next year.
Previously, some AI risk people confidently thought that Gemini would be substantially superior to GPT-4. On release, it clearly isn't. Some EAs were not sceptical enough of a for-profit company hosting a product announcement dressed up as a technical demo and report.
There have been a couple of other cases of this overhype recently, notably 'AGI has been achieved internally' and 'What did Ilya see?!!?!?', where people jumped to assuming a massive leap in capability on the back of very little evidence, when in actuality there hadn't been one. That should set off warning flags about 'epistemics', tbh.
On the 'Benchmarks' - I think most benchmarks that large LLMs are evaluated on, while they contain some signal, are mostly noise due to the significant issue of data contamination (papers like The Reversal Curse indicate this imo), and since LLMs don't think as humans do, we shouldn't be testing them in the same ways. Here are two recent papers - one from Melanie Mitchell about LLMs failing to abstract and generalise, and another by Jones & Bergen[1] from UC San Diego actually running the Turing Test empirically with LLMs (the results will shock you).
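To make the contamination worry concrete, here's a minimal sketch of the kind of n-gram overlap check often used to flag leaked test items. The function names and toy data are mine, purely for illustration - none of this is taken from the papers or reports above, and real evaluations use far larger corpora and more sophisticated matching:

```python
# Minimal, illustrative n-gram overlap contamination check (toy example).

def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a piece of text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(benchmark_item: str, training_docs: list[str],
                    n: int = 8, threshold: float = 0.5) -> bool:
    """Flag a benchmark item if a large fraction of its n-grams
    also appear verbatim in any training document."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return False
    for doc in training_docs:
        overlap = len(item_grams & ngrams(doc, n)) / len(item_grams)
        if overlap >= threshold:
            return True
    return False

# Toy usage: a GSM8K-style question that was copied into a scraped web page.
question = "Natalia sold clips to 48 of her friends in April and then half as many in May"
corpus = ["blog post quoting the test set: Natalia sold clips to 48 of her "
          "friends in April and then half as many in May"]
print(is_contaminated(question, corpus))  # True -> the benchmark score is suspect
```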
I think this announcement should make people think near-term AGI, and thus AIXR, is less likely. To me this is what a relatively continuous takeoff world looks like, if there's a takeoff at all. If Google had announced and proved a massive leap forward, then people would have shrunk their timelines even further. So why, given this was a PR-fuelled disappointment, should we not update in the opposite direction?
Finally, to get on my favourite soapbox, dunking on the Metaculus 'Weakly General AGI' forecast:
tl;dr - Gemini release is disappointing. Below many people's expectations of its performance. Should downgrade future expectations. Near term AGI takeoff v unlikely. Update downwards on AI risk (YMMV).
I originally thought this was a paper by Mitchell; that was a quick system-1 take which was incorrect, and I apologise to Jones and Bergen.
I'm glad you found my comment useful. I think then, with respect, you should consider retracting some of your previous comments, or at least reframing them to be more circumspect and to make clear that you're taking issue with a particular framing/subset of the AIXR community, as opposed to EA as a whole.
As for the points in your comment, there's a lot of good stuff here. I think a post about the NRRC, or even an insider's view into how the US administration thinks about and handles Nuclear Risk, would be really useful content on the Forum, and also incredibly interesting! Similarly, I think a post about how a community handles making 'right-tail recommendations', when those recommendations may erode its collective and institutional legitimacy[1], would be really valuable. (Not saying that you should write these posts, they're just examples off the top of my head. In general I think you have a professional perspective a lot of EAs could benefit from.)
I think one thing where we agree is that there's a need to ask and answer a lot more questions, some of which you mention here (beyond 'is AIXR valid'):
And so on.
Some people in EA might write this off as 'optics', but I think that's wrong
I'm sorry you encountered this, and I don't want to minimise your personal experience
I think once any group becomes large enough, there will be people who associate with it who harbour all sorts of sentiments, including the ones you mention.
On the whole though, I've found the EA community (both online and the people I've met in person) to be incredibly pro-LGBT and pro-trans. Both the underlying moral views (e.g. non-traditionalism, impartiality, cosmopolitanism, etc.) point that way, as do the underlying demographics (e.g. young, highly educated, socially liberal).
I think where there might be a split is in progressive (as in, politically leftist) framings of these issues and the type of language used to talk about them. Those framings often find it difficult to gain purchase in EA, especially on the rationalist/LW-adjacent side. But I don't think that means the community as a whole, or even that sub-section, is 'anti-LGBT' or 'anti-trans', and I think there are historical and multifaceted reasons why there's some enmity between the 'progressive' and 'EA' camps/perspectives.
Nevertheless, I'm sorry that you experience this sentiment, and I hope you're feeling ok.
Thanks for sharing the post Zed :) Like titotal says, I hope you consider staying around. I think AI-risk (AIXR) sceptic posts should be welcomed on the Forum. I'm someone who'd probably count as an AIXR sceptic within the EA community (though not to the wider world/public). It's clearly an area where you think EA as a whole is making a mistake, so I've read the post and recent comments and have some thoughts that I hope you might find useful:
I think there are some good points you made:
Some parts that I didn't find convincing:
But also some bad ones:
I know this is a super long comment, so feel free to respond only to the bits you find useful, or not at all. Alternatively, we could try out the new dialogue feature to talk through this a bit more? In any case, thanks again for the post - it got me thinking about where and why I disagree both with AI 'doomers' and with your position in this post.
Hey Wei, I appreciate you responding to Mo, but I found myself still confused after reading this reply. This isn't purely down to you - a lot of LessWrong writing refers to 'status', but it never clearly defines what it is or points to the evidence and literature behind it.[1] To me, it seems to function as a magic word that can explain anything and everything. The whole concept of 'status' as I've seen it used on LW seems incredibly susceptible to being part of 'just-so' stories.
I'm highly sceptical of this though - like, I don't know what a 'status gradient' is and I don't think it exists in the world? Maybe you mean an abstract description of behaviour? But then a 'status gradient' is just describing what happened in a social setting, rather than making scientific predictions. Maybe it's instead a kind of non-reductionist sense of existing and having impact, which I do buy, but then things like 'ideas', 'values', and 'beliefs' should also exist in this non-reductionist way and be as important for considering human action as 'status' is.
It also tends to lead to explanations like this:
One tricky consideration here is that people don't like to explicitly think about status, because it's generally better for one's status to appear to do everything for its own sake
Which to me is dangerously close to saying "if someone talks about status, it's evidence it's real; if they don't talk about it, then they're self-deceiving in a Hansonian sense, and this is also evidence for status" - which sets off a lot of epistemological red flags for me.
In fact, one of the most cited works about it isn't a piece of anthropology or sociology, but a book about Improv acting???
Just a quick point of order:
as far as I know, nobody who enabled or associated with SBF has yet stepped down from their leadership positions in EA organizations.
I think Will resigning from his position on the EV UK board, and Nick resigning from both the UK and US boards, would count for this.
I'm not making a claim here about whether these were the 'right' outcomes or whether they're 'enough', but there have been consequences, including at 'leading' EA organisations.
Maybe you two might consider having this discussion using the new Dialogue feature? I've really appreciated both of your perspectives and insights here, and I think the collaborative back-and-forth you're having seems a very good fit for how Dialogues work.
A side consideration - assuming a UK-based EAGx is being planned for next year, perhaps it could be timed to coincide with UK university holidays, and perhaps look more favourably on applications from students who wanted to attend/apply to EAG London 2024 but didn't for the reason Oliver states?
[ aside: I know organising events isn't an easy thing, just want to make it clear this is more of a consideration than a demand :) ]
I find myself pretty confused by this reply, Tristan. I'm not trying to be rude, but in some places I don't really see how it follows.
When you say "EAs are out" it seems like we want some of our own on the inside, as opposed to just sensible, saftey concerned people.
I disagree. I think it's a statement of fact. The EAs who were on the board will no longer be on the board. They're both senior EAs, so I don't think it's an irrelevant detail for the Forum to consider. I also think it's a pretty big stretch to go from 'EAs are out' to 'only EAs can be trusted with AI Safety', like I just don't see that link being strong at all, and I disagree with it anyway
What succinct way to put this is better? "Saftey is out" feels slightly better but like it's still making some sort of claim that we have unique providence here. So idk, maybe we just need slightly longer expressions here like "Helen Toner and Tasha McCauley have done really great work and without their concern for saftey we're worried about future directions of the board" or something like that.
Perhaps an alternative could have been "Sam Altman returning as OpenAI CEO, major changes to board structure agreed?" or something like that?
As for your suggested expression, I guess I just disagree with it, or think it's lacking evidence? I definitely wouldn't want to co-sign it or state that it's an EA-wide position.
To avoid uncertainty about what I mean here:
But see below - I think these issues are best discussed somewhere else.
(the other two paragraphs of yours focus somewhat confusingly on the idea of labeling EAs as being necessary for considering the impact of this on EA (and on their ability to govern in EA) which I think is best discussed as its own separate point?)
I agree that the implications of this for EA governance are best discussed in another place/post entirely, but it's an issue I think does need to be brought up, perhaps when the dust has settled a bit and tempers on all sides have cooled.
I don't know where I claimed that labelling EAs is necessary for discussing the impacts of this at all. Like, I really just don't get it - I don't think that's an accurate reading of what I said, and I don't think I said or implied it 🤷‍♂️
including most of the AI-safety identifying people at OpenAI as far as I can tell
Hey Ryan, thanks for your engagement :) I'm going to respond to your replies in one go if that's ok
#1:
This is a good point. I think my argument would point to larger updates for people who put substantial probability on near-term AGI in 2024 (or even 2023)! Where do they shift that probability in their forecast? Just spreading it uniformly over the rest of their current distribution would seem suspect to me (illustrated below). So maybe it wouldn't be a large update for somebody already unsure what to expect from AI development, but I think it should probably be a large update for the ~20% expecting 'weak AGI' in 2024 (more in response #3).
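As a toy illustration of what I mean (the numbers are invented for the example, not anyone's actual forecast), compare simply renormalising the freed-up probability against deliberately shifting most of it towards longer timelines:

```python
# Toy illustration with made-up numbers, not anyone's actual forecast.
forecast = {"by 2024": 0.20, "2025-2030": 0.50, "2031 or later / never": 0.30}

# Suppose the 2024 bucket is ruled out by disappointing evidence.
freed = forecast.pop("by 2024")

# Option A: renormalise, i.e. spread the freed mass proportionally.
renormalised = {k: v / (1 - freed) for k, v in forecast.items()}
# -> {"2025-2030": 0.625, "2031 or later / never": 0.375}

# Option B: treat the disappointment as evidence for longer timelines
# and push most of the freed mass to the last bucket (weights are arbitrary).
shifted = {"2025-2030": forecast["2025-2030"] + 0.25 * freed,
           "2031 or later / never": forecast["2031 or later / never"] + 0.75 * freed}
# -> {"2025-2030": 0.55, "2031 or later / never": 0.45}

print(renormalised)
print(shifted)
```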
#2:
Yeah, I suppose ~80% -> ~60% is a decent update, thanks for showing me the link! My issue here would be that the resolution criterion really seems to be CoT performance on GSM8K, which is almost orthogonal to 'better' imho, especially given the difficulty of accounting for dataset contamination - though I suppose the market is technically about wider perception rather than technical accuracy. I think I was basing a lot of my take on the response from Tech Twitter, which is obviously unrepresentative and prone to hype. But there were a lot of people I generally regard as smart and switched-on who really over-reacted, in my opinion. Perhaps the median community/AI-safety-researcher response was more measured.
#3:
I'm sympathetic to this, but Metaculus questions are generally meant to be resolved according to strict and unambiguous criteria, afaik. So if someone thinks that weakly general AGI is near, but that it wouldn't meet the criteria in the question, then they should have longer timelines than the current community forecast on that question, imho. The fact that this isn't the case indicates to me that many people who made a forecast on this question aren't paying attention to the details of the resolution criteria, or to how LLMs are trained and their strengths/limitations in practice. (Of course, if these predictors think that weak AGI will come from a non-LLM paradigm then fine, but then I'd expect the forecasting community to react less to LLM releases.)
I think where I absolutely agree with you is that we need different criteria to actually track the capabilities and properties of general AI systems that we're concerned about! The benchmarks currently available seem to have many flaws and don't really work to distinguish interesting capabilities in the trained-on-everything era of LLMs. I think funding, supporting, and popularising research into what 'good' benchmarks would look like, and creating a new test, would be high-impact work for the AI field - I'd love to see orgs look into this!
For the Metaculus question? I'd be very upset if I had a longer-timeline prediction that failed because this resolution got changed - the criteria say 'less than 10 SAT exams' in the training data, in black and white! The fact that these systems need such masses of data to do well is a sign against their generality, to me.
I don't doubt that the Gemini team is aware of data contamination issues (they even say so at the end of page 7 of the technical report), but I've become very sceptical about the state of public science on frontier AI this year. I'm very much in 'trust, but verify' mode, and the technical report reads to me more like a fancy press release accompanying the marketing than an honest technical report. (Which is not to doubt the integrity of the Gemini research and dev team - just to say that I think they're losing the internal tug-of-war with Google marketing & strategy.)
#4:
Ah, good spot. I think I saw Melanie share it on Twitter and assumed she was sharing some new research of hers (I pulled the references together fairly quickly). I still think the results stand, but I appreciate the correction and have amended my post.
<> <> <> <> <>
I want to thank you again for the interesting and insightful questions and prompts. They definitely made me think about how to express my position a bit more clearly (at least, I hope I make more sense to you after this response, even if we don't agree on everything) :)