(comment crossposted from LW)
While the coauthors broadly agree about points listed in the post, I wanted to stick my neck out a bit more and assign some numbers to one of the core points. I think on present margins, voluntary restraint slows down capabilities progress by at most 5% while probably halving safety progress, and this doesn't seem like a good trade. [The numbers seem like they were different in the past, but the counterfactuals here are hard to estimate.] I think if you measure by the number of people involved, the effect of restraint is substantially lower; here I'm assuming that people who are most interested in AI safety are probably most focused on the sorts of research directions that I think could be transformative, and so have an outsized impact.
Request for advice from animal suffering EAs (and, to a lesser extent, climate EAs?): is there an easy win over getting turkeys from Mary's Turkeys? (Also, how much should I care about getting the heritage variety?)
Background: I routinely cook for myself and my housemates (all of whom are omnivores), and am on a diet that requires animal products for health reasons. Nevertheless, I'd rather impose fewer costs than more costs on others; I stopped eating chicken and chicken eggs in response to this post and recently switched from consuming lots of grass-finished beef to consuming lots of bison.
I have a general sense that "bigger animals have better lives", and so it's better to roast turkeys than roast ducks, for example, but am 1) not clear on the relative impacts of birds and mammals and 2) not clear on turkeys specifically. My sense is that even relatively well-treated chickens have been bred in a way that's pretty welfare-destroying, but that this hasn't yet happened to ducks; turkeys are subject to Thanksgiving pressure in the US, tho, which I'm sure has had some effect. The various food impact things that I've seen (like foodimpacts.org) don't seem to address turkeys (or use estimates from 'similar' creatures, which feels like too much of a stretch).
I think the counterfactual to roasting a turkey is either smoking the same weight of bison brisket or my housemates cooking themselves the same weight of chicken.
So you would have lost 40 hours of productive time? Respectfully: so what? You have sources actively claiming you are about to publish directly false information about them and asking for time to provide evidence that information is directly false.
Also, I think it is worth Oli/Ben estimating how many productive hours were lost to the decision to not delay; it would not surprise me if much of the benefit here was illusory.
(I'm Matthew Gray)
Inflection is a late addition to the list, so Matt and I won’t be reviewing their AI Safety Policy here.
My sense from reading Inflection's response now is that they say the right things about red teaming and security and so on, but I am pretty worried about their basic plan / they don't seem to be grappling with the risks specific to their approach at all. Quoting from them in two different sections:
Inflection’s mission is to build a personal artificial intelligence (AI) for everyone. That means an AI that is a trusted partner: an advisor, companion, teacher, coach, and assistant rolled into one.
Internally, Inflection believes that personal AIs can serve as empathetic companions that help people grow intellectually and emotionally over a period of years or even decades.** Doing this well requires an understanding of the opportunities and risks that is grounded in long-standing research in the fields of psychology and sociology.** We are presently building our internal research team on these issues, and will be releasing our research on these topics as we enter 2024.
I think AIs thinking specifically about human psychology--and how to convince people to change their thoughts and behaviors--are very dual use (i.e. can be used for both positive and negative ends) and at high risk for evading oversight and going rogue. The potential for deceptive alignment seems quite high, and if Inflection is planning on doing any research on those risks or mitigation efforts specific to that, it doesn't seem to have shown up in their response.
I don't think this type of AI is very useful for closing the acute risk window, and so probably shouldn't be made until much later.
I'm thinking about the matching problem of "people with AI safety questions" and "people with AI safety answers". Snoop Dogg hears Geoff Hinton on CNN (or wherever), asks "what the fuck?", and then tries to find someone who can tell him what the fuck.
I think normally people trust their local expertise landscape--if they think the CDC is the authority on masks they adopt the CDC's position, if they think their mom group on Facebook is the authority on masks they adopt the mom group's position--but AI risk is weird because it's mostly unclaimed territory in their local expertise landscape. (Snoop also asks "is we in a movie right now?" because movies are basically the only part of the local expertise landscape that has had any opinion on AI so far, for lots of people.) So maybe there's an opportunity here to claim that territory (after all, we've thought about it a lot!).
I think we have some 'top experts' who are available for, like, mass-media things (podcasts, blog posts, etc.) and 1-1 conversations with people they're excited to talk to, but are otherwise busy / not interested in fielding ten thousand interview requests. Then I think we have tens (hundreds?) of people who are expert enough to field ten thousand interview requests, given that the standard is "better opinions than whoever they would talk to by default" instead of "speaking to the whole world" or w/e. But just like connecting people who want to pay to learn calculus and people who know calculus and will teach it for money, there's significant gains from trade from having some sort of clearinghouse / place where people can easily meet. Does this already exist? Is anyone trying to make it? (Do you want to make it and need support of some sort?)
I think the 'traditional fine dining' experience that comes closest to this is Peking Duck.
Most of my experience has been with either salt-drenched cooked fat or honey-dusted cooked fat; I'll have to try smoking something and then applying honey to the fat cap before I eat it. My experience is that it is really good but also quickly becomes unbalanced / no longer good; some people, on their first bite, already consider it too unbalanced to enjoy. So I do think there's something interesting here where there is a somewhat subtle taste mechanism (not just optimizing for 'more' but somehow tracking a balance) that ice cream seems to have found a weird hole in.
[edit: for my first attempt at this, I don't think the honey improved it at all? I'll try it again tho.]
When people make big and persistent mistakes, the usual cause (in my experience) is not something that comes labeled with giant mental “THIS IS A MISTAKE” warning signs when you reflect on it.
Instead, tracing mistakes back to their upstream causes, I think that the cause tends to look like a tiny note of discord that got repeatedly ignored—nothing that mentally feels important or action-relevant, just a nagging feeling that pops up sometimes.
To do better, then, I want to take stock of those subtler upstream causes, and think about the flinch reactions I exhibited on the five-second level and whether I should have responded to them differently.
I don't see anything in the lessons on the question of whether or not your stance on drama has changed, which feels like the most important bit?
That is, suppose I have enough evidence to not-be-surprised-in-retrospect if one of my friends is abusing their partner, and also I have a deliberate stance of leaving other people's home lives alone. The former means that if I thought carefully about all of my friends, I would raise that hypothesis to attention; the latter means that even if I had the hypothesis, I would probably not do anything about it. In this hypothetical, I only become a force against abuse if I decide to become a meddler (which introduces other costs and considerations).
I think many "apostasy" stories (someone who had the atheist personality-type but grew up in a religious culture, people converting from left to right or back) have this character, and tend to be very popular with the destination audience and unpopular with the source audience. On the one hand, this is unsurprising--both on the levels of tribal affiliation and intellectual dynamics. (If people who use the EA tools come to the EA conclusions, then of course attempts to build alternative conclusions with the EA tools will fail.)
But it seems like--there should still be some ability to learn something, here? Either which tools are better, or whether it is reliably the case that applying analytic philosophy to Marxism causes it to evaporate, or so on. (Or that applying Marxism tools to analytic philosophy causes it to evaporate, and why we should care about that.)
Like, 'vulnerable' is an interesting word here. Jon Elster isn't a fictional character; he wrote his book 40 years ago, and people read it, and presumably some people were convinced and other people weren't. I remember reading Hayek's The Road to Serfdom, thinking "oh I wish I knew what George Orwell would have thought after reading this", and then discovering that he had read it and hadn't been convinced by it, writing this review.
And as you might expect from someone who feels a lot of resonance with Hayek--I didn't find Orwell's criticism convincing at all! But it's strange to say that means Orwell wasn't "vulnerable" to the Hayekian critique; instead I looked at his argument (that liberal tyranny is less responsible than state tyranny) and said "yikes, that's the best you could come up with?"