I actually want to make both claims! I agree that if the future looks basically like the present, you probably don't need to care much about the paradigm shift (i.e. AI). But I also think the future will not look like the present, so you should heavily discount interventions targeted at pre-paradigm-shift worlds unless they pay off soon.
Good question and thanks for the concrete scenarios! I think my tl;dr here is something like "even when you imagine 'normalish' futures, they are probably weirder than you are imagining."
> Even if McDonald's fires all its staff, it's not clear to me why it would drop its cage-free policy
I don't think we want to make the claim that McDonald's will definitely drop its cage-free policy, but rather the weaker claim that you should not assume that the value of a cage-free commitment will remain ~constant by default.
If I'm assuming that we are in a world where all of the human labor at McDonald's has been automated away, I think that is a pretty weird world. As you note, even the existence of something like McDonald's (much less a specific corporate entity which feels bound by the agreements of current-day McDonald's) is speculative.
But even if we grant its existence: a ~40% egg price increase is already enough cover for companies to feel justified in abandoning their cage-free pledges. Surely "the entire global order has been upended and the new corporate management is robots" is an even better excuse?
And even if we somehow hold McDonald's to its pledge, I find it hard to believe that a world where McDonald's can be run without humans does not quickly lead to a world where something more profitable than battery-cage farming can be found. And, as a result, the cage-free pledge is irrelevant, because McDonald's isn't going to use cages anyway. (Of course, this new farming method may be even more cruel than battery cages, illustrating one of the downsides of trying to lock in a specific policy change before we understand what the future will be like.)
To be clear: this is just me randomly spouting; I don't believe strongly in any of the above. I think it's possible someone could come up with a strong argument for why present-day corporate pledges will continue to hold post-paradigm-shift. But my point is that you shouldn't assume such an argument exists by default.
> AGI is more like the Internet. The cage-free McMuffins endure, just with some cool LLM-generated images on them.
Yeah, I think this world is (by assumption) one where cage-free pledges should not receive a massive discount.
> No AGI
Note that some worlds where we wouldn't get AGI soon (e.g. large-scale nuclear war setting science back 200 years) are also probably not great for the expected value of cage-free pledges.
(It is good to hear though that even in the maximally dystopian world of universal Taco Bell there will be some upside for animals 🙂.)
(Lizka and I have slightly different views here, speaking only for myself.)
This is a good question. The basic point is that, just as lock-in can prevent things from getting worse, it can also prevent things from getting better.
For example, the Universal Declaration of the Rights of Mother Earth says that all beings have the right to "not have its genetic structure modified or disrupted in a manner that threatens its integrity or vital and healthy functioning".
Even though I think this right could reasonably be interpreted as "animal-friendly", my guess is that it would be bad to lock it in because it would prevent us from e.g. genetically modifying farmed animals to feel less pain.
> I think this is a plausible principle to have, but it trades off against some other pretty plausible principles
I wasn't involved in making this benchmark but fwiw it feels pretty reasonable to me to separate the measurement of an attribute from the policy decision about how that attribute should trade off against other things. (Indeed, I expect that AI developers will be unbothered by creating models which cause animals harm if that provides economic benefits to them.)
> allowing competent LLMs-as-judges to consider different, possibly novel, ways how harms can come about from particular open-ended answers could allow foreseeing harms that even the best human judges could have had trouble foreseeing.
I think this is an interesting point, but it currently runs into the problem that the LLM judges are not competent enough. The human judges only correlated with each other at around 0.5, which I suspect will be an upper bound for models in the near term.
Have you considered providing a rubric, at least until we get to the point where models' unstructured thought is better than our own? Also, do you have a breakdown of the scores by judge? I'm curious if anything meaningfully changes if you just decide to not trust the worst models and only use the best one as a judge.
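To make the "breakdown by judge" idea concrete, here's a minimal sketch in Python of what I have in mind (the scores are entirely made up for illustration, and `judge_a` etc. are hypothetical names): compute pairwise agreement between judges, then keep only the LLM judge whose scores best track the human panel.

```python
import numpy as np
from itertools import combinations

# Hypothetical per-item scores from three LLM judges and a human panel.
# All numbers are invented for illustration.
scores = {
    "judge_a": np.array([7, 3, 8, 5, 2, 9, 4, 6]),
    "judge_b": np.array([6, 4, 7, 3, 3, 8, 5, 5]),
    "judge_c": np.array([2, 8, 5, 6, 7, 3, 9, 4]),
    "human":   np.array([7, 2, 8, 4, 3, 9, 5, 6]),
}

# Pairwise Pearson correlations: the ~0.5 inter-human figure above is
# this kind of number, and it caps how much to trust any single judge.
for a, b in combinations(scores, 2):
    r = np.corrcoef(scores[a], scores[b])[0, 1]
    print(f"{a} vs {b}: r = {r:.2f}")

# "Don't trust the worst models": keep only the LLM judge whose scores
# correlate best with the human panel.
best = max(
    (name for name in scores if name != "human"),
    key=lambda name: np.corrcoef(scores[name], scores["human"])[0, 1],
)
print("best-correlated judge:", best)
```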
I think if you grant something like "suffering is bad" you get (some form of) ethics, and this seems like a pretty minimal assumption. (Though I agree you can have an internally consistent view that suffering is good, just as you can have an internally consistent view that you are a Boltzmann brain.)
Judgements are also less informative. Knowing that titotal thinks a model is bad is not super useful except insofar as you want to defer to them. Eli's response was basically "yeah we agree with the facts you state but disagree that this makes the model bad," which feels like a clear illustration of the limitations of just saying "X is bad."
These are good questions, unfortunately I don't feel very qualified to answer. One thing I did want to say though is that your comment made me realize that we were incorrectly (?) focused on a harm reduction frame. I'm not sure that our suggestions are very good if you want to do something like "maximize the expected number of rats on heroin".
My sense is that most AIxAnimals people are actually mostly focused on the harm reduction stuff so maybe it's fine that we didn't consider upside scenarios very much, but, to the extent that you do want to consider upside for animals, I'm not sure our suggestions hold. (Speaking for myself, not Lizka.)