
Imagine you are in charge of running the EA Forum/LessWrong Petrov Day game. What would you do?

Upvote answers you agree with and add your own.

I think it's cool that the forum runs big events like this and I've enjoyed this one. Thanks to the team at CEA. I think it's fun to imagine what each of us would do if we were in charge. 

This year's game is described here.

Here a user asks for clarification on the purpose of the game.

12 Answers

(Note: I haven't been very involved with the planning for this year's exercise, mostly leaving it to Ruby)

I quite like it as a real trust-building exercise. I think overall actually getting 100+ people on the internet to not take a destructive action like this is surprisingly hard, and I think it is a really great, surprisingly unique, attribute of our communities that we can coordinate on making things like this happen.

I find myself reasonably compelled by some of the arguments for something like "asking for opt-in before sending the codes", though I don't yet actually have a logistically good way of making this happen. I think having a whole two-step process where you ask people to register themselves on a list is quite onerous, and probably wouldn't get a great response rate.

A thing that feels more reasonable to me: instead of sending out the codes in the email in plain text, you get a link to a form that shows you the code at the end, after 1-2 dialog boxes that ask whether you are OK with there potentially being real consequences of your using these codes. It's still not great, and it would still be hard to distinguish the people who opted-in and received the codes but decided not to use them from the people who just decided to not receive their codes in the first place (which is a safer move in the end, if you want to make sure you don't use them).
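To make the tracking side of this concrete, here is a minimal sketch of the state such a reveal form could record. This is hypothetical Python, not the forum's actual codebase; all names (CodeState, Participant, reveal_code, use_code) are illustrative:

    # Hypothetical three-state model for the two-step code-reveal flow.
    from dataclasses import dataclass
    from enum import Enum, auto
    from typing import Optional

    class CodeState(Enum):
        UNREVEALED = auto()  # never clicked through the consent dialogs
        REVEALED = auto()    # consented and saw the code, but hasn't used it
        USED = auto()        # entered the code (pressed the button)

    @dataclass
    class Participant:
        user_id: str
        code: str
        state: CodeState = CodeState.UNREVEALED

    def reveal_code(p: Participant, consented: bool) -> Optional[str]:
        """Show the code only after the user confirms they accept that
        using it may have real consequences."""
        if not consented:
            return None
        p.state = CodeState.REVEALED
        return p.code

    def use_code(p: Participant, entered: str) -> bool:
        """Record a button press; only possible after the code was revealed."""
        if p.state is CodeState.UNREVEALED or entered != p.code:
            return False
        p.state = CodeState.USED
        return True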

What would you think of making button pressers anonymous? Currently, I will definitely not press the button because I know that this could plausibly lead to negative social consequences for me, and be clearly tied to my identity. Which is a purely self-interested thing, rather than me actually taking agency and choosing not to unilaterally destroy the world, and demonstrating myself to be worthy of trust. I imagine this is true for other people too? Which, to me, majorly undercuts the community ritual and trust-building angles.

 

Alternately, maybe the social consequences are how people are coordinating?

Ramiro
For me, Petrov's (and Arkhipov's) legacy, the most important lesson, is that, in real MAD life, there should be no button at all. Seeing Neel & Habryka's apparent disagreement (the latter seems to think this is pretty hard, while the former thinks that the absence of incentives to press the button makes it too easy), I realize that it'd be interesting to have a long discussion, before the next Petrov Day, on what the goal of the ritual is and what we want to achieve with it. My point: it's cool practicing "not pressing buttons" and building trust on this, and I agree with Neel we could make it more challenging... but the real catch here is that, though we can bet the stability of some web pages on some sort of Assurance Game, it's a tremendous tragedy that human welfare has to depend on the willingness of people like Petrov not to press buttons. I think this game should be a reminder of that.
Neel Nanda
To clarify my position, I PERSONALLY find not pressing the button extremely easy, because I am strongly incentivised not to do it. This means that I don't personally feel like I am demonstrating that I am worthy of trust. If other people feel the same way, the ritual is also ineffective on them. Entirely consistently with this, if some people think this is dumb, get tricked, want to troll, etc., it is easy for them to press the button. Ensuring that none of the hundred people are like this is a hard problem, and I agree with Oliver that that is an achievement.
Ramiro
Thanks. So your point is that the "hard part" is to select who's going to receive the codes. It's not an exercise in building trust, but in selecting who is reliable.
Neel Nanda
Yes exactly

I imagined you would get people to volunteer in advance of Petrov Day and then choose who you trust from the list of volunteers (or trust all of them to collaborate, dealer's choice)

But I really love the idea of people saying "I care about preserving humanity, I'm committed to the values of prudence and rationality, and I want to take part in observing this holiday". I would love to see that group of people in action.

Habryka [Deactivated]
I do know that people are busy and easily distracted and probably wouldn't sign up in advance, even if they would like to participate, based on my past experience of generally getting people to do things. I do think we could build this list over multiple years, though. While I previously thought that maybe the right choice is to just not sign up to volunteer, there is an argument that if you are in favor of the ritual, you should sign up to be a volunteer, because if you don't, we might have to pick someone who is more likely to press the button instead of you. That still creates a decent incentive, which I hadn't considered before. But I am overall still concerned about people just kind of not noticing the email and opt-in process until the day comes, and then being sad they weren't considered (or the ritual not happening at all because not enough people who are actually unlikely to press the button opted in).

Big fan of what you describe in the end or something similar.

It's still not great, and it would still be hard to distinguish the people who opted-in and received the codes but decided not to use them from the people who just decided to not receive their codes in the first place

Not sure whether you mean it's hard from the technical side to track who received their code and who didn't (which would be surprising) or whether you mean distinguishing between people who opted out and people who opted in but decided not to see the code. If the latter: Any downside...

I would add more chaotic information. I thought the phishing message that brought down the site last year was, far from being a design failure (as described in the postmortem), an excellent example of emphasizing something closer to what Petrov faced. The message that brought down the site was:

You are part of a smaller group of 30 users who has been selected for the second part of this experiment. In order for the website not to go down, at least 5 of these selected users must enter their codes within 30 minutes of receiving this message, and at least 20 of these users must enter their codes within 6 hours of receiving the message. To keep the site up, please enter your codes as soon as possible. You will be asked to complete a short survey afterwards.

This includes:

  • Misleading information related to a job you've been tasked with
  • Time pressure

and in order to not bring down the site, you had to pause (despite the time pressure!) and question whether the information was true and worth acting on, given potentially grave consequences. This feels extremely in the spirit of the thing.
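For concreteness, here is a minimal sketch (hypothetical Python, not the actual implementation) of the rule the message claimed to enforce; in reality, of course, entering a code was what brought the site down:

    from datetime import timedelta

    def site_stays_up(entry_delays: list[timedelta]) -> bool:
        """entry_delays: how long after receiving the message each of the
        30 selected users took to enter their codes (non-entrants omitted)."""
        within_30_min = sum(d <= timedelta(minutes=30) for d in entry_delays)
        within_6_hours = sum(d <= timedelta(hours=6) for d in entry_delays)
        return within_30_min >= 5 and within_6_hours >= 20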

  1. We could have a vote on some of those to receive the codes.
  2. There could be some sort of noise - e.g., LW and EA forum websites could have some random moments of instability, so you can't be sure that no one has actually pressed the button.

I came to appreciate the idea of a "ritual" where we just practice the "art of not pressing buttons". And this year's edition got my attention because it can be conceived of as an Assurance Game. Even so, right now, there's no reason for someone to strike - except to show that, even in low-stakes scenarios like this, this art is harder than we usually think. So there's no trust or virtue actually being tested / expressed here - which makes the ritual less relevant than it could be.

I would do something similar to the present version, but emphasize that it's a game, the stakes are low, and, while you shouldn't destroy the homepage without reason, it's not a big deal if you do. We don't need to pretend that losing the homepage for a day is a disaster in order to get value from the game: I would happily risk the homepage for a day once a year to see whether it gets destroyed, and if so, why — malice, accidents, deception, bargaining failure (e.g., someone demands something to not destroy the homepage and their demand is not fulfilled), other coordination failure (in more complex variants), or something else.

Edit: also, I don't get how the game can have much to do with trust as long as defectors are socially punished.

I pointed out my issues with the structural dissimilarities between Petrov Day celebrations and what Petrov actually faced before here, here, and here (2 years ago). I personally still enjoyed the game as is. However, I'm open to the idea that future Petrov Days should look radically different, and might not have a gamifying element at all.

But I think if we want a game that reflects the structure of Petrov's decision that day well in an honest way, I personally would probably want something that accounts for the following features:

1. Petrov clearly has strong incentives and social pressures to push the button.

2. Petrov is not solely responsible for the world ending; a reasonable person could motivatedly say that it was "someone else's problem":

It was a dirty job, he thought to himself, but somebody had to do it.

As he walked away, he wondered who that someone else will be.

3. Everything is a little stressful, and transparently so. 

The thing I would enjoy, which may not be to everybody's taste, would include (see the sketch after this list):

  • Informed consent before being put in the game (either opt-in or a clear opt-out)
  • some probability of false alarms (if we do a retaliatory game)
  • No individual is "responsible" for ending the world
    • An example setup: 4-person pods, where everybody in the pod must launch
    • or a chain of command like Petrov faced
    • maybe a randomization element where your button has an X% chance of not doing the thing you told it to.
      • Specifically, X% of buttons are "always on" or "always off" and you get no visual cues of this ahead of time.
      • So this ups the stakes if 3 people choose to press and the fourth person does not.
  • Some reward for pressing the button
    • eg $100 to anybody who presses the button
  • Maybe no reward if the "world" ends
    • eg, nobody from LW gets money if EAF blows up LW, and vice versa.
  • Visible collective reward if world doesn't end
    • Like $X,000 donated to a preferred charity.
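A rough simulation sketch of the pod-plus-unreliable-buttons idea above (the pod size, stuck rate, and all names are hypothetical, just to show how the pieces combine):

    import random

    def pod_launches(chose_to_press: list[bool], stuck_rate: float = 0.1) -> bool:
        """Return True if this pod's launch goes through."""
        effective = []
        for pressed in chose_to_press:
            r = random.random()
            if r < stuck_rate / 2:
                effective.append(True)     # button secretly stuck "always on"
            elif r < stuck_rate:
                effective.append(False)    # button secretly stuck "always off"
            else:
                effective.append(pressed)  # button works as intended
        return all(effective)              # everyone in the pod must launch

    # Three press and one refuses: a stuck-"on" fourth button still fires.
    print(pod_launches([True, True, True, False]))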

I'd make it clearly a lighthearted game.

  • It would be clearly stated that this is a game and that while the subject is serious, this game is not. No-one will face social opprobrium for defecting.
  • The button pusher would be anonymous
  • The sites would be blocked but links would still function. It would be a little inconvenient but only for regular users

I think some people treat Petrov Day as a serious ritual, but I think the EA forum is too big for that. So why not embrace a little chaos and create a space for thinking about Petrov? I've thought about him a lot today, and I don't think that would be hurt by not taking the game too seriously.

One thing that confused me about the game/ritual was that I had the power to inflict a bad thing, but there was no obvious upside.

All I had to do was ignore the email, which seemed too easy.

This seems to be a bad model for reality. People who control actual nuclear buttons perceive that they get some upside from using them (even if it's only the ability to bolster their image as some kind of "strong-man" in front of their electorate).

Perhaps an alternative version could give those who use the "nuclear" codes an extra (say) 30 karma points?

I think this correctly identifies a problem (not only is it a bad model for reality, it's also confusing for users IMO). I don't think extra karma points is the right fix, though, since I imagine a lot of people only care about karma insofar as it's a proxy for other people's opinions of their posts, which you can't just give 30 more of :)

(also it's weird inasmuch as karma is a proxy for social trust, whereas nuking people probably lowers your social trust)

I would prefer there to exist reasons to press the button other than destroying value.

I really liked Brigid Slipka's comment that the ritual "appears to emphasize the precise opposite values that we honor in Petrov", including an emphasis on deference to, rather than defiance of, in-group norms.

If there were a different officer than Petrov on that watch, and he called his superiors and announced there were missiles incoming, what would his motivations have been? I doubt they would have been "burn down the USA lol", but instead trusting the system, following orders or social norms, or thinking the decision should be in the hands of someone higher up the command chain.

It feels disappointingly simplistic that the only reason to press the button is "burn down a website lol".

I actually don't know if the game format really works at all - as designed, it emphasizes all the opposite values we honor in Petrov. Perhaps a different model altogether would be best. My suggestions on the other post (https://forum.effectivealtruism.org/posts/Wsid3pHisYtutJzjw/clarifying-the-petrov-day-exercise?commentId=xJ7eC2YverjpPtWDp) included:

  • write an Opinion piece for a major paper about him (the Washington Post one is over 20 years old! could use an update)
  • organize a Giving Day
  • create a coordinated social media campaign (there was one viral tweet yesterday about Petrov which was cool)
  • Research other people who've had similar impact, but are still unknown to the world (h/t to Kirsten who mentioned this on Twitter a while back)

It seems like the game would better approximate the game of mutually assured destruction if the two sides had unaligned aims somehow, and destroying the page could impede "their" ability to get in "our" way.

Maybe the site that gets more new registrations on Petrov Day has the right to demand that the loser advertise something of their choice for 1 month after Petrov Day. Preferably, make the competition something that will be close to 50/50 beforehand.

The two communities could try to negotiate an outcome acceptable to everyone or nuke the other to try to avoid having to trust them or do what they want.

Like Sanjay's answer, I think this is a correct diagnosis of a problem, but I think the advertising solution is worse than the problem.

  • A month of harm seems too long to me.
  • I can't think of anything we'd want to advertise on LW that we wouldn't already want to advertise on EAF, and we've chosen "no ads" in that case.

I'd like to push the opt-in / opt-out suggestion further, and say that the button should only affect people who have opted in (that is, the button bans all the opted-in players for a day, rather than taking the website down for a day). Or you could imagine running it on another venue than the Forum entirely, that was more focused on these kinds of collaborative social experiments.

I can see an argument that this takes away too much from the game, but in that case I'd lean towards just not running it at all. I think it's a cute idea but I don't think it feels important enough to me to justify obstructing unrelated uses of the forum and creating a bunch of unnecessary frustration. I'd like the forum to remain accessible to people who don't think of themselves as "in the community", and I think stuff like this gets in the way of that.

Possible features:

  • some defensive way to act unilaterally
  • sometimes a false alarm reports that the other site has launched
2 Comments

Jchan suggests that users could raise money for an effective charity which would then be publicly wasted if the button gets pressed. That's a fun idea.

If someone thinks it should be a community building ritual, I suggest they write an answer, for balance.
