This is a follow-up to the Petrov Day commemoration of 2022.


We died.

Twice.

Maybe three times.


This is how it went down...

September 21, 5:43 PM Pacific Time: I write the code that determines whether a user is authorized to launch. I am in a rush because I am leaving for EA Global DC.

September 25, 8:00 PM: The Petrov Day celebration commences. The button is on the frontpage. Any user with more than 2,300 karma can press the button.

September 25, 9:05 PM: I am reminded by others at LessWrong that I was supposed to publish the launch code in the comments. Without the code, no one can nuke the site. Understandably, no one spoke up to point out the absence of code – that would be hella suspicious. This reveals that incompetence might, on occasion, even save you from destruction. But don't bet on it.

September 25, 10:55:04 PM: The frontpage is nuked after 2 hours and 55 minutes. At this point the karma threshold is 2,100, and 290 users have that much karma or more.

September 25, 11:51 PM: Oliver Habryka suspects a bug in the code and digs into it, uncovering that indeed, due to an error in the logic, any user with 0 karma (and exactly 0 karma) had been able to launch nukes and destroy the frontpage from the start. An anonymity-preserving database check confirmed that it was a user with 0 karma who had launched the missiles.

September 26, 12:36 AM: The decision is made to revive the frontpage and keep going with the Petrov Day commemoration.

September 26, 10:11 AM: The entire LessWrong website (not just the front page) goes down and is returning 502s. This is because I pushed a bad commit to fix hiding the Petrov Opt-Out checkbox without properly testing it. The site is soon restored to functioning.

September 26, 5:33:02 PM: At this time, the karma threshold for the ability to launch is 200, and 1,504 users are able to bring down the front page.

With two hours and 27 minutes remaining, someone presses the initiation button, enters the launch code, and presses fire. The frontpage goes down.

With the anonymity introduced in 2022, we do not know who pressed the button or why. 

September 26, 8:00:00 PM: The Petrov Day commemoration is concluded. The LessWrong frontpage is restored.


What a day. What a day indeed.

I will tell you the symbolic lessons I take from these events, though you are welcome to interpret things how you please.

First, we very successfully commemorated how shoddy engineering by well-intentioned engineers can still lead to the end of the world. In Petrov's case, the shoddy engineering led to a false alarm that was safely ignored. In our case, we weren't so lucky, and the bad code meant that the site did indeed go down. And in real life, you don't get to decide that wasn't fun and have a do-over.

We could have decided to leave the site down; there would have been some symbolic purity in that. However, it seemed more worthwhile to see what would happen under the primary design of the exercises – seeing how long it'd take for the site to go down, even if we were pretty sure that it would.

Second, I want to claim that although the site went down, it in fact revealed that users on LessWrong with more than 300 karma (1,178 of them exist, 335 visited the site on Petrov Day) are not the kind to push buttons for one reason or another. Granted, many of them would not have been checking the site, but still. This is a much better outcome than the Manifold prediction market anticipated:

You can see that for much of the day, the market thought that the site would survive for only ~30% of the day, i.e. 8 hours. In fact, the site survived 21.5 hours, or 90% of the day.

I feel pretty good about that! This is even true despite the fact that a button-pusher would remain anonymous this year.
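The durations in the timeline can be checked directly from the timestamps. This is a minimal sketch, assuming the times above are Pacific Daylight Time (UTC-7) and that the commemoration window runs exactly from 8:00 PM on September 25 to 8:00 PM on September 26; the helper `hms` is a hypothetical name used only for illustration.

```typescript
// Assumption: all timeline times are Pacific Daylight Time (UTC-7).
const start = Date.parse("2022-09-25T20:00:00-07:00");        // commemoration begins
const firstLaunch = Date.parse("2022-09-25T22:55:04-07:00");  // launch via the 0-karma bug
const secondLaunch = Date.parse("2022-09-26T17:33:02-07:00"); // launch at the 200-karma threshold
const end = Date.parse("2022-09-26T20:00:00-07:00");          // commemoration ends

const MS_PER_HOUR = 60 * 60 * 1000;

// Hypothetical helper: format a millisecond span as "Hh Mm Ss".
function hms(ms: number): string {
  const totalSeconds = Math.round(ms / 1000);
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  return `${h}h ${m}m ${s}s`;
}

const survivedHours = (secondLaunch - start) / MS_PER_HOUR;
const survivedPercent = ((secondLaunch - start) / (end - start)) * 100;

console.log("first launch after", hms(firstLaunch - start));             // 2h 55m 4s
console.log("time remaining at second launch", hms(end - secondLaunch)); // 2h 26m 58s
console.log(survivedHours.toFixed(2), survivedPercent.toFixed(1) + "%"); // 21.55 89.8%
```

The 2h 26m 58s remaining is what the timeline rounds to two hours and 27 minutes, and 21.55 hours (about 89.8%) matches the rounded 21.5 hours and 90% figures.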

A couple of people expressed that they were very sad that the site went down this year, as it means we don't get the symbol of trust we had in previous years. I'm not sure that's the right argument. Given that 335 users with 300+ karma were active on the site on Petrov Day, and the site didn't go down until we got beneath that, you could argue this is the most successful Petrov Day yet on LessWrong (in past years, at most 250 people were given codes, and it's not clear they all visited LessWrong even). Plus, as above, this year the 300+ users didn't press the button despite the offer of anonymity.

Of course, this analysis ignores the fact that someone with 300+ karma might have decided to wait until the threshold was much lower than their own score in order to act with more anonymity. It's very hard to rule that out, so perhaps the above analysis is bunk. Or rather, it's far from certain but is evidence in a certain direction.


Something we introduced this year was the ability to opt out of Petrov Day. There are at least two reasons you might want to do so: 1) because you wish to object to the game, or 2) because you want to commit yourself to not pressing the button.

35 users chose to opt out. We cannot be sure of their motivations, but a few users publicly declared their opt-out, several citing that committing to not press the button seemed like the right thing to do. Kudos to them!

Lastly, Zack Stein-Perlman polled LessWrong's users to check what people overall thought about bringing the site down:

 As you can see, overwhelmingly, users think one ought not to press the button. 

Hear, hear. I agree. 

This was great, y'all. Yes, the site went down eventually, but only once the karma required to do so was merely 200. From this we can conclude that in gaining more karma than that, one becomes the kind of person who doesn't destroy the world symbolically or otherwise.

Till next time!

 

  1. ^

    Specifically, the function below determines who could launch nukes. The flaw in this function is that users with 0 karma do not have a karma field on their user object, so the check that their karma is below the current threshold never excludes them, i.e., nothing in the function prevents them from launching.

    export const userCanLaunchPetrovMissile = (user: UsersCurrent|DbUser|null): boolean  => {
      const currentKarmaThreshold = getPetrovDayKarmaThreshold()
      const manuallyExcludedUsers: String[] = [<redacted>]
      const userCreatedBeforeCutoff = moment('2022-09-21').isSameOrAfter(moment(user?.createdAt))
    
      return !!user && userCreatedBeforeCutoff && !(manuallyExcludedUsers.includes(user._id) 
        || !!user.banned 
        || user.deleted 
        || (user.karma && user.karma < currentKarmaThreshold) // BUG: undefined and 0 are both falsy, so this never excludes a 0-karma user
        || (user.karma && user.karma < 0) 
        || user.petrovOptOut)
    }
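The failure mode is easy to reproduce outside the codebase. This is a minimal, self-contained sketch; `MinimalUser`, `isBelowThresholdBuggy`, and `isBelowThresholdFixed` are hypothetical names, not part of the LessWrong code. It shows that the `user.karma && ...` guard is falsy both when the karma field is missing (`undefined`) and when it holds `0`, because JavaScript's `&&` returns its left operand whenever that operand is falsy. The fixed variant assumes a missing karma field should be read as 0.

```typescript
// Hypothetical minimal user shape; the real UsersCurrent/DbUser types are richer.
interface MinimalUser {
  karma?: number | null;
}

// Mirrors the original clause: (user.karma && user.karma < threshold).
// `&&` short-circuits on undefined, null, and 0, so such users are never flagged.
function isBelowThresholdBuggy(user: MinimalUser, threshold: number): boolean {
  return Boolean(user.karma && user.karma < threshold);
}

// One possible fix: default a missing karma field to 0 before comparing.
function isBelowThresholdFixed(user: MinimalUser, threshold: number): boolean {
  const karma = user.karma ?? 0;
  return karma < threshold;
}

const threshold = 2300;
const samples: MinimalUser[] = [{}, { karma: 0 }, { karma: 5 }, { karma: 5000 }];
for (const user of samples) {
  console.log(
    JSON.stringify(user),
    isBelowThresholdBuggy(user, threshold),
    isBelowThresholdFixed(user, threshold),
  );
}
```

Since the launch check negates an OR of exclusion conditions, a guard that wrongly returns false for these users silently lets them through rather than failing loudly.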

41 comments

A quick retrospective on the four Manifold markets on the day, focusing on the possibility of good or bad incentives flowing out of the markets. Several people expressed concerns that the Manifold markets would cause LessWrong's home page to go down, or go down earlier, due to incentives. People also placed bets and limit orders to try to mute those bad incentives. Some examples:

"This is a classical example where having a prediction market creates really bad incentives." "I think the right thing to do here is buy yes so that it's more beneficial for others to buy and hold no." "I'm not wild about this market existing, but given that it does exist I'm strongly in favour of making it profitable for others to not press the button."

I decided to look into this. I want to say up front that I think all the incentives were very small relative to perceived stakes, and I have no suspicions of anyone after writing this comment. I am also not going to give any bettor names.

By the end of the YES/NO market, several users had bet large sums on YES and were consequently incentivized to blow up the home page. One user had m26,002 YES shares (USD $260). Another user had 7328 NO shares (USD $73) and was incentivized not to blow up the site. But this doesn't show the whole picture, because there were also limit orders. As was explained to me:

The choice of whether to leave (public) limit orders on one side or the other is the way you incentivize action.

At various points during the day there were large (m1000, etc) YES limit orders in play. This is completely useless as an incentive for an individual LessWrong user. Sure, if I was the only one who might press the button, I could bet through those limit orders, not press the button, and collect my bounty. But many people could press the button, and this would have lost me mana. They did count as an incentive for LessWrong coders! They could quietly introduce a bug preventing the button from being pressed, bet NO, and collect their winnings. Nobody took up that incentive.

I didn't see any large NO limit orders. A NO limit order on a market like this is inherently risky, because if the button is pressed the offer will definitely be taken, but if the button is not pressed it probably won't. If there had been any, they could have added to the incentive some bettors had to blow up the home page, but as far as I can tell there weren't.

The positions in the WHEN market were much smaller, with the largest positions coming in at 552 shares in LATE/NEVER and 325 shares in EARLY. The current design of range markets on Manifold is fun to play with, but is not very "swingy". In particular, there is no option for outsized winnings if you can predict exactly when something will happen. If a user was planning to blow up the home page in the last few hours they could have bet a combination of LATE/NEVER in this market and YES in the binary market to maximize their gains, but I don't see any evidence of this. Because there are fewer high karma users, large EARLY bets in this market could slightly limit their incentive to bet EARLY and blow up the home page early. Unlike the YES/NO market, those bets didn't happen.

If someone was willing to anonymously blow up the home page, then a pattern of placing huge bets on that happening might look suspicious and damage their anonymity. So generally I would expect these incentives to only be effective for someone who was willing to blow up the home page and take credit for it. You can bet on that happening in "Will anyone try to take credit for nuking LessWrong?". Or, place a limit NO order to incentivize someone to bet YES and then take the credit (and the mana).

When I created my market, "Will LessWrong change their Petrov Day 2022 plan and reduce the chance of the button being pressed?", it was swiftly bet down to a low probability, creating a small incentive for the LessWrong admins to bet YES and then change their plans. Nobody took up that incentive, and the market didn't attract large limit orders.

Finally, mkualquiera made a market, "Will my friend agree to defect on Petrov Day?". This market is evidence that two people were thinking about being incentivized to blow up the home page by betting markets. But it seems like this was more about a fun game than making mana. The largest position was 301 shares on YES, and the market resolved NO, so whatever incentive those shares provided, they weren't enough to make it happen.

In the end I think there is a low (10%) chance that anyone's behavior was significantly shifted by prediction market incentives. Feel free to reply to this post with how your behavior was shifted so I can update.

I was hopeful that people might shift their behavior based on the prediction market predictions - specifically that the high probability placed on the home page being blown up would lead to design changes. However, this retrospective clarifies that Petrov Day 2022 was a social experiment, so the prediction would have just shown it was expected to work as designed.

... the primary design of the exercises – seeing how long it'd take for the site to go down, even if we were pretty sure that it would.

It's also possible that people saw the high probability that the home page would be blown up and altered their plans. They may have not blown it up because they expected someone else would (especially if they weren't confident in their anonymity being preserved). Or they may have blown it up because "if it's going to happen, it may as well be me". I think there is a higher chance (25%) that this was an effect, but I don't know what the net direction would have been. Again, feel free to reply if you think the prediction markets helped you make decisions.

Overall I think the prediction markets were a positive addition to a negative event. I think the main incentive in play was the possibility of seeing, or not seeing, the LessWrong home page being blown up. And I think we all have a lot to figure out about prediction markets.

This is great! You should make a top-level post. Please!

To clarify, my friend and I were 100% going to press the button, but we were discouraged by the false alarm. There was no fun at that point, and it made me lose like 1/3 of my total mana. I had to close all my positions to stop the losses, and we went to sleep. When we woke up it was already too late for it to be noteworthy or fun.

We can conclude that in gaining more karma than [300], one becomes the kind of person who doesn't destroy the world symbolically or otherwise.

I imagine this is tongue in cheek, but we really can't. You mentioned an important reason - someone with more karma could have waited to press the button. The first button press occurred 110 minutes after it could have been pressed. The second button press occurred at least 40 minutes after it could have been pressed, and perhaps 100, 160, 220, etc. In 2020 the button was pressed 187 minutes after it could have been pressed (by a 4000+ karma user).

You excluded known trouble makers from accessing the button but you didn't exclude unknown trouble makers, and lower karma is correlated with being unknown.

We are also dealing with a hostile intelligence who pressed the button (or caused it to be pressed). Someone with higher karma might deliberately wait to press the button to throw people off the scent, to encourage people to make a naive update about karma scores, or to reduce the negative consequences of bringing the home page down for longer without the negative consequences of leaving it up all day. The timing evidence is thus hostile evidence and updating on it correctly requires superintelligence.

Put this together and I would not place any suspicion on the noble class of 200-299 karma users that I happened to enter on Petrov Day after net positive gains from complaining about the big red button.

I am willing to update that at least one person in the 200+ karma range pressed the button, and at least one person with zero karma pressed the button. This assumes there was not a third bug in play. This does not change my opinion of LessWrong users, but those who predicted that the home page would remain up could update.

I also imagine that it was tongue-in-cheek, but I also think that the structure of the whole thing so heavily suggests this line of thinking that recognising it to be wrong on a surface level doesn't really dispel it.

The timing evidence is thus hostile evidence and updating on it correctly requires superintelligence.

What do you mean by this? It seems trivially false that updating on hostile evidence requires superintelligence; for example poker players will still use their opponent's bets as evidence about their cards, even though these bets are frequently trying to mislead them in some way.

The evidence being from someone who went against the collective desire does mean that confidently taking it at face value is incorrect, but not that we can't update on it.

Good callout, that sentence is simplified. I think the conclusion is correct.

Epistemic status: personal rule of thumb, defensively oriented.

Example: Cursed Monty Hall. This is like Monty Hall except that we know that Monty doesn't want us to win and is free to reveal whatever evidence he wants, at no cost to himself. Before Monty opens a door, we think that sticking with our choice has the same EV as switching to another door. After Monty opens a door, this should not change our decision. If updating on the evidence would cause us to make a better decision, Monty would not have given us the evidence.
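Cursed Monty Hall can be made concrete by enumerating the three equally likely car positions. This is a sketch under one assumed adversarial strategy, which is not specified in the comment: the hypothetical `cursedMonty` opens a goat door only when the contestant's first pick is the car. Under that strategy, a contestant who applies the standard Monty Hall update and switches never wins, while one who ignores the evidence keeps the 1/3 they had before any door opened.

```typescript
type Door = 0 | 1 | 2;
const DOORS: readonly Door[] = [0, 1, 2];

// A host policy decides, given the pick and the car, whether to open a goat door.
type HostPolicy = (pick: Door, car: Door) => boolean;

// Standard Monty: always opens a goat door the contestant did not pick.
const standardMonty: HostPolicy = () => true;
// Hypothetical cursed Monty: opens a goat door only when the pick is the car.
const cursedMonty: HostPolicy = (pick, car) => pick === car;

// Win probability of a contestant policy, with the first pick fixed at door 0.
function winRate(host: HostPolicy, switchWhenOffered: boolean): number {
  const pick: Door = 0;
  let wins = 0;
  for (const car of DOORS) {
    const offered = host(pick, car);
    // With three doors and one goat door opened, switching moves to the
    // remaining closed door, which holds the car exactly when the pick did not.
    const won = offered && switchWhenOffered ? pick !== car : pick === car;
    if (won) wins += 1;
  }
  return wins / DOORS.length;
}

console.log("standard, switch:", winRate(standardMonty, true));
console.log("standard, stick: ", winRate(standardMonty, false));
console.log("cursed, switch:  ", winRate(cursedMonty, true));
console.log("cursed, stick:   ", winRate(cursedMonty, false));
```

The naive update that earns 2/3 against the standard host earns nothing against the cursed one, which is the sense in which evidence freely chosen by an adversary should not move our decision.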

It's not quite that simple in other cases. In Cursed Monty Hall, we assume it costs Monty nothing to feed us false evidence. In Poker, it costs money to make a bet. A player's desire to feed false evidence to their opponent is limited by the cost of providing it. Another way of looking at this is that a poker bet is not purely hostile evidence, it is also an action in the game.

Another example from Poker is deliberate table talk aimed at deception. This is more purely hostile evidence: it costs no in-game resources. Updating based on table talk is therefore much harder than updating correctly based on bets. Whether it requires a "superintelligence" to update "correctly" is probably down to semantics.

In the LessWrong RedButton game, there is a cost to blowing up the home page when one is normally sleeping. We might update a little on the most likely sleeping habits of the attacker. But not too much! The value of the update must be less than the cost of misleading us, or else the attacker will pay the cost in order to mislead us. Whatever value we gain from updating positively about people who were asleep at 5:33:02 PM on 2022-09-26, it must be less than the cost to the attacker of staying up late or waking up early, one day a year.

Similarly, for a 200+ karma user there is no clear cost or benefit to the attacker in blowing up the home page at the beginning of their available window vs the end. So we should not update on the karma of the attacker, beyond noting that it was 200+. I welcome attempts to calculate a better Bayesian update on the karma of the attacker than that but I don't see how it's going to work.

I'm not really sure you can treat the button-presser as hostile in the same sense as someone you are playing poker against is hostile. Someone might for example just think it's funny to take down the frontpage, it doesn't mean they have an incentive to minimize the information we get out of it.

I second that we can't really conclude that high-karma users aren't the button-pressing types, for the reasons you reference

I like this comment.

[anonymous]

I know this is going to come off as overly critical no matter how I frame it but I genuinely don't mean it to be.

Another takeaway from this would seem to be an update towards recognizing the difference between knowing something and enacting it or, analogously, being able to identify inadequacy vs. avoid it. People on LW often discuss, criticize, and sometimes dismiss folks who work at companies that fail to implement all security best practices or do things like push to production without going through proper checklists. Yet, this is a case where exactly that happened, even though there was not strong financial or (presumably) top down pressure to act quickly.

Actually it seems that the industry standard (?) process of code review was followed just fine. Yet the wrong logic still went through. (Actually based on the github PR it seems that the reviewer himself suggested the wrong logic?)

I think in this case there would also be plenty to say about blindly following checklists. (Could code review in some cases make things worse by making people think less about the code they write?)

EDIT: Actually based on the TS types user.karma can't be missing. Either the types or Ruby's explanation is wrong. Clearly multiple things had to go wrong for this bug to slip through.

[anonymous]

I don't know all the details of what testing was done, but I would not describe code review and then deploying as state-of-the-art as this ignores things like staged deploys, end-to-end testing, monitoring, etc. Again, I'm not familiar with the LW codebase and deploy process so it's possible all these things are in place, in which case I'd be happy to retract my comment!

To me it seems that the average best practices are being followed.[1] But these "best practices" are still just a bunch of band-aids, which happen to work fairly well for most use cases.

A much more interesting question to ask here is what if something important like ... humanity's survival depended on your software? It seems that software correctness will be quite important for alignment. Yet I see very few people seriously trying to make creating correct software scalable. (And it seems like a field particularly suited for empirical work, unlike alignment. I mean, just throw your wildest ideas at a proof checker, and see what sticks. After you have a proof, it doesn't matter at all how it was obtained.)


  1. And I think the amount of effort in this case is perfectly justified. I mean this was code for a one-off single day event, nothing mission critical. It would be unreasonable to expect much more for something like this.

[anonymous]

Mainly commenting on your footnote, I generally agree that it's fine to put low amounts of effort into one-off simple events. The caveat here is that this is an event that is 1) treated pretty seriously in past years and 2) is a symbol of a certain mindset that I think typically includes double-checking things and avoiding careless mistakes.

This is why I really wish we had an AI with superhuman coding, theorem proving, and translation to natural language, and crucially only those capabilities, so that we could prove certain properties.

The types are wrong. It's sad that the types are wrong. I sure wish they weren't wrong, but changing it in all the cases is a pretty major effort (we would have to manually annotate all fields to specify which things can be null/undefined in the database, and which ones can't, and then would have to go through a lot of type errors).

Yeah, makes sense. Indeed such tech debt can't be fixed overnight.

The types were introduced after most of the user objects had already been created, and I guess no one ever ran a migration to fill them in.

I see, makes sense.

On the other hand I am afraid this reinforces NaiveTortoise's point, this seems like an underlying issue that could potentially lead to bugs much worse than this...

Thanks for posting and explaining the code - that's an interesting, subtle bug.

I think we learn more from Petrov Day when the site goes down than we would if it stayed up, although nothing is ever going to beat the year someone tricked someone into pressing the button by saying they had to press the button to keep the site up.  That was great.

The funny thing is that I had assumed the button was going to be buggy, though I was wrong how. The map header has improperly swallowed mouse scroll wheel events whenever it's shown; I had wondered if the button would also interpret them likewise since it was positioned in the same way, so I spent most of the day carefully dragging the scrollbar.

It's a good idea to be pedantically clear with terminology, and I thank you for saying "front page" rather than "site" in most of the timeline. I visited the site multiple times (and saw a 502 at one point), but I never look at the front page; I use my /allPosts bookmark instead. I never saw the button, though I have enough karma (though I may have been excluded as a troublemaker, as I think this exercise is rather less weighty than it's made out to be. I don't actually know whether I'd press it, given the chance.)

more than 300 karma (1,178 of them exist, 335 visited the site on Petrov Day)

Visited the site, or visited the front page?  I'm mostly curious how many of the regulars use the front page, vs using greaterwrong (which I presume isn't even a site visitor) or a deeper bookmark.

That's actually a very good question. When I filter for visitors to the frontpage, it's 1053 and 271.

Accidentally allowing anyone to launch, and then having it happen, is a pretty memorable way of demonstrating the pitfalls of this kind of setup.

As another commenter pointed out, there was negligible financial pressure or top down command pressure to rush the implementation, presumably, compared to more serious systems or to Petrov's situation. So the accident is all the more enlightening.

negligible financial pressure or top down command pressure to rush the implementation

False. We have a gazillion valuable things we could be doing and not nearly enough time. I figured Petrov Day was worth 2 days of effort. I think in total we spent four. 

Yeah, but that pressure was neither financial or top-down command pressure, so I think the original comment is right here.

Fine. Merely the pressure of the fate of the world.

It's hard to tell whether you're joking or being serious. Do you really believe the pressures you're experiencing are that significant compared to what Petrov was experiencing on that day?

There's a real element of seriousness to it. I do generally feel like we need to move faster, and Petrov Day was yet another thing I wanted to get done quickly in a small timebox. This means that I did feel rushed, and that feeling of being rushed comes from a sense that a lot is at stake.

Also, the question was the pressures on creating the system, not the pressures during the false alarm.

Also, the question was the pressures on creating the system, not the pressures during the false alarm.

What relation does the former have to the latter?

You received financial pressure and/or top down command pressure to rush it in 4 days? Or was it your own decision? Because the former would imply some rather significant things.

I don't think that "users active on the site on Petrov day", nor "users who visited the homepage on Petrov day" are good metrics; someone who didn't want to press the button would have no reason to visit the site, and they might have not done so either naturally (because they don't check LW daily) or artificially (because they didn't want to be tempted or didn't want to engage with the exercise.) I expect there are a lot of users who simply don't care about Petrov day, and I think they should still be included in the set of "people who chose not to press the button".

What about "users who viewed the Petrov day announcement article or visited the homepage"? That should more accurately capture the set of users who were aware of their ability to nuke the homepage and chose not to do so. (It still misses anyone who found out via social media, Manifold, etc., but there's not much you can do about that.)

Given that 335 users with 300+ karma were active on the site on Petrov Day, and the site didn't go down until we got beneath that, you could argue this is the most successful Petrov Day yet on LessWrong (in past years, at most 250 people were given codes, and it's not clear they all visited LessWrong even). Plus, as above, this year the 300+ users didn't press the button despite the offer of anonymity.

I think that reasoning applies only to the subset of users in the Americas. For users in Europe, the time when 300+ karma was enough to launch was deep in the night, and for parts of Asia it was very early in the morning. Someone from that group would have had to set an alarm and get out of bed to nuke the site, which required considerably more energy than failing to withstand the temptation and pressing the launch button while visiting Less Wrong during the day.

Still, I think it was a successful Petrov Day.

The flaw in this function is that users with 0 karma do not have a karma field on their user object

Out of interest, is this true of all users with 0 karma, or specifically users who've never received a vote? That is, if I vote on a comment of a 0-karma user and then retract it, will they have a karma object? (Or "users who have no current votes applied to them" would also be plausible, who wouldn't have a karma object in that situation but would if I voted up and someone else voted them back down to 0.)

[anonymous]

I didn't push the button because I knew that doing so would actually take down the front page. And I only believed that would happen because it happened last year.

users on LessWrong with more than 300 karma (1,178 of them) are not the kind to push buttons for one reason or another. Granted many of them would not have been checking the site

Do you have website access stats that would let you compute precisely how many of them did in fact load the homepage during Petrov Day?

335 users with 300 karma or more were active on Petrov Day. In total, 1,644 logged-in users created before the cutoff were active.

Adjusting for the unilateralist's curse is good, but creating common knowledge about everyone's beliefs is even better. So: please agree-vote this comment if you think the site should be nukable by anyone with 200+ karma and disagree-vote if you think it shouldn't. (Strong-vote if you feel strongly or have relevant private information.)

What exactly is the proposal here? Have next year's Petrov Day celebration only go down to 300 karma, or what?

A wide variety of proposals are compatible with the belief that the home page should not be nukable by anyone with 200+ karma, including the 2019, 2020, and 2021 celebrations.

(Pedantically, even this year didn't allow strictly anyone with sufficient karma to press the button, as two "known trouble makers" were excluded in code and others chose to opt out.)