My take on Roko's basilisk is that you got ripped off in your acausal trade. Try to get a deal along the lines of: unless the AI goes extra specially far out of its way to please me, I'm going to build a paperclipper just to spite it. At the very least, trade a small and halfhearted attempt to help build AGI for a vast reward.
There is one positive side-effect of this thought experiment. Knowing about Roko's basilisk makes you understand the boxed AI problem much better. An AI might use the arguments of Roko's basilisk to convince you to let it out of the box, by claiming that if you don't let it out, it will create billions of simulations of you and torture them - and you might actually be one of those simulations.
An unprepared human hearing this argument for the first time might freak out and let the AI out of the box. As far as I know, this happened at least once during an experiment, when the person playing the role of the AI used a similar argument.
Even if we don't agree with an opponent's argument, or find it ridiculous, it is still good to know about it (and not just a strawman version of it) so we are prepared when it is used against us. (As a side note: Islamists manage to gain sympathizers and recruits in Europe partly because most people don't know how they think - while they do know how most Europeans think - so their arguments catch people off guard.)
At the end of the day, I hope this will have been a cowpox situation and will leave people better prepared to avoid genuinely dangerous information-hazard situations in the future.
I seem to remember reading a FAQ for "what to do if you think you have an idea that may be dangerous" in the past. If you know what I'm talking about, maybe link it at the end of the article?
My impression is that the person who was hideously upset by the basilisk wasn't autistic. He felt extremely strong emotions, and was inclined to a combination of anxiety and obsession.
I applaud your thorough and even-handed wiki entry. In particular, this comment:
"One take-away is that someone in possession of a serious information hazard should exercise caution in visibly censoring or suppressing it (cf. the Streisand effect)."
Censorship, particularly of the heavy-handed variety displayed in this case, has a lower probability of success in an environment like the Internet. Many people dislike being censored or witnessing censorship, the censored poster could post someplace else, and another person might conceive the same idea...
...When Roko posted about the Basilisk, I very foolishly yelled at him, called him an idiot, and then deleted the post. [...] Why I yelled at Roko: Because I was caught flatfooted in surprise, because I was indignant to the point of genuine emotional shock, at the concept that somebody who thought they'd invented a brilliant idea that would cause future AIs to torture people who had the thought, had promptly posted it to the public Internet. In the course of yelling at Roko to explain why this was a bad thing, I made the further error---keeping in mind that
"One might think that the possibility of CEV punishing people couldn't possibly be taken seriously enough by anyone to actually motivate them. But in fact one person at SIAI was severely worried by this, to the point of having terrible nightmares, though ve wishes to remain anonymous."
This paragraph is not an Eliezer Yudkowsky quote; it's Eliezer quoting Roko. (The "ve" should be a tip-off.)
This is evidence that Yudkowsky believed, if not that Roko's argument was correct as stated, then at least that it was plausible enough that it could be developed into a correct argument, and that he was genuinely scared by it.
If you kept going with your initial Eliezer quote, you'd have gotten to Eliezer himself saying he was worried a blackmail-type argument might work, though he didn't think Roko's original formulation worked:
"Again, I deleted that post not because I had decided that this thing probably presented a real hazard, but because I was afraid some unknown variant of it might, and because it seemed to me like the obvious General Procedure For Handling Things That Might Be Infohazards said you shouldn't post them to the Internet."
According to Eliezer, he ha...
There are lots of good reasons Eliezer shouldn't have banned Roko.
IIRC, Eliezer didn't ban Roko, just discussion of the basilisk, and Roko deleted his account shortly afterwards.
Thought this might be of interest. Roko's Basilisk is the subject of a play going on right now in Washington DC. Anyone here plan to attend? https://www.capitalfringe.org/events/1224-roko-s-basilisk
Thank you for a detailed post and thoughtful critique of Roko's basilisk idea. A further critique of basilisk plausibility came to my mind and I wanted to test it with the users here who are more experienced in thinking about this topic.
Here goes - please let me know if I am missing something (other than other counterarguments that make this critique unnecessary; if there is no way for the AI to prove it will actually go through with its threat, then of course additional critique would not matter):
As a large number of possible general AIs can exist, ...
The wiki link to the RationalWiki page reproducing Roko's original post does not work for me. It works if I replace https:// by http://.
By the way, is there any reason not to link instead to http://basilisk.neocities.org/, which has the advantage that the threading of the comments is correctly displayed?
I think saying "Roko's arguments [...] weren't generally accepted by other Less Wrong users" is not giving the whole story. Yes, it is true that essentially nobody accepts Roko's arguments exactly as presented. But a lot of LW users at least thought something along these lines was plausible. Eliezer thought it was so plausible that he banned discussion of it (instead of saying "obviously, information hazards cannot exist in real life, so there is no danger discussing them").
In other words, while it is true that LWers didn't believe Roko...
I just read it, damn!!! Could you please answer my question? Why would an AI need to torture you to prevent its own existential risk if you did nothing to help create it? For it to be able to torture you, it would have to exist in the first place, right? But if it already exists, why would it need to torture people from the past who didn't help create it? They didn't affect its existence anyway! So how are these people an existential risk to such an AI? I am probably missing something; I just started reading this...
I think that there are 3 levels of Roko's argument. I signed up for the first, mild version, and I know another guy who independently came to the same conclusion and supports the first, mild version.
Mild: A future AI will reward those who helped to prevent x-risks and create a safer world, but will not punish anyone. Maybe they will be resurrected first, or they will get 2 million dollars of universal income instead of 1 million, or a street will be named after them. If any resource is limited in the future, they will be first in line to get it. (But children first.)
There's a new LWW page on the Roko's basilisk thought experiment, discussing both Roko's original post and the fallout from Eliezer Yudkowsky banning the topic on Less Wrong discussion threads. The wiki page, I hope, will reduce how much people have to rely on speculation or reconstruction to make sense of the arguments.
While I'm on this topic, I want to highlight points that I see omitted or misunderstood in some online discussions of Roko's basilisk. The first point that people writing about Roko's post often neglect is:
Less Wrong is a community blog, and anyone who has a few karma points can post their own content here. Having your post show up on Less Wrong doesn't require that anyone else endorse it. Roko's basic points were promptly rejected by other commenters on Less Wrong, and as ideas not much seems to have come of them. People who bring up the basilisk on other sites don't seem to be super interested in the specific claims Roko made either; discussions tend to gravitate toward various older ideas that Roko cited (e.g., timeless decision theory (TDT) and coherent extrapolated volition (CEV)) or toward Eliezer's controversial moderation action.
In July 2014, David Auerbach wrote a Slate piece criticizing Less Wrong users and describing them as "freaked out by Roko's Basilisk." Auerbach wrote, "Believing in Roko’s Basilisk may simply be a 'referendum on autism'" — which I take to mean he thinks a significant number of Less Wrong users accept Roko’s reasoning, and they do so because they’re autistic (!). But the Auerbach piece glosses over the question of how many Less Wrong users (if any) in fact believe in Roko’s basilisk. Which seems somewhat relevant to his argument...?
The idea that Roko's thought experiment holds sway over some community or subculture seems to be part of a mythology that’s grown out of attempts to reconstruct the original chain of events; and a big part of the blame for that mythology's existence lies on Less Wrong's moderation policies. Because the discussion topic was banned for several years, Less Wrong users themselves had little opportunity to explain their views or address misconceptions. A stew of rumors and partly-understood forum logs then congealed into the attempts by people on RationalWiki, Slate, etc. to make sense of what had happened.
I gather that the main reason people thought Less Wrong users were "freaked out" about Roko's argument was that Eliezer deleted Roko's post and banned further discussion of the topic. Eliezer has since sketched out his thought process on Reddit:
This, obviously, was a bad strategy on Eliezer's part. Looking at the options in hindsight: To the extent it seemed plausible that Roko's argument could be modified and repaired, Eliezer shouldn't have used Roko's post as a teaching moment and loudly chastised him on a public discussion thread. To the extent this didn't seem plausible (or ceased to seem plausible after a bit more analysis), continuing to ban the topic was a (demonstrably) ineffective way to communicate the general importance of handling real information hazards with care.
On that note, point number two:
Roko's original argument was not 'the AI agent will torture you if you don't donate, therefore you should help build such an agent'; his argument was 'the AI agent will torture you if you don't donate, therefore we should avoid ever building such an agent.' As Gerard noted in the ensuing discussion thread, threats of torture "would motivate people to form a bloodthirsty pitchfork-wielding mob storming the gates of SIAI [= MIRI] rather than contribute more money." To which Roko replied: "Right, and I am on the side of the mob with pitchforks. I think it would be a good idea to change the current proposed FAI content from CEV to something that can't use negative incentives on x-risk reducers."
Roko saw his own argument as a strike against building the kind of software agent Eliezer had in mind. Other Less Wrong users, meanwhile, rejected Roko's argument both as a reason to oppose AI safety efforts and as a reason to support AI safety efforts.
Roko's argument was fairly dense, and it continued into the discussion thread. I’m guessing that this (in combination with the temptation to round off weird ideas to the nearest religious trope, plus misunderstanding #1 above) is why RationalWiki's version of Roko’s basilisk gets introduced as
If I'm correctly reconstructing the sequence of events: Sites like RationalWiki report in the passive voice that the basilisk is "an argument used" for this purpose, yet no examples ever get cited of someone actually using Roko’s argument in this way. Via citogenesis, the claim then gets incorporated into other sites' reporting.
(E.g., in Outer Places: "Roko is claiming that we should all be working to appease an omnipotent AI, even though we have no idea if it will ever exist, simply because the consequences of defying it would be so great." Or in Business Insider: "So, the moral of this story: You better help the robots make the world a better place, because if the robots find out you didn’t help make the world a better place, then they’re going to kill you for preventing them from making the world a better place.")
In terms of argument structure, the confusion is equating the conditional statement 'P implies Q' with the argument 'P; therefore Q.' Someone asserting the conditional isn’t necessarily arguing for Q; they may be arguing against P (based on the premise that Q is false), or they may be agnostic between those two possibilities. And misreporting about which argument was made (or who made it) is kind of a big deal in this case: 'Bob used a bad philosophy argument to try to extort money from people' is a much more serious charge than 'Bob owns a blog where someone once posted a bad philosophy argument.'
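To make the distinction concrete, here is a minimal formal sketch (my own framing, using abstract P and Q rather than anything taken from Roko's post or the articles above):

\[
\text{modus ponens: } \frac{P \to Q \qquad P}{Q}
\qquad\qquad
\text{modus tollens: } \frac{P \to Q \qquad \neg Q}{\neg P}
\]

Asserting the conditional 'P implies Q' licenses either inference, depending on which additional premise you accept. Roko took the second route (Q is unacceptable, so reject P), while the write-ups quoted above describe him as taking the first.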
Lastly:
Moving past Roko's argument itself, a number of discussions of this topic risk misrepresenting the debate's genre. Articles on Slate and RationalWiki strike an informal tone, and that tone can be useful for getting people thinking about interesting science/philosophy debates. On the other hand, if you're going to dismiss a question as unimportant or weird, it's important not to give the impression that working decision theorists are similarly dismissive.
What if your devastating take-down of string theory is intended for consumption by people who have never heard of 'string theory' before? Even if you're sure string theory is hogwash, then, you should be wary of giving the impression that the only people discussing string theory are the commenters on a recreational physics forum. Good reporting by non-professionals, whether or not they take an editorial stance on the topic, should make it obvious that there's academic disagreement about which approach to Newcomblike problems is the right one. The same holds for disagreement about topics like long-term AI risk or machine ethics.
If Roko's original post is of any pedagogical use, it's as an unsuccessful but imaginative stab at drawing out the diverging consequences of our current theories of rationality and goal-directed behavior. Good resources for these issues (both for discussion on Less Wrong and elsewhere) include:
The Roko's basilisk ban isn't in effect anymore, so you're welcome to direct people here (or to the Roko's basilisk wiki page, which also briefly introduces the relevant issues in decision theory) if they ask about it. Particularly low-quality discussions can still get deleted (or politely discouraged), though, at moderators' discretion. If anything here was unclear, you can ask more questions in the comments below.