Everyone loves a thought experiment, from Maxwell’s demon to the classic bootstrap paradox. But there is one thought experiment – briefly banned by the Internet forum where it was first posted – which you might regret reading about, known as “Roko’s basilisk”.
Basilisks, as anyone familiar with ancient folklore or Harry Potter will know, are mythical reptiles which can kill people just by looking them in the eyes. Roko’s basilisk is named after the creature because, according to the thought experiment itself, merely hearing about it makes you more likely to suffer negative consequences.
For this reason, among others, the thought experiment was banned from LessWrong, where it was first posted.
So, what exactly is it? The idea, proposed by LessWrong user Roko, has its roots in game theory, and in particular the prisoner’s dilemma. In the prisoner’s dilemma, two prisoners facing jail time are each offered the chance to go free if they flip on their fellow prisoner. But there are a few other possible outcomes.
If they both flip, they will each go to jail for two years. If one flips on the other, they will go free while the other gets three years. If they both remain silent, they will receive one year in jail each. If you are in that situation, should you choose to betray your fellow prisoner, or stay silent?
Rationally, it makes sense to betray your fellow prisoner. If you flip, you will either go free or serve a two-year sentence instead of three. Unfortunately, it also makes sense for the other prisoner to betray you, and so the best joint outcome, one year each for staying silent, is taken off the table.
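To make that reasoning concrete, here is a minimal Python sketch (the numbers are simply the jail terms described above, and the choice names are illustrative) showing that flipping is the better move no matter what the other prisoner does:

```python
# Prisoner's dilemma payoffs from the scenario above: years in jail for
# (my_choice, their_choice); lower is better. "flip" = betray, "silent" = cooperate.
YEARS = {
    ("flip", "flip"): 2,      # both betray: two years each
    ("flip", "silent"): 0,    # I betray, they stay silent: I go free
    ("silent", "flip"): 3,    # I stay silent, they betray: three years for me
    ("silent", "silent"): 1,  # both stay silent: one year each
}

def best_response(their_choice):
    """Return the choice that minimises my jail time, given the other prisoner's choice."""
    return min(("flip", "silent"), key=lambda mine: YEARS[(mine, their_choice)])

# Whatever the other prisoner does, flipping is never worse for me:
for theirs in ("flip", "silent"):
    print(f"If they choose {theirs!r}, my best response is {best_response(theirs)!r}")
```

That "always flip" logic is exactly what locks both prisoners into the worse, two-years-each outcome.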
Philosophers and game theorists have argued about how you should act in the prisoner’s dilemma, and whether a good outcome can be achieved. This is especially relevant for people attempting to design autonomous artificial intelligence (AI) agents, who want those agents’ decision-making to produce the best possible outcomes. In short, if we get true AI, we need it to make rational decisions that lead to better outcomes, not worse.
One way that LessWrong’s founder suggested would lead to a favorable outcome is if two identical AI agents were playing the same game, each knowing that the other was running the same decision-making program. Each AI would use timeless decision theory (TDT), under which “agents should decide as if they are determining the output of the abstract computation that they implement”.
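As a toy illustration of that idea (this is only a sketch, not the actual TDT formalism), an agent that knows its opponent runs the exact same decision program can reason that both programs will output the same choice, and so it only needs to compare the two symmetric outcomes:

```python
# Toy sketch: each agent knows the other runs an identical decision program,
# so whatever this computation outputs, the other agent's computation outputs too.
# The agent therefore only weighs the two symmetric outcomes.
YEARS = {
    ("flip", "flip"): 2,      # both betray
    ("silent", "silent"): 1,  # both stay silent
}

def identical_agents_decide():
    """Pick the choice that is best when mirrored by an identical agent."""
    return min(("flip", "silent"), key=lambda choice: YEARS[(choice, choice)])

print(identical_agents_decide())  # prints "silent"
```

Under that assumption, mutual silence falls out as the rational choice, recovering the cooperative outcome that ordinary reasoning throws away.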
In Roko’s thought experiment, similarly rational decisions lead to terrible consequences. Roko imagines that a “positive singularity” will exist some time in the future, in which AI has surpassed humanity but still acts in humanity’s interests. Because this AI is attempting to protect humanity, for example from existential threats, it may impose negative consequences on those who did not try to avert those threats.
“In this vein, there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn’t give 100 percent of their disposable incomes to x-risk motivation,” Roko wrote. “This would act as an incentive to get people to donate more to reducing existential risk, and thereby increase the chances of a positive singularity.”
More than this, the AI may choose to retroactively punish anyone who knew about the future AI (the basilisk) but failed to do what they could in order to bring it into existence.
“By merely entertaining the idea of such a being and not facilitating its development you would expose yourself to the possibility that it would deduce that you had not acted in accordance with the duty to bring it into existence (the moralistic tone of the experiment is enforced by the fact that the AI is paradoxically a benevolent one whose task is to protect humankind, and therefore those who don’t facilitate its existence desire ill against their fellow men),” philosopher Isabel Millar explained in her thesis on the psychoanalysis of AI.
“The vengeful Abrahamic nature of the Basilisk meant that in future, it could recreate a simulation of you to torture for all eternity for the sin of putting him at existential risk. The Old Testament stylings of the Basilisk are clear: he’s nice, but only if you deserve it.”
What’s more, according to Roko, the AI may reserve worse punishments for those who knew about it but failed to act than for those who knew nothing about it at all. So simply by learning about the basilisk, you would be condemned to harsher punishment for failing to help bring about the positive singularity.
The argument sounds a little silly, but when it was posted it caused quite a stir.
“One might think that the possibility of [coherent extrapolated volition of humanity] punishing people couldn’t possibly be taken seriously enough by anyone to actually motivate them. But in fact one person at [the Singularity Institute for Artificial Intelligence] was severely worried by this, to the point of having terrible nightmares, though ve wishes to remain anonymous. I don’t usually talk like this, but I’m going to make an exception for this case,” the founder of LessWrong, Eliezer Yudkowsky, replied in the comments.
“Listen to me very closely, you idiot. YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.”
After this, Roko’s post was removed, and discussion of the basilisk was banned from the forum for several years. Roko himself went on to regret posting it.
“Look, you have three people all of whom think it is a bad idea to spread this. All are smart. Two initially thought it was OK to spread it,” Roko wrote.
“I would add that I wish I had never learned about any of these ideas. In fact, I wish I had never come across the initial link on the internet that caused me to think about transhumanism and thereby about the singularity; I wish very strongly that my mind had never come across the tools to inflict such large amounts of potential self-harm with such small durations of inattention, uncautiousness and/or stupidity, even if it is all premultiplied by a small probability. (not a very small one, mind you. More like 1/500 type numbers here). If this is not enough warning to make you stop wanting to know more, then you deserve what you get.”
While the idea clearly scared some people, it is a little silly to worry about in a literal sense. An AI is unlikely to punish you for failing to create it sooner, especially given the extra resources that following through on the retroactive blackmail would cost it. But the thought experiment does highlight real problems in AI design and game theory, and the importance of getting those decisions right if we are ever to create a singularity.
On the other hand, if a singularity did happen, the AI could just as easily wipe us out to produce paperclips, so maybe being punished by a vengeful basilisk isn’t as bad a fate as it seems.