A visual depiction of a prisoner's dilemma. T denotes the best outcome for a given player, followed by R, then P, then S.
One example of a Newcomblike problem is the prisoner's dilemma. This is a two-player game in which each player has two options: "cooperate" or "defect." By assumption, each player prefers to defect rather than cooperate, all else being equal; but each player also prefers mutual cooperation over mutual defection.
One of the basic open problems in decision theory is that standard "rational" agents will end up defecting against each other, even though it would be better for both players if they could somehow enact a binding mutual agreement to cooperate instead.
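The dominance reasoning behind mutual defection can be made concrete with a small sketch (mine, not the article's). The numeric payoffs below are arbitrary placeholders that only need to satisfy the ordering T > R > P > S used in the figure above:

```python
# Hypothetical payoff values satisfying T > R > P > S (temptation, reward,
# punishment, sucker's payoff); only the ordering matters for the argument.
T, R, P, S = 3, 2, 1, 0

# payoffs[(my_move, their_move)] -> my payoff
payoffs = {
    ("defect",    "cooperate"): T,
    ("cooperate", "cooperate"): R,
    ("defect",    "defect"):    P,
    ("cooperate", "defect"):    S,
}

def causal_best_response(their_move):
    """Pick the move that maximizes my payoff, holding the other player's move fixed."""
    return max(["cooperate", "defect"], key=lambda m: payoffs[(m, their_move)])

# Defection dominates: it is the best response whatever the other player does...
assert causal_best_response("cooperate") == "defect"
assert causal_best_response("defect") == "defect"

# ...yet mutual defection is worse for both players than mutual cooperation.
assert payoffs[("cooperate", "cooperate")] > payoffs[("defect", "defect")]
```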
In other words, the standard formulation of CDT cannot model scenarios where another agent (or a part of the environment) is correlated with a decision process, except insofar as the decision causes the correlation. The general name for scenarios where CDT fails is "Newcomblike problems," and these scenarios are ubiquitous in human interactions.
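As a hedged illustration of the correlation point (again mine, not the article's), consider the "twin" variant of the prisoner's dilemma, in which the other player is an exact copy guaranteed to choose the same move. A CDT-style comparison holds the twin's move fixed and still recommends defection, even though the correlation guarantees that both copies end up with the worse mutual-defection outcome:

```python
# Same hypothetical T > R > P > S placeholder payoffs as in the sketch above.
T, R, P, S = 3, 2, 1, 0
payoffs = {
    ("defect",    "cooperate"): T,
    ("cooperate", "cooperate"): R,
    ("defect",    "defect"):    P,
    ("cooperate", "defect"):    S,
}

def cdt_choice():
    """CDT treats the twin's move as causally independent of mine, so it
    compares my options while holding the twin's move fixed; defection wins
    against either fixed move, so CDT defects."""
    best_vs = {o: max(["cooperate", "defect"], key=lambda m: payoffs[(m, o)])
               for o in ("cooperate", "defect")}
    assert set(best_vs.values()) == {"defect"}  # defection dominates
    return "defect"

def twin_outcome(my_move):
    """But the twin is a perfect copy: whatever I play, it plays the same."""
    return payoffs[(my_move, my_move)]

# CDT's choice yields mutual defection (P), which is worse than the mutual
# cooperation (R) an agent that respects the correlation would obtain.
assert twin_outcome(cdt_choice()) < twin_outcome("cooperate")
```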
Yudkowsky's interest in decision theory stems from his interest in the AI control problem: "If artificially intelligent systems someday come to surpass humans in intelligence, how can we specify safe goals for them to autonomously carry out, and how can we gain high confidence in the agents' reasoning and decision-making?" Yudkowsky has argued that in the absence of a full understanding of decision theory, we risk building autonomous systems whose behavior is erratic or difficult to model.
Because Eliezer Yudkowsky founded Less Wrong and was one of the first bloggers on the site, AI theory and "acausal" decision theories (in particular, logical decision theories, which respect logical connections between agents' properties rather than just the causal effects they have on each other) have been repeatedly discussed on Less Wrong. Roko's basilisk was an attempt to use Yudkowsky's proposed decision theory (TDT) to argue against his informal characterization of an ideal AI goal (humanity's coherently extrapolated volition).
A simple depiction of an agent that cooperates with copies of itself in the one-shot prisoner's dilemma. Adapted from the Decision Theory FAQ.
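One minimal way to sketch such an agent, assuming the usual "compare the opponent's source code to my own" construction (the FAQ's actual example may differ in detail), is a program that cooperates exactly when its opponent is an identical copy:

```python
import inspect

def clique_bot(opponent_source: str) -> str:
    """Cooperate only if the opponent is an exact copy of this agent.

    Two copies of this program, each given the other's source code, both
    cooperate in a one-shot prisoner's dilemma; against any other program,
    this agent defects. (Run from a file so inspect can read the source.)
    """
    my_source = inspect.getsource(clique_bot)
    return "cooperate" if opponent_source == my_source else "defect"

# Two identical copies cooperate with each other:
assert clique_bot(inspect.getsource(clique_bot)) == "cooperate"
# Against a different program, the agent defects:
assert clique_bot("def defect_bot(opponent_source): return 'defect'") == "defect"
```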
Roko observed that if two TDT or UDT agents with common knowledge of each other's source code are separated in time, the later agent can (seemingly) blackmail the earlier agent. Call the earlier agent "Alice" and the later agent "Bob." Bob can be an algorithm that outputs things Alice likes if Alice left Bob a large sum of money, and outputs things Alice dislikes otherwise. And since Alice knows Bob's source code exactly, she knows this fact about Bob (even though Bob hasn't been born yet). So Alice's knowledge of Bob's source code makes Bob's future threat effective, even though Bob doesn't yet exist: if Alice is certain that...
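The structure of this observation can be sketched as a toy model (hypothetical names and placeholder payoffs of my choosing, not anything from the original post): Bob's policy is a pure function of whether Alice paid, and Alice, who can run Bob's source code, evaluates her options by simulating his response to each of them.

```python
# Toy model of the Alice/Bob setup; the payoff numbers are placeholders
# chosen only to make the blackmail incentive visible.

def bob_policy(alice_paid: bool) -> str:
    """Bob's source code: produce outcomes Alice likes if she paid,
    outcomes she dislikes otherwise."""
    return "reward" if alice_paid else "punish"

def alice_utility(paid: bool) -> float:
    cost_of_paying = 10 if paid else 0
    # Alice knows Bob's exact source code, so she can simulate his response
    # to each of her possible choices before Bob even exists.
    outcome = bob_policy(alice_paid=paid)
    value_of_outcome = 0 if outcome == "reward" else -100
    return -cost_of_paying + value_of_outcome

# Because Alice can predict Bob perfectly, the not-yet-existing Bob's
# conditional threat already changes which of her options looks best.
best_choice = max([True, False], key=alice_utility)
assert best_choice is True  # Alice pays
```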
What about a similar AI that helps anyone who tries to bring it into existence and does nothing to other people?
Roko's basilisk is a thought experiment proposed in 2010 by the user Roko on the Less Wrong community blog. Roko used ideas in decision theory to argue that a sufficiently powerful AI agent would have an incentive to torture anyone who imagined the agent but didn't work to bring the agent into existence. The argument was called a "basilisk" (named after the legendary reptile who can cause death with a single glance) because merely hearing the argument would supposedly put you at risk of torture from this hypothetical agent. A basilisk in this context is any information that harms or endangers the people who hear it.
Roko's argument was broadly rejected on Less Wrong, with commenters objecting that an agent like the one Roko was describing would have no real reason to follow through on its threat: once the agent already exists, it will by default just see it as a waste of resources to torture people for their past decisions, since this doesn't causally further its plans. A number of decision algorithms can follow through on acausal threats and promises, via the same precommitment methods that permit mutual cooperation in prisoner's dilemmas; but this doesn't imply that such algorithms can be blackmailed. And following through on blackmail threats against such an algorithm additionally requires a large amount of shared information and trust between the agents, which does not appear to exist in the case of Roko's basilisk.