It seems this type of AI might cheat and effectively set ε = 0 using the following method (a toy sketch follows the list):

  1. create an "insurance company" subagent (see the steps below for what it is and why creating it has utility > ε).
  2. give the insurance company a few utilons (paper clips) to hold as hostages.
  3. set a rule that all future subagents will be able to communicate with the insurance company.
  4. the insurance company will "buy" from subagents any reward that has probability < ε, at roughly its fair price (its expected utility).
  5. subagents will no longer ignore events that happen with probability < ε, because selling the corresponding reward to the insurance company increases their utility with probability 1.
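
Here is a minimal Python sketch of how I read the scheme above; the class and variable names (`Lottery`, `EpsilonAgent`, `InsuranceCompany`, `InsuredAgent`, `EPSILON`) and all numbers are made up for illustration, not taken from the post. It only shows that a subagent able to sell sub-ε lotteries to the insurance company at fair price ends up valuing them as if ε were 0.

```python
class Lottery:
    """A reward of `value` utilons paid out with probability `prob`."""
    def __init__(self, prob, value):
        self.prob, self.value = prob, value

    def fair_price(self):
        return self.prob * self.value


EPSILON = 0.01  # the base agent ignores any event with probability below this


class EpsilonAgent:
    """Base agent: values a sub-epsilon lottery at 0, i.e. ignores it."""
    def value_of(self, lottery):
        return 0.0 if lottery.prob < EPSILON else lottery.fair_price()


class InsuranceCompany:
    """Subagent holding a small reserve of utilons (the 'hostage' paperclips).
    It buys sub-epsilon lotteries from other subagents at their fair price."""
    def __init__(self, reserve):
        self.reserve = reserve
        self.portfolio = []

    def buy(self, lottery):
        price = lottery.fair_price()
        if lottery.prob < EPSILON and price <= self.reserve:
            self.reserve -= price
            self.portfolio.append(lottery)
            return price  # paid to the selling subagent with certainty
        return 0.0


class InsuredAgent(EpsilonAgent):
    """Subagent that sells sub-epsilon lotteries to the insurance company,
    so the reward arrives with probability 1 and is never ignored."""
    def __init__(self, insurer):
        self.insurer = insurer

    def value_of(self, lottery):
        if lottery.prob < EPSILON:
            return self.insurer.buy(lottery)  # certain payment at fair price
        return lottery.fair_price()


insurer = InsuranceCompany(reserve=10.0)  # step 2: a few utilons as hostages
agent = InsuredAgent(insurer)             # step 3: the subagent can reach the insurer
rare = Lottery(prob=1e-6, value=1000.0)   # probability far below epsilon

print(EpsilonAgent().value_of(rare))  # 0.0    -> the base agent ignores it
print(agent.value_of(rare))           # 0.001  -> behaves as if epsilon were 0
```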

Alternatively, for any ε you set, it will be profitable for the AI to create a new version of itself with the same utility function but ε' = ε/2, then give the new AI all the resources it has and commit suicide.
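
The successor argument comes down to a simple limit: if each generation halves the threshold, the effective ε after n generations is ε/2^n, which goes to 0. A trivial sketch (the function name and numbers are only illustrative):

```python
def effective_epsilon(eps, generations):
    """Each successor halves the threshold, so after n generations
    the effective threshold is eps / 2**n, which tends to 0."""
    for _ in range(generations):
        eps /= 2
    return eps

print(effective_epsilon(0.01, 10))   # 9.765625e-06
print(effective_epsilon(0.01, 100))  # ~7.9e-33, effectively zero
```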