Squark comments on Identity and quining in UDT - Less Wrong

9 Post author: Squark 17 March 2015 08:01PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (26)

You are viewing a single comment's thread. Show more comments above.

Comment author: Squark 18 March 2015 08:13:17AM *  3 points [-]

Hi KnaveOfAllTrades, thx for commenting!

Your proton beam problem is not decision-determined in the sense of Yudkowsky 2010. That is, it depends directly on the decision algorithm rather than depending only on the decisions it makes. Indeed it is impossible to come up with a decision theory that is optimal for all problems, but it might be possible to come up with a decision theory that is optimal for some reasonable class of problems (like decision-determined problems).

Now, there is a decision-determined version of your construction. Consider the following "diagonal" problem. The agent makes a decision out of some finite set. Omega runs a simulation of XDT and penalizes the agent iff its decision is equal to XDT's decision.

This is indeed a concern for deterministic agents. However, if we allow the decision algorithm access to an independent source of random bits, the problem is avoided. XDT produces all answers distributed uniformly and gets optimal payoff.

Comment author: V_V 18 March 2015 02:14:25PM 4 points [-]

This is indeed a concern for deterministic agents. However, if we allow the decision algorithm access to an independent source of random bits, the problem is avoided. XDT produces all answers distributed uniformly and gets optimal payoff.

Omega can predict the probability distribution on the agent answers, and it is possible to construct a Newcomb-like problem that penalizes any specific distribution.

Comment author: Squark 18 March 2015 03:09:37PM *  2 points [-]

Hi V_V, thx for commenting!

In a Newcomb-like problem, Omega copies the agent and runs it many times to measure the decision distribution. It then penalizes the agent if the distribution matches an explicitly given one. Such a problem is easily solved by the agent since it knows which distribution to avoid.

In an anti-Newcomb-like problem, Omega measures the distribution produced by XDT and compares it with the agent's decision. It then penalizes the agent according to the likelihood of its decision in the distribution. However, if XDT produces a uniform distribution, all agents do equally well.

A trickier diagonalization is a hybrid Newcomb-anti-Newcomb (NAN) problem. Omega copies the agent and runs it multiple times to measure the distribution. It then compares the result with a similar procedure applied to XDT and penalizes the agent if there is a close match.

Now, the NAN diagonalization can be solved if we assume the agent has access to random bits which are common between all of its copies but inaccessible by the rest of the universe (Omega). This assumption can be interpreted to mean the precursor gives the agent a randomly generated "password" before Omega copies it.

Comment author: V_V 18 March 2015 06:04:43PM 5 points [-]

Now, the NAN diagonalization can be solved if we assume the agent has access to random bits which are common between all of its copies but inaccessible by the rest of the universe (Omega). This assumption can be interpreted to mean the precursor gives the agent a randomly generated "password" before Omega copies it.

Sure, but this defeats the purpose of Omega being a predictor.

Comment author: Squark 19 March 2015 05:52:23PM 1 point [-]

Not really. The idea is looking for a decision theory that performs optimally a natural class of problems which is as wide as possible. CDT is optimal for action-determined problems, UDT is supposed to be optimal for decision-determined problems. Allowing this use of randomness still leaves us with a class of problems much wider than action-determined: for example the Newcomb problem!