Squark comments on Overcoming the Loebian obstacle using evidence logic - Less Wrong

Post author: Squark 14 March 2014 06:34PM




Comment author: Squark 15 March 2014 05:24:29PM 1 point

> I suppose she could reason that if she can't prove that her formal system is consistent, then she can't trust that the actions she can prove to be "good" are actually "good". Or, even if she trusts her present self, she can't trust her future selves...

Lucy works by trying to prove a given action is "good" and doing it if she's successful. She doesn't need to prove that "there is no proof A is not good" in order to do A. However, it's true that she can't trust her future selves. Indeed, this can be a problem even without self-modifications. For example, if someone drops an anvil on Lucy, Lucy cannot reason that "I should dodge the anvil because if I get destroyed I won't do good things in the future". She can still have a hard-coded aversion to being destroyed, but this seems like a hack.
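To make the decision rule concrete, here is a minimal toy sketch of it in Python. The names (`choose_action`, `provable`) and the stand-in "proof search" are hypothetical illustrations, not anything from the original post; a real agent would run a theorem prover over a formal system, not a set-membership check.

```python
def provable(theory, sentence):
    """Crude stand-in for bounded proof search in Lucy's formal system.

    Here 'theory' is just a set of sentences taken as axioms, and a sentence
    counts as provable iff it is literally one of them."""
    return sentence in theory

def choose_action(theory, candidate_actions):
    """Return the first action Lucy can prove is "good", or None.

    Note the asymmetry described above: Lucy only needs a proof of
    good(A) to do A; she never needs to prove the *absence* of a proof
    that A is not good."""
    for action in candidate_actions:
        if provable(theory, f"good({action})"):
            return action
    return None  # no action is provably good, so Lucy does nothing

axioms = {"good(help_human)"}
print(choose_action(axioms, ["dodge_anvil", "help_human"]))  # help_human
```

The anvil problem shows up here directly: unless `good(dodge_anvil)` is provable on its own, the loop skips dodging, even though dodging is instrumentally necessary for future good actions.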

> ...Generalize this to arbitrary problems and you get UDT or something alike.

UDT assumes a system of assigning probabilities to mathematical sentences given a (possibly inconsistent) theory. For UDT to admit naturalistic trust, it needs that system to overcome the Loebian obstacle, like the system I propose here. Indeed the system I propose is intended as a component of UDT or UIM.

Comment author: V_V 15 March 2014 09:32:29PM 0 points

> "I should dodge the anvil because if I get destroyed I won't do good things in the future". She can still have a hard-coded aversion to being destroyed, but this seems like a hack.

I think it's worse than that. If your argument is correct, the type of AI you are describing can't plan because it can't trust its future selves to follow through with the plan, even if doing so wouldn't require commitment. It would be limited to immediate utility maximization.

It seems that your argument proves too much. In fact, I think that Lucy can still prove useful properties about her successors even without proving the consistency of their, and therefore her own, proof system.

> UDT assumes a system of assigning probabilities to mathematical sentences given a (possibly inconsistent) theory. For UDT to admit naturalistic trust, it needs that system to overcome the Loebian obstacle, like the system I propose here. Indeed the system I propose is intended as a component of UDT or UIM.

That's not how UDT works in my understanding, though I should admit that I'm not an expert on the subject. Do you have a reference?

Comment author: Squark 16 March 2014 07:37:32AM 0 points

> I think it's worse than that. If your argument is correct, the type of AI you are describing can't plan because it can't trust its future selves to follow through with the plan, even if doing so wouldn't require commitment.

We can avoid this problem if Lucy performs an action only when it is the first step in a provably "good" sequence of actions. This would allow her to dodge the anvil if it interferes with her immediate plans, but not on the general grounds of "a universe with Lucy is a better universe since Lucy is doing good things".
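A toy sketch of this fix, under the same hypothetical set-membership stand-in for proof search as before (names are illustrative, not from the post): instead of proving a single action good, Lucy searches for a provably good sequence of actions and commits only to its first step.

```python
def provable(theory, sentence):
    """Crude stand-in for proof search: provable iff literally an axiom."""
    return sentence in theory

def first_step_of_good_plan(theory, candidate_plans):
    """Return the first action of the first plan that is provably good
    as a whole sequence, or None if no such plan is found."""
    for plan in candidate_plans:
        if provable(theory, f"good({','.join(plan)})"):
            return plan[0]  # commit only to the plan's first step
    return None

# Dodging is justified here only as step one of a good plan, not by the
# general argument "my survival makes the universe better".
axioms = {"good(dodge_anvil,help_human)"}
plans = [["help_human"], ["dodge_anvil", "help_human"]]
print(first_step_of_good_plan(axioms, plans))  # dodge_anvil
```

Note that after acting, Lucy re-runs the search rather than blindly executing the rest of the plan, so she still never needs to trust her future selves outright.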

> That's not how UDT works in my understanding, though I should admit that I'm not an expert on the subject. Do you have a reference?

I don't have a reference which discusses UDT and the Loebian obstacle together. You can find a description of the AFAIK "latest and greatest" UDT here. UDT considers proofs in a formal system. If this system suffers from the Loebian obstacle, this will lead to the kind of problems I discuss here. In fact, though I haven't stated it explicitly, I think of Lucy as a UDT agent: she considers possible actions as logical counterfactuals and computes expected utility based on that.
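The counterfactual reasoning described above can be sketched as toy code (hedged: the names, sentence syntax, and utility levels are hypothetical illustrations, not the actual algorithm from the linked post). The agent searches for proofs of sentences of the form "if I output action a, utility is at least u", and picks the action with the best provable utility bound.

```python
def provable(theory, sentence):
    """Crude stand-in for proof search in a fixed formal theory."""
    return sentence in theory

def udt_choose(theory, actions, utility_levels):
    """Pick the action with the highest provable utility lower bound.

    Each candidate action is treated as a logical counterfactual: we look
    for proofs of "agent()==a -> utility>=u" rather than simulating the
    world directly."""
    best_action, best_u = None, float("-inf")
    for a in actions:
        for u in sorted(utility_levels, reverse=True):
            if provable(theory, f"agent()=={a} -> utility>={u}"):
                if u > best_u:
                    best_action, best_u = a, u
                break  # highest provable bound for this action found
    return best_action

theory = {"agent()==dodge -> utility>=10", "agent()==stand -> utility>=0"}
print(udt_choose(theory, ["stand", "dodge"], [0, 10]))  # dodge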