Vladimir_Nesov comments on Notion of Preference in Ambient Control - Less Wrong

Post author: Vladimir_Nesov 07 October 2010 09:21PM


Comment author: Tyrrell_McAllister 08 October 2010 02:49:32AM *  0 points [-]

Still commenting while reading:

The agent normally won't even know "explicit values" of actual action and actual outcome. Knowing actual value would break the illusion of consistent consequences: suppose the agent is consistent, knows that A=2, and isn't out of time yet, then it can prove [A=1 => O=100000], even if in fact O=1000, use that moral argument to beat any other with worse promised outcome, and decide A=1, contradiction.

This would only happen if the agent had a rule of inference that allowed it to infer from

  • A=1 => O=100000

and

  • all other promised outcomes are worse than 100000

that

  • A = 1.

But why would the first-order theory use such a rule of inference? You seem to have just given an argument for why we shouldn't put this rule of inference into the theory.

ETA: I guess that my point leads right to your conclusion, and explains it. The agent is built so that, upon deducing the first two bullet-points, the agent proceeds to do the action assigned to the constant symbol 1 by the interpretation. But the point is that the agent doesn't bother to infer the third bullet-point; the agent just acts. As a result, it never deduces any formulas of the form [A=X], which is what you were saying.
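The dynamic described above can be sketched in code. This is a toy illustration of my own (not Nesov's formalism, and all names here are hypothetical): the agent is handed the finite set of "moral arguments" [A = a => O = u] it managed to prove before running out of time, performs the action with the best promised outcome, and stops. Crucially, it never infers a formula of the form [A = a].

```python
# Toy sketch of a proof-search agent (an illustration under simplifying
# assumptions, not the formalism from the post). The proof search itself
# is mocked: we just pass in the already-proven moral arguments.

def agent(proven_arguments):
    """proven_arguments: list of (action, promised_utility) pairs, one per
    proven implication [A = action => O = promised_utility]."""
    # Pick the argument with the best promised outcome.
    best_action, _best_utility = max(proven_arguments, key=lambda au: au[1])
    # Act directly; no formula [A = best_action] is ever deduced.
    return best_action
```

For the example from the quoted passage, a spurious argument promising 100000 for action 1 beats the true argument for action 2: `agent([(1, 100000), (2, 1000)])` returns 1.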

Comment author: Vladimir_Nesov 08 October 2010 07:26:07AM 0 points [-]

The agent never proves A=1, but it does (by assumption) prove A=2. In fact, however, the agent acts as A=1 (without proving it), so A=1 is true in the standard model, while the agent's theory asserts that A=2 holds in the standard model. That means the agent's theory is inconsistent, which contradicts the assumption that it's consistent.

Comment author: Tyrrell_McAllister 08 October 2010 05:50:02PM *  0 points [-]

The agent never proves A=1.

Okay. I was confused because you wrote that the agent "can . . . decide A=1", rather than "will do action 1."

Is there a rule of inference in the system such that the first two bullet points above entail the third within the system? I see that [A=1] is true in the standard model, but the agent's theory isn't complete in general. So why should we expect that [A=1] is a theorem of the agent's theory? (ETA1: Okay, I think that your comment here explains why [A=1] must be a theorem if the agent actually does 1. But I still need to think it through.)

(ETA2: Now that I better understand the axiom that defines A in the theory, I see why [A=1] must be a theorem if the agent actually does 1.)

Also, it seems that your proof only goes through if the agent "knows that A=2" when the agent will not in fact do action v(2) (which directly contradicts soundness). But if the agent knows [A=X], where v(X) is the actual action, then all we can conclude is that the agent declines to observe my first two bullet points above in a way that would induce it to do v(1). (Here, v(X) is the interpretation of the constant symbol X under the standard model.)

Comment author: Vladimir_Nesov 08 October 2010 06:35:01PM *  0 points [-]

Is there a rule of inference in the system such that the first two bullet points above entail the third within the system? I see that [A=1] is true in the standard model, but the agent's theory isn't complete in general. So why should we expect that [A=1] is a theorem of the agent's theory?

Every recursive function is representable in Robinson's arithmetic Q; that is, for any (say, 1-ary) recursive function F, there is a formula w such that F(n)=m => Q|- w(n,m) and F(n)<>m => Q|- ~w(n,m). Hence, statements like this that hold in the standard model also hold in any model.
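In the usual notation, with the bar denoting the numeral for a number, this (strong) representability claim can be written out as:

```latex
% Strong representability of a recursive function F in Robinson arithmetic Q:
% there is a formula w(x, y) such that, for all natural numbers n and m,
\begin{align*}
F(n) = m    \;&\Longrightarrow\; Q \vdash w(\bar{n}, \bar{m}),\\
F(n) \neq m \;&\Longrightarrow\; Q \vdash \neg\, w(\bar{n}, \bar{m}).
\end{align*}
```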

That the agent doesn't avoid looking for further moral arguments to prove is reflected in the "isn't out of time" condition, which is not formalized and is listed as the first open problem. If in fact A=X, then the agent can't prove ludicrous moral arguments with A=X as the premise (only the actual utility), but it clearly can prove arguments that beat the true one by using a false premise.

Comment author: Tyrrell_McAllister 08 October 2010 07:29:37PM 0 points [-]

Every recursive function is representable in Robinson's arithmetic Q; that is, for any (say, 1-ary) recursive function F, there is a formula w such that F(n)=m => Q|- w(n,m) and F(n)<>m => Q|- ~w(n,m). Hence, statements like this that hold in the standard model also hold in any model.

I think that I need to see this spelled out more. I take it that, in this case, your formula w is w(A,X) = [A=X]. What is your recursive function F?

Comment author: Vladimir_Nesov 08 October 2010 07:41:31PM *  0 points [-]

We represent the no-arguments agent program agent() using a formula w with one free variable, such that agent()=n => Q|- w(n) and agent()<>n => Q|- ~w(n). The actual action is defined by the axiom w(A), where A is a new constant symbol.
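Spelled out in display form (my transcription of the comment's notation into standard symbols), since agent() takes no arguments, w has a single free variable:

```latex
% The no-argument program agent() is represented in Q by a formula w(x):
\begin{gather*}
\mathrm{agent}() = n    \;\Longrightarrow\; Q \vdash w(\bar{n}),
\qquad
\mathrm{agent}() \neq n \;\Longrightarrow\; Q \vdash \neg\, w(\bar{n}),\\
\text{axiom defining the actual action: } w(A),
\quad A \text{ a new constant symbol.}
\end{gather*}
```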

Comment author: Tyrrell_McAllister 09 October 2010 12:12:57AM *  0 points [-]

Okay, I'm convinced. In case it helps someone else, here is how I now understand your argument.

We have an agent implemented as a computer program. Because the agent is a computer program, and because our 1st-order logic can handle recursion, we can write down a wff [w(x)], with one free variable x, such that, for any action-constant X, [w(X)] is a theorem if and only if the agent does v(X).

[ETA: Vladimir points out that it isn't handling recursion alone that suffices. Nonetheless, a theory like PA or ZFC is apparently powerful enough to do this. I don't yet understand the details of how this works, but it certainly seems very plausible to me.]

In particular, if the agent goes through a sequence of operations that conclude with the agent doing v(X), then that sequence of operations can be converted systematically into a proof of [w(X)]. Conversely, if [w(X)] is a theorem, then it has a proof that can be reinterpreted as the sequence of operations that the agent will carry out, and which will conclude with the agent doing v(X).

The wff [w(x)] also has the property that, given two constants U and V, [w(U) & w(V)] entails [U = V].

Now, the agent's axiom system includes [w(A)], where A is a constant symbol. Thus, if the agent does v(X), then [w(A)] and [w(X)] are both theorems, so we must have that [A=X] is a theorem.
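The steps above can be compressed into one short derivation (my reconstruction, using the representability and uniqueness properties already stated):

```latex
% Assume the agent actually does v(X). Then, in the agent's theory:
\begin{align*}
&\vdash w(X)                       && \text{(representability of the agent program)}\\
&\vdash w(A)                       && \text{(the axiom defining } A\text{)}\\
&w(U) \wedge w(V) \vdash U = V     && \text{(uniqueness property of } w\text{)}\\
&\vdash A = X                      && \text{(combining the three lines above)}
\end{align*}
```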

Comment author: Vladimir_Nesov 09 October 2010 08:42:43AM *  0 points [-]

This works (although "because our 1st-order logic can handle recursion" is not it, etc.). (Note that "[w(X)] is a theorem of T if and only if X is as we need" is weak T-representability, while I cited the strong kind, that also guarantees [~w(X)] being a theorem if X is not as we need.)

Comment author: Tyrrell_McAllister 09 October 2010 03:05:43PM 0 points [-]

(although "because our 1st-order logic can handle recursion" is not it, etc.).

That was there because of this line from your post: "The theory should provide sufficient tools to define recursive functions and/or other necessary concepts."

Comment author: Vladimir_Nesov 09 October 2010 03:12:06PM *  0 points [-]

That doesn't make it a sufficient explanation for the conclusion you drew: I object to your use of "because".