Wei_Dai comments on UDT agents as deontologists - Less Wrong

Post author: Tyrrell_McAllister 10 June 2010 05:01AM




Comment author: Wei_Dai 11 June 2010 02:08:06PM 1 point

No, you can't represent an interactive strategy by a single input to output mapping.

Why not?

BTW, in UDT1.1 (as well as UDT1), "input" consists of the agent's entire memory of the past as well as its current perceptions. Thought I'd mention that in case there's a misunderstanding there.

Comment author: Vladimir_Nesov 11 June 2010 05:21:28PM * 0 points

... okay, this question allowed me to make a bit of progress. Taking as a starting point the setting of this comment (that we are estimating the probability of (A~B => X~Y) being true, where A and X are respectively the agent's and the environment's programs, and B and Y are programs representing the agent's strategy and the environment's outcome), and the observations made here and here, we get a scheme for local decision-making.

Instead of trying to decide the whole strategy, we can just decide the local action. The agent program, together with the "input" consisting of observations and memories, makes up a description of where the agent is in the environment, and thus of where its control will be applied. The action the agent considers can then be local, just something the agent does at this very moment, and the alternatives for this action are alternative statements about the agent. So instead of considering a statement A~B for the agent's program A and various whole strategies B, we consider just predicates like action1(A) and action2(A), which assert that A chooses action 1 or action 2 in this particular situation, and which assert nothing else about its behavior in other situations or on other counterfactuals. Taking into account other actions that the agent might have to make in the past or in the future happens automatically, because the agent works with a complete description of the environment, even if under severe logical uncertainty. Thus, decision-making happens "one bit at a time", and the agent's strategy mostly exists in the environment, not under any direct control by the agent, but still controlled in the same sense that everything in the environment is.

Thus, in the simplest case of a binary local decision, mathematical intuition would take only a single bit as explicit argument, indicating which assertion is being made about [the agent's program together with memory and observations], and that is all. No maps, no untyped strategies.

This solution was unavailable to me when I thought about explicit control, because there the agent has to coordinate with itself, relying on what it can in fact decide in other situations rather than on what it should optimally decide. But it's a natural step in the setting of ambient control, because the incorrect counterfactuals are completely banished from consideration, and the environment describes what the agent will actually do on other occasions.

Going back to the post explicit optimization of global strategy, the agent doesn't need to figure out the global strategy! Each of the agent copies is allowed to make the decision locally, while observing the other copy as part of the environment (in fact, it's the same problem as "general coordination problem" I described on the DT list, back when I was clueless about this approach).

Comment author: Wei_Dai 11 June 2010 05:44:51PM 0 points

Each of the agent copies is allowed to make the decision locally, while observing the other copy as part of the environment

Well, that was my approach in UDT1, but then I found a problem that UDT1 apparently can't solve, so I switched to optimizing over the global strategy (and named that UDT1.1).

Can you re-read explicit optimization of global strategy and let me know what you think about it now? What I called "logical correlation" (using Eliezer's terminology) seems to be what you call "ambient control". The point of that post was that it seems an insufficiently powerful tool for even two agents with the same preferences to solve the general coordination problem amongst themselves, if they only explicitly optimize the local decision and depend on "logical correlation"/"ambient control" to implicitly optimize the global strategy.

If you think there is some way to get around that problem, I'm eager to hear it.

Comment author: Vladimir_Nesov 11 June 2010 06:09:34PM * 0 points

So far as I can see, your mistake was assuming "symmetry" and dropping probabilities. There is no symmetry: only one of the possibilities is what will actually happen, and the other (which I'm back to believing since the last post on the DT list) is inconsistent, though you are unlikely to be able to actually prove any such inconsistency. You can't say that since (S(1)=A => S(2)=B), therefore (S(1)=B => S(2)=A). One of the counterfactuals is inconsistent, so if S(1) is in fact A, then S(1)=B implies anything. But what you are dealing with are the probabilities of these statements (which possibly means proof-search schemes trying to prove these statements while making a certain number of elementary assumptions, a number that plays the role that program length plays in the universal probability distribution). These probabilities will paint a picture of what you expect the other copy to do, depending on what you do, and this doesn't at all have to be symmetric.

Comment author: Wei_Dai 11 June 2010 07:27:41PM 0 points

If there is to be no symmetry between "S(1)=A => S(2)=B" and "S(1)=B => S(2)=A", then something in the algorithm has to treat the two cases differently. In UDT1 there is no such thing to break the symmetry, as far as I can tell, so it would treat them symmetrically and fail on the problem one way or another. Probabilities don't seem to help since I don't see why UDT1 would assign them different probabilities.

If you have an idea how the symmetry might be broken, can you explain it in more detail?

Comment author: Tyrrell_McAllister 11 June 2010 08:48:02PM * 0 points

I think that Vladimir is right if he is saying that UDT1 can handle the problem in your Explicit Optimization of Global Strategy post.

With your forbearance, I'll set up the problem in the notation of my write-up of UDT1.

There is only one world-program P in this problem. The world-program runs the UDT1 algorithm twice, feeding it input "1" on one run, and feeding it input "2" on the other run. I'll call these respective runs "Run1" and "Run2".

The set of inputs for the UDT1 algorithm is X = {1, 2}.

The set of outputs for the UDT1 algorithm is Y = {A, B}.

There are four possible execution histories for P:

  • E, in which Run1 outputs A, Run2 outputs A, and each gets $0.

  • F, in which Run1 outputs A, Run2 outputs B, and each gets $10.

  • G, in which Run1 outputs B, Run2 outputs A, and each gets $10.

  • H, in which Run1 outputs B, Run2 outputs B, and each gets $0.

The utility function U for the UDT1 algorithm is defined as follows:

  • U(E) = 0.

  • U(F) = 20.

  • U(G) = 20.

  • U(H) = 0.

Now we want to choose a mathematical intuition function M so that Run1 and Run2 don't give the same output. This mathematical intuition function does have to satisfy a couple of constraints:

  • For each choice of input x and output y, the function M(x, y, –) must be a normalized probability distribution on {E, F, G, H}.

  • The mathematical intuition needs to meet certain minimal standards to deserve its name. For example, we need to have M(1, B, E) = 0. The algorithm should know that P isn't going to execute according to E if the algorithm returns B on input 1.

But these constraints still leave us with enough freedom in how we set up the mathematical intuition. In particular, we can set

  • M(1, A, F) = 1, and all other values of M(1, A, –) equal to zero;

  • M(1, B, H) = 1, and all other values of M(1, B, –) equal to zero;

  • M(2, A, E) = 1, and all other values of M(2, A, –) equal to zero;

  • M(2, B, F) = 1, and all other values of M(2, B, –) equal to zero.

Thus, in Run1, the algorithm computes that, if it outputs A, then execution history F would transpire, so the agent would get utility U(F) = 20. But if Run1 were to output B, then H would transpire, yielding utility U(H) = 0. Therefore, Run1 outputs A.

Similarly, Run2 computes that its outputting A would result in E, with utility 0, while outputting B would result in F, with utility 20. Therefore, Run2 outputs B.

Hence, execution history F transpires, and the algorithm reaps $20.

ETA: And, as a bonus, this mathematical intuition really makes sense. For, suppose that we held everything equal, except that we do some surgery so that Run1 outputs B. Since everything else is equal, Run2 is still going to output B. And that really would put us in history H, just as Run1 predicted when it evaluated M(1, B, H) = 1.
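This computation is small enough to check mechanically. Here is a minimal Python sketch of the brute-force UDT1 step for this problem, using the M and U values above; the names (`udt1_output`, `expected_utility`) are mine for illustration, not part of the UDT1 write-up:

```python
# Execution histories and the utility function U from the setup above.
HISTORIES = ["E", "F", "G", "H"]
U = {"E": 0, "F": 20, "G": 20, "H": 0}

# M[(input, output)][history] = probability that this execution
# history transpires, exactly as chosen in the comment above.
M = {
    (1, "A"): {"E": 0, "F": 1, "G": 0, "H": 0},
    (1, "B"): {"E": 0, "F": 0, "G": 0, "H": 1},
    (2, "A"): {"E": 1, "F": 0, "G": 0, "H": 0},
    (2, "B"): {"E": 0, "F": 1, "G": 0, "H": 0},
}

def udt1_output(x):
    """On input x, return the output whose expected utility under M
    is largest."""
    def expected_utility(y):
        return sum(M[(x, y)][h] * U[h] for h in HISTORIES)
    return max(["A", "B"], key=expected_utility)

print(udt1_output(1), udt1_output(2))  # Run1 outputs A, Run2 outputs B
```

Run1 computes expected utility 20 for output A and 0 for output B, while Run2 computes 0 for A and 20 for B, so the two runs coordinate on history F without either one reasoning about the global strategy.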

Comment author: Vladimir_Nesov 11 June 2010 09:06:52PM 0 points

That's cheating: you haven't explained anything, you've just chosen the strategies and baptized them with a mathematical intuition that magically knows them from the start.

Comment author: Tyrrell_McAllister 11 June 2010 09:12:39PM * 0 points

I'm not sure what you mean by "cheating". Wei Dai doesn't claim to have explained where the mathematical intuition comes from, and I don't either. The point is, I could build a UDT1 agent with that mathematical intuition, and the agent would behave correctly if it were to encounter the scenario that Wei describes. How I came up with that mathematical intuition is an open problem. But the agent that I build with it falls under the scope of UDT1. It is not necessary to pass to UDT1.1 to find such an agent.

I'm giving an existence proof: There exist UDT1 agents that perform correctly in Wei's scenario. Furthermore, the mathematical intuition used by the agent that I exhibit evaluates counterfactuals in a reasonable way (see my edit to the comment).

Comment author: Vladimir_Nesov 11 June 2010 09:22:16PM * 0 points

Wei Dai doesn't claim to have explained where the mathematical intuition comes from, and I don't either.

There is a difference between not specifying the structure of an unknown phenomenon for which we still have no explanation, and assigning the phenomenon an arbitrary structure without giving an explanation. Even though you haven't violated the formalism, mathematical intuition is not supposed to magically rationalize your (or my) conclusions.

How I came up with that mathematical intuition is an open problem.

No, it's not: you've chosen it so that it "proves" what we believe to be a correct conclusion.

I'm giving an existence proof: There exist UDT1 agents that perform correctly in Wei's scenario.

Since you can force the agent to pick any of the available actions by appropriately manipulating its mathematical intuition, you can "prove" that there is an agent that performs correctly in any given situation, so long as you can forge its mathematical intuition for every such situation. You can also "prove" that there is an agent that makes the worst possible choice, in exactly the same way.

Comment author: Tyrrell_McAllister 12 June 2010 05:12:19PM * 1 point

How I came up with that mathematical intuition is an open problem.

No, it's not: you've chosen it so that it "proves" what we believe to be a correct conclusion.

This is kind of interesting. In Wei's problem, I believe that I can force a winning mathematical intuition with just a few additional conditions, none of which assume that we know the correct conclusion. They seem like reasonable conditions to me, though maybe further reflection will reveal counterexamples.

Using my notation from this comment, we have to find right-hand values for the following 16 equations.

M(1, A, E) = . M(1, A, F) = . M(1, A, G) = . M(1, A, H) = .
M(1, B, E) = . M(1, B, F) = . M(1, B, G) = . M(1, B, H) = .
M(2, A, E) = . M(2, A, F) = . M(2, A, G) = . M(2, A, H) = .
M(2, B, E) = . M(2, B, F) = . M(2, B, G) = . M(2, B, H) = .

In addition to the conditions that I mentioned in that comment, I add the following,

  • Binary: Each probability distribution M(x, y, –) is binary. That is, the mathematical intuition is certain about which execution history would follow from a given output on a given input.

  • Accuracy: The mathematical intuition, being certain, should be accurate. That is, if the agent expects a certain amount of utility when it produces its output, then it should really get that utility.

(Those both seem sorta plausible in such a simple problem.)

  • Counterfactual Accuracy: The mathematical intuition should behave well under counterfactual surgery, in the sense that I used in the edit to the comment linked above. More precisely, suppose that the algorithm outputs Yi on input Xi for all i. Suppose that, for a single fixed value of j, we surgically interfered with the algorithm's execution to make it output Y'j instead of Yj on input Xj. Let E' be the execution history that would result from this. Then we ought to have that M(Xj, Y'j, E') = 1.

I suspect that the counterfactual accuracy condition needs to be replaced with something far more subtle to deal with other problems, even in the binary case.

Nonetheless, it seems interesting that, in this case, we don't need to use any prior knowledge about which mathematical intuitions win.

I'll proceed by filling in the array above entry-by-entry. We can fill in half the entries right away from the definitions of the execution histories:

M(1, A, E) = . M(1, A, F) = . M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = . M(1, B, H) = .
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .

Now we have to consider cases. Starting with the upper-left corner, the value of M(1, A, E) will be either 0 or 1.

Case I: Suppose that M(1, A, E) = 0. Normalization forces M(1, A, F) = 1:

M(1, A, E) = 0 M(1, A, F) = 1 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = . M(1, B, H) = .
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .

Now, in the second row, the value of M(1, B, G) will be either 0 or 1.

Case I A: Suppose that M(1, B, G) = 0. Normalization forces M(1, B, H) = 1:

M(1, A, E) = 0 M(1, A, F) = 1 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 0 M(1, B, H) = 1
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .

We have filled in enough entries to see that Run1 will output A. (Recall that U(F) = 20 and U(H) = 0.) Thus, if Run2 outputs A, then E will happen, not G. Similarly, if Run2 outputs B, then F will happen, not H. This allows us to complete the mathematical intuition function:

M(1, A, E) = 0 M(1, A, F) = 1 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 0 M(1, B, H) = 1
M(2, A, E) = 1 M(2, A, F) = 0 M(2, A, G) = 0 M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = 1 M(2, B, G) = 0 M(2, B, H) = 0

Under this mathematical intuition function, Run1 outputs A and Run2 outputs B. Moreover, this function meets the counterfactual accuracy condition. Note that this function wins.

Case I B: Suppose that M(1, B, G) = 1 in the second row. Normalization forces M(1, B, H) = 0:

M(1, A, E) = 0 M(1, A, F) = 1 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 1 M(1, B, H) = 0
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .

In this case, Run1 will need to use a tie-breaker, because it predicts utility 20 from both outputs. There are two cases, one for each possible tie-breaker.

Case I B i: Suppose that the tie-breaker leads Run1 to output A. If Run2 outputs A, then E will happen, not G. And if Run2 outputs B, then F will happen, not H. This gives us a complete mathematical intuition function:

M(1, A, E) = 0 M(1, A, F) = 1 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 1 M(1, B, H) = 0
M(2, A, E) = 1 M(2, A, F) = 0 M(2, A, G) = 0 M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = 1 M(2, B, G) = 0 M(2, B, H) = 0

Hence, Run2 will output B. But this function fails the counterfactual accuracy condition. It predicts execution history G if Run1 were to output B, when in fact the execution history would be H. Thus we throw out this function.

Case I B ii: Suppose that the tie-breaker leads Run1 to output B. Then, similar to Case I B i, the resulting function fails the counterfactual accuracy test. (Run2 will output A. The resulting function predicts history F if Run1 were to output A, when in fact the history would be E.) Thus we throw out this function.

Therefore, in Case I, all functions either win or are ineligible.

Case II: Suppose that M(1, A, E) = 1. Normalization forces M(1, A, F) = 0, getting us to

M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = . M(1, B, H) = .
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .

Now, in the second row, the value of M(1, B, G) will be either 0 or 1.

Case II A: Suppose that M(1, B, G) = 0. Normalization forces M(1, B, H) = 1:

M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 0 M(1, B, H) = 1
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .

In this case, Run1 will need to use a tie-breaker, because it predicts utility 0 from both outputs. There are two cases, one for each possible tie-breaker.

Case II A i: Suppose that the tie-breaker leads Run1 to output A. If Run2 outputs A, then E will happen, not G. And if Run2 outputs B, then F will happen, not H. This gives us a complete mathematical intuition function:

M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 0 M(1, B, H) = 1
M(2, A, E) = 1 M(2, A, F) = 0 M(2, A, G) = 0 M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = 1 M(2, B, G) = 0 M(2, B, H) = 0

Hence, Run2 will output B. But this function fails the accuracy condition. Run1 expects utility 0 for its output, when in fact it will get utility 20. Thus we throw out this function.

Case II A ii: Suppose that the tie-breaker leads Run1 to output B. If Run2 outputs A, then G will happen, not E. And if Run2 outputs B, then H will happen, not F. This gives us a complete mathematical intuition:

M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 0 M(1, B, H) = 1
M(2, A, E) = 0 M(2, A, F) = 0 M(2, A, G) = 1 M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = 0 M(2, B, G) = 0 M(2, B, H) = 1

Hence, Run2 will output A. But this function fails the accuracy condition. Run1 expects utility 0 for its output, when in fact it will get utility 20. Thus we throw out this function.

Case II B: Suppose that M(1, B, G) = 1. Normalization forces M(1, B, H) = 0:

M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 1 M(1, B, H) = 0
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .

We have filled in enough entries to see that Run1 will output B. (Recall that U(E) = 0 and U(G) = 20.) Thus, if Run2 outputs A, then G will happen, not E. Similarly, if Run2 outputs B, then H will happen, not F. This allows us to complete the mathematical intuition function:

M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 1 M(1, B, H) = 0
M(2, A, E) = 0 M(2, A, F) = 0 M(2, A, G) = 1 M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = 0 M(2, B, G) = 0 M(2, B, H) = 1

Under this mathematical intuition function, Run1 outputs B and Run2 outputs A. Moreover, this function meets the counterfactual accuracy condition. Note that this function wins.

Therefore, all cases lead to mathematical intuitions that either win or are ineligible.
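This case analysis can also be checked by brute force: enumerate all sixteen binary mathematical intuition functions together with both tie-breakers for each run, discard those that fail accuracy or counterfactual accuracy, and confirm that every survivor reaps the full $20. A Python sketch of that check follows; the helper names are mine, and under binary certainty the accuracy condition is equivalent to predicting the actual history with probability 1, which is how it is encoded here:

```python
from itertools import product

U = {"E": 0, "F": 20, "G": 20, "H": 0}
# History realized by each pair of (Run1 output, Run2 output).
HIST = {("A", "A"): "E", ("A", "B"): "F",
        ("B", "A"): "G", ("B", "B"): "H"}
# Histories consistent with each (input, output) pair; all other
# entries are forced to zero by the definitions of E, F, G, H.
CONSISTENT = {(1, "A"): ["E", "F"], (1, "B"): ["G", "H"],
              (2, "A"): ["E", "G"], (2, "B"): ["F", "H"]}
KEYS = [(1, "A"), (1, "B"), (2, "A"), (2, "B")]

def run_outputs(M, tie1, tie2):
    """Each run maximizes expected utility under M; a tie goes to
    that run's tie-breaker output."""
    def pick(x, tie):
        eu = {y: sum(M[(x, y)].get(h, 0) * U[h] for h in "EFGH")
              for y in "AB"}
        if eu["A"] == eu["B"]:
            return tie
        return max("AB", key=eu.get)
    return pick(1, tie1), pick(2, tie2)

def alt(y):
    return "B" if y == "A" else "A"

def eligible(M, y1, y2):
    """Accuracy plus counterfactual accuracy: M must assign
    probability 1 to the history that actually transpires, and to
    the history that would transpire under surgery on either run."""
    checks = [(1, y1, HIST[(y1, y2)]),            # actual, Run1's view
              (2, y2, HIST[(y1, y2)]),            # actual, Run2's view
              (1, alt(y1), HIST[(alt(y1), y2)]),  # surgery on Run1
              (2, alt(y2), HIST[(y1, alt(y2))])]  # surgery on Run2
    return all(M[(x, y)].get(h, 0) == 1 for x, y, h in checks)

payoffs = []
# 16 binary intuition functions times 4 tie-breaker combinations.
for choice in product(*(CONSISTENT[k] for k in KEYS)):
    M = {k: {c: 1} for k, c in zip(KEYS, choice)}
    for tie1, tie2 in product("AB", repeat=2):
        y1, y2 = run_outputs(M, tie1, tie2)
        if eligible(M, y1, y2):
            payoffs.append(U[HIST[(y1, y2)]])

print(sorted(set(payoffs)))  # every eligible intuition wins $20
```

The survivors are exactly the two winning functions found in Cases I A and II B above, so no prior knowledge of the correct conclusion is needed to rule out the losers.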

ETA: And I just discovered that there's a length-limit on comments.

Comment author: Tyrrell_McAllister 11 June 2010 09:44:42PM * 0 points

Let me see if I understand your point. Are you saying the following?

<Attempted paraphrase> Some UDT1 agents perform correctly in the scenario, but some don't. To not be "cheating", you need to provide a formal decision theory (or at least make some substantial progress towards providing one) that explains why the agent's builder would choose to build one of the UDT1 agents that do perform correctly. </Attempted paraphrase>

Comment author: Vladimir_Nesov 11 June 2010 07:42:26PM * 0 points

The symmetry is broken by "1" being different from "2". The probabilities express logical uncertainty, and so essentially depend on what happens to be provable given finite resources and the epistemic state of the agent, for which implementation details matter. The asymmetry is thus hidden in the mathematical intuition, and is not visible in the parts of UDT explicitly described.

Comment author: Vladimir_Nesov 11 June 2010 04:15:44PM 0 points

...but on the other hand, you don't need the "input" at all, if decision-making is about figuring out the strategy. You can just have a strategy that produces the output, with no explicit input. The history of input can remain implicit in the agent's program, which is available anyway.

Comment author: Tyrrell_McAllister 11 June 2010 04:03:06PM * 0 points

BTW, in UDT1.1 (as well as UDT1), "input" consists of the agent's entire memory of the past as well as its current perceptions. Thought I'd mention that in case there's a misunderstanding there.

Good; that was my understanding.

Comment author: Vladimir_Nesov 11 June 2010 03:16:55PM * 0 points

BTW, in UDT1.1 (as well as UDT1), "input" consists of the agent's entire memory of the past as well as its current perceptions. Thought I'd mention that in case there's a misunderstanding there.

Yes, that works too. On second thought, extracting the output in this exact manner, while pushing everything else into the "input", lets you pose a problem specifically about the output in this particular situation, so as to optimize the activity of figuring out this output rather than the whole strategy, of which right now you need only this one aspect and no more.

Edit: Though, you don't need "input" to hold the rest of the strategy.

Comment author: Tyrrell_McAllister 11 June 2010 04:15:39PM * 0 points

I was having trouble understanding what strategy couldn't be captured by a function X -> Y. After all, what could possibly determine the output of an algorithm other than its source code and whatever input it remembers getting on that particular run? Just to be clear, do you now agree that every strategy is captured by some function f: X -> Y mapping inputs to outputs?

One potential problem is that there are infinitely many input-output mappings. The agent can't assume a bound on the memory it will have, so it can't assume a bound on the lengths of inputs X that it will someday need to plug into an input-output mapping f.

Unlike the case where there are potentially infinitely many programs P1, P2, . . ., it's not clear to me that it's enough to wrap up an infinite set I of input-output mappings into some finite program that generates them. This is because the UDT1.1 agent needs to compute a sum for every element of I. So, if the set I is infinite, the number of sums to be computed will be infinite. Having a finite description of I won't help here, at least not with a brute-force UDT1.1 algorithm.

Comment author: Vladimir_Nesov 11 June 2010 06:32:12PM * 1 point

Any infinite thing in any given problem statement is already presented to you with a finite description. All you have to do is transform that finite description of an infinite object so as to get a finite description of a solution of your problem posed about the infinite object.

Comment author: Tyrrell_McAllister 11 June 2010 06:53:57PM * 0 points

Any infinite thing in any given problem statement is already presented to you with a finite description. All you have to do is transform that finite description of an infinite object so as to get a finite description of a solution of your problem posed about the infinite object.

Right. I agree.

But, to make Wei's formal description of UDT1.1 work, there is a difference between

  • dealing with a finite description of an infinite execution history Ei and

  • dealing with a finite description of an infinite set I of input-output maps.

The difference is this: The execution histories only get fed into the utility function U and the mathematical intuition function (which I denote by M). These two functions are taken to be black boxes in Wei's description of UDT1.1. His purpose is not to explain how these functions work, so he isn't responsible for explaining how they deal with finite descriptions of infinite things. Therefore, the potential infinitude of the execution histories is not a problem for what he was trying to do.

In contrast, the part of the algorithm that he describes explicitly does require computing an expected utility for every input-output map and then selecting the input-output map that yielded the largest expected utility. Thus, if I is infinite, the brute-force version of UDT1.1 requires the agent to find a maximum from among infinitely many expected utilities. That means that the brute-force version just doesn't work in this case. Merely saying that you have a finite description of I is not enough to say in general how you are finding the maximum from among infinitely many expected utilities. In fact, it seems possible that there may be no maximum.

Actually, in both UDT1 and UDT1.1, there is a similar issue with the possibility of having infinitely many possible execution-history sequences <E1, E2, . . .>. In both versions of UDT, you have to perform a sum over all such sequences. Even if you have a finite description of the set E of such sequences, a complete description of UDT still needs to explain how you are performing the sum over the infinitely many elements of the set. In particular, it's not obvious that this sum is always well-defined.

Comment author: Vladimir_Nesov 11 June 2010 07:07:35PM * 1 point

...but the action could be a natural number, no? It's entirely OK if there is no maximum - the available computational resources then limit how good a strategy the agent manages to implement ("Define as big a natural number as you can!"). The "algorithm" is descriptive, it's really a definition of optimality of a decision, not specification of how this decision is to be computed. You can sometimes optimize infinities away, and can almost always find a finite approximation that gets better with more resources and ingenuity.

Comment author: Tyrrell_McAllister 11 June 2010 07:16:11PM * 0 points

The "algorithm" is descriptive, it's really a definition of optimality of a decision, not specification of how this decision is to be computed. You can sometimes optimize infinities away, and can almost always find a finite approximation that gets better with more resources and ingenuity.

Okay. I didn't know that the specification of how to compute was explicitly understood to be incomplete in this way. Of course, the description could only be improved by being more specific about just when you can "sometimes optimize infinities away, and can almost always find a finite approximation that gets better with more resources and ingenuity."