Vaniver comments on Decision Theories: A Semi-Formal Analysis, Part I - Less Wrong

21 Post author: orthonormal 24 March 2012 04:01PM


Comment author: Vaniver 24 March 2012 11:09:09PM 0 points [-]

It's a lot easier for me to see the applications for an iterated game where you try to model your opponent's source code given the game history than a single-shot game where you try to predict what your opponent will do given their source code and your source code. That seems like it just invites infinite regress.

Suppose we know we're going to play the one-shot prisoner's dilemma with access to source code, and so we want to try to build the bot that cooperates only with bots that have functionally the same rule as it, i.e. the 'superrational prisoner' that's come up on LW. How would we actually express that? Suppose, for ease of labeling, we have bots 1 and 2 that can either C or D. Bot 1 says "output C iff 2 will output C iff 1 will output C iff 2 will output C iff..." Well, clearly that won't work. We also can't just check to see if the source code is duplicated, because we want to ensure that we also cooperate with similar bots, and don't want to get fooled by bots that have that commented out.
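For concreteness, here is roughly what the brittle exact-match version looks like: a Python sketch (my own names, not anyone's actual proposal) of a bot that quines its own canonical text and cooperates only on a byte-identical match. Any textual difference at all, even a stray comment, breaks cooperation, which is exactly why exact source comparison isn't enough.

```python
# A minimal exact-source "cliquebot" sketch (hypothetical, for illustration).
# TEMPLATE is the bot's canonical text; the bot quines it to recover "its own
# source" and cooperates only on a byte-identical match.
TEMPLATE = ('TEMPLATE = {t!r}\n'
            'def bot(opponent_source):\n'
            '    my_source = TEMPLATE.format(t=TEMPLATE)\n'
            '    return "C" if opponent_source == my_source else "D"\n')

def bot(opponent_source):
    my_source = TEMPLATE.format(t=TEMPLATE)  # the quine step
    return "C" if opponent_source == my_source else "D"

my_source = TEMPLATE.format(t=TEMPLATE)
print(bot(my_source))                    # C: exact copy of our source
print(bot(my_source + "# same rule\n"))  # D: same behavior, but one comment off
```

Two copies of this bot cooperate with each other, but a behaviorally identical bot written any other way gets defected against, and a hostile bot could fake the text while computing something else.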

Is there a way to code that bot with the sort of propositions you're doing? Outcomes aren't enough; you need models of outcome dependencies that can fit inside themselves.

Comment author: cousin_it 25 March 2012 12:04:34AM *  2 points [-]

What Nesov said. Loebian cooperation happened because I wanted to make quining cooperation work when the opponent has the same algorithm, but not necessarily the same source code. Illusion of transparency again: I post a lot of math thinking that its motivation will be crystal clear to everyone...

Comment author: Vaniver 25 March 2012 07:11:45AM 0 points [-]

Combining branches: check out this comment.

Comment author: Vladimir_Nesov 24 March 2012 11:45:13PM 4 points [-]

Well, that's described in these posts, or do you mean something else?

Comment author: Vaniver 25 March 2012 07:09:58AM 0 points [-]

That is what I meant. My problem is that I don't see how to express that strategy correctly with the vocabulary in this post, and I don't understand the notation in those posts well enough to tell if they express that idea correctly.

For example, I'm not comfortable with how Eliezer framed it. If my strategy is "cooperate iff they play cooperate iff I play cooperate," then won't I never cooperate with myself, because I don't have the strategy "cooperate iff they play cooperate"?

(I do think that general idea might be formalizable, but you need models of outcome dependencies that can fit inside themselves. Which maybe the notation I don't understand is powerful enough to do.)

Comment author: Douglas_Knight 25 March 2012 06:54:18PM 0 points [-]

"Cooperate iff they play cooperate iff I play cooperate" is imprecise. Your complaint seems to stem from interpreting it to pay too much attention to the details of the algorithms.* One reason to use counterfactuals is to avoid this. One interpretation of the slogan is "A()=C iff A()=B()." This has symmetry, so the agent cooperates with itself. However, this is vulnerable to self-fulfilling prophecy against the cooperation rock (i.e., it can choose either option), and its actions against the defection rock depend on implementation details. It is better to interpret it as: cooperate if both counterfactuals "A()=C => B()=C" and "A()=D => B()=D" are true. Again, this has symmetry to allow it to cooperate with itself, but it defects against rocks. In fact, an agent like the one from this post pretty much derives this strategy and should cooperate with an agent that comes pre-programmed with this strategy. The problem is whether the counterfactuals are provable. Here is a thread on the topic.

* Another example of agents paying too much attention to implementation details is that we usually want to exclude agents that punish other agents for their choice of decision theory, rather than their decisions.
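The two readings above can be checked by brute force: treat each agent as a constraint on the joint play and enumerate which of the four outcomes are consistent. A toy Python sketch (the framing and names are mine, not from the posts):

```python
from itertools import product

def consistent_outcomes(cond_a, cond_b):
    """All joint plays (a, b) consistent with both agents' defining conditions."""
    return [(a, b) for a, b in product("CD", "CD")
            if cond_a(a, b) and cond_b(b, a)]

# "A()=C iff A()=B()" read as a constraint relating my move to the opponent's:
mirror = lambda me, them: (me == "C") == (me == them)
coop_rock = lambda me, them: me == "C"      # always cooperates
defect_rock = lambda me, them: me == "D"    # always defects

print(consistent_outcomes(mirror, mirror))       # [('C', 'C')]: mutual cooperation
print(consistent_outcomes(mirror, coop_rock))    # [('C', 'C'), ('D', 'C')]: underdetermined
print(consistent_outcomes(mirror, defect_rock))  # []: nothing consistent
```

Against the cooperation rock there are two consistent outcomes (the self-fulfilling prophecy), and against the defection rock there are none, so what actually happens is left to implementation details, just as described.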

Comment author: Vaniver 26 March 2012 01:09:18AM 0 points [-]

Aren't the implementation details sort of the whole point? I mean, suppose you code a version of cooperate iff "A()=C => B()=C and A()=D => B()=D" and feed it a copy of itself. Can it actually recognize that it should cooperate? Or does it think that the premise "A()=C" implies a cooperation rock, which it defects against?

That is, if the algorithm is "create two copies of your opponent, feed them different inputs, and combine their output by this formula," when you play that against itself it looks like you get 2^t copies, where t is how long you ran it before it melted down.

I see how one could code a cliquebot that only cooperated with someone with its source code, and no one else. I don't see how one could formalize this general idea (trying D,C, then C,C, then D,D, and never playing C,D) in a way that halts and recognizes different implementations of the same idea. The problem is that the propositions don't seem to cleanly differentiate between code outcomes, like "A()=C", and code, like "A()=C => B()=C".
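The blow-up in the simulate-the-opponent approach is easy to exhibit. A hypothetical Python sketch of a bot that literally runs its opponent against itself; in self-play the recursion never bottoms out, and a depth cap stands in for "ran it until it melted down":

```python
def simulating_fairbot(opponent, depth=0, limit=1000):
    # "Simulate the opponent playing against me and copy its move."
    # Each simulation spawns another simulation, so against a copy of
    # itself the recursion never terminates on its own; the cap stands
    # in for running out of resources.
    if depth > limit:
        raise RecursionError("simulation never bottoms out")
    return "C" if opponent(simulating_fairbot, depth + 1) == "C" else "D"

try:
    simulating_fairbot(simulating_fairbot)
except RecursionError:
    print("infinite regress in self-play")
```

This is why the proof-based approach avoids simulation entirely; nothing in this sketch ever produces a move in self-play.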

Comment author: Douglas_Knight 26 March 2012 02:36:46AM 0 points [-]

No, the algorithm is definitely not to feed inputs in simulation. The algorithm is to prove things about the interaction of the two algorithms. For more detail, see the post that is the parent of the comment I linked.

"A()=C => B()=C" is a statement of propositional logic, not code. It is material implication, not conditional execution.

Comment author: Vaniver 26 March 2012 03:46:55AM 0 points [-]

The algorithm is to prove things about the interaction of the two algorithms.

This is possibly just ignorance on my part, but as far as I can tell what that post does is take the difficult part and put it in a black box, which leaves me no wiser about how this works.

"A()=C => B()=C" is a statement of propositional logic, not code. It is material implication, not conditional execution.

Code was the wrong phrase for me to use- I should have gone with my earlier "outcome dependency," which sounds like your "material implication."

I think I've found my largest stumbling block: when I see "B()=C", it matters to me what's in the (), i.e. the inputs to B. But if B is a copy of this algorithm, it needs more information than "A()=C" in order to generate an output. Or, perhaps more precisely, it doesn't care about the "=C" part and makes its decision solely on the "A()" part. But then I can't really feed it "A()=C" and "A()=D" and expect to get different results.

That is, counterfactuals about my algorithm outputs are meaningless if the opponent's algorithm judges me based on my algorithm, not my outputs. Right? Or am I missing something?

Comment author: Douglas_Knight 26 March 2012 04:41:40AM 0 points [-]

The input to A and B is always the same, namely the source code of A and B and the universe. In fact, often we consider all that to be hard coded and there to be no input, hence the notation A().

Could I rephrase your last paragraph as "counterfactuals about the output of my algorithm are meaningless since my algorithm only has one output"? Yes, this is a serious problem in making sense of strategies like "cooperate iff the opponent cooperates iff I do." Either my opponent cooperates or not. That is a fact about math. How could it be different?

And yet, in the post I linked to, cousin_it managed to set up a situation in which there are provable theorems about what happens in the counterfactuals. Still, that does not answer the question of the meaning of "logical counterfactuals." They seem to match the ordinary language use of counterfactuals, as in the above strategy.

If the "black box" was the provability oracle, then consider searching for proofs of bounded lengths, which is computable. If the black box was formal logic (Peano arithmetic), well, it doesn't look very black to me, but it is doing a lot of work.

I don't think you should be judging material implication by its name. Perhaps by "outcome dependency" you mean "logical counterfactual." Then, yes, counterfactuals are related to such implications, by definition. The arrow is just logical implication. There are certain statements of logic that we interpret as counterfactuals.

Comment author: Vaniver 26 March 2012 05:25:05AM 0 points [-]

Could I rephrase your last paragraph as "counterfactuals about the output of my algorithm are meaningless since my algorithm only has one output"?

That looks right to me.

Either my opponent cooperates or not. That is a fact about math. How could it be different?

What do you mean by the first sentence? That the opponent's move is one of {C, D}? Or that we are either in the world where C is played with probability 1, or the world where D is played with probability 1?

It seems to me that pernicious algorithms could cause cycling if you aren't careful, and it's not specified what happens if the program fails to output an order. If you tack on some more assumptions, I'm comfortable with saying that the opponent must play either C or D, but the second seems outlawed by mixed strategies being possible.

If the "black box" was the provability oracle, then consider searching for proofs of bounded lengths, which is computable. If the black box was formal logic (Peano arithmetic), well, it doesn't look very black to me, but it is doing a lot of work.

Peano arithmetic is beyond my formal education. I think I understand most of what it's doing but the motivation of why one would turn to it is not yet clear.

I don't think you should be judging material implication by its name.

So, "A()=C => B()=C" means "it cannot be the case that I cooperate and they defect" and "A()=D => B()=D" means "it cannot be the case that I defect and they cooperate," and if both of those are true, then the algorithm cooperates. Right? (The first is to not be taken advantage of, and the second is to take advantage of those who will let you.)
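That reading is exactly what material implication gives, and a quick truth-table over the four joint plays confirms that the conjunction holds only for C,C and D,D:

```python
def implies(p, q):
    return (not p) or q  # material implication: false only when p holds and q fails

for a in "CD":
    for b in "CD":
        not_exploited    = implies(a == "C", b == "C")  # "not (I play C while they play D)"
        no_free_defection = implies(a == "D", b == "D")  # "not (I play D while they play C)"
        print(a, b, not_exploited and no_free_defection)
# C C True / C D False / D C False / D D True
```

So as propositional logic the pair of implications just says "we play the same move"; the hard part, as below, is which side of each implication an agent can actually prove.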

What I'm not seeing is how you take two copies of that, evaluate them, and get "C,C" without jumping to the end by wishful thinking. It looks to me like the counterfactuals multiply without end, because every proof requires four subproofs that are just as complicated. (This is where it looks like the provability oracle is being called in, but I would rather avoid that if possible.)

Comment author: Douglas_Knight 26 March 2012 07:30:14AM *  1 point [-]

Yes, there are complications like programs failing to terminate or using random numbers. For simplicity, we usually assume no random numbers. We could force termination by putting a time bound and some default action, but it's usually better to distinguish such program failure from other actions. In any event, at least one of the possibilities A()=C and A()=D is false, and thus claims like "A()=C => B()=C" are confusing.

If A and B have the same source code, then it is provable that they have the same source code and thus it is provable that A()=B(), and thus provable that A()=C => B()=C. That is what A and B do, is search for that proof. They do not simulate each other, so there is no multiplication of counterfactuals. Identical source code is a rather limited case, but it is a start. Loeb's theorem shows how to prove things in some other cases.
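For the identical-source case that reasoning can be sketched in a few lines. The point is that the check is syntactic (a stand-in for proof search), not a simulation, so nothing regresses. A toy sketch only, not the actual construction from the posts:

```python
def agent(my_source: str, opp_source: str) -> str:
    # Toy stand-in for proof search: the one theorem this "prover" can
    # find is that identical source implies identical output, A() = B().
    # From A() = B(), both A()=C => B()=C and A()=D => B()=D follow,
    # so the agent cooperates.  Nothing is simulated; a syntactic
    # comparison stands in for a derivation.
    if my_source == opp_source:      # "provable that A() = B()"
        return "C"
    return "D"                       # no proof found within this trivial system

SRC = "agent"  # hypothetical stand-in label for this agent's own source text
print(agent(SRC, SRC))      # C: proves A()=B() and cooperates with its copy
print(agent(SRC, "other"))  # D: no proof, falls back to defection
```

Behaviorally this is still a cliquebot; the Loebian machinery is what lets the same prove-then-act shape cooperate with agents that are merely provably equivalent, not textually identical.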

Comment author: orthonormal 24 March 2012 11:19:39PM 1 point [-]

You'll see by Part III. (Hint: when you see an infinite regress, consider a clever quining.)

Comment author: Vaniver 25 March 2012 07:13:07AM 0 points [-]

Ok. I see how to do quining to only cooperate with copies of your source code, but I don't yet see how to do quining with outcome dependencies.

Comment author: orthonormal 28 March 2012 01:34:24AM 0 points [-]

By the way, I was too blasé in the grandparent comment. I have a model of TDT that does almost the right thing, but I haven't figured out how to quine it so that it does exactly the right thing. (Technically, I can't be sure it'll still work if I do get the right quine in, but my gut feeling is that it will.)

Comment author: Vaniver 28 March 2012 04:44:34AM 0 points [-]

So, I'm going to be pretty disappointed if this whole affair is just someone inventing a meta-cliquebot that's actually just a cliquebot.

Comment author: orthonormal 28 March 2012 02:15:01PM 0 points [-]

Trust me, there's better stuff than that. (In particular, I've got a more nicely formalized write-up of UDT. It's just TDT I'm having issues with.)

Comment author: wedrifid 28 March 2012 01:43:00AM 0 points [-]

I haven't figured out how to quine it so that it does exactly the right thing.

The quining part usually seems to be the tricky part, doesn't it?