Douglas_Knight comments on Decision Theories: A Semi-Formal Analysis, Part I - Less Wrong

21 Post author: orthonormal 24 March 2012 04:01PM

Comments (90)

Comment author: Vaniver 25 March 2012 07:09:58AM 0 points [-]

That is what I meant. My problem is that I don't see how to express that strategy correctly with the vocabulary in this post, and I don't understand the notation in those posts well enough to tell if they express that idea correctly.

For example, I'm not comfortable with how Eliezer framed it. If my strategy is "cooperate iff they play cooperate iff I play cooperate," then won't I never cooperate with myself, because I don't have the strategy "cooperate iff they play cooperate"?

(I do think that general idea might be formalizable, but you need models of outcome dependencies that can fit inside themselves. Which maybe the notation I don't understand is powerful enough to do.)

Comment author: Douglas_Knight 25 March 2012 06:54:18PM 0 points [-]

"Cooperate iff they play cooperate iff I play cooperate" is imprecise. Your complaint seems to stem from interpreting it to pay too much attention to the details of the algorithms.* One reason to use counterfactuals is to avoid this. One interpretation of the slogan is "A()=C iff A()=B()." This has symmetry, so the agent cooperates with itself. However, it this is vulnerable to self-fulfilling prophecy against the cooperation rock (ie, it can choose either option) and its actions against the defection rock depend on implementation details. It is better to interpret it as cooperate if both counterfactuals "A()=C => B()=C" and "A()=D => B()=D" are true. Again, this has symmetry to allow it to cooperate with itself, but it defects against rocks. In fact, an agent like from this post pretty much derives this strategy and should cooperate with an agent that comes pre-programmed with this strategy. The problem is whether the counterfactuals are provable. Here is a thread on the topic.

* Another example of agents paying too much attention to implementation details is that we usually want to exclude agents that punish other agents for their choice of decision theory, rather than for their decisions.
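
To make the behavior of the first reading ("A()=C iff A()=B()") concrete, here is a minimal Python sketch. It is only a toy: the opponent is modelled, as an assumption of this sketch, by a function from my action to its action (a constant for a rock, the identity for an exact copy), and names such as `consistent_actions` are illustrative rather than taken from the post.

```python
# Toy model of the reading "A()=C iff A()=B()".
# Assumption of this sketch: an opponent is a function from my action to its action.

ACTIONS = ("C", "D")

def cooperation_rock(my_action):
    return "C"

def defection_rock(my_action):
    return "D"

def exact_copy(my_action):
    # A copy of me necessarily outputs whatever I output.
    return my_action

def consistent_actions(opponent):
    """Return my actions a for which (a == C) iff (a == opponent(a))."""
    return [a for a in ACTIONS if (a == "C") == (a == opponent(a))]

print(consistent_actions(exact_copy))        # ['C']
print(consistent_actions(cooperation_rock))  # ['C', 'D']
print(consistent_actions(defection_rock))    # []
```

Two consistent answers against the cooperation rock is the self-fulfilling prophecy; no consistent answer against the defection rock means the slogan alone does not fix the action, so whatever the implementation happens to do decides it.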

Comment author: Vaniver 26 March 2012 01:09:18AM 0 points [-]

Aren't the implementation details sort of the whole point? I mean, suppose you code a version of cooperate iff "A()=C => B()=C and A()=D => B()=D" and feed it a copy of itself. Can it actually recognize that it should cooperate? Or does it think that the premise "A()=C" implies a cooperation rock, which it defects against?

That is, if the algorithm is "create two copies of your opponent, feed them different inputs, and combine their output by this formula," when you play that against itself it looks like you get 2^t copies, where t is how long you ran it before it melted down.
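
A rough sketch of that blow-up, under the simulation reading only (the reply that follows rejects this reading). The function names, the depth budget, and the default action are assumptions made for illustration, and the inputs fed to the copies are ignored because only the number of calls matters here.

```python
from collections import Counter

def simulate(depth, max_depth, calls):
    """Naive agent: runs two fresh copies of its opponent and combines their answers."""
    calls[depth] += 1
    if depth == max_depth:
        return "D"  # out of budget: fall back to a default action
    first = simulate(depth + 1, max_depth, calls)
    second = simulate(depth + 1, max_depth, calls)
    return "C" if first == second == "C" else "D"

def copies_per_level(max_depth):
    """Count how many agent copies are running at each nesting level."""
    calls = Counter()
    simulate(0, max_depth, calls)
    return [calls[d] for d in range(max_depth + 1)]

print(copies_per_level(4))  # [1, 2, 4, 8, 16]
```

Level t holds 2^t copies, so without an artificial budget the recursion never bottoms out.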

I see how one could code a cliquebot that only cooperated with someone with its source code, and no one else. I don't see how one could formalize this general idea (trying D,C then C,C then D,D, and never playing C,D) in a way that halts and recognizes different implementations of the same idea. The problem is that the propositions don't seem to cleanly differentiate between code outcomes, like "A()=C", and code, like "A()=C => B()=C".

Comment author: Douglas_Knight 26 March 2012 02:36:46AM 0 points [-]

No, the algorithm is definitely not to feed inputs in simulation. The algorithm is to prove things about the interaction of the two algorithms. For more detail, see the post that is the parent of the comment I linked.

"A()=C => B()=C" is a statement of propositional logic, not code. It is material implication, not conditional execution.

Comment author: Vaniver 26 March 2012 03:46:55AM 0 points [-]

> The algorithm is to prove things about the interaction of the two algorithms.

This is possibly just ignorance on my part, but as far as I can tell what that post does is take the difficult part and put it in a black box, which leaves me no wiser about how this works.

"A()=C => B()=C" is a statement of propositional logic, not code. It is material implication, not conditional execution.

Code was the wrong word for me to use; I should have gone with my earlier "outcome dependency," which sounds like your "material implication."

I think I've found my largest stumbling block: when I see "B()=C", it matters to me what's in the (), i.e. the inputs to B. But if B is a copy of this algorithm, it needs more information than "A()=C" in order to generate an output. Or, perhaps more precisely, it doesn't care about the "=C" part and makes its decision solely on the "A()" part. But then I can't really feed it "A()=C" and "A()=D" and expect to get different results.

That is, counterfactuals about my algorithm outputs are meaningless if the opponent's algorithm judges me based on my algorithm, not my outputs. Right? Or am I missing something?

Comment author: Douglas_Knight 26 March 2012 04:41:40AM 0 points [-]

The input to A and B is always the same, namely the source code of A and B and the universe. In fact, often we consider all that to be hard coded and there to be no input, hence the notation A().

Could I rephrase your last paragraph as "counterfactuals about the output of my algorithm are meaningless since my algorithm only has one output"? Yes, this is a serious problem in making sense of strategies like "cooperate iff the opponent cooperates iff I do." Either my opponent cooperates or not. That is a fact about math. How could it be different?

And yet, in the post I linked to, Cousin It managed to set up a situation in which there are provable theorems about what happens in the counterfactuals. Still, that does not answer the question of the meaning of "logical counterfactuals." They seem to match the ordinary language use of counterfactuals, as in the above strategy.

If the "black box" was the provability oracle, then consider searching for proofs of bounded lengths, which is computable. If the black box was formal logic (Peano arithmetic), well, it doesn't look very black to me, but it is doing a lot of work.
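
As a hedged illustration of why bounded proof search is computable, here is a toy in Python. The formal system is an assumption of this sketch and is far weaker than Peano arithmetic: atoms are strings, an implication is the tuple `("->", premise, conclusion)`, and the only rule is modus ponens. The point is only that a finite set of candidate proofs can be enumerated and each one checked mechanically, line by line, without running any program.

```python
from itertools import product

# Toy axioms and formula pool (hypothetical, for illustration only).
AXIOMS = {"p", ("->", "p", "q"), ("->", "q", "r")}
FORMULAS = ["p", "q", "r", ("->", "p", "q"), ("->", "q", "r")]

def is_valid_proof(proof, axioms):
    """Check each line is an axiom or follows by modus ponens from earlier lines."""
    for i, line in enumerate(proof):
        earlier = proof[:i]
        by_modus_ponens = any(("->", p, line) in earlier for p in earlier)
        if line not in axioms and not by_modus_ponens:
            return False
    return True

def bounded_proof_search(goal, axioms, formulas, max_len):
    """Enumerate every candidate proof of at most max_len lines; return one proving goal."""
    for n in range(1, max_len + 1):
        for candidate in product(formulas, repeat=n):
            if candidate[-1] == goal and is_valid_proof(candidate, axioms):
                return list(candidate)
    return None  # no proof within the bound (says nothing about longer proofs)

print(bounded_proof_search("r", AXIOMS, FORMULAS, max_len=4))       # None
print(len(bounded_proof_search("r", AXIOMS, FORMULAS, max_len=5)))  # 5
```

The search is brute force and exponential in the bound, which is the price of avoiding an oracle; the checker itself is cheap.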

I don't think you should be judging material implication by its name. Perhaps by "outcome dependency" you mean "logical counterfactual." Then, yes, counterfactuals are related to such implications, by definition. The arrow is just logical implication. There are certain statements of logic that we interpret as counterfactuals.
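
For readers who want the arrow spelled out, a small Python sketch of material implication as a truth function (the helper names are made up for this example). Nothing is executed conditionally: the statement is a claim about two fixed values, and it is vacuously true whenever its premise is false.

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

def claim_holds(a: str, b: str) -> bool:
    """Truth value of the statement A()=C => B()=C for fixed outputs a = A(), b = B()."""
    return implies(a == "C", b == "C")

# Print the full truth table over the four possible outcomes.
for a in "CD":
    for b in "CD":
        print(f"A()={a}, B()={b}: {claim_holds(a, b)}")
```

Only the row A()=C, B()=D comes out false; both rows with A()=D are true regardless of B().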

Comment author: Vaniver 26 March 2012 05:25:05AM 0 points [-]

> Could I rephrase your last paragraph as "counterfactuals about the output of my algorithm are meaningless since my algorithm only has one output"?

That looks right to me.

> Either my opponent cooperates or not. That is a fact about math. How could it be different?

What do you mean by the first sentence? That the opponent's move is one of {C, D}? Or that we are either in the world where C is played with probability 1, or the world where D is played with probability 1?

It seems to me that pernicious algorithms could cause cycling if you aren't careful, and it's not specified what happens if the program fails to output an order. If you tack on some more assumptions, I'm comfortable with saying that the opponent must play either C or D, but the second seems outlawed by mixed strategies being possible.

> If the "black box" was the provability oracle, then consider searching for proofs of bounded lengths, which is computable. If the black box was formal logic (Peano arithmetic), well, it doesn't look very black to me, but it is doing a lot of work.

Peano arithmetic is beyond my formal education. I think I understand most of what it's doing but the motivation of why one would turn to it is not yet clear.

> I don't think you should be judging material implication by its name.

So, "A()=C => B()=C" means "it cannot be the case that I cooperate and they defect" and "A()=D => B()=D" means "it cannot be the case that I defect and they cooperate," and if both of those are true, then the algorithm cooperates. Right? (The first is to not be taken advantage of, and the second is to take advantage of those who will let you.)
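
That reading can be checked by brute force over the four outcomes. The sketch below is illustrative only (the function and variable names are invented here) and treats both statements as plain material implications about the actual outputs.

```python
from itertools import product

def implies(p, q):
    # Material implication: false only when p holds and q does not.
    return (not p) or q

def both_hold(a, b):
    not_exploited = implies(a == "C", b == "C")   # rules out (C, D)
    not_exploiting = implies(a == "D", b == "D")  # rules out (D, C)
    return not_exploited and not_exploiting

# Keep only the (A(), B()) outcome pairs compatible with both implications.
allowed = [(a, b) for a, b in product("CD", repeat=2) if both_hold(a, b)]
print(allowed)  # [('C', 'C'), ('D', 'D')]
```

As statements about the actual outputs, the two implications together say exactly that A()=B(). The table says nothing about whether either implication is provable, which is the hard part under discussion.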

What I'm not seeing is how you take two copies of that, evaluate them, and get "C,C" without jumping to the end by wishful thinking. It looks to me like the counterfactuals multiply without end, because every proof requires four subproofs that are just as complicated. (This is where it looks like the provability oracle is being called in, but I would rather avoid that if possible.)

Comment author: Douglas_Knight 26 March 2012 07:30:14AM *  1 point [-]

Yes, there are complications like programs failing to terminate or using random numbers. For simplicity, we usually assume no random numbers. We could force termination by putting a time bound and some default action, but it's usually better to distinguish such program failure from other actions. In any event, at least one of the possibilities A()=C and A()=D is false, and thus claims like "A()=C => B()=C" are confusing.

If A and B have the same source code, then it is provable that they have the same source code and thus it is provable that A()=B(), and thus provable that A()=C => B()=C. That is what A and B do, is search for that proof. They do not simulate each other, so there is no multiplication of counterfactuals. Identical source code is a rather limited case, but it is a start. Loeb's theorem shows how to prove things in some other cases.
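
The identical-source case can be sketched in a few lines of Python. This is the degenerate version only: a string comparison stands in for the proof that A()=B(), and `AGENT_SOURCE` and `clique_agent` are hypothetical names, not anything from the linked posts.

```python
# Stand-in for the agent's real source text (an assumption of this sketch).
AGENT_SOURCE = "cooperate iff the opponent's source equals my own"

def clique_agent(own_source: str, opponent_source: str) -> str:
    """Cooperate exactly when the two source texts are identical.

    Identical source is a trivially checkable certificate that A() = B(),
    so no simulation of the opponent is needed.
    """
    return "C" if own_source == opponent_source else "D"

print(clique_agent(AGENT_SOURCE, AGENT_SOURCE))            # C
print(clique_agent(AGENT_SOURCE, "some other algorithm"))  # D
```

Each call does one comparison and halts, so nothing multiplies. Recognizing different implementations of the same idea needs a real proof search instead of string equality.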

Comment author: Vaniver 27 March 2012 04:59:51AM 0 points [-]

> That is what A and B do, is search for that proof.

Ok. Where do they look for it, and how will they know if they've found it?

I don't like, but will accept for now, the "evaluate every possible proof less than X characters" method of finding proofs.

But I don't see how you determine those proofs are true or false without simulating A() and B(), especially if B() isn't a copy of A(), but some complicated algorithm that might or might not cash out as equivalent to A().

(Where I'm going with this: if this idea requires magic to do its basic operation, then I am uncomfortable with using this idea for anything.)

Comment author: Douglas_Knight 27 March 2012 07:38:31AM 1 point [-]

Very often in this conversation, I think we're using words to mean wildly different things, such as "proof." Do you need to simulate bubble sort to prove that it takes quadratic time?
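
For concreteness, here is the kind of argument meant, assuming the textbook bubble sort without an early-exit check, where pass i over an n-element list compares n - i adjacent pairs:

```latex
\begin{align*}
\text{comparisons}(n)
  &= \sum_{i=1}^{n-1} (n - i)
   = \sum_{k=1}^{n-1} k
   = \frac{n(n-1)}{2}
\end{align*}
```

The count holds for every n at once and is obtained by reasoning about the code, not by running it on any particular input.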