orthonormal comments on Decision Theories: A Semi-Formal Analysis, Part I - Less Wrong
What you're thinking of seems to be EDT, not NDT.
I warned at the beginning that this was an artificial setting, but it's one in which we can express and distinguish CDT, TDT, ADT, UDT and others. The lack of memory serves to focus attention on what we can do; note that it's perfectly legitimate to try and deduce what Y would have done against another opponent, if you want that kind of data.
It's a lot easier for me to see the applications for an iterated game where you try to model your opponent's source code given the game history than a single-shot game where you try to predict what your opponent will do given their source code and your source code. That seems like it just invites infinite regress.
Suppose we know we're going to play the one-shot prisoner's dilemma with access to source code, and so we want to try to build the bot that cooperates only with bots that have functionally the same rule as it, i.e. the 'superrational prisoner' that's come up on LW. How would we actually express that? Suppose, for ease of labeling, we have bots 1 and 2 that can either C or D. Bot 1 says "output C iff 2 will output C iff 1 will output C iff 2 will output C iff..." Well, clearly that won't work. We also can't just check to see if the source code is duplicated, because we want to ensure that we also cooperate with similar bots, and don't want to get fooled by bots that have that commented out.
Is there a way to code that bot with the sort of propositions you're doing? Outcomes aren't enough- you need models of outcome dependencies that can fit inside themselves.
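The brittle source-comparison approach mentioned above ("check to see if the source code is duplicated") can be sketched in a few lines of Python. The names here are my own illustration, not from the thread:

```python
# Naive "cliquebot": cooperate only on an exact, byte-for-byte source match.
def cliquebot(my_source: str, opp_source: str) -> str:
    return "C" if opp_source == my_source else "D"

SOURCE = "def cliquebot(my_source, opp_source): ..."  # stand-in for the bot's own listing

print(cliquebot(SOURCE, SOURCE))                        # cooperates with an exact copy
print(cliquebot(SOURCE, SOURCE + "  # no-op comment"))  # defects against a functional twin
```

This makes the complaint concrete: a functionally identical bot with a harmless comment added gets defected against, which is exactly why matching on source text rather than on behavior is unsatisfying.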
What Nesov said. Löbian cooperation happened because I wanted to make quining cooperation work when the opponent has the same algorithm, but not necessarily the same source code. Illusion of transparency again: I post a lot of math thinking that its motivation will be crystal clear to everyone...
Combining branches: check out this comment.
Well, that's described in these posts, or do you mean something else?
That is what I meant. My problem is that I don't see how to express that strategy correctly with the vocabulary in this post, and I don't understand the notation in those posts well enough to tell if they express that idea correctly.
For example, I'm not comfortable with how Eliezer framed it. If my strategy is "cooperate iff they play cooperate iff I play cooperate," then won't I never cooperate with myself, because I don't have the strategy "cooperate iff they play cooperate"?
(I do think that general idea might be formalizable, but you need models of outcome dependencies that can fit inside themselves. Which maybe the notation I don't understand is powerful enough to do.)
"Cooperate iff they play cooperate iff I play cooperate" is imprecise. Your complaint seems to stem from interpreting it to pay too much attention to the details of the algorithms.* One reason to use counterfactuals is to avoid this. One interpretation of the slogan is "A()=C iff A()=B()." This has symmetry, so the agent cooperates with itself. However, this is vulnerable to self-fulfilling prophecy against the cooperation rock (i.e., it can choose either option), and its actions against the defection rock depend on implementation details. It is better to interpret it as: cooperate if both counterfactuals "A()=C => B()=C" and "A()=D => B()=D" are true. Again, this has the symmetry that allows it to cooperate with itself, but it defects against rocks. In fact, an agent like the one from this post pretty much derives this strategy and should cooperate with an agent that comes pre-programmed with this strategy. The problem is whether the counterfactuals are provable. Here is a thread on the topic.
* Another example of agents paying too much attention to implementation details is that we usually want to exclude agents that punish other agents for their choice of decision theory, rather than their decisions.
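The two failure modes above can be checked mechanically for the first reading of the slogan, "A()=C iff A()=B()": enumerate which outputs for A are self-consistent. Modeling the opponent as a function from A's output to B's output is an assumption of this sketch, made purely for illustration:

```python
# Toy enumeration of self-consistent outputs under "A()=C iff A()=B()".
def consistent_outputs(B):
    """Outputs a for A satisfying (a == 'C') <-> (a == B(a))."""
    return [a for a in "CD" if (a == "C") == (a == B(a))]

# Cooperation rock: both answers are self-consistent -- the
# "self-fulfilling prophecy" problem.
print(consistent_outputs(lambda a: "C"))  # ['C', 'D']

# Defection rock: no answer is self-consistent, so what the agent does
# depends on implementation details.
print(consistent_outputs(lambda a: "D"))  # []

# A mirror (stand-in for an exact copy): only cooperation survives.
print(consistent_outputs(lambda a: a))    # ['C']
```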
Aren't the implementation details sort of the whole point? I mean, suppose you code a version of cooperate iff "A()=C => B()=C and A()=D => B()=D" and feed it a copy of itself. Can it actually recognize that it should cooperate? Or does it think that the premise "A()=C" implies a cooperation rock, which it defects against?
That is, if the algorithm is "create two copies of your opponent, feed them different inputs, and combine their output by this formula," when you play that against itself it looks like you get 2^t copies, where t is how long you ran it before it melted down.
I see how one could code a cliquebot that only cooperated with someone with its source code, and no one else. I don't see how one could formalize this general idea- of trying D,C then C,C then D,D (and never playing C,D)- in a way that halts and recognizes different implementations of the same idea. The problem is that the propositions don't seem to cleanly differentiate between code outcomes, like "A()=C", and code, like "A()=C => B()=C".
No, the algorithm is definitely not to feed inputs in simulation. The algorithm is to prove things about the interaction of the two algorithms. For more detail, see the post that is the parent of the comment I linked.
"A()=C => B()=C" is a statement of propositional logic, not code. It is material implication, not conditional execution.
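To make the distinction concrete: material implication is just a truth function, so "A()=C => B()=C" is vacuously true whenever A does not in fact cooperate. A minimal sketch:

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: p => q is equivalent to (not p) or q."""
    return (not p) or q

# Vacuous truth: if "A()=C" is false (A defects), the implication holds
# no matter what B does -- nothing is executed conditionally.
print(implies(False, False), implies(False, True))  # True True
print(implies(True, False))                         # False
```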
This is possibly just ignorance on my part, but as far as I can tell what that post does is take the difficult part and put it in a black box, which leaves me no wiser about how this works.
Code was the wrong phrase for me to use- I should have gone with my earlier "outcome dependency," which sounds like your "material implication."
I think I've found my largest stumbling block: when I see "B()=C", it matters to me what's in the (), i.e. the inputs to B. But if B is a copy of this algorithm, it needs more information than "A()=C" in order to generate an output. Or, perhaps more precisely, it doesn't care about the "=C" part and makes its decision solely on the "A()" part. But then I can't really feed it "A()=C" and "A()=D" and expect to get different results.
That is, counterfactuals about my algorithm outputs are meaningless if the opponent's algorithm judges me based on my algorithm, not my outputs. Right? Or am I missing something?
The input to A and B is always the same, namely the source code of A and B and the universe. In fact, often we consider all that to be hard coded and there to be no input, hence the notation A().
Could I rephrase your last paragraph as "counterfactuals about the output of my algorithm are meaningless since my algorithm only has one output"? Yes, this is a serious problem in making sense of strategies like "cooperate iff the opponent cooperates iff I do." Either my opponent cooperates or not. That is a fact about math. How could it be different?
And yet, in the post I linked to, cousin_it managed to set up a situation in which there are provable theorems about what happens in the counterfactuals. Still, that does not answer the question of the meaning of "logical counterfactuals." They seem to match the ordinary-language use of counterfactuals, as in the above strategy.
If the "black box" was the provability oracle, then consider searching for proofs of bounded lengths, which is computable. If the black box was formal logic (Peano arithmetic), well, it doesn't look very black to me, but it is doing a lot of work.
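The remark that bounded proof search is computable can be sketched generically: enumerate every candidate proof up to a length bound and run a checker on each. The checker below is a toy stand-in of my own, not Peano arithmetic:

```python
from itertools import product

def bounded_search(is_proof, statement, alphabet, max_len):
    """Return a proof of `statement` of length <= max_len, or None.
    Always terminates: there are finitely many candidates to try."""
    for n in range(1, max_len + 1):
        for cand in product(alphabet, repeat=n):
            s = "".join(cand)
            if is_proof(s, statement):
                return s
    return None  # no proof within the bound (says nothing about longer proofs)

# Toy checker: a "proof" of a statement is the statement written twice.
toy_checker = lambda proof, stmt: proof == stmt + stmt
print(bounded_search(toy_checker, "ab", "ab", 4))  # finds 'abab'
print(bounded_search(toy_checker, "ab", "ab", 3))  # None: bound too small
```

The point is only that the search space is finite, so the procedure halts whether or not a proof exists, unlike an unbounded provability oracle.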
I don't think you should be judging material implication by its name. Perhaps by "outcome dependency" you mean "logical counterfactual." Then, yes, counterfactuals are related to such implications, by definition. The arrow is just logical implication. There are certain statements of logic that we interpret as counterfactuals.
That looks right to me.
What do you mean by the first sentence? That the opponent's move is one of {C, D}? Or that we are either in the world where C is played with probability 1, or the world where D is played with probability 1?
It seems to me that pernicious algorithms could cause cycling if you aren't careful, and it's not specified what happens if the program fails to output a move. If you tack on some more assumptions, I'm comfortable with saying that the opponent must play either C or D, but the second seems outlawed by mixed strategies being possible.
Peano arithmetic is beyond my formal education. I think I understand most of what it's doing, but the motivation for turning to it is not yet clear.
So, "A()=C => B()=C" means "it cannot be the case that I cooperate and they defect" and "A()=D => B()=D" means "it cannot be the case that I defect and they cooperate," and if both of those are true, then the algorithm cooperates. Right? (The first is to not be taken advantage of, and the second is to take advantage of those who will let you.)
What I'm not seeing is how you take two copies of that, evaluate them, and get "C,C" without jumping to the end by wishful thinking. It looks to me like the counterfactuals multiply without end, because every proof requires four subproofs that are just as complicated. (This is where it looks like the provability oracle is being called in, but I would rather avoid that if possible.)
You'll see by Part III. (Hint: when you see an infinite regress, consider a clever quining.)
Ok. I see how to do quining to only cooperate with copies of your source code, but I don't yet see how to do quining with outcome dependencies.
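For the source-code-matching version mentioned here, the standard quining trick lets a program reconstruct its own listing and compare it to the opponent's. A toy sketch (extending the trick from source text to "outcome dependencies" is exactly what the comment leaves open):

```python
# Quining cliquebot: rebuilds its own listing from a template, then
# cooperates only on an exact match.
TEMPLATE = 'TEMPLATE = {!r}\ndef bot(opponent_source):\n    my_source = TEMPLATE.format(TEMPLATE)\n    return "C" if opponent_source == my_source else "D"'

def bot(opponent_source):
    my_source = TEMPLATE.format(TEMPLATE)
    return "C" if opponent_source == my_source else "D"

exact_copy = TEMPLATE.format(TEMPLATE)  # the bot's own full listing
print(bot(exact_copy))                  # 'C': recognizes an exact copy
print(bot(exact_copy + "  # tweak"))    # 'D': any textual difference breaks it
```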
By the way, I was too blasé in the grandparent comment. I have a model of TDT that does almost the right thing, but I haven't figured out how to quine it so that it does exactly the right thing. (Technically, I can't be sure it'll still work if I do get the right quine in, but my gut feeling is that it will.)
So, I'm going to be pretty disappointed if this whole affair is just someone inventing a meta-cliquebot that's actually just a cliquebot.
Trust me, there's better stuff than that. (In particular, I've got a more nicely formalized write-up of UDT. It's just TDT I'm having issues with.)
The quining part usually seems to be the tricky part, doesn't it?