fubarobfusco comments on Decision Theories: A Semi-Formal Analysis, Part I - Less Wrong

21 Post author: orthonormal 24 March 2012 04:01PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (90)

You are viewing a single comment's thread. Show more comments above.

Comment author: Irgy 03 April 2012 03:22:57AM *  1 point [-]

Well, this article puts my confusion from the primer into context, and I think I understand the issue now.

The "problem" (and it's ultimately not a problem, but I'll get to that) I have is that the game these programs are playing is ill-posed, in that it's not a closed system. It might appear as if they're playing, for example, Prisoners' Dilemna, but with access to each other's source code they're really playing some sort of Prisoners' Meta-Dilemna, a fundamentally different game for all that it might seem superficially similar. Now I'm not enough of an expert to tell you exactly what defines a "well posed" game, but I'm quite confident this is not it. Or maybe I'm abusing the terminology a bit but hopefully my point is clear enough (if not I'll elaborate). The distinction though is that the mechanism by which you make your decision on how to play should be outside the game itself, and therefore only observed indirectly.

One property this metagame has is that there is fundamentally no "correct" answer for what "decision theory" the programs should use. The proof is simple, imagine you have a correct answer D. Imagine then that this program faces a population of the following opponents (I'll use the Prisoners' Dilemna subgame as a simple example):

  • Look at your opponent's source code.
  • If your opponent is using D, defect.
  • Otherwise co-operate.

D is in fact the single worst strategy in this scenario by quite a margin, regardless of what it is.

But that's fine, why would we even expect there to be a single correct strategy for every possible set of opponents? Of course there isn't. This is where we take a step back to a game that is actually well posed and for which Naive Decision Theory works fine - the programmers' game. Applying it there would go something like this:

  • Create a probability distribution over the (potentially infinite) space of all possible programs that yours might be up against.
  • Find the program which maximises the expected utility against that distribution of opponents.

For an infinite space of possible programs the second step might technically be impossible to perform optimally, but you do the best you can. There is of course a devil in the details of the first step in how you create that distribution, and when you look into it ultimately it's a game between programmers, with its own Nash Equilibria and co-operative strategies and so on, not just a simple decision. But until you start having programmers read each others' minds (in which case you've really just pointlessly shifted the whole thing up one meta-level) it's at least a well-posed (if complex) game that the ordinary approaches should work for.

As I said earlier though the fact that I might call these games ill-posed doesn't mean they're not interesting. I'll be fascinated to read about the strategies that people have devised for them and the attempts to be as general as possible. I'm a little concerned by the fact that the arguments so far seem to be along the lines of "You could do X, but then what if Y happens?", purely because as I've shown above there is ultimately no end to that chain of logic. But maybe you'll eventually step outside of it. My guess is that ultimately the "smarter" program wins. Certainly my generic counter-strategy has to generally be more complex (and therefore "smarter"? Maybe) than the "D" strategy it punishes in order to identify "D" in the first place.

In any case I look forward to finding out.

Comment author: fubarobfusco 03 April 2012 08:52:49AM -1 points [-]

This is where we take a step back to a game that is actually well posed and for which Naive Decision Theory works fine - the programmers' game. Applying it there would go something like this:

  • Create a probability distribution over the (potentially infinite) space of all possible programs that yours might be up against.
  • Find the program which maximises the expected utility against that distribution of opponents.

... I believe this is called AIXI.

(Not exactly, but ...)