Wei_Dai comments on Secrets of the eliminati - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (252)
I wonder:
if you had an agent that obviously did have goals (let's say, a player in a game, whose goal is to win, and who plays the optimal strategy) could you deduce those goals from behavior alone?
Let's say you're studying the game of Connect Four, but you have no idea what constitutes "winning" or "losing." You watch enough games that you can map out a game tree. In state X of the world, a player chooses option A over other possible options, and so on. From that game tree, can you deduce that the goal of the game was to get four pieces in a row?
I don't know the answer to this question. But it seems important. If it's possible to identify, given a set of behaviors, what goal they're aimed at, then we can test behaviors (human, animal, algorithmic) for hidden goals. If it's not possible, that's very important as well; because that means that even in a simple game, where we know by construction that the players are "rational" goal-maximizing agents, we can't detect what their goals are from their behavior.
That would mean that behaviors that "seem" goal-less, programs that have no line of code representing a goal, may in fact be behaving in a way that corresponds to maximizing the likelihood of some event; we just can't deduce what that "goal" is. In other words, it's not as simple as saying "That program doesn't have a line of code representing a goal." Its behavior may encode a goal indirectly. Detecting such goals seems like a problem we would really want to solve.
One method that would work for this example is to iterate over all possible goals in ascending complexity, and check which one would generate that game tree. How to apply this idea to humans is unclear. See here for a previous discussion.
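As a rough illustration of that enumeration idea, here is a minimal sketch on a much smaller game than Connect Four: a pile of stones from which players alternately take 1 or 2. Everything in it is an assumption made for the example, not something from the discussion: the game itself, the names `solve` and `infer_goal`, and the two-item list of candidate goals, whose "complexity" order is arbitrary. It solves the game under each candidate goal and keeps the first one whose optimal moves match every observed choice.

```python
from functools import lru_cache

MOVES = (1, 2)  # hypothetical toy game: take 1 or 2 stones per turn


def solve(max_pile, last_stone_wins):
    """Map each pile size to the set of optimal moves under one candidate goal."""

    @lru_cache(maxsize=None)
    def wins(pile):
        # True if the player to move can force a win from this pile.
        if pile == 0:
            # The previous player took the last stone.
            return not last_stone_wins
        return any(not wins(pile - m) for m in MOVES if m <= pile)

    policy = {}
    for pile in range(1, max_pile + 1):
        legal = [m for m in MOVES if m <= pile]
        winning = {m for m in legal if not wins(pile - m)}
        # In a lost position no move is better than another.
        policy[pile] = winning or set(legal)
    return policy


# Candidate goals, listed in an assumed (arbitrary) order of complexity.
CANDIDATE_GOALS = [
    ("taking the last stone loses", False),
    ("taking the last stone wins", True),
]


def infer_goal(observations):
    """Return the first candidate goal consistent with (pile, move) observations."""
    max_pile = max(pile for pile, _ in observations)
    for name, last_stone_wins in CANDIDATE_GOALS:
        policy = solve(max_pile, last_stone_wins)
        if all(move in policy[pile] for pile, move in observations):
            return name
    return None


# Choices recorded from a (hypothetical) optimal player.
observed = [(7, 1), (4, 1), (5, 2), (2, 2)]
print(infer_goal(observed))
```

The sketch only distinguishes goals whose optimal moves actually differ in the observed states, and it assumes the observed player never makes a mistake.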
Ok, computationally awful for anything complicated, but possible in principle for simple games. That's good, though; that means goals aren't truly invisible, just inconvenient to deduce.
I think, actually, that because we hardly ever play with an optimal strategy, goals are going to be nigh impossible to deduce. Would such an end-from-means deduction even work if the actor were not using the optimal strategy? Humans only do so in games on the level of tic-tac-toe (the more rational ones maybe in more complex situations, but not by much), and as for machines that could use an optimal strategy, we've just excluded them from even having such 'goals'.
If each game is played to the end (no resignations, at least in the sample set) then presumably you could make good initial guesses about the victory condition by looking at common factors in the final positions. A bit like zendo. It wouldn't solve the problem, but it doesn't rely on optimal play, and would narrow the solution space quite a bit.
E.g., in the Connect Four example, all final moves create a sequence of four or more in a row. Armed with that hypothesis, you look at the game tree and note that no non-final move does. So you know (with reasonably high confidence) that making four in a row ends the game. How to figure out whether it wins the game or loses it is an exercise for the reader.
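The final-position check could be sketched roughly like this. The board encoding (a list of strings with `.` for an empty cell), the helper names `has_run`, `run_ends_game` and `consistent_run_lengths`, and the two sample positions are all assumptions made up for the illustration; it also assumes the sample contains no drawn games that end on a full board.

```python
# Directions to scan: right, down, down-right, down-left.
DIRECTIONS = ((0, 1), (1, 0), (1, 1), (1, -1))


def has_run(board, n=4):
    """True if some player has n pieces in a straight line on the board."""
    rows, cols = len(board), len(board[0])
    for r in range(rows):
        for c in range(cols):
            piece = board[r][c]
            if piece == ".":
                continue
            for dr, dc in DIRECTIONS:
                end_r, end_c = r + (n - 1) * dr, c + (n - 1) * dc
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue
                if all(board[r + k * dr][c + k * dc] == piece for k in range(n)):
                    return True
    return False


def run_ends_game(samples, n=4):
    """Check the hypothesis: a position is final exactly when it has a run of n."""
    return all(has_run(board, n) == is_final for board, is_final in samples)


def consistent_run_lengths(samples, candidates=range(2, 7)):
    """Run lengths n for which 'n in a row ends the game' fits every sample."""
    return [n for n in candidates if run_ends_game(samples, n)]


# Two made-up observed positions: one where the game ended, one where it did not.
final_position = [
    ".......",
    ".......",
    ".......",
    ".......",
    "OOO....",
    "XXXX...",
]
non_final_position = [
    ".......",
    ".......",
    ".......",
    ".......",
    "OOO....",
    "XXX....",
]
samples = [(final_position, True), (non_final_position, False)]
print(consistent_run_lengths(samples))
```

As in the comment, this only tests whether four in a row ends the game; it says nothing about whether the player who made the run won or lost.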
(Mental note: try playing C4 with the win condition reversed and see if it makes for an interesting game.)
There are always heuristics. For example, seeing that the goal of making three in a row fits the game tree well suggests considering goals of the form "make n in a row", or at least "make diagonal and orthogonal versions of some shape".