JamesAndrix comments on Desirable Dispositions and Rational Actions - Less Wrong

13 Post author: RichardChappell 17 August 2010 03:20AM


Comments (180)


Comment author: JamesAndrix 17 August 2010 07:15:18AM *  5 points [-]

So far, the only effect that all the Omega-talk has had on me is to make me honestly suspect that you guys must be into some kind of mind-over-matter quantum woo.

...What?

Also, it doesn't matter if he's impossible. He's an easy way to tack on arbitrary rules to hypotheticals without overly tortured explanations, because people are used to getting arbitrary rules from powerful agents.

It's also impossible for a perfectly absent-minded driver to come to one of only two possible intersections, with three destinations, known payoffs, and no other choices. To say nothing of the impossibly horrible safety practices of our nation's hypothetical train system.

Comment author: Perplexed 17 August 2010 07:47:51AM 0 points [-]

it doesn't matter if he's impossible

Are you sure? I'm not objecting to the arbitrary payoffs or complaining because he doesn't seem to be maximizing his own utility. I'm objecting to his ability to predict my actions. Give me a scenario which doesn't require me to assign a non-zero prior to woo and in which a revisionist decision theory wins. If you can't, then your "improved" decision theory is no better than woo itself.

Regarding the Absent Minded Driver, I didn't recognize the reference. Googling, I find a .pdf by one of my guys (Nobelist Robert Aumann) and an LW article by Wei-Dai. Cool, but since it is already way past my bedtime, I will have to read them in the morning and get back to you.

Comment author: thomblake 17 August 2010 05:55:23PM 6 points [-]

I'm objecting to his ability to predict my actions. Give me a scenario which doesn't require me to assign a non-zero prior to woo

The only 'woo' here seems to be your belief that your actions are not predictable (even in principle!). Even I can predict your actions within some tolerances, and we do not need to posit that I am a superintelligence! Examples: you will not hang yourself to death within the next five minutes, and you will ever make another comment on Less Wrong.

Comment author: Perplexed 17 August 2010 08:49:42PM -1 points [-]

...you will ever make another comment on Less Wrong.

"ever"? No, "never".

Comment author: thomblake 18 August 2010 12:43:44AM 2 points [-]

Wha?

In case it wasn't clear, it was a one-off prediction and I was already correct.

Comment author: Perplexed 19 August 2010 02:51:18AM 2 points [-]

In case mine wasn't clear, it was a bad Gilbert & Sullivan joke. Deservedly downvoted. Apparently.

Comment author: Alicorn 19 August 2010 02:55:51AM 4 points [-]

You need a little more context/priming or to make the joke longer for anyone to catch this. Or you need to embed it in a more substantive and sensible reply. Otherwise it will hardly ever work.

Comment author: Perplexed 19 August 2010 04:56:38AM 1 point [-]
Comment author: Alicorn 19 August 2010 04:59:51AM 1 point [-]

I'd call that a long joke, wouldn't you?

Comment author: Perplexed 19 August 2010 05:05:29AM 1 point [-]

See what I mean? I made it long and it still didn't work. :)

Comment author: Cyan 19 August 2010 04:51:22AM 0 points [-]

I wasn't sure, so I held off posting my reply (a decision I now regret). It would have been, "Well, hardly ever."

Comment author: Kingreaper 18 August 2010 12:53:56AM *  2 points [-]

I'm objecting to his ability to predict my actions.

Why? What about you is fundamentally logically impossible to predict?

Do you not find that you often predict the actions of others? (e.g. giving them gifts that you know they'll like) And that others predict your reactions? (e.g. choosing not to give you spider-themed horror movies if you're arachnophobic)

Comment author: Perplexed 17 August 2010 06:56:35PM 0 points [-]

Ok, I've read the paper (most of it) and Wei-Dai's article now. Two points.

  1. In a sense, I understand how you might think that the Absent-Minded Driver is no less contrived and unrealistic than Newcomb's Paradox. Maybe different people have different intuitions as to which toy examples are informative and which are misleading. Someone else (on this thread?) responded to me recently with the example of frictionless pulleys and the like from physics. All I can tell you is that the AMD, the PD, frictionless pulleys, and even Parfit's Hitchhiker all strike me as admirable teaching tools, whereas Newcomb problems and the old question of irresistible force vs. immovable object in physics are simply wrong problems which can only create confusion.

  2. Reading Wei-Dai's snarking about how the LW approach to decision theory (with zero published papers to date) is so superior to the confusion in which mere misguided Nobel laureates struggle - well, I almost threw up. It is extremely doubtful that I will continue posting here for long.

Comment author: Wei_Dai 18 August 2010 12:00:11AM 5 points [-]

It wasn't meant to be a snark. I was genuinely trying to figure out how the "LW approach" might be superior, because otherwise the most likely explanation is that we're all deluded in thinking that we're making progress. I'd be happy to take any suggestions on how I could have reworded my post so that it sounded less like a snark.

Comment author: Perplexed 20 August 2010 11:39:52PM *  6 points [-]

Wei-Dai wrote a post entitled The Absent-Minded Driver, which I labeled "snarky". Moreover, I suggested that the snarkiness was so bad as to be nauseating, enough to drive reasonable people to flee in horror from LW and SIAI. I here attempt to defend these rather startling opinions. Here is what Wei-Dai wrote that offended me:

This post examines an attempt by professional decision theorists to treat an example of time inconsistency, and asks why they failed to reach the solution (i.e., TDT/UDT) that this community has more or less converged upon. (Another aim is to introduce this example, which some of us may not be familiar with.) Before I begin, I should note that I don't think "people are crazy, the world is mad" (as Eliezer puts it) is a good explanation. Maybe people are crazy, but unless we can understand how and why people are crazy (or to put it more diplomatically, "make mistakes"), how can we know that we're not being crazy in the same way or making the same kind of mistakes?

The paper that Wei-Dai reviews is "The Absent-Minded Driver" by Robert J. Aumann, Sergiu Hart, and Motty Perry. Wei-Dai points out, rather condescendingly:

(Notice that the authors of this paper worked for a place called Center for the Study of Rationality, and one of them won a Nobel Prize in Economics for his work on game theory. I really don't think we want to call these people "crazy".)

Wei-Dai then proceeds to give a competent description of the problem and the standard "planning-optimality" solution. Next comes a description of an alternative seductive-but-wrong solution by Piccione and Rubinstein. I should point out that everyone - P&R; Aumann, Hart, and Perry; Wei-Dai; me; and hopefully you who look into this - realizes that the alternative P&R solution is wrong. It gets the wrong result. It doesn't win. The only problem is explaining exactly where the analysis leading to that solution went astray, and in explaining how it might be modified so as to go right. Making this analysis was, as I see it, the whole point of both papers - P&R's and Aumann et al.'s. Wei-Dai describes some characteristics of Aumann et al.'s corrected version of the alternate solution. Then he (?) goes horribly astray:

In problems like this one, UDT is essentially equivalent to planning-optimality. So why did the authors propose and argue for action-optimality despite its downsides ..., instead of the alternative solution of simply remembering or recomputing the planning-optimal decision at each intersection and carrying it out?

But, as anyone who reads the paper carefully should see, they weren't arguing for action-optimality as the solution. They never abandoned planning optimality. Their point is that if you insist on reasoning in this way (and Selten's notion of "subgame perfection" suggests some reasons why you might!), then the algorithm they call "action-optimality" is the way to go about it.
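For concreteness, here is the planning-stage calculation in a few lines of Python. I'm assuming the standard payoffs from the example: exiting at the first intersection X pays 0, exiting at the second intersection Y pays 4, and continuing past both pays 1.

```python
# Planning-stage calculation for the absent-minded driver.
# Assumed payoffs (the standard ones from the example under discussion):
# exit at X -> 0, exit at Y -> 4, continue past both -> 1.
# At the planning stage the driver commits to a single probability p of
# continuing, used at both intersections, since he cannot tell them apart.

def expected_payoff(p: float) -> float:
    exit_at_x = (1 - p) * 0        # exits immediately at X
    exit_at_y = p * (1 - p) * 4    # continues at X, exits at Y
    through = p * p * 1            # continues at both intersections
    return exit_at_x + exit_at_y + through   # = 4p - 3p^2

# Grid search; the closed form gives p* = 2/3 with expected payoff 4/3.
best_p = max((i / 10000 for i in range(10001)), key=expected_payoff)
print(round(best_p, 3), round(expected_payoff(best_p), 3))  # -> 0.667 1.333
```

The planning-optimal p* = 2/3 is exactly the number a driver could simply recompute from scratch at each intersection, which is the "too simple and obvious" alternative under dispute here.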

But Wei-Dai doesn't get this. Instead we get this analysis of how these brilliant people just haven't had the educational advantages that LW folks have:

Well, the authors don't say (they never bothered to argue against it), but I'm going to venture some guesses:

  • That solution is too simple and obvious, and you can't publish a paper arguing for it.
  • It disregards "the probability of being at X", which intuitively ought to play a role.
  • The authors were trying to figure out what is rational for human beings, and that solution seems too alien for us to accept and/or put into practice.
  • The authors were not thinking in terms of an AI, which can modify itself to use whatever decision theory it wants to.
  • Aumann is known for his work in game theory. The action-optimality solution looks particularly game-theory like, and perhaps appeared more natural than it really is because of his specialized knowledge base.
  • The authors were trying to solve one particular case of time inconsistency. They didn't have all known instances of time/dynamic/reflective inconsistencies/paradoxes/puzzles laid out in front of them, to be solved in one fell swoop.

Taken together, these guesses perhaps suffice to explain the behavior of these professional rationalists, without needing to hypothesize that they are "crazy". Indeed, many of us are probably still not fully convinced by UDT for one or more of the above reasons.

Let me just point out that the reason it is true that "they never argued against it" is that they had already argued for it. Check out the implications of their footnote #4!

Ok, those are the facts, as I see them. Was Wei-Dai snarky? I suppose it depends on how you define snarkiness. Taboo "snarky". I think that he was overbearingly condescending without the slightest real reason for thinking himself superior. "Snarky" may not be the best one-word encapsulation of that attitude, but it is the one I chose. I am unapologetic. Wei-Dai somehow came to believe himself better able to see the truth than a Nobel laureate in the Nobel laureate's field. It is a mistake he would not have made had he simply read a textbook or taken a one-semester course in the field. But I'm coming to see it as a mistake made frequently by SIAI insiders.

Let me point out that the problem of forgetful agents may seem artificial, but it is actually extremely important. An agent with perfect recall playing the iterated PD, knowing that it is to be repeated exactly 100 times, should rationally choose to defect. On the other hand, if he cannot remember how many iterations remain to be played, and knows that the other player cannot remember either, he should cooperate by playing Tit-for-Tat or something similar.
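A toy sketch of that contrast (with the usual payoff numbers assumed: mutual cooperation 3 each, mutual defection 1 each, 5 for a lone defector, 0 for the sucker):

```python
# 100-round iterated Prisoner's Dilemma, standard payoffs assumed:
# both cooperate -> 3 each; both defect -> 1 each;
# lone defector -> 5, the sucker -> 0.
PAYOFF = {('C', 'C'): (3, 3), ('D', 'D'): (1, 1),
          ('C', 'D'): (0, 5), ('D', 'C'): (5, 0)}

def play(strat_a, strat_b, rounds=100):
    # Each strategy sees only the opponent's history of moves.
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa; score_b += pb
        hist_a.append(a); hist_b.append(b)
    return score_a, score_b

always_defect = lambda opp: 'D'   # the backward-induction play for a known horizon
tit_for_tat = lambda opp: opp[-1] if opp else 'C'  # cooperate, then echo

print(play(always_defect, always_defect))  # (100, 100)
print(play(tit_for_tat, tit_for_tat))      # (300, 300)
print(play(always_defect, tit_for_tat))    # (104, 99)
```

Two Tit-for-Tat players collect 300 points each, while two backward-inducers collect 100 each, which is why the forgetful version of the game is the one in which cooperation can be sustained.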

Well, that is my considered response on "snarkiness". I still have to respond on some other points, and I suspect that, upon consideration, I am going to have to eat some crow. But I'm not backing down on this narrow point. Wei-Dai blew it in interpreting Aumann's paper. (And also, other people who know some game theory should read the paper and savor the implications of footnote #4. It is totally cool.)

Comment author: Tyrrell_McAllister 20 August 2010 11:49:27PM *  5 points [-]

The paper that Wei-Dai reviews is "The Absent-Minded Driver" by Robert J. Aumann, Sergiu Hart, and Motty Perry. Wei-Dai points out, rather condescendingly:

(Notice that the authors of this paper worked for a place called Center for the Study of Rationality, and one of them won a Nobel Prize in Economics for his work on game theory. I really don't think we want to call these people "crazy".)

How is Wei Dai being condescending there? He's pointing out how weak it is to dismiss people with these credentials by just calling them crazy. ETA: In other words, it's an admonishment directed at LWers.

That, at any rate, was my read.

Comment author: Perplexed 21 August 2010 12:24:20AM 1 point [-]

I'm sure it would be Wei-Dai's read as well. The thing is, if Wei-Dai had not mistakenly come to the conclusion that the authors are wrong and not as enlightened as LWers, that admonishment would not be necessary. I'm not saying he condescends to LWers. I say he condescends to the rest of the world, particularly game theorists.

Comment author: wedrifid 21 August 2010 01:27:07AM 2 points [-]

Are you essentially saying you are nauseated because Wei Dai disagreed with the authors?

Comment author: Perplexed 21 August 2010 03:54:51AM -1 points [-]

No. Not at all. It is because he disagreed through the wrong channels, and then proceeded to propose rather insulting hypotheses as to why they had gotten it wrong.

Just read that list of possible reasons! And there are people here arguing that "of course we want to analyze the cause of mistakes". Sheesh. No wonder folks here are so in love with Evolutionary Psychology.

Ok, I'm probably going to get downvoted to hell because of that last paragraph. And, you know what, that downvoting impulse due to that paragraph pretty much makes my case for why Wei Dai was wrong to do what he did. Think about it.

Comment author: wedrifid 21 August 2010 05:02:45AM 0 points [-]

Ok, I'm probably going to get downvoted to hell because of that last paragraph. And, you know what, that downvoting impulse due to that paragraph pretty much makes my case for why Wei Dai was wrong to do what he did. Think about it.

Interestingly enough, I think that it is this paragraph that people will downvote, and not the one above. Mind you, the premise in "No wonder folks here are so in love with Evolutionary Psychology" does seem so incredibly backward that I almost laughed.

No. Not at all. It is because he disagreed through the wrong channels, and then proceeded to propose rather insulting hypotheses as to why they had gotten it wrong.

I can understand your explanation here. Without agreeing with it myself I can see how it follows from your premises.

Comment author: Tyrrell_McAllister 21 August 2010 12:37:51AM *  1 point [-]

I'm having trouble following you.

I'm sure it would be Wei-Dai's read as well.

Are you saying that you read him differently, and that he would somehow be misinterpreting himself?

The thing is, if Wei-Dai had not mistakenly come to the conclusion that the authors are wrong and not as enlightened as LWers, that admonishment would not be necessary.

The admonishment is necessary if LWers are likely to wrongly dismiss Aumann et al. as "crazy". In other words, to think that the admonishment is necessary is to think that LWers are too inclined to dismiss other people as crazy.

I'm not saying he condescends to LWers. I say he condescends to the rest of the world, particularly game theorists.

I got that. Who said anything about condescending to LWers?

Comment author: Wei_Dai 21 August 2010 12:56:49AM *  2 points [-]

Preliminary notes: You can call me "Wei Dai" (that's firstname lastname). "He" is ok. I have taken a graduate level course in game theory (where I got a 4.0 grade, in case you suspect that I coasted through it), and have Fudenberg and Tirole's "Game Theory" and Joyce's "Foundations of Causal Decision Theory" as two of the few physical books that I own.

Their point is that if you insist on reasoning in this way (and Selten's notion of "subgame perfection" suggests some reasons why you might!), then the algorithm they call "action-optimality" is the way to go about it.

I can't see where they made this point. At the top of Section 4, they say "How, then, should the driver reason at the action stage?" and go on directly to describe action-optimality. If they said something like "One possibility is to just recompute and apply the planning-optimal solution. But if you insist ..." please point out where. See also page 108:

In our case, there is only one player, who acts at different times. Because of his absent-mindedness, he had better coordinate his actions; this coordination can take place only before he starts out, at the planning stage. At that point, he should choose p*1. If indeed he chose p*1, there is no problem. If by mistake he chose p*2 or p*3, then that is what he should do at the action stage. (If he chose something else, or nothing at all, then at the action stage he will have some hard thinking to do.)

If Aumann et al. endorse using planning-optimality at the action stage, why would they say the driver has some hard thinking to do? Again, why not just recompute and apply the planning-optimal solution?

I also do not see how subgame perfection is relevant here. Can you explain?

Let me just point out that the reason it is true that "they never argued against it" is that they had already argued for it. Check out the implications of their footnote #4!

This footnote?

Formally, (p*, p*) is a symmetric Nash equilibrium in the (symmetric) game between "the driver at the current intersection" and "the driver at the other intersection" (the strategic form game with payoff functions h).

Since p* is the action-optimal solution, they are pointing out the formal relationship between their notion of action-optimality and Nash equilibrium. How is this footnote an argument for "it" (it being "recomputing the planning-optimal decision at each intersection and carrying it out")?

Comment author: Perplexed 21 August 2010 01:26:16AM 3 points [-]

I have taken a graduate level course in game theory (where I got a 4.0 grade, in case you suspect that I coasted through it), and have Fudenberg and Tirole's "Game Theory" and Joyce's "Foundations of Causal Decision Theory" as two of the few physical books that I own.

Ok, so it is me who is convicted of condescending without having the background to justify it. :( FWIW I have never taken a course, though I have been reading in the subject for more than 45 years.

My apologies. More to come on the substance.

Comment author: Perplexed 21 August 2010 02:19:09AM *  1 point [-]

Relevance of subgame perfection. Selten suggested subgame perfection as a refinement of Nash equilibrium which requires that decisions that seemed rational at the planning stage ought to still seem rational at the action stage. This at least suggests that we might want to consider requiring "subgame perfection" even if we only have a single player making two successive decisions.

Relevance of Footnote #4. This points out that one way to think of problems where a single player makes a series of decisions is to pretend that the problem has a series of players making the decisions - one decision per player, but that these fictitious players are linked in that they all share the same payoffs (but not necessarily the same information). This is a standard "trick" in game theory, but the footnote points out that in this case, since both fictitious players have the same information (because of the absent-mindedness) the game between driver-version-1 and driver-version-2 is symmetric, and that is equivalent to the constraint p1 = p2.
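To make the footnote concrete, here is a reconstruction in Python. The payoff function h below is my own guess at the paper's construction, not a quotation of it, and the payoffs are again the standard ones (exit at X = 0, exit at Y = 4, straight through = 1):

```python
# Reconstruction (mine, not necessarily the paper's exact h) of the
# symmetric game between "the driver at the current intersection" and
# "the driver at the other intersection".
# Assumed payoffs: exit at X -> 0, exit at Y -> 4, straight through -> 1.

def belief_at_x(p: float) -> float:
    # With continue-probability p, the driver reaches Y with probability p,
    # so consistent beliefs put weight 1/(1+p) on being at X.
    return 1 / (1 + p)

def h(q: float, p: float) -> float:
    # Expected payoff when the current intersection continues with
    # probability q and the other intersection continues with probability p.
    at_x = q * (1 - p) * 4 + q * p * 1   # exiting at X pays 0
    at_y = (1 - q) * 4 + q * 1
    a = belief_at_x(p)
    return a * at_x + (1 - a) * at_y

# Against p = 2/3 the payoff is flat in q (always 8/5), so q = 2/3 is a
# (weak) best response and (2/3, 2/3) is a symmetric equilibrium.
print({round(h(q / 10, 2 / 3), 6) for q in range(11)})
```

On this reconstruction, against an opponent fixed at p = 2/3 the current driver is indifferent over all q, which is exactly what it takes for (p*, p*) to be a symmetric Nash equilibrium of the fictitious two-player game.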

Does Footnote #4 really amount to "they had already argued for [just recalculating the planning-optimal solution]"? Well, no it doesn't really. I blew it in offering that as evidence. (Still think it is cool, though!)

Do they "argue for it" anywhere else? Yes, they do. Section 5, where they apply their methods to a slightly more complicated example, is an extended argument for the superiority of the planning-optimal solution to the action-optimal solutions. As they explain, there can be multiple action-optimal solutions, even if there is only one (correct) planning-optimal solution, and some of those action-optimal solutions are wrong even though they appear to promise a higher expected payoff than does the planning-optimal solution.

I can't see where they made this point. At the top of Section 4, they say "How, then, should the driver reason at the action stage?" and go on directly to describe action-optimality. If they said something like "One possibility is to just recompute and apply the planning-optimal solution. But if you insist ..." please point out where. See also page 108:

In our case, there is only one player, who acts at different times. Because of his absent-mindedness, he had better coordinate his actions; this coordination can take place only before he starts out at the planning stage. At that point, he should choose p1 . If indeed he chose p1 , there is no problem. If by mistake he chose p2 or p3 , then that is what he should do at the action stage. (If he chose something else, or nothing at all, then at the action stage he will have some hard thinking to do.)

If Aumann et al. endorse using planning-optimality at the action stage, why would they say the driver has some hard thinking to do? Again, why not just recompute and apply the planning-optimal solution?

I really don't see why you are having so much trouble parsing this. "If indeed he chose p1, there is no problem" is an endorsement of the correctness of the planning-optimal solution. The sentence dealing with p2 and p3 asserts that, if you mistakenly used p2 for your first decision, then your best follow-up is to remain consistent and use p2 for your remaining choices. The paragraph you quote to make your case is one I might well choose myself to make my case.

Edit: There are some asterisks in variable names in the original paper which I was unable to make work with the italics rules on this site. So "p2" above should be read as "p <asterisk> 2"

Comment author: Wei_Dai 21 August 2010 02:27:43AM *  1 point [-]

It is a statement that the planning-optimal action is the correct one, but it's not an endorsement that it is correct to use the planning-optimality algorithm to compute what to do when you are already at an intersection. Do you see the difference?

ETA (edited to add): According to my reading of that paragraph, what they actually endorse is to compute the planning-optimal action at START, remember that, then at each intersection, compute the set of action-optimal actions, and pick the element of the set that coincides with the planning-optimal action.

BTW, you can use "\" to escape special characters like "*" and "_".

Comment author: Perplexed 21 August 2010 02:42:44AM *  1 point [-]

Thx for the escape character info. That really ought to be added to the editing help popup.

Yes, I see the difference. I claim that what they are saying here is that you need to do the planning-optimal calculation in order to find p*1 as the unique best solution (among the three solutions that the action-optimal method provides). Once you have this, you can use it at the first intersection. But at the other intersections, you have some choices: either recalculate the planning-optimal solution each time, or write down enough information so that you can recognize that p*1 is the solution you are already committed to among the three (in section 5) solutions returned by the action-optimality calculation.

ETA in response to your ETA. Yes they do. Good point. I'm pretty sure there are cases more complicated than this perfectly amnesiac driver where that would be the only correct policy. (ETA:To be more specific, cases where the planning-optimal solution is not a sequential equilibrium). But then I have no reason to think that UDT would yield the correct answer in those more complicated cases either.

Comment author: Wei_Dai 21 August 2010 03:20:54AM 0 points [-]

I deleted my previous reply since it seems unnecessary given your ETA.

I'm pretty sure there are cases more complicated than this perfectly amnesiac driver where that would be the only correct policy. (ETA:To be more specific, cases where the planning-optimal solution is not a sequential equilibrium).

What would be the only correct policy? What I wrote after "According to my reading of that paragraph"? If so, I don't understand your "cases where the planning-optimal solution is not a sequential equilibrium". Please explain.

Comment author: JamesAndrix 17 August 2010 10:21:16PM *  3 points [-]

1A. It may well be a wrong problem. If so, it ought to be dissolved.

1B. If so, many theorists (including, presumably, Nobel Prize winners) have missed it since 1969.

1C. Your intuition should not be considered a persuasive argument, even by you.

2. Even ignoring any singularitarian predictions, given the degree to which knowledge acceleration has already advanced, you should expect to see cases where old standards are blown away with seemingly little effort.

Maybe this isn't one of those cases, but it should not surprise you if we learn that humanity as a whole has done more decision theory in the past few years than in all previous history.

Given that similar accelerations are happening in many fields, there are probably several past-Nobel-level advances by rank amateurs with no special genius.

Comment author: cousin_it 17 August 2010 09:00:33PM *  3 points [-]

In the comment section of Wei Dai's post in question, taw and pengvado completed his solution so conclusively that if you really take the time to understand the object level (instead of the meta level, where some people are a priori smarter because they won a prize), you can't help but feel the snarking was justified :-)

Comment author: Perplexed 19 August 2010 02:49:06AM 2 points [-]

OK, I've got some big guns pointed at me, so I need to respond. I need to respond intelligently and carefully. That will take some time. Within a week at most.

Comment author: Wei_Dai 18 August 2010 10:34:34PM 1 point [-]

A couple more comments:

  1. For a long time I also didn't think that Newcomb's Problem was worth thinking about. Then I read something by Eliezer that pointed out the connection to the Prisoner's Dilemma. (According to "Prisoners' Dilemma is a Newcomb Problem," others saw the connection as early as 1969.) See also my "Newcomb's Problem vs. One-Shot Prisoner's Dilemma," where I explored how they are different as well.
  2. I'm curious what you now think about my perspective on the Absent Minded Driver, on both the object level and meta level (assuming I convinced you that it wasn't meant to be a snark). You're the only person who has indicated actually having read Aumann et al.'s paper.
Comment author: Perplexed 20 August 2010 11:58:24PM 2 points [-]

The possible connection between Newcomb and PD is seen by anyone who considers Jeffrey's version of decision theory (EDT). So I have seen it mentioned by philosophers long before I had heard of EY. Game theorists, of course, reject this, unless they are analysing games with "free precommitment". I instinctively reject it too, for what that is worth, though I am beginning to realize that publishing your unchangeable source code is pretty much equivalent to free precommitment.

My analysis of your analysis of AMD is in my response to your comment below.

Comment author: Kevin 18 August 2010 01:16:37AM *  0 points [-]

Give me a scenario which doesn't require me to assign a non-zero prior to woo and in which a revisionist decision theory wins.

Omega is a perfect super-intelligence, existing in a computer-simulation-like universe that can be modeled by a set of physical laws and a very long string of random numbers. Omega knows the laws and the numbers.