Ahhh, a human interest post. Well, sort of. At least it has something besides math-talk.

In the extreme programming community they have a saying, "three strikes and you refactor". The rationalist counterpart would be this: once you've noticed the same trap twice, you'd be stupid to fall prey to it the third time.

Strike one is Eliezer's post The Crackpot Offer. Child-Eliezer thought he'd overthrown Cantor's theorem, then found an error in his reasoning, but felt a little tempted to keep on trying to overthrow the damned theorem anyway. The right and Bayesian thing to do, which he ended up doing, was to notice that once you've found your mistake there's no longer any reason to wage war on an established result.

Strike two is Emile's comment on one of my recent posts:

I find it annoying how my brain keeps saying "hah, I bet I could" even though I explained to it that it's mathematically provable that such an input always exists. It still keeps coming up with "how about this clever encoding?", blablabla ... I guess that's how you get cranks.

Strike three is... I'm a bit ashamed to say that...

...strike three is about me. And maybe not only me.

There's a certain vibe in the air surrounding many discussions of decision theory. It sings: maybe the central insight of game theory (that multiplayer situations are not reducible to single-player ones) is wrong. Maybe the slightly-asymmetrized Prisoner's Dilemma has a single right answer. Maybe you can get a unique solution to dividing a cake by majority vote if each individual player's reasoning is "correct enough". But honestly, where exactly is the Bayesian evidence that merits anticipating success on that path? Am I waging war on clear and simple established results because of wishful thinking? Are my efforts the moral equivalent of counting the reals or proving the consistency of PA within PA?

An easy answer is that "we don't know" if our inquiries will be fruitful, so you can't prove I must stop. But that's not the Bayesian answer. The Bayesian answer is to honestly tally up the indications that future success is likely, and stop if they are lacking.

So I want to ask an object-level question and a meta-level question:

1) What evidence supports the intuition that, contra game theory, single-player decision theory has a "solution"?

2) If there's not much evidence supporting that intuition, how should I change my actions?

(I already have tentative answers to both questions, but am curious what others think. Note that you can answer the second question without knowing any math :-))


I'm not really sure what you're asking here. The way I see it, we currently do not have any game theory or single-player decision theory that is both well-defined and free of obvious problems. Take conventional game theory, for example. Before you can use it, you have to tell it where "agents" are located in the world. How do we distinguish between an agent and a non-agent (some piece of machinery that happens to exist in the world)?

If you're asking whether you should work on these problems, well, somebody has to, so why not you? If you're asking whether you should work on single-player decision theory and hope that multi-player solutions fall out, or work on ideas or math that may be more directly applicable to multi-player games, that itself seems like a hard problem involving lots of logical uncertainty. Given that we don't have good theories about what to do when faced with such logical uncertainty, it's not clear that doing lots of explicit analysis is better than just following your gut instincts.

Right, we don't know how to choose the best direction of inquiry under logical uncertainty. But unexamined gut instinct can lead you down a blind alley, as I tried to show in the post.

I spent way too much time trying to fix the current proof-theoretic algorithms and the effort has mostly failed. Maybe it failed because I wishfully thought that a good algorithm must exist without any good reason, like Eliezer with his failed disproof of Cantor's theorem.

Now I'm asking, perhaps in a roundabout way, what directions of inquiry could have better odds of making things clearer. My current answer is to reallocate time to searching for impossibility proofs (like W/U/A but hopefully better) and studying multiplayer games (like dividing a cake by majority vote, or Stuart's problem about uncertainty over utility functions). I'd like to hear others' opinions, though. Especially yours. What direction does your gut instinct consider the most fruitful?

ETA: Wei has replied by private communication :-)

As it happens, I have some unpublished unsolvability results against decision theory that I've been meaning to write up. And also a reduction from game theory to single-player decision theory. I'll post these tonight when I have some time to separate them from their current context.

Tonight, huh? Please do! I've been waiting for some writeups from you for a while now :-)

After a thoroughly confused exchange with Vladimir Nesov, I'm now realizing that I've made up enough definitions that I can't easily pluck out results; I'm going to have to finish the long and tedious part of the writeup first. So no writeup tonight. But I have resumed writing now, so hopefully that'll translate into writeups soon.


Good advice for proving X, especially when you're stuck: try to disprove X instead. Success has obvious (though depressing, if you were really hoping X were true) benefits, but there are benefits to failure also. Problems encountered while trying to disprove X are hints about how to prove X.

Try taking the other side for a while. Think about how to formalize and prove the idea that "single-player decision theory does not have a solution."

I am not sure I understand the question :-(

Is the "central insight of game theory" described on http://en.wikipedia.org/wiki/Game_theory ?

Surely any hypothesis about what to do potentially involves modelling the surrounding environment, including other agents as required. So what is the distinction between "single-player decision theory" and "game theory" actually supposed to come down to?

I cannot state the question formally (that's part of the problem), but here's an informal version: is there a definition of "maximizing utility given the environment" that is in some sense "optimal" when the environment is a big tangled computer program (or some even more complex thing) containing other similarly smart agents?

I do think that "maximize utility" is a perfectly good answer to the question of what to do, though it is kind of passing the buck. However, for most agents the environment is vast and complex compared to them, so they surely won't behave remotely optimally. Since agent code seems to be easily copied, this "problem" may be with us for a while yet.

I don't like how much "crank" sounds like "heretic." In EY's case he kept trying to wage war on the established result even after he noticed his mistake, but the mere act of questioning an established result, even publicly, should not be called crankery.

2) If there's not much evidence supporting that intuition, how should I change my actions?

If the value is high enough, it is time to shut up and do the impossible. But the importance of having these details solved is not high enough to warrant that kind of desperation. On the other hand, the likelihood also doesn't qualify as 'impossible'. We multiply instead. To do this it will be necessary to answer two further questions:

3) When I actually encounter one of these scenarios, what am I (or an agent I identify with) going to do? The universe doesn't let us opt out of making a decision just because it's impossible to make a correct one. It is physically impossible to do nothing. The aspect of the wave function representing me is going to change whether I like it or not.

4) How can I avoid getting into these unsolvable game-theoretic scenarios? Can I:
a) Gain the power to take control of the whole cake and divide it however I damn well please?
b) Overpower the prison guard and release myself and my friend?

These are not just trite dismissals. Crude as it may seem, gaining power really is the best way to handle many game-theoretic decisions for yourself: prevention!

Note that there is one particular instance of 'cake division' that needs a solution. That is, if you gain power by creating an FAI, there is still a cake that must be divided. The problem is not the same, but you nevertheless need a solution that you can successfully implement without anybody else killing you before you press the button. You must choose a preference-aggregation method that does work, which runs into some similar difficulties, and democracy is ruled out. Note that this isn't something I have seen any inspiring ideas on, and it is conspicuously absent from any 'CEV'-based solution that I've encountered.

I'm not sure if your question 3) sheds any light on the problem. Let's replace "solving decision theory" with "solving the halting problem". It's a provably impossible task: there's no algorithm that always gives the right result, and there's no algorithm that beats all other algorithms. What will I do if asked to solve the halting problem for some random Turing machine? Not sure... I'll probably use some dirty heuristics, even though I know they sometimes fail and there exist other heuristics that dominate mine. Shutting up and doing the impossible ain't gonna help because in this case the impossible really is impossible.
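For concreteness, here's a minimal sketch of the standard diagonal argument in Python. The `halts` function is hypothetical (no total implementation can exist); the point is only to show why, in this case, the impossible really is impossible.

```python
# Hypothetical halting decider -- this is exactly what cannot be implemented
# for all inputs; it stands in for "solving the halting problem".
def halts(program, program_input):
    """Return True iff program(program_input) would eventually halt."""
    raise NotImplementedError("no total implementation can exist")

# Diagonal program: do the opposite of whatever `halts` predicts about
# running `program` on its own source.
def diagonal(program):
    if halts(program, program):
        while True:   # predicted to halt, so loop forever instead
            pass
    else:
        return        # predicted to loop forever, so halt immediately

# Feeding `diagonal` to itself makes any candidate `halts` wrong either way,
# which is the contradiction that rules out a universal decider.
```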

Regarding question 4), if the UDT worldview is right and you are actually a bunch of indistinguishable copies of yourself spread out all over mathematics, then the AIs built by these copies will face a coordination problem. If you code the AI wrong, these copies may wage war among themselves and lose utility as a result. I got really freaked out when Wei pointed out that possibility, but now it seems quite obvious to me.

If you code the AI wrong, these copies may wage war among themselves and lose utility as a result. I got really freaked out when Wei pointed out that possibility, but now it seems quite obvious to me.

Excuse me for being dense, but how would these AIs go about waging war on each other if they are in causally distinct universes? I'm sure there's some clever way, but I can't see what it is.

I don't understand precisely enough what "causally distinct" means, but anyway the AIs don't have to be causally distinct. If our universe is spatially infinite (which currently seems likely, but not certain), it contains infinitely many copies of you and any AIs that you build. If you code the AI wrong (e.g. using the assumption that it's alone and must fend for itself), its copies will eventually start fighting for territory.

Isn't it much more likely that it would encounter many other, non-copy AIs before meeting itself?


If you code the AI wrong, it can end up fighting these non-copy AIs too, even though they may be similar enough to ours to make acausal cooperation possible.

Unless they're far enough apart, and inflation is strong enough, that their future light-cones never intersect. I thought you were going to talk about them using resources on acausal blackmail instead.

Also, I was traveling in May, so I just discovered this post. Have your thoughts changed since then?

Nope, I didn't get any new ideas since May. :-(

Causally distinct isn't a technical term, I just made it up on the spot. Basically, I was imagining the different AIs as existing in different Everett branches or Tegmark universes or hypothetical scenarios or something like that. I hadn't considered the possibility of multiple AIs in the same universe.

I'm not sure if your question 3) sheds any light on the problem.

It certainly (and obviously) sheds light on the problem of "how should I change my actions?".

If you make the question one of practical action, then practical actions and their consequences are critical. I need to know (or have a guess, given a certain level of resource expenditure) what I am going to do in such situations and what the expected outcome will be. This influences how important the solution is to find, and also the expected value of spending more time creating better 'dirty heuristics'.

The Bayesian answer is to honestly tally up the indications that future success is likely, and stop if they are lacking.

So I want to ask an object-level question and a meta-level question:

1) What evidence supports the intuition that, contra game theory, single-player decision theory has a "solution"?

2) If there's not much evidence supporting that intuition, how should I change my actions?

Since we are trying to answer "how should I change my actions?", it seems that we are missing perhaps the most important question:

0) What is the expected value to me (inclusive of altruistic values) of discovering a solution?

1) What evidence supports the intuition that, contra game theory, single-player decision theory has a "solution"?

  • How often have game-theoretic findings of this level of difficulty turned out to be wrong in the past?

  • In academic circles in general, how often have "can't be solved" conclusions turned out to be bogus?

  • Is game theory a subject where my intuitions have served me well in the past, or is my intuition just out of its depth here?

Based on asking myself those questions, I wouldn't rule out finding a solution just yet. There seems to be unexplored territory in that area, and I would want to examine very closely the premises, even the most 'obvious'-seeming ones, that make those problems look unsolvable.

There's a certain vibe in the air surrounding many discussions of decision theory. It sings: maybe the central insight of game theory (that multiplayer situations are not reducible to single-player ones) is wrong. Maybe the slightly-asymmetrized Prisoner's Dilemma has a single right answer. Maybe you can get a unique solution to dividing a cake by majority vote if each individual player's reasoning is "correct enough".

Could you clarify what you mean here? AFAICT, updateless/timeless decision theory does not actually dissolve the problem of strategic behavior. For instance, the cooperative solution to the one-shot PD is only stable under fairly specific conditions.
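One toy illustration of how specific those conditions can be (the setup and names below are mine, purely for illustration, not anyone's published proposal): agents that cooperate only on an exact source-code match lose cooperation under the slightest syntactic variation.

```python
# "Clique" agent: cooperate iff the opponent's source is a byte-for-byte copy
# of its own, defect otherwise.
def clique_bot(own_source: str, opponent_source: str) -> str:
    return "C" if opponent_source == own_source else "D"

# Hypothetical source strings, just to show the fragility.
SOURCE_A = "clique_bot v1"
SOURCE_B = "clique_bot v1 "  # one extra space

print(clique_bot(SOURCE_A, SOURCE_A))  # "C": exact copies cooperate
print(clique_bot(SOURCE_A, SOURCE_B))  # "D": a trivial difference breaks cooperation
```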

Even if you are right, it may still be worthwhile to understand how exactly the UDT/TDT approach goes wrong. After all, finding the error in his purported disproof of Cantor's theorem presumably helped Child Eliezer gain some sort of insight into basic set theory.

AFAICT, updateless/timeless decision theory does not actually dissolve the problem of strategic behavior.

It doesn't, but there seems to be a widespread hope that some more advanced decision theory will succeed at that task. Or maybe I'm misreading that hope.

I seek a better conceptual foundation that would allow talking about ethics more rigorously, for example.

Not re game theory, but re crankery: I'm still working on it. My notes leave me thinking "but why are you telling me all of this?", which is probably a bad sign. They're mostly a string of overly specific heuristics. Not sure they get to addressing how crankery feels from the inside, let alone in a way that would help the subject. It's a tricky one.