1) Yes, the solution should be an agent program. It can't be something as simple as "return 1", because when I talk about solving the LPP, there's an implicit desire to have a single agent that solves all problems similar enough to the LPP, for example the version where the agent's actions 1 and 2 are switched, or where the agent's source code has some extra whitespace and comments compared to its own quined representation, etc.
2) We imagine the world to be a program with no arguments that returns a utility value, and the agent to be a subprogram within the world program. Even though the return value of an argumentless program is just a constant, the agent can still try to "maximize that constant", if the agent is ignorant about it in just the right way. For example, if the world program calls the agent program and then returns 0 or 1 depending on whether the agent's return value was even or odd, and the agent can prove a theorem to that effect by looking at the world program's source code, then it makes sense for the agent to return an odd value.
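To make that concrete, here's a minimal sketch in Python. The `provable` helper stands in for a real search over PA proofs, which is far too long to implement here, so its answer is hard-coded for the one statement the toy agent cares about; all names are illustrative, not from the original comment.

```python
# Toy version of the parity example: the world returns 1 iff the
# agent's return value is odd, and the agent looks for a proof of
# that fact before committing to an odd return value.

WORLD_SOURCE = "def world(): return 1 if agent() % 2 == 1 else 0"

def provable(statement: str) -> bool:
    # Placeholder: a real agent would enumerate PA proofs and check
    # each one against WORLD_SOURCE. Here we hard-code the answer.
    return statement == "world() == 1 if agent() returns an odd value"

def agent() -> int:
    # Look for a proof that returning an odd value yields utility 1.
    if provable("world() == 1 if agent() returns an odd value"):
        return 1   # any odd value works
    return 0       # no proof found; fall back to a default

print(agent())  # 1
```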
Newcomb's Problem can be formalized as a world program that makes two calls to the agent (or maybe one call to the agent and another call to something provably equivalent). The first call's return value is used to set the contents of the boxes, and the second one represents the agent's actual decision. If a smart enough agent receives the world's source code as an argument (which includes possibly mangled versions of the agent's source code inside), and the agent knows its own source code by quining, then the agent can prove a theorem saying that one-boxing would logically imply higher utility than two-boxing. That setting is explored in a little more detail here.
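Here's a runnable sketch of that world program. The payoff amounts and the trick of modeling the prediction as a literal first call to the agent are illustrative assumptions on my part, not details from the post:

```python
# Newcomb's Problem as a world program that calls the agent twice.

def world(agent):
    prediction = agent()                 # first call: fills the boxes
    box_b = 1_000_000 if prediction == 1 else 0
    decision = agent()                   # second call: the real choice
    if decision == 1:                    # one-box: take only box B
        return box_b
    return box_b + 1_000                 # two-box: add transparent box A

def one_boxer():
    return 1

def two_boxer():
    return 2

print(world(one_boxer))   # 1000000
print(world(two_boxer))   # 1000
```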
Before you ask: no, we don't know any rigorous definition of what it means to "maximize" the return value of an argumentless program in general. We're still fumbling with isolated cases, hoping to find more understanding. I'm only marginally less confused than you about the whole field.
3) You can think of an agent as a program that receives the world's source code as an argument, so that one agent can solve many possible world programs. I usually talk about agents as if they were argumentless functions that had access to the world's source code via quining, but that's just to avoid cluttering up the proofs. The results are the same either way.
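The two conventions side by side, with bodies elided; the names are illustrative, not from the original comment:

```python
# (a) One agent, parameterized by the world it finds itself in:
def agent(world_source: str) -> int:
    ...  # analyze world_source, prove theorems, pick a return value

# (b) An argumentless agent that knows the world by quining:
WORLD_SOURCE = "..."  # the world's source, baked in as a constant

def agent_quined() -> int:
    ...  # same analysis, reading WORLD_SOURCE instead of an argument
```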
4) Usually you can assume that S is just Peano arithmetic. You can represent programs by their Gödel numbers, and write a PA predicate saying "program X returns integer Y". You can also represent statements and proofs in PA by their Gödel numbers, and write a PA predicate saying "proof X is a valid proof of statement Y". You can implement both these predicates in your favorite programming language, as functions that accept two integers and return a boolean. You can have statements referring to these predicates by their Gödel numbers. The diagonal lemma gives you a generalized way to make statements refer to themselves, and quining allows you to have programs that refer to their own source code. You can have proofs that talk about programs, programs that enumerate and check proofs, and generally go wild.
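Here are the two predicates pinned down as ordinary functions, assuming some fixed Gödel numbering of programs, statements, and proofs. Real bodies would be long (an interpreter and a full PA proof checker), so these are placeholders that only fix the interface the text describes:

```python
def program_returns(program: int, value: int) -> bool:
    """True iff the program with Gödel number `program` halts and
    returns `value`. Definable in PA, though only semi-decidable to
    test by actually running the program."""
    raise NotImplementedError  # would decode and run the program

def is_valid_proof(proof: int, statement: int) -> bool:
    """True iff `proof` encodes a valid PA derivation of the statement
    with Gödel number `statement`. Unlike the predicate above, this is
    fully decidable: checking a finished proof is mechanical."""
    raise NotImplementedError  # would decode and verify each step
```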
For example, you can write a program P that enumerates all possible proofs trying to find a valid proof that P itself returns 1, and returns 1 if such a proof is found. To see that P will in fact return 1 and not loop forever, note that the statement "if there's a proof that P returns 1, then P returns 1" is provable by P's construction, so Löb's theorem tells us that "P returns 1" is itself provable, and the search succeeds. That setting is explored in a little more detail here.
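A structural sketch of P, reusing the `is_valid_proof` placeholder from above; `SELF` stands for the Gödel number of the statement "P() == 1", which a real P would obtain via the diagonal lemma or quining. This is the logical skeleton, not executable proof search:

```python
SELF = ...  # stand-in for the Gödel number of "P() == 1"

def P() -> int:
    proof = 0
    while True:                          # enumerate all candidate proofs
        if is_valid_proof(proof, SELF):  # a proof that P() == 1 exists...
            return 1                     # ...so return 1, making it true
        proof += 1                       # otherwise try the next candidate
    # Löb's theorem guarantees the search succeeds: PA proves
    # "if P() == 1 is provable, then P() == 1" by construction,
    # hence PA proves "P() == 1", so some proof is eventually found.
```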
Please let me know if the above makes sense to you!
Thank you. Your comment resolved some of my confusion. While I didn't understand it entirely, I am happy to have accrued a long list of relevant background reading.
From my experience reading and talking about decision theory on LW, it seems that many of the unproductive comments in these discussions can be attributed to a handful of common mistakes.
Mistake #1: Arguing about assumptions
The main reason why I took so long to understand Newcomb's Problem and Counterfactual Mugging was my insistence on denying the assumptions behind these puzzles. I could have saved months if I'd just said to myself, okay, is this direction of inquiry interesting when taken on its own terms?
Many assumptions seemed divorced from real life at first. People dismissed the study of electromagnetism as an impractical toy, and considered number theory hopelessly abstract until cryptography arrived. The only way to make intellectual progress (either individually or as a group) is to explore the implications of interesting assumptions wherever they might lead. Unfortunately people love to argue about assumptions instead of getting anything done, even though nobody can really judge an assumption's worth before exploring its implications in detail.
Several smart people on LW are repeating my exact mistake about Newcomb's Problem now, and others find ways to commit the same mistake when looking at our newer ideas. It's so frustrating and uninteresting to read yet another comment saying my assumptions look unintuitive or unphysical or irrelevant to FAI or whatever. I'm not against criticism, but somehow such comments never blossom into interesting conversations, and that's reason enough to caution you against the way of thinking that causes them.
Mistake #2: Stopping when your idea seems good enough
There's a handful of ideas that decision theory newbies rediscover again and again, like pointing out indexical uncertainty as the solution to Newcomb's problem, or adding randomness to models of UDT to eliminate spurious proofs. These ideas don't work and don't lead anywhere interesting, but that's hard to notice when you just had the flash of insight and want to share it with the world.
A good strategy in such situations is to always push a little bit past the point where you have everything figured out. Take one extra step and ask yourself: "Can I make this idea precise?" What are the first few implications? What are the obvious extensions? If your result seems to contradict what's already known, work through some of the contradictions yourself. If you don't find any mistakes in your idea, you will surely find new formal things to say about it, which always helps.
Mistake #2A: Stopping when your idea actually is good enough
I didn't want to name any names in this post because my status on LW puts me in a position of power of sorts, but there's a name I can name with a clear conscience. In 2009, Eliezer wrote:
Of course that's not a newbie mistake at all, but an awesome and fruitful idea! As it happens, writing out that Godelian diagonal immediately leads to all sorts of puzzling questions like "but what does it actually do? and how do we prove it?", and eventually to all the decision theory research we're doing now. Given Eliezer's intelligence, he probably could have preempted most of our results. Instead he just declared the problem solved. Maybe he thought he was already at 0.95 formality and that going to 1.0 would be a trivial step? I don't want to insinuate here, but IMO he made a mistake.
Since this mistake is indistinguishable from the last one from the inside, the remedy for it is the same: "Can I make this idea precise?" Whenever you stake out a small area of knowledge and make it amenable to mathematical thinking, you're likely to find new math that has lasting value. When you stop because your not-quite-formal idea seems already good enough, you squander that opportunity.
...
If this post has convinced you to stop making these common mistakes, be warned that it won't necessarily make you happier. As you learn to see more clearly, the first thing you'll see will be a locked door with a sign saying "Research is hard". It's not very scary or heroic; mostly you just stand there feeling stupid about yourself :-)