Notes on logical priors from the MIRI workshop

18 cousin_it 15 September 2013 10:43PM

(This post contains a lot of math, and assumes some familiarity with the earlier LW posts on decision theory. Most of the ideas are by me and Paul Christiano, building on earlier ideas of Wei Dai.)

The September MIRI workshop has just finished. There were discussions of many topics, which will probably be written up by different people. In this post I'd like to describe a certain problem I brought up there and a sequence of ideas that followed. Eventually we found an idea that seems to work, and the story of how each attempted idea led to the next is interesting in its own right.

The problem is a variant of Wei Dai's A Problem About Bargaining and Logical Uncertainty. My particular variant is described in this comment, and was also discussed in this post. Here's a short summary:

In Counterfactual Mugging with a logical coin, a "stupid" agent that can't compute the outcome of the coinflip should agree to pay, and a "smart" agent that considers the coinflip as obvious as 1=1 should refuse to pay. But if a stupid agent is asked to write a smart agent, it will want to write an agent that will agree to pay. Therefore the smart agent who refuses to pay is reflectively inconsistent in some sense. What's the right thing to do in this case?

Thinking about that problem, it seems like the right decision theory should refuse to calculate certain things, and instead behave updatelessly with regard to some sort of "logical prior" inherited from its creators, who didn't have enough power to calculate these things. In particular, it should agree to pay in a Counterfactual Mugging with a digit of pi, but still go ahead and calculate that digit of pi if offered a straight-up bet instead of a counterfactual one.

What could such a decision theory look like? What kind of mathematical object is a "logical prior"? Perhaps it's a probability distribution over inconsistent theories that the creators didn't yet know to be inconsistent. How do we build such an object, what constraints should it satisfy, and how do we use it in a decision algorithm?
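One way to make the intended behavior concrete is a toy sketch: the agent keeps prior weights over the two possible outcomes of the logical coin, maximizes prior-weighted utility in the counterfactual case, but computes the answer outright for a straight bet. Everything below (the payoffs, the cheap stand-in coin, the function names) is my own illustration, not part of the post:

```python
# Toy sketch of an "updateless" agent facing Counterfactual Mugging
# with a logical coin. A cheap, computable coin stands in for
# "the millionth digit of pi is odd".

def logical_coin():
    """Parity of the 7th digit of pi -- trivially computable,
    playing the role of the 'obvious' logical fact."""
    digits = "3141592653589793"    # leading digits of pi
    return int(digits[7]) % 2 == 1  # True = "odd" branch

def updateless_decision(pay_cost=100, reward=10000, prior_odd=0.5):
    """Decide whether to commit to paying, using only the prior over
    the coin -- deliberately refusing to compute the digit."""
    # Expected utility of the policy "pay when asked":
    # with prob prior_odd you get the reward, otherwise you pay.
    eu_pay = prior_odd * reward - (1 - prior_odd) * pay_cost
    eu_refuse = 0
    return "pay" if eu_pay > eu_refuse else "refuse"

def straight_bet_decision():
    """For a straight-up bet on the coin, just compute it."""
    return "bet on odd" if logical_coin() else "bet on even"
```

The point of the sketch is the asymmetry: the same agent consults the prior in one situation and the calculator in the other, and the open problem is saying precisely when each is appropriate.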


An argument against indirect normativity

1 cousin_it 24 July 2013 06:35PM

I think I've found a new argument, which I'll call X, against Paul Christiano's "indirect normativity" approach to FAI goals. I just discussed X with Paul, who agreed that it's serious.

This post won't describe X in detail because it's based on basilisks, which are a forbidden topic on LW, and I respect Eliezer's requests despite sometimes disagreeing with them. If you understand Paul's idea and understand basilisks, figuring out X should take you about five minutes (there's only one obvious way to combine the two ideas), so you might as well do it now. If you decide to discuss X here, please try to follow the spirit of LW policy.

In conclusion, I'd like to ask Eliezer to rethink his position on secrecy. If more LWers understood basilisks, somebody might have come up with X earlier.

"Epiphany addiction"

52 cousin_it 03 August 2012 05:52PM

LW doesn't seem to have a discussion of the article Epiphany Addiction, by Chris at succeedsocially. First paragraph:

"Epiphany Addiction" is an informal little term I came up with to describe a process that I've observed happen to people who try to work on their personal issues. How it works is that someone will be trying to solve a problem they have, say a lack of confidence around other people. Somehow they'll come across a piece of advice or a motivational snippet that will make them have an epiphany or a profound realization. This often happens when people are reading self-help materials and they come across something that stands out to them. People can also come up with epiphanies themselves if they're doing a lot of writing and reflecting in an attempt to try and analyze their problems.

I like that article because it describes a dangerous failure mode of smart people. One example was the self-help blog of Phillip Eby (pjeby), where each new post seemed to bring new amazing insights, and after a while you became jaded. An even better, though controversial, example could be Eliezer's Sequences, if you view them as a series of epiphanies about AI research that didn't lead to much tangible progress. (Please don't make that statement the sole focus of discussion!)

The underlying problem seems to be that people get a rush of power from neat-sounding realizations, and mistake that feeling for actual power. I don't know any good remedy for that, but being aware of the problem could help.

AI cooperation is already studied in academia as "program equilibrium"

35 cousin_it 30 July 2012 03:22PM

About a month ago I accidentally found out that the LW idea of quining cooperation is already studied in academia:

1) Moshe Tennenholtz's 2004 paper Program Equilibrium describes the idea of programs cooperating in the Prisoner's Dilemma by inspecting each other's source code.

2) Lance Fortnow's 2009 paper Program Equilibria and Discounted Computation Time describes an analogue of Benja Fallenstein's idea for implementing correlated play, among other things.

3) Peters and Szentes's 2012 paper Definable and Contractible Contracts studies quining cooperation over a wider class of definable (not just computable) functions.
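The core "quining cooperation" construction from Tennenholtz's paper fits in a few lines: each player's program receives the opponent's source code and cooperates exactly when that source is identical to its own. A minimal Python illustration (representing programs as source strings is my own framing):

```python
# Quining cooperation: a program that cooperates in the Prisoner's
# Dilemma iff the opponent's source is syntactically identical to its
# own. Programs are source strings evaluating to functions from
# (own_source, opponent_source) to "C" or "D".

CLIQUEBOT = 'lambda me, opp: "C" if opp == me else "D"'
DEFECTBOT = 'lambda me, opp: "D"'

def play(source_a, source_b):
    """Run two programs against each other, giving each its own
    source and the opponent's source."""
    a = eval(source_a)
    b = eval(source_b)
    return a(source_a, source_b), b(source_b, source_a)
```

Two copies of CLIQUEBOT cooperate with each other, and neither player gains by unilaterally submitting a different program, so mutual cooperation is a program equilibrium even though defection dominates in the ordinary game.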

As far as I know, academia still hasn't discovered Loebian cooperation or the subsequent ideas about formal models of UDT, but I might easily be wrong about that. In any case, the episode has given me a mini-crisis of faith, and a new appreciation of academia. That was a big part of the motivation for my previous post.

Should you try to do good work on LW?

36 cousin_it 05 July 2012 12:36PM

I used to advocate trying to do good work on LW. Now I'm not sure, let me explain why.

It's certainly true that good work stays valuable no matter where you're doing it. Unfortunately, the standards of "good work" are largely defined by where you're doing it. If you're in academia, your work is good or bad by scientific standards. If you're on LW, your work is good or bad compared to other LW posts. Internalizing that standard may harm you if you're capable of more.

When you come to a place like Project Euler and solve some problems, or come to OpenStreetMap and upload some GPS tracks, or come to academia and publish a paper, that makes you a participant and you know exactly where you stand, relative to others. But LW is not a task-focused community and is unlikely to ever become one. LW evolved from the basic activity "let's comment on something Eliezer wrote". We inherited our standard of quality from that. As a result, when someone posts their work here, that doesn't necessarily help them improve.

For example, Yvain is a great contributor to LW and has the potential to be a star writer, but it seems to me that writing on LW doesn't test his limits, compared to trying new audiences. Likewise, my own work on decision theory math would've been held to a higher standard if the primary audience were mathematicians (though I hope to remedy that). Of course there have been many examples of seemingly good work posted to LW. Homestuck fandom also has a lot of nice-looking art, but it doesn't get fandoms of its own.

In conclusion, if you want to do important work, cross-post it if you must, but don't do it for LW exclusively. Big fish in a small pond always looks kinda sad.

Bounded versions of Gödel's and Löb's theorems

32 cousin_it 27 June 2012 06:28PM

While writing up decision theory results, I had to work out bounded versions of Gödel's second incompleteness theorem and Löb's theorem. Posting the tentative proofs here to let people find any major errors before adding formality. Any comments are welcome!

Bounded Gödel's theorem

Informal meaning: if a theory T has a short proof that any proof of T's inconsistency must be long, then T is inconsistent.

Formal statement: there exists a total computable function f such that for any effectively generated theory T that includes Peano arithmetic, and for any integer n, if T proves in n symbols that "T can't prove a falsehood in f(n) symbols", then T is inconsistent.

Proof

If the theory T takes more than n symbols to describe, the condition of the theorem can't hold, so let's assume T is smaller than that.

Let's take g(n)>>n and f(n)>>exp(g(n)). For any integer n, let's define a program R that enumerates all proofs in the theory T up to length g(n), looking for either a proof that R prints something (in which case R immediately halts without printing anything), or a proof that R doesn't print anything (in which case R prints something and halts). If no proof is found, R halts without printing anything.
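In code, the program R might look like the following sketch, with the proof enumeration stubbed out. A real implementation would need an actual proof checker for T and a quine to achieve the self-reference; `find_proof` here is a placeholder I'm supplying for illustration:

```python
# Sketch of the program R. find_proof(statement, max_len) stands in
# for "enumerate all T-proofs up to max_len symbols and report whether
# one of them proves `statement`".

def make_R(g_n, find_proof):
    def R():
        # In the real construction R refers to its own output via
        # quining; here the self-reference is only in the statement
        # strings.
        if find_proof("R prints something", g_n):
            return None            # halt without printing
        if find_proof("R prints nothing", g_n):
            return "something"     # print something and halt
        return None                # no proof found: halt, print nothing
    return R

# With a stubbed searcher that finds nothing, R halts silently. The
# proof's punchline is that T's short proof that "R won't print"
# would in fact be found by the real searcher, making R print after
# all and yielding a contradiction.
R = make_R(g_n=1000, find_proof=lambda stmt, max_len: None)
```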

Assume that the theory T is consistent. Note that if the program R finds some proof, then it makes the proved statement false. Also note that the theory T can completely simulate the execution of the program R in exp(g(n)) symbols. Therefore if the program R finds any proof, the resulting contradiction in the theory T can be "exposed" using exp(g(n)) symbols. Since f(n)>>exp(g(n)), and the theory T proves in n symbols that "T can't prove a falsehood in f(n) symbols", we see that T proves in slightly more than n symbols that the program R won't find any proofs. Therefore the theory T proves in slightly more than n symbols that the program R won't print anything. Since g(n)>>n, the program R will find that proof and print something, making the theory T inconsistent. QED.

(The main idea of the proof is taken from Ron Maimon, originally due to Rosser, and adapted to the bounded case.)

Bounded Löb's theorem

Informal meaning: if having a bounded-size proof of statement S would imply the truth of S, and that fact has a short proof, then S is provable.

Formal statement: there exists a total computable function f such that for any effectively generated theory T that includes Peano arithmetic, any statement S in that theory, and any integer n, if T proves in n symbols that "if T proves S in f(n) symbols, then S is true", then T proves S.

Proof

Taking the contrapositive, the theory T proves in n symbols that "if the statement S is false, then T doesn't prove S in f(n) symbols". Therefore the theory T+¬S proves in n symbols that "T+¬S doesn't prove a falsehood in f(n) symbols" (I'm glossing over the tiny changes to n and f(n) here). Taking the function f from the bounded Gödel's theorem, we obtain that the theory T+¬S is inconsistent. That means T proves S. QED.
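In symbols, writing $\vdash_n$ for "provable in $n$ symbols" and $\Box_{k}\,S$ for "$S$ is provable in $k$ symbols", the chain of the proof is roughly the following (glossing over the small adjustments to $n$ and $f(n)$, as in the text):

```latex
T \vdash_n \bigl( \Box_{f(n)} S \rightarrow S \bigr)
\;\Longrightarrow\;
T \vdash_n \bigl( \neg S \rightarrow \neg \Box_{f(n)} S \bigr)
\;\Longrightarrow\;
T + \neg S \vdash_n
  \text{``}T + \neg S \text{ can't prove a falsehood in } f(n) \text{ symbols''}
\;\Longrightarrow\;
T + \neg S \text{ is inconsistent}
\;\Longrightarrow\;
T \vdash S
```

The third implication uses the bounded Gödel's theorem with its function $f$, applied to the theory $T + \neg S$.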

(The main idea of the proof is taken from Scott Aaronson, originally due to Kripke or Kreisel.)

As a sanity check, the regular versions of the theorems seem to be easy corollaries of the bounded versions. But of course there could still be errors...

Loebian cooperation, version 2

13 cousin_it 31 May 2012 06:41PM

I'm writing up some math results developed on LW as a paper with the tentative title "Self-referential decision algorithms". Something interesting came up while I was cleaning up the Loebian cooperation result. Namely, how do we say precisely that Loebian cooperation is stable under minor syntactic changes? After all, if we define a "minor change" to program A as a change that preserves A's behavior against any program B, then quining cooperation is just as stable under such "minor changes" by definition. Digging down this rabbit hole, I seem to have found a nice new reformulation of the whole thing.

I will post some sections of my current draft in the comments to this post. Eventually this material is meant to become an academic paper (hopefully), so any comments on math mistakes, notation or tone would be much appreciated! And yeah, I have no clue about academic writing, so you're welcome to tell me that too.

Should logical probabilities be updateless too?

9 cousin_it 28 March 2012 10:02AM

(This post doesn't require much math. It's very speculative and probably confused.)

Wei Dai came up with a problem that seems equivalent to a variant of Counterfactual Mugging with some added twists:

  • the coinflip is "logical", e.g. the parity of the millionth digit of pi;
  • after you receive the offer, you will have enough resources to calculate the coinflip's outcome yourself;
  • but you need to figure out the correct decision algorithm ahead of time, when you don't have these resources and are still uncertain about the coinflip's outcome.

If you give 50/50 chances now to the millionth digit of pi being even or odd, you probably want to write the decision algorithm so it agrees to pay up later even when faced with a proof that the millionth digit of pi is even. But from the decision algorithm's point of view, the situation looks more like being asked to pay up because 2+2=4. How do we resolve this tension?

One of the main selling points of TDT-style decision theories is eliminating the need for precommitment. You're supposed to always do what you would have precommitted to doing, even if it doesn't seem like a very good idea after you've done your Bayesian updates. UDT solves Counterfactual Mugging and similar problems by being "updateless", so you keep caring about possible worlds in accordance with their a priori probabilities regardless of which world you end up in.

If we take the above problem at face value, it seems to tell us that UDT should treat logical uncertainty updatelessly too, and keep caring about logically impossible worlds in accordance with their a priori logical probabilities. It seems to hint that UDT should be coded from the start with a "logical prior" over mathematical statements, which encodes the creator's arbitrary "logical degrees of caring", just like its regular prior encodes the creator's arbitrary degrees of caring over physics. Then the AI must keep following that prior forever after. But that's a very tall order. Should you really keep caring about logically impossible worlds where 2+2=5, and accept bargains that help copies of you in such worlds, even after you calculate that 2+2=4?

That conclusion is pretty startling, but consider what happens if you reject it:

  1. Precommitment can be modeled as a decision problem where an AI is asked to write a successor AI.
  2. Imagine the AI is asked to write a program P that will be faced with Counterfactual Mugging with a logical coin. The AI doesn't have enough resources to calculate the coin's outcome, but P will have as much computing power as needed. The resulting utility goes to the AI.
  3. Writing P is equivalent to supplying one bit: should P pay up if asked?
  4. Supplying that bit is equivalent to accepting or refusing the bet "win $10000 if the millionth digit of pi is odd, lose $100 if it's even".
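The bet in step 4 has positive expected value under the AI's 50/50 prior, which is why the resource-limited AI writes P to pay up. A one-line check, using the payoffs from the text:

```python
# Expected value, to the creator AI, of the policy "P pays when asked",
# under the creator's 50/50 prior over the logical coin. By steps 1-4
# this is the same as the expected value of accepting the bet.
prior_odd = 0.5
ev_pay = prior_odd * 10000 - (1 - prior_odd) * 100   # reward vs. cost
ev_refuse = 0.0
best_policy = "pay" if ev_pay > ev_refuse else "refuse"
```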

So if your AI treats logical uncertainty similarly enough to probabilities that it can make bets on digits of pi, reflective consistency seems to force it to have an unchanging "logical prior", and keep paying up in Counterfactual Mugging even when the logical coinflip looks as obvious to the AI as 2+2=4. Is there any way to escape this conclusion? (Nesov has an idea, but I can't parse it yet.) And what could a formalization of "logical priors" possibly look like?

Common mistakes people make when thinking about decision theory

40 cousin_it 27 March 2012 08:03PM

From my experience reading and talking about decision theory on LW, it seems that many of the unproductive comments in these discussions can be attributed to a handful of common mistakes.

Mistake #1: Arguing about assumptions

The main reason why I took so long to understand Newcomb's Problem and Counterfactual Mugging was my insistence on denying the assumptions behind these puzzles. I could have saved months if I'd just said to myself, okay, is this direction of inquiry interesting when taken on its own terms?

Many assumptions seemed to be divorced from real life at first. People dismissed the study of electromagnetism as an impractical toy, and considered number theory hopelessly abstract until cryptography arrived. The only way to make intellectual progress (either individually or as a group) is to explore the implications of interesting assumptions wherever they might lead. Unfortunately people love to argue about assumptions instead of getting anything done, though they can't really judge before exploring the implications in detail.

Several smart people on LW are repeating my exact mistake about Newcomb's Problem now, and others find ways to commit the same mistake when looking at our newer ideas. It's so frustrating and uninteresting to read yet another comment saying my assumptions look unintuitive or unphysical or irrelevant to FAI or whatever. I'm not against criticism, but somehow such comments never blossom into interesting conversations, and that's reason enough to caution you against the way of thinking that causes them.

Mistake #2: Stopping when your idea seems good enough

There's a handful of ideas that decision theory newbies rediscover again and again, like pointing out indexical uncertainty as the solution to Newcomb's problem, or adding randomness to models of UDT to eliminate spurious proofs. These ideas don't work and don't lead anywhere interesting, but that's hard to notice when you just had the flash of insight and want to share it with the world.

A good strategy in such situations is to always push a little bit past the point where you have everything figured out. Take one extra step and ask yourself: "Can I make this idea precise?" What are the first few implications? What are the obvious extensions? If your result seems to contradict what's already known, work through some of the contradictions yourself. If you don't find any mistakes in your idea, you will surely find new formal things to say about your idea, which always helps.

Mistake #2A: Stopping when your idea actually is good enough

I didn't want to name any names in this post because my status on LW kinda puts me in a position of power, but there's a name I can name with a clear conscience. In 2009, Eliezer wrote:

Formally you'd use a Godelian diagonal to write (...)

Of course that's not a newbie mistake at all, but an awesome and fruitful idea! As it happens, writing out that Godelian diagonal immediately leads to all sorts of puzzling questions like "but what does it actually do? and how do we prove it?", and eventually to all the decision theory research we're doing now. Knowing Eliezer's intelligence, he probably could have preempted most of our results. Instead he just declared the problem solved. Maybe he thought he was already at 0.95 formality and that going to 1.0 would be a trivial step? I don't want to insinuate here, but IMO he made a mistake.

Since this mistake is indistinguishable from the last, the remedy for it is the same: "Can I make this idea precise?" Whenever you stake out a small area of knowledge and make it amenable to mathematical thinking, you're likely to find new math that has lasting value. When you stop because your not-quite-formal idea seems already good enough, you squander that opportunity.

...

If this post has convinced you to stop making these common mistakes, be warned that it won't necessarily make you happier. As you learn to see more clearly, the first thing you'll see will be a locked door with a sign saying "Research is hard". Though it's not very scary or heroic, mostly you just stand there feeling stupid about yourself :-)

An example of self-fulfilling spurious proofs in UDT

20 cousin_it 25 March 2012 11:47AM

Benja Fallenstein was the first to point out that spurious proofs pose a problem for UDT. Vladimir Nesov and orthonormal asked for a formalization of that intuition. In this post I will give an example of a UDT-ish agent that fails due to having a malicious proof searcher, which feeds the agent a spurious but valid proof.

The basic idea is to have an agent A that receives a proof P as input, and checks P for validity. If P is a valid proof that a certain action a is best in the current situation, then A outputs a, otherwise A tries to solve the current situation by its own means. Here's a first naive formalization, where U is the world program that returns a utility value, A is the agent program that returns an action, and P is the proof given to A:

def U():
  if A(P)==1:
    return 5
  else:
    return 10

def A(P):
  if P is a valid proof that A(P)==a implies U()==u, and A(P)!=a implies U()<=u:
    return a
  else:
    do whatever

This formalization cannot work because the statement P is asked to prove mentions P itself, so P would have to contain its own text as a proper part and thus be longer than itself. To fix that problem, let's introduce a function Q that generates the proof P:

def U():
  if A(Q())==1:
    return 5
  else:
    return 10

def A(P):
  if P is a valid proof that A(Q())==a implies U()==u, and A(Q())!=a implies U()<=u:
    return a
  else:
    do whatever

In this case it's possible to write a function Q that returns a proof that makes A return the suboptimal action 1, which leads to utility 5 instead of 10. Here's how:

Let X be the statement "A(Q())==1 implies U()==5, and A(Q())!=1 implies U()<=5". Let Q be the program that enumerates all possible proofs trying to find a proof of X, and returns that proof if found. (The definitions of X and Q are mutually quined.) If X is provable at all, then Q will find that proof, and X will become true (by inspection of U and A). That reasoning is formalizable in our proof system, so the statement "if X is provable, then X" is provable. Therefore, by Löb's theorem, X is provable. So Q will find a proof of X, and A will return 1.
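The Löb step at the end can be written compactly. Letting $\Box$ denote provability in the agent's proof system, Löb's theorem says:

```latex
\vdash \bigl( \Box X \rightarrow X \bigr)
\;\Longrightarrow\;
\vdash X
```

Here the argument establishes $\vdash (\Box X \rightarrow X)$ by inspection of U, A and Q, so X itself is provable; Q then finds that proof, and A returns 1.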

One possible conclusion is that a UDT agent cannot use just any proof searcher or "mathematical intuition module" that's guaranteed to return valid mathematical arguments, because valid mathematical arguments can make the agent choose arbitrary actions. The proof searchers from some previous posts were well-behaved by construction, but not all of them are.

The troubling thing is that you may end up with a badly behaved proof searcher by accident. For example, consider a variation of U that adds some long and complicated computation to the "else" branch of U, before returning 10. That increases the length of the "natural" proof that a=2 is optimal, but the spurious proof for a=1 stays about the same length as it was, because the spurious proof can just ignore the "else" branch of U. This way the spurious proof can become much shorter than the natural proof. So if (for example) your math intuition module made the innocuous design decision of first looking at actions that are likely to have shorter proofs, you may end up with a spurious proof. And as a further plot twist, if we make U return 0 rather than 10 in the long-to-compute branch, you might choose the correct action due to a spurious proof instead of the natural one.
