
Adele_L comments on Open thread, July 29-August 4, 2013 - Less Wrong Discussion

3 Post author: David_Gerard 29 July 2013 10:26PM




Comment author: Transfuturist 30 July 2013 02:35:17AM *  5 points

I believe I've encountered a problem with either Solomonoff induction or my understanding of Solomonoff induction. I can't post about it in Discussion, as I have less than 20 karma, and the stupid questions thread is very full (I'm not even sure if it would belong there).

I've read about SI repeatedly over the last year or so, and I think I have a fairly good understanding of it, good enough at least to follow along with informal reasoning about it. Recently I was reading Rathmanner and Hutter's paper, and Legg's paper, due to renewed interest in AIXI as the theoretical "best intelligence," and the Arcade Learning Environment used to test the computable Monte Carlo AIXI approximation. Then this problem came to me.

Solomonoff induction measures the complexity of a hypothesis by the length of the shortest Turing machine that outputs a given bitstring. I saw this as a problem. Say AIXI was reasoning about a fair coin. It would guess before each flip whether it would come up heads or tails. *Because Turing machines are deterministic, AIXI cannot make hypotheses involving randomness.* To model the fair coin, AIXI would come up with increasingly convoluted Turing machines, attempting to compress a bitstring that approaches Kolmogorov randomness as its length approaches infinity. Meanwhile, AIXI would be punished and rewarded randomly. This is not a satisfactory conclusion for a theoretical "best intelligence." So is the italicized statement a valid issue? An AI that can't delay reasoning about a problem by at least labeling it "sufficiently random, solve later" doesn't seem like a good AI, particularly in the real world, where chance plays a significant part.
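A quick numerical sketch of the "punished and rewarded randomly" point (the strategy names here are my own illustration, not anything from the papers): any deterministic predictor is a fixed function of the observed history, and against a fair coin every such function scores about 50%.

```python
import random

def simulate(predictor, n_flips=100_000, seed=0):
    """Score a deterministic predictor (a fixed function of the
    observed history) against an i.i.d. fair coin."""
    rng = random.Random(seed)
    history, correct = [], 0
    for _ in range(n_flips):
        guess = predictor(history)
        flip = rng.randint(0, 1)
        correct += (guess == flip)
        history.append(flip)
    return correct / n_flips

# Three deterministic strategies, each a fixed function of the history.
always_heads = lambda h: 0
alternate = lambda h: len(h) % 2
copy_last = lambda h: h[-1] if h else 0

for p in (always_heads, alternate, copy_last):
    # Each hovers near 0.5: no deterministic rule exploits a fair coin.
    print(round(simulate(p), 3))
```

No matter how convoluted the deterministic rule, its long-run score converges to chance, which is the sense in which the rewards and punishments are random.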

Naturally, Eliezer has already thought of this, and wrote about it in Occam's Razor:

The formalism of Solomonoff Induction measures the "complexity of a description" by the length of the shortest computer program which produces that description as an output. To talk about the "shortest computer program" that does something, you need to specify a space of computer programs, which requires a language and interpreter. Solomonoff Induction uses Turing machines, or rather, bitstrings that specify Turing machines. What if you don't like Turing machines? Then there's only a constant complexity penalty to design your own Universal Turing Machine that interprets whatever code you give it in whatever programming language you like. Different inductive formalisms are penalized by a worst-case constant factor relative to each other, corresponding to the size of a universal interpreter for that formalism.

In the better (IMHO) versions of Solomonoff Induction, the computer program does not produce a deterministic prediction, but assigns probabilities to strings. For example, we could write a program to explain a fair coin by writing a program that assigns equal probabilities to all 2^N strings of length N. This is Solomonoff Induction's approach to fitting the observed data. The higher the probability a program assigns to the observed data, the better that program fits the data. And probabilities must sum to 1, so for a program to better "fit" one possibility, it must steal probability mass from some other possibility which will then "fit" much more poorly. There is no superfair coin that assigns 100% probability to heads and 100% probability to tails.

Does this warrant further discussion, if only to validate or refute this claim? I don't think Eliezer's proposal for a version of SI that assigns probabilities to strings is strong enough; it doesn't describe what form the hypotheses would take. Would hypotheses in this new description be universal nondeterministic Turing machines, with the aforementioned probability distribution summed over the nondeterministic outputs?
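For concreteness, here is a minimal sketch of the "programs that assign probabilities to strings" idea from the quote above, with invented hypothesis names: each program maps a bitstring to a probability, the mass over all strings of a given length sums to 1, and a program that fits one string better necessarily fits some other string worse.

```python
from itertools import product

def fair(bits):
    # "Fair coin" program: probability 2^-N for every length-N string.
    return 0.5 ** len(bits)

def biased(bits, p=0.9):
    # "Biased coin" program: concentrates mass on strings with many 1s.
    prob = 1.0
    for b in bits:
        prob *= p if b == 1 else 1 - p
    return prob

N = 4
strings = list(product((0, 1), repeat=N))

# Both programs spread a total mass of 1 over the 2^N strings,
# so there is no "superfair" program that fits everything well.
print(sum(fair(s) for s in strings), sum(biased(s) for s in strings))
print(fair((1, 1, 1, 1)), biased((1, 1, 1, 1)))  # biased fits this string better...
print(fair((0, 0, 0, 0)), biased((0, 0, 0, 0)))  # ...at the cost of fitting this one worse
```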

Comment author: Adele_L 30 July 2013 04:29:25AM *  4 points

the stupid questions thread is very full

Might be worth having those more often too; the last one was very popular, and had lots of questions that open threads don't typically attract.

Because Turing machines are deterministic, AIXI cannot make hypotheses involving randomness. To model the fair coin, AIXI would come up with increasingly convoluted Turing machines, attempting to compress a bitstring that approaches Kolmogorov randomness as its length approaches infinity. Meanwhile, AIXI would be punished and rewarded randomly.

Just a naïve thought, but maybe it would come up with MWI fairly quickly because of this. (I can imagine this being a beisutsukai challenge – show a student radioactive decay, and see how long it takes them to come up with MWI.) A probabilistic one is probably better for the other reasons brought up, though.

Comment author: David_Gerard 30 July 2013 12:21:47PM 2 points

the stupid questions thread is very full

Might be worth having those more often too; the last one was very popular, and had lots of questions that open threads don't typically attract.

Someone want to start one the day after tomorrow? Run it monthly or something? Let's see what happens.

Comment author: Transfuturist 30 July 2013 04:54:49AM *  -1 points

To come up with MWI, it would have to conceive of different potentialities and then a probabilistic selection. I don't know, I'm not seeing how deterministic Turing machines could model that.

Comment author: Kawoomba 30 July 2013 08:41:46AM 2 points

Do you see how a nondeterministic Turing machine could model that?

If so ... ... ...

Comment author: Mitchell_Porter 30 July 2013 07:55:47AM *  2 points

Suppose in QM you have a wavefunction which recognizably evolves into a superposition of wavefunctions. I'll write that psi0, the initial wavefunction, becomes m.psi' + n.psi'', where m and n are coefficients, and psi' and psi'' are basis wavefunctions.

Something slightly analogous to the MWI interpretation of this, could be seen in a Turing machine which started with one copy of a bitstring, PSI0, and which replaced it with M copies of the bitstring PSI' and N copies of the bitstring PSI''. That would be a deterministic computation which replaces one world, the single copy of PSI0, with many worlds, the multiple copies of PSI' and PSI''.

So it's straightforward enough for a deterministic state machine to invent rules corresponding to a proliferation of worlds. In fact, in the abstract theory of computation, this is one of the standard ways to model nondeterministic computation - have a deterministic computation which deterministically produces all the possible paths that could be produced by the nondeterministic process.
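A toy sketch of that standard construction (the transition relation and state names are invented for illustration): instead of choosing one successor, the deterministic simulation carries the whole set of states the nondeterministic machine could be in.

```python
# Toy nondeterministic transition relation: state -> set of possible successors.
DELTA = {
    "PSI0": {"PSI'", "PSI''"},   # one world branches into two
    "PSI'": {"PSI'"},            # each branch then persists
    "PSI''": {"PSI''"},
}

def run_all_paths(start, steps):
    """Deterministically track the full set of states the
    nondeterministic machine could be in after `steps` moves."""
    frontier = {start}
    for _ in range(steps):
        frontier = {nxt for state in frontier for nxt in DELTA[state]}
    return frontier

print(run_all_paths("PSI0", 3))  # both branch-states survive
```

The computation is fully deterministic, yet after one step the single "world" PSI0 has been replaced by the set of all worlds the nondeterministic process could have produced.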

However, the way that QM works, and thus the way that a MWI theory would have to work, is rather more complicated, because the coefficients are complex numbers, the probabilities (which one might suppose correspond to the number of copies of each world) are squares of the absolute values of those complex numbers, and probability waves can recombine and destructively interfere, so you would need worlds / bitstrings to be destroyed as well as created.

In particular, it seems that you couldn't reproduce QM with a setup in which the only fact about each world / bitstring was the number of current copies - you need the "phase information" (angle in the complex plane) of the complex numbers, in order to know what the interference effects are. So your Turing machine's representation of the state of the multiverse would be something like:

(complex coefficient associated with the PSI' worlds) (list of M copies of the PSI' bitstring); (complex coefficient associated with the PSI'' worlds) (list of N copies of the PSI'' bitstring) ; ...

and the "lists of copies of worlds" would all be dynamically irrelevant, since the dynamics comes solely from recombining the complex numbers at the head of each list of copies. At each timestep, the complex numbers would be recomputed, and then the appropriate number of world-copies would be entered into each list.

But although it's dynamically irrelevant, that list of copies of identical worlds is still performing a function, namely, it's there to ensure that there actually are M out of every (M+N) observers experiencing worlds of type PSI', and N out of every (M+N) observers experiencing worlds of type PSI''. If the multiverse representation was just

(complex coefficient of PSI' world) (one copy of PSI' world) ; (complex coefficient of PSI'' world) (one copy of PSI'' world) ; ...

then all those complex numbers could still evolve according to the Schrodinger equation, but you would only have one observer seeing a PSI' world, and one observer seeing a PSI'' world, and this is inconsistent with observation, where we see that some quantum events are more probable than others.

This is the well-known problem of recovering the Born probabilities, or justifying the Born probability rule - mentioned in several places in the QM Sequence - but expressed in the unusual context of bit-strings on a Turing tape.

(Incidentally, I have skipped over the further problem that QM uses continuous rather than discrete quantities, because that's not a problem of principle - you can just represent the complex numbers the way we do on real computers, to some finite degree of binary precision.)
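A toy calculation of the interference point (the amplitudes are chosen arbitrarily for illustration): amplitudes add as complex numbers, and the Born probability is the squared magnitude of the sum, so in-phase paths reinforce while opposite-phase paths cancel — something a bare count of world-copies cannot reproduce.

```python
import cmath

def born(amplitudes):
    """Born probability for an outcome reached by several paths:
    add the complex amplitudes, then take the squared magnitude."""
    return abs(sum(amplitudes)) ** 2

a = 0.5  # toy amplitude per path; each path alone would give probability 0.25

constructive = born([a, a])                            # paths in phase
destructive = born([a, a * cmath.exp(1j * cmath.pi)])  # opposite phase

print(round(constructive, 6), round(destructive, 6))
```

The two cases have identical magnitudes, and so would have identical copy-counts; only the phase information distinguishes probability 1 from probability 0.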

Comment author: Kawoomba 30 July 2013 08:44:12AM 2 points

... keep in mind that deterministic Turing machines can trivially simulate nondeterministic Turing machines.

Comment author: JoshuaZ 30 July 2013 12:49:06PM 1 point

... keep in mind that deterministic Turing machines can trivially simulate nondeterministic Turing machines.

The problem here seems to be one of notation. You are using "nondeterministic Turing machine" in the formal sense of the term, where Mitchell seems to be using "nondeterministic" closer to "has a source of random bits."

Comment author: Transfuturist 30 July 2013 07:01:20PM 0 points

Trivially? I was under the impression that it involved up to an exponential slowdown, while probabilistic Turing machines can simulate deterministic Turing machines simply by assigning probability 1 to a single transition for each state-symbol pair.
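A minimal sketch of that last point, with invented states and symbols: represent the transition function as a distribution over successors, and a deterministic machine falls out as the degenerate case where one successor carries probability 1.

```python
import random

# Transition table: (state, symbol) -> list of (next_state, probability).
# A deterministic machine is the degenerate case: one successor, probability 1.
DETERMINISTIC = {("q0", 0): [("q1", 1.0)]}
FAIR_CHOICE = {("q0", 0): [("q1", 0.5), ("q2", 0.5)]}

def step(table, state, symbol, rng):
    """Sample the next state from the transition distribution."""
    r, acc = rng.random(), 0.0
    for nxt, p in table[(state, symbol)]:
        acc += p
        if r < acc:
            return nxt
    return nxt  # guard against floating-point rounding at the boundary

rng = random.Random(0)
print({step(DETERMINISTIC, "q0", 0, rng) for _ in range(100)})  # only "q1"
print({step(FAIR_CHOICE, "q0", 0, rng) for _ in range(100)})    # both successors appear
```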

Comment author: Kawoomba 30 July 2013 07:07:05PM 0 points

Algorithmically trivially; I didn't see anyone concerned about running times.

Comment author: Transfuturist 30 July 2013 07:36:49PM 0 points

Well, wouldn't that be because it's all theorizing about computational complexity?

I see the point. Are pseudorandom number generators what you mean by simulating nondeterminism in a DTM? Would a deterministic UTM with an RNG be sufficient for AIXI to hypothesize randomness? I still don't see how SI would be able to hypothesize Turing machines that produce bitstrings that are probabilistically similar to the bitstring it is "supposed" to replicate.
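One way to see how a probabilistic version of SI could "hypothesize randomness," sketched here with invented hypotheses and toy uniform priors standing in for the 2^-length weights: treat each hypothesis as a program assigning probabilities to strings and update on observed flips. Against fair-coin data, the Bernoulli(0.5) hypothesis quickly dominates both a wrongly biased stochastic program and a deterministic one.

```python
import random

def bernoulli(p):
    # Stochastic "program": an i.i.d. coin with bias p.
    def likelihood(bits):
        out = 1.0
        for b in bits:
            out *= p if b else 1 - p
        return out
    return likelihood

def all_ones(bits):
    # Deterministic "program": predicts 1 forever.
    return 1.0 if all(bits) else 0.0

hypotheses = {"fair": bernoulli(0.5), "biased": bernoulli(0.9), "all ones": all_ones}
prior = {name: 1 / 3 for name in hypotheses}  # toy stand-in for 2^-(program length)

rng = random.Random(1)
data = [rng.randint(0, 1) for _ in range(50)]  # output of an actual fair coin

# Bayesian update: posterior weight is prior times likelihood, normalized.
weights = {n: prior[n] * h(data) for n, h in hypotheses.items()}
total = sum(weights.values())
posterior = {n: w / total for n, w in weights.items()}
print(posterior)  # mass concentrates on the "fair" hypothesis
```

The mixture never "replicates" the random bitstring with a single deterministic machine; it just gives the stochastic program that assigns the data the highest probability most of the posterior mass, which is exactly the "sufficiently random, solve later" labeling asked for above.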