Suppose that instead of using 2^-K(H) we just use 2^-length(H). Does this do something obviously stupid?
Here's what I'm proposing:
Take a programming language with two characters. Assign each program a prior of 2^-length(program). If the program outputs some string, then P(string | program) = 1, else it equals 0. I figure there must be some reason people don't do this already, or else there's a bunch of people doing it. I'd be real happy to find out about either.
Clearly, it isn't a probability distribution: with two characters there are 2^n programs of length n, each weighted 2^-n, so every length contributes total weight 1 and the sum diverges (the programs aren't required to be prefix-free, which is what makes the Solomonoff prior work out). But we can still use it, no?
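For concreteness, here's a rough Python sketch of the kind of thing I mean. The two-character language is just a toy I made up for illustration ('0c' emits the character c once, '1c' emits it twice), and the enumeration is cut off at a maximum program length, so it only ever gives a truncated approximation of the weight a string gets:

```python
from itertools import product

def run(program):
    """Toy two-character language (made up for illustration):
    '0c' emits character c once, '1c' emits character c twice.
    Returns the output string, or None if the program is malformed."""
    out = []
    i = 0
    while i < len(program):
        if i + 1 >= len(program):
            return None  # dangling opcode with no argument
        op, c = program[i], program[i + 1]
        out.append(c if op == '0' else c + c)
        i += 2
    return ''.join(out)

def weight(target, max_len):
    """Sum 2^-length(p) over every program p of length <= max_len
    whose output is exactly `target` -- a truncated approximation,
    since the true sum ranges over programs of every length."""
    total = 0.0
    for n in range(1, max_len + 1):
        for bits in product('01', repeat=n):
            program = ''.join(bits)
            if run(program) == target:
                total += 2.0 ** -n
    return total

# '11' is output by '11' (weight 1/4) and by '0101' (weight 1/16):
print(weight('11', max_len=8))  # 0.3125
```

The point of the toy language is just that more than one program can produce the same output, so the weight of a string really is a sum over programs rather than a single term.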
I'm not sure this is actually more computable than the Solomonoff prior. Now, instead of having to do a potentially infinite amount of work to find the shortest program that behaves in a particular way, you have to do a definitely infinite amount of work to find all the programs that behave in a particular way.
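To make that contrast concrete with the same toy interpreter as above (still just a made-up stand-in): a search for the shortest program can stop the moment it finds a hit, but the sum over all programs that output the string never reaches a point where it's allowed to stop. In this toy language every program halts, which is what lets the search below actually terminate; with a real language even that step is only "potentially" finite.

```python
from itertools import product

def run(program):
    """Same toy interpreter as above: '0c' emits c once, '1c' emits c
    twice; returns None on a malformed program."""
    out, i = [], 0
    while i < len(program):
        if i + 1 >= len(program):
            return None
        op, c = program[i], program[i + 1]
        out.append(c if op == '0' else c + c)
        i += 2
    return ''.join(out)

def shortest_program(target, max_len):
    """Enumerate programs in order of increasing length and return the
    first one that outputs `target`. Because every toy program halts,
    a hit at length n really does pin down the minimum length."""
    for n in range(1, max_len + 1):
        for bits in product('01', repeat=n):
            program = ''.join(bits)
            if run(program) == target:
                return program
    return None

print(shortest_program('11', max_len=8))  # '11'
```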
Could you give an example of an inference problem that's made easier by doing this?