
V_V comments on Computable Universal Prior - Less Wrong Discussion

0 Post author: potato 11 December 2015 09:54AM


Comments (5)

Comment author: V_V 11 December 2015 02:54:42PM *  4 points [-]

> Take a programming language with two characters. Assign each program a prior of 2^-length(program). If the program outputs some string, then P(string | program) = 1; otherwise it equals 0.

This is exactly Solomonoff induction, assuming that the programming language is prefix-free.
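The arithmetic of that prior can be illustrated with a toy example. The program set and its outputs below are hypothetical stand-ins for a real prefix-free language; the point is only how the 2^-length prior combines with the 0/1 likelihood to give a posterior:

```python
# Toy illustration of the prior described above: each program over a
# two-character alphabet gets prior weight 2^-length, and
# P(string | program) is 1 if the program outputs the string, else 0.
# These "programs" and their outputs are invented for illustration;
# note the set {"0", "10", "110", "111"} is prefix-free.
programs = {
    "0":   "ab",   # length 1, prior 2^-1
    "10":  "ab",   # length 2, prior 2^-2
    "110": "ba",   # length 3, prior 2^-3
    "111": "bb",   # length 3, prior 2^-3
}

def prior(p):
    return 2.0 ** -len(p)

def posterior(string):
    """P(program | string), proportional to prior(program) when the
    program outputs `string`, and zero otherwise."""
    weights = {p: prior(p) for p, out in programs.items() if out == string}
    z = sum(weights.values())  # normalizer, i.e. P(string) over this set
    return {p: w / z for p, w in weights.items()}

print(posterior("ab"))  # the shorter program "0" gets weight 2/3, "10" gets 1/3
```

As expected, the shorter of two programs producing the same string dominates the posterior, which is the Occam-style behavior Solomonoff induction is after.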

> I figure there must be some reason people don't do this already, or else there's a bunch of people doing it. I'd be real happy to find out about either.

Because it's uncomputable: there are infinitely many programs to consider, and some of them never halt; for many of those, you can't even prove that they don't halt.

If you set a maximum program length and a maximum execution time, then inference becomes computable, albeit combinatorially hard: maximum a posteriori inference reduces easily to MAX-SAT or ILP and is therefore NP-hard (NP-complete in its decision version), while posterior evaluation is #P-complete.
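The bounded setup can be sketched by brute force (rather than a MAX-SAT/ILP reduction). The two-symbol toy language below is invented purely for illustration — bit 0 emits "a", bit 1 emits "b", so every program halts — but the enumerate-under-length-and-step-bounds structure is the general one:

```python
from itertools import product

MAX_LEN = 10    # maximum program length (bound assumed for this sketch)
MAX_STEPS = 100 # maximum execution time (vacuous in this toy language,
                # where every program halts, but shown for completeness)

def run(program: str, max_steps: int):
    """Hypothetical toy interpreter: bit 0 emits 'a', bit 1 emits 'b'.
    Returns None if the step budget is exceeded ("didn't halt in time")."""
    out = []
    for step, bit in enumerate(program):
        if step >= max_steps:
            return None
        out.append("a" if bit == "0" else "b")
    return "".join(out)

def map_program(observed: str):
    """MAP inference under the 2^-length prior: the shortest program whose
    output starts with the observed string (shorter = higher prior)."""
    best = None
    for length in range(1, MAX_LEN + 1):
        for bits in product("01", repeat=length):
            prog = "".join(bits)
            out = run(prog, MAX_STEPS)
            if out is not None and out.startswith(observed):
                if best is None or len(prog) < len(best):
                    best = prog
        if best is not None:
            break  # no longer program can beat one already found
    return best

print(map_program("ab"))  # -> "01"
```

Even in this trivial language the search is exponential in the length bound, which is the combinatorial hardness the comment refers to; the MAX-SAT/ILP encodings attack the same search space with solvers instead of enumeration.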

It's known that many NP-hard problems are practically solvable for real-world instances. In this case, there are indeed people (primarily at Microsoft, I think) who managed to make it work, but only for short program lengths in limited application domains: Survey paper.

Comment author: solipsist 12 December 2015 01:06:29AM *  3 points [-]

Spitballing hacks around this:

  • Weigh hypotheses based on how many steps it takes for them to be computed on a dovetailing of all Turing machines. This would probably put too much weight on programs that are large but fast to compute.

  • Weigh hypotheses on how much space it takes to compute them: dovetail all Turing machines of size up to n, limited to n bits of space, for at most 2^n steps. This has the nice property that the prior is, like the hypotheses, space-limited (using about twice as much space as the hypothesis).

  • Find some quantum algorithm that uses n^k qubits and polynomial time to magically evaluate all programs of length n in some semi-reasonable way. If such a beast exists (which I doubt), it has the nice property that it "considers" all reasonably sized hypotheses yet runs in polynomial space and time.

  • Given n bits of evidence about the universe, consider all programs of length up to k*log(n), each run for at most n^(k2) steps (for constants k and k2). This has the nice property that it runs in polynomial time and space.
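The resource accounting behind the last bullet can be checked directly: there are about 2^(k·log2(n)) = n^k programs under the length bound, each run for n^(k2) steps, so the total work is polynomial in n. The constants k and k2 below are assumed example values:

```python
import math

def budget(n, k=2, k2=3):
    """Count the work for the last bullet: all binary programs of length
    up to k*log2(n), each run for at most n**k2 steps. k and k2 are
    assumed constants for this sketch."""
    max_len = int(k * math.log2(n))
    num_programs = 2 ** (max_len + 1) - 2  # binary programs of length 1..max_len
    steps_each = n ** k2
    return num_programs, steps_each, num_programs * steps_each

progs, steps, total = budget(1024, k=2, k2=3)
# With n = 1024, k = 2, k2 = 3: about n^2 programs, n^3 steps each,
# so roughly n^5 total steps -- polynomial, as the bullet claims.
print(progs, steps, total)
```

The trade-off, of course, is that a k*log(n) length bound excludes all but very short hypotheses, so this buys tractability at the cost of expressiveness.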