pivo — LessWrong

I see. And don't know the answer. I'm curious how SI fends off this one.

They're equivalent from the point of view of a consumer of the prediction, they're not equivalent from the point of view of an implementation. And since this is a discussion about how does it work, the distinction is useful.

Solomonoff induction on a random string

pivo12y10

Yes, you're right, I was sloppy. Still, the programs are exactly that much more numerous, so their weight ends up being the same in your wasteful encoding as in a sane encoding.

Solomonoff induction on a random string

pivo12y30

For posterity, the convention is to call the two models Universal/Solomonoff prior M and Universal/Levin mixture ξ, respectively.

Solomonoff induction on a random string

pivo12y00

In principle, SI is choosing fairly from a space of infinite programs. It's only practical to see some programs as finite with weight proportional to the weight of all the infinite programs this finite program can be extended into. But no program knows its length, unless it explicitly counts to when to stop.

The wasteful encoding you propose does not make a difference to SI. What the wasteful encoding does is that the arithmetic encoding programs will be 10 times longer and thus 2^10 times more penalized, but there will be 2^10 times more of them. So in the sum-over-all-programs, arithmetic coding programs will take over the direct output programs just the same as before.

Rationality Quotes April 2014

pivo12y00

No way, I pigeonhole sort.

Solomonoff induction on a random string

pivo12y00

It seems to me that the arithmetic decoding programs you mention in your first comment churn ad nauseam on their infinite compressed stream. So they don't halt and the instructions "10" and "11" won't matter. SI picks from a space of infinite programs, so the instructions can't wait until the end of the stream either.

What can happen, closest to the skew you mention I can think of, is that a program can contain code to stop arithmetic decoding after the first 100 values and output zeros from then on. This code carries a penalty which increases with the number of values it needs to count to. Which should make the weight of the program no greater than 1/n where n is the number of observed values.

Please, correct me if I'm wrong, I'm just learning.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments

Posts

Wikitag Contributions

Comments