tgbrooks — LessWrong


Replying to Bits of Optimization Can Only Be Lost Over A Distance


I'm intrigued by these examples, but I'm not sure it translates. It sounds like you are interpreting the difference in file size, in bits, between the reference and optimized versions as the thing the KL divergence is measuring, but I don't think that's true. I'm assuming here that the reference is the case where the first step does nothing and outputs the input file unchanged (effectively just case 1). Let's explicitly assume that the input file is a randomly chosen English word.

Suppose a fourth case where our "optimizer" outputs the file "0" regardless of input. The end result is a tiny zip file. Under the "reference" condition, the original file is zipped and is still...

Replying to Bits of Optimization Can Only Be Lost Over A Distance

tgbrooks, 4y


My problem is that A is defined as the output of the optimizer and M0 is defined as A, so P(A|ref) is central to the entire inequality. However, what is the output of an optimizer when there is no optimizer? The given examples (Daniel's and John's) both gloss over the question of P(A|ref) and implicitly treat it as uniform over the possible choices the optimizer could have made. In the box-with-slots examples, what happens if there is no optimizer? I don't know.

In the MMO example, what is the output without a player-optimizer? I don't think it's a randomly chosen string of 10,000 bit inputs. No MMO I've ever played chooses random actions...

Replying to Bits of Optimization Can Only Be Lost Over A Distance

tgbrooks, 4y


I think this implicitly assumes that P(A|ref) is uniformly distributed over all the 10,000 options. In a video game, I'd think the "reference" is more likely to always output 0s, since the player isn't interacting. Then the KL divergence could be arbitrarily large. But it's not really clear in general how to interpret the reference distribution; perhaps someone can clarify?
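To make the dependence on the reference concrete, here is a rough numerical sketch, not anything taken from the post itself. It assumes a hypothetical outcome space of 10,000 discrete options, an optimizer that always picks one fixed option, and two candidate references: a uniform one, and an "idle player" one that puts almost all of its mass on option 0. The names `kl_bits`, `idle_reference`, and the chosen indices are illustrative only.

```python
import math

def kl_bits(p, q):
    """D_KL(p || q) in bits; infinite if p has mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p = 0 contribute nothing
        if qi == 0.0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total

N = 10_000  # assumed number of options (hypothetical outcome space)

# Assumed optimizer: always outputs option 5 (a point mass).
optimized = [0.0] * N
optimized[5] = 1.0

# Reference 1: uniform over all options.
uniform_ref = [1.0 / N] * N

def idle_reference(eps):
    """Reference 2: mass 1 - eps on option 0, eps spread over the rest."""
    ref = [eps / (N - 1)] * N
    ref[0] = 1.0 - eps
    return ref

kl_uniform = kl_bits(optimized, uniform_ref)
kl_idle = [kl_bits(optimized, idle_reference(eps)) for eps in (1e-3, 1e-6, 1e-9)]

print(f"uniform reference: {kl_uniform:.2f} bits")
for eps, kl in zip((1e-3, 1e-6, 1e-9), kl_idle):
    print(f"idle reference, eps={eps:g}: {kl:.2f} bits")
```

Under these assumptions the uniform reference gives log2(10,000) bits, while the idle reference gives log2((N - 1) / eps) bits, which grows without bound as eps shrinks and is infinite at eps = 0. That is the sense in which the divergence could be arbitrarily large.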