Suppose that instead of using 2^-K(H) we just use 2^-length(H). Does this do something obviously stupid?
Here's what I'm proposing:
Take a programming language with two characters. Assign each program a prior of 2^-length(program). If the program outputs some string, then P(string | program) = 1, else it equals 0. I figure there must be some reason people don't do this already, or else there's a bunch of people doing it. I'd be real happy to find out about either.
Clearly, it isn't a probability distribution, but we can still use it, no?
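To make the "isn't a probability distribution" point concrete, here is a rough sketch. It assumes, purely for illustration, that every nonempty binary string counts as a legal program; the function name is made up. Under that assumption there are 2^n programs of length n, each weighted 2^-n, so every length contributes exactly 1 and the total grows without bound.

```python
from fractions import Fraction
from itertools import product


def total_weight_up_to(max_len):
    """Sum 2^-len(p) over every nonempty binary string p with len(p) <= max_len.

    Assumption (illustrative only): every binary string is a legal program.
    """
    total = Fraction(0)
    for n in range(1, max_len + 1):
        # There are 2^n strings of length n, each with weight 2^-n.
        for _bits in product("01", repeat=n):
            total += Fraction(1, 2 ** n)
    return total


if __name__ == "__main__":
    for max_len in (1, 2, 5, 10):
        print(max_len, total_weight_up_to(max_len))
```

The running total equals max_len, so the weights cannot be normalized by dividing by a finite constant.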
I don't think this is a problem. Make your programming language be prefix-free, so that every infinite string of bits begins with exactly one legal program. Then 2^-length(program) is a probability distribution over programs. (This is the same trick that one uses with the Solomonoff prior.)
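A small sketch of the prefix-free trick, using a made-up four-program toy language (the program set and function names are hypothetical, not anything from a real system). It checks that no program is a prefix of another, that the weights 2^-length sum to exactly 1, and that a given bit string begins with exactly one legal program.

```python
from fractions import Fraction

# Hypothetical toy language: a complete prefix-free set of bit strings.
PROGRAMS = ["0", "10", "110", "111"]


def is_prefix_free(codes):
    """True if no code is a proper prefix of a different code."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


def kraft_sum(codes):
    """Sum of 2^-len(c) over all codes, as an exact fraction."""
    return sum(Fraction(1, 2 ** len(c)) for c in codes)


def leading_programs(bits, codes):
    """All legal programs that the bit string `bits` begins with."""
    return [c for c in codes if bits.startswith(c)]


if __name__ == "__main__":
    print(is_prefix_free(PROGRAMS))
    print(kraft_sum(PROGRAMS))
    print(leading_programs("1101010", PROGRAMS))
```

For contrast, a set like ["0", "01"] is not prefix-free: a string starting with "01" begins with two legal programs, and that is exactly the situation the construction is meant to rule out.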