Suppose that instead of using 2^-K(H) we just use 2^-length(H). Does this do something obviously stupid?
Here's what I'm proposing:
Take a programming language with two characters. Assign each program a prior of 2^-length(program). If the program outputs some string, then P(string | program) = 1, else it equals 0. I figure there must be some reason people don't do this already, or else there's a bunch of people doing it. I'd be real happy to find out about either.
Clearly, it isn't a probability distribution: with two characters there are 2^n programs of length n, each weighted 2^-n, so every length contributes total weight 1 and the sum diverges (the programs aren't required to be prefix-free, which is what makes the Solomonoff prior work out). But we can still use it, no?
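For concreteness, here's a rough Python sketch of the kind of thing I mean. The two-character language is just a toy I made up for illustration ('0c' emits the character c once, '1c' emits it twice), and the enumeration is cut off at a maximum program length, so it only ever gives a truncated approximation of the weight a string gets:

```python
from itertools import product

def run(program):
    """Toy two-character language (made up for illustration):
    '0c' emits character c once, '1c' emits character c twice.
    Returns the output string, or None if the program is malformed."""
    out = []
    i = 0
    while i < len(program):
        if i + 1 >= len(program):
            return None  # dangling opcode with no argument
        op, c = program[i], program[i + 1]
        out.append(c if op == '0' else c + c)
        i += 2
    return ''.join(out)

def weight(target, max_len):
    """Sum 2^-length(p) over every program p of length <= max_len
    whose output is exactly `target` -- a truncated approximation,
    since the true sum ranges over programs of every length."""
    total = 0.0
    for n in range(1, max_len + 1):
        for bits in product('01', repeat=n):
            program = ''.join(bits)
            if run(program) == target:
                total += 2.0 ** -n
    return total

# '11' is output by '11' (weight 1/4) and by '0101' (weight 1/16):
print(weight('11', max_len=8))  # 0.3125
```

The point of the toy language is just that more than one program can produce the same output, so the weight of a string really is a sum over programs rather than a single term.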
I'm not sure this is actually more computable than the Solomonoff prior. Now, instead of having to do a potentially infinite amount of work to find the shortest program that behaves in a particular way, you have to do a definitely infinite amount of work to find all the programs that behave in a particular way.
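To make that contrast concrete with the same toy interpreter as above (still just a made-up stand-in): a search for the shortest program can stop the moment it finds a hit, but the sum over all programs that output the string never reaches a point where it's allowed to stop. In this toy language every program halts, which is what lets the search below actually terminate; with a real language even that step is only "potentially" finite.

```python
from itertools import product

def run(program):
    """Same toy interpreter as above: '0c' emits c once, '1c' emits c
    twice; returns None on a malformed program."""
    out, i = [], 0
    while i < len(program):
        if i + 1 >= len(program):
            return None
        op, c = program[i], program[i + 1]
        out.append(c if op == '0' else c + c)
        i += 2
    return ''.join(out)

def shortest_program(target, max_len):
    """Enumerate programs in order of increasing length and return the
    first one that outputs `target`. Because every toy program halts,
    a hit at length n really does pin down the minimum length."""
    for n in range(1, max_len + 1):
        for bits in product('01', repeat=n):
            program = ''.join(bits)
            if run(program) == target:
                return program
    return None

print(shortest_program('11', max_len=8))  # '11'
```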
Could you give an example of an inference problem that's made easier by doing this?