And this is just hiding the complexity, not making it simpler. Complexity isn't a function of how many words you use, cf. "The lady down the street is a witch; she did it." If we are writing a program that emits actual features of reality, rather than socially defined labels, the simplest program for green is simpler than the simplest program for grue or bleen. That you can also produce more complex programs that give the same results (defining green in terms of bleen and grue is only one such example) is both trivially true and irrelevant.

## Does Evidence Have To Be Certain?

It seems like in order to go from P(H) to P(H|E) you have to become certain that E. Am I wrong about that?

Say you have the following joint distribution:

P(H&E) = a

P(~H&E) = b

P(H&~E) = c

P(~H&~E) = d

Where a, b, c, and d are each larger than 0.

So P(H|E) = a/(a+b). It seems like what we're doing is going from assigning ~E some positive probability to assigning it a 0 probability. Is there another way to think about it? Is there something special about evidential statements that justifies *changing* their probabilities without having updated on something else?
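To make the worry concrete, here is a minimal sketch of conditionalization on the four-cell table above. The numbers chosen for a, b, c, d and the function name `conditionalize_on_e` are assumptions for illustration only. It shows that the updated table puts exactly zero mass on the ~E cells.

```python
from fractions import Fraction

def conditionalize_on_e(a, b, c, d):
    """Condition the joint table on E.

    a = P(H&E), b = P(~H&E), c = P(H&~E), d = P(~H&~E).
    Returns the updated table keyed by (H, E) truth values.
    """
    if a + b + c + d != 1:
        raise ValueError("the four cells must sum to 1")
    p_e = a + b
    return {
        (True, True): a / p_e,        # new P(H&E)
        (False, True): b / p_e,       # new P(~H&E)
        (True, False): Fraction(0),   # ~E cells are zeroed out
        (False, False): Fraction(0),
    }

# Illustrative values only (not from the original comment).
a, b, c, d = Fraction(3, 10), Fraction(2, 10), Fraction(1, 10), Fraction(4, 10)
posterior = conditionalize_on_e(a, b, c, d)

p_h_after = posterior[(True, True)] + posterior[(True, False)]
p_not_e_after = posterior[(True, False)] + posterior[(False, False)]
print(p_h_after)      # 3/5, which is a/(a+b)
print(p_not_e_after)  # 0
```

With these values the new P(H) is a/(a+b) = 3/5, and ~E goes from probability 1/2 to probability 0, which is exactly the move being questioned.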

## Computable Universal Prior

Suppose instead of using 2^-K(H) we just use 2^-length(H). Does this do something obviously stupid?

Here's what I'm proposing:

Take a programming language with two characters. Assign each program a prior of 2^-length(program). If the program outputs some string, then P(string | program) = 1, else it equals 0. I figure there must be some reason people don't do this already, or else there's a bunch of people doing it. I'd be real happy to find out about either.

Clearly, it isn't a probability distribution, but we can still use it, no?
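A quick sketch of why it isn't a probability distribution, assuming (hypothetically) that every string over the two characters counts as a valid program. There are 2^n strings of length n, each weighted 2^-n, so every length contributes total mass 1 and the sum grows without bound. The function name `total_prior_mass` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def total_prior_mass(max_length):
    """Sum 2^-length over every binary string of length 1..max_length."""
    total = Fraction(0)
    for n in range(1, max_length + 1):
        for program in product("01", repeat=n):
            total += Fraction(1, 2 ** len(program))
    return total

print(total_prior_mass(10))  # 10: each length contributes exactly 1
```

If the language were restricted so that no valid program is a prefix of another, the Kraft inequality would keep the total at or below 1. That restriction is not part of the proposal as stated.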

Wait, actually, I'd like to come back to this. What programming language are we using? If it's one where either grue is primitive, or one where there are primitives that make grue easier to write than green, then grue seems simpler than green. How do we pick which language we use?

Here's my problem. I thought we were looking for a way to categorize meaningful statements. I thought we had agreed that a meaningful statement must be interpretable as or consistent with at least one DAG. But now it seems that there are ways the world can be which cannot be interpreted as even one DAG because they require a directed cycle. So have we now decided that a meaningful sentence must be interpretable as a directed, cyclic or acyclic, graph?

In general, if I say all and only statements that satisfy P are meaningful, then any statement that doesn't satisfy P must be meaningless, and all meaningless statements should be unobservable, and therefore a statement like "all and only statements that satisfy P are meaningful" should be unfalsifiable.

Is it true that (in all prior joint distributions where A is independent of B, but A is evidence of C, and B is evidence of C) A is non-independent of B, given C is held constant?

No, but I think it's true if A,B,C are binary. In general, if a distribution p is Markov relative to a graph G, then if something is d-separated in G, then there is a corresponding independence in p. But, importantly, the implication does not always go the other way. Distributions in which the implication always goes the other way are very special and are called *faithful*.
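Here is one binary instance of the pattern in the question, as a sketch rather than a proof of the general binary claim. The setup is an assumption chosen for illustration: A and B are independent fair bits and C = A OR B. A and B are each evidence of C, they are independent marginally, and they become dependent once C is fixed.

```python
from fractions import Fraction
from itertools import product

# Joint distribution over (A, B, C): A, B independent fair bits, C = A or B.
joint = {(a, b, a | b): Fraction(1, 4) for a, b in product([0, 1], repeat=2)}

def prob(event):
    """Probability of the set of outcomes where event(a, b, c) is true."""
    return sum(p for outcome, p in joint.items() if event(*outcome))

def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(lambda a, b, c: event(a, b, c) and given(a, b, c)) / prob(given)

p_a = prob(lambda a, b, c: a == 1)
p_a_given_b = cond(lambda a, b, c: a == 1, lambda a, b, c: b == 1)
p_c = prob(lambda a, b, c: c == 1)
p_c_given_a = cond(lambda a, b, c: c == 1, lambda a, b, c: a == 1)
p_a_given_c = cond(lambda a, b, c: a == 1, lambda a, b, c: c == 1)
p_a_given_bc = cond(lambda a, b, c: a == 1, lambda a, b, c: b == 1 and c == 1)

print(p_a, p_a_given_b)           # 1/2 1/2  -> A independent of B
print(p_c, p_c_given_a)           # 3/4 1    -> A is evidence of C
print(p_a_given_c, p_a_given_bc)  # 2/3 1/2  -> A depends on B given C
```

Learning B = 1 when C = 1 is already known "explains away" C and lowers the probability of A from 2/3 to 1/2.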

What is Markov relative?

Wait... this will seem stupid, but can't I just say: "there does not exist x where sx = 0"?

Never mind.

Here's a new strategy.

Use Guess culture as a default. Use Guess tricks to figure out whether the other communicator speaks Ask. Use Ask tricks to figure out whether the communicator speaks Tell.

Sure, we might need an oracle to figure out if a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat's last theorem (or at least I can't figure out why we would).

Fermat's Last Theorem states that no three positive integers a, b, and c can satisfy the equation a^n + b^n = c^n for any integer value of n greater than two. Consider a program that iterates over all possible values of a, b, c, n looking for counterexamples for FLT, then if it finds one, calls a subroutine that eventually prints out X (where X is your current observation). In order to do Solomonoff induction, you need to query a halting oracle on this program. But knowing whether this program halts or not is equivalent to knowing whether FLT is true or false.
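A bounded sketch of the program described above, for illustration only. The real program searches without any bound, which is why deciding whether it halts takes an oracle. The function names and the bounds `max_base` and `max_exponent` are assumptions added here so the example terminates.

```python
def find_flt_counterexample(max_base, max_exponent):
    """Search a, b, c in 1..max_base and n in 3..max_exponent for
    a^n + b^n == c^n. Returns (a, b, c, n) or None if nothing is found."""
    for n in range(3, max_exponent + 1):
        for a in range(1, max_base + 1):
            for b in range(a, max_base + 1):
                target = a ** n + b ** n
                for c in range(b + 1, max_base + 1):
                    if c ** n == target:
                        return (a, b, c, n)
    return None

def run(x, max_base, max_exponent):
    """Print x only if a counterexample turns up within the bounds."""
    if find_flt_counterexample(max_base, max_exponent) is not None:
        print(x)
        return True
    return False

printed = run("X", max_base=20, max_exponent=5)
```

Within any finite bound this finishes and prints nothing. Removing the bounds gives a program that prints X if and only if FLT is false, which is the case the halting oracle would have to settle.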

Let's forget about the oracle. What about the program that outputs X only if 1 + 1 = 2, and else prints 0? Let's call it A(1,1). The formalism requires that P(X|A(1,1)) = 1, and it requires that P(A(1,1)) = 2^-K(A(1,1)), but does it need to know that "1 + 1 = 2" is somehow *proven* by A(1,1) printing X?

In either case, you've shown me something that I explicitly doubted before: one can prove any provable theorem if they have access to a Solomonoff agent's distribution, and they know how to make a program that prints X iff theorem S is provable. All they have to do is check the probability the agent assigns to X conditional on that program.

Updating by Bayesian conditionalization does assume that you are treating E as if its probability is now 1. If you want an update rule that is consistent with maintaining uncertainty about E, one proposal is Jeffrey conditionalization. If P1 is your initial (pre-evidential) distribution, and P2 is the updated distribution, then Jeffrey conditionalization says:

P2(H) = P1(H | E) * P2(E) + P1(H | ~E) * P2(~E).

Obviously, this reduces to Bayesian conditionalization when P2(E) = 1.
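A small sketch of the formula above, with made-up numbers for illustration. The function name `jeffrey_update` and the specific probabilities are assumptions. It checks the two boundary cases: P2(E) = 1 recovers ordinary conditionalization, and leaving P(E) unchanged leaves P(H) unchanged.

```python
from fractions import Fraction

def jeffrey_update(p1_h_given_e, p1_h_given_not_e, p2_e):
    """P2(H) = P1(H|E) * P2(E) + P1(H|~E) * P2(~E)."""
    return p1_h_given_e * p2_e + p1_h_given_not_e * (1 - p2_e)

# Illustrative prior: P1(H|E) = 3/5, P1(H|~E) = 1/5, P1(E) = 1/2.
p1_h_given_e = Fraction(3, 5)
p1_h_given_not_e = Fraction(1, 5)
p1_e = Fraction(1, 2)

shifted = jeffrey_update(p1_h_given_e, p1_h_given_not_e, Fraction(9, 10))
certain = jeffrey_update(p1_h_given_e, p1_h_given_not_e, Fraction(1))
unchanged = jeffrey_update(p1_h_given_e, p1_h_given_not_e, p1_e)

print(shifted)    # 14/25: E became likelier but not certain
print(certain)    # 3/5:   equals P1(H|E), i.e. Bayesian conditionalization
print(unchanged)  # 2/5:   equals the prior P1(H)
```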

Yeah, the problem I have with that though is that I'm left asking: why did I change my probability in that? Is it because I updated on something else? Was I certain of that something else? If not, then why did I change my probability of that something else, and on we go down the rabbit hole of an infinite regress.