And this is just hiding the complexity, not making it simpler. Complexity isn't a function of how many words you use, cf. "The lady down the street is a witch; she did it." If we are writing a program that emits actual features of reality, rather than socially defined labels, the simplest program for green is simpler than the simplest program for grue or bleen. That you can also produce more complex programs that give the same results (defining green in terms of bleen and grue is only one such example) is both trivially true and irrelevant.
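To make the asymmetry concrete, here's a toy sketch in Python (the changeover time T and the exact predicates are hypothetical, purely for illustration): grue and bleen each need a time parameter that green never does, and recovering green from grue/bleen costs strictly more code, not less.

```python
T = 2_000_000_000  # hypothetical grue/bleen changeover time (made up)

def green(color):
    """'Green' needs only the object's color."""
    return color == "green"

def grue(color, t):
    """'Grue' needs the color AND the time: green before T, blue after."""
    return (color == "green") if t < T else (color == "blue")

def bleen(color, t):
    """'Bleen' is the mirror image: blue before T, green after."""
    return (color == "blue") if t < T else (color == "green")

def green_via_grue(color, t):
    """Green defined via grue/bleen: works, but is longer, not shorter."""
    return grue(color, t) if t < T else bleen(color, t)
```

The point of the sketch: `green_via_grue` agrees with `green` for every t, but it drags in the time parameter and both gerrymandered predicates, so it can't be the simplest program for green.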
Wait, actually, I'd like to come back to this. What programming language are we using? If it's one where grue is primitive, or one with primitives that make grue easier to write than green, then grue seems simpler than green. How do we pick which language we use?
Here's my problem. I thought we were looking for a way to categorize meaningful statements. I thought we had agreed that a meaningful statement must be interpretable as or consistent with at least one DAG. But now it seems that there are ways the world can be which cannot be interpreted as even one DAG, because they require a directed cycle. So have we now decided that a meaningful sentence must be interpretable as a directed graph, cyclic or acyclic?
In general, if I say that all and only statements satisfying P are meaningful, then any statement that doesn't satisfy P must be meaningless, and all meaningless statements should be unobservable; therefore a statement like "all and only statements that satisfy P are meaningful" should be unfalsifiable.
Is it true that (in all prior joint distributions where A is independent of B, but A is evidence of C, and B is evidence of C) A is non-independent of B, given that C is held constant?
No, but I think it's true if A,B,C are binary. In general, if a distribution p is Markov relative to a graph G, then if something is d-separated in G, then there is a corresponding independence in p. But, importantly, the implication does not always go the other way. Distributions in which the implication always goes the other way are very special and are called faithful.
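A minimal numeric illustration of the binary case (a toy collider A → C ← B with C = A OR B; the numbers are made up):

```python
from itertools import product

# Toy collider: A and B are independent fair coins, C = A OR B.
joint = {}
for a, b in product([0, 1], repeat=2):
    joint[(a, b, a | b)] = 0.25

def prob(pred):
    """Sum the joint probability over outcomes satisfying pred(a, b, c)."""
    return sum(pr for (a, b, c), pr in joint.items() if pred(a, b, c))

# Marginally, A and B are independent:
assert prob(lambda a, b, c: a == 1 and b == 1) == \
       prob(lambda a, b, c: a == 1) * prob(lambda a, b, c: b == 1)

# Holding C = 1 constant, A and B become dependent ("explaining away"):
p_a_given_c1 = (prob(lambda a, b, c: a == 1 and c == 1)
                / prob(lambda a, b, c: c == 1))
p_a_given_b1_c1 = (prob(lambda a, b, c: a == 1 and b == 1 and c == 1)
                   / prob(lambda a, b, c: b == 1 and c == 1))
# p_a_given_c1 is 2/3, but p_a_given_b1_c1 is 1/2:
# learning B = 1 "explains away" some of the evidence for A.
```

This is just one concrete faithful distribution, not a proof of the general binary claim.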
What is Markov relative?
Wait... this will seem stupid, but can't I just say: "there does not exist x where sx = 0"
nevermind
Here's a new strategy.
Use Guess culture as a default. Use Guess tricks to figure out whether the other communicator speaks Ask. Use Ask tricks to figure out whether they speak Tell.
Sure, we might need an oracle to figure out if a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat's last theorem (or at least I can't figure out why we would).
Fermat's Last Theorem states that no three positive integers a, b, and c can satisfy the equation a^n + b^n = c^n for any integer value of n greater than two. Consider a program that iterates over all possible values of a, b, c, n looking for counterexamples to FLT; if it finds one, it calls a subroutine that eventually prints out X (where X is your current observation). In order to do Solomonoff induction, you need to query a halting oracle on this program. But knowing whether this program halts or not is equivalent to knowing whether FLT is true or false.
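For concreteness, here's a sketch of that counterexample-searching program (the enumeration order and the `max_size` cutoff are my own additions for illustration; the real program would run unboundedly and, by FLT, never halt):

```python
from itertools import count

def flt_counterexample(max_size=None):
    """Enumerate tuples (a, b, c, n) with n > 2 by total size a+b+c+n,
    looking for a**n + b**n == c**n. Returns the first counterexample
    found; with max_size=None it loops forever unless FLT is false.
    A Solomonoff inductor has to weigh this program, so deciding whether
    it contributes to P(X) amounts to deciding FLT."""
    sizes = count(7) if max_size is None else range(7, max_size + 1)
    for s in sizes:
        for a in range(1, s):
            for b in range(1, s - a):
                for c in range(1, s - a - b):
                    n = s - a - b - c
                    if n > 2 and a**n + b**n == c**n:
                        return (a, b, c, n)  # would trigger "print X" here
    return None
```

Running the bounded version just confirms no small counterexample exists; the halting question for the unbounded version is exactly the theorem.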
Let's forget about the oracle. What about the program that outputs X only if 1 + 1 = 2, and else prints 0? Let's call it A(1,1). The formalism requires that P(X|A(1,1)) = 1, and it requires that P(A(1,1)) = 2^-K(A(1,1)), but does it need to know that "1 + 1 = 2" is somehow proven by A(1,1) printing X?
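A(1,1) itself is trivial to render (this is just my illustrative version, not anything canonical); unlike the FLT searcher it obviously halts either way, so no halting oracle is needed to weigh it:

```python
def A_1_1():
    """Returns "X" if 1 + 1 == 2, else "0". Halts on both branches, so
    the inductor can just run it: P(X | A_1_1) = 1 falls out of the
    program's output, with no separate judgment about the arithmetic."""
    if 1 + 1 == 2:
        return "X"
    return "0"
```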
In either case, you've shown me something that I explicitly doubted before: one can prove any provable theorem if they have access to a Solomonoff agent's distribution, and they know how to make a program that prints X iff theorem S is provable. All they have to do is check the probability the agent assigns to X conditional on that program.
Awesome. I'm pretty sure you're right; that's the most convincing counterexample I've come across.
I have a weak doubt, but I think you can get rid of it:
let's name the program FLT()
I'm just not sure this means that the theorem itself is assigned a probability. Yes, I have an oracle, but it doesn't assign a probability to a program halting; it tells me whether it halts or not. What the Solomonoff formalism requires is that "if (halts(FLT()) == true) then P(X|FLT()) = 1", "if (halts(FLT()) == false) then P(X|FLT()) = 0", and "P(FLT()) = 2^-K(FLT())". Where in all this is the probability of Fermat's last theorem? Having an oracle may imply knowing whether or not FLT is a theorem, but it does not imply that we must assign that theorem a probability of 1. (Or maybe it does and I'm not seeing it.)
Edit: Come to think of it... I'm not sure there's a relevant difference between knowing whether a program that outputs True iff theorem S is provable will end up halting, and assigning probability 1 to theorem S. It does seem that I must assign 1 to statements of the form "A or ~A" or else it won't work; whereas if theorem S is not in the domain of our probability function, nothing seems to go wrong.
In either case, this probably isn't the standard reason for believing in, or thinking about, logical omniscience, because the concept of logical omniscience is probably older than Solomonoff induction. (I am of course only realizing that in hindsight, now that I've seen a powerful counterexample to my argument.)
I think we'd be better off trying to find a way to express 1 + 1 = 2 as a boolean function on programs.
This goes into the "shit LW people say" collection :-)
Upvoted for cracking me up.
Updating by Bayesian conditionalization does assume that you are treating E as if its probability is now 1. If you want an update rule that is consistent with maintaining uncertainty about E, one proposal is Jeffrey conditionalization. If P1 is your initial (pre-evidential) distribution, and P2 is the updated distribution, then Jeffrey conditionalization says:
P2(H) = P1(H | E) * P2(E) + P1(H | ~E) * P2(~E).
Obviously, this reduces to Bayesian conditionalization when P2(E) = 1.
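A quick worked example with made-up numbers, just to show the rule in action:

```python
# Hypothetical prior conditionals and a partial update on E.
p1_h_given_e  = 0.8   # P1(H | E)
p1_h_given_ne = 0.2   # P1(H | ~E)
p2_e = 0.7            # you became more, but not fully, confident in E

# Jeffrey conditionalization:
p2_h = p1_h_given_e * p2_e + p1_h_given_ne * (1 - p2_e)
# 0.8 * 0.7 + 0.2 * 0.3 = 0.62

# When P2(E) = 1, the rule collapses to ordinary Bayesian
# conditionalization: P2(H) = P1(H | E).
assert p1_h_given_e * 1 + p1_h_given_ne * 0 == p1_h_given_e
```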
Yeah, the problem I have with that, though, is that I'm left asking: why did I change my probability in that? Is it because I updated on something else? Was I certain of that something else? If not, then why did I change my probability of that something else? And on we go down the rabbit hole of an infinite regress.