
Comment author: pragmatist 30 March 2016 12:19:25PM 6 points [-]

Updating by Bayesian conditionalization does assume that you are treating E as if its probability is now 1. If you want an update rule that is consistent with maintaining uncertainty about E, one proposal is Jeffrey conditionalization. If P1 is your initial (pre-evidential) distribution, and P2 is the updated distribution, then Jeffrey conditionalization says:

P2(H) = P1(H | E) * P2(E) + P1(H | ~E) * P2(~E).

Obviously, this reduces to Bayesian conditionalization when P2(E) = 1.
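The formula above can be sketched in a few lines. This is a toy illustration with made-up numbers for the prior joint distribution; the names and values are purely for demonstration:

```python
# Toy sketch of Jeffrey conditionalization. The prior joint P1 over H and E
# uses illustrative numbers, not anything from the discussion.
p1 = {
    ("H", "E"): 0.3,
    ("H", "~E"): 0.2,
    ("~H", "E"): 0.1,
    ("~H", "~E"): 0.4,
}

def cond(joint, h, e):
    """P1(h | e) computed from the joint table."""
    p_e = joint[("H", e)] + joint[("~H", e)]
    return joint[(h, e)] / p_e

def jeffrey_update(joint, p2_e):
    """P2(H) = P1(H|E) * P2(E) + P1(H|~E) * P2(~E)."""
    return cond(joint, "H", "E") * p2_e + cond(joint, "H", "~E") * (1 - p2_e)

# With P2(E) = 1 this reduces to ordinary Bayesian conditionalization:
assert jeffrey_update(p1, 1.0) == cond(p1, "H", "E")

# With residual uncertainty about E, P2(H) lands between the two conditionals:
print(jeffrey_update(p1, 0.9))
```

Note that the update only needs the new marginal P2(E); it never forces you to treat E as certain.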

Comment author: potato 30 March 2016 12:49:45PM *  0 points [-]

Yeah, the problem I have with that, though, is that I'm left asking: why did I change my probability in that? Is it because I updated on something else? Was I certain of that something else? If not, then why did I change my probability of that something else? And on we go down the rabbit hole of an infinite regress.

Does Evidence Have To Be Certain?

0 potato 30 March 2016 10:32AM

It seems like in order to go from P(H) to P(H|E) you have to become certain that E. Am I wrong about that? 

Say you have the following joint distribution:

P(H&E) = a
P(~H&E) = b
P(H&~E) = c
P(~H&~E) = d

where a, b, c, and d are each larger than 0.

So P(H|E) = a/(a+b). It seems like what we're doing is going from assigning ~E some positive probability to assigning it a 0 probability. Is there another way to think about it? Is there something special about evidential statements that justifies changing their probabilities without having updated on something else? 
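The post's point can be made concrete with illustrative numbers (a, b, c, d here are arbitrary, chosen only to satisfy the positivity condition):

```python
# Illustrative values for the joint distribution above (all > 0).
a, b, c, d = 0.2, 0.3, 0.1, 0.4  # P(H&E), P(~H&E), P(H&~E), P(~H&~E)

# Conditioning on E renormalizes the E-cells and zeroes the ~E-cells:
p_h_given_e = a / (a + b)   # P(H|E) = a/(a+b)
p_not_e_after = 0.0         # ~E goes from c + d = 0.5 to probability 0

print(p_h_given_e)  # 0.4
```

This is exactly the move the post is questioning: ~E's probability jumps from a positive value to 0 in one step.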

Computable Universal Prior

0 potato 11 December 2015 09:54AM

Suppose instead of using 2^-K(H) we just use 2^-length(H), does this do something obviously stupid? 

Here's what I'm proposing:

Take a programming language with two characters. Assign each program a prior of 2^-length(program). If the program outputs some string, then P(string | program) = 1; otherwise it equals 0. I figure there must be some reason people don't do this already, or else there's a bunch of people doing it. I'd be real happy to find out about either. 

Clearly, it isn't a probability distribution, but we can still use it, no? 
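One way to see why it isn't a probability distribution: with a two-character alphabet there are 2^n programs of length n, each weighted 2^-n, so every length contributes a full unit of mass and the total diverges. A small sketch of that divergence (the fix in the usual formulation is to require a prefix-free language, which bounds the sum via the Kraft inequality):

```python
# Total prior mass assigned by 2**-length over ALL binary strings up to
# max_len: each length n contributes (2**n programs) * (2**-n weight) = 1,
# so the sum grows linearly without bound.
def total_mass(max_len):
    return sum((2 ** n) * (2 ** -n) for n in range(1, max_len + 1))

print(total_mass(10))   # 10.0
print(total_mass(100))  # 100.0 -- no normalization constant exists
```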



Comment author: dlthomas 17 November 2011 01:44:43PM 10 points [-]

And this is just hiding the complexity, not making it simpler. Complexity isn't a function of how many words you use, cf. "The lady down the street is a witch; she did it." If we are writing a program that emits actual features of reality, rather than socially defined labels, the simplest program for green is simpler than the simplest program for grue or bleen. That you can also produce more complex programs that give the same results (defining green in terms of bleen and grue is only one such example) is both trivially true and irrelevant.
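The point can be sketched in code. This is a toy model only: the wavelength thresholds and switchover year are made up, and the predicates stand in for "programs that emit actual features of reality." In a language whose primitives track physical features, grue needs everything green needs plus blue plus an extra time parameter:

```python
# Toy illustration of the green/grue asymmetry. Thresholds and the
# switchover year are arbitrary, purely for demonstration.
T_SWITCH = 2030  # hypothetical year after which "grue" things look blue

def green(wavelength_nm):
    return 495 <= wavelength_nm <= 570

def blue(wavelength_nm):
    return 450 <= wavelength_nm < 495

def grue(wavelength_nm, year):
    # grue must be defined in terms of green, blue, AND a time index --
    # a strictly longer program in this language.
    return green(wavelength_nm) if year < T_SWITCH else blue(wavelength_nm)
```

In a language where grue and bleen were the primitives, the lengths would reverse, which is exactly the language-choice question raised in the reply below.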

Comment author: potato 08 November 2015 09:24:10PM 1 point [-]

Wait, actually, I'd like to come back to this. What programming language are we using? If it's one where grue is primitive, or one where there are primitives that make grue easier to write than green, then grue seems simpler than green. How do we pick which language we use?

In response to Causal Universes
Comment author: potato 09 October 2015 05:13:23PM 0 points [-]

Here's my problem. I thought we were looking for a way to categorize meaningful statements. I thought we had agreed that a meaningful statement must be interpretable as, or consistent with, at least one DAG. But now it seems that there are ways the world can be which cannot be interpreted as even one DAG, because they require a directed cycle. So have we now decided that a meaningful sentence must be interpretable as a directed graph, cyclic or acyclic?

In general, if I say all and only statements that satisfy P are meaningful, then any statement that doesn't satisfy P must be meaningless, and all meaningless statements should be unobservable, and therefore a statement like "all and only statements that satisfy P are meaningful" should be unfalsifiable.

Comment author: IlyaShpitser 25 October 2012 07:03:23PM *  1 point [-]

Is it true that (in all prior joint distributions where A is independent of B, but A is evidence of C, and B is evidence of C) A is non-independent of B, given C is held constant?

No, but I think it's true if A,B,C are binary. In general, if a distribution p is Markov relative to a graph G, then if something is d-separated in G, then there is a corresponding independence in p. But, importantly, the implication does not always go the other way. Distributions in which the implication always goes the other way are very special and are called faithful.
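The binary case can be checked by brute force. A small sketch of the classic collider (my own toy example, not one from the comment): A and B are independent fair coins and C = A OR B, so A and B are evidence of C; conditioning on C then induces dependence ("explaining away"):

```python
from itertools import product

# Enumerate the joint over binary A, B with C = A OR B; each (a, b)
# outcome has probability 1/4.
def p(pred):
    return sum(0.25 for a, b in product([0, 1], repeat=2) if pred(a, b, a | b))

# Marginal independence: P(A=1, B=1) == P(A=1) * P(B=1)
assert p(lambda a, b, c: a and b) == p(lambda a, b, c: a) * p(lambda a, b, c: b)

# Conditional dependence given C = 1:
p_c = p(lambda a, b, c: c)                          # 0.75
p_a_given_c = p(lambda a, b, c: a and c) / p_c      # 2/3
p_ab_given_c = p(lambda a, b, c: a and b and c) / p_c  # 1/3

# 1/3 != (2/3) * (2/3), so A and B are dependent once C is held constant.
print(p_ab_given_c, p_a_given_c * p_a_given_c)
```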

Comment author: potato 07 October 2015 05:34:13AM 0 points [-]

What is Markov relative?

Comment author: Eliezer_Yudkowsky 10 October 2012 05:56:43AM 5 points [-]

Koan 3:

Does the idea that everything is made of causes and effects meaningfully constrain experience? Can you coherently say how reality might look, if our universe did not have the kind of structure that appears in a causal model?

Comment author: potato 07 October 2015 05:25:16AM *  0 points [-]

Does EY give his own answer to this elsewhere?

Comment author: potato 07 October 2015 03:40:04AM *  0 points [-]

Wait... this will seem stupid, but can't I just say: "there does not exist x where sx = 0"?


In response to Tell Culture
Comment author: potato 04 August 2015 12:21:11AM 2 points [-]

Here's a new strategy.

Use Guess culture as a default. Use Guess tricks to figure out whether the other communicator speaks Ask. Use Ask tricks to figure out whether the communicator speaks Tell.

Comment author: Wei_Dai 03 August 2015 09:36:11PM 3 points [-]

Sure, we might need an oracle to figure out if a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat's last theorem (or at least I can't figure out why we would).

Fermat's Last Theorem states that no three positive integers a, b, and c can satisfy the equation a^n + b^n = c^n for any integer value of n greater than two. Consider a program that iterates over all possible values of a, b, c, n looking for counterexamples for FLT, then if it finds one, calls a subroutine that eventually prints out X (where X is your current observation). In order to do Solomonoff induction, you need to query a halting oracle on this program. But knowing whether this program halts or not is equivalent to knowing whether FLT is true or false.
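The program described above can be sketched directly. The real version searches forever, so whether it halts encodes FLT; the `max_val` bound here is my addition so the sketch actually terminates:

```python
# Bounded sketch of the counterexample search described above. The genuine
# program has no bound: it halts iff FLT is false, which is why Solomonoff
# induction would need a halting oracle to handle it.
def flt_counterexample(max_val):
    for n in range(3, max_val + 1):          # FLT concerns n > 2
        for a in range(1, max_val + 1):
            for b in range(1, max_val + 1):
                for c in range(1, max_val + 1):
                    if a ** n + b ** n == c ** n:
                        return (a, b, c, n)   # would trigger printing X
    return None  # no counterexample within the bound

print(flt_counterexample(12))  # None -- by Wiles's proof, none exists
```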

Comment author: potato 03 August 2015 11:50:44PM *  0 points [-]

Let's forget about the oracle. What about the program that outputs X only if 1 + 1 = 2, and else prints 0? Let's call it A(1,1). The formalism requires that P(X|A(1,1)) = 1, and it requires that P(A(1,1)) = 2^-K(A(1,1)), but does it need to know that "1 + 1 = 2" is somehow proven by A(1,1) printing X?

In either case, you've shown me something that I explicitly doubted before: one can prove any provable theorem given access to a Solomonoff agent's distribution, provided one knows how to write a program that prints X iff theorem S is provable. All one has to do is check the probability the agent assigns to X conditional on that program.
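A toy version of A(1,1), with the trivially checkable "1 + 1 = 2" standing in for a real provability test (for a genuine theorem S you would run a proof search and print X only on success):

```python
# Toy A(1,1): outputs X iff its embedded condition holds, else prints 0.
# The condition here is decidable arithmetic; the interesting (and hard)
# case is when it's a proof search for some theorem S.
def A_1_1(x="X"):
    if 1 + 1 == 2:   # stand-in for "theorem S is provable"
        return x
    return "0"

print(A_1_1())  # X
```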
