timtyler comments on Metaphilosophical Mysteries - Less Wrong

Post author: Wei_Dai 27 July 2010 12:55AM

Comment author: timtyler 27 July 2010 10:02:12PM *  2 points [-]

The example seems to be that using a Turing machine to generate your priors somehow results in an expectation of p(uncomputable universe)=0. That idea just seems like total nonsense to me. It just doesn't follow. For all I care, my priors could have been assigned to me using a Turing machine model at birth - but I don't think p(uncomputable universe)=0. The whole line of reasoning apparently makes no sense.

Comment author: cousin_it 28 July 2010 03:56:00AM 1 point [-]

The universal prior enumerates all Turing machines, not all possible priors generated by all Turing machines.

Comment author: timtyler 28 July 2010 07:17:47AM *  0 points [-]

Priors are probability estimates for uncertain quantities.

In Solomonoff induction they are probability estimates for bitstrings - which one can think of as representing possible sensory inputs for an agent.

With a standard TM_length-based encoding, no finite bitstring is assigned a zero probability - and we won't have to worry about perceiving infinite bitstrings until after the universal heat death - so the problem of certain bitstrings being assigned a zero prior probability does not arise.

Whether the bitstrings were created using uncomputable physics is neither here nor there. They are still just bitstrings - and so can be output by a TM with a finite program on its tape.
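The positivity claim can be checked with a small sketch (my own toy construction, not anything from the thread): a normalized length-based prior over all finite bitstrings under which every string gets strictly positive weight.

```python
# A toy check (my own construction): a normalized prior over ALL finite
# bitstrings that gives every string positive weight.
# p(s) = 0.5 * 4**(-len(s)); each length n contributes 2**n * 0.5 * 4**(-n)
# = 0.5 * 2**(-n), so the weights sum to 1 over all lengths.

def p(s: str) -> float:
    """Prior weight of a finite bitstring under a simple length-based code."""
    return 0.5 * 4.0 ** (-len(s))

# Every finite string, however long, gets strictly positive prior weight.
assert p("0" * 50) > 0.0

# The weights (approximately) sum to 1 over strings up to a length cutoff.
total = p("") + sum(
    p(format(i, f"0{n}b")) for n in range(1, 15) for i in range(2 ** n)
)
print(total)  # approaches 1 as the cutoff grows
```

The particular code here is just one choice; any weighting that decays with length but never reaches zero makes the same point.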

Comment author: cousin_it 28 July 2010 08:26:07AM *  0 points [-]

No, sorry. You're confused. A prior is not an assignment of credences to all bitstrings that you can observe. A prior is an assignment of credences to hypotheses, i.e. possible states of the world that generate bitstrings that you observe. Otherwise you'd find yourself in this text (see part II, "Escaping the Greek Hinterland").

Comment author: timtyler 28 July 2010 08:59:43AM *  2 points [-]

No. We were talking about the universal prior. Here is how that is defined for sequences:

"The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p."

The universal prior of a sequence is the probability of that particular sequence arising (as a prefix). It is not the probability of any particular hypothesis or program. Rather, it is a weighted sum of the probabilities of all the programs that generate that sequence.

You can talk about the probabilities of hypotheses and programs as well if you like - but the universal prior of a sequence is perfectly acceptable subject matter - and is not a "confused" idea.

No finite sequence has a probability of zero - according to the universal prior.

All finite bitstrings can be produced by computable means - even if they were generated as the output of an uncomputable physical process.
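The "weighted sum over programs" structure can be made concrete with a deliberately tiny stand-in (my own toy, not the real construction: the "machine" just repeats its program forever, and program weights use a normalized length-based code rather than a true prefix-free universal machine):

```python
# Toy stand-in for the universal prior: "programs" are bitstrings q; the
# "machine" outputs q repeated forever; each program q gets normalized
# weight 0.5 * 4**(-len(q)).  M(x) is then a weighted sum over all
# programs whose output starts with the prefix x.

def M(x: str, max_len: int = 14) -> float:
    """Toy 'universal prior' of prefix x, truncated at program length max_len."""
    total = 0.0
    for n in range(1, max_len + 1):
        for i in range(2 ** n):
            q = format(i, f"0{n}b")
            if (q * (len(x) // n + 1))[: len(x)] == x:
                total += 0.5 * 4.0 ** (-n)
    return total

# No finite prefix has weight zero: x is always a program for itself.
assert M("0110") > 0.0

# More compressible prefixes collect more weight from short programs.
print(M("0000"), M("0110"))
```

The truncation at max_len mirrors the fact that in practice one can only enumerate finitely many programs; the real definition sums over all of them.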

Is this misconception really where this whole idea arises from?

Comment author: cousin_it 28 July 2010 09:38:27AM *  1 point [-]

This is all true, but... Why do you think the universal prior talks about computer programs at all? If I only wanted a prior over all finite bitstrings, I'd use a simpler prior that assigned every string of length N a credence proportional to 2^-N. Except that prior has a rather major shortcoming: it doesn't help you predict the future! No matter how many bits you feed it, it always says the next bit is going to be either 0 or 1 with probability 50%. It will never get "swamped" by the data, never gravitate to any conclusions. This is why we want the universal prior to be based on computer programs instead: it will work better in practice, if the universe is in fact computable. But what happens if the universe is uncomputable? That's the substantive question here.

ETA: the last two sentences are wrong, disregard them.

Comment author: timtyler 28 July 2010 09:54:25AM *  0 points [-]

Nothing much happens to intelligent agents - because an intelligent agent's original priors mostly get left behind shortly after it is born - and get replaced by evidence-based probability estimates of events happening. If convincing evidence comes in that the world is uncomputable, that just adds to the enormous existing stack of evidence the agent has about the actual frequencies of things.

Anyhow, priors being set to 0 or 1 is not a problem for observable sense data. No finite sense data has p assigned to 0 or 1 under the universal prior - so an agent can always update successfully - if it gets sufficient evidence that a sequence was actually produced. So, if it sees a system that apparently solves the halting problem for arbitrary programs, that is no big deal for it. It may have found a Turing oracle! Cool!

I suppose it might be possible to build a semi-intelligent agent with a particular set of priors permanently wired into it - so that the agent was incapable of learning and adapting if its environment changed. Organic intelligent agents are not much like that - and I am not sure how easy it would be to build such a thing. Such agents would be incapable of adapting to an uncomputable world. They would always make bad guesses about uncomputable events. However, this seems speculative - I don't see why people would try to create such agents. They would do very badly in certain simulated worlds - where Occam's razor doesn't necessarily hold true - and it would be debatable whether their intelligence was really very "general".

Comment author: Sniffnoy 28 July 2010 09:10:49PM 2 points [-]

The reason the universal prior is called "universal" is that, given initial segments of infinite strings drawn from any computable distribution, and updating on those samples, it will in fact converge to the actual distribution on what the next bit should be. Now I'll admit to not actually knowing the math here, but it seems to me that if most any prior had that property, as you seem to imply, we wouldn't need to talk about a universal prior in the first place, no?

Also, if we interpret "universe" as "the actual infinite string that these segments are initial segments of", then, well... take a look at that sum you posted and decompose it. The universal prior is basically assigning a probability to each infinite string, namely the sum of the probabilities of programs that generate it, and then collapsing that down to a distribution on initial segments in the obvious way. So if we want to consider its hypotheses about the actual law of the universe, the whole string, it will always assign 0 probability to an uncomputable sequence.
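The convergence behaviour described above can be sketched on a finite scale (a hypothetical three-generator class plus a noise source - nothing like the full universal mixture): Bayesian updating concentrates the posterior on whichever computable generator actually produced the observed bits.

```python
# Finite-class sketch of posterior convergence: three deterministic
# generators plus a fair-coin hypothesis, updated on observed bits.

hypotheses = {
    "all zeros":   lambda n: "0",
    "all ones":    lambda n: "1",
    "alternating": lambda n: "01"[n % 2],
}

def posterior(data: str) -> dict:
    # Uniform prior over the three generators and a fair-coin source.
    weights = dict.fromkeys(hypotheses, 0.25)
    weights["fair coin"] = 0.25
    for n, bit in enumerate(data):
        for name, gen in hypotheses.items():
            if gen(n) != bit:
                weights[name] = 0.0   # likelihood 0: generator ruled out
        weights["fair coin"] *= 0.5   # likelihood 1/2 for each bit
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

post = posterior("01" * 10)
print(post)  # nearly all mass on "alternating"
```

Note the flip side, matching the 0-probability point: a sequence produced by none of the hypotheses in the class can never accumulate posterior mass on a correct explanation, only on the noise hypothesis.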

Comment author: timtyler 29 July 2010 08:57:49AM *  0 points [-]

Convergence is more the result of the updates than the original prior. All the initial prior has to do to produce convergence is avoid being completely ridiculous (1, 0, infinitesimals, etc.). The idea of a good prior is that it helps initially, before an agent has any relevant experience to go on. However, that doesn't usually last for very long - real organic agents are pretty quickly flooded with information about the state of the universe, and are then typically in a much better position to make probability estimates. You could build agents that were very confident in their priors - and updated them slowly - but only rarely would you want an agent that was handicapped in its ability to adapt and learn.

Picking the best reference machine would be nice - but I think most people understand that for most practical applications, it doesn't matter - and that even a TM will do.

Comment author: Sniffnoy 29 July 2010 09:45:31AM 1 point [-]

Convergence is more the result of the updates than the original prior. All the initial prior has to do to produce convergence is avoid being completely ridiculous (1, 0, infinitesimals, etc.).

Are you certain of this? Could you provide some sort of proof or reference, please, ideally together with some formalization of what you mean by "completely ridiculous"? I'll admit to not having looked up a proof of convergence for the universal prior or worked it out myself, but if what you say were really the case, there wouldn't actually be very much special about the universal prior, and this convergence property wouldn't be worth pointing out - so I think I have good reason to be highly skeptical of what you suggest.

However, that doesn't usually last for very long - real organic agents are pretty quickly flooded with information about the state of the universe, and are then typically in a much better position to make probability estimates.

Better, yes. But good enough? Arbitrarily close?

You could build agents that were very confident in their priors - and updated them slowly - but only rarely would you want an agent that was handicapped in its ability to adapt and learn.

Sorry, but what does this even mean? I don't understand how this notion of "update speed" translates into the Bayesian setting.
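One standard way (a gloss from Bayesian statistics generally, not from either commenter) to cash out "update speed" in an i.i.d. setting: a Beta(a, b) prior behaves like a + b pseudo-observations. A weak prior is moved quickly by the data; a strong prior with the same mean is moved slowly. Both use the identical update rule - plain conditioning; only the prior's weight differs.

```python
# Pseudo-counts as "update speed": compare posterior means under a weak
# Beta(1, 1) prior and a strong Beta(100, 100) prior after the same data.

def beta_posterior_mean(a: float, b: float, heads: int, tails: int) -> float:
    """Posterior mean of a Bernoulli parameter under a Beta(a, b) prior."""
    return (a + heads) / (a + b + heads + tails)

heads, tails = 30, 10  # observed data: empirical rate 0.75

weak = beta_posterior_mean(1.0, 1.0, heads, tails)        # ~2 pseudo-counts
strong = beta_posterior_mean(100.0, 100.0, heads, tails)  # 200 pseudo-counts

print(weak)    # close to the empirical rate 0.75
print(strong)  # still pinned near the prior mean 0.5
```

This is only a partial answer to the question - it formalizes "confidence in the prior" for a parametric model, not for the universal prior itself.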

Comment author: andreas 28 July 2010 11:30:19AM 0 points [-]

Nothing much happens to intelligent agents - because an intelligent agents' original priors mostly get left behind shortly after they are born - and get replaced by evidence-based probability estimates of events happening.

Prior determines how evidence informs your estimates, what things you can consider. In order to "replace priors with evidence-based probability estimates of events", you need a notion of event, and that is determined by your prior.

Comment author: Vladimir_Nesov 28 July 2010 11:42:01AM 0 points [-]

Prior evaluates, but it doesn't dictate what is being evaluated. In this case, "events happening" refers to subjective anticipation, which in turn refers to prior, but this connection is far from being straightforward.

Comment author: timtyler 28 July 2010 11:37:17AM *  0 points [-]

"Determined" in the sense of "weakly influenced". The more actual data you get, the weaker the influence of the original prior becomes - and after looking at the world for a little while, your original priors become insignificant - swamped under a huge mountain of sensory data about the actual observed universe.

Priors don't really affect what things you can consider - since you can consider (and assign non-zero probability to) receiving any sensory input sequence.

Comment author: andreas 28 July 2010 11:53:07AM 0 points [-]

I use the word "prior" in the sense of priors as mathematical objects, meaning all of your starting information plus the way you learn from experience.