timtyler comments on Metaphilosophical Mysteries - Less Wrong
If you are assigning p(some empirical hypothesis) = 0, surely you are a broken system.
The example seems to be that using a Turing machine to generate your priors somehow results in an expectation of p(uncomputable universe)=0. That idea just seems like total nonsense to me. It just doesn't follow. For all I care, my priors could have been assigned to me using a Turing machine model at birth - but I don't think p(uncomputable universe)=0. The whole line of reasoning apparently makes no sense.
The universal prior enumerates all Turing machines, not all possible priors generated by all Turing machines.
Priors are probability estimates for uncertain quantities.
In Solomonoff induction they are probability estimates for bitstrings - which one can think of as representing possible sensory inputs for an agent.
With a standard TM_length-based encoding, no finite bitstring is assigned a zero probability - and we won't have to worry about perceiving infinite bitstrings until after the universal heat death - so the worry about certain bitstrings being assigned a zero prior probability doesn't arise.
Whether the bitstrings were created using uncomputable physics is neither here nor there. They are still just bitstrings - and so can be output by a TM with a finite program on its tape.
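To illustrate, here is a toy sketch (hypothetical names and encoding, not any particular universal machine): any finite bitstring x can be produced by a trivial "print the literal x" program of length roughly len(x) plus a constant, so a length-based 2^-length weighting always gives it a strictly positive prior.

```python
# Toy illustration: a "print literal" program exists for every finite bitstring,
# so a length-weighted prior can never assign it exactly zero.  The constant
# overhead C is an assumption; a real machine has some machine-specific value.

C = 8  # assumed overhead (in bits) of the "print the following literal" instruction

def print_program_length(bitstring: str) -> int:
    # Length of the trivial program that just outputs the literal bitstring.
    return C + len(bitstring)

def lower_bound_prior(bitstring: str) -> float:
    # The universal prior sums 2^-length over *all* programs producing the string,
    # so the single print-program already gives a strictly positive lower bound.
    return 2.0 ** -print_program_length(bitstring)

print(lower_bound_prior("0110"))       # small, but > 0
print(lower_bound_prior("1" * 100))    # tiny, but still > 0
```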
No, sorry. You're confused. A prior is not an assignment of credences to all bitstrings that you can observe. A prior is an assignment of credences to hypotheses, i.e. possible states of the world that generate bitstrings that you observe. Otherwise you'd find yourself in this text (see part II, "Escaping the Greek Hinterland").
No. We were talking about the universal prior. Here is how that is defined for sequences:
"The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p."
The universal prior of a sequence is the probability of that particular sequence arising (as a prefix). It is not the probability of any particular hypothesis or program. Rather, it is a weighted sum of the probabilities of all the programs that generate that sequence.
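In symbols (one standard way of writing the quoted definition, with U a universal machine and \ell(q) the length of program q):

M(p) \;=\; \sum_{q \,:\, U(q)\ \text{outputs something starting with}\ p} 2^{-\ell(q)}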
You can talk about the probabilities of hypotheses and programs as well if you like - but the universal prior of a sequence is perfectly acceptable subject matter - and is not a "confused" idea.
No finite sequence has a probability of zero - according to the universal prior.
All finite bitstrings can be produced by computable means - even if they were generated as the output of an uncomputable physical process.
Is this misconception really where this whole idea arises from?
This is all true, but... Why do you think the universal prior talks about computer programs at all? If I only wanted a prior over all finite bitstrings, I'd use a simpler prior that assigned every string of length N a credence proportional to 2^-N. Except that prior has a rather major shortcoming: it doesn't help you predict the future! No matter how many bits you feed it, it always says the next bit is going to be either 0 or 1 with probability 50%. It will never get "swamped" by the data, never gravitate to any conclusions. This is why we want the universal prior to be based on computer programs instead: it will work better in practice, if the universe is in fact computable. But what happens if the universe is uncomputable? That's the substantive question here.
ETA: the last two sentences are wrong, disregard them.
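To make the "never gets swamped" point concrete, here is a toy sketch (hypothetical code, with a crude three-hypothesis mixture standing in for a mixture over programs - not the actual universal prior): the uniform 2^-N prior keeps predicting 50/50 forever, while even a tiny mixture of models concentrates on the data.

```python
from fractions import Fraction

def uniform_predict(bits):
    # Under the prior P(x) proportional to 2^-len(x), every continuation of the
    # observed prefix is weighted the same way, so the predictive probability of
    # the next bit being 1 stays at 1/2 no matter what has been seen.
    return Fraction(1, 2)

def mixture_predict(bits, models=(Fraction(0), Fraction(1, 2), Fraction(1))):
    # Crude stand-in for a program mixture: three hypotheses that emit 1s with
    # probability 0, 1/2, or 1, each with equal prior weight.  This mixture does
    # get swamped by data: after many 1s it predicts 1 with near-certainty.
    weights = []
    for theta in models:
        w = Fraction(1, len(models))
        for b in bits:
            w *= theta if b == 1 else 1 - theta
        weights.append((theta, w))
    total = sum(w for _, w in weights)
    if total == 0:
        return Fraction(1, 2)
    return sum(theta * w for theta, w in weights) / total

observed = [1] * 20  # twenty 1s in a row
print(uniform_predict(observed))   # 1/2 -- never moves
print(mixture_predict(observed))   # very close to 1
```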
Nothing much happens to intelligent agents - because an intelligent agent's original priors mostly get left behind shortly after it is born - and get replaced by evidence-based probability estimates of events happening. If convincing evidence comes in that the world is uncomputable, that just adds to the enormous existing stack of evidence it has about the actual frequencies of things.
Anyhow, priors being set to 0 or 1 is not a problem for observable sense data. No finite sense data has p assigned to 0 or 1 under the universal prior - so an agent can always update successfully - if it gets sufficient evidence that a sequence was actually produced. So, if it sees a system that apparently solves the halting problem for arbitrary programs, that is no big deal for it. It may have found a Turing oracle! Cool!
I suppose it might be possible to build a semi-intelligent agent with a particular set of priors permanently wired into it - so the agent was incapable of learning and adapting if its environment changed. Organic intelligent agents are not much like that - and I am not sure how easy it would be to build such a thing. Such agents would be incapable of adapting to an uncomputable world. They would always make bad guesses about uncomputable events. However, this seems speculative - I don't see why people would try to create such agents. They would do very badly in certain simulated worlds - where Occam's razor doesn't necessarily hold true - and it would be debatable whether their intelligence was really very "general".
The reason the universal prior is called "universal" is that if you take initial segments of infinite strings drawn from any computable distribution, and update on those samples, it will, in fact, converge to the actual distribution on what the next bit should be. Now I'll admit to not actually knowing the math here, but it seems to me that if most any prior had that property, as you seem to imply, we wouldn't need to talk about a universal prior in the first place, no?
Also, if we interpret "universe" as "the actual infinite string that these segments are initial segments of", then, well... take a look at that sum you posted and decompose it. The universal prior is basically assigning a probability to each infinite string, namely the sum of the probabilities of programs that generate it, and then collapsing that down to a distribution on initial segments in the obvious way. So if we want to consider its hypotheses about the actual law of the universe, the whole string, it will always assign 0 probability to an uncomputable sequence.
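Written out loosely (ignoring programs that halt with only finite output), the decomposition being described is: give each infinite string \omega the combined weight of the programs that generate it, and then sum those weights over all infinite strings extending the observed prefix p:

\mu(\omega) \;=\; \sum_{q \,:\, U(q)=\omega} 2^{-\ell(q)}, \qquad M(p) \;=\; \sum_{\omega\ \text{extending}\ p} \mu(\omega)

An uncomputable \omega is the output of no program, so its weight \mu(\omega) comes out to zero under this decomposition.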
Convergence is more the result of the updates than the original prior. All the initial prior has to do to result in convergence is avoid being completely ridiculous (probabilities of 1, 0, infinitesimals, etc). The idea of a good prior is that it helps initially, before an agent has any relevant experience to go on. However, that doesn't usually last for very long - real organic agents are pretty quickly flooded with information about the state of the universe, and are then typically in a much better position to make probability estimates. You could build agents that were very confident in their priors - and updated them slowly - but only rarely would you want an agent that was handicapped in its ability to adapt and learn.
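A toy sketch of the "swamping" claim (hypothetical code; a Beta-Bernoulli coin model is just a stand-in for any pair of non-dogmatic priors): two quite different priors over a coin's bias end up giving nearly identical predictions once enough flips have been seen.

```python
import random

def posterior_mean(heads: int, tails: int, a: float, b: float) -> float:
    # Beta(a, b) prior over a coin's bias; posterior mean after observing the flips.
    return (a + heads) / (a + b + heads + tails)

random.seed(0)
flips = [1 if random.random() < 0.7 else 0 for _ in range(10000)]  # biased coin
heads, tails = sum(flips), len(flips) - sum(flips)

print(posterior_mean(heads, tails, 1, 1))    # uniform prior          -> about 0.70
print(posterior_mean(heads, tails, 50, 5))   # very different prior   -> also about 0.70
```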
Picking the best reference machine would be nice - but I think most people understand that for most practical applications, it doesn't matter - and that even a TM will do.
Prior determines how evidence informs your estimates, what things you can consider. In order to "replace priors with evidence-based probability estimates of events", you need a notion of event, and that is determined by your prior.
Prior evaluates, but it doesn't dictate what is being evaluated. In this case, "events happening" refers to subjective anticipation, which in turn refers to prior, but this connection is far from being straightforward.
"Determined" in the sense of "weakly influenced". The more actual data you get, the weaker the influence of the original prior becomes - and after looking at the world for a little while, your original priors become insignificant - swamped under a huge mountain of sensory data about the actual observed universe.
Priors don't really affect what things you can consider - since you can consider (and assign non-zero probability to) receiving any sensory input sequence.