Comment author: Shane_Legg 02 September 2008 08:45:17PM 0 points [-]

Tim:

Doesn't apply here.

Comment author: Shane_Legg 02 September 2008 05:57:56PM 1 point [-]

Vladimir:

Why would such a system have a goal to acquire more resources? You put some data in, run the algorithm that updates the probability distribution, and it then halts. I would not say that it has "goals", or a "mind". It doesn't "want" to compute more accurately, or want anything else, for that matter. It's just a really fancy version of GZIP (recall that compression = prediction) running on a thought-experiment-sized computer over thought-experiment quantities of data.
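To make the compression = prediction point concrete, here is a minimal sketch of my own (not anything from the original discussion) that uses off-the-shelf zlib compression as a crude predictor: the candidate continuation that adds the least to the compressed length of the history is the one the compressor implicitly considers most likely.

```python
import zlib

def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of `data`."""
    return len(zlib.compress(data, 9))

def predict_next(history: bytes, alphabet: bytes = b"abc") -> bytes:
    """Crude compression-based prediction: pick the next symbol that
    compresses best when appended to the history. zlib's byte-level
    granularity makes this noisy on short strings, hence the long
    history below."""
    scores = {bytes([c]): compressed_len(history + bytes([c]))
              for c in alphabet}
    return min(scores, key=scores.get)

history = b"abc" * 200        # a strongly patterned sequence
print(predict_next(history))  # should favour b'a', continuing the pattern
```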

I accept that such a machine would be dangerous once you put people into the equation, but the machine in itself doesn't seem dangerous to me. (If you can convince me otherwise... that would be interesting)

Comment author: Shane_Legg 02 September 2008 12:37:47PM 4 points [-]

Eli:

When I try to imagine a safe oracle, what I have in mind is something much more passive and limited than what you describe.

Consider a system that simply accepts input information and integrates it into a huge probability distribution that it maintains. We can then query the oracle by simply examining this distribution. For example, we could use it to estimate the probability of some future event conditional on some other event, and so on. There is nothing in the system that would cause it to "try" to get information, or develop sub-goals, or whatever. It's very basic in terms of its operation. Nevertheless, if the computer were crazy big and fed enough data about the world, it could be quite a powerful device for people wanting to make decisions.
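Here is a toy sketch of the sort of passive device I have in mind (my illustration; the hypotheses and numbers are made up). Data goes in, the posterior gets updated, and the machine halts; a query just reads the distribution out, with nothing resembling a goal anywhere:

```python
# Toy "passive oracle": a Bayesian mixture over coin-bias hypotheses.
# The machine only ever updates a posterior and halts; queries merely
# read the resulting distribution.

HYPOTHESES = [i / 10 for i in range(1, 10)]  # possible values of P(bit = 1)

def update(posterior, bit):
    """One Bayes step over the hypothesis space; returns a new
    normalised posterior dictionary."""
    new = {h: p * (h if bit else 1 - h) for h, p in posterior.items()}
    z = sum(new.values())
    return {h: p / z for h, p in new.items()}

def query_prob_next_one(posterior):
    """A query: P(next bit = 1) under the current mixture."""
    return sum(h * p for h, p in posterior.items())

posterior = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}
for bit in [1, 1, 0, 1, 1, 1, 0, 1]:  # feed in observations
    posterior = update(posterior, bit)

print(round(query_prob_next_one(posterior), 3))  # passive read-out only
```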

It seems to me that the dangerous part here is what the people then do with it, rather than the machine itself. For example, people looking at the outputs might realise that if they just modified the machine in some small way to collect its own data, then its predictions would be much better... and before you know it the machine is no longer such a passive machine.

Perhaps when Bostrom thinks about potentially "safe" oracles, he's also thinking of something much more limited than what you're attacking in this post.

In response to Brief Break
Comment author: Shane_Legg 01 September 2008 08:17:34PM 0 points [-]

Toby:

Yes, in some sense the idea of Turing computation is a kind of physical principle, in that no well-defined process we know of is not Turing computable (for other readers: this includes chaotic systems and quantum systems, as the wave function is computable... with great difficulty in some cases).

Actually, if you built P and it really was very trivial, then I could get my simple Turing machine to compute a quantum-level simulation of your P implementation with far less than 3^^^3 bits of extra information. Thus, if your bound really only kicks in at 3^^^3 bits, then (within currently accepted quantum physics) no trivial physical implementation of P can be possible.
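To spell the argument out (my formalisation, not Toby's wording): if P has a trivial physical implementation, then a simple Turing machine M can run a quantum-level simulation of that implementation at a fixed one-off cost of c_sim bits, so for every string x,

$$K_M(x) \;\le\; K_P(x) + c_{\mathrm{sim}}, \qquad c_{\mathrm{sim}} \ll 3\uparrow\uparrow\uparrow 3,$$

which contradicts any claim that the two measures only diverge after 3^^^3 bits.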

Anyway, as you can't specify P in just a few simple states and symbols, I do not consider it to be an acceptable reference machine (for strict theory purposes at least).

In response to Brief Break
Comment author: Shane_Legg 01 September 2008 04:54:54PM 1 point [-]

Eli:

Quiet from the gallery! You're on a break, remember. :-)

Yeah, it basically does come down to that. You don't get something from nothing. An ultra-tiny universal machine seems to be the most something we can get from the closest thing to nothing.

In response to Brief Break
Comment author: Shane_Legg 01 September 2008 04:37:16PM 0 points [-]

Toby:

Whether you switch to something else like lambda calculus or a trivial CA doesn't really matter. These all boil down to models with a few states and transitions, and as such have simple physical realisations. When you have only a few states and transitions, there isn't much space to move about. This is the bedrock. It isn't absolutely unique, sure, but the space is tight enough to have little impact on Solomonoff induction.

3^^^3 is a super-gigantic monster number. All these mind-bogglingly many shorter programs outputting things that are complex on a minimal-state Turing machine (or lambda calculus, or minimal CA, or minimal...): where are you going to put all this? You can't squeeze it into something as ultra-trivial as the Wolfram/Smith UTM, which has just 2 states and 3 symbols.
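The counting argument behind this is standard but worth making explicit: there are fewer than 2^n binary programs shorter than n bits,

$$\#\{\,p : \ell(p) < n\,\} \;\le\; \sum_{i=0}^{n-1} 2^i \;=\; 2^n - 1,$$

so on *any* reference machine at most 2^n - 1 strings can have complexity below n. There is simply no room to hand short codes to 3^^^3 different strings at once.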

In response to Brief Break
Comment author: Shane_Legg 01 September 2008 11:42:38AM 0 points [-]

Toby:

Why not the standard approach of using Shannon's state × symbol complexity for Turing machines? If a reference machine has a very low state × symbol complexity, then it is trivial to implement in our universe: we just need a few symbols, a few states, and a few transition rules.
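For concreteness (my worked example): under this measure a machine with s states and k tape symbols scores

$$s \times k, \qquad \text{e.g. the 2-state, 3-symbol Wolfram/Smith machine scores } 2 \times 3 = 6,$$

while any machine elaborate enough to smuggle in real structure needs vastly more states and symbols, and its score shows it.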

In response to Brief Break
Comment author: Shane_Legg 01 September 2008 09:36:57AM 0 points [-]

Tim:

"What is the rationale for considering some machines and not others?"

Because we want to measure the information content of the string, not of some crazily complex reference machine. That's why a tiny reference machine is used. In terms of inductive inference, when you say that the bound is infinitely large, what you're saying is that you don't believe in Occam's razor. In that case the whole Bayesian system can get weird. For example, if you have an arbitrarily strong prior belief that most of the world is full of purple chickens from the Andromeda galaxy, well, Bayes' rule is not going to help you much. What you want is an uninformative prior distribution, or, equivalently when working over computable distributions, a very simple reference machine.
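A tiny sketch of my own showing how an arbitrarily strong prior swamps Bayes' rule (the hypotheses and numbers are invented for illustration): with a prior of 10^-12 on the true hypothesis, a handful of observations barely move the posterior, and it takes a lot of data to recover.

```python
# Two hypotheses about a coin: h1 is the true fair coin, h2 is the
# "purple chickens" hypothesis with P(head) = 0.99. The prior on h1
# is an absurdly small 1e-12, standing in for an arbitrarily strong
# prior belief in nonsense.

def posterior_h1(prior_h1, flips):
    """Posterior probability of h1 after a string of 'H'/'T' flips."""
    p1, p2 = prior_h1, 1.0 - prior_h1
    for f in flips:
        p1 *= 0.5                         # likelihood under h1
        p2 *= 0.99 if f == "H" else 0.01  # likelihood under h2
        z = p1 + p2
        p1, p2 = p1 / z, p2 / z           # renormalise
    return p1

print(posterior_h1(1e-12, "HTHT"))     # ~1e-9: the crazy prior still wins
print(posterior_h1(1e-12, "HT" * 20))  # ~1.0: enough data finally overcomes it
```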

Thanks to the rapid convergence of the posterior from a universal prior, the effect of that 2^100 factor is small for any moderate amount of data. Just look at the bound equation.
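For reference, one standard form of the bound (Solomonoff's convergence result as usually stated in Hutter's work; the constants vary slightly between sources): for the universal predictor M and any computable true measure \mu,

$$\sum_{t=1}^{\infty} \mathbf{E}_{\mu}\!\left[\sum_{x_t} \bigl(M(x_t \mid x_{<t}) - \mu(x_t \mid x_{<t})\bigr)^2\right] \;\le\; \frac{\ln 2}{2}\,K(\mu).$$

A 2^100 prior penalty adds 100 bits to K(\mu), i.e. at most about 35 to the total squared prediction error summed over the entire future.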

These things are not glossed over. Read the mathematical literature on the subject, it's all there.

In response to Brief Break
Comment author: Shane_Legg 31 August 2008 11:37:29PM 2 points [-]

Eli:

Thanks. That clears things up. Enjoy your break. Maybe you should not post quite so much? You really do seem to be writing rather a lot these days. By the time I get to replying to some of your comments, you've already written another 5 posts!

Tim:

Answering this question starts to feel a bit like living in the movie Groundhog Day. :-)

Usually the reference machine is taken to have a low state × symbol complexity, so you can't hide much in it. In other words, the reference machine has to be, in some sense, simple.

Now look at the Kolmogorov complexity function. As you mention, if somebody else uses a different reference machine, their measured Kolmogorov complexity will be different, with the maximal difference bounded by a constant. How big is this bound? Pretty small. Many types of simple Turing machines have been shown to be able to simulate each other with a few hundred bits of input. It's also trivial for a serial machine to simulate a parallel machine. Remember that Kolmogorov complexity is a measure of information content, in the sense of shortest description. It's not a measure of how much computation was performed; there are other measures for that, which are more complex...
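Formally this is the invariance theorem: for any two universal machines U and V there is a constant c_{UV}, independent of x, such that

$$\bigl|K_U(x) - K_V(x)\bigr| \;\le\; c_{UV} \quad \text{for all } x,$$

where c_{UV} is essentially the length of the program by which each machine simulates the other; for simple machines, a few hundred bits.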

Finally, look at the Solomonoff bound (bottom of page 38). There you see the Kolmogorov complexity of the true model of the environment. If you're using a different simple reference machine, this bound might go up by a few hundred bits. Is this a big deal? Well, yes, if you are using only a few bytes of input data; but that's all. In that regime Bayesian inference in general has problems, as your prior will strongly affect your posterior. The reference machine problem is the same issue in different language. However, what if you have more data, say a few kB or more, maybe much, much more? In that case taking a few bytes longer to converge isn't really a big deal. Especially considering that this is convergence for *any* unknown computable hypothesis. Say you want to predict the stock market, or results from particle physics experiments, or sentences in a book. Are a few bytes of extra data for the convergence bound going to make much difference? Not really. The Solomonoff predictor is still going to kick some serious butt.
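Quantitatively (plugging into the bound above): if switching reference machines adds c bits to K(\mu), the cumulative error bound grows by only

$$\Delta \;=\; \frac{\ln 2}{2}\,c \;\approx\; 0.35\,c,$$

so a few hundred extra bits buy at most on the order of a hundred extra units of total squared prediction error, however much data follows.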

Of course it's not computable and has to be approximated in practice... etc. etc. So why bother with all this? I see it as a kind of "mathematical philosophy": you take ideas about induction, learning, computation and so on, really nail them down hard in formal mathematics, and then study what you've got. I think this gives you some insights into the nature of learning, intelligence, etc. Of course, this is a rather subjective point. My own AGI project, which I'm developing with some of my research buddies, isn't directly based on Solomonoff induction and AIXI, but we do draw on some related work (such as the universal intelligence measure, suitably interpreted in computable terms), and I do sometimes use AIXI as a mental framework for thinking about some kinds of AGI design issues.

In response to Brief Break
Comment author: Shane_Legg 31 August 2008 08:40:41PM 0 points [-]

Joshua and Nick:

Eli described AIXI itself as "awfully stupid" in a post here two months ago.
