Eliezer_Yudkowsky comments on Open Thread: October 2009 - Less Wrong

Post author: gwern 01 October 2009 12:49PM


Comment author: SilasBarta 01 October 2009 02:40:38PM *  3 points

I plan to develop this into a top level post, and it expands on my ideas in this comment, this comment, and the end of this comment. I'm interested in what LWers have to say about it.

Basically, I think the concept of intelligence is somewhere between a category error and a fallacy of compression. For example, Marcus Hutter's AIXI purports to identify the inferences a maximally intelligent being would make, yet it (and its efficient approximations) has no practical application. The reason, I think, is that it works by finding the shortest hypothesis that fits any data given to it. This means it makes the best inference, on average, over all conceivable worlds it could be placed in. But the No Free Lunch theorems suggest that this means it will be suboptimal compared to any algorithm tailored to a specific world. At the very least, having to be optimal for all of the random and anti-inductive worlds should imply poor performance in this world.

The point is that I think "intelligence" can refer to two useful but very distinct attributes: 1) the ability to find the shortest hypothesis fitting the available data, and 2) having beliefs (a prior probability distribution) about one's world that are closest to (have the smallest KL divergence from) that world. (These attributes roughly correspond to what we intuit as "book smarts" and "street smarts" respectively.) A being can "win" if it does well on 2) even if it's not good at 1), since using a prior can be more advantageous than finding the shortest hypothesis: the prior already points you to the right hypothesis.
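To make the KL-divergence point concrete, here is a toy sketch in Python. The three-outcome "world" and both priors are made-up numbers, purely illustrative: the prior with the smaller KL divergence from the true distribution suffers fewer bits of surprise per observation, which is the "street smarts" advantage in sense 2).

```python
import math

def kl(p, q):
    """KL divergence D(p || q) in bits, over a shared finite support."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical three-outcome world with a skewed true distribution.
true_world = [0.8, 0.15, 0.05]

occamian = [0.5, 0.25, 0.25]   # stand-in for a generic simplicity-biased prior
tailored = [0.75, 0.2, 0.05]   # prior shaped by experience of this world

# Smaller KL from the true distribution = less expected surprise per draw.
print(kl(true_world, occamian))   # larger
print(kl(true_world, tailored))   # smaller
```

Under log-loss scoring, the expected penalty of believing q in a world distributed as p is exactly H(p) + D(p||q), so minimizing the KL term is what "winning" in sense 2) cashes out to.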

Making something intelligent means optimizing the combination of the two that it has, given your resources. What's more, no one algorithm can be generally optimal for finding the current world's probability distribution, because that would also violate the NFL theorems.

Organisms on Earth have high intelligence in the second sense. Over their evolutionary history, they had to make use of whatever regularity they could find in their environment, and the ability to exploit this regularity became "built in". So the history of evolution shows the result of one approach to finding the environment's distribution (ETC), and making an intelligent being means improving upon this method and programming it to "springboard" from that prior with intelligence in the first sense.

Thoughts?

Comment author: Eliezer_Yudkowsky 01 October 2009 04:58:53PM 3 points

But the No Free Lunch theorems suggest that this means it will be suboptimal compared to any algorithm tailored to any specific world.

NFL theorems are about max-entropy worlds. Solomonoff induction works on highly lawful, simplicity-biased, low-entropy worlds.
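(A toy illustration of the max-entropy claim, with hypothetical predictors; nothing here is specific to Solomonoff induction. When every bit-string world is equally likely, any fixed deterministic predictor averages exactly 50% accuracy, so no predictor beats any other.)

```python
from itertools import product

def accuracy(predict, world):
    """Fraction of positions where predict(prefix) matches the next bit."""
    n = len(world)
    return sum(predict(world[:i]) == world[i] for i in range(n)) / n

def mean_accuracy(predict, n=4):
    """Average accuracy over ALL 2**n equally likely bit-worlds."""
    worlds = list(product([0, 1], repeat=n))
    return sum(accuracy(predict, w) for w in worlds) / len(worlds)

always_one = lambda prefix: 1
copy_last = lambda prefix: prefix[-1] if prefix else 0

# Both strategies average exactly 0.5 over the max-entropy ensemble.
print(mean_accuracy(always_one))
print(mean_accuracy(copy_last))
```

A low-entropy ensemble breaks this symmetry: restrict the worlds to, say, constant strings, and copy_last wins decisively, which is the regime where simplicity bias pays.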

If you could actually do Solomonoff induction, you would become at least as smart as a human baby in roughly 0 seconds (some rounding error may have occurred).

Comment author: SilasBarta 01 October 2009 05:30:22PM *  0 points

NFL theorems are about max-entropy worlds. Solomonoff induction works on highly lawful, simplicity-biased, low-entropy worlds.

The same (or a similar) point applies. If you limit yourself to the set of lawful worlds and use an Occamian prior, you will start off much worse than an algorithm that implicitly assumes a prior close to the true distribution. As Solomonoff induction works its way up through longer algorithms, it will hit some that run into an infinite loop. Even if you program a constraint that gets it past or out of these, the optimality is only present "after a long time", which, in practice, means later than we need or want the results.
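The standard workaround for the infinite-loop problem is dovetailing: run every candidate in lock-step, giving each a bounded number of steps per round, so a looping program can never block the search. A minimal sketch, where Python generators stand in for a real program enumeration (both "programs" here are invented for illustration):

```python
def looper():
    while True:        # never halts, never produces an answer
        yield None

def answer_42():
    yield None
    yield 42           # "halts" by producing a final value

def dovetail(programs, max_rounds=100):
    """Run each program one step per round; return the first answer found."""
    states = [(name, make()) for name, make in programs]
    for _ in range(max_rounds):
        for name, gen in states:
            out = next(gen, StopIteration)  # sentinel if the generator is done
            if out not in (None, StopIteration):
                return name, out
    return None

print(dovetail([("looper", looper), ("answer_42", answer_42)]))  # -> ('answer_42', 42)
```

The catch, as the comment says, is cost: dovetailing buys safety from non-halting candidates by multiplying runtime across the whole enumeration, so optimality arrives only "after a long time".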

If you could actually do Solomonoff induction, you would become at least as smart as a human baby in roughly 0 seconds (some rounding error may have occurred).

What else can you tell us about the implications of being able to compute uncomputable functions?

Comment author: Vladimir_Nesov 01 October 2009 05:36:24PM *  1 point

As Solomonoff induction works its way up through longer algorithms, it will hit some that run into an infinite loop. Even if you program a constraint that gets it past or out of these, the optimality is only present "after a long time", which, in practice, means later than we need or want the results.

You are arguing against a strawman: it's not obvious that there are no algorithms that approximate Solomonoff induction well enough in practical cases. Of course there are silly implementations that are way worse than magical oracles.

Comment author: SilasBarta 01 October 2009 05:47:37PM 0 points

it's not obvious that there are no algorithms that approximate Solomonoff induction well enough in practical cases.

Right, but any such approximation works by introducing a prior about which functions it can skip over. And for that prior to actually speed things up, it must encode knowledge (gained separately from S/I) about the true distribution.

But at that point, you're optimizing for a narrower domain, not implementing universal intelligence. (In my naming convention, you're bringing in type 2 intelligence.)

Comment author: Vladimir_Nesov 01 October 2009 05:52:44PM *  0 points

Right, but any such approximation works by introducing a prior about which functions it can skip over.

It introduces a prior, period. Not a prior about "skipping over". Universal induction doesn't have to "run" anything in a trivial manner.

And for such knowledge to actually speed it up...

You can't "speed up" an uncomputable non-algorithm.

Comment author: SilasBarta 01 October 2009 06:09:33PM *  0 points

You can't "speed up" an uncomputable non-algorithm.

Okay, we're going in circles. You had just mentioned possible computable algorithms that approximate Solomonoff induction.

it's not obvious that there are no algorithms that approximate Solomonoff induction well enough in practical cases. [emphasis added -- SB]

So, we were talking about approximating algorithms. The point I was making, in response to this argument that "well, we can have working algorithms that are close enough to S/I", was that to do so, you have to bring in knowledge of the distribution gained some other way, at which point it is no longer universal. (And, in which case talk of "speeding up" is meaningful.)

Demonstrating my point that universal intelligence has its limits and must combine with intelligence in a different sense of the term.

Comment author: Vladimir_Nesov 01 October 2009 07:30:12PM *  0 points

You introduce operations on the approximate algorithms (changing the algorithm by adding data), something absent from the original problem. What doesn't make sense is to compare the "speed" of a non-algorithmic specification with the speed of algorithmic approximations. And absent any actual approximate algorithms, it's also futile to compare their speeds, much less to propose mechanisms for their improvement that assume a specific structure for these absent algorithms (unless you are serious about exploring the design space in this manner to obtain actual results).

Comment author: SilasBarta 02 October 2009 02:35:56PM 0 points

You introduce operations on the approximate algorithms (changing the algorithm by adding data), something absent from the original problem.

What you call "the original problem" (pure Solomonoff induction) isn't. It's not a problem. It can't be done, so it's a moot point.

What doesn't make sense is to compare "speed" of non-algorithmic specification with the speed of algorithmic approximations.

Sure it does. The uncomputable Solomonoff induction has a speed of zero. Non-halting approximations have a speed greater than zero. Sounds comparable to me for the purposes of this discussion.

And absent any approximate algorithms, it's also futile to compare their speed, much less propose mechanisms for their improvement that assume specific structure of these absent algorithms (if you are not serious about exploring the design space in this manner to obtain actual results).

There are approximate algorithms. Even Bayesian inference counts. And my point is that anything you add to modify Solomonoff induction to make it useful is, directly or indirectly, introducing a prior specific to the search space -- clearly showing the distinctness of type 2 intelligence.
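For instance, here is a minimal Bayesian update over a finite, hand-picked hypothesis class (the three coin biases are an arbitrary illustration, and the `posterior` helper is hypothetical): restricting attention to just these hypotheses is exactly the kind of extra, domain-specific prior described above, since everything outside the class implicitly gets probability zero.

```python
def posterior(prior, data):
    """Bayes-update a finite prior over coin biases on a list of 0/1 flips."""
    post = dict(prior)
    for flip in data:
        # Likelihood of this flip under each bias, then renormalize.
        post = {b: p * (b if flip else 1 - b) for b, p in post.items()}
        total = sum(post.values())
        post = {b: p / total for b, p in post.items()}
    return post

prior = {0.25: 1 / 3, 0.5: 1 / 3, 0.75: 1 / 3}
print(posterior(prior, [1, 1, 1, 0, 1, 1]))  # mass shifts toward bias 0.75
```

This is tractable precisely because the hypothesis class is tiny and chosen in advance, which is knowledge about the world that Solomonoff induction itself does not supply.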

Comment author: Vladimir_Nesov 02 October 2009 03:08:22PM *  1 point

To wrap up (as an alternative to not replying):

  • I don't understand why you'd continue arguing definitions about the speed of Solomonoff induction or about it being "the original problem". It's clear what we both mean.
  • I believe you are wrong in your general statements about what needs to be done to implement approximate Solomonoff induction. Since we haven't technically defined in what sense this approximation has to be general, there remain possibilities for a good technical definition that preserves "generality" in an approximate implementation.