In 2007, Legg and Hutter wrote a paper using the AIXI model to define a measure of intelligence. It's pretty great, but I can think of some directions for improvement.

  • Reinforcement learning. I think this term and formalism come historically from much simpler agent models that actually depended on being reinforced in order to learn. In its present form (Hutter 2005, section 4.1) it seems arbitrarily general, but it still feels kinda gross to me. Can we formalize AIXI and the intelligence measure in terms of utility functions, instead? And perhaps prove the two formulations equivalent?
  • Choice of horizon. AIXI discounts the future by requiring that total future reward is bounded, and therefore so does the intelligence measure. This seems to me like a constraint that does not reflect reality, and possibly an infinitely important one. How could we remove this requirement? (There is much discussion of the choice of horizon in Hutter 2005, section 5.7.)
  • Unknown utility function. When we reformulate the measure in terms of utility functions, let's make sure we can still measure an agent's intelligence/optimization power without having to know its utility function. Perhaps by using an average of utility functions weighted by their K-complexity.
  • AI orientation. Finally, and least importantly, the measure tests agents across all possible programs, even those which are known to be inconsistent with our universe. This might be okay if your agent is playing arbitrary games on a computer, but if you are trying to determine how powerful an agent will be in this universe, you probably want to replace the Solomonoff prior with the posterior resulting from updating the Solomonoff prior with data from our universe. (A sketch of these last two proposals follows the list.)
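For reference, the Legg–Hutter measure scores an agent π by its expected reward in every computable environment μ, weighted by the environment's simplicity. Below is that definition together with a rough sketch of how the last two bullets might change it; the utility-weighted and posterior-weighted variants are just my notation for the proposals above, not anything from the papers.

```latex
% Legg & Hutter (2007): universal intelligence of agent \pi,
% with K(\mu) the Kolmogorov complexity of environment \mu
% and V_\mu^\pi the expected total reward of \pi in \mu.
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi

% Sketch of "unknown utility function": also average over candidate
% utility functions U, weighted by their complexity (my notation),
% where h is the agent-environment interaction history.
\Upsilon'(\pi) = \sum_{\mu \in E} \sum_{U} 2^{-K(\mu)-K(U)} \,
                 \mathbb{E}_\mu^\pi \left[ U(h) \right]

% Sketch of "AI orientation": condition the Solomonoff prior on
% data D observed from our universe before weighting (my notation).
\Upsilon_D(\pi) = \sum_{\mu \in E} \Pr(\mu \mid D) \, V_\mu^\pi
```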

Any thought or research on this by others? I imagine lots of discussion has occurred over these topics; any referencing would be appreciated.

but if you are trying to determine how powerful an agent will be in this universe, you probably want to replace the Solomonoff prior with the posterior resulting from updating the Solomonoff prior with data from our universe.

Hence the Schmidhuber (IIRC) paper on speeding up optimal problem solvers with an initial package of encoded knowledge about the domain?

Awesome, do you know the title?

The Optimal Ordered Problem Solver (OOPS) saves earlier solutions as a reference for later problems (see here); for the efficacy (or lack thereof) and caveats of that approach, see here.

In general we would like our learner to continually profit from useful information conveyed by solutions to earlier tasks. (...) Storage for the first found program computing a solution to the current task becomes nonwriteable.

For a related paper on how to decide what information to save from previous tasks ("inductive transfer of knowledge from one task solution to the next"), see here.

(Side note: awesome quote I stumbled upon in the first paper:

And since constants beyond 2^500 do not even make sense within this universe, (Levin Search) may be viewed as academic exercises demonstrating that the O() notation can sometimes be practically irrelevant despite its wide use in theoretical computer science.)

Not offhand, sorry.

Concerning the last item, see this article by Ben Goertzel:

"Pragmatic general intelligence measures the capability of an agent to achieve goals in environments, relative to prior distributions over goal and environment space. Efficient pragmatic general intelligence measures this same capability, but normalized by the amount of computational resources utilized in the course of the goal-achievement."

Can we formalize AIXI and the intelligence measure in terms of utility functions, instead?

No. Preference utility functions are defined on world states, which are, in the general case, not completely accessible to the agent.

AIXI discounts the future by requiring that total future reward is bounded, and therefore so does the intelligence measure. This seems to me like a constraint that does not reflect reality

If cumulative reward is not bounded, you might end up comparing infinities. Anyway, I suppose they chose to have a fixed horizon rather than, say, exponential discounting, so that at any prediction step any individual predictor program needs to run only for finite time. Otherwise AIXI would be, in some sense, doubly uncomputable.
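To spell out the contrast, here is a hedged sketch of the two standard value definitions being compared, with m a fixed horizon and γ a discount factor (notation assumed here, not taken from the comment above):

```latex
% Fixed horizon m: only finitely many steps matter, so each
% predictor program needs to be run for only finite time.
V_\mu^\pi(m) = \mathbb{E}_\mu^\pi \left[ \sum_{k=1}^{m} r_k \right]

% Geometric discounting with 0 < \gamma < 1: the infinite sum is
% bounded (by r_{\max}/(1-\gamma) for bounded per-step rewards),
% but it ranges over all future steps.
V_\mu^\pi(\gamma) = \mathbb{E}_\mu^\pi \left[ \sum_{k=1}^{\infty} \gamma^{\,k-1} r_k \right]
```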

Perhaps by using an average of utility functions weighted by their K-complexity

That yields worst-case complexity.

This might be okay if your agent is playing arbitrary games on a computer, but if you are trying to determine how powerful an agent will be in this universe, you probably want to replace the Solomonoff prior with the posterior resulting from updating the Solomonoff prior with data from our universe.

That's what the agent is supposed to do by itself.