New intro textbook on AIXI

I reommend the new book as a first introduction to AIXI. This book is much more readable than the previous one, with fewer dry convergence results and more recent content. Some definitions are slightly less detailed. One important difference is that the new textbook was written after the Leike's "Bad Universal Priors and Notions of Optimality," which means it is less optimistic about convergence guarantees. Work from Leike's thesis/papers has been integrated in many places, in particular in the discussion of AIXI's computability level and the grain of truth problem. I have recently submitted a game theory paper with Professor Hutter to SAGT 2024 that improves the exposition of the grain of truth problem, so watch for that if you find the section interesting. There is some work on embeddedness from Laurent Orseau - I have a different take on this than he does, but it is definitely worth a read if you are interested in A.I. safety and agent foundations. There is a little original mathematics but mostly to tie things together.

[-]Alexander Gietelink Oldenziel1y60

I have heard of AIXI but haven't looked deeply into it. I'm curious about it. What are some results you think are cool in this field ?

[-]David Quarel1y*40

AIXI isn't isn't a practically realisable model due to its incomputability, but there's nice optimality results, and it gives you an ideal model of intelligence that you can approximate (https://arxiv.org/abs/0909.0801). It uses a universal Bayesian mixture over environments, using the Solomonoff prior (in some sense the best choice of prior) to learn, (in a way you can make formal) as fast as possible, as fast as any agent possibly could. There's some recent work done on trying to build practical approximations using deep learning instead of the CTW mixture (https://arxiv.org/html/2401.14953v1).

(Sorry for the lazy formatting, I'm on a phone right now. Maybe now is the time to get around to making a website for people to link)

[-]Alex_Altair1y3-1

I think the biggest thing I like about it is that it exists! Someone tried to make a fully formalized agent model, and it worked. As mentioned above it's got some big problems, but it helps enormously to have some ground to stand on to try to build on further.

[-]Cole Wyeth9mo10

Nice things about the universal distribution underlying AIXI include:

It is one (lower semi-)computable probabilistic model that dominates in the measure-theoretic sense all other (lower semi-)computable probabilistic models. This is not possible to construct for most natural computability levels, so its neat that it works.
Unites compression and prediction through the coding theorem - though this is slightly weaker in the sequential case.
It has two very natural characterizations, either as feeding random bits to a UTM or as an explicit mixture of lower semi-computable environments.

With the full AIXI model, Professor Hutter was able to formally extend the probabilistic model to interactive environments without damaging the computability level. Conditioning and planning do damage the computability level but this is fairly well understood and not too bad.

[-]Alexander Gietelink Oldenziel9mo20

Thanks a lot!

A few followup questions :..

By computaibility level do you mean Turing degree ?

Why cant the universal distribution be constructed for most levels ?

What exactly is the coding theorem?

What do you mean by conditioning and planning damaging the computability level and why is not so bad ?

[-]Cole Wyeth9mo100

Technically the connection between the computability levels of AIT (estimability, lower/upper semi-computability, approximability) and the Turing degrees has not been worked out properly. See chapter 6 of Leike's thesis, though there is a small error in the inequalities of section 6.1.2. It is necessary to connect the computability of real valued functions (type two theory of effectivity) to the arithmetic hierarchy - as far as I know this hasn't been done, but maybe I'll share some notes in a few months.

Roughly, most classes don't have a universal distribution because they are not computably enumerable, but perhaps there are various reasons. There's a nice table in Marcus Hutter's original book, page 50.

It says that (negative log) universal probability is about the same as the (monotone) Kolmogorov complexity - in the discrete case up to a constant multiple. Basically, the Bayesian prediction is closely connected to the shortest explanation. See Li and Vitanyi's "An Introduction to Kolmogorov Complexity and its Applications."

Last question is a longer story I guess. Basically, the conditionals of the universal distribution are not lower semi-computable, and it gets even worse when you have to compare the expected values of different outcomes because of tie-breaking. But a good approximation of AIXI can still be computed in the limit.

[-]harfe1y30

I haven't watched it yet, but there is also a recent technical discussion/podcast episode about AIXI and relatedd topics with Marcus Hutter: https://www.youtube.com/watch?v=7TgOwMW_rnk

Moderation Log

Curated and popular this week

8Comments

46 New intro textbook on AIXI

by Alex_Altair

11th May 2024

1 min read

46

AIXIAI

Frontpage

46

New intro textbook on AIXI

7Cole Wyeth

6Alexander Gietelink Oldenziel

4David Quarel

3Alex_Altair

1Cole Wyeth

2Alexander Gietelink Oldenziel

10Cole Wyeth

3harfe

New Comment

8 comments, sorted by

top scoring

Click to highlight new comments since: Today at 7:12 AM

[-]Cole Wyeth1y70

[-]Alexander Gietelink Oldenziel1y60

I have heard of AIXI but haven't looked deeply into it. I'm curious about it. What are some results you think are cool in this field ?

[-]David Quarel1y*40

(Sorry for the lazy formatting, I'm on a phone right now. Maybe now is the time to get around to making a website for people to link)

[-]Alex_Altair1y3-1

[-]Cole Wyeth9mo10

Nice things about the universal distribution underlying AIXI include:

It is one (lower semi-)computable probabilistic model that dominates in the measure-theoretic sense all other (lower semi-)computable probabilistic models. This is not possible to construct for most natural computability levels, so its neat that it works.
Unites compression and prediction through the coding theorem - though this is slightly weaker in the sequential case.
It has two very natural characterizations, either as feeding random bits to a UTM or as an explicit mixture of lower semi-computable environments.

[-]Alexander Gietelink Oldenziel9mo20

Thanks a lot!

A few followup questions :..

By computaibility level do you mean Turing degree ?

Why cant the universal distribution be constructed for most levels ?

What exactly is the coding theorem?

What do you mean by conditioning and planning damaging the computability level and why is not so bad ?

[-]Cole Wyeth9mo100

[-]harfe1y30

I haven't watched it yet, but there is also a recent technical discussion/podcast episode about AIXI and relatedd topics with Marcus Hutter: https://www.youtube.com/watch?v=7TgOwMW_rnk

Moderation Log

Curated and popular this week

8Comments

LESSWRONG
LW

46

New intro textbook on AIXI

46

46

46

New intro textbook on AIXI

46

46

Basic info

Table of contents: