Stuart_Armstrong comments on Versions of AIXI can be arbitrarily stupid - Less Wrong Discussion

15 Post author: Stuart_Armstrong 10 August 2015 01:23PM

Comments (59)

Comment author: Stuart_Armstrong 13 August 2015 09:58:08AM 1 point

We have the universal explorer - it will figure out everything, if it survives, but it'll almost certainly kill itself.

We have the bad AIXI model above - it will survive for a long time, but is trapped in a bad epistemic state.

What would be ideal would be a way of establishing the minimal required exploration rate.

Comment author: Wei_Dai 13 August 2015 09:10:20PM * 1 point

> What would be ideal would be a way of establishing the minimal required exploration rate.

Do you mean a way of establishing this independent of the prior, i.e., the agent will explore at some minimum rate regardless of what prior we give it? I don't think that can be right, since the correct amount of exploration must depend on the prior. (By giving AIXI a different bad prior, we can make it explore too much instead of too little.) For example, suppose there are physics theories P1 and P2 that are compatible with all observations so far, and an experiment is proposed to distinguish between them, but the experiment will destroy the universe if P1 is true. Whether or not we should do this experiment must depend on what the correct prior is, right? On the other hand, if we had the correct prior, we wouldn't need a "minimal required exploration rate". The agent would just explore/exploit optimally according to the prior.
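To make the P1/P2 point concrete, here is a minimal sketch of the expected-utility comparison. The function names and the utility numbers are hypothetical, chosen only for illustration; nothing here is part of AIXI itself. It only shows that the same experiment flips from "run it" to "don't" as the prior weight on P1 changes.

```python
def expected_value_of_experiment(p_p1, u_destroyed, u_learned):
    """Expected utility of running the experiment.

    p_p1        -- prior probability that theory P1 is true (assumed given)
    u_destroyed -- utility if P1 is true and the experiment destroys the universe
    u_learned   -- utility if P2 is true and we learn which theory holds
    """
    return p_p1 * u_destroyed + (1.0 - p_p1) * u_learned


def should_run(p_p1, u_destroyed, u_learned, u_status_quo):
    """Run the experiment only if it beats doing nothing, in expectation."""
    return expected_value_of_experiment(p_p1, u_destroyed, u_learned) > u_status_quo


def break_even_prior(u_destroyed, u_learned, u_status_quo):
    """Prior weight on P1 at which running and not running are equally good."""
    return (u_learned - u_status_quo) / (u_learned - u_destroyed)


if __name__ == "__main__":
    # Hypothetical utilities: destruction is very bad, learning is mildly good.
    U_DESTROYED, U_LEARNED, U_STATUS_QUO = -1000.0, 10.0, 0.0
    for prior_on_p1 in (0.001, 0.1):
        decision = should_run(prior_on_p1, U_DESTROYED, U_LEARNED, U_STATUS_QUO)
        print(f"P(P1) = {prior_on_p1}: run experiment? {decision}")
    print("break-even P(P1):", break_even_prior(U_DESTROYED, U_LEARNED, U_STATUS_QUO))
```

With these made-up utilities, a prior of 0.001 on P1 favors running the experiment and a prior of 0.1 does not, so no prior-independent exploration rate gets both cases right.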

Comment author: Stuart_Armstrong 14 August 2015 02:51:55PM 1 point

In theory, changing the exploration rate and changing the prior are equivalent. I think that it might be easier to decide upon an exploration rate that gives a good result for generic priors, than to be sure that generic priors have good exploration rates. But this is just an impression.

Comment author: V_V 17 August 2015 10:26:26AM 1 point

> In theory, changing the exploration rate and changing the prior are equivalent.

Not really. Standard AIXI is completely deterministic, while the usual exploration strategies for reinforcement learning, such as ε-greedy and soft-max, are stochastic.
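For readers unfamiliar with the two strategies named above, here is a minimal sketch of each next to a purely greedy (deterministic) rule. The value estimates `q_values` and the parameter names are assumptions made for illustration; this is textbook bandit-style action selection, not an implementation of AIXI.

```python
import math
import random


def greedy(q_values):
    """Deterministic: always pick the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)


def softmax_probs(q_values, temperature):
    """Boltzmann distribution over actions; lower temperature is closer to greedy."""
    highest = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_values, temperature, rng):
    """Sample one action index from the softmax distribution."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)           # seeded so runs are repeatable
    estimates = [0.1, 0.9, 0.3]      # hypothetical value estimates
    print("greedy:", greedy(estimates))
    print("epsilon-greedy:", [epsilon_greedy(estimates, 0.2, rng) for _ in range(10)])
    print("softmax:", [softmax_action(estimates, 0.5, rng) for _ in range(10)])
```

The greedy rule returns the same action every time for the same estimates, whereas the other two depend on the random source, which is the distinction being drawn here.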

Comment author: Wei_Dai 23 August 2015 09:52:42PM 0 points

By changing the prior, you can make an AIXI agent explore more if it receives one set of inputs and also explore less if it receives another set of inputs. You can't do this by changing an "exploration rate", unless you're using some technical definition where it's not a scalar number?

Comment author: Stuart_Armstrong 24 August 2015 10:30:28AM 0 points

Given arbitrary computing power and full knowledge of the actual environment, these are equivalent. But, as you point out, in practice they're going to be different. For us, something simple like an "exploration rate" is probably an easier way to understand what AIXI's actions will look like.