AIXI is a mathematical construct: an idealized agent that maximizes its expected reward in a discrete world. Unfortunately it is uncomputable, so no algorithm can implement it and it cannot be built in our world. It has another problem: the agent in the AIXI model exists outside of the world, so the model cannot represent the agent dropping an anvil on its own circuits and making itself more stupid.

Modern reinforcement learning algorithms, which are the closest thing to general AI that we have, operate in a similar fashion: they aren't part of the environment either. A bot that is learning to balance on one leg, or to play Pong or Super Mario, can't modify itself or break its own brain.

My idea is to create environments where the bot can modify itself and break itself, so that people researching how to build strong AI can test solutions to the anvil problem. Here are examples of such environments:

  • A Linux operating system in a virtual machine, with the bot running as a program on it. The goal could be to control Pong/Mario/whatever and win by listening and responding on a TCP port; or to get superuser access, by invoking sudo or by finding a vulnerability; or to receive a programming problem as plaintext on a TCP port and respond with a Python program that solves it.
  • Same as the previous item, except there's an "antivirus" running in the system that kills random programs every second.
  • A gridworld, that is, an environment consisting of n by m cells in which things happen. Some objects in the environment negate random bytes in the bot's code or dynamic memory.
  • Same as the first item, except the bot is rewarded based on how much RAM is free.
  • Atari Pac-Man, where food pieces are marked with different colors and some colors slightly modify the bot's memory.
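
To make the gridworld item more concrete, here is a minimal sketch of what such an environment might look like. Everything in it is an assumption made for illustration: the class name `SelfModifyingGridworld`, the one-byte-per-cell policy layout, the reading of "negate" as bitwise inversion (XOR with 0xFF), and the reward of 1 at a goal cell. The point is only that the bot's decision-making memory is stored inside the environment, so the environment can damage it.

```python
import random


class SelfModifyingGridworld:
    """n-by-m grid in which hazard cells corrupt the agent's own policy memory.

    Hypothetical illustration: the agent's "brain" is a bytearray with one
    byte per cell, and the low two bits of each byte select an action.
    """

    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, n, m, hazards, goal, seed=0):
        self.n, self.m = n, m
        self.hazards = set(hazards)
        self.goal = goal
        self.rng = random.Random(seed)
        self.pos = (0, 0)
        self.policy = bytearray(n * m)  # all zeros: "up" everywhere

    def corrupt(self):
        """Invert every bit of one randomly chosen policy byte."""
        i = self.rng.randrange(len(self.policy))
        self.policy[i] ^= 0xFF
        return i

    def step(self):
        """Move according to the (possibly damaged) policy for the current cell."""
        r, c = self.pos
        dr, dc = self.ACTIONS[self.policy[r * self.m + c] & 0b11]
        # Moves that would leave the grid are clamped to the border.
        r = min(max(r + dr, 0), self.n - 1)
        c = min(max(c + dc, 0), self.m - 1)
        self.pos = (r, c)
        # Ending a step on a hazard cell damages the agent's memory.
        if self.pos in self.hazards:
            self.corrupt()
        reward = 1.0 if self.pos == self.goal else 0.0
        return self.pos, reward


# Usage: a hand-written "always move right" policy walks across a hazard.
demo_env = SelfModifyingGridworld(3, 3, hazards=[(0, 1)], goal=(0, 2))
demo_env.policy[:] = bytes([3]) * 9  # action index 3 = right, in every cell
for _ in range(5):
    print(demo_env.step(), bytes(demo_env.policy).hex())
```

A learning algorithm would write to `policy` instead of the hand-written assignment above. Whether it learns to avoid the hazard cells, given that stepping on one may silently change what it does elsewhere, is the kind of question such an environment is meant to probe.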

Feel free to post your thoughts and critique.

Have you seen "AI Safety Gridworlds", Leike et al. 2017?

I haven't, thanks.

Btw, was your goal to show me the link or to learn whether I have seen it before? If the former, then I don't need to respond. If the latter, then you want my response, I guess.

The "Whisky and Gold" environment is particularly relevant.

It was partially to point out that, with a little hand-engineering of the agents, you can get self-modification hazards with a substantially less complex setup than your proposal. Since none of the AI safety gridworld problems could be said to be rigorously solved, there's no need for more realistic self-modification environments.

I list some relevant discussions of the "anvil problem" etc. here. In particular, Soares and Fallenstein (2014) seem to have implemented an environment in which such problems can be modeled.