Walkthrough of the Tiling Agents for Self-Modifying AI paper

So8res, 13 December 2013 03:23AM

This is my walkthrough of Tiling Agents for Self-Modifying AI, and the Löbian Obstacle. It's meant to summarize the paper and provide a slightly-less-technical introduction to the content. I've also collected a number of typos and suggestions, which can be found at the end of the post.

Motivation

We want to be able to consider agents which build slightly better versions of themselves, which build slightly better versions of themselves, and so on. This is referred to as an agent "tiling" itself. This introduces a question: how can the parent agent trust its descendants?

We'll be dealing with logical systems, which are crisply deterministic. In such a setting, child agents can be "absolutely trustworthy". For example, if the child uses the same methods of proof and the same axioms as the parent, then the parent should be able to trust the child unconditionally, or at least as much as it trusts itself.

This turns out to be a huge problem, because sufficiently powerful consistent logical systems cannot prove their own soundness. In that sense they don't trust themselves (or any logical system of equal strength), to say nothing of (supposedly more powerful) successor systems.

This problem of trust is rooted in a theorem stated by the mathematician Löb.
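
For reference, Löb's theorem can be written down compactly. This is the standard textbook formulation, not a quotation from the paper, and the notation is assumed here: T is a theory at least as strong as Peano Arithmetic, \Box_T is T's provability predicate, and \varphi is an arbitrary sentence.

```latex
% Löb's theorem (standard statement; \Box and \ulcorner need the amssymb package).
% If T proves "if phi is provable in T, then phi", then T already proves phi.
\[
  \text{If } T \vdash \Box_T \ulcorner \varphi \urcorner \rightarrow \varphi,
  \quad \text{then } T \vdash \varphi .
\]
```

Read contrapositively, T can only prove "whatever I prove about \varphi is true" for sentences \varphi it can already prove outright. Taking \varphi to be a falsehood such as 0 = 1 recovers Gödel's second incompleteness theorem: a consistent T cannot prove its own consistency.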

Assuming that we can surmount such difficulties, there are a number of other principles that we want in our tiling agents, such as the Vingean principle (the parent shouldn't have to know exactly what the children will do) and the Naturalistic principle (agents should think of themselves as part of the world). These will be discussed in detail later in the paper.
