Tiling Agents for Self-Modifying AI (OPFAI #2)

Eliezer Yudkowsky

6th Jun 2013

An early draft of publication #2 in the Open Problems in Friendly AI series is now available: Tiling Agents for Self-Modifying AI, and the Löbian Obstacle. ~20,000 words, aimed at mathematicians or the highly mathematically literate. The research reported on was conducted by Yudkowsky and Herreshoff, substantially refined at the November 2012 MIRI Workshop with Mihaly Barasz and Paul Christiano, and refined further at the April 2013 MIRI Workshop.

Abstract:

We model self-modification in AI by introducing 'tiling' agents whose decision systems will approve the construction of highly similar agents, creating a repeating pattern (including similarity of the offspring's goals). Constructing a formalism in the most straightforward way produces a Gödelian difficulty, the Löbian obstacle. By technical methods we demonstrate the possibility of avoiding this obstacle, but the underlying puzzles of rational coherence are thus only partially addressed. We extend the formalism to partially unknown deterministic environments, and show a very crude extension to probabilistic environments and expected utility; but the problem of finding a fundamental decision criterion for self-modifying probabilistic agents remains open.

Commenting here is the preferred venue for discussion of the paper. This is an early draft and has not been reviewed, so it may contain mathematical errors, and reporting of these will be much appreciated.

The overall agenda of the paper is to introduce the conceptual notion of a self-reproducing decision pattern which includes reproduction of the goal or utility function, by exposing a particular possible problem with a tiling logical decision pattern and coming up with some partial technical solutions. This then makes it conceptually much clearer to point out the even deeper problems with "We can't yet describe a probabilistic way to do this because of non-monotonicity" and "We don't have a good bounded way to do this because maximization is impossible, satisficing is too weak and Schmidhuber's swapping criterion is underspecified." The paper uses first-order logic (FOL) because FOL has a lot of useful standard machinery for reflection which we can then invoke; in real life, FOL is of course a poor representational fit to most real-world environments outside a human-constructed computer chip with thermodynamically expensive crisp variable states.
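For readers who have not met the obstacle named in the abstract, here is Löb's theorem in standard notation. The symbols are assumptions of this sketch rather than the paper's own notation: T is any theory extending Peano Arithmetic, and \Box_T is its standard provability predicate.

```latex
% Löb's theorem: if T proves "a T-proof of phi implies phi",
% then T already proves phi outright.
\[
  T \vdash \bigl(\Box_T \ulcorner \varphi \urcorner \rightarrow \varphi\bigr)
  \;\Longrightarrow\;
  T \vdash \varphi
\]
% Special case: taking \varphi = \bot recovers Goedel's second
% incompleteness theorem (a consistent T cannot prove its own consistency).
```

Very roughly, and only as a paraphrase: an agent reasoning in T that wants to approve a successor which acts on T-proofs seems to need the soundness schema "whatever T proves is true", and by the theorem T can prove that schema only for statements it already proves. The most straightforward construction therefore pushes each successor toward a weaker theory than its parent.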

As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip. This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals). Mathematical proofs have the property that they are as strong as their axioms and have no significant conditionally independent per-step failure probability if their axioms are semantically true, which suggests that something like mathematical reasoning may be appropriate for certain particular types of self-modification during some developmental stages.
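To make the "conditionally independent component on each self-modification" point concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that every self-modification carries the same small failure probability and that these failures are independent; the function name and the numbers are hypothetical and do not come from the paper.

```python
def survival_probability(per_step_failure: float, steps: int) -> float:
    """Chance of getting through `steps` self-modifications with no
    catastrophic failure, assuming each step fails independently with
    probability `per_step_failure` (an illustrative assumption)."""
    return (1.0 - per_step_failure) ** steps


if __name__ == "__main__":
    # Even a one-in-ten-thousand per-step risk compounds as steps grow.
    for steps in (10, 1_000, 100_000):
        print(steps, survival_probability(1e-4, steps))
```

Under these assumptions the survival probability decays geometrically in the number of steps, which is why a per-step risk that statistical testing can bound only loosely is not good enough over a long chain of self-modifications.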

Thus the content of the paper is very far off from how a realistic AI would work, but conversely, if you can't even answer the kinds of simple problems posed within the paper (both those we partially solve and those we only pose) then you must be very far off from being able to build a stable self-modifying AI. Being able to say how to build a theoretical device that would play perfect chess given infinite computing power is very far off from the ability to build Deep Blue. However, if you can't even say how to play perfect chess given infinite computing power, you are confused about the rules of chess or the structure of chess-playing computation in a way that would make it entirely hopeless for you to figure out how to build a bounded chess-player. Thus "In real life we're always bounded" is no excuse for not being able to solve the much simpler unbounded form of the problem, and being able to describe the infinite chess-player would be substantial and useful conceptual progress compared to not being able to do that. We can't be absolutely certain that an analogous situation holds between solving the challenges posed in the paper, and realistic self-modifying AIs with stable goal systems, but every line of investigation has to start somewhere.
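As a toy illustration of what "saying how to play perfectly given unlimited computation" looks like, here is a sketch of exhaustive game-tree search (minimax) on tic-tac-toe, chosen only because its tree is small enough to search completely. The board encoding and function names are assumptions of this example; the same recipe applies to chess in principle, but the tree there is far too large to enumerate.

```python
from functools import lru_cache

# Index triples that form a winning line on a 3x3 board stored as a
# 9-character string, read row by row ("." marks an empty cell).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def game_value(board: str, to_move: str) -> int:
    """Value under perfect play from X's point of view:
    +1 if X wins, 0 for a draw, -1 if O wins."""
    won = winner(board)
    if won == "X":
        return 1
    if won == "O":
        return -1
    if "." not in board:
        return 0  # board full, nobody won
    other = "O" if to_move == "X" else "X"
    # Try every legal move and recurse on the resulting position.
    values = [game_value(board[:i] + to_move + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == "."]
    # X picks the best outcome for X; O picks the worst outcome for X.
    return max(values) if to_move == "X" else min(values)


if __name__ == "__main__":
    print(game_value("." * 9, "X"))  # prints 0: perfect play is a draw
```

The definition is a few lines long and says nothing about how to play well with bounded resources, which is the sense in which the unbounded problem is the much simpler one.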

Parts of the paper will be easier to understand if you've read Highly Advanced Epistemology 101 For Beginners including the parts on correspondence theories of truth (relevant to section 6) and model-theoretic semantics of logic (relevant to 3, 4, and 6), and there are footnotes intended to make the paper somewhat more accessible than usual, but the paper is still essentially aimed at mathematically sophisticated readers.

259 Comments

Eliezer Yudkowsky · 13y · 40

Hm. I'm not sure if Scott Aaronson has any weird views on AI in particular, but if he's basically mainstream-oriented we could potentially ask him to briefly skim the Tiling Agents paper and say if it's roughly the sort of paper that it's reasonable for an organization like MIRI to be working on if they want to get some work started on FAI. At the very least if he disagreed I'd expect he'd do so in a way I'd have better luck engaging conversationally, or if not then I'd have two votes for 'please explore this issue' rather than one.

I feel again like you're trying to interpret the paper according to a different purpose from what it has. Like, I suspect that if you described what you thought a promising AGI research agenda was supposed to deliver on what sort of timescale, I'd say, "This paper isn't supposed to do that."

"No, it's clear that there have been many advances, for example in chess playing programs, auto-complete search technology, automated translation, driverless cars, and speech recognition."

"But my impression is that this work has only made a small dent in the problem of general artificial intelligence."

This part is clearer and I think I may have a better idea of where you're coming from, i.e., you really do think the entire field of AI hasn't come any closer to AGI, in which case it's much less surprising that you don't think the Tiling Agents paper is the very first paper ever to come closer to AGI. But this sounds like a conversation that someone else could have with you, because it's not MIRI-specific or FAI-specific. I also feel somewhat at a loss for where to proceed if I can't say "But just look at the ideas behind Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, that's obviously important conceptual progress because..." In other words, you see AI doing a bunch of things, we already mostly agree on what these sorts of surface real-world capabilities are, but after checking with some friends you've concluded that this doesn't mean we're less confused about AGI than we were in 1955. I don't see how I can realistically address that except by persuading your authorities; I don't see what kind of conversation we could have about that directly without being able to talk about specific AI things.

Meanwhile, if you specify "I'm not convinced that MIRI's paper has a good chance of being relevant to FAI, but only for the same reasons I'm not convinced any other AI work done in the last 60 years is relevant to FAI" then this will make it clear to everyone where you're coming from on this issue.

JonahS · 13y · 30

"Hm. I'm not sure if Scott Aaronson has any weird views on AI in particular, but if he's basically mainstream-oriented we could potentially ask him to briefly skim the Tiling Agents paper and say if it's roughly the sort of paper that it's reasonable for an organization like MIRI to be working on if they want to get some work started on FAI."

Yes, I would welcome his perspective on this.

"I feel again like you're trying to interpret the paper according to a different purpose from what it has. Like, I suspect that if you described what you thought a promisi [...]"

Shmi · 13y · 8

He wrote this about a year ago: [...] And later: [...]
