9 AI reflection problem

by [anonymous]

27th Nov 2011

2 min read

9

I tried to write down my idea a few times, but it was badly wrong each time. Now, Instead of solving the problem, I'm just going to give a more conservative summary of what the problem is.

Eliezer's talk at the 2011 Singularity Summit focused largely on the AI reflection problem (how to build AI that can prove things about its own proofs, and execute self modifications on the basis of those proofs, without thereby reducing its self modification mojo). To that end, it would be nice to have a "reflection principle" by which an AI (or its theorem prover) can know in a self-referential way that its theorem proving activities are working as they should.

The naive way to do this is to use the standard provability predicate, ◻, which can be thought of as asking whether a proof of a given formula exists. Using this we can try to formalize our intuition that a fully reflective AI, one that can reason about itself in order to improve itself, should understand that its proof deriving behavior does in fact produce sentences derivable from its axioms:

AI ⊢ ◻P → P,

which is intended to be read as "The formal system "AI" understand in general that "If a sentence is provable then it is true" ", though literally it means something a bit more like "It is derivable from the formal system "AI" that "if there exists a proof of sentence P, then P", in general for P".

Surprisingly, attempting to add this to a formal system, like Peano Arithmetic, doesn't work so well. In particular, it was shown by Löb that adding this reflection principle in general lets us derive any statement, including contradictions.

So our nice reflection principle is broken. We don't understand reflective reasoning as well as we'd like. At this point, we can brainstorm some new reflection principles: Maybe our reflection principles should be derivable within our formal system, instead of being tacked on. Also, we can try to figure out in a deeper way why "AI ⊢ ◻P → P" doesn't work: If we can derive all sentences, does that mean that proofs of contradictions actually do exist? If so, why weren't those proofs breaking our theorem provers before we added the reflection principle? Or do the proofs for all sentences only exist for the augmented theorem prover, not for the initial formal system? That would suggest our reflection principle is allowing the AI to trust itself too much, allowing it to derive things because deriving them allows them to be derived. Though it doesn't really look like that if you just stare at the old reflection principle. We are confused. Let's unconfuse ourselves.

Personal Blog

9

New Comment

Rendering 0/27 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 8:11 PM

Moderation Log

Curated and popular this week

27Comments

AI reflection problem — LessWrong

9 AI reflection problem

by [anonymous]

27th Nov 2011

2 min read

9

I tried to write down my idea a few times, but it was badly wrong each time. Now, Instead of solving the problem, I'm just going to give a more conservative summary of what the problem is.

AI ⊢ ◻P → P,

Personal Blog

9

New Comment

Rendering 0/27 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 8:11 PM

Moderation Log

Curated and popular this week

27Comments

Comment Permalink

endoself15y40

This solution does not work. In order to implement it, you'd need your AI to preform actions that are judged to be good after conditioning on its own consistency. It could then replicate its theorem prover, but it could not replicate the decision rule that allows it to replicate the theorem prover, since it can't prove that actions that are good conditional on its consistency are good without conditioning on consistency.

Edit: I found Eliezer's response.

cousin_it15y60

Thanks for the link! Now I understand Eliezer's question a little better. Let me try to think out loud and see if I'm right...

The Gödel machine works like this: it outputs a stream of actions according to some not-very-good algorithm, and keeps looping through proofs on the side. As soon as it finds a proof that replacing its whole source code leads to higher utility (from the next turn onwards) than keeping it as is, it replaces its whole source code. There's no talk about replacing just the proof verifier.

Any such improvement is "not greedy" by... (read more)

2[anonymous]15y

In that reply he says the the meta-system would need to make an assumption that the meta system works, which isn't much different from requiring that A1 proves A2 consistent before accepting it, which he mentioned in the first post briefly. That requirement of consistency is not what he originally asked for: that A1 proves that if A2 accepts P, then P, which Wei Dai does solve for the case of equivalent A1 and A2. It's true he doesn't show that such a modification is necessarily an improvement in all environments, but if Schmidhuber's usefulness means "This modification has expected utility greater than or equal to all other modifications including not modifying", as Eliezer used in his talk, then we don't require that, any more than we would require a Bayesian to always make true predictions. The proof of usefulness, like any full expected utility calculation, would require considering the probabilities and values of consequences as they follow from the AIs actions across possible worlds consistent with the AIs observations. That is a huge computation, but I don't see how it's logically impossible. However, getting back to the separate criterion of consistency, why? Where does consist(A2) ever enter into a calculation? What do we gain by it? If A2 is inconsistent, I can see how that would usually limit the expected utility of becoming A2, but that's just not the same. "Good" shouldn't be conditional on consistency, it's just expectation of valuable consequences.

See in context