# endoself comments on AI reflection problem - Less Wrong Discussion

4 [deleted] 27 November 2011 06:29AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Sort By: Best

Comment author: 27 November 2011 08:45:51PM *  4 points [-]

This solution does not work. In order to implement it, you'd need your AI to preform actions that are judged to be good after conditioning on its own consistency. It could then replicate its theorem prover, but it could not replicate the decision rule that allows it to replicate the theorem prover, since it can't prove that actions that are good conditional on its consistency are good without conditioning on consistency.

Edit: I found Eliezer's response.

Comment author: 28 November 2011 01:38:25AM *  4 points [-]

Thanks for the link! Now I understand Eliezer's question a little better. Let me try to think out loud and see if I'm right...

The Gödel machine works like this: it outputs a stream of actions according to some not-very-good algorithm, and keeps looping through proofs on the side. As soon as it finds a proof that replacing its whole source code leads to higher utility (from the next turn onwards) than keeping it as is, it replaces its whole source code. There's no talk about replacing just the proof verifier.

Any such improvement is "not greedy" by Schmidhuber's "Global optimality theorem", which notes that the utility of staying with the old source code for another turn already takes into account all other possible self-improvements that the machine could do in the future. So the very first self-improvement found by the machine will likely look like an immediate jump to optimality.

The question arises, if the first self-improvement is guaranteed to be so awesome, will the machine ever find it at all, or will the proof search loop forever?

In some simple scenarios the proof search will in fact terminate. For example, if the machine has the option of pressing a lever that maxes out the reward, it can easily prove that self-modifying to always press the lever has optimal utility compared to any other self-improvement it can come up with. Note that in such simple scenarios the machine doesn't actually get to upgrade its proof verifier to a more advanced one, it just jumps to the best strategy.

There are also more complex scenarios. For example, sometimes you can improve your starting algorithm a lot but the ultimate optimal strategy is uncomputable, or maybe it's computable but the proof of optimality is out of your axiom system's reach. In such cases it's not clear to me why the machine's proof search should ever terminate, and I couldn't find a convincing answer to that in Schmidhuber's writings, but Löb's theorem seems to be only one of the many possible obstacles here.

Eliezer's question seems to be mostly about such complex scenarios. I think it's more enlightening to ask directly whether the Gödel machine will execute even one self-improvement, without dragging in Löb's theorem. This gives us a hard math question that I'm too lazy and/or stupid to answer :-(

Comment author: 28 November 2011 05:27:47AM 3 points [-]

I think it's more enlightening to ask directly whether the Gödel machine will execute even one self-improvement, without dragging in Löb's theorem. This gives us a hard math question that I'm too lazy and/or stupid to answer :-(

Well I don't think I can solve this either, at least not quickly enough to justify the use of time. The combination of Loeb's theorem and the requirement for absolute proof (it couldn't even tentatively justify the use of, say, RSA without a proof that it is unbreakable) severely limits its abilities. Rereading the paper, I noticed that, since it explicitly compares the new design with the old one, it gets around a few Loebian difficulties, but this is clearly not a solution because we end up with a weird asymmetry between agent that the Goedel machine is willing to self-modify into and agents that the Godel machine is willing to replace itself with through any mechanism other that the self-improvement module.

On a completely different note, this post made me realize how horribly our knowledge is organized. This is an open problem that Eliezer believes is critical to the creation of FAI and in order to find information on it we have to read 7 year old sl4 posts and think about aspects of papers that have clearly been thought about before by very smart people, and yet we are still left with questions for which it is reasonably likely that someone knows the answer. You're one of our decision theory people - you've been thinking about these things for years! If this was sanely organized, you would have heard about the ideas that someone already understand many times over the course of the decision theory discussions.

Comment author: 28 November 2011 05:29:29PM *  3 points [-]

After thinking about the problem some more, I no longer understand why Eliezer wants the machine to be fully meta and self-modifying. Imagine the following design:

1) An object-level program O that outputs actions into the world. Initially it just sits still and does nothing.

2) A meta-level program M that contains a prior about the world and a utility function, and slowly enumerates all possible proofs in some formal system. When M finds a proof that some object-level program O' achieves higher expected utility than O, it replaces O with O' and goes on searching. When M finds a proof that the current O is optimal, it stops.

In this setup M can't modify itself, but that doesn't matter. If it were able to self-modify, it would spend perhaps a century on finding the initial self-modification, but now it can spend the same century on making O self-modifying instead, so it seems to me that we keep most of the hypothetical speedup. Another bonus is that the combination of O and M looks much easier to reason about than a fully meta machine, e.g. M will obviously find many easy speedups for O without running into Löb or anything. Is there any reason to want a fully meta machine over this?

(Of course all such models share the drawback of AIXI: they think programs live outside the universe, so O can inadvertently destroy M's or its own hardware with its mining claws. Still, these things are interesting to figure out.)

Comment author: 28 November 2011 05:56:13PM 0 points [-]

Of course all such models share the drawback of AIXI: they think programs live outside the universe, so O can inadvertently destroy M's or its own hardware with its mining claws. Still, these things are interesting to figure out.

Wouldn't any solution to this problem enable M to reason about itself, forcing it to consider the effects of being modified by O?

Comment author: 28 November 2011 06:30:20PM 1 point [-]

Yeah. This family of questions is the most important one we don't know how to answer. Maybe Eliezer and Marcello have solved it, but they're keeping mum.

Comment author: 28 November 2011 06:48:38PM 0 points [-]

I don't think so. For one thing, Eliezer keeps talking about how important it is to solve this. I don't think his personal ethics would let him try to spread awareness of an open problem if he had actually solved it. Also, he seems to think that if the public knew the answer, it would reduce the chance of the creation of uFAI, so, unless the solution was very different than he expected - I'm thinking of it providing a huge clue toward AGI, but that sounds unlikely - he would have incentive to make it public.

Comment author: 28 November 2011 08:30:59PM *  3 points [-]

Does Eliezer keep talking about the thing I called "the drawback of AIXI"? It seems to me that he keeps talking about the "AI reflection problem", which is different. And yeah, solving any of these problems would make AGI easier without making FAI much easier.

Comment author: 28 November 2011 08:42:05PM *  0 points [-]

No, I was referring to the AI reflection problem in the grandparent.

I don't know if that would make AGI much easier. Even with a good reflective decision theory, you'd still need a more efficient framework for inference than an AIXI-style brute force algorithm. On the other hand, if you could do inference well, you might still make a working AI without solving reflection, but it would be harder to understand its goal system, making it less likely to be friendly. The lack of reflectivity could be an obstacle, but I think that it is more likely that, given a powerful inference algorithm and a lack of concern for the dangers of AI, it wouldn't be that hard to make something dangerous.

Comment author: [deleted] 27 November 2011 10:09:07PM *  1 point [-]

In that reply he says the the meta-system would need to make an assumption that the meta system works, which isn't much different from requiring that A1 proves A2 consistent before accepting it, which he mentioned in the first post briefly. That requirement of consistency is not what he originally asked for: that A1 proves that if A2 accepts P, then P, which Wei Dai does solve for the case of equivalent A1 and A2. It's true he doesn't show that such a modification is necessarily an improvement in all environments, but if Schmidhuber's usefulness means "This modification has expected utility greater than or equal to all other modifications including not modifying", as Eliezer used in his talk, then we don't require that, any more than we would require a Bayesian to always make true predictions. The proof of usefulness, like any full expected utility calculation, would require considering the probabilities and values of consequences as they follow from the AIs actions across possible worlds consistent with the AIs observations. That is a huge computation, but I don't see how it's logically impossible.

However, getting back to the separate criterion of consistency, why? Where does consist(A2) ever enter into a calculation? What do we gain by it? If A2 is inconsistent, I can see how that would usually limit the expected utility of becoming A2, but that's just not the same. "Good" shouldn't be conditional on consistency, it's just expectation of valuable consequences.

Comment author: 27 November 2011 11:12:41PM *  1 point [-]

if Schmidhuber's usefulness means "This modification has expected utility greater than or equal to all other modifications including not modifying"

Schmidhuber's Goedel machine can handle decision rules other than the expected value, but apart from that, this is correct.

That is a huge computation, but I don't see how it's logically impossible.

It is required to compute its own expected actions. The naive way of doing this collapses into infinite recursion. We can imagine more sophisticated ways that usually work, but usually is not a proof; the Goedel machine only takes expectations over possible states of the environment, not over possible truth-values of mathematical statements. It is unclear how one could preform estimates over the latter; if you have an algorithm that can estimate the probability of, say, Goldbach's conjecture, you have made quite a significant amount of progress in AGI.

Where does consist(A2) ever enter into a calculation? What do we gain by it?

Without either a proof of consist(A2) or some way of estimating the probability of consist(A2), we will not be able to calculate expected utilities.