Follow-up to Efficient Cross-domain Optimization

When I say that I am skeptical that we will ever understand intelligence, I mean that I am skeptical we will ever be able to reliably map a system's description onto its optimization power. This has implications for how well we can create intelligences and how good intelligences will be at self-improving.

Obviously we can't predict the effectiveness of an arbitrary program, due to Rice's theorem and intelligence being a non-trivial property. So the best we can hope for is predicting the effectiveness of some restricted set of programs. Is such a function possible? This is my take on the subject.

Let o(p) be a function that maps a program p to its optimization power.
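
As a minimal sketch of what is being asked for (my framing, not anything defined in the post), o would have a signature like this in Python, with the string representation of programs being an assumption:

```python
def o(p: str) -> float:
    """Hypothetical optimization-power measure: takes a program's
    description (here just its source as a string) and returns a score
    predicting how strongly the program will steer outcomes toward its
    goal. The post's argument is that no implementation of this
    signature can be reliable when the environment may itself contain
    and consult o."""
    raise NotImplementedError
```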

Mu, Omega's younger brother, has a challenge for you: you get to design a system and put it in a box with 20 red and 20 green balls. It will activate itself after 10 minutes and then have the goal of removing as many red balls from the box as possible in 10 minutes. You have to decide whether it is going to remove more or fewer than 5 red balls from the box. You get transported to nirvana if you predict correctly, and your world gets turned into paper clips if you get it wrong.

You write a program, evaluate it using your trusty o, and bet according to its evaluation.

Unknown to you, Mu also has a copy of your o and runs it on the systems you put in the box. Those that return a high value from the optimization power measure, it destroys before they activate; those that return a low value, it performs their goals for them. In the second case it is still p that causes the goal to be fulfilled, since if p were different the goal would be fulfilled by a different amount. You can see it as inspiring pity in someone else who then helps, when they would not have helped otherwise. It is still winning.
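
Here is a toy rendering of Mu's move against the ball-box challenge, assuming Mu uses some cutoff to separate "high" from "low" scores; the threshold and every name here are illustrative assumptions, not part of the story:

```python
from typing import Callable

THRESHOLD = 0.5   # assumed cutoff between a "high" and a "low" o-score
TOTAL_RED = 20    # red balls in the box

def mu_intervene(p: str, o: Callable[[str], float]) -> int:
    """Mu's move: run its copy of your measure o on the boxed program,
    then act so that a bet placed according to o loses.
    Returns the number of red balls actually removed."""
    if o(p) >= THRESHOLD:
        # You bet "more than 5 removed"; Mu destroys p before it activates.
        return 0
    # You bet "fewer than 5 removed"; Mu removes every red ball on p's behalf.
    return TOTAL_RED
```

Whatever o predicts, Mu arranges the opposite outcome, which is the sense in which o is forced to be wrong below.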

So Mu forces o to be wrong; o was not the reliable predictor of a set of programs' optimization power we had hoped for, and we have a contradiction. Is there any way to salvage it? You could make your effectiveness measure depend on the environment e as well, but that does not remove the potential for self-reference, as o is part of the environment. So we might be able to rescue o by constraining the environment to contain no reference to o. However, we don't control the environment, nor do we have perfect knowledge of it, so we don't know when it contains references to o, and hence when o is reliable.
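
The environment-relative variant just adds a second argument; the snag is that nothing guarantees e contains no copy of the measure itself. A minimal sketch, again with assumed names:

```python
def o_env(p: str, e: object) -> float:
    """Hypothetical environment-relative measure. Only trustworthy if e
    can be guaranteed to contain no reference to o_env itself -- a
    guarantee we cannot give for the real world."""
    raise NotImplementedError
```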

You could try to make it so that Mu could have no impact on what p does. But that is the same as trying to make the system indestructible, and with a reversible physics, what is created can be destroyed.

So where do we go from here?

 

17 comments

Let me see if I understand this. Basically, you're making a hypothetical in which, if you do something right, a quasi-omniscient being intervenes and makes it not work. If you do something wrong, it lets that happen. Please correct me if I'm wrong, but if this is even close to what you're saying, it seems quite pointless as a hypothetical.

Let us say I am an AI and I want to replace myself with a new program. I want to be sure that this new program will perform the tasks I want done better than I do (including creating new, better copies of myself), so I need to be able to predict how good a program is.

I don't want to have to run it and see, as I would have to run it for a very long time to find out whether the program is better than me far in the future. So I want a proof that the new program is better than me. Are such proofs possible? My argument is that they are not, if you cannot constrain the environment to make it not reference your proof.

Honestly, this is basically a problem with most problems involving Omega etc.

The problems of this kind should be stated clearly, examining a specific aspect of e.g. decision theories. In this particular case, "doing something right" or "optimization power" etc. is too vague to work with.

If you have a problem with 'optimization power', take it up with the person who coined it.

Whatever it is that you expect to go up and up and up during a FOOMing RSI, that is the measure I want to capture and explore. Can you tell me about your hopefully less vague measure?

Make explicit what you expect from a measure of 'optimization power' and it will be easier to judge whether your criticism applies. If your measure is an expectation, e.g. the expected probability of achieving a goal under some distribution over goals and environments, then your story does not show that such a measure is unreliable overall.
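
For concreteness, the kind of measure described here could be written as an expectation over a distribution G of goals and a distribution E of environments (my notation, not the commenter's):

```latex
o(p) \;=\; \mathbb{E}_{g \sim G,\; e \sim E}\left[\Pr\bigl(p \text{ achieves } g \text{ in } e\bigr)\right]
```

A single adversarial environment like Mu's box only drags this average down by however much probability mass E assigns to it.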

I'll add a set goal. I expect that a measure of optimization power would tell us how well a system would do in physically plausible environments. We can't control or know the environment in more detail than that, so we cannot rule out forced self-reference.

By "how well a system would do in physically plausible environments", do you mean on average, worst case or something else?

Check out the link to Efficient Cross-domain Optimization at the top of the post; he never said anything about running statistics when measuring intelligence.

What we want is something that will enable a program to judge the quality of another, so that whenever it picks a new program to rewrite itself to, the new one is always better, and so it can go take over the galaxy. Feel free to pick holes in this notion if you want; it is not mine.

I might rewrite the post to use the notion of a strict ordering: a system is better than another if it will do no worse under all physically plausible environments and better in some. Then Mu would allow you to set up two programs that are ordered in this way and ask you to bet on their ordering.
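
That ordering could be sketched as a dominance check over a sample of environments (itself an idealisation, since "all physically plausible environments" cannot actually be enumerated); performance is an assumed stand-in for how well a program did at its goal:

```python
from typing import Callable, Iterable

def strictly_better(p1: str, p2: str,
                    environments: Iterable[object],
                    performance: Callable[[str, object], float]) -> bool:
    """p1 is better than p2 if it does no worse in every sampled
    environment and strictly better in at least one."""
    better_somewhere = False
    for e in environments:
        s1, s2 = performance(p1, e), performance(p2, e)
        if s1 < s2:
            return False              # worse somewhere: not ordered this way
        if s1 > s2:
            better_somewhere = True
    return better_somewhere
```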

You really don't want to be doing statistics on the first seed AI if you can possibly avoid it.

Then you're arguing that, if your notion of "physically plausible environments" includes a certain class of adversarially optimized situations, worst-case analysis won't work because all worst cases are equally bad.

They could all be vaporised by a nearby supernova or something similar before they have a chance to do anything, yup.

Rice's theorem says nothing about the impossibility of showing properties of programs in special cases. It only shows that there are always some programs that will thwart your decision procedure. But your prospective AI does not need to be one of them.

So where do we go? We go kicking Mu out of the picture:

For the most interesting aspect of an intelligent system is whether it is efficient in the very world it will occupy. So a reasonable test of its intelligence is to run it in a detailed simulation of said world. Of course, if your simulated world and reality differ (for example in the existence of Mu), the measure is pretty much worthless.

However, decent simulations of our world can be built.

Also, the solution to your puzzle is to copy the world (including Mu) into a new Matrix, test whether the program reaches its goal, and use that as your efficiency measure. That way Mu cannot test the program's efficiency either, and you get a fair chance.
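
A sketch of that proposal, assuming we somehow had a faithful, forkable copy of the world to run things in; every interface here (fork, insert, step, goal_reached) is hypothetical:

```python
def simulated_efficiency(program: str, world, goal_reached, steps: int) -> float:
    """Score a program by running it in a private copy of the world
    (Mu included), so that Mu's copy of o never sees the real run."""
    copy = world.fork()        # duplicate the world state, Matrix-style
    copy.insert(program)       # place the program inside the copy
    for _ in range(steps):
        copy.step()            # advance the simulated physics
    return goal_reached(copy)  # fraction of the goal achieved, in [0, 1]
```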

Suppose I disallow Mu from examining the source code of either p or o. It can examine the behaviors in advance of their execution, but not the source code. And now suppose that if Mu is allowed to know this much, then I am allowed to know the same about Mu. Then what?

Optimality through obscurity?

If we are going to allow you a chance to figure out the behavior of Mu, Mu should be given the chance to find out the behavior of Eliezer (what programs you are likely to produce, etc.). Only then would information parity be preserved.

Mu is standing in for the entire world; your system is a small bit of it. It is entirely reasonable to expect the world to know more about your system than you do about the behavior of the world. I'm not sure where you are taking this idea, but it is unrealistic in my view.

I design a system p that has high optimising power, but which o is not smart enough to prove it has. Actually, I might get away with obfuscating p enough to confuse not o itself but Mu's simulation of o, though my simulation of Mu simulating o might be imperfect.

Mu wrongly concludes that p is a poor optimiser, and performs the optimisation it thinks p incapable of. Meanwhile, I have bet against the verdict of o and win.

But I agree with Psychohistorian's comment and am not clear what the point of all this is.

Mu represents the supernatural in this thought experiment. There is no Mu in the real world, and so the predictability of outcomes isn't antagonistically compromised.

Anyone can get hold of your source code and your evaluation function if you leave them on your hard disk. No supernatural required, just a bit of breaking and entering.