Let me see if I understand this. Basically, you're making a hypothetical in which, if you do something right, a quasi-omniscient being intervenes and makes it not work. If you do something wrong, it lets that happen. Please correct me if I'm wrong, but if this is even close to what you're saying, it seems quite pointless as a hypothetical.
Let us say I am an AI and I want to replace myself with a new program. I want to be sure that this new program will perform the tasks I want done better than I do (including creating new, better copies of itself), so I need to be able to predict how good a program is.
I don't want to have to run it and see, as I would have to run it for a very long time to tell whether the program is better than me far into the future. So I want a proof that the new program is better than me. Are such proofs possible? My argument is that they are not, unless you can constrain the environment and keep it from referencing your proof.
Make explicit what you expect from a measure of 'optimization power', and it will be easier to judge whether your criticism applies. If your measure is an expectation, e.g. the expected probability of achieving a goal under some distribution over goals and environments, then your story does not show that such a measure is unreliable overall.
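The 'expectation' reading of the measure can be sketched concretely. The sketch below estimates a program's optimization power as its expected rate of goal achievement under a distribution over environments; the names `optimization_power` and `goal_achieved` and the uniform sampling scheme are hypothetical choices for illustration, not anything from the post.

```python
import random

def optimization_power(program, environments, goal_achieved, trials=1000):
    """Estimate optimization power as the expected rate of goal
    achievement over a distribution of environments (here: uniform
    sampling from a finite list). Purely illustrative."""
    successes = 0
    for _ in range(trials):
        env = random.choice(environments)   # draw an environment
        outcome = program(env)              # run the program in it
        if goal_achieved(outcome, env):     # did it achieve its goal?
            successes += 1
    return successes / trials

# A program that always matches its environment scores 1.0;
# one that never does scores 0.0.
envs = list(range(10))
goal = lambda outcome, env: outcome == env
print(optimization_power(lambda env: env, envs, goal))   # 1.0
print(optimization_power(lambda env: -1, envs, goal))    # 0.0
```

On this reading, a single adversarial environment only drags the expectation down in proportion to its probability, which is the commenter's point.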
Check out the link to efficient optimization at the top of the post; he never said anything about running statistics when measuring intelligence.
What we want is something that will enable a program to judge the quality of another, so that it will always pick a better program when it rewrites itself, so that it can take over the galaxy. Feel free to pick holes in this notion if you want; it is not mine.
I might rewrite the post to use the notion of a strict ordering: a system is better than another if it will do no worse under all physically plausible environments and does better in some. Then Mu would allow you to set up two programs that are differently ordered and ask you to bet on their ordering.
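That strict ordering is a partial order, and a sketch makes it concrete: one program dominates another only if it scores no worse in every environment and strictly better in at least one. The function name and the score-per-environment dictionaries are assumptions for illustration.

```python
def strictly_better(scores_a, scores_b):
    """Strict ordering from the comment: program a is better than
    program b if it does no worse in every environment and strictly
    better in at least one. Scores map environment -> performance."""
    envs = scores_a.keys()
    no_worse = all(scores_a[e] >= scores_b[e] for e in envs)
    better_somewhere = any(scores_a[e] > scores_b[e] for e in envs)
    return no_worse and better_somewhere

a = {"e1": 3, "e2": 5}
b = {"e1": 3, "e2": 4}
c = {"e1": 4, "e2": 2}
print(strictly_better(a, b))  # True: a dominates b everywhere
print(strictly_better(b, a))  # False
print(strictly_better(a, c))  # False: a and c are incomparable
```

Note that some pairs (like a and c above) are incomparable under this ordering, which is exactly the room a bet about "their ordering" would need.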
You really don't want to do statistics on the first seed AI if it is possible.
Rice's theorem says nothing about the impossibility of showing properties of programs in special cases. It only shows that there are always some programs that will thwart your decision procedure. But your possible AI does not need to be one of them.
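That point about special cases can be made concrete. Rice's theorem only rules out deciding non-trivial semantic properties over all programs; restrict the class and the same kind of property can become trivially decidable. The toy straight-line language below (each instruction just adds a constant, so every program halts) is invented for illustration.

```python
def final_value(instructions):
    """Decide a semantic property (the final accumulator value) for a
    restricted program class: straight-line lists of constants to add.
    Every such program halts, so the property is decidable here even
    though it is undecidable for arbitrary programs."""
    acc = 0
    for delta in instructions:
        acc += delta  # no loops or branches: termination is guaranteed
    return acc

def outputs_positive(instructions):
    # A non-trivial semantic property, decidable within this class.
    return final_value(instructions) > 0

print(final_value([1, 2, 3]))      # 6
print(outputs_positive([5, -2]))   # True
print(outputs_positive([-5, 2]))   # False
```

An AI that only ever writes successors in a suitably constrained language could, in principle, stay inside such a decidable class.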
So where do we go? We go kicking Mu out of the picture:
For the most interesting aspect of an intelligent system is whether it is efficient in the very world it will occupy. So a reasonable test of its intelligence is to run it in a detailed simulation of said world. Of course, if your simulated world and reality differ (for example, in the existence of Mu), the measure is pretty much worthless.
However, decent simulations of our world can be built.
Also, the solution to your puzzle is to copy the world (including Mu) into a new Matrix, test whether the program reaches its goal, and use that as your efficiency measure. That way Mu cannot test the program's efficiency either, and you get a fair chance.
Optimality through obscurity?
If we are going to allow you a chance to figure out the behavior of Mu, Mu should be given the chance to find out the behavior of Eliezer (what programs you are likely to produce, etc.). Only then would information parity be preserved.
Mu is standing in for the entire world; your system is a small bit of it. It is entirely reasonable to expect the world to know more about your system than you do about the behavior of the world. I'm not sure where you are taking this idea, but it is unrealistic in my view.
I design a system p that has high optimising power, but which o is not smart enough to prove does so. Actually, I might get away with obfuscating p enough to confuse not o itself but Mu's simulation of o, though my simulation of Mu simulating o might be imperfect.
Mu wrongly concludes that p is a poor optimiser, and performs the optimisation it thinks p is incapable of. Meanwhile, I have bet against the verdict of o, and I win.
But I agree with Psychohistorian's comment and am not clear what the point of all this is.
Follow-up to Efficient Cross-domain Optimization
When I am skeptical that we will ever understand intelligence, I am skeptical that we will ever be able to reliably map a system's description onto its optimization power. This has implications for how well we will create intelligences and how good intelligences will be at self-improvement.
Obviously we can't predict the effectiveness of an arbitrary program, due to Rice's theorem and intelligence being a non-trivial property. So the best we can hope for is predicting the effectiveness of a restricted set of programs. Is such a function possible? This is my take on the subject.
Let o(p) be a function that maps a program p to its optimization power.
Mu, Omega's younger brother, has a challenge for you: you get to design a system and put it in a box with 20 red and 20 green balls. It will activate itself after 10 minutes and then have the goal of removing as many red balls from the box as possible in 10 minutes. You have to decide whether it is going to remove more or fewer than 5 red balls from the box. You get transported to a nirvana if you predict correctly, and your world gets turned into paper clips if you get it wrong.
You whip out your trusty o, make a program, evaluate it using o, and bet according to its evaluation.
Unknown to you, Mu also has a copy of your o and runs it on the systems you put in the box. Those that return a high value from the optimization power measure, it destroys before they activate; those that return a low value, it performs their goals for them. In the second case it is still p that causes the goal to be fulfilled, since if p were different, the goal would be fulfilled to a different degree. You can see it as inspiring pity in someone who would not otherwise have helped. It is still winning.
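The diagonalization at work here can be sketched in a few lines: Mu runs the same measure o and inverts whatever outcome o predicts, so any bet placed according to o loses. All names and the 20-ball payoff encoding are a hypothetical rendering of the story, not a formal model.

```python
def mu(o, p, threshold=5):
    """Mu's intervention: run your own measure o on p and invert the
    outcome. Returns the number of red balls removed (0..20)."""
    if o(p) > threshold:
        return 0    # rated highly: Mu destroys p before it activates
    else:
        return 20   # rated poorly: Mu fulfils p's goal for it

def bet_wins(o, p, threshold=5):
    """You bet 'more than threshold' exactly when o rates p highly;
    the bet wins only if that prediction matches what happens."""
    predicted_high = o(p) > threshold
    actual_high = mu(o, p, threshold) > threshold
    return predicted_high == actual_high

# Whatever o says, Mu's inversion makes the bet lose.
print(bet_wins(lambda p: 10, "clever_program"))  # False
print(bet_wins(lambda p: 0, "clumsy_program"))   # False
```

The sketch works for any o you substitute, which is the force of the argument: o's failure does not depend on o being badly designed.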
So Mu forces o to be wrong; o was not the reliable predictor of a set of programs' optimization power that we had hoped for, and we have a contradiction. Is there any way to salvage it? You could make your effectiveness measure depend upon the environment e as well, but that does not remove the potential for self-reference, since o is part of the environment. We might be able to rescue o by constraining the environment to contain no reference to o. However, we neither control the environment nor have perfect knowledge of it, so we don't know when it contains references to o, or when o is reliable.
You could try to make it so that Mu can have no impact on what p does, which is the same as trying to make the system indestructible; but with a reversible physics, what is created can be destroyed.
So where do we go from here?