Tim, when I said relative to a space, I did not mean relative to its size. This is clear in my example of a hill topography, where increasing the scale of the hill does not make it a qualitatively different problem; simply moving to higher positions still works. In fact, the whole motivation for my suggestion is the realization that the structure of that space is what limits the results of a given optimizer. So it is relative to all the properties of the space that the power of an optimizer should be defined, to begin with. I say "begin with" because many other technical difficulties remain, but I think that measures of power for optimizers that operate on different spaces do not compare meaningfully.
What I would suggest, to begin with (setting aside further technical problems), is that optimization power has to be defined relative to a given space or class of spaces (in addition to being relative to a preference ordering and a random selection).
This would make comparisons between optimizers with a common target space more meaningful. In my example above, the hill climber would be less powerful than the range climber because, given a "mountain range", the former would get stuck on a local maximum. So for both of these optimizers we would define the target space as the class of NxN topographies, and the range climber's score, as an average over that class, would be higher.
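In case it helps make the comparison concrete, here is a minimal sketch (toy code of my own, not anyone's proposed formalism): it scores a plain hill climber against a "range climber" (hill climbing with random restarts) by averaging final height over a small class of randomly generated NxN topographies. The function names and the particular topography generator are assumptions chosen for illustration.

```python
import random

def random_topography(n, peaks=3, seed=None):
    """Generate an n x n height map as a sum of a few random 'hills'."""
    rng = random.Random(seed)
    centers = [(rng.randrange(n), rng.randrange(n), rng.uniform(1, 10)) for _ in range(peaks)]
    def height(x, y):
        return sum(h / (1 + (x - cx) ** 2 + (y - cy) ** 2) for cx, cy, h in centers)
    return [[height(x, y) for y in range(n)] for x in range(n)]

def hill_climb(topo, start):
    """Greedy ascent: move to the best neighbour until no neighbour is higher."""
    n = len(topo)
    x, y = start
    while True:
        neighbours = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                      if 0 <= x + dx < n and 0 <= y + dy < n]
        best = max(neighbours, key=lambda p: topo[p[0]][p[1]])
        if topo[best[0]][best[1]] <= topo[x][y]:
            return topo[x][y]  # stuck on a (possibly local) maximum
        x, y = best

def range_climb(topo, restarts=10, rng=None):
    """'Range climber': hill climbing with random restarts, keeping the best summit."""
    rng = rng or random.Random(0)
    n = len(topo)
    return max(hill_climb(topo, (rng.randrange(n), rng.randrange(n)))
               for _ in range(restarts))

def average_score(optimizer, n=30, trials=50):
    """Average final height achieved over the class of n x n topographies."""
    return sum(optimizer(random_topography(n, seed=t)) for t in range(trials)) / trials

print("hill climber :", average_score(lambda t: hill_climb(t, (0, 0))))
print("range climber:", average_score(range_climb))
```

On a class of multi-peaked topographies the averaged score separates the two optimizers, even though on a single-hill topography they would look identical.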
I'm not sure if I am echoing another post by Shane Legg (can't remember where).
Consider a three-dimensional space (a topography) and a preference ordering given by height.
An optimizer that climbs a "hill space" seems intuitively less powerful than one that finds the highest peak in a "multi-hill space", even if, relative to a random selection and given that both spaces are the same size, both points are equally likely.
Without more information, holding the position that no AI could convince you to let it out requires a huge amount of evidence, comparable to the huge number of possible AIs, even if the space of possibilities is then restricted by a text-only interface. This logic reminds me of the discussion in logical positivism of how negative existential claims are not verifiable.
I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often. Still, it is interesting to consider whether this extra condition takes the experiment closer to, or further from, the situation it is supposed to simulate.
Those are good points, although you did add the assumption of a community of uncontrolled, widespread AIs, whereas my idea was related to building one for research as part of a specific venture (e.g. SingInst).
In any case, I have the feeling that the problem of engineering a safe, controlled environment for a specific human-level AI is much smaller than the problem of attaining Friendliness for AIs in general (including those that are 10x, 100x, 1000x, etc. more intelligent). Consider also that deciding not to build an AI does not stop everybody else from doing so; so if a human-level AI were valuable in achieving FAI, as I suggest, then it would be wise, for the very reasons you give, to take that route before the bad scenario plays out.
Ben,
Using your analogy, I was thinking more along the lines of reliably building a non-super weapon in the first place. Also, I wasn't suggesting that F would be a module, but rather that FAI (the theory) could be easier to figure out via a non-"superlative" AI, after which point you'd attempt to build the superweapon according to FAI, having had key insights into what morality is.
Imagine OpenCogPrime has reached human-level AI. Presumably you could teach it morality/moral judgements as you would a human. At that point, you could actually look inside at the AtomTable and have a concrete mathematical representation of morality. You could even trace what's going on during judgements. Try doing the same by introspecting into your own thoughts.
Ben,
The reason I was considering the idea of "throttling" is precisely to reliably set the AI at human level (i.e. equivalent to an average human) and no higher. This scenario would therefore not entail the greater-than-human-intelligence risk you are referring to, nor would it (presumably) entail the singularity as usually defined. However, the benefits of a human-level AI could be huge in terms of the ability to introspect concepts that are shrouded in the mystery associated with the "mental" (vs. non-mental, in Eliezer's terminology). If the AI is at human level, then it can learn morality, and we can then introspect and debug moral thinking that currently comes to us as a given. So, could it not be that the fastest path to FAI passes through human-level AI (one that is not powerful enough to require FAI in the first place)?
Phil,
Yes, I'm sure it would be of great use in many things, but my main suggestion is whether the best route to FAI is through human-level (but not higher) AI.
Phil,
There are really two things I'm considering. One, whether the general idea of AI throttling is meaningful and what the technical specifics could be (crude example: give it only X computing power, yielding an intelligence level Y). Two, if we could reliably build a human-level AI, it could be of great use, not in itself, but as a tool for investigation, since we could finally "look inside" at concrete realizations of mental concepts, which is not possible with our own minds. As an example, if we could teach a human-level AI morality (presumably possible, since we ourselves learn it), we would have a concrete realization of that morality as computation that could be looked at outright and even debugged. Could this not be of great value for insights into FAI?
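To make the "crude example" slightly more concrete, here is a minimal sketch, assuming the throttle is nothing more than an explicit compute budget that an anytime optimizer has to check. ComputeBudget and throttled_search are hypothetical names of my own, and of course this says nothing about whether a compute cap actually maps onto an intelligence level Y; it only illustrates the shape of the mechanism.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ComputeBudget:
    max_steps: int          # stand-in for "X compute power"
    steps_used: int = 0

    def spend(self) -> bool:
        """Return True while budget remains; the optimizer must stop otherwise."""
        if self.steps_used >= self.max_steps:
            return False
        self.steps_used += 1
        return True

def throttled_search(step: Callable[[Any], Any], state: Any, budget: ComputeBudget) -> Any:
    """Run an anytime improvement step until the budget is exhausted."""
    while budget.spend():
        state = step(state)
    return state

# Usage: a trivial 'optimizer' that just increments a number, capped at 1000 steps.
print(throttled_search(lambda s: s + 1, 0, ComputeBudget(max_steps=1000)))  # 1000
```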
Eliezer,
Have you considered in detail the idea of AGI throttling? That is: given a metric of intelligence, and assuming a correlation between existential risk and that intelligence, AGI throttling would be the explicit control of the AGI's intelligence level (or optimization power, if you like), which would indirectly also bound existential risk.
In other words, what, if any, are the methods of bounding an AGI's intelligence level? Is it possible to build an AGI and explicitly set it at human level?
Consider Phlebas is subpar Culture, and Player of Games is the perfect introductory book but still not full-power Banks. Use of Weapons, Look to Windward, Inversions... and Feersum Endjinn is my favourite non-Culture book.
More to the point, however, Look to Windward discusses some of the points you raise. I'm just going by memory here, but one of the characters, Cr. Ziller, a brilliant and famous non-human composer, asks a Mind whether it could create symphonies as beautiful as his and how hard it would be. The Mind answers that yes, it could (and we get the impression quite easily, in fact), and goes on to argue that this does not take anything away from Ziller's achievement. I don't remember the details exactly, but at one point there is an analogy with climbing a mountain when you could just take a helicopter.
From my readings I don't get the impression that there is "competing on a level playing field with superintelligences", and in fact when Banks does bring the Minds too far into the limelight, things break down (Excession).