Followup to: Efficient Cross-Domain Optimization
Shane Legg once produced a catalogue of 71 definitions of intelligence. Looking it over, you'll find that the 18 definitions in dictionaries and the 35 definitions of psychologists are mere black boxes containing human parts.
However, among the 18 definitions from AI researchers, you can find such notions as
"Intelligence measures an agent's ability to achieve goals in a wide range of environments" (Legg and Hutter)
or
"Intelligence is the ability to optimally use limited resources - including time - to achieve goals" (Kurzweil)
or even
"Intelligence is the power to rapidly find an adequate solution in what appears a priori (to observers) to be an immense search space" (Lenat and Feigenbaum)
which is about as close as you can get to my own notion of "efficient cross-domain optimization" without actually measuring optimization power in bits.
But Robin Hanson, whose AI background we're going to ignore for a moment in favor of his better-known identity as an economist, at once said:
"I think what you want is to think in terms of a production function, which describes a system's output on a particular task as a function of its various inputs and features."
Economists spend a fair amount of their time measuring things like productivity and efficiency. Might they have something to say about how to measure intelligence in generalized cognitive systems?
This is a real question, open to all economists. So I'm going to quickly go over some of the criteria-of-a-good-definition that stand behind my own proffered suggestion on intelligence, and what I see as the important challenges to a productivity-based view. It seems to me that this is an important sub-issue of Robin's and my persistent disagreement about the Singularity.
(A) One of the criteria involved in a definition of intelligence is that it ought to separate form and function. The Turing Test fails this - it says that if you can build something indistinguishable from a bird, it must definitely fly, which is true but spectacularly unuseful in building an airplane.
(B) We will also prefer quantitative measures to qualitative measures that only say "this is intelligent or not intelligent". Sure, you can define "flight" in terms of getting off the ground, but what you really need is a way to quantify aerodynamic lift and relate it to other properties of the airplane, so you can calculate how much lift is needed to get off the ground, and calculate how close you are to flying at any given point.
(C) So why not use the nicely quantified IQ test? Well, imagine if the Wright Brothers had tried to build the Wright Flyer using a notion of "flight quality" built around a Fly-Q test standardized on the abilities of the average pigeon, including various measures of wingspan and air maneuverability. We want a definition that is not parochial to humans.
(D) We have a nice system of Bayesian expected utility maximization. Why not say that any system's "intelligence" is just the average utility of the outcome it can achieve? But utility functions are invariant up to a positive affine transformation, i.e., if you add 3 to all utilities, or multiply all by 5, it's the same utility function. If we assume a fixed utility function, we would be able to compare the intelligence of the same system on different occasions - but we would like to be able to compare intelligences with different utility functions.
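To make the affine-invariance point concrete, here is a minimal sketch. The outcomes, utilities, and the two candidate actions are invented purely for illustration. It shows that rescaling every utility by "multiply by 5, add 3" leaves the preferred action unchanged, while the raw "average utility" score changes, which is why that score cannot be compared across utility functions.

```python
def expected_utility(lottery, utility):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def best_action(actions, utility):
    """Name of the action whose lottery has the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name], utility))


# Hypothetical utilities and a positive affine rescaling of them.
utility = {"lose": 0.0, "draw": 1.0, "win": 4.0}
rescaled = {outcome: 5 * u + 3 for outcome, u in utility.items()}

# Hypothetical actions, each a probability distribution over outcomes.
actions = {
    "safe": {"draw": 1.0},
    "risky": {"lose": 0.5, "win": 0.5},
}

choice_original = best_action(actions, utility)
choice_rescaled = best_action(actions, rescaled)

# Same choice, but the achieved "average utility" differs between the two scales.
score_original = expected_utility(actions[choice_original], utility)
score_rescaled = expected_utility(actions[choice_rescaled], rescaled)
print(choice_original, choice_rescaled, score_original, score_rescaled)
```

The two utility tables describe the same preferences, so any proposed intelligence measure that gives them different scores is measuring the labeling rather than the agent.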
(E) And by much the same token, we would like our definition to let us recognize intelligence by observation rather than presumption, which means we can't always start off assuming that something has a fixed utility function, or even any utility function at all. We can have a prior over probable utility functions, which assigns a very low probability to overcomplicated hypotheses like "the lottery wanted 6-39-45-46-48-36 to win on October 28th, 2008", but higher probabilities to simpler desires.
(F) Why not just measure how well the intelligence plays chess? But in real-life situations, plucking the opponent's queen off the board or shooting the opponent is not illegal, it is creative. We would like our definition to respect the creative shortcut - to not define intelligence into the box of a narrow problem domain.
(G) It would be nice if intelligence were actually measurable using some operational test, but this conflicts strongly with criteria F and D. My own definition essentially tosses this out the window - you can't actually measure optimization power on any real-world problem any more than you can compute the real-world probability update or maximize real-world expected utility. But, just as you can wisely wield algorithms that behave sorta like Bayesian updates or increase expected utility, there are all sorts of possible methods that can take a stab at measuring optimization power.
(H) And finally, when all is said and done, we should be able to recognize very high "intelligence" levels in an entity that can, oh, say, synthesize nanotechnology and build its own Dyson Sphere. Nor should we assign very high "intelligence" levels to something that couldn't build a wooden wagon (even if it wanted to, and had hands). Intelligence should not be defined too far away from that impressive thingy we humans sometimes do.
Which brings us to production functions. I think the main problems here would lie in criteria D and E.
First, a word of background: In Artificial Intelligence, it's more common to spend your days obsessing over the structure of a problem space - and when you find a good algorithm, you use that algorithm and pay however much computing power it requires. You aren't as likely to find a situation where there are five different algorithms competing to solve a problem and a sixth algorithm that has to decide where to invest a marginal unit of computing power. Not that computer scientists haven't studied this as a specialized problem. But it's ultimately not what AIfolk do all day. So I hope that we can both try to appreciate the danger of déformation professionnelle.
Robin Hanson said:
"Eliezer, even if you measure output as you propose in terms of a state space reduction factor, my main point was that simply 'dividing by the resources used' makes little sense."
I agree that "divide by resources used" is a very naive method, rather tacked-on by comparison. If one mind gets 40 bits of optimization using a trillion floating-point operations, and another mind achieves 80 bits of optimization using two trillion floating-point operations, even in the same domain using the same utility function, they may not at all be equally "well-designed" minds. One of the minds may itself be a lot more "optimized" than the other (probably the second one).
I do think that measuring the rarity of equally good solutions in the search space smooths out the discussion a lot. More than any other simple measure I can think of. You're not just presuming that 80 units are twice as good as 40 units, but trying to give some measure of how rare 80-unit solutions are in the space; if they're common it will take less "optimization power" to find them and we'll be less impressed. This likewise helps when comparing minds with different preferences.
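One way to make "rarity in the search space" concrete is sketched below. It assumes a finite space in which every candidate is equally likely under random selection, and it takes the score to be the base-2 log of one over the fraction of candidates at least as good as the one achieved. The function name and the toy spaces are hypothetical, chosen only to show that the same 80-unit result earns fewer bits when 80-unit solutions are common.

```python
import math


def optimization_bits(outcome_scores, achieved_score):
    """log2(1 / fraction of outcomes scoring at least as well as achieved_score).

    Assumes a finite space with each outcome equally likely under random search.
    """
    at_least_as_good = sum(1 for s in outcome_scores if s >= achieved_score)
    if at_least_as_good == 0:
        raise ValueError("achieved_score is not reachable in this space")
    return math.log2(len(outcome_scores) / at_least_as_good)


# Hypothetical space where an 80-unit solution is rare: 1 candidate in 1024.
rare_space = [40] * 1023 + [80]
# Hypothetical space where 80-unit solutions are common: 512 candidates in 1024.
common_space = [40] * 512 + [80] * 512

bits_rare = optimization_bits(rare_space, 80)
bits_common = optimization_bits(common_space, 80)
print(bits_rare, bits_common)  # 10.0 1.0
```

Because the measure depends only on the ranking of outcomes, it is unaffected by rescaling the utility numbers, which is what makes it usable across minds with different preferences.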
But some search spaces are just easier to search than others. I generally choose to talk about this by hiking the "optimization" metric up a meta-level: how easy is it to find an algorithm that searches this space? There's no absolute easiness, unless you talk about simple random selection, which I take as my base case. Even if a fitness gradient is smooth, which makes for a very simple search, natural selection would creep down it by incremental neighborhood search, while a human would leap through by, e.g., looking at the first and second derivatives. Which of these is the "inherent easiness" of the space?
Robin says:
"Then we can talk about partial derivatives; rates at which output increases as a function of changes in inputs or features... Yes a production function formulation may abstract from some relevant details, but it is far closer to reality than dividing by 'resources.'"
A partial derivative divides the marginal output by marginal resource. Is this so much less naive than dividing total output by total resources?
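For readers who want to see the two quantities side by side, here is a small sketch under a made-up production function with diminishing returns (output = 10 * sqrt(resources)). The function and the operating point are assumptions for illustration only. It computes total output over total resources and a numerical estimate of the derivative at the same point.

```python
import math


def output(resources):
    """Hypothetical production function with diminishing returns."""
    return 10 * math.sqrt(resources)


def marginal_product(f, x, h=1e-4):
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


resources_used = 100.0
average = output(resources_used) / resources_used   # total output / total resources
marginal = marginal_product(output, resources_used)  # extra output per extra unit
print(average, marginal)
```

Both numbers are a ratio of output to input; they differ only in whether the ratio is taken over the whole history or over the last unit, and both presuppose that the inputs and outputs are things you can meaningfully divide.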
I confess that I said "divide by resources" just to have some measure of efficiency; it's not a very good measure. Still, we need to take resources into account somehow - we don't want natural selection to look as "intelligent" as humans: human engineers, given 3.85 billion years and the opportunity to run 1e44 experiments, would produce products overwhelmingly superior to biology.
But this is really establishing an ordering based on superior performance with the same resources, not a quantitative metric. I might have to be content with a partial ordering among intelligences, rather than being able to quantify them. If so, one of the ordering characteristics will be the amount of resources used, which is what I was getting at by saying "divide by total resources".
The idiom of "division" is based around things that can be divided, that is, fungible resources. A human economy based on mass production has lots of these. In modern-day computing work, programmers use fungible resources like computing cycles and RAM, but tend to produce much less fungible outputs. Informational goods tend to be mostly non-fungible: two copies of the same file are worth around as much as one, so every worthwhile informational good is unique. If I draw on my memory to produce an essay, neither the sentences of the essay nor the items of my memory will be substitutable for one another. If I create a unique essay by drawing upon a thousand unique memories, how well have I done, and how much resource have I used?
Economists have a simple way of establishing a kind of fungibility-of-valuation between all the inputs and all the outputs of an economy: they look at market prices.
But this just palms off the problem of valuation on hedge funds. Someone has to do the valuing. A society with stupid hedge funds ends up with stupid valuations.
Steve Omohundro has pointed out that for fungible resources in an AI - and computing power is a fungible resource on modern architectures - there ought to be a resource balance principle: the marginal result of shifting a unit of resource between any two tasks should produce a decrease in expected utility, relative to the AI's probability function that determines the expectation. To the extent any of these things have continuous first derivatives, shifting an infinitesimal unit of resource between any two tasks should have no effect on expected utility. This establishes "expected utilons" as something akin to a central currency within the AI.
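The resource balance principle can be illustrated with a toy allocation problem. Everything in this sketch is assumed for the sake of the example: a budget of 100 fungible units, two tasks, and concave returns of 3*sqrt(x) and 4*sqrt(y). It finds the best split by brute force, then checks that moving one unit in either direction lowers expected utility and that the marginal returns of the two tasks are equal at the optimum.

```python
import math

BUDGET = 100  # hypothetical units of a fungible resource, e.g. compute


def expected_utility(units_for_a):
    """Assumed returns: task A yields 3*sqrt(x), task B yields 4*sqrt(BUDGET - x)."""
    units_for_b = BUDGET - units_for_a
    return 3 * math.sqrt(units_for_a) + 4 * math.sqrt(units_for_b)


# Brute-force search over whole-unit allocations to task A.
best_split = max(range(BUDGET + 1), key=expected_utility)
best_value = expected_utility(best_split)

# Shifting one whole unit either way should cost expected utility.
loss_if_moved_to_b = best_value - expected_utility(best_split - 1)
loss_if_moved_to_a = best_value - expected_utility(best_split + 1)

# Analytic marginal returns at the optimum; these should match.
marginal_a = 3 / (2 * math.sqrt(best_split))
marginal_b = 4 / (2 * math.sqrt(BUDGET - best_split))
print(best_split, loss_if_moved_to_b, loss_if_moved_to_a, marginal_a, marginal_b)
```

Note that the check only works because the sketch is handed the utility function and the returns curves up front, which is exactly the knowledge an outside observer of a mind does not start with.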
But this gets us back to the problems of criteria D and E. If I look at a mind and see a certain balance of resources, is that because the mind is really cleverly balanced, or because the mind is stupid? If a mind would rather have two units of CPU than one unit of RAM (and how can I tell this by observation, since the resources are not readily convertible?) then is that because RAM is inherently twice as valuable as CPU, or because the mind is twice as stupid in using CPU as RAM?
If you can assume the resource-balance principle, then you will find it easy to talk about the relative efficiency of alternative algorithms for use inside the AI, but this doesn't give you a good way to measure the external power of the whole AI.
Similarly, assuming a particular relative valuation of resources, as given by an external marketplace, doesn't let us ask questions like "How smart is a human economy?" Now the relative valuation a human economy assigns to internal resources can no longer be taken for granted - a more powerful system might assign very different relative values to internal resources.
I admit that dividing optimization power by "total resources" is handwaving - more a qualitative way of saying "pay attention to resources used" than anything you could actually quantify into a single useful figure. But I pose an open question to Robin (or any other economist) to explain how production theory can help us do better, bearing in mind that:
- Informational inputs and outputs tend to be non-fungible;
- I want to be able to observe the "intelligence" and "utility function" of a whole system without starting out assuming them;
- I would like to be able to compare, as much as possible, the performance of intelligences with different utility functions;
- I can't assume a priori any particular breakdown of internal tasks or "ideal" valuation of internal resources.
I would finally point out that all data about the market value of human IQ only applies to variances of intelligence within the human species. I mean, how much would you pay a chimpanzee to run your hedge fund?
I am not sure you are taking into account the possibility that an intelligence may yield optimal performance only within a specific resource range. Would a human mind given a 10x increase in memory (and memories) operate even marginally better? Or would it be overwhelmed by an amount of information it was not prepared for? Similarly, would a human mind even be able to operate given half the computational resources? Comparing mind A (40 bits on 1 trillion floating-point operations) with mind B (80 bits on 2 trillion floating-point operations) may be a matter of how many resources are available, since we don't have any data points about how much each would yield given the other's resources.
So perhaps the trendy term of scalability might be one dimension of the intelligence metric you seek. Can a mind take advantage of additional resources if they are made available? I suspect that an intelligence A that can scale up and down (to a specific minimum) linearly may be thought of as superior to an intelligence B that may yield a higher optimization output for a specific amount of resources but is unable to scale up or down.