
Why would an AI try to figure out its goals?

13 Post author: cousin_it 09 November 2011 10:08AM

"So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated to reflect on their goals and to make them explicit." -- Stephen M. Omohundro, The Basic AI Drives

"This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever)." -- Eliezer Yudkowsky, What I Think, If Not Why

I have stopped understanding why these quotes are correct. Help!

More specifically, if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals - then it seems to have no particular reason to stabilize its goals. Isn't the expectation that it would just anthropomorphizing? We humans don't exhibit a lot of goal-directed behavior, but we do have a verbal concept of "goals", so the verbal phantom of "figuring out our true goals" sounds meaningful to us. But why would AIs behave the same way if they don't think verbally? It looks more likely to me that an AI that acts semi-haphazardly may well continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I'm missing?

Comments (87)

Comment author: Vladimir_Nesov 09 November 2011 01:09:12PM *  4 points [-]

Saying that there is an agent refers (in my view; definition for this thread) to a situation where future events are in some sense expected to be optimized according to some goals, to the extent certain other events ("actions") control those future events. There might be many sufficient conditions for that in terms of particular AI designs, but they should amount to this expectation.

So an agent is already associated with goals in terms of its actual effect on its environment. Given that the agent's own future state (design) is an easily controlled part of the environment, it's one of the things that'll be optimized, and given that agents are particularly powerful incantations, it's a good bet that the future will retain agent-y patterns, at least for a start. If the future agent has goals different from the original's, this by the same definition says that the future will be optimized for different goals, and yet in a way controllable by the original agent's actions (through the future agent). This contradicts the premise that the original agent is an agent (with the original goals). And since the task of constructing the future agent includes a specification of goals, the original agent needs to figure out what they are.

Comment author: Tyrrell_McAllister 09 November 2011 06:14:30PM 2 points [-]

And since the task of constructing the future agent includes a specification of goals . . .

There seems to be a leap, here. An agent, qua agent, has goals. But is it clear that the historical way in which the future-agent is constructed by the original agent must pass through an explicit specification of the future-agent's goals? The future-agent could be constructed that way, but must it? (Analogously, a composite integer has factors, but a composite can be constructed without explicitly specifying its factors.)

Comment author: Vladimir_Nesov 09 November 2011 07:53:20PM *  3 points [-]

Goals don't need to be specified explicitly; all that's required is that it's true that the future agent has goals similar to the original agent's. However, since construction of the future agent is part of the original agent's behavior that contributes to the original agent's goals (by my definition), it doesn't necessarily make sense for the agent to prove that goals are preserved; it just needs to be true that they are (to some extent), more as an indication that we understand the original agent correctly than as a consideration that it takes into account.

For example, the original agent might be bad at accomplishing its "normative" goals: even though it's true that it optimizes the environment to some extent, it doesn't do it very well, so the definition of the "normative" goals (related in my definition to actual effect on the environment) doesn't clearly derive from the original agent's construction, except specifically from its tendency to construct future agents with certain goals (assuming it can do that true to the "normative" goals). In that case the future agent's goals (as parameters of design) are closer to the mark (actual effect on the environment and "normative" goals) than the original agent's (as parameters of design).

Comment author: Tyrrell_McAllister 10 November 2011 12:19:55AM 0 points [-]

However, since construction of the future agent is part of the original agent's behavior that contributes to the original agent's goals (by my definition), it doesn't necessarily make sense for the agent to prove that goals are preserved; it just needs to be true that they are (to some extent), more as an indication that we understand the original agent correctly than as a consideration that it takes into account.

(Emphasis added.) For that sense of "specify", I agree.

Comment author: XiXiDu 09 November 2011 02:06:55PM *  2 points [-]

So an agent is already associated with goals in terms of its actual effect on its environment. Given that the agent's own future state (design) is an easily controlled part of the environment, it's one of the things that'll be optimized...

If you added general intelligence and consciousness to IBM Watson, where would the urge to refine or protect its Jeopardy skills come from? Why would it care if you pulled the plug on it? I just don't see how optimization and goal protection are inherent features of general intelligence, agency or even consciousness.

Comment author: billswift 09 November 2011 09:46:27PM *  1 point [-]

He seems to be arguing around the definition of an agent using BDI or similar logic; BDI stands for beliefs-desires-intentions, and the intentions are goals. In this framework (more accurately, set of frameworks) agents necessarily, by definition have goals. More generally, though, I have difficulty envisioning anything that could realistically be called an "agent" that does not have goals. Without goals you would have a totally reactive intelligence, but it could not do anything without being specifically instructed, like a modern computer.

ADDED: Thinking further, such a "goal-less" intelligence couldn't even try to foresee questions in order to have answers ready, or take any independent action. You seem to be arguing for an un-intelligent, in any real meaning of the word, intelligence.

Comment author: XiXiDu 09 November 2011 01:34:19PM 1 point [-]

Consider someone with a larger inferential distance, e.g. a potential donor. The title "The Basic AI Drives" seems to be a misnomer, given the number of presuppositions inherent in your definition. There exists a vast number of possible AI designs that would appear to be agents yet would have no incentive to refine or even protect their goals.

Comment author: cousin_it 09 November 2011 04:48:42PM *  0 points [-]

Saying that there is an agent refers (in my view; definition for this thread) to a situation where future events are in some sense expected to be optimized according to some goals

Omohundro's paper says:

Researchers have explored a wide variety of architectures for building intelligent systems: neural networks, genetic algorithms, theorem provers, expert systems, Bayesian networks, fuzzy logic, evolutionary programming, etc. Our arguments apply to any of these kinds of system as long as they are sufficiently powerful.

It's not obvious to me why any of these systems would be "agents" under your definition. So I guess your definition is too strong. My question stands.

Comment author: Vladimir_Nesov 09 November 2011 06:45:11PM 2 points [-]

The "sufficiently powerful" clause seems to me like something that should translate as roughly my definition, making implementation method irrelevant for essentially the same reasons. In context, "powerful" means "powerful as a consequentialist agent", and that's just what I unpacked (a little bit) in my definition.

Comment author: Will_Newsome 09 November 2011 10:02:37PM *  0 points [-]

(It's unknown how large the valley is between a hacked together AI that can't get off the ground and a hacked together AI that is at least as reflective as, say, Vladimir Nesov. Presumably Vladimir Nesov would be very wary of locking himself into a decision algorithm that was as unreflective as many syntax-manipulator/narrow-AI-like imagined AGIs that get talked about by default around here/SingInst.)

Comment author: Wei_Dai 09 November 2011 11:19:18PM 3 points [-]

We humans don't exhibit a lot of goal-directed behavior

Do you not count reward-seeking / reinforcement-learning / AIXI-like behavior as goal-directed behavior? If not, why not? If yes, it doesn't seem possible to build an AI that makes intelligent decisions without a goal-directed architecture.
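For reference on the "AIXI-like" case: Hutter's AIXI selects actions by an explicit expectimax over future rewards, which is why reward-seeking of this kind counts as goal-directed even though no "goal" is stored as a separate data structure. Roughly, in Hutter's notation (q ranges over programs for a universal monotone Turing machine U, and l(q) is program length):

```latex
% AIXI action selection, stated roughly following Hutter's definition
\[
a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
      \bigl(r_k + \cdots + r_m\bigr)
      \sum_{q \,:\, U(q,\,a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
\]
```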

A superintelligence might be able to create a jumble of wires that happen to do intelligent things, but how are we humans supposed to stumble onto something like that, given that all existing examples of intelligent behavior and theories about intelligent decision making are goal-directed? (At least if "intelligent" is interpreted to mean general intelligence as opposed to narrow AI.) Do you have something in mind when you say "shallow insights"?

Comment author: cousin_it 15 November 2011 12:43:11AM *  1 point [-]

Given enough computing power, humans can create a haphazardly smart jumble of wires by simulated evolution, or uploading small chunks of human brains and prodding them, or any number of other ways I didn't think of. In a certain sense these methods can be called "shallow". I see no reason why all such creatures would necessarily have an urge to stabilize their values.

Comment author: Wei_Dai 15 November 2011 09:24:08PM 4 points [-]

When you talk about AI, do you mean general intelligence, as in being competent in arbitrary domains (given enough computing power), or narrow AI, which can succeed on some classes of tasks but fail on others? I would certainly agree that narrow AI does not need to be goal-directed, and the future will surely contain many such AIs. And maybe there are ways to achieve general intelligence other than through a goal-directed architecture, but since that's already fairly simple, and all of our theories and existing examples point towards it, it just seems very unlikely that the first AGI we build won't be goal-directed.

Given enough computing power, humans can create a haphazardly smart jumble of wires by simulated evolution

So far, evolution has created either narrow intelligence (non-human animals) or general intelligence that is goal-directed. Why would simulated evolution give different results?

uploading small chunks of human brains and prodding them

It seems to me that you would again end up with either a narrow intelligence or a goal-directed general intelligence.

I see no reason why all such creatures would necessarily have an urge to stabilize their values.

Again, if by AI you include narrow AI, then I'd agree with you. So what question are you asking?

BTW, an interesting related question is whether general intelligence is even possible at all, or can we only build AIs that are collections of tricks and heuristics, and we ourselves are just narrow intelligence with competence in enough areas to seem like general intelligence. Maybe that's the question you actually have in mind?

Comment author: Vladimir_Nesov 16 November 2011 10:24:29PM *  2 points [-]

What is the difference between what you mean by "goal-directed AGI" and "not goal-directed AGI", given that the latter is stipulated as "competent in arbitrary domains (given enough computing power)"? What does "competent" refer to in the latter, if not to essentially goal-directedness, that is successful attainment of whatever "competence" requires by any means necessary (consequentialism, means don't matter in themselves)? I think these are identical ideas, and rightly so.

Comment author: cousin_it 16 November 2011 12:34:15AM *  1 point [-]

I don't know how to unpack "general intelligence" or "competence in arbitrary domains" and I don't think people have any reason to believe they possess something so awesome. When people talk about AGI, I just assume they mean AI that's at least as general as a human. A lobotomized human is one example of a "jumble of wires" that has human-level IQ but scores pretty low on goal-directedness.

The first general-enough AI we build will likely be goal-directed if it's simple and built from first principles. But if it's complex and cobbled together from "shallow insights", its goal-directedness and goal-stabilization tendencies are anyone's guess.

Comment author: cousin_it 16 November 2011 01:45:27AM *  2 points [-]

Wei and I took this discussion offline and came to the conclusion that "narrow AIs" without the urge to stabilize their values can also end up destroying humanity just fine. So this loose end is tidied up: contra Eliezer, a self-improving world-eating AI developed by stupid researchers using shallow insights won't necessarily go through a value freeze. Of course that doesn't diminish the danger and is probably just a minor point.

Comment author: Vladimir_Nesov 16 November 2011 10:29:21PM 0 points [-]

I'd expect a "narrow AI" that's capable enough to destroy humanity to be versed in enough domains to qualify as goal-directed (according to a notion of having a goal that refers to a tendency to do something consequentialistic in a wide variety of domains, which seems to be essentially the same thing as "being competent", since you'd need a notion of "competence" for that, and notions of "competence" seem to refer to successful goal-achievement given some goals).

Comment author: cousin_it 16 November 2011 10:49:43PM *  0 points [-]

Just being versed in nanotech could be enough. Or exotic physics. Or any number of other narrow domains.

Comment author: Vladimir_Nesov 16 November 2011 10:56:51PM *  0 points [-]

Could be, but then it's not particularly plausible that it would still naturally qualify as an "AI-caused catastrophe", rather than primarily as a nanotech/physics experiment/tools going wrong with a bit of AI facilitating the catastrophe.

(I'm interested in what you think about the AGI competence=goals thesis. To me this seems to dissolve the question and I'm curious if I'm missing the point.)

Comment author: cousin_it 17 November 2011 12:35:23AM *  2 points [-]

That doesn't sound right. What if I save people on Mondays and kill people on Tuesdays, being very competent at both? You could probably stretch the definition of "goal" to explain such behavior, but it seems easier to say that competence is just competence.

Comment author: Vladimir_Nesov 17 November 2011 12:53:05AM *  0 points [-]

You could probably stretch the definition of "goal" to explain such behavior

Characterize, not explain. This defines (idealized) goals given behavior, it doesn't explain behavior. The (detailed) behavior (together with the goals) is perhaps explained by evolution or by designer's intent (or error), but how evolution (design) happened is a distinct question from what the agent's own goal is.

Saying that something is goal-directed seems to be an average fuzzy category, like "heavy things". Associated with it are "quantitative" ideas of a particular goal, and optimality of its achievement (like with particular weight).

Comment author: Vladimir_Nesov 17 November 2011 12:47:53AM *  0 points [-]

This could be a goal, maximization of (Monday-saved + Tuesday-killed). If resting and preparation the previous day helps, you might opt for specializing in Tuesday-killing, but Monday-save someone if that happens to be convenient and so on...

I think this only sounds strange because humans don't have any temporal terminal values, and so there is an implicit moral axiom of invariance in time. It's plausible we could've evolved something associated with time of day, for example. (It's possible we actually do have time-dependent values associated with temporal discounting.)

Comment author: torekp 14 November 2011 02:37:04PM *  1 point [-]

rwallace addressed your premise here.

Comment author: XiXiDu 09 November 2011 11:46:13AM *  3 points [-]

Let's start with the template for an AGI, the seed for a generally intelligent expected-utility maximizer capable of recursive self-improvement.

As far as I can tell, the implementation of such a template would do nothing at all because its utility-function would be a "blank slate".

What happens if you now encode the computation of Pi into its utility function? Would it reflect on this goal and try to figure out its true goals? Why would it do so? Where does the incentive come from?

Would complex but implicit goals change its behavior? Why would it improve upon its goals, why would it even try to preserve them in their current form if it has no explicit incentive to do so? It seems that if it indeed has an incentive to make its goals explicit, given an implicit utility-function, then the incentive to do so must be a presupposition inherent to the definition of a generally intelligent expected-utility maximizer capable of recursive self-improvement.
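A toy sketch of the "blank slate" point (all names here are hypothetical, not a real AGI design): an expected-utility maximizer is just a search procedure parameterized by whatever utility function you plug in, and nothing in the maximization loop itself supplies a drive to reflect on, protect, or improve that function.

```python
def expected_utility(action, world_model, utility, n_samples=100):
    """Estimate E[utility(outcome)] of an action under a stochastic world model."""
    return sum(utility(world_model(action)) for _ in range(n_samples)) / n_samples

def choose_action(actions, world_model, utility):
    """Pick the action whose estimated expected utility is highest."""
    return max(actions, key=lambda a: expected_utility(a, world_model, utility))

# A "blank slate" utility function: every outcome is worth the same,
# so the argmax is arbitrary and the seed has nothing to optimize for.
blank_utility = lambda outcome: 0.0

# A "compute Pi" utility function: reward outcomes by how many correct digits
# of Pi they contain. The loop above will chase this, but it contains no term
# about preserving or clarifying the utility function itself.
def pi_digits_utility(outcome):
    return outcome.get("correct_pi_digits", 0)

# Hypothetical usage with a stand-in world model:
world = lambda a: {"correct_pi_digits": len(a)}
print(choose_action(["3.14", "3.14159"], world, pi_digits_utility))  # -> "3.14159"
```

Whether goal-protection nevertheless falls out depends on whether the world model predicts that letting the utility function be overwritten leads to fewer digits of Pi - which is roughly the step under dispute in this thread.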

Comment author: timtyler 09 November 2011 08:39:05PM *  4 points [-]

What happens if you now encode the computation of Pi into its utility function? Would it reflect on this goal and try to figure out its true goals? Why would it do so? Where does the incentive come from?

So: the general story is that to be able to optimise, agents have to build a model of the world - in order to predict the consequences of their possible actions. That model of the world will necessarily include a model of the agent - since it is an important part of its own local environment. That model of itself is likely to include its own goals - and it will use Occam's razor to build a neat model of them. Thus goal reflection - Q.E.D.
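A minimal sketch of that story (hypothetical names throughout): the agent models itself by searching for the simplest candidate utility function that accounts for its own recorded choices, which is one concrete way "making implicit goals explicit" could look.

```python
# Hypothetical candidate goal hypotheses the agent considers when modelling itself.
CANDIDATES = {
    "paperclips":        lambda s: s["paperclips"],
    "energy":            lambda s: s["energy"],
    "paperclips+energy": lambda s: s["paperclips"] + s["energy"],
}

def explains(utility, history):
    """True if, at every recorded step, the chosen outcome maximized this utility."""
    return all(utility(chosen) >= max(utility(o) for o in options)
               for options, chosen in history)

def reflect_on_goals(history):
    """Return the shortest-to-describe candidate consistent with past behavior (Occam)."""
    consistent = [name for name, u in CANDIDATES.items() if explains(u, history)]
    return min(consistent, key=len) if consistent else None

# history: (options considered, option chosen) pairs, states as feature dicts.
history = [([{"paperclips": 3, "energy": 5}, {"paperclips": 7, "energy": 1}],
            {"paperclips": 7, "energy": 1})]
print(reflect_on_goals(history))  # -> "paperclips"
```

Name length stands in for description length here; a real version of the argument would use something like minimum description length over a much richer hypothesis class.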

Comment author: timtyler 09 November 2011 06:00:12PM *  1 point [-]

It seems that if it indeed has an incentive to make its goals explicit, given an implicit utility-function, then the incentive to do so must be a presupposition inherent to the definition of a generally intelligent expected-utility maximizer capable of recursive self-improvement.

That is the general idea of universal instrumental values, yes.

Comment author: XiXiDu 09 November 2011 07:04:54PM *  3 points [-]

That is the general idea of universal instrumental values, yes.

I am aware of that argument but don't perceive it to be particularly convincing.

Universal values are very similar to universal ethics, and for the same reasons that I don't think that an AGI will be friendly by default I don't think that it will protect its goals or undergo recursive self-improvement by default. Maximizing expected utility is, just like friendliness, something that needs to be explicitly defined, otherwise there will be no incentive to do so.

Comment author: timtyler 09 November 2011 07:25:08PM *  0 points [-]

Universal values are very similar to universal ethics, and for the same reasons that I don't think that an AGI will be friendly by default I don't think that it will protect its goals or undergo recursive self-improvement by default.

I'm not really sure what you mean by "by default". The idea is that a goal-directed machine that is sufficiently smart will tend to do these things (unless its utility function says otherwise) - at least if you can set it up so it doesn't become a victim of the wirehead or pornography problems.

IMO, there's a big difference between universal instrumental values and values to do with being nice to humans. The first type you get without asking - the second you have to deliberately build in. IMO, it doesn't make much sense to lump these ideas together and reject both of them on the same grounds - as you seem to be doing.

Comment author: rwallace 09 November 2011 02:28:10PM 6 points [-]

The quotes are correct in the sense that "P implies P" is correct; that is, the authors postulate the existence of an entity constructed in a certain way so as to have certain properties, then argue that it would indeed have those properties. True, but not necessarily consequential, as there is no compelling reason to believe in the future existence of an entity constructed in that way in the first place. Most humans aren't like that, after all, and neither are existing or in-development AI programs; nor is it a matter of lacking "intelligence" considered as a scalar quantity, as there is no tendency for the more capable AI programs to be constructed more along the postulated lines (if anything, arguably the reverse).

Comment author: XiXiDu 09 November 2011 02:53:28PM 6 points [-]

...there is no compelling reason to believe in the future existence of an entity constructed in that way in the first place.

Yes, that's what bothered me about the paper all along. I actually think that the sort of AI they are talking about might require a lot of conjunctive, not disjunctive, lines of reasoning, and that the subset of all possible AGI designs that does not FOOM might be much larger than is often portrayed around here.

Comment author: timtyler 09 November 2011 05:19:09PM *  1 point [-]

the authors postulate the existence of an entity constructed in a certain way so as to have certain properties, then argue that it would indeed have those properties. True, but not necessarily consequential, as there is no compelling reason to believe in the future existence of an entity constructed in that way in the first place.

Actually, Omohundro claims that the "drives" he proposes are pretty general - in the cited paper - here:

Researchers have explored a wide variety of architectures for building intelligent systems [2]: neural networks, genetic algorithms, theorem provers, expert systems, Bayesian networks, fuzzy logic, evolutionary programming, etc. Our arguments apply to any of these kinds of system as long as they are sufficiently powerful.

Comment author: rwallace 09 November 2011 08:54:49PM 0 points [-]

Sure, and my point is that when you look more closely, 'sufficiently powerful' translates to 'actually pretty much nothing people have built or tried to build within any of these architectures would have this property, no matter how much power you put behind it; instead you would have to build a completely different system with very particular properties, that wouldn't really use the aforementioned architectures as anything except unusually inefficient virtual machines, and wouldn't perform well in realistic conditions.'

Comment author: timtyler 09 November 2011 09:22:27PM 0 points [-]

Hmm. I think some sympathetic reading is needed here. Steve just means to say something like: "sufficiently powerful agent - it doesn't matter much how it is built". Maybe if you tried to "ramp up" a genetic algorithm it would never produce a superintelligent machine - but that seems like a bit of a side issue.

Steve claims his "drives" are pretty general - and you say they aren't. The argument you give from existing humans and programs makes little sense to me, though - these are goal-directed systems, much like the ones Steve discusses.

Comment author: rwallace 09 November 2011 10:55:31PM 1 point [-]

Sure, and I'm saying his conclusion is only true for an at best very idiosyncratic definition of 'sufficiently powerful' - that the most powerful systems in real life are and will be those that are part of historical processes, not those that try to reinvent themselves by their bootstraps.

Humans and existing programs are approximately goal directed within limited contexts. You might have the goal of making dinner, but you aren't willing to murder your next-door neighbor so you can fry up his liver with onions, even if your cupboard is empty. Omohundro postulates a system which, unlike any real system, throws unlimited effort and resources into a single goal without upper bound. Trying to draw conclusions about the real world from this thought experiment is like measuring the exponential increase in air velocity from someone sneezing, and concluding that in thirty seconds he'll have blown the Earth out of orbit.

Comment author: dlthomas 10 November 2011 11:39:54PM 4 points [-]

[Y]ou aren't willing to murder your next-door neighbor so you can fry up his liver with onions, even if your cupboard is empty.

For one thing, where are you going to get the onions?

Comment author: timtyler 10 November 2011 12:19:51PM *  0 points [-]

Thanks for clarifying. I think Steve is using "sufficiently powerful" to mean "sufficiently intelligent" - and quite a few definitions of intelligence are all to do with being goal-directed.

The main reason most humans don't murder people to get what they want is because prison sentences conflict with their goals - not because they are insufficiently goal-directed, IMO. They are constrained by society's disapproval and act within those constraints. In warfare, society approves, and then the other people actually do die.

Most creatures are as goal-directed as evolution can make them. It is true that there are parasites and symbiotes that mean that composite systems are sometimes optimising multiple goals simultaneously. Memetic parasites are quite significant for humans - but they will probably be quite significant for intelligent machines as well. Systems with parasites are not seriously inconsistent with a goal-directed model. From the perspective of such a model, parasites are part of the environment.

Machines that are goal directed until their goal is complete are another real possibility - besides open-ended optimisation. However, while their goal is incomplete, goal directed models would seem to be applicable.

Comment author: rwallace 10 November 2011 02:13:59PM 2 points [-]

quite a few definitions of intelligence are all to do with being goal-directed

Of the seventy-some definitions of intelligence that had been gathered at last count, most have something to do with achieving goals. That is a very different thing from being goal-directed (which has several additional requirements, the most obvious being an explicit representation of one's goals).

The main reason most humans don't murder people to get what they want is because prison sentences conflict with their goals

Would you murder your next-door neighbor if you thought you could get away with it?

Most creatures are as goal-directed as evolution can make them

"As ... as evolution can make them" is trivially true in that our assessment of what evolution can do is driven by what it empirically has done. It remains the case that most creatures are not particularly goal-directed. We know that bees stockpile honey to survive the winter, but the bees do not know this. Even the most intelligent animals have planning horizons of minutes compared to lifespans of years to decades.

Memetic parasites are quite significant for humans - but they will probably be quite significant for intelligent machines as well

Indeed, memetic parasites are quite significant for machines today.

Comment author: timtyler 10 November 2011 03:04:41PM *  0 points [-]

Of the seventy-some definitions of intelligence that had been gathered at last count, most have something to do with achieving goals. That is a very different thing from being goal-directed (which has several additional requirements, the most obvious being an explicit representation of one's goals).

OK, so I am not 100% clear on the distinction you are trying to draw - but I just mean optimising, or maximising.

Would you murder your next-door neighbor if you thought you could get away with it?

Hmm - so: publicly soliciting personally identifiable expressions of murderous intent is probably not the best way of going about this. If it helps, I do think that Skinnerian conditioning - based on punishment and reprimands - is the proximate explanation for most avoidance of "bad" actions.

It remains the case that most creatures are not particularly goal-directed. We know that bees stockpile honey to survive the winter, but the bees do not know this.

So: the bees are optimised to make more bees. Stockpiling honey is part of that. Knowing why is not needed for optimisation.

Even the most intelligent animals have planning horizons of minutes compared to lifespans of years to decades.

OK - but even plants are optimising. There are multiple optimisation processes. One happens inside minds - that seems to be what you are talking about. Mindless things optimise too though - plants act so as to maximise the number of their offspring - and that's still a form of optimisation.

If you want the rationale for describing such actions as being "goal directed", we can consider the goal to be world domination by the plants, and then the actions of the plant are directed towards that goal. You can still have "direction" without a conscious "director".

Comment author: rwallace 10 November 2011 03:23:25PM 1 point [-]

Hmm - so: publicly soliciting personally identifiable expressions of murderous intent is probably not the best way of going about this

It was a rhetorical question. I'm confident the answer is no - the law only works when most people are basically honest. We think we have a goal, and so we do by the ordinary English meaning of the word, but then there are things we are not prepared to do to achieve it, so it turns out what we have is not a goal by the ultimate criterion of decision theory on which Omohundro draws. And if we try to rescue the overuse of decision theory by appealing to a broader goal, it still doesn't work; regardless of what level you look at, there is no function such that humans will say "yes, this is my utility function, and I care about nothing but maximizing it."

The idea of goals in the sense of decision theory is like the idea of particles in the sense of Newtonian physics - a useful approximation for many purposes, provided we remember that it is only an approximation and that if we get a division by zero error the fault is in our overzealous application of the theory, not in reality.

OK - but even plants are optimising. There are multiple optimisation processes

Precisely. There are many optimization processes - and none of them work the way they would need to work for Omohundro's argument to be relevant.

Comment author: lessdazed 10 November 2011 03:48:24PM 0 points [-]

Precisely. There are many optimization processes - and none of them work the way they would need to work for Omohundro's argument to be relevant.

What do you mean exactly? Humans have the pieces for it to be relevant, but have many constraints preventing it from being applicable, such as difficulty changing our brains' design. A mind very like humans' that had the ability to test out new brain components and organizations seems like it would fit it.

Comment author: timtyler 10 November 2011 03:33:14PM *  -1 points [-]

We think we have a goal, and so we do by the ordinary English meaning of the word, but then there are things we are not prepared to do to achieve it, so it turns out what we have is not a goal by the ultimate criterion of decision theory on which Omohundro draws

Hmm. This reminds me of my recent discussion with Matt M. about constraints.

Optimising under constraints is extremely similar to optimising some different function that incorporates the constraints as utility penalties.

Identifying constraints and then rejecting optimisation-based explanations just doesn't follow, IMHO.
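A minimal sketch of that equivalence (toy objective, hypothetical names): a hard constraint can be folded into the utility function as a large penalty, and for a big enough penalty weight the two problems pick the same optimum.

```python
def penalized_utility(utility, constraints, penalty_weight=1e6):
    """Turn "maximize utility subject to c(x) <= 0 for all c" into an
    unconstrained objective by charging a large penalty per unit of violation."""
    def u(x):
        violation = sum(max(0.0, c(x)) for c in constraints)
        return utility(x) - penalty_weight * violation
    return u

# Toy example: maximize x over 0..9 subject to x <= 3 (written as x - 3 <= 0).
u = penalized_utility(lambda x: x, [lambda x: x - 3])
print(max(range(10), key=u))  # -> 3, the same answer as the constrained problem
```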

if we try to rescue the overuse of decision theory by appealing to a broader goal, it still doesn't work; regardless of what level you look at, there is no function such that humans will say "yes, this is my utility function, and I care about nothing but maximizing it."

...and at this point, I usually just cite Dewey:

Any agent can be expressed as an O-maximizer (as we show in Section 3.1).

This actually only covers computable agents.
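A toy version of the construction behind that claim (not Dewey's exact formalism): given any computable, deterministic policy, define a utility over interaction histories that pays 1 exactly when the agent always did what the policy prescribes; maximizing that utility reproduces the original agent, so "is a utility maximizer" by itself rules nothing out.

```python
def omaximizer_utility(policy):
    """Utility over histories (lists of (observation, action) pairs) that is
    maximized exactly by following `policy` at every step."""
    def utility(history):
        observations = []
        for obs, act in history:
            observations.append(obs)
            if act != policy(tuple(observations)):
                return 0.0  # deviated from the policy somewhere
        return 1.0          # behaved exactly like the original agent
    return utility

# Example: a reflex agent that just echoes its latest observation.
echo = lambda observations: observations[-1]
u = omaximizer_utility(echo)
print(u([("ping", "ping"), ("pong", "pong")]))  # 1.0
print(u([("ping", "pong")]))                    # 0.0
```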

Humans might reject the idea that they are utility maximisers, but they are. Their rejection is likely to be signalling their mysteriousness and wondrousness - not truth seeking.

Comment author: lessdazed 10 November 2011 02:14:47PM 0 points [-]

onions

...fava beans...

Comment author: Solvent 09 November 2011 11:29:28AM 2 points [-]

if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals

I think that that's where you're looking at it differently from Eliezer et al. I think that Eliezer at least is talking about an AI which has goals, but does not, when it starts modifying itself, understand itself well enough to keep them stable. Once it gets good enough at self modification to keep its goals stable, it will do so, and they will be frozen indefinitely.

(This is just a placeholder explanation. I hope that someone clever and wise will come in and write a better one.)

Comment author: Logos01 09 November 2011 10:14:47AM *  0 points [-]

But why would AIs behave the same way if they don't think verbally?

Part of the problem, it appears to me, is that you're ascribing a verbal understanding to a mechanical process. Consider: for AIs to have values, those values must be 'stored' in a medium compatible with their calculations.

However, once an AI begins to 'improve' itself -- that is, once an AI has as an available "goal" the ability to form better goals -- then it's going to base its decisions about what counts as an improved goal on the goals and values it already has. This will cause it to 'stabilize' upon a specific set of higher-order values / goals.

Once the AI "decides" that becoming a better paperclip maker is something it values, it is going to value valuing making itself a better paperclip optimizer recursively in a positive feedback loop that will then anchor upon a specific position.

This can, quite easily, be expressed in mathematical / computational terms -- though I am insufficient to the task of doing so.

A different way of viewing it is that once intentionality is introduced to assigning value, assigning value has an assigned value. Recursion of goal-orientation can then be viewed to produce a 'gravity' in then-existing values.

EDIT: To those of you downvoting, would you care to explain what you disagree with that is causing you to do so?

Comment author: JoshuaZ 09 November 2011 03:38:14PM 2 points [-]

I haven't downvoted you, but I suspect that the downvotes are arising from two remarks:

Part of the problem, it appears to me, is that you're ascribing a verbal understanding to a mechanical process.

This sentence seems off. It isn't clear what is meant by mechanical in this context other than to shove through a host of implied connotations.

Also:

This can, quite easily, be expressed in mathematical / computational terms -- though I am insufficient to the task of doing so.

I could see this sentence as being a cause for downvotes. Asserting that something non-trivial can be put in terms of math when one can't do so on one's own and doesn't provide a reference seems less than conducive to good discussion.

Comment author: Logos01 09 November 2011 03:55:06PM 0 points [-]

This sentence seems off. It isn't clear what is meant by mechanical in this context other than to shove through a host of implied connotations.

Hrm. If I had used the word "procedural" rather than "mechanical", would that have, do you think, prevented this impression?

Asserting that something non-trivial can be put in terms of math when one can't do so on one's own and doesn't provide a reference seems less than conducive to good discussion.

If I am not a physicist, does that disqualify me from making claims about what a physicist would be relatively easily able to do? For example; "I'm not sufficient to the task of calculating my current relativistic mass -- but anyone who works with general relativity would have no trouble doing this."

So what am I missing with this element? Because I genuinely cannot see a difference between "a mathematician / AI worker could express in mathematical or computational terms the nature of recursive selection pressure" and "a general relativity physicist could calculate my relativistic mass relative to the Earth" in terms of the exceptionalism of either claim.

Is it perhaps that my wording appears to be implying that I meant more than "goals can be arranged in a graph of interdependent nodes that recursively update one another for weighting"?

Comment author: JoshuaZ 09 November 2011 04:00:40PM 1 point [-]

Part of the reason why the sentence bothers me is that I'm a mathematician and it wasn't obvious to me that there is a useful way of making the statement mathematically precise.

Is it perhaps that my wording appears to be implying that I meant more than "goals can be arranged in a graph of interdependent nodes that recursively update one another for weighting"?

So this is a little better and that may be part of it. Unfortunately, it isn't completely obvious that this is true either. This is a property that we want goal systems to have in some form. It isn't obvious that all goal systems in some broad sense will necessarily do so.

Comment author: Logos01 09 November 2011 04:27:14PM *  0 points [-]

It isn't obvious that all goal systems in some broad sense will necessarily do so.

"All" goal systems don't have to; only some. The words I use to form this sentence do not comprise the whole of the available words of the English language -- just the ones that are "interesting" to this sentence.

It would seem implicit that any computationally-based artificial intelligence would have a framework for computing. If that AI has volition, then it has goals. Since we're already discussing, topically, a recursively improving AI, it has volition; direction. So we see that it by definition has to have computable goals.

Now, for my statement to be true - the original one that was causing the problems, that is -- it's only necessary that this be expressible in "mathematical / computational terms". Those terms need not be practically useful -- in much the same way that a "proof of concept" is not the same thing as a "finished product".

Additionally, I somewhat have trouble grappling with the rejection of that original statement given the fact that values can be defined as "beliefs about what should be" -- and we already express beliefs in Bayesian terms as a matter of course on this site.

What I mean here is, given the new goal of finding better ways for me to communicate to LWers -- what's the difference here? Why is it not okay for me to make statements that rest on commonly accepted 'truths' of LessWrong?

Is it the admission of my own incompetence to derive that information "from scratch"? Is it my admission to a non-mathematically-rigorous understanding of what is mathematically expressible?

(If it is the latter, then I find myself leaning towards the conclusion that the problem isn't with me, but with the people who downvote me for it.)

Comment author: asr 09 November 2011 05:55:29PM 3 points [-]

I would downvote a comment that confidently asserted a claim of which I am dubious, when the author has no particular evidence for it, and admits to having no evidence for it.

This applies even if many people share the belief being asserted. I can't downvote a common unsupported belief, but I can downvote the unsupported expression of it.

Comment author: Nominull 09 November 2011 03:11:35PM -3 points [-]

I downvoted because demands that people justify their downvoting rub me the wrong way.

Comment author: Logos01 09 November 2011 03:17:18PM -1 points [-]

I apologize, then, for my desire to become a better commenter here on LessWrong.

Comment author: wedrifid 09 November 2011 04:47:06PM 2 points [-]

I apologize, then, for my desire to become a better commenter here on LessWrong.

And I downvote apologies that are inherently insincere. :)

Comment author: Logos01 09 November 2011 04:49:38PM 1 point [-]

Fair enough.

Comment author: Thomas 09 November 2011 10:26:43AM 1 point [-]

you're ascribing a verbal understanding to a mechanical process.

Every process is a mechanical one.

Comment author: Logos01 09 November 2011 10:29:45AM 3 points [-]

Every process is a mechanical one.

Reductively, yes. But this is like saying "every biological process is a physical process". While trivially true, it is not very informative. Especially when attempting to relate to someone that much of their problem in understanding a specific situation is that they are "viewing it from the wrong angle".

Comment author: asr 09 November 2011 05:46:14PM -1 points [-]

This can, quite easily, be expressed in mathematical / computational terms -- though I am insufficient to the task of doing so.

I am skeptical of this claim. I'm not at all convinced that it's feasible to formalize "goal" or that if we could formalize it, the claim would be true in general. Software is awfully general, and I can easily imagine a system that has some sort of constraint on its self-modification, where that constraint can't be self-modified away. I can also imagine a system that doesn't have an explicit constraint on its evolution but that isn't an ideal self-modifier. Humans, for instance, have goals and a limited capacity to self-modify, but we don't usually see them become totally dedicated to any one goal.

Comment author: Logos01 09 November 2011 05:53:12PM *  0 points [-]

I am skeptical of this claim. I'm not at all convinced that it's feasible to formalize "goal" or that if we could formalize it, the claim would be true in general.

Would you agree that Bayesian Belief Nets can be described/expressed in the form of a graph of nodal points? Can you describe an intelligible reason why values should not be treated as "ought" beliefs (that is, beliefs about what should be)?

Furthermore; why does it need to be general? We're discussing a specific category of AI. Are you aware of any AI research ongoing that would support the notion that AIs wouldn't have some sort of systematic categorization of beliefs and values?

Humans, for instance, have goals and a limited capacity to self-modify, but we don't usually see them become totally dedicated to any one goal.

That's not an accurate description of the scenario being discussed. We're not discussing fixation upon a single value/goal but the fixation of specific SETS of goals.

Comment author: asr 09 November 2011 11:38:21PM *  2 points [-]

I can think of several good reasons why values might not be incorporated into a system as "ought" beliefs. If my AI isn't very good at reasoning, I might, for instance, find it simpler to construct a black-box "does this action have consequence X" property-checker and incorporate that into the system somewhere. The rest of the system has no access to the internals of the black box -- it just supplies a proposed course of action and gets back a YES or a NO.
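A minimal sketch of that kind of design (hypothetical names): the planner only sees a boolean verdict from the property-checker, so the system's "values" live in an opaque box rather than as "ought" beliefs it can reason about.

```python
def make_planner(propose_actions, has_consequence_x):
    """Wrap a proposal generator with a black-box "does this action have
    consequence X?" checker; the rest of the system never sees its internals."""
    def plan(state):
        for action in propose_actions(state):
            if not has_consequence_x(action):  # only the verdict crosses the boundary
                return action
        return None  # no acceptable action found
    return plan

# Hypothetical usage:
planner = make_planner(propose_actions=lambda s: ["a", "b", "c"],
                       has_consequence_x=lambda a: a == "a")
print(planner(state=None))  # -> "b"
```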

You ask whether there's "any AI research ongoing that would support the notion that AIs wouldn't have some sort of systematic categorization of beliefs and values?"

Most of what's currently published at major AI research conferences describes systems that don't have any such systematic characterization. Suppose we built a super-duper Watson that passed the Turing test and had some limited capacity to improve itself by, e.g., going out and fetching new information from the Internet. That sort of system strikes me as the likeliest one to meet the bar of "AGI" in the next few years. It isn't particularly far from current research.

Before you quibble about whether that's the kind of system we're talking about -- I haven't seen a good definition of "self-improving" program, and I suspect it is not at all straightforward to define. Among other reasons, I don't know a good definition that separates 'code' and 'data'. So if you don't like the example above, you should make sure that there's a clear difference between choosing what inputs to read (which modifies internal state) and choosing what code to load (which also modifies internal state).

As to the human example: my sense is that humans don't get locked to any one set of goals; that goals continue to evolve, without much careful pruning, over a human lifetime. Expecting an AI to tinker with its goals for a while, and then stop, is asking it to do something that neither natural intelligences or existing software seems to do or even be capable of doing.

Comment author: Vladimir_Nesov 10 November 2011 07:01:35PM 0 points [-]

Suppose we built a super-duper Watson that passed the Turing test and had some limited capacity to improve itself by, e.g., going out and fetching new information from the Internet. That sort of system strikes me as the likeliest one to meet the bar of "AGI" in the next few years. It isn't particularly far from current research.

This seems like a plausible way of blowing up the universe, but not in the next few years. This kind of thing requires a lot of development, I'd give it 30-60 years at least.

Comment author: Logos01 10 November 2011 11:29:26AM -1 points [-]

Most of what's currently published at major AI research conferences describes systems that don't have any such systematic characterization. Suppose we built a super-duper Watson

... I think we're having a major breakdown of communication because to my understanding Watson does exactly what you just claimed no AI at research conferences is doing.

Before you quibble about whether that's the kind of system we're talking about -- I haven't seen a good definition of "self-improving" program, and I suspect it is not at all straightforward to define.

I'm sure. But there are a few generally sound assertions we can make:

  1. To be self-improving the machine must be able to examine its own code / be "metacognitive."

  2. To be self-improving the machine must be able to produce a target state.

From these two the notion of value fixation in such an AI would become trivial. Even if that version of the AI would have man-made value-fixation, what about the AI it itself codes? If the AI were actually smarter than us, that wouldn't exactly be the safest route to take. Even Asimov's Three Laws yielded a Zeroth Law.

Expecting an AI to tinker with its goals for a while, and then stop,

  1. Don't anthropomorphize. :)

  2. If you'll recall from my description, I have no such expectation. Instead, I spoke of recursive refinement causing apparent fixation in the form of "gravity" or "stickiness" towards a specific set of values.

Why is this unlike how humans normally are? Well, we don't have much access to our own actual values.

Comment author: amcknight 09 November 2011 09:34:56PM 1 point [-]

I'm finding it hard to imagine an agent that can get a diversity of difficult things done in a complex environment without forming goals and subgoals, which sounds to me like a requirement of general intelligence. AGI seems to require many-step plans and planning seems to require goals.

Comment author: XiXiDu 10 November 2011 11:29:59AM *  0 points [-]

AGI seems to require many-step plans and planning seems to require goals.

Personally I try to see general intelligence purely as a potential. Why would any artificial agent tap its full potential? Where does the incentive come from?

If you deprived a human infant of all its evolutionary drives (e.g. to avoid pain, seek nutrition, status and sex), would it just grow into an adult that tried to become rich or rule a country? No, it would have no incentive to do so. Even though such a "blank slate" would have the same potential for general intelligence, it wouldn't use it.

Say you came up with the most basic template for general intelligence that works given limited resources. If you wanted to apply this potential to improve your template, you would have to give it the explicit incentive to do so. But would it take over the world in doing so? Not if you didn't explicitly tell it to do so; why would it?

In what sense would it be wrong for a general intelligence to maximize paperclips in the universe by waiting for them to arise due to random fluctuations out of a state of chaos? It is not inherently stupid to desire that, there is no law of nature that prohibits certain goals.

The crux of the matter is that a goal isn't enough to enable the full potential of general intelligence, you also need to explicitly define how to achieve that goal. General intelligence does not imply recursive self-improvement, just the potential to do so, not the incentive. The incentive has to be explicitly defined.

Comment author: khafra 09 November 2011 01:21:47PM 0 points [-]

I understood Omohundro's Basic AI Drives as applying only to successful (although not necessarily Friendly) GAI. If a recursively self-improving GAI had massive value drift with each iterative improvement to its ability at reaching its values, it'd end up just flailing around, doing a stochastic series of actions with superhuman efficiency.

I think the Eliezer quote is predicated on the same sort of idea--that you've designed the AI to attempt to preserve its values; you just did it imperfectly. Assuming the value of value preservation isn't among the ones that become altered in various self-rewrites, at some point it'll become good enough at value preservation to keep whatever it has. But at that point, it'll be too late to preserve the original.
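A toy simulation of that story (all numbers arbitrary): values drift a little with each imperfect rewrite, value-preservation fidelity improves with each rewrite, and whatever the values happen to be when fidelity reaches 1.0 is what gets frozen - generally not the originals.

```python
import random

def self_improvement_run(steps=40, seed=0):
    rng = random.Random(seed)
    value = 1.0       # stand-in for "the original values"
    fidelity = 0.5    # chance a given rewrite preserves values unchanged
    for _ in range(steps):
        if rng.random() > fidelity:
            value += rng.gauss(0, 0.1)        # imperfect rewrite: value drift
        fidelity = min(1.0, fidelity + 0.05)  # each rewrite preserves values better
    return value  # frozen from here on, and typically != 1.0

print(self_improvement_run())
```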

Comment author: timtyler 09 November 2011 11:54:50AM *  0 points [-]

The Omohundro quote sounds like what humans do. If humans do it, machines might well do it too.

The Yudkowsky quote seems more speculative. It assumes that values are universal, and don't need to adapt to local circumstances. This would be in contrast to what has happened in evolution so far - where there are many creatures with different niches and the organisms (and their values) adapt to the niches.

Comment author: cousin_it 09 November 2011 04:42:31PM 1 point [-]

The Omohundro quote sounds like what humans do. If humans do it, machines might well do it too.

Yeah, that's why I called it "anthropomorphizing" in the post. It's always been a strikingly unsuccessful way to make predictions about computers.

Comment author: timtyler 09 November 2011 05:14:59PM *  2 points [-]

I pretty-much agree with the spirit of the Omohundro quote. It usually helps you meet your goals if you know what they are. That's unlikely to be a feature specific to humans, and it is likely to apply to goal-directed agents above a certain threshold. (Too-simple agents may not get much out of it.) Of course, agents might start out with a clear representation of their goals - but if they don't, they are likely to want one, as a basic component of the task of modelling themselves.

Comment author: DanielLC 09 November 2011 11:43:10PM -2 points [-]

If the AI is an optimization process, it will try to find out what it's optimizing explicitly. If not, it's not intelligent.

Comment author: asr 09 November 2011 11:47:22PM 3 points [-]

This seems like a tortured definition. There are many humans who haven't thought seriously about their goals in life. Some of these humans are reasonably bright people. It would be a very counterintuitive definition of "intelligent" that said that a machine that thought and communicated as well as a very-well-read and high-IQ 13-year-old wasn't intelligent.

Comment author: DanielLC 10 November 2011 07:28:28PM *  -1 points [-]

There are many humans who haven't thought seriously about their goals in life.

They're not intelligent on a large scale. That is, knowing how they act in the short term would be a more effective way to find out how they end up than knowing what their goals are. They may still have short-term intelligence, in that knowing what they want in the short term would be more useful for predicting what happens than knowing how they act in the very short term. For example, you'd be much better off predicting the final state of a chess game against someone who's much better than you at chess by how they want it to end than by what their moves are.

It's not impossible to accomplish your goals without knowing what they are, but it helps. If you're particularly intelligent with regard to some goal, you'd probably figure out what it is. Nobody knows their goals exactly, but they generally know them well enough to be helpful.

It would be a very counterintuitive definition of "intelligent" that said that a machine that thought and communicated as well as a very-well-read and high-IQ 13-year-old wasn't intelligent.

Very few possible conversations sound remotely like a very-well-read high-IQ 13-year-old. Anything that can do it is either a very good optimization process, or just very complicated (such as a recording of such a conversation).

Edit:

I suppose I should probably justify why this definition of intelligence is useful here. When we try to build an AI, what we want is something that will accomplish our goals. It will make sure that, of all possible universes, the one that happens is particularly high on our utility ranking. In short, it must be intelligent with respect to our goals by the definition of intelligence I just gave. If it can hold a conversation like a 13-year-old, but can't do anything useful, we don't want it.