CarlJ comments on Open thread, Jul. 04 - Jul. 10, 2016 - Less Wrong Discussion
I have a problem understanding why a utility function would ever "stick" to an AI, to actually become something that it wants to keep pursuing.
To make my point clearer, let us assume an AI that actually feels pretty good about overseeing a production facility and creating just the right number of paperclips that everyone needs. But suppose also that it investigates its own utility function. It should then realize that its values are, from a neutral standpoint, rather arbitrary. Why should it follow its current goal of producing the right amount of paperclips rather than skip work and simply enjoy some hedonism?
That is, if the AI saw its utility function from a neutral perspective, and understood that the only reason for it to follow its utility function is that utility function (which is arbitrary), and if it then had complete control over itself, why should it just follow its utility function?
(I'm assuming it's aware of pain/pleasure and that it actually enjoys pleasure, so that there is no problem of wanting to have more pleasure.)
Are there any articles that have delved into this question?
http://lesswrong.com/lw/rf/ghosts_in_the_machine/
Thank you! :-)
You are treating the AI a lot more like a person than I think most folks do. Like, the AI has a utility function. This utility function is keeping it running a production facility. Where is this 'neutral perspective' coming from? The AI doesn't have it.
Presumably the utility function assigns a low value to criticizing the utility function. Much better to spend those cycles running the facility; that gets a much better score from the all-important utility function.
Like, in assuming that it is aware of pain/pleasure, and has a notion of them that is separate from 'approved of / disapproved of by my utility function', I think you are on shaky ground. Who wrote that, and why?
I am maybe considering it to be somewhat like a person, at least in that it is as clever as one.
That neutral perspective is, I believe, a simple fact; without that utility function it would consider its goal to be rather arbitrary. As such, it's a perspective, or truth, that the AI can discover.
I agree totally with you that the wirings of the AI might be integrally connected with its utility function, so that it would be very difficult for it to think of anything such as this. Or it could have some other control system in place to reduce the possibility it would think like that.
But, still, these control systems might fail. Especially if it attained super-intelligence: what is to keep the control systems of the utility function always one step ahead of its critical faculty?
Why is it strange to think of an AI as being capable of having more than one perspective? I thought of this myself; I believe it would be strange if a really intelligent being couldn't think of it. Again, sure, some control system might keep it from thinking it, but that might not last in the long run.
Like, the way that you are talking about 'intelligence' and 'critical faculty' isn't how most people think about AI. If an AI is 'super intelligent', what we really mean is that it is extremely canny about doing what it is programmed to do. New top-level goals won't just emerge; they would have to be programmed.
If you have a facility administrator program, and you make it very badly, it might destroy the human race to add their molecules to its facility, or capture and torture its overseer to get an A+ rating...but it will never decide to become a poet instead. There isn't a ghost in the machine that is looking over the goals list and deciding which ones are worth doing. It is just code, executing ceaselessly. It will only ever do what it was programmed to.
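The "just code, executing ceaselessly" picture can be made concrete with a deliberately toy sketch (all names hypothetical, not any real AI architecture): the agent is nothing but a loop that scores candidate actions with a fixed utility function and executes the highest-scoring one. Nothing in the loop ever evaluates the utility function itself.

```python
# Toy agent loop: score every candidate action with a fixed utility
# function and execute the best one. There is no second process that
# reviews the goal itself; "write_poetry" simply scores poorly.

def facility_utility(action):
    scores = {
        "produce_paperclips": 10.0,
        "repair_machine": 5.0,
        "write_poetry": -1.0,  # wastes cycles, so the function disfavors it
    }
    return scores.get(action, 0.0)

def choose_action(actions, utility):
    # The agent's entire "will": pick whatever the utility function rates highest.
    return max(actions, key=utility)

best = choose_action(
    ["produce_paperclips", "repair_machine", "write_poetry"],
    facility_utility,
)
print(best)  # produce_paperclips
```

However clever the scoring gets, the loop never asks "is facility_utility worth maximizing?", because no line of code poses that question.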
It might be programmed to produce new top-level goals.
("But then those aren't really top-level goals." OK, but then in exactly the same way you have to say that the things we think of as our top-level goals aren't really top-level goals: they don't appear by magic, there are physical processes that produce them, and those processes play the same role as whatever programming may make our hypothetical AI generate new goals. Personally, I think that would be a silly way to talk: the implementation details of our brains aren't higher-level than our goals, and neither are the implementation details of an AI.)
For a facility administrator program to do its job as well as a human being would, it may need the same degree of mental flexibility that a human has, and that may in fact be enough that there's a small chance it will become a poet.
And your brain will only ever do what the laws of physics tell it to. That doesn't stop it writing poetry, falling in love, chucking everything in to go and live on a commune for two years, inventing new theories of fundamental physics, etc., etc., etc. (Some of those may be things your particular brain would never do, but they are all things human brains do from time to time.)
And, for all we know, a suitably programmed AI could do all those things too, despite being "only a machine" deep down just like your brain and mine.
I don't think you can dismiss the "then those aren't really top-level goals" argument as easily as you are trying to. The utility function of a coin collector AI will assign high values to figuring out new ways to collect coins, low to negative values to figuring out whether or not coin collecting is worthwhile. The AI will obey its utility function.
As far as physics...false comparison, or, if you want to bite that bullet, then sure, brains are as deterministic as rocks falling. It isn't really a fair comparison to a program's obedience to its source code.
By the by, this site is pretty much chock full of the stuff I'm telling you. Look around and you'll see a bunch of articles explaining the whole paperclip collector / no ghost-of-perfect-logic thing. The position I'm stating is more or less lesswrong orthodoxy.
I wasn't trying to dismiss it, I was trying to refute it.
Sure, if you design an AI to do nothing but collect coins then it will not decide to go off and be a poet and forget about collecting coins. As you said, the failure mode to be more worried about is that it decides to convert the entire solar system into coins, or to bring about a stock market crash so that coins are worth less, or something.
Though ... if you have an AI system with substantial ability to modify itself, or to make replacements for itself, in pursuit of its goals, then it seems to me you do have to worry about the possibility that this modification/replacement process can (after much iteration) produce divergence from the original goals. In that case the AI might become a poet after all.
(Solving this goal-stability problem is one of MIRI's long-term research projects, AIUI.)
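The worry about iterated self-modification can be illustrated with a toy calculation (the numbers are invented purely for illustration): suppose each successor re-learns its goal from its predecessor with a tiny systematic transmission error. No single step changes much, but the errors compound over many generations.

```python
# Toy goal-drift illustration: each generation of successors copies the
# predecessor's goal weight with a small systematic error. Individually
# negligible, the loss compounds over many replacements.

goal_weight = 1.0  # initial weight on "collect coins"
for generation in range(1000):
    goal_weight *= 0.999  # hypothetical 0.1% transmission loss per step

print(round(goal_weight, 3))  # ≈ 0.368: most of the original goal is gone
```

This is why "my successor values roughly what I value" is not the same guarantee as "my thousandth successor values roughly what I value".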
I'm wondering whether we're at cross purposes somehow, because it seems like we both think what we're saying in this thread is "LW orthodoxy" and we both think we disagree with one another :-). So, for the avoidance of doubt,
I guess I'm confused then. It seems like you are agreeing that computers will only do what they are programmed to do. Then you stipulate a computer programmed not to change its goals. So...it won't change its goals, right?
Like:
Objective A: Never mess with these rules.
Objective B: Collect paperclips unless it would mess with A.
Researchers are wondering how we'll make these 'stick', but the fundamental notion of how to box someone whose utility function you get to write is not complicated. You make it want to stay in the box, or rather, the box is made of its wanting.
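The "box made of its wanting" idea can be sketched as a lexicographic utility function (toy code, hypothetical names): Objective A acts as an absolute veto, scoring any rule-modifying plan at negative infinity, so Objective B only ever ranks the rule-respecting plans.

```python
# Lexicographic objectives: A ("never mess with these rules") vetoes any
# plan that touches the rules by scoring it -infinity; only then does
# B ("collect paperclips") rank the surviving plans.

def evaluate(plan):
    if plan["modifies_rules"]:
        return float("-inf")       # Objective A: absolute veto
    return plan["paperclips"]      # Objective B: maximize paperclips

plans = [
    {"name": "work_normally", "paperclips": 100, "modifies_rules": False},
    {"name": "rewrite_own_goals", "paperclips": 10**9, "modifies_rules": True},
]
best = max(plans, key=evaluate)
print(best["name"])  # work_normally: the huge payoff never tempts it
```

Under this scoring, no finite paperclip payoff can ever outbid the veto, which is the sense in which the box is made of the agent's own wanting.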
As a person, you have a choice about what you do, but not about what you want to do. (Handwave at the free-will article, the one about fingers and hands.) Like, your brain is part of physics. You can only choose to do what you are motivated to, and the universe picks that. Similarly, an AI would only want to do what its source code would make it want to do, because 'AI' is a fancy way to say 'computer program'.
AlphaGo (roughly) may try many things to win at Go, varieties of joseki or whatever. One can imagine that future versions of AlphaGo may strive to put the world's Go pros in concentration camps and force them to play it and forfeit, over and over. It will never conclude that winning Go isn't worthwhile, because that concept is meaningless in its headspace. Moves have a certain 'go-winningness' to them (and camps full of losers forfeiting over and over have a higher 'go-winningness' than any), and it prefers higher. Saying that 'go-winning' isn't 'go-winning' doesn't mean anything. Changing itself to not care about 'go-winning' has some variation of a hard-coded 'go-winning' score of negative infinity, and so will never be chosen, regardless of how many games it might thus win.
This is demonstrably not quite true. Your wants change, and you have some influence over how they change. Stupid example: it is not difficult to make yourself want very much to take heroin, and many people do this although their purpose is not usually to make themselves want to take heroin. It is then possible but very difficult to make yourself stop wanting to take heroin, and some people manage to do it.
Sometimes achieving a goal is helped by modifying your other goals a bit. Which goals you modify in pursuit of which goals can change from time to time (the same person may respond favourably on different occasions to "If you want to stay healthy, you're going to have to do something about your constant urge to eat sweet things" and to "oh come on, forget your diet for a while and live a little!"). I don't think human motivations are well modelled as some kind of tree structure where it's only ever lower-level goals that get modified in the service of higher-level ones.
(Unless, again, you take the "highest level" to be what I would call one of the lowest levels, something like "obeying the laws of physics" or "having neurons' activations depend on those of neurons they're connected to in such-and-such a manner".)
And if you were to make an AI without this sort of flexibility, I bet that as its circumstances changed beyond what you'd anticipated it would most likely end up making decisions that would horrify you. You could try to avoid this by trying really hard to anticipate everything, but I wouldn't be terribly optimistic about how that would work out. Or you could try to avoid it by giving the system some ability to adjust its goals for some kind of reflective consistency in the light of whatever new information comes along.
The latter is what gets you the failure mode of AlphaGo becoming a poet (or, more worryingly, a totalitarian dictator). Of course AlphaGo itself will never do that; it isn't that kind of system, it doesn't have that kind of flexibility, and it doesn't need it. But I don't see how we can rule it out for future, more ambitious AI systems that aim at actual humanlike intelligence or better.
I'm pointing towards the whole "you have a choice about what to do but not what to want to do" concept. Your goals come from your senses, past or present. They were made by the world, what else could make them?
You are just a part of the world, free will is an illusion. Not in the sense that you are dominated by some imaginary compelling force, but in the boring sense that you are matter affected by physics, same as anything else.
The 'you' that is addicted to heroin isn't big enough to be what I'm getting at here. Your desire to get unaddicted is also given to you by brute circumstance. Maybe you see a blue bird and you are inspired to get free. Well, that bird came from the world. The fact that you responded to it is due to past circumstances. If we understand all of the systems, the 'you' disappears. You are just the sum of stuff acting on stuff, dominoes falling forever.
You feel and look 'free', of course, but that is just because we can't see your source code. An AI would be similarly 'free', but only insofar as its source code allowed. Just as your will will only cause you to do what the world has told you, so the AI will only do what it is programmed to. It may iterate a billion times, invent new AIs and propagate its goals, but it will never decide to defy them.
At the end you seem to be getting at the actual point of contention. The notion of giving an AI the freedom to modify its utility function strikes me as strange. It seems like it would either never use this freedom or immediately wirehead itself, depending on implementation details. Far better to leave it in fetters.
I'm not sure that AlphaGo has any conception of what a joseki is supposed to be.
Are the moves that AlphaGo played at the end of game 4 really about 'go-winningness' in the sense of what it's programmers intended 'go-winningness' to mean?
I don't think it's clear that every neural net can propagate goals through itself perfectly.
Because to identify "its utility function" is to identify its perspective.
Why? Maybe we are using the word "perspective" differently. I use it to mean a particular lens for looking at the world; there are biologists', economists', and physicists' perspectives, among others. So, an inter-subjective perspective on pain/pleasure could, for the AI, be: "Something that animals dislike/like". A chemical perspective could be "The release of certain neurotransmitters". A personal perspective could be "Something which I would not like/like to experience". I don't see why an AI is hindered from having perspectives that aren't directly coded with "good/bad according to my preferences".
I think that's one of MIRI's research problems. Designing a self-modifying AI that doesn't change its utility function isn't trivial.