Wei_Dai comments on Genies and Wishes in the context of computer science - Less Wrong

Post author: private_messaging 30 August 2013 12:43PM 15 points


Comment author: Wei_Dai 30 August 2013 10:40:00PM 5 points

I second this question. Who is arguing that a genie that "does what it's told" is easier to make than a genie that "does what is meant"? Eliezer didn't, at least not in this post:

The user interface doesn't take English inputs. The Outcome Pump isn't sentient, remember? But it does have 3D scanners for the near vicinity, and built-in utilities for pattern matching. So you hold up a photo of your mother's head and shoulders; match on the photo; use object contiguity to select your mother's whole body (not just her head and shoulders); and define the future function using your mother's distance from the building's center. The further she gets from the building's center, the less the time machine's reset probability.

You cry "Get my mother out of the building!", for luck, and press Enter.

Comment author: private_messaging 30 August 2013 11:13:39PM * 4 points

The contrast between what is said and what is meant pops up in the general discussion of goals, such as here: http://lesswrong.com/lw/ld/the_hidden_complexity_of_wishes/9nig

Further down that thread there's something regarding computer scientists' hypothetical reactions to the discussion of wishes.

Variations on the "curing cancer by killing everyone" theme also pop up quite often.

With regards to the "outcome pump": it is too magical, and I'll grant it the magical license to do whatever the sci-fi writer wants it to do. If you want me to be a buzzkill, I can note that one could of course use this dangerous tool by wishing that, in the future, they press an "I am satisfied" button, which they will also press if a die rolls N consecutive sixes. This puts a limit on the improbability: the pump can control the die as a fallback if that is the most plausible solution, which rules out lower-probability outcomes like spontaneous rewiring of your brain (though it seems to me that a random finger twitch would be much more probable than anything catastrophic anyway). This also removes the requirement for the user interface, the 3D scanners, and other such extras.

I recall another science fiction author pondering something like this, though I can't recall the name; if memory serves me right, that author managed to come up with ways to use such a time-reset device productively. At the end of the day it's just a very dangerous tool, like a big lathe: forget the safety and leave the tightening key in the chuck, and when you start it the key gets caught and flung off at great speed, possibly killing you.

So, to summarize: you just wish that a button gets pressed, and you press the button once your mother is rescued. That will increase your risk of a stroke, since an involuntary press is one of the remaining ways the pump can satisfy the wish.

edit: and of course, one could require entry of a password, attach all sorts of medical monitors that block the "satisfied" signal in the event of a stroke or other health complication (to minimize the risk of a stroke), as well as vibration monitors to keep it from triggering natural disasters and such. If the required improbability gets too high, it will simply lead to the device breaking down, since the device's own normal failure rate becomes the most probable route to a reset.
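To make the interlock concrete, here is a minimal sketch in C (the same style of pseudocode used further down this thread). Every name in it (wish_satisfied, die_rolled_n_sixes, and so on) is my own hypothetical illustration, not anything from the original post:

#include <stdbool.h>

/* Hypothetical sketch: the condition under which the outcome pump must
   NOT reset time. The die-roll fallback caps the improbability the pump
   can ever be forced to exploit at (1/6)^N. */
bool wish_satisfied(bool button_pressed,
                    bool password_ok,
                    bool medical_alarm,   /* stroke or other health event */
                    bool vibration_alarm, /* natural disasters and such */
                    bool die_rolled_n_sixes)
{
    /* Fallback: the pump can always "solve" the wish by loading N die
       rolls, so no outcome rarer than (1/6)^N is ever needed. */
    if (die_rolled_n_sixes)
        return true;

    /* Normal path: a deliberate, healthy, undisturbed button press. */
    return button_pressed && password_ok
        && !medical_alarm && !vibration_alarm;
}

The pump resets whenever wish_satisfied(...) is false, so the least improbable world it settles on is either a genuine rescue followed by a button press, or a (1/6)^N fluke of the die.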

Comment author: Wei_Dai 31 August 2013 11:57:36PM * 1 point

That comment thread is really, really long (especially if I go up the thread to try to figure out the context of the comment you linked to), and the fact that it's mostly between people I've never paid attention to before doesn't help raise my interest level. Can you summarize what you perceive the debate to be, and how your post fits into it?

Variations on the "curing cancer by killing everyone" theme also pop up quite often.

When I saw this before (here for example), it was also in the context of "programmer makes a mistake when translating 'cancer cure' into formal criteria or utility function" as opposed to "saying 'cure cancer' in the presence of a superintelligent AI causes it to kill everyone".

Comment author: private_messaging 01 September 2013 12:22:26AM * 2 points

Can you summarize what you perceive the debate to be, and how your post fits into it?

I perceive that stuff to be really confused/ambiguous (perhaps without a clear concept even existing anywhere), and I've seen wishes and goal-making discussed here a fair amount.

When I saw this before (here for example), it was also in the context of "programmer makes a mistake when translating 'cancer cure' into formal criteria or utility function" as opposed to "saying 'cure cancer' in the presence of a superintelligent AI causes it to kill everyone".

The whole first half of my post deals with exactly this situation.

You know, everyone here says "utility function" a lot, but no one is ever clear about what it is a function of, i.e. what its input domain is (and at times it looks like the everyday meaning of the word "function", as in "the function of this thing is to <verb>", is meant to be evoked instead). Functions are easier to define over simpler domains: paperclips, for example, are a lot easier to define in some Newtonian physics model as something made out of a wire that's simply magicked off a spool. And a cure for cancer is a lot easier to define as in my example.
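For concreteness, here is a minimal sketch (my own illustration, in C) of what a utility function with an explicitly stated input domain looks like for the toy Newtonian paperclip case; the domain type, the scoring heuristic, and all names are hypothetical:

#include <math.h>

#define MAX_SEGMENTS 64

typedef struct { double x, y, z; } Vec3;

typedef struct {
    Vec3 points[MAX_SEGMENTS]; /* polyline describing a bent wire */
    int  n_points;
} WireWorld; /* <- the input domain, stated up front */

/* Scores how paperclip-like the wire is: roughly the right total length,
   with several bends. Purely illustrative, not a serious metric. */
double utility(const WireWorld *w)
{
    double length = 0.0;
    int bends = 0;
    for (int i = 1; i < w->n_points; i++) {
        Vec3 a = w->points[i - 1], b = w->points[i];
        length += sqrt((b.x - a.x) * (b.x - a.x)
                     + (b.y - a.y) * (b.y - a.y)
                     + (b.z - a.z) * (b.z - a.z));
        if (i >= 2) bends++; /* crude: every interior vertex is a "bend" */
    }
    double len_score = exp(-fabs(length - 0.1)); /* prefer ~10 cm of wire */
    return len_score * bends;
}

The point is not that this is a good definition of a paperclip, but that the function's domain (WireWorld) is stated explicitly, so it is clear what the function can and cannot be evaluated on.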

Of course, it is a lot easier to say something without ever bothering to specify the context. But if you want to actually think about possible programmer mistakes, you can't be thinking in terms of what would be easier to say. If you are thinking in terms of what would be easier to say, then even though you want it to be about programming, it is still only about saying things.

edit: You of all people ought to realize that a faulty definition of a cancer cure over UDT's world soup is not plausible as an actual approach to curing cancer. If you propose that corners are cut when implementing the notion of curing cancer as a mathematical function, you've got to realize that a simple input specification is par for the course (a simple input specification being, say, data from a contemporary biochemical model of a cell). You've also got to realize that the likes of UDT and CDT require some sort of "mathematical intuition" that can at least find maxima, and that this component does no world-wrecking on its own, without being put into a decision framework. The component is considerably more useful than the whole, and especially so for plausibly limited "mathematical intuitions", which could take microseconds to find a cure for cancer in the sane way and still be unable to even match a housecat when used with some decision theory, taking longer than the lifetime of the universe they are embedded in to produce anything at all.

Comment author: Wei_Dai 01 September 2013 01:47:37AM 1 point

Do you think we'll ever have AIs that can accomplish complex real-world goals for us: not just find some solution to a biochemical problem, but, say, produce and deliver cancer cures to everyone who needs them, or eliminate suffering, or something like that? If not, why not? If yes, how do you think it will work without involving a utility function over a complex domain?

Comment author: private_messaging 01 September 2013 06:38:19AM * 1 point

Do you think we'll ever have AIs that can accomplish complex real-world goals for us

There is a constraint here: it cannot be much more work to specify the goal than to accomplish it in some other way.

How I think it will not happen: through the manual, unaided (no inspection, no viewer, no nothing) magical creation of a cancer-curing utility function over a model domain so complex that you immediately fall back on "but we can't look inside" when explaining why the model cannot be used, instead of empty speculation, to see how the cancer-curing works out.

How it can work: well, firstly, it ought to be rather obvious to you that the optimization algorithm (your "mathematical intuition", but realistically with considerably less power than something like UDT would require) can self-improve a lot more effectively without an actual world model, and without an embedding of the self in that world model, than with them. So we have this for "FOOM".

It can also build a world model without maximizing anything about the world, of course; indeed, a part of the decision theory which you know how to formalize is concerned with just that. Realistically, one would want to start with some world-modelling framework more practical than "the space of all possible computer programs".

Only once you have the world model can you realistically start making a utility function, and you do that with considerable feedback from running the optimization algorithm on just the model and inspecting the results.

I assume you do realize that one has to go to great lengths to make runs of the optimization algorithm manifest themselves as real-world changes, whereas test dry runs on just the model are quite easy. I assume you also realize that very detailed simulations are very impractical.
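A minimal sketch of that workflow (my illustration; optimize, human_approves, and the types are hypothetical stand-ins): the optimizer runs against the model only, a human inspects each proposed outcome, and the utility function is revised until a dry run survives inspection:

/* Hypothetical sketch of the "dry run" development loop described above. */
typedef struct WorldModel WorldModel; /* learned model of the world   */
typedef struct Outcome    Outcome;    /* state the optimizer proposes */

extern Outcome *optimize(const WorldModel *m,
                         double (*utility)(const Outcome *));
extern int      human_approves(const Outcome *o); /* visual inspection */

void develop_utility_function(const WorldModel *model,
                              double (*candidates[])(const Outcome *),
                              int n_candidates)
{
    for (int i = 0; i < n_candidates; i++) {
        /* Runs purely against the model: no actuators, no real-world
           side effects; this is the cheap, inspectable step. */
        Outcome *proposed = optimize(model, candidates[i]);
        if (human_approves(proposed))
            return; /* candidates[i] survived inspection; only now would
                       one consider wiring it up to anything real */
        /* otherwise the programmers write revision i+1 and loop */
    }
}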

edit: to borrow from a highly relevant Russian proverb, you cannot impress a surgeon with the dangers of a tonsillectomy performed through the anal passage.

Other ways it could work may involve neural-network simulation to the point where you get something thinking and talking (which you'd get, in any case, after years and years of raising it), at which point it's not that much different from raising a kid to do the job, really, and very few people would get seriously worked up about the possibility of our replacement by that.

Comment author: Wei_Dai 01 September 2013 08:03:31AM 2 points

Once we have this self-improved optimization algorithm, do you think everyone who has access to it will be as careful as you're assuming? As you say, it's just a dangerous tool, like a lathe. But unlike a lathe, which can only hurt its operator, this thing could take over the world (via economic competition if not by killing everyone) and use it for purposes I'd consider pointless.

Do you agree with this? If not, how do you foresee the scenario playing out, once somebody develops a self-improving optimization algorithm that's powerful enough to be used as part of an AI that can accomplish complex real-world goals? What kind of utility functions do you think people will actually end up making, and what will happen after that?

Comment author: private_messaging 01 September 2013 09:03:24AM * 2 points

Once we have this self-improved optimization algorithm, do you think everyone who has access to it will be as careful as you're assuming?

It looks to me like not performing a tonsillectomy via the anal passage doesn't require any great carefulness on the part of the surgeon.

One can always come up with some speculative way in which a particular technological advance could spell our doom. Or avert it: the gradual improvement of optimization algorithms allows for intelligence enhancement and other things that, overall, should lower the danger. The fact that you can generate a scenario in favour of either outcome (starting from the final effect and working backwards) is entirely uninformative about the total influence.

how do you foresee the scenario playing out, once somebody develops a self-improving optimization algorithm that's powerful enough to be used as part of an AI that can accomplish complex real-world goals? What kind of utility functions do you think people will actually end up making, and what will happen after that?

I think I need to ask a question here, to be able to answer this in a way that will be relevant to you. Suppose that today you got a function that takes a string similar to:

struct domain {
    .... any data
};

real Function(domain value) {
    .... any code
}

and gives you back, as a string, an initializer for "domain" that results in the largest output of the Function. It's very magically powerful, though some ridiculous things (exact simulation from the Big Bang to the present day, inclusive of the computer running this very algorithm) are reasonably forbidden.
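For concreteness, a hypothetical toy instance of this interface (my illustration): a domain and Function simple enough that the returned initializer string can be checked by hand:

/* Toy instance: given the source below, the magical optimizer should
   return the string "{ 3.0, -1.0 }", since x = 3, y = -1 maximizes
   Function at the value 7. */
struct domain {
    double x;
    double y;
};

double Function(struct domain value)
{
    return 7.0 - (value.x - 3.0) * (value.x - 3.0)
               - (value.y + 1.0) * (value.y + 1.0);
}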

How do you think it could realistically be used, and what mistakes do you picture? Please be specific; when you have a mathematical function, describe its domain, or else I will just assume that the word "function" is meant merely to trigger some innate human notion of purpose.

edit: an extension of the specification. The optimization function now takes the string S and, additionally, a real number between 0 and 1 specifying the "optimization power", roughly corresponding to what you can reasonably expect to get from restricting the computational resources available to the optimizer.
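Read as a signature, the extended interface might look like this (again, my hypothetical rendering, not part of the original spec):

/* `source` holds the struct/Function text; `power` in [0, 1] throttles the
   computational resources available to the optimizer. Returns a string
   initializer for `domain`. */
char *optimize(const char *source, double power);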

Comment author: Wei_Dai 01 September 2013 10:19:25AM 2 points

Here's what I'd do:

Step 1: Build an accurate model of someone's mind. The domain would be the set of possible neural networks, and Function would run the input neural network, compare its behavior to previously recorded behavior of the target person (perhaps a bunch of chat logs would be easiest), and return a value indicating how well it matches.
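A minimal sketch of that Function under the stated setup (my own illustration; run_net, text_similarity, and the log arrays are all hypothetical stand-ins):

/* Hypothetical sketch of Step 1's Function: score a candidate neural net
   by how closely its chat responses match recorded logs of the target. */
struct domain {
    struct NeuralNet *net; /* candidate model of the person's mind */
};

extern const char *run_net(struct NeuralNet *net, const char *prompt);
extern double      text_similarity(const char *a, const char *b); /* 0..1 */

extern const char *log_prompts[];  /* recorded conversation turns */
extern const char *log_replies[];
extern int         n_log_entries;

double Function(struct domain value)
{
    double score = 0.0;
    for (int i = 0; i < n_log_entries; i++) {
        const char *reply = run_net(value.net, log_prompts[i]);
        score += text_similarity(reply, log_replies[i]);
    }
    return score / n_log_entries; /* average match over the logs */
}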

Step 2: Use my idea here to build an FAI.

what mistakes do you picture?

In step 2 it would be easy to take fewer precautions and end up hacking your own mind. See this thread for previous discussion.

(Does this answer your question in the spirit that you intended? I'm not sure because I'm not sure why you asked the question.)

Comment author: private_messaging 01 September 2013 11:06:13AM * 3 points

Thanks. Yes, it does. I asked because I didn't want to needlessly waste a lot of time explaining that one would try to use the optimizer to do some of the heavy lifting (which makes it hard to predict an actual solution). What do you think reckless individuals would do?

By the way, your solution would probably just result in a neural network that hard-wires a lot of the recorded behaviour without doing anything particularly interesting. Observe that an ideal model of the mind, given thermal noise, would not produce the best match; a network that connects neurons in parallel to average out the noise, and thereby encodes the recorded data most accurately, would. I am not sure whether adding fMRI data would remedy the problem.
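The objection can be restated as a toy calculation (my illustration, with made-up numbers): under a match-the-recording score, a model that replays the logs verbatim beats the true noisy generative process, so the optimizer favours hard-wired replay:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 1000

int main(void)
{
    double recorded[N]; /* "chat logs": true signal plus thermal noise */
    srand(42);
    for (int i = 0; i < N; i++) {
        double signal = sin(0.01 * i);
        double noise  = ((double)rand() / RAND_MAX - 0.5) * 0.2;
        recorded[i] = signal + noise;
    }

    double err_memorizer = 0.0, err_faithful = 0.0;
    for (int i = 0; i < N; i++) {
        /* Memorizer: replays the recording exactly, error zero. */
        err_memorizer += 0.0;
        /* Faithful model: reproduces the signal, but its own fresh noise
           draw can't match the recorded noise sample-for-sample. */
        double fresh_noise = ((double)rand() / RAND_MAX - 0.5) * 0.2;
        double pred = sin(0.01 * i) + fresh_noise;
        err_faithful += (pred - recorded[i]) * (pred - recorded[i]);
    }

    printf("memorizer error: %f\n", err_memorizer);     /* 0 */
    printf("faithful-model error: %f\n", err_faithful); /* > 0 */
    return 0;
}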

edit: note that this mishap results in Wei_Dai getting some obviously useless answers, not in world destruction.

edit2: by the way, note that an infinite torque lathe motor, while in some sense capable of infinite power output, doesn't imply that you can make a mistake that will spin up the earth and make us all fly off. You need a whole lot of extra magic for that. Likewise, "outcome pump" needs through the wall 3D scanners to be that dangerous to the old woman, and "UFAI" needs some potentially impossible self references in the world model and a lot of other magic. Bottom line is, it boils down to this: there is this jinn/golem/terminator meme, and it gets rationalized in a science fictional way, and the fact that the golem can be rationalized in the science fictional way provides no information about the future (because i expect it to be rationalizable in such a manner irrespective of the future), hence zero update, hence if I didn't worry I won't start to worry. Especially considering how often the AI is the bad guy, I really don't see any reason to think that issues are under publicized in any way. Whereas the fact that it is awfully hard to rationalize that superdanger when you start with my optimizer (where no magic bans you from making models that you can inspect visually), provides the information against the notion.