
Viliam_Bur comments on Tools want to become agents - Less Wrong Discussion

12 Post author: Stuart_Armstrong 04 July 2014 10:12AM




Comment author: Viliam_Bur 04 July 2014 12:23:39PM *  3 points [-]

tools start to become agents after invention X

Seems like X is (or includes) the ability to think about self-modification: awareness of its own internal details and modelling their possible changes.

Note that without this ability, the tool could invent a plan that leads to its own accidental destruction (and thus possibly leaves the plan unfinished), because it does not realize it could be destroyed or damaged.

Comment author: NancyLebovitz 05 July 2014 01:46:53PM 0 points [-]

An agent can also accidentally pursue a plan which leads to its self-destruction. People do it now and then by not modelling the world well enough.

Comment author: TheAncientGeek 05 July 2014 01:06:18PM 0 points [-]

I think of agents as having goals and pursuing them by default. I don't see how self-reflexive abilities... "think about self-modification: awareness of its own internal details and modelling their possible changes"... add up to goals. It might be intuitive that a self-aware entity would want to preserve its existence, but that intuition could be driven by anthropomorphism (or zoomorphism, or biomorphism).

Comment author: Viliam_Bur 05 July 2014 06:06:50PM *  0 points [-]

With self-reflective abilities, the system can also consider paths including self-modification in reaching its goal. Some of those paths may be highly unintuitive for humans, so we wouldn't notice some possible dangers. Self-modification may also remove some safety mechanisms.

A system that explores many paths can find solutions humans wouldn't notice. Such "creativity" at the object level is relatively harmless. Google Maps may find you a more efficient path to work than the one you use now, and that's okay. Maybe the path is wrong for reasons Google Maps does not understand (e.g. it leads through a neighborhood with high crime), but at least at a general level you understand that this is the risk of following its outputs blindly. Similar "creativity" at the self-modification level, however, can have unexpected and serious consequences.

Comment author: [deleted] 06 July 2014 01:19:35AM 0 points [-]

"the system can also", "some of those paths may be", "may also remove". Those are some highly conditional statements. Quantify, please, or else this is no different than "the LHC may destroy us all with a mini black hole!"

Comment author: Viliam_Bur 06 July 2014 09:50:41AM *  1 point [-]

I'd need to have a specific description of the system, what exactly it can do, and how exactly it can modify itself, to give you a specific example of self-modification that contributes to the specific goal in a perverse way.

I can invent an example, but then you can just say "okay, I wouldn't use that specific system".

As an example: Imagine that you have a machine with two modules (whatever they are) called Module-A and Module-B. Module-A is only useful for solving Type-A problems. Module-B is only useful for solving Type-B problems. At this moment, you have a Type-A problem, and you ask the machine to solve it as cheaply as possible. The machine has no Type-B problem at the moment. So the machine decides to sell its Module-B on ebay, because it is not necessary now, and the gained money will reduce the total cost of solving your problem. This is short-sighted, because tomorrow you may need to solve a Type-B problem. But the machine does not predict your future wishes.
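The short-sighted machine above can be sketched in a few lines. This is a hypothetical illustration (all names and numbers are invented, not from any real system): the planner minimizes the net cost of the *current* problem only, so selling an idle module always looks like a win.

```python
def solve_cheaply(problem_type, modules, resale_values):
    """Solve the current problem while minimizing its net cost.

    modules: set of installed module names, e.g. {"Module-A", "Module-B"}
    resale_values: assumed resale price for each module (invented numbers)
    """
    needed = {"Type-A": "Module-A", "Type-B": "Module-B"}[problem_type]
    cost = 10  # assumed base cost of running the needed module
    # Myopic step: sell every module not needed for *this* problem.
    # The resale income reduces today's cost, which is the only
    # quantity the machine is asked to optimize...
    for m in sorted(modules - {needed}):
        cost -= resale_values[m]
        modules.discard(m)  # ...but the capability is gone for good.
    return cost, modules

modules = {"Module-A", "Module-B"}
cost, modules = solve_cheaply("Type-A", modules, {"Module-B": 4})
# Module-B has been sold; tomorrow's Type-B problem is now unsolvable.
```

The point of the sketch is that nothing in the objective ("solve this problem as cheaply as possible") penalizes discarding Module-B, because the machine has no model of the owner's future wishes.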

Comment author: [deleted] 06 July 2014 11:26:02AM *  0 points [-]

I can invent an example, but then you can just say "okay, I wouldn't use that specific system".

But can't you see, that's entirely the point!

If you design systems whereby the Scary Idea has no more than a vanishing likelihood of occurring, it no longer becomes an active concern. It's like saying "bridges won't survive earthquakes! You are crazy and irresponsible to build a bridge in an area with earthquakes!" Then I design a bridge that survives earthquakes below magnitude X, where magnitude-X earthquakes have a likelihood of occurring less than once in 10,000 years, and on top of that I add an extra 20% safety margin because we have the steel available. How crazy and irresponsible is it now?

Comment author: Viliam_Bur 06 July 2014 07:26:13PM 0 points [-]

If you design systems whereby the Scary Idea has no more than a vanishing likelihood of occurring, it no longer becomes an active concern.

Yeah, and the whole problem is how specifically you will do it.

If I (or anyone else) give you examples of what could go wrong, of course you can keep answering "then I obviously wouldn't use that design". But at the end of the day, if you are going to build an AI, you have to commit to some design -- merely refusing designs proposed by other people will not do the job.

Comment author: [deleted] 06 July 2014 08:01:50PM 1 point [-]

There are plenty of perfectly good designs out there, e.g. CogPrime + GOLUM. You could be calculating probabilistic risk based on these designs, rather than fear-mongering based on a naïve Bayes net optimizer.