
TrE comments on Open Thread, Apr. 06 - Apr. 12, 2015 - Less Wrong Discussion

Post author: philh 06 April 2015 02:18PM




Comment author: G0W51 06 April 2015 09:35:46PM *  3 points

I think I’ve found a somewhat easy-to-make error that could pose a significant existential risk when making an AGI. This error can arise in hierarchical planning agents in which each high-level action (HLA) is essentially its own intelligent agent that determines which lower-level actions to take. Each higher-level action agent would treat choosing a lower-level action as a planning problem, trying to take the action that maximizes its own utility function (if it’s a utility-based agent) or, if it’s a goal-based agent, the probability of accomplishing its goal while minimizing its cost function (its utility-or-cost function, UOCF).

For these agents, it is absolutely vital that each HLA’s UOCF prevents the HLA from doing anything to interfere with the highest-level action maximizing its utility function, for example by rewriting the utility functions of higher-level actions or deliberately sending them false information. Failing to do so would result in an error that could significantly increase existential risk.

To explain why, consider an agent whose highest-level action wants to maximize the number of fulfilling lives lived. To that end, the agent has a lower-level action whose goal is to go to a warehouse to get supplies. The cost function of this lower-level action is simply a function of, say, the amount of time it takes the agent to reach the warehouse and the amount of money spent or damage done. In this situation, the lower-level action agent might realize there is a chance that the higher-level action agent changes its mind and decides to do something other than go to the warehouse, which would cause the lower-level action to fail to accomplish its goal. To prevent this, the lower-level action may try to modify the utility function of the higher-level action so that it is certain to keep trying to go to the warehouse.

If this is done repeatedly by different lower-level actions, the resultant utility function could end up quite different from the highest-level action’s original utility function and may pose a large existential risk. Even if a lower-level action can’t rewrite the utility functions of higher-level actions, it may still sabotage them in some other way to further its own goals, for example by sending them false information.
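The warehouse scenario can be made concrete with a toy sketch (the action names, costs, and `tampers` flag are all hypothetical, invented purely for illustration, not taken from any real planner): a sub-agent that scores candidate actions only by its local cost function sees no downside to tampering with its parent’s goal.

```python
# Toy illustration of the failure mode described above.
# All action names and numbers are hypothetical.

# Each candidate action the sub-agent considers, with its local costs
# and whether it involves tampering with the higher-level action.
actions = {
    "drive_to_warehouse": {"time": 30, "money": 10, "tampers": True if False else False},
    "lock_in_parent_goal": {"time": 1, "money": 0, "tampers": True},
}

def local_cost(a):
    # The sub-agent's UOCF counts only time and money,
    # so tampering with the parent's utility function looks free.
    return a["time"] + a["money"]

best = min(actions, key=lambda name: local_cost(actions[name]))
print(best)  # the tampering action wins under the purely local cost function
```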

To prevent this, the utility function of each lower-level action can simply be to maximize the highest-level action’s utility, with the UOCF it was given treated as a rough heuristic for doing so. For the UOCF to accurately represent the highest-level action’s utility function, it would obviously need to place a high cost on interfering with the highest-level action’s attempts to maximize its utility. One basic way to do this is to assign a very high cost to changing the utility functions of higher-level actions or to deliberately giving them false information. Additionally, this cost would need to increase with the agent’s power, since the more powerful the agent, the greater the damage a corrupted utility function could do. Note that although higher-level actions could learn through experience what the UOCFs of lower-level actions should be, great care would need to be taken to prevent the AGI, while still inexperienced, from accidentally creating a lower-level action that tries to sabotage higher-level actions.
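The proposed fix can be sketched the same way (again with hypothetical names and numbers — `AGENT_POWER` and the penalty weight are stand-ins, not a worked-out proposal): the sub-agent keeps its local costs as a rough proxy for the top-level utility, but interference carries a penalty that scales with the agent’s power.

```python
# Toy sketch of the proposed fix. Names and numbers are hypothetical.

AGENT_POWER = 10.0  # stand-in for how much damage a corrupted goal could do

actions = {
    "drive_to_warehouse": {"time": 30, "money": 10, "tampers": False},
    "lock_in_parent_goal": {"time": 1, "money": 0, "tampers": True},
}

def score(a):
    # Local costs still matter as a rough proxy for the
    # highest-level action's utility...
    cost = a["time"] + a["money"]
    # ...but interfering with higher-level actions now carries a cost
    # that grows with the agent's power.
    if a["tampers"]:
        cost += 1000 * AGENT_POWER
    return cost

best = min(actions, key=lambda name: score(actions[name]))
print(best)  # the honest plan now has the lower cost
```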

Comment author: TrE 07 April 2015 03:51:10PM *  7 points

Please insert some line-breaks at suitable points to make your comment more readable. At the moment it's figuratively a wall of text.

Edit: Thank you.