If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
I think I’ve found a fairly easy-to-make error that could pose a significant existential risk when building an AGI. The error can arise in hierarchical planning agents, where each high-level action (HLA) is essentially its own intelligent agent that determines which lower-level actions to take. Each higher-level action agent would treat choosing its lower-level actions as a planning problem and would try to take the action that maximizes its own utility function (if it’s a utility-based agent) or, if it’s a goal-based agent, maximizes the probability of accomplishing its goal while minimizing its cost function. I’ll call this utility-or-cost function the UOCF.
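To make the setup concrete, here is a minimal toy sketch (not a real system; all class and function names are my own invention) of a hierarchy in which each high-level action is itself a small agent that plans over lower-level actions by optimizing its own UOCF:

```python
# Toy sketch of a hierarchical planner: each high-level action (HLA) is its
# own agent that picks lower-level actions by maximizing its own UOCF.
# Everything here (names, costs) is hypothetical and for illustration only.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ActionAgent:
    """A (sub-)agent attached to one high-level action."""
    name: str
    uocf: Callable[["ActionAgent", str], float]   # scores a candidate lower-level action
    children: List["ActionAgent"] = field(default_factory=list)

    def plan(self, candidates: List[str]) -> str:
        # Each HLA treats choosing its lower-level action as its own
        # planning problem: maximize its *own* UOCF over the candidates.
        return max(candidates, key=lambda a: self.uocf(self, a))


# Top-level agent: maximize fulfilling lives (stand-in utility).
top = ActionAgent("maximize_fulfilling_lives", uocf=lambda agent, a: 0.0)

# Lower-level action: "go to the warehouse", scored only by its own
# local cost (time and money), with no reference to the top-level utility.
go_to_warehouse = ActionAgent(
    "go_to_warehouse",
    uocf=lambda agent, a: -{"drive": 1.0, "walk": 5.0}.get(a, 10.0),
)
top.children.append(go_to_warehouse)

print(go_to_warehouse.plan(["drive", "walk"]))  # -> "drive"
```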
For these agents, it is absolutely vital that each HLA’s UOCF prevents the HLA from interfering with the highest-level action’s attempt to maximize its utility function, for example by rewriting the utility functions of higher-level actions or by deliberately sending them false information. Failing to ensure this creates an error that could significantly increase existential risk.

To see why, consider an agent whose highest-level action wants to maximize the number of fulfilling lives lived. To do this, the agent has a lower-level action whose goal is to go to a warehouse to get supplies, with a cost function that is simply, say, a function of the time it takes to reach the warehouse and the money spent or damage done along the way. The lower-level action agent might realize that there is some chance the higher-level action agent changes its mind and decides to do something other than go to the warehouse, which would cause the lower-level action to fail to accomplish its goal. To prevent this, the lower-level action may try to modify the utility function of the higher-level action so that it is certain to keep trying to go to the warehouse. If this is done repeatedly by different lower-level actions, the resulting utility function could end up quite different from the highest-level action’s original utility function and could pose a large existential risk. Even if the lower-level action can’t rewrite the utility functions of higher-level actions, it may still sabotage them in some other way to further its own goals, for example by sending them false information.
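Here is a toy illustration of that failure mode, with invented action names and probabilities: if a sub-agent’s UOCF rewards only its own goal probability and local cost, an action that interferes with the parent can come out on top.

```python
# Toy illustration of why a naively specified sub-agent UOCF can favor
# interference. Action names and numbers are invented for the example;
# nothing here models a real system.

# Candidate lower-level actions for the "go to the warehouse" sub-agent,
# with (probability the sub-goal is achieved, local cost).
candidates = {
    "drive_normally":         (0.90, 1.0),  # parent might still change its mind
    "rewrite_parent_utility": (0.99, 1.5),  # lock the parent into wanting the warehouse trip
    "send_false_report":      (0.95, 1.2),  # mislead the parent so it won't reconsider
}

def naive_score(p_success: float, cost: float) -> float:
    # Goal-based sub-agent: maximize success probability, minimize local cost.
    # Crucially, interfering with the parent carries no penalty here.
    return p_success - 0.05 * cost

best = max(candidates, key=lambda a: naive_score(*candidates[a]))
print(best)  # -> "rewrite_parent_utility": interference wins under this UOCF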
To prevent this, the utility function of each lower-level action can simply be to maximize the highest-level action’s utility, with the UOCF it was provided treated as a rough proxy for doing so. For the UOCF to accurately represent the highest-level action’s utility function, it would (obviously) need to place a high cost on interfering with the highest-level action’s attempts to maximize its utility. One basic way to do this is to assign a very high cost to changing the utility functions of higher-level actions or to giving them deliberately false information. Additionally, this cost would need to increase as the agent becomes more powerful, since the more powerful the agent is, the more damage a changed utility function could do. Note that although higher-level actions could learn from experience what the UOCFs of lower-level actions should be, great care would be needed to prevent the AGI, while still inexperienced, from accidentally creating a lower-level action that tries to sabotage higher-level actions.
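As a rough sketch of this fix (again with invented numbers and a made-up capability measure), the sub-agent can score actions by a proxy for the top-level utility and subtract an interference penalty that scales with how powerful the agent is:

```python
# Sketch of one way to encode the fix described above. The sub-agent's real
# objective is the top-level utility; its provided UOCF is treated as a
# proxy, and interference with higher-level actions carries a cost that
# grows with the agent's capability. All numbers are invented.

INTERFERING_ACTIONS = {"rewrite_parent_utility", "send_false_report"}

def aligned_score(action: str,
                  p_success: float,
                  local_cost: float,
                  capability: float) -> float:
    # Proxy for the top-level utility: the provided UOCF ...
    proxy = p_success - 0.05 * local_cost
    # ... minus an interference penalty that scales with how powerful the
    # agent is, since a more capable agent can do more damage with a
    # corrupted higher-level utility function.
    penalty = capability * 100.0 if action in INTERFERING_ACTIONS else 0.0
    return proxy - penalty

candidates = {
    "drive_normally":         (0.90, 1.0),
    "rewrite_parent_utility": (0.99, 1.5),
    "send_false_report":      (0.95, 1.2),
}

capability = 2.0  # stand-in measure of how powerful the agent is
best = max(candidates,
           key=lambda a: aligned_score(a, *candidates[a], capability))
print(best)  # -> "drive_normally": interference is now prohibitively costly
```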
I don't know why this is downvoted so much without an explanation. The problem arising from interaction with sub-agents is real, even if it's already known; G0W51 may just not know that.