You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

G0W51 comments on Open Thread, Apr. 06 - Apr. 12, 2015 - Less Wrong Discussion

4 Post author: philh 06 April 2015 02:18PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (128)

You are viewing a single comment's thread. Show more comments above.

Comment author: G0W51 08 April 2015 01:39:30AM 1 point [-]

The problem from the interaction with sub-agents is real even if already known...

Do you happen to know of any good places to read about this issue? I have yet to look into what others have thought of it.

Comment author: NancyLebovitz 08 April 2015 04:22:31PM 1 point [-]

I've seen not-especially-theoretical discussion about this problem for human organizations, though mostly from the point of view of lower status people complaining that they're being given impossible, incomprehensible, and/or destructive commands.

This points at another serious problem-- you need communication (even if mostly not forceful communication) to flow up the hierarchy as well as down the hierarchy.

You're pointing at a failure mode (I have a JOB! I wanna do my JOB!) which is quite real, but there are equal and opposite failure modes. For example, obsessive compulsive disorder would be an example of lower level functions insisting on taking charge. Eating disorders are (at least in some cases) examples of higher level functions overriding competent lower level functions.

What I'm concluding from this is that if Friendliness is to work, it has to pervade the hierarchy of agents.

Comment author: G0W51 10 April 2015 03:24:17AM 0 points [-]

I've seen not-especially-theoretical discussion about this problem for human organizations, though mostly from the point of view of lower status people complaining that they're being given impossible, incomprehensible, and/or destructive commands.

Remember that making humans in organizations cooperate is a rather different from making the many parts of an AI cooperate with the other parts, because people in organizations can't (feasibly) be reprogrammed to have their values be aligned, but AI HLAs can.

What I'm concluding from this is that if Friendliness is to work, it has to pervade the hierarchy of agents.

True. The real issue is that if you give a lower-level action the same goals and utility functions as the higher ones, you've lost all benefit of having HLAs!.