
Wei_Dai comments on Model of unlosing agents - Less Wrong Discussion

3 Post author: Stuart_Armstrong 02 August 2014 07:59AM


Comments (21)


Comment author: Wei_Dai 08 August 2014 08:34:06PM 0 points

You say "unlosing design seems indicated" for value loading, but I don't see how these two ideas would work together at all. Can you give some sort of proof-of-concept design? Also, as I mentioned before, value loading seems to be compatible with UDT. What advantage does a value-loading, unlosing agent have over a UDT-based value loader?

Comment author: Stuart_Armstrong 11 August 2014 10:27:56AM 0 points

On a more philosophical note, we seem to have different approaches. It's my impression that you want to construct an idealised perfect system, and then find a way of applying it to the real world. I seem to be coming up with tools that would allow people to take approximate "practical" ideas that have no idealised versions, and apply them in ways that are less likely to cause problems.

Would you say that is a fair assessment?

Comment author: Stuart_Armstrong 11 August 2014 10:22:48AM 0 points

The reason an unlosing agent might be interesting is that it doesn't have to have its values specified as a collection of explicit utility functions. It could instead have some differently specified system that converges to explicit utility functions as it gets more morally relevant data. Then an unlosing procedure would keep it unexploitable during this process.
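To make that a bit more concrete, here is a toy sketch of one way an unlosing procedure could work. It is an illustration under assumptions, not a design taken from the post: the class name `UnlosingAgent`, the `heuristic` callback (standing in for the "differently specified system"), and the three-item example are all hypothetical. The agent starts with no explicit preferences, lets the heuristic settle any pair it has not yet committed to, records each choice, and refuses to contradict an earlier commitment, so a cyclic heuristic cannot be money-pumped.

```python
from itertools import product


class UnlosingAgent:
    """Toy agent whose preferences start unspecified and get fixed by its own choices."""

    def __init__(self, heuristic):
        # heuristic(a, b) -> the preferred item. Hypothetical stand-in for
        # whatever non-utility-function value system the agent really has.
        self.heuristic = heuristic
        # Committed (better, worse) pairs, kept transitively closed.
        self.prefers = set()

    def _commit(self, better, worse):
        """Record better > worse, then restore transitive closure."""
        above = {x for (x, y) in self.prefers if y == better} | {better}
        below = {y for (x, y) in self.prefers if x == worse} | {worse}
        self.prefers |= set(product(above, below))

    def choose(self, a, b):
        """Pick between a and b without contradicting any earlier commitment."""
        if a == b:
            return a
        if (a, b) in self.prefers:
            return a
        if (b, a) in self.prefers:
            return b
        # No commitment yet: defer to the heuristic, then lock the result in.
        winner = self.heuristic(a, b)
        loser = b if winner == a else a
        self._commit(winner, loser)
        return winner


# A deliberately cyclic heuristic: A over B, B over C, C over A.
CYCLE = {("A", "B"): "A", ("B", "C"): "B", ("C", "A"): "C"}


def cyclic_heuristic(a, b):
    return CYCLE.get((a, b)) or CYCLE.get((b, a))


agent = UnlosingAgent(cyclic_heuristic)
print(agent.choose("A", "B"))  # A (new commitment: A > B)
print(agent.choose("B", "C"))  # B (new commitment: B > C, hence A > C)
print(agent.choose("C", "A"))  # A (the heuristic says C, but A > C is already locked in)
```

Note that the third answer depends entirely on the order in which the first two questions were asked, which is the kind of early lock-in the sketch cannot avoid.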

In practice, I think requiring a value-loading agent to be unlosing might be too strong a requirement, as it might lock in some early decisions. I see a "mainly unlosing" agent (say, an imperfect value-loading agent with some unlosing characteristics) as more interesting, and as potentially safer.