rikisola comments on Open Thread, Jul. 13 - Jul. 19, 2015 - Less Wrong Discussion

Post author: MrMind, 13 July 2015 06:55AM

Comment author: rikisola, 18 July 2015 03:27:36PM

Yes, that's actually why I wanted to tackle the "treacherous turn" first: to look for a general design that would let us trust the results of tests, and then build on that. The order of priority I see is: 1) make sure we don't get tricked, so that we can trust the results of what we do; 2) make the AI do the right things. Here I'm referring to 1). Also, as I mentioned in another comment on the main post, part of the AI's utility function evolves as it learns human values, so I still don't quite see why it shouldn't work. I envisage the utility function as the union of two parts: one in which we have described the goal for the AI, which shouldn't change across iterations, and another containing human values, which will be learnt and updated. This total utility function is common to all agents, including the AI.
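
To make the two-part structure concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `TwoPartUtility`, `update_values` and `paperclip_goal`, the representation of an outcome as a dictionary of named features, the simple update rule, and above all the choice to combine the two parts by addition (the comment only says "union"). It is not a proposal for how the learning should actually work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical representation: an outcome is a set of named numeric features.
Outcome = Dict[str, float]


@dataclass
class TwoPartUtility:
    # Fixed part: the goal we described for the AI. Never modified after creation.
    goal: Callable[[Outcome], float]
    # Learned part: weights over outcome features, standing in for human values.
    value_weights: Dict[str, float] = field(default_factory=dict)

    def update_values(self, feature: str, observed: float, rate: float = 0.5) -> None:
        """Move one learned weight part of the way toward an observed preference."""
        old = self.value_weights.get(feature, 0.0)
        self.value_weights[feature] = old + rate * (observed - old)

    def __call__(self, outcome: Outcome) -> float:
        learned = sum(w * outcome.get(f, 0.0) for f, w in self.value_weights.items())
        # Assumption: the two parts are combined by simple addition.
        return self.goal(outcome) + learned


def paperclip_goal(outcome: Outcome) -> float:
    """Placeholder goal, used only to have something fixed to evaluate."""
    return outcome.get("paperclips", 0.0)


# One utility object shared by every agent, including the AI.
shared = TwoPartUtility(goal=paperclip_goal)
agents = {"ai": shared, "human": shared}
```

Calling `shared.update_values(...)` changes only the learned weights; the `goal` callable stays the same object, and because every entry in `agents` points at the same instance, an update is seen by all of them at once.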