
ChristianKl comments on Open Thread, Jul. 13 - Jul. 19, 2015 - Less Wrong Discussion

5 Post author: MrMind 13 July 2015 06:55AM


Comments (297)


Comment author: ChristianKl 17 July 2015 04:26:11PM 0 points [-]

An AGI that uses its own utility function when modeling other actors will soon find that this doesn't lead to a model that predicts reality well. When the AGI self-modifies to improve its intelligence and prediction capability, it is therefore likely to drop that clause.

Comment author: rikisola 17 July 2015 04:32:12PM 0 points [-]

I see. But rather than dropping this clause, shouldn't it try to update its utility function in order to improve its predictions? If we somehow hard-coded the constraint that it can only ever apply its own utility function, then it would have no choice but to update that function. And the closer it gets to our actual utility function, the better it is at predicting reality.

Comment author: ChristianKl 17 July 2015 05:44:03PM *  0 points [-]

Different humans have different utility functions. People quite often have different preferences, and it's quite useful to treat people with different preferences differently.
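[A toy sketch of the point above, not from the thread; the actors, options, and utility numbers are all invented for illustration. A predictor that projects its own utility function onto everyone mispredicts whenever their preferences differ, while per-actor utilities predict correctly.]

```python
options = ["apple", "banana", "cherry"]

# Invented utility functions over the options (illustrative numbers only).
agi_utility = {"apple": 3, "banana": 1, "cherry": 2}
human_a     = {"apple": 1, "banana": 3, "cherry": 2}
human_b     = {"apple": 2, "banana": 1, "cherry": 3}

def predict_choice(utility):
    # Assume each actor simply picks the option it values most.
    return max(options, key=lambda o: utility[o])

# Actual choices, if each human maximizes their own utility.
actual = {"a": predict_choice(human_a), "b": predict_choice(human_b)}

# Model 1: project the AGI's own utility function onto everyone.
own_model = {h: predict_choice(agi_utility) for h in actual}

# Model 2: use per-actor utilities learned from observation.
learned_model = {"a": predict_choice(human_a), "b": predict_choice(human_b)}

def score(model):
    # Number of actors whose choice the model predicts correctly.
    return sum(model[h] == actual[h] for h in actual)

print(score(own_model), score(learned_model))  # 0 2
```

[With these numbers, the self-projection model gets every prediction wrong, which is the pressure toward dropping that modeling clause.]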

"Hard-coding" is a useless word. It leads astray.

Comment author: rikisola 17 July 2015 06:32:41PM *  0 points [-]

Sorry for my misused terminology. Is it not feasible to design it with those characteristics?

Comment author: ChristianKl 17 July 2015 07:16:14PM 0 points [-]

The problem is not about terminology but substance. There should be a post somewhere on LW that goes into more detail about why we can't just hardcode values into an AGI, but at the moment I'm not finding it.

Comment author: rikisola 18 July 2015 09:43:11AM 0 points [-]

Hi ChristianKl, thanks, I'll try to find the article. Just to be clear, though: I'm not suggesting hard-coding values. I'm suggesting designing the AI so that it uses the same utility function for itself and for us, and updates it as it gets smarter. From the comments I'm getting, it sounds like this is technically not feasible, so I'll aim to learn in detail exactly how an AI works and perhaps look for a way to make it feasible. If it were feasible, would I be right in thinking the AI would not be motivated to betray us, or am I missing something there as well? Thanks for your help, by the way!

Comment author: ChristianKl 18 July 2015 02:53:57PM *  0 points [-]

"Betrayal" is not the main worry. Given that you prevent the AGI from understanding what people want, it's likely that it won't do what people want.

Have you read Bostrom's book Superintelligence?

Comment author: rikisola 18 July 2015 03:27:36PM *  0 points [-]

Yes, that's actually why I wanted to tackle the "treacherous turn" first: to look for a general design that would allow us to trust the results of tests, and then build on that. I see the order of priority as: 1) make sure we don't get tricked, so that we can trust the results of what we do; 2) make the AI do the right things. I'm referring to 1) here. Also, as mentioned in another comment on the main post, part of the AI's utility function evolves to understand human values, so I still don't quite see why exactly it shouldn't work. I envisage the utility function as the union of two parts: one where we have described the goal for the AI, which shouldn't change with iterations, and another with human values, which will be learnt and updated. This total utility function is common to all agents, including the AI.
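[A hypothetical sketch of the two-part utility function described above; every name and number here is invented. The total utility is a fixed goal term, frozen across iterations, plus a learned human-values term that gets updated, and the same total function is applied to every agent, including the AI.]

```python
def fixed_goal(state):
    # Hand-specified goal term; by assumption, frozen across self-modification.
    return state.get("goal_progress", 0.0)

class SharedUtility:
    """One total utility function applied to all agents, AI included."""

    def __init__(self):
        # Learned weights over human-value features; updated from observation.
        # Feature names here are placeholders.
        self.value_weights = {"wellbeing": 0.0, "autonomy": 0.0}

    def update(self, feature, delta):
        # The learnable half of the utility function changes over time.
        self.value_weights[feature] += delta

    def __call__(self, state):
        learned = sum(w * state.get(f, 0.0)
                      for f, w in self.value_weights.items())
        return fixed_goal(state) + learned

u = SharedUtility()
u.update("wellbeing", 0.5)            # learning step: revise a value weight
state = {"goal_progress": 1.0, "wellbeing": 2.0}
print(u(state))  # 1.0 (fixed goal) + 0.5 * 2.0 (learned values) = 2.0
```

[This only formalizes the structure of the proposal; it doesn't address ChristianKl's objection that applying one shared utility function to people with different preferences gives poor predictions.]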