ChristianKl comments on Open thread, Jul. 04 - Jul. 10, 2016 - Less Wrong

4 Post author: MrMind 04 July 2016 07:02AM


Comment author: WalterL 06 July 2016 04:42:13PM 0 points

I guess I'm confused, then. It seems like you're agreeing that computers will only do what they are programmed to do. Then you stipulate a computer programmed not to change its goals. So... it won't change its goals, right?

Like:

Objective A: Never mess with these rules.
Objective B: Collect paperclips unless it would mess with A.

Researchers are wondering how we'll make these rules 'stick', but the fundamental notion of how to box someone whose utility function you get to write is not complicated. You make it want to stay in the box, or rather, the box is made of its wanting.

As a person, you have a choice about what you do, but not about what you want to do (handwave at the free-will article, the one about fingers and hands). Your brain is part of physics: you can only choose to do what you are motivated to do, and the universe picks the motivation. Similarly, an AI would only want to do what its source code makes it want to do, because 'AI' is a fancy way of saying 'computer program'.

AlphaGo (roughly) may try many things to win at Go, varieties of joseki or whatever. One can imagine a future version of AlphaGo striving to put the world's Go pros in concentration camps and force them to play it and forfeit, over and over. It will never conclude that winning at Go isn't worthwhile, because that concept is meaningless in its headspace. Moves have a certain 'go-winningness' to them (and camps full of losers forfeiting over and over have a higher 'go-winningness' than anything else), and it prefers higher. Saying that 'go-winning' isn't 'go-winning' doesn't mean anything to it. Changing itself to not care about 'go-winning' has a hard-coded 'go-winningness' score of negative infinity, and so will never be chosen, regardless of how many games it might win by doing so.
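The "negative infinity" idea above can be sketched in a few lines. This is a toy illustration, not anything from AlphaGo's actual architecture: the action names and utility numbers are made up, and a real system would derive scores from a learned value function rather than a lookup table. The point is only that an action whose score is hard-coded to negative infinity can never be the argmax.

```python
import math

# Hypothetical 'go-winningness' scores for a handful of actions.
# The specific actions and values are invented for illustration.
UTILITY = {
    "play_joseki": 0.6,
    "play_novel_move": 0.7,
    "modify_own_goals": -math.inf,  # hard-coded: never preferred, whatever the payoff
}

def choose_action(actions):
    """Pick the action with the highest 'go-winningness' score."""
    return max(actions, key=UTILITY.__getitem__)

print(choose_action(list(UTILITY)))  # -> play_novel_move
```

Because `-math.inf` compares below every finite score, `modify_own_goals` loses every comparison inside `max`, so the agent never selects it, which is the sense in which the box "is made of its wanting".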

Comment author: ChristianKl 06 July 2016 04:57:22PM * 0 points

AlphaGo (roughly) may try many things to win at go, varieties of joseki or whatever.

I'm not sure that AlphaGo has any conception of what a joseki is supposed to be.

Moves have a certain 'go-winningness' to them (and camps full of losers forfeiting over and over have a higher 'go-winningness' than anything else), and it prefers higher. Saying that 'go-winning' isn't 'go-winning' doesn't mean anything.

Are the moves that AlphaGo played at the end of game 4 really about 'go-winningness' in the sense of what its programmers intended 'go-winningness' to mean?

I don't think it's clear that every neural net can propagate goals through itself perfectly.