
CarlShulman comments on The challenges of bringing up AIs - Less Wrong Discussion

8 Post author: Stuart_Armstrong 10 December 2012 12:43PM




Comment author: CarlShulman 11 December 2012 01:49:31AM * 2 points

"learning how to get humans to press your reward button" as "our niceness training is working" a la the original AIXI paper,

Quote needed; wasn't this contested by the author?

On 3: Knowing that the current execution path of the code seems to be working okay today is very different from strongly constraining future execution paths, across hugely different contexts, to have desirable properties; that requires abstract thinking at a much higher level than staring at what your AGI is doing right now. The tank detector works as long as it's seeing pictures from the training set, in which all tanks appear on cloudy days, but fails when it wanders out into the real world, etc. "Reflective decision theory"-style FAI proposals try to address this by stating the desirable properties of the AI in an abstraction that can be checked against abstractions over code execution pathways, and even over permitted future self-modifications, although those 'abstract desirable properties' are very hard to specify (requiring very difficult and serious FAI efforts) for reasons related to 4.
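The tank-detector failure can be sketched as a toy spurious-correlation experiment. Everything here is illustrative assumption, not anything from the original story: a synthetic data generator where weather drives brightness and tanks drive a noisy "signature" feature, and a learner that just picks the single-feature threshold rule that looks best on the data it can see right now.

```python
import random

random.seed(0)

def scene(tank, cloudy):
    # Feature 0: overall brightness, driven by the weather (assumed).
    # Feature 1: a noisy tank "signature", driven by the tank (assumed).
    brightness = (0.2 if cloudy else 0.8) + random.gauss(0, 0.05)
    signature = (0.7 if tank else 0.3) + random.gauss(0, 0.25)
    return [brightness, signature]

# Training set with the confound: every tank photo was taken on a cloudy day.
train = ([(scene(True, True), 1) for _ in range(300)]
         + [(scene(False, False), 0) for _ in range(300)])

def accuracy(data, feature, threshold, sign):
    hits = sum((sign * (x[feature] - threshold) > 0) == bool(y)
               for x, y in data)
    return hits / len(data)

def fit(data):
    # "Staring at what it's doing right now": exhaustively pick whichever
    # single feature / threshold / direction scores best on visible data.
    candidates = [(f, t / 20, s)
                  for f in (0, 1) for t in range(21) for s in (-1, 1)]
    return max(candidates, key=lambda c: accuracy(data, *c))

rule = fit(train)
print("learned rule (feature, threshold, sign):", rule)
print("train accuracy:", accuracy(train, *rule))

# Deployment: the weather/tank correlation breaks, and the learned rule,
# which latched onto brightness, fails badly.
test = ([(scene(True, False), 1) for _ in range(300)]
        + [(scene(False, True), 0) for _ in range(300)])
print("test accuracy:", accuracy(test, *rule))
```

The learner prefers brightness because it separates the training set perfectly, while the true tank signature is noisy; checking today's behavior certifies nothing about contexts where the confound disappears.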

Humans are able to learn basic human moral concepts with reasonable quantities of data. What is the relevant context change?

Comment author: hairyfigment 11 December 2012 10:58:42PM 0 points

Humans are able to learn basic human moral concepts with reasonable quantities of data. What is the relevant context change?

Eh? Do you want a more detailed answer than the question might suggest? I thought nigerweiss et al. had good responses.

I also don't see any human culture getting Friendliness-through-AI-training right without doing something horrible elsewhere.