[link] New essay summarizing some of my latest thoughts on AI safety

Kaj_Sotala

New essay summarizing some of my latest thoughts on AI safety, ~3500 words. I explain why I think that some of the thought experiments that have previously been used to illustrate the dangers of AI are flawed and should be used very cautiously, why I'm less worried about the dangers of AI than I used to be, and what some of the remaining reasons are for why I continue to be somewhat worried.

http://kajsotala.fi/2015/10/maverick-nannies-and-danger-theses/

Backcover celebrity endorsement: "Thanks, Kaj, for a very nice write-up. It feels good to be discussing actually meaningful issues regarding AI safety. This is a big contrast to discussions I've had in the past with MIRI folks on AI safety, wherein they have generally tried to direct the conversation toward bizarre, pointless irrelevancies like "the values that would be held by a randomly selected mind", or "AIs with superhuman intelligence making retarded judgments" (like tiling the universe with paperclips to make humans happy), and so forth.... Now OTOH, we are actually discussing things of some potential practical meaning ;p ..." -- Ben Goertzel

Thanks for writing this; a couple quick thoughts:

"For example, it turns out that a learning algorithm tasked with some relatively simple tasks, such as determining whether or not English sentences are valid, will automatically build up an internal representation of the world which captures many of the regularities of the world – as a pure side effect of carrying out its task."

I think I've yet to see a paper that convincingly supports the claim that neural nets are learning natural representations of the world. For some papers that refute this claim, see e.g.

http://arxiv.org/abs/1312.6199
http://arxiv.org/abs/1412.6572

I think the Degrees of Freedom thesis is a good statement of one of the potential problems. Since it's essentially making a claim about whether a certain very complex statistical problem is identifiable, I think it's very hard to know whether it's true or not without either some serious technical analysis or some serious empirical research --- which is a reason to do that research, because if the thesis is true then that has some worrisome implications about AI safety.

http://rocknrollnerd.github.io/ml/2015/05/27/leopard-sofa.html is also relevant--tl;dr Google Photos classifies a leopard-print sofa as a leopard. I think this lends credence to the 'treacherous turn' insofar as it's an example of a classifier seeming to perform well and breaking down in edge cases.

Vika

Here's an example of recurrent neural nets learning intuitive / interpretable representations of some basic aspects of text, like keeping track of quotes and brackets: http://arxiv.org/abs/1506.02078

Kaj_Sotala

Taboo "natural representations"?
