Kaj_Sotala comments on [link] New essay summarizing some of my latest thoughts on AI safety - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Thanks for writing this; a couple quick thoughts:
I think I've yet to see a paper that convincingly supports the claim that neural nets are learning natural representations of the world. For some papers that refute this claim, see e.g.
http://arxiv.org/abs/1312.6199
http://arxiv.org/abs/1412.6572
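The second paper above (Goodfellow et al.) introduces the fast gradient sign method for constructing such adversarial perturbations. Here is a minimal sketch of the idea on a toy logistic-regression model; the weights and inputs are made-up illustrative values, not anything from the papers:

```python
import numpy as np

# Toy logistic-regression "model" with fixed, hypothetical weights.
w = np.array([2.0, -3.0, 1.0])
b = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    """Probability that x belongs to class y=1."""
    return sigmoid(w @ x + b)

def input_grad(x, y):
    """Gradient of the logistic loss -log p(y|x) with respect to the input x."""
    return (predict(x) - y) * w

def fgsm(x, y, eps=0.25):
    """Fast gradient sign method: step eps in the sign of the input gradient."""
    return x + eps * np.sign(input_grad(x, y))

x = np.array([1.0, 1.0, 1.0])  # correctly classified as y=1
y = 1.0
x_adv = fgsm(x, y)
print(predict(x), predict(x_adv))  # confidence in y=1 drops after the perturbation
```

The point of the papers is that in high-dimensional image models this small, visually imperceptible step can flip the predicted class entirely, which is what makes it evidence against the representations being "natural."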
I think the Degrees of Freedom thesis is a good statement of one of the potential problems. Since it's essentially making a claim about whether a certain very complex statistical problem is identifiable, I think it's very hard to know whether it's true or not without either some serious technical analysis or some serious empirical research --- which is a reason to do that research, because if the thesis is true then that has some worrisome implications about AI safety.
Taboo natural representations?
Without defining a natural representation (since I don't know how to), here are four properties that I think a representation should satisfy before it's called natural (I also give these in my response to Vika):
(1) Good performance on different data sets in the same domain.
(2) Good transference to novel domains.
(3) Robustness to visually imperceptible perturbations to the input image.
(4) "Canonicality": replacing the learned features with a random invertible linear transformation of the learned features should degrade performance.
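Criterion (4) can be made concrete as an experiment: scramble the learned feature space with a random invertible linear map and retrain the downstream classifier on the scrambled features. A minimal sketch of the scrambling step (the feature matrix is hypothetical; dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned features: 5 examples, 4 features each.
features = rng.normal(size=(5, 4))

# A random Gaussian matrix is invertible with probability 1; check anyway.
M = rng.normal(size=(4, 4))
assert abs(np.linalg.det(M)) > 1e-8

scrambled = features @ M

# The transformation preserves all information: the originals are recoverable.
# So any performance drop from training on `scrambled` would show that the
# particular basis of the learned features matters, which is what (4) asks for.
recovered = scrambled @ np.linalg.inv(M)
print(np.allclose(recovered, features))
```

If performance does *not* degrade, the features are only defined up to linear transformation, i.e. there is no canonical choice of representation.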
Thanks.
So to clarify, my claim was not that we'd yet have algorithms producing representations that fulfill all of these criteria. But it seems to me that something like word embeddings is moving in the direction of fulfilling them. E.g. something like this bit from the linked post:
sounds to me like it would represent clear progress towards at least #1 and #2 of your criteria.
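For concreteness, the kind of structure word embeddings are claimed to capture is usually checked with a cosine-similarity analogy test. A toy sketch, using made-up 3-d vectors rather than real learned embeddings (which are learned from corpora and have hundreds of dimensions):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up toy "embeddings", chosen by hand so the analogy holds.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

# The classic analogy test: king - man + woman should land nearest to queen.
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda word: cosine(emb[word], target))
print(best)
```

That this sort of linear regularity emerges without being explicitly trained for is the reason embeddings look like progress on criteria #1 and #2.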
I agree that the papers on adversarial examples that you cited earlier are evidence that many current models are still not capable of meeting criterion #3, but on the other hand the second paper does seem to present clear signs that the reasons for the pathologies are being uncovered and addressed, and that future algorithms will be able to avoid this class of pathology. (Caveat: I do not yet fully understand those papers, so may be interpreting them incorrectly.)