Kaj_Sotala comments on [link] New essay summarizing some of my latest thoughts on AI safety - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (27)
Thanks.
So to clarify, my claim was not that we'd yet have algorithms producing representations that would fulfill all of these criteria. But it would seem to me that something like word embeddings would be moving towards the direction of fulfilling these. E.g. something like this bit from the linked post:
sounds to me like it would be represent clear progress towards at least #1 and #2 of your criteria.
I agree that the papers on adversarial examples that you cited earlier are evidence that many current models are still not capable of meeting criteria #3, but on the other hand the second paper does seem to present clear signs that the reasons for the pathologies are being uncovered and addressed, and that future algorithms will be able to avoid this class of pathology. (Caveat: I do not yet fully understand those papers, so may be interpreting them incorrectly.)