gwern comments on Open thread, Jan. 12 - Jan. 18, 2015 - Less Wrong
Image recognition, courtesy of the deep learning revolution & Moore's Law for GPUs, seems near reaching human parity. The latest paper is "Deep Image: Scaling up Image Recognition", Wu et al 2015 (Baidu):
For another comparison, Table 3 on pg9 shows past performance. In 2012, the best performer reached 16.42%; 2013 knocked it down to 11.74%, and 2014 to 6.66% or 5.98% depending on how much of a stickler you want to be; leaving ~0.8% to go before matching human performance.
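The "~0.8%" appears to be measured against a human top-5 error rate of roughly 5.1% (the figure usually cited for ImageNet; the baseline is my assumption, since the comment doesn't state it):

```python
# Top-5 error rates by year from Table 3 of Wu et al 2015 (percent),
# compared against a ~5.1% human top-5 error rate -- the human baseline
# is my assumed reference point, not a number from the table.
best = {2012: 16.42, 2013: 11.74, 2014: 5.98}
human = 5.1
gap = best[2014] - human
print(round(gap, 2))  # remaining gap in percentage points -> 0.88
```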
EDIT: Google may have already beaten 5.98% with a 5.5% (and thus halved the remaining difference to 0.4%), according to a commenter on HN, "smhx":
On the other hand... Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
From the abstract:
I'm not sure what those or earlier results mean, practically speaking. And the increased use of data augmentation may mean that the newer neural networks don't show that behavior, pace those papers showing it's useful to add the adversarial examples to the training sets.
It seems like the workaround for that would be to fuzz the images slightly before feeding them to the neural net?
'Fuzzing' and other forms of modification (I think the general term is 'data augmentation', and there can be quite a few different ways to modify images to increase your sample size - the paper I discuss in the grandparent spends two pages or so listing all the methods it uses) aren't a fix.
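Data augmentation of the kind those papers use can be sketched in a few lines; the transformations here (flip, crop, brightness jitter) are a minimal illustrative subset, whereas the Baidu paper lists many more (color casting, vignetting, lens distortion, etc.):

```python
import numpy as np

def augment(img, rng, crop=24):
    """Return one randomly augmented copy of an H x W x C image in [0, 1].

    A minimal sketch of standard augmentations: random crop, random
    horizontal flip, brightness jitter. Real pipelines use many more.
    """
    h, w = img.shape[:2]
    y = rng.integers(0, h - crop + 1)   # random crop position
    x = rng.integers(0, w - crop + 1)
    out = img[y:y + crop, x:x + crop]
    if rng.random() < 0.5:              # random horizontal flip
        out = out[:, ::-1]
    # brightness jitter, clipped back into valid range
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)
    return out

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
# 8 distinct training variants manufactured from a single image
batch = np.stack([augment(img, rng) for _ in range(8)])
print(batch.shape)  # -> (8, 24, 24, 3)
```

The point of the comment stands: these transformations enlarge the sample, but they don't remove the adversarial-example phenomenon.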
In this case, they say they are using AlexNet, which already does some data augmentation (pg5-6).
Further, if you treat the adversarial examples as another data-augmentation trick and retrain the networks on them, you can still generate fresh adversarial examples against the retrained networks.
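The usual construction of such examples (the fast-gradient-sign method from the contemporaneous Goodfellow et al 2014 paper, not necessarily the exact method of the papers above) perturbs an input along the sign of the loss gradient; a toy sketch on a linear classifier, where the guarantee that the true-label probability drops follows from convexity of the loss in the input:

```python
import numpy as np

# Toy linear "network": logits = W @ x, softmax probabilities on top.
rng = np.random.default_rng(1)
W = rng.standard_normal((10, 64))   # 10 classes, 64-dim input
x = rng.standard_normal(64)
true_label = 3

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

p = softmax(W @ x)
# Gradient of the cross-entropy loss w.r.t. the input x:
#   grad_x = W^T (p - onehot(true_label))
onehot = np.eye(10)[true_label]
grad = W.T @ (p - onehot)

# Fast-gradient-sign step: a small, per-dimension-bounded perturbation
# chosen to increase the loss as much as possible to first order.
eps = 0.25
x_adv = x + eps * np.sign(grad)

p_clean = softmax(W @ x)[true_label]
p_adv = softmax(W @ x_adv)[true_label]
print(p_clean, p_adv)  # the perturbed copy gets lower true-label probability
```

For deep networks the same recipe applies with the gradient obtained by backpropagation, and retraining on the resulting examples still leaves the retrained model open to a fresh round of the same attack, as the comment notes.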
Huh. That's surprising. So what are humans doing differently? Are we doing anything differently? Should we wonder whether someone given total knowledge of my visual processing could show me a picture that I was convinced was a lion even though it was essentially random?
Those rather are the questions, aren't they? My thought when the original paper showed up on HN was that we can't do anything remotely similar to constructing adversarial examples for a human visual cortex, and we already know of a lot of visual illusions (I'm particularly thinking of the Magic Eye autostereograms)... "Perhaps there are thoughts we cannot think".
Hard to see how we could test it without solving AI, though.
I don't think we'd need to solve AI to test this. If we could get a detailed enough understanding of how the visual cortex functions, it might be doable. Alternatively, we could try it on a very basic uploaded mouse or similar creature. On the other hand, if we can upload mice then we're pretty close to uploading people, and if we can upload people we've got AI.
I'm not sure if NNs already do this, but perhaps applying augmentation to the input at runtime might help? Similar to how humans can look at things in different lights or from different angles if needed.
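Something like this does exist: averaging predictions over several augmented copies at test time (AlexNet itself averaged over ten crops/flips at test time). A sketch, where the model and the particular augmentations are hypothetical stand-ins:

```python
import numpy as np

def predict_with_tta(model, img, rng, n=10):
    """Average a model's class probabilities over n augmented copies of img.

    A sketch of test-time augmentation: `model` is any callable mapping an
    image to a probability vector; the augmentations here (random flip plus
    tiny noise) are illustrative choices, not from any particular paper.
    """
    preds = []
    for _ in range(n):
        aug = img[:, ::-1] if rng.random() < 0.5 else img
        aug = aug + rng.normal(0.0, 0.01, size=aug.shape)
        preds.append(model(aug))
    return np.mean(preds, axis=0)

# Toy stand-in "model": mean intensity decides between two classes.
def model(img):
    p = 1.0 / (1.0 + np.exp(-(img.mean() - 0.5) * 10))
    return np.array([1.0 - p, p])

rng = np.random.default_rng(0)
img = rng.random((8, 8))
probs = predict_with_tta(model, img, rng)
print(probs, probs.sum())  # averaged probabilities still sum to 1
```

Whether this helps against adversarial examples is another matter: the papers suggest the fooling perturbations often survive small transformations.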
That is shocking and somewhat disturbing.
To update: the latest version of the Baidu paper now claims to have gone from the 5.98% above to 4.58%.
EDIT: on 2 June, a notification (Reddit discussion) was posted; apparently the Baidu team made far more than the usual number of submissions to test how their neural network was performing on the held-out ImageNet sample. This is problematic because it means that some amount of their performance gain is probably due to overfitting (tweak a setting, submit, see if performance improves, repeat). The Google team is not accused of doing this, so probably the true state-of-the-art error rate is somewhere between the 3rd Baidu version and the last Google rate.
Human performance on image-recognition surpassed by MSR? "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification", He et al 2015 (Reddit; emphasis added):
(Surprised it wasn't a Baidu team who won.) I suppose now we'll need even harder problem sets for deep learning... Maybe video? Doesn't seem like a lot of work on that yet compared to static image recognition.
The record has apparently been broken again: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" (HN, Reddit), Ioffe & Szegedy 2015:
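The core transform in the Ioffe & Szegedy paper is simple to state: standardize each feature over the mini-batch, then apply a learned scale and shift. A minimal forward-pass sketch (training-mode statistics only; the paper also maintains running averages for inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize activations x of shape (batch, features).

    The basic transform from Ioffe & Szegedy 2015: per-feature
    standardization over the mini-batch, followed by a learned
    scale (gamma) and shift (beta).
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(64, 16))   # badly scaled activations
y = batch_norm(x, gamma=np.ones(16), beta=np.zeros(16))
print(y.mean(axis=0).round(6), y.std(axis=0).round(2))
# each feature now has ~zero mean and ~unit variance across the batch
```

Keeping intermediate activations standardized like this is what lets them train with much higher learning rates, which is where the speedup in the title comes from.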
On the human-level accuracy rate: