
William_S comments on Open thread, Jan. 12 - Jan. 18, 2015 - Less Wrong Discussion

6 Post author: Gondolinian 12 January 2015 12:39AM




Comment author: gwern 16 January 2015 01:31:59AM 14 points

Image recognition, courtesy of the deep learning revolution & Moore's Law for GPUs, seems to be nearing human parity. The latest paper is "Deep Image: Scaling up Image Recognition", Wu et al 2015 (Baidu):

We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning. The key components are a custom-built supercomputer dedicated to deep learning, a highly optimized parallel algorithm using new strategies for data partitioning and communication, larger deep neural network models, novel data augmentation approaches, and usage of multi-scale high-resolution images. On one of the most challenging computer vision benchmarks, the ImageNet classification challenge, our system has achieved the best result to date, with a top-5 error rate of 5.98% - a relative 10.2% improvement over the previous best result.

...The result is the custom-built supercomputer, which we call Minwa. It is comprised of 36 server nodes, each with 2 six-core Intel Xeon E5-2620 processors. Each server contains 4 Nvidia Tesla K40m GPUs and one FDR InfiniBand (56Gb/s) which is a high-performance low-latency interconnection and supports RDMA. The peak single precision floating point performance of each GPU is 4.29TFlops and each GPU has 12GB of memory. Thanks to the GPUDirect RDMA, the InfiniBand network interface can access the remote GPU memory without involvement from the CPU. All the server nodes are connected to the InfiniBand switch. Figure 1 shows the system architecture. The system runs Linux with CUDA 6.0 and MPI MVAPICH2, which also enables GPUDirect RDMA. In total, Minwa has 6.9TB host memory, 1.7TB device memory, and about 0.6PFlops theoretical single precision peak performance...We are now capable of building very large deep neural networks up to hundreds of billions of parameters thanks to dedicated supercomputers such as Minwa.

...As shown in Table 3, the accuracy has been optimized a lot during the last three years. The best result of ILSVRC 2014, top-5 error rate of 6.66%, is not far from human recognition performance of 5.1% [18]. Our work marks yet another exciting milestone with the top-5 error rate of 5.98%, not just setting the new record but also closing the gap between computers and humans by almost half.
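The cluster figures in the quoted passage are internally consistent; a quick sanity check (all inputs are taken from the quote, nothing new is assumed):

```python
# Sanity-check of the Minwa figures quoted above.
nodes = 36
gpus_per_node = 4
tflops_per_gpu = 4.29            # single-precision peak of a Tesla K40m
gpu_mem_gb = 12

total_gpus = nodes * gpus_per_node
peak_pflops = total_gpus * tflops_per_gpu / 1000.0   # TFlops -> PFlops
device_mem_tb = total_gpus * gpu_mem_gb / 1000.0     # GB -> TB (decimal)

print(total_gpus)                 # 144 GPUs
print(round(peak_pflops, 2))      # 0.62, i.e. "about 0.6PFlops"
print(round(device_mem_tb, 2))    # 1.73, i.e. the quoted "1.7TB device memory"
```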

For another comparison, Table 3 on pg9 shows past performance: in 2012 the best performer reached 16.42%; 2013 knocked it down to 11.74%; and 2014 to 6.66%, or 5.98% depending on how much of a stickler you want to be, leaving a gap to human performance of roughly 0.9 percentage points.
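The "closing the gap between computers and humans by almost half" claim checks out arithmetically, using only the numbers quoted above:

```python
# The numbers behind "closing the gap ... by almost half".
human = 5.1          # human top-5 error (%)
best_2014 = 6.66     # best ILSVRC 2014 result
deep_image = 5.98    # Deep Image's reported result

gap_before = best_2014 - human
gap_after = deep_image - human
print(round(gap_after, 2))                   # 0.88 points left to human parity
print(round(1 - gap_after / gap_before, 2))  # 0.44: the gap shrank by ~44%
```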

EDIT: Google may have already beaten 5.98% with a 5.5% (and thus halved the remaining difference to 0.4%), according to a commenter on HN, "smhx":

googlenet already has 5.5%, they published it at a bay area meetup, but did not officially publish the numbers yet!

Comment author: William_S 18 January 2015 04:12:40PM 4 points

On the other hand... "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images", Nguyen et al 2015

From the abstract:

... A recent study revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion a library). Here we show a related result: it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling with certainty that white noise static is a lion). Specifically, we take convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and then find images with evolutionary algorithms or gradient ascent that DNNs label with high confidence as belonging to each dataset class. It is possible to produce images totally unrecognizable to human eyes that DNNs believe with near certainty are familiar objects. Our results shed light on interesting differences between human vision and current DNNs, and raise questions about the generality of DNN computer vision.
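The gradient-ascent route to such fooling images can be illustrated on a toy model: start from noise and climb a model's class score until it is near-certain, without the input ever resembling real data. The linear softmax model below is an assumption of this sketch, not the paper's setup (the paper attacks ImageNet/MNIST convnets):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 10))        # "trained" weights (random for the sketch)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

x = rng.normal(size=64)              # start from white noise
target = 3
for _ in range(2000):
    p = softmax(W.T @ x)
    grad = W[:, target] - W @ p      # d log p[target] / dx for a linear model
    x += 0.01 * grad                 # gradient ascent on the class log-prob

print(softmax(W.T @ x)[target])      # near-certain on an unrecognizable input
```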

Comment author: gwern 18 January 2015 05:58:02PM 3 points

I'm not sure what those or earlier results mean, practically speaking. And the increased use of data augmentation may mean that the newer neural networks don't show that behavior, pace those papers showing it's useful to add the adversarial examples to the training sets.

Comment author: JoshuaZ 01 February 2015 09:13:53PM 0 points

It seems like the workaround for that is to fuzz the images slightly before feeding them to the neural net?
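As a sketch, the proposal amounts to something like the following; `fuzz` is a hypothetical helper, not from any of the papers discussed:

```python
import numpy as np

# Jitter the input with small random noise before classifying, hoping the
# crafted adversarial perturbation is washed out.
def fuzz(image, rng, sigma=0.02):
    """Return a slightly noised copy of the image, clipped to [0, 1]."""
    return np.clip(image + rng.normal(scale=sigma, size=image.shape), 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((8, 8))
fuzzed = fuzz(img, rng)
print(fuzzed.shape)               # (8, 8): same image, slightly perturbed
```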

Comment author: gwern 01 February 2015 10:12:01PM 1 point

'Fuzzing' and other forms of modification (I think the general term is 'data augmentation', and there can be quite a few different ways to modify images to increase your sample size - the paper I discuss in the grandparent spends two pages or so listing all the methods it uses) aren't a fix.
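A minimal stand-in for the kind of data augmentation being described; the Deep Image pipeline (color casting, vignetting, lens distortion, etc.) is far richer than these flips, crops, and noise:

```python
import numpy as np

def augment(img, rng):
    """Return a list of augmented variants of a (H, W) image array."""
    h, w = img.shape
    variants = [img,
                img[:, ::-1],        # horizontal flip
                img[::-1, :]]        # vertical flip
    # random crop, padded back to the original size with edge values
    top, left = rng.integers(0, h // 4), rng.integers(0, w // 4)
    crop = img[top:top + 3 * h // 4, left:left + 3 * w // 4]
    variants.append(np.pad(crop, ((0, h - crop.shape[0]),
                                  (0, w - crop.shape[1])), mode="edge"))
    variants.append(img + rng.normal(scale=0.01, size=img.shape))  # light noise
    return variants

rng = np.random.default_rng(0)
image = rng.random((32, 32))
print(len(augment(image, rng)))      # 5 variants per input image
```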

In this case, they say they are using AlexNet which already does some data augmentation (pg5-6).

Further, if you treat the adversarial examples as another data augmentation trick and train the networks with the old examples, you can still generate more adversarial examples.

Comment author: JoshuaZ 01 February 2015 10:16:22PM 1 point

Huh. That's surprising. So what are humans doing differently? Are we doing anything differently? Should we wonder if someone given total knowledge of my optical processing could show me a picture that I was convinced was a lion even though it was essentially random?

Comment author: gwern 01 February 2015 11:52:02PM 3 points

Those rather are the questions, aren't they? My thought when the original paper showed up on HN was that we can't do anything remotely similar to constructing adversarial examples for a human visual cortex, and we already know of a lot of visual illusions (I'm particularly thinking of the Magic Eye autostereograms)... "Perhaps there are thoughts we cannot think".

Hard to see how we could test it without solving AI, though.

Comment author: JoshuaZ 02 February 2015 12:02:55AM 1 point

I don't think we'd need to solve AI to test this. If we could get a detailed enough understanding of how the visual cortex functions it might be doable. Alternatively, we could try it on a very basic uploaded mouse or similar creature. On the other hand, if we can upload mice then we're pretty close to uploading people, and if we can upload people we've got AI.

Comment author: ShardPhoenix 02 April 2015 03:49:09AM 0 points

I'm not sure if NNs already do this, but perhaps applying augmentation to the input at runtime might help? Similar to how humans can look at things in different lights or at different angles if needed.
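Sketched out, the suggestion is to classify several transformed views of the input and average the scores, much as a person might inspect an object from several angles; `classify` below is a stand-in model, not anything from the thread:

```python
import numpy as np

def tta_predict(classify, image):
    """Average a classifier's scores over several views of one image."""
    views = [image,
             image[:, ::-1],              # horizontal flip
             image[::-1, :],              # vertical flip
             np.roll(image, 1, axis=0),   # shift down one row
             np.roll(image, 1, axis=1)]   # shift right one column
    return np.mean([classify(v) for v in views], axis=0)

img = np.arange(16.0).reshape(4, 4)
scores = tta_predict(lambda x: np.array([x[0, 0], x[-1, -1]]), img)
print(scores)                             # averaged over the 5 views: [6., 11.]
```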