
gwern comments on Open thread, Jan. 12 - Jan. 18, 2015 - Less Wrong Discussion

6 Post author: Gondolinian 12 January 2015 12:39AM



Comment author: gwern 16 January 2015 01:31:59AM 14 points

Image recognition, courtesy of the deep learning revolution & Moore's Law for GPUs, seems to be nearing human parity. The latest paper is "Deep Image: Scaling up Image Recognition", Wu et al 2015 (Baidu):

We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning. The key components are a custom-built supercomputer dedicated to deep learning, a highly optimized parallel algorithm using new strategies for data partitioning and communication, larger deep neural network models, novel data augmentation approaches, and usage of multi-scale high-resolution images. On one of the most challenging computer vision benchmarks, the ImageNet classification challenge, our system has achieved the best result to date, with a top-5 error rate of 5.98% - a relative 10.2% improvement over the previous best result.
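For context, "top-5 error" counts an image as correct if the true label appears anywhere among the model's five highest-scoring guesses. A minimal sketch of the metric (toy scores, not the actual ILSVRC evaluation code):

```python
def top5_error(scores, labels):
    """Fraction of examples whose true label is NOT among the five
    highest-scoring classes (the ILSVRC 'top-5' metric)."""
    errors = 0
    for row, label in zip(scores, labels):
        # indices of the five highest scores in this row
        top5 = sorted(range(len(row)), key=lambda i: -row[i])[:5]
        if label not in top5:
            errors += 1
    return errors / len(labels)

# Two toy examples over 10 classes: the first true label is ranked 1st
# (a top-5 hit), the second is ranked 6th (a miss).
scores = [[10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
          [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]
print(top5_error(scores, [0, 5]))  # → 0.5
```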

...The result is the custom-built supercomputer, which we call Minwa. It is comprised of 36 server nodes, each with 2 six-core Intel Xeon E5-2620 processors. Each server contains 4 Nvidia Tesla K40m GPUs and one FDR InfiniBand (56Gb/s) which is a high-performance low-latency interconnection and supports RDMA. The peak single precision floating point performance of each GPU is 4.29TFlops and each GPU has 12GB of memory. Thanks to the GPUDirect RDMA, the InfiniBand network interface can access the remote GPU memory without involvement from the CPU. All the server nodes are connected to the InfiniBand switch. Figure 1 shows the system architecture. The system runs Linux with CUDA 6.0 and MPI MVAPICH2, which also enables GPUDirect RDMA. In total, Minwa has 6.9TB host memory, 1.7TB device memory, and about 0.6PFlops theoretical single precision peak performance...We are now capable of building very large deep neural networks up to hundreds of billions of parameters thanks to dedicated supercomputers such as Minwa.

...As shown in Table 3, the accuracy has been optimized a lot during the last three years. The best result of ILSVRC 2014, top-5 error rate of 6.66%, is not far from human recognition performance of 5.1% [18]. Our work marks yet another exciting milestone with the top-5 error rate of 5.98%, not just setting the new record but also closing the gap between computers and humans by almost half.

For another comparison, Table 3 on pg. 9 shows past performance. In 2012, the best performer reached 16.42%; 2013 knocked it down to 11.74%; and 2014 to 6.66%, or to 5.98% depending on how much of a stickler you want to be; leaving a gap of ~0.9% to human performance.
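The paper's headline figures check out arithmetically; a quick sanity check of the "relative 10.2% improvement" and "closing the gap by almost half" claims:

```python
# Sanity-checking the reported numbers against the quoted abstract.
human, prev_best, deep_image = 5.1, 6.66, 5.98  # top-5 error rates, %

# Relative improvement over the ILSVRC 2014 winner
rel_improvement = (prev_best - deep_image) / prev_best
print(f"{rel_improvement:.1%}")  # → 10.2%

# Fraction of the human-machine gap that was closed
gap_before = prev_best - human    # 1.56 points
gap_after = deep_image - human    # 0.88 points
closed = 1 - gap_after / gap_before
print(f"{closed:.0%}")  # → 44%, i.e. "almost half"
```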

EDIT: Google may have already beaten 5.98% with a 5.5% (and thus halved the remaining difference to 0.4%), according to a commenter on HN, "smhx":

googlenet already has 5.5%, they published it at a bay area meetup, but did not officially publish the numbers yet!

Comment author: gwern 09 February 2015 07:11:14PM 0 points

Human performance on image-recognition surpassed by MSR? "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification", He et al 2015 (Reddit; emphasis added):

Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
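The two ingredients in the abstract are simple to state. A minimal scalar sketch of PReLU (the paper learns the negative slope `a`, initializing it at 0.25) and of the companion "He" weight initialization (std = sqrt(2 / fan_in), derived for rectifier nonlinearities); real implementations operate on tensors, per channel:

```python
import math
import random

def prelu(x, a=0.25):
    """Parametric ReLU: identity for positive inputs; for negative
    inputs, a learned slope `a` (0.25 at initialization in the paper)."""
    return x if x > 0 else a * x

print(prelu(2.0))   # → 2.0
print(prelu(-2.0))  # → -0.5

def he_init(fan_in, fan_out):
    """Draw a fan_in x fan_out weight matrix with std sqrt(2 / fan_in),
    which keeps activation variance stable through deep rectified nets."""
    std = math.sqrt(2.0 / fan_in)
    return [[random.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]
```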

(Surprised it wasn't a Baidu team who won.) I suppose now we'll need even harder problem sets for deep learning... Maybe video? There doesn't seem to be much work on that yet compared to static image recognition.

Comment author: gwern 14 February 2015 02:50:49AM 0 points

The record has apparently been broken again: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" (HN, Reddit), Ioffe & Szegedy 2015:

Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.

...The current reported best result on the ImageNet Large Scale Visual Recognition Competition is reached by the Deep Image ensemble of traditional models Wu et al. (2015). Here we report a top-5 validation error of 4.9% (and 4.82% on the test set), which improves upon the previous best result despite using 15X fewer parameters and lower resolution receptive field. Our system exceeds the estimated accuracy of human raters according to Russakovsky et al. (2014).
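The transform described in the abstract is compact: standardize each activation over the mini-batch, then let the network undo the normalization where useful via a learned scale and shift. A minimal per-feature sketch (the real method works per channel, with `gamma` and `beta` learned, plus running statistics for inference):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch of scalar activations to zero mean and
    unit variance, then apply learned scale (gamma) and shift (beta)."""
    m = sum(batch) / len(batch)
    var = sum((x - m) ** 2 for x in batch) / len(batch)
    return [gamma * (x - m) / math.sqrt(var + eps) + beta for x in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(x, 3) for x in out])  # ≈ [-1.342, -0.447, 0.447, 1.342]
```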

On the human-level accuracy rate:

... About ~3% is an optimistic estimate without my "silly errors".

...I don't at all intend this post to somehow take away from any of the recent results: I'm very impressed with how quickly multiple groups have improved from 6.6% down to ~5% and now also below! I did not expect to see such rapid progress. It seems that we're now surpassing a dedicated human labeler. And imo, when we are down to 3%, we'd be matching the performance of a hypothetical super-dedicated fine-grained expert human ensemble of labelers.