Latest AI success implies that strong AI may be near.

"There's something magical about Recurrent Neural Networks (RNNs). I still remember when I trained my first recurrent network for Image Captioning. Within a few dozen minutes of training my first baby model (with rather arbitrarily-chosen hyperparameters) started to generate very nice looking descriptions of images that were on the edge of making sense. Sometimes the ratio of how simple your model is to the quality of the results you get out of it blows past your expectations, and this was one of those times. What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I've in fact reached the opposite conclusion). Fast forward about a year: I'm training RNNs all the time and I've witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me. This post is about sharing some of that magic with you.

We'll train RNNs to generate text character by character and ponder the question "how is that even possible?"

By the way, together with this post I am also releasing code on Github that allows you to train character-level language models based on multi-layer LSTMs. You give it a large chunk of text and it will learn to generate text like it one character at a time. You can also use it to reproduce my experiments below. But we're getting ahead of ourselves; What are RNNs anyway?"

 

https://karpathy.github.io/2015/05/21/rnn-effectiveness/

Edited: formatting

An interesting post, but I don't know if it implies that "strong AI may be near". Indeed, the author has written another post in which he says that we are "really, really far away" from human-level intelligence: https://karpathy.github.io/2012/10/22/state-of-computer-vision/.

But a year before the author made this prediction:

My impression from this exercise is that it will be hard to go above 80%, but I suspect improvements might be possible up to range of about 85-90%, depending on how wrong I am about the lack of training data.

And then 4 years later:

2015 update: Obviously this prediction was way off, with state of the art now in 95%, as seen in this Kaggle competition leaderboard. I'm impressed!

A few percent is a huge deal on a machine learning benchmark, because improving each percentage point is exponentially harder than the previous.

I'm not saying I think strong AI is really close, at least not on the basis of RNNs becoming more popular. But it's worth noting that experts can underestimate progress just as easily as overestimate it.

If you're being generous, you might take the apparent wide applicability of simple techniques and moderate-to-massive computing power as a sign (given that it's the exact opposite of old-style approaches) that AGI might not be as hard as we think. It does match better with how brains work.

But this particular result is in no way a step towards AI, no. It's one guy playing around with well-known techniques that are being used vastly more effectively elsewhere, e.g. in Google's image labelling. This article should only push your posteriors around if you were unaware of previous work.

Discussion on Hacker News. Definitely an interesting article, very readable and (to me) entertaining. But I agree with interstice that it doesn't say much about strong AI.

Someone please make a program that I can feed my favorite music, and then it will generate an infinite stream of similar music.

I've looked into this, but didn't get very far with it. It seems eminently doable, at least with basic mood music. The existing programs to classify music and predict satisfaction don't seem all that good, though, at least for sophisticated constructs.

Someone please make a program that I can feed my favorite music, and then it will generate an infinite stream of similar music.

The possibilities are endless.

Isn't that the whole point of Pandora?

Nope, Pandora chooses from existing songs. I would like to hear generated music -- like gwern posted, only generated in real time, an infinite stream. With no copyrights!

I tried Pandora once, years ago, and the music it found felt completely dissimilar to what I wanted. This could be because (if I understand the system correctly) people are tagging the songs, and the tags they give to the songs are probably unrelated to what I like about them.

Ah. Well, I don't think that generating good music is a solved problem to start with. Once we can generate good music on demand, we can start thinking about matching tastes and emotional states.

Please fix the formatting. Copy it into a text editor and back or something.

You can see more results here: Image Annotation Viewer

Judging generously, but based on only about two dozen or so image captions, I estimate it gives a passably accurate caption about one third of the time. This may be impressive given the simplicity of the model, but it doesn't seem unreasonably effective to me, and I don't immediately see the relevance to strong AI.

If you can generate text, the horrifyingly scary possibility is that you might be able to generate code, and modify your own code.

And if you can generate text that describes a picture, then this seems similar to generating text/code to solve a problem.

They used it to generate code, but not its own code. It is described in the text.

The craziness it produced was not code; it merely looked like code. It's a neat example, but in that particular case not much better than an N-gram Markov chain.

How much understanding should we expect from even a powerful AI, though? All it's being fed is a long stream of C text, with no other information than that - it gets no runtime output, no binary equivalents, no library definitions, no feedback on its own compression output... I'm not sure what a human with no knowledge of programming would learn in this context either other than to write C-looking gibberish (which, unlike generated images or music, we are not much interested in the esthetics of). The RNN might be doing extremely well, it's hard to say.

It would be a better criticism if, working on parse trees or something, RNNs could be shown to be unable to learn to write programs which satisfy specified properties. (Something like the neural TM work but less low-level.) Or anything really, which involves asking the RNN to do something, rather than basically make the RNN hallucinate and debate how realistic its hallucinations look.

Indeed, parse trees would be the way to go. There is already a whole field of genetic algorithms, so one could look at how they work and combine them with RNNs. Humans rarely write code that runs correctly or even compiles the first time, and similarly the RNNs could improve the program iteratively.

I'd say the RNN is doing well to produce pretend code of this quality, as Antisuji says below.

Syntactically it's quite a bit better than an N-gram Markov chain: it gets indentation exactly right, it balances parentheses, braces, and comment start/end markers, delimits strings with quotation marks, and so on. You're right that it's no better than a Markov chain at understanding the "code" it's producing, at least at the level a human programmer does.
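
For anyone who wants to see the baseline being compared against, here is a minimal sketch of a character-level N-gram Markov chain in Python. It is not the code from the linked article; the function names, the toy corpus, and the choice of a 3-character context are all assumptions made for illustration. The model only ever looks at the last few characters, so it has no way of remembering a brace or parenthesis that was opened further back than its context window, which is roughly why its output tends to drift out of balance where the RNN's does not.

```python
import random
from collections import defaultdict

def train_char_ngram(text, order=3):
    """Map each `order`-character context to the characters that followed it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model

def generate(model, seed, length=200, rng=None):
    """Extend `seed` one character at a time by sampling observed successors.

    The context size is taken to be len(seed), so the seed should be exactly
    as long as the `order` used in training (an assumption of this sketch).
    """
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    order = len(seed)
    out = seed
    for _ in range(length):
        successors = model.get(out[-order:])
        if not successors:  # context never seen in training: stop early
            break
        out += rng.choice(successors)
    return out

# Toy corpus standing in for real C source (hypothetical example text).
corpus = "int main(void) { if (x) { y(); } return 0; }\n" * 3
model = train_char_ngram(corpus, order=3)
sample = generate(model, seed="int", length=80)
print(sample)
```

Every four-character window of the output is copied from somewhere in the training text, so it looks locally plausible, but nothing in the model keeps count of nesting depth. Training the same sketch on a large body of real C and comparing brace balance against the RNN samples in the article would be a fairer test than this toy corpus.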