This is a linkpost for https://openai.com/blog/dall-e/
My own take: Cool, not super surprising given GPT-3 and Image GPT. I look forward to seeing what a bigger version of this would do, so that we could get a sense of how much it improves with scale. I'm especially interested in its performance on Raven's Progressive Matrices.
While other media would undoubtedly improve the model's understanding of concepts that are hard to express through text, I've never bought the idea that they would do much for AGI. Text has more than enough in it to capture intelligent thought; it is the relations and structure that matter, above all else. If this weren't true, one wouldn't expect competent deafblind people to exist, but they do. They succeed even in spite of an evolutionary history with practically no surviving deafblind ancestors! Clearly the modules that make humans intelligent, in a way that other animals and things are not, do not depend on multisensory data.