Playing with DALL·E 2
I got access to DALL·E 2 yesterday. Here are some pretty pictures! My goal was to understand what DE2 does well, and what it has trouble understanding or generating. My general hypothesis was that it would do a better job with things that are easy to find on the internet (cute animals, digital scifi things, famous art) and a worse job with more abstract or unusual things.

Here's how it works: you put in a description of a picture, it thinks for ~20 seconds, and then it produces 10 images that are variations on that description. How diverse those images are varies quite a bit depending on the prompt.

Let's see some puppies!

goldendoodle puppy in play position

One thing to be aware of when you see amazing pictures that DE2 generated is that there is some cherry picking going on. It often takes a few prompts to find something awesome, so whoever posted the picture might have looked at dozens of images or more.

Still, this is pretty great! Those are recognizably goldendoodle puppies, mostly in something approximating play position. You can see that the proportions in the generated images are not quite right, and some of the detail is off if you look closely. For instance, the front legs are too long here, the face isn't quite right, and the ears are a bit weird. Still, it's pretty amazing given that it generated this from scratch. Check out how realistic the grass looks. I also like that the background is blurred, though not quite in the way that a camera would do it -- the transition is too abrupt.

Ok, but the point of this isn't that they have a great image generation model, though it's clearly that. The key thing is its magical ability to actually follow instructions or descriptions of images. Particularly interesting is compositionality -- can it combine concepts to generate something it's never seen before? Answer: yes!

pop art kittens

The concept of "kitten" is pretty simple, though note that a kitten can be rendered in a ton of ways, from line drawings t
