Google has announced their latest text-to-image generation model, Parti. They provide a few prompts and showcase the differences between models with 350M, 750M, 3B, and 20B parameters.
One difference from last week's Imagen is the modeling approach: Imagen and DALL-E 2 are diffusion-based models, whereas Parti is an autoregressive sequence-to-sequence model that scales up a Transformer paired with a ViT-VQGAN image tokenizer.
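To make the autoregressive idea concrete, here's a minimal PyTorch sketch: a Transformer encoder reads text tokens, and the decoder emits discrete image codes one at a time, which a VQGAN-style decoder would then turn back into pixels. All names and sizes here are toy placeholders of mine, not Parti's actual architecture or dimensions.

```python
import torch
import torch.nn as nn

# Toy sizes for illustration only; Parti's real tokenizer is a ViT-VQGAN
# with its own codebook size and token-grid resolution.
TEXT_VOCAB = 32_000   # text token vocabulary (encoder input)
IMAGE_VOCAB = 8_192   # discrete image codes (decoder output)
BOS = IMAGE_VOCAB     # extra id used to start decoding

class TinyParti(nn.Module):
    """Toy text-to-image-token model: text tokens in, image codes out."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        self.text_emb = nn.Embedding(TEXT_VOCAB, d_model)
        self.img_emb = nn.Embedding(IMAGE_VOCAB + 1, d_model)  # +1 for BOS
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.head = nn.Linear(d_model, IMAGE_VOCAB)

    @torch.no_grad()
    def generate(self, text_ids: torch.Tensor, steps: int) -> torch.Tensor:
        # Encode the prompt once, then decode image codes autoregressively.
        memory = self.transformer.encoder(self.text_emb(text_ids))
        out = torch.full((text_ids.size(0), 1), BOS)
        for _ in range(steps):
            h = self.transformer.decoder(self.img_emb(out), memory)
            next_code = self.head(h[:, -1]).argmax(-1, keepdim=True)
            out = torch.cat([out, next_code], dim=1)
        # A VQGAN-style decoder would map these codes back to pixels.
        return out[:, 1:]

model = TinyParti()
prompt = torch.randint(0, TEXT_VOCAB, (1, 16))  # 16 fake text tokens
codes = model.generate(prompt, steps=8)
print(codes.shape)  # torch.Size([1, 8])
```

The key contrast with diffusion models: there is no iterative denoising of a pixel or latent array, just next-token prediction over a discrete image vocabulary, which is why scaling recipes from large language models carry over.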
The announcement says:
Parti and Imagen are complementary in exploring two different families of generative models – autoregressive and diffusion, respectively.
...
We have decided not to release our Parti models, code, or data for public use without further safeguards in place.
There's an interesting thread on Parti by Jason Baldridge here, and a short overview by Google here. I wonder how well the 20B model handles text characters inside images compared to diffusion-based approaches like Imagen and DALL-E 2. Judging by the samples, the difference in the readability of rendered text, compared to DALL-E 2, is laughable.
They have provided some examples after the references section, including direct comparisons with DALL-E 2 for text rendered in images. PartiPrompts also looks like a good collection of novel prompts for evaluation.
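If you want to poke at PartiPrompts yourself, something like the sketch below should work, assuming the prompts are published as a TSV file in the google-research/parti repository; the URL and column layout are my assumptions, so verify them against the official release.

```python
import csv
import urllib.request

# Assumed location of the PartiPrompts TSV; check the official
# google-research/parti repository for the actual file and columns.
URL = "https://raw.githubusercontent.com/google-research/parti/main/PartiPrompts.tsv"

with urllib.request.urlopen(URL) as resp:
    lines = resp.read().decode("utf-8").splitlines()

prompts = list(csv.DictReader(lines, delimiter="\t"))
print(len(prompts))
print(prompts[0])  # one dict per prompt, keyed by the TSV's header row
```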