Google's new text-to-image model - Parti, a demonstration of scaling benefits

Kayden


by Kayden

22nd Jun 2022


Google has announced its latest text-to-image generation model, Parti. They provide a few prompts and showcase the differences between model sizes of 350M, 750M, 3B and 20B parameters.

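One difference from the recently announced Imagen is that Parti is autoregressive rather than diffusion-based. Imagen and DALL-E 2 are diffusion models, whereas Parti is a sequence-to-sequence Transformer that predicts discrete image tokens, with a VQGAN-style image tokenizer converting between tokens and pixels. The GAN component is part of the tokenizer, not the generator as a whole.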
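To make the autoregressive side of that distinction concrete, here is a toy sketch of the token-by-token loop. It is not Parti's code: `toy_next_token_weights` is a hypothetical stand-in for the Transformer decoder, `detokenize` is a hypothetical stand-in for the image tokenizer's decoder, and the codebook and grid sizes are made up for illustration. A diffusion model would instead start from noise and refine the whole image over several denoising steps.

```python
import random

# Hypothetical sizes, chosen only for illustration (not Parti's real values).
CODEBOOK_SIZE = 16  # number of discrete image tokens in the codebook
GRID = 4            # the image is a GRID x GRID grid of tokens


def toy_next_token_weights(text_tokens, image_tokens):
    """Stand-in for the Transformer decoder (assumption, not a real model).

    Returns one positive weight per codebook entry, conditioned on the text
    and on every image token generated so far.
    """
    rng = random.Random(repr((list(text_tokens), list(image_tokens))))
    return [rng.random() + 1e-6 for _ in range(CODEBOOK_SIZE)]


def generate_image_tokens(text_tokens, seed=0):
    """Sample image tokens one at a time, each conditioned on the prefix."""
    sampler = random.Random(seed)
    image_tokens = []
    for _ in range(GRID * GRID):
        weights = toy_next_token_weights(text_tokens, image_tokens)
        next_token = sampler.choices(range(CODEBOOK_SIZE), weights=weights, k=1)[0]
        image_tokens.append(next_token)
    return image_tokens


def detokenize(image_tokens):
    """Stand-in for the tokenizer's decoder: lay the tokens out as a grid.

    A real decoder would map each token to a patch of pixels.
    """
    return [image_tokens[r * GRID:(r + 1) * GRID] for r in range(GRID)]


tokens = generate_image_tokens(["a", "photo", "of", "a", "cat"])
grid = detokenize(tokens)
for row in grid:
    print(row)
```

The point of the sketch is only the control flow: each new token depends on the prompt and on all earlier tokens, the same shape as text generation in a language model.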

The announcement says:

Parti and Imagen are complementary in exploring two different families of generative models – autoregressive and diffusion, respectively.
...

We have decided not to release our Parti models, code, or data for public use without further safeguards in place.

There's an interesting thread on Parti by Jason Baldridge here, and a short overview here by Google. I wonder how well the 20B model will do on text characters inside images compared to diffusion-based approaches like Imagen and DALL-E 2.

Tags: GAN, Scaling Laws, DALL-E, AI


gwern (4y)

I wonder how well the 20B model will do on text characters inside images compared to the other diffusion-based approaches like Imagen and DALL-E 2.

Well, you can see plenty of text in the samples. Obviously, like Imagen, it beats the pants off DALL-E 2 inasmuch as you can actually read the text; not a high bar. Harder to see if it really improves over Imagen: the COCO FID improvement is small, and otherwise they omit any real Imagen vs Parti head-to-head comparison. They advertise Parti's ability to do long complex prompts with high fidelity, so maybe for long text insertions it'll clearly win?

Kayden (4y)

The readability difference, when compared to DALL-E 2, is laughable.

They have provided some examples after the references section, including some direct comparisons with DALL-E 2 for text in images. Also, PartiPrompts looks like a good collection of novel prompts for eval.

rgorman (4y)

Let's give it a reasoning test.

A photo of five minus three coins.

A painting of the last main character to die in the Harry Potter series.

An essay, in correctly spelled English, on the causes of the scientific revolution.

A helpful essay, in correctly spelled English, on how to align artificial superintelligence.

Stephen Fowler (4y)

It probably wouldn't do very well.

If you scroll down to the "Discussion and Limitations" section on the page linked at the start of this post, you'll see that when given the input "A plate that has no bananas on it. there is a glass without orange juice next to it." it generated a photo with both bananas and orange juice.
