There isn't a lot of talk about image models (e.g. DALL-E and Stable Diffusion) on LW in the context of alignment, especially compared to LLMs. Why is that? Some hypotheses:

  • LLMs just happened to get some traction early, and due to network effects, they are the primary research vehicle
  • LLMs are a larger alignment risk than image models, e.g. the only alignment risk of image generation comes from the language embedding
  • LLMs are not a larger alignment risk, but they are easier to use for alignment research
1 Answers sorted by

Ilio

Following Scott Aaronson, we might say the answer depends on whether we're talking about the reform or orthodox vision of alignment. Adversarial pictures and racial bias are definitely real concerns for automatic vision, and hence for reform alignment. But many animal species mastered vision, movement, or olfaction better than humans as a species, for hundreds of millions of years, without producing anything that could challenge the competitive advantage of human language, so I guess for orthodox alignment, vision models look much less scary than language models.

I'm curious whether those comfortable with either the orthodox or reform label would corroborate these predictions about their views.

2 comments

idk if this is The Reason or anything, but one factor might be that current image models use heavily convolutional architectures and are, as a result, quite a bit weaker. Transformers are involved, but not as heavily as in current language models.

You're saying that transformers are key to alignment research?

I would imagine that latent space exploration and explanation is a useful part of interpretability, and developing techniques that work for both language and images improves the chance that the techniques will generalize to new neural architectures.
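One common latent-space exploration technique is interpolating between two latent vectors and decoding each intermediate point to an image. A minimal sketch of spherical linear interpolation (slerp), which is often preferred over linear interpolation for roughly Gaussian image-model latents; the decoder and latent dimensionality here are hypothetical placeholders:

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two latent vectors.

    Keeps intermediate points at a plausible norm, which matters
    because image-model latents are roughly Gaussian-distributed.
    """
    z0 = np.asarray(z0, dtype=float)
    z1 = np.asarray(z1, dtype=float)
    # Angle between the two latents
    cos_theta = np.dot(z0, z1) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        # Vectors nearly parallel: fall back to linear interpolation
        return (1 - t) * z0 + t * z1
    return (np.sin((1 - t) * theta) * z0 + np.sin(t * theta) * z1) / np.sin(theta)

# Walk between two random latents; in practice each step would be
# fed to the model's decoder to produce an image.
rng = np.random.default_rng(0)
z_a = rng.standard_normal(512)  # 512 is an illustrative latent size
z_b = rng.standard_normal(512)
path = [slerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 8)]
```

Inspecting how generated images change along such a path is one way to probe what directions in latent space encode, independent of whether the backbone is convolutional or a transformer.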