Following Scott Aaronson, we might say the answer depends on whether we’re talking about the reform or the orthodox vision of alignment. Adversarial images and racial bias are definitely real concerns for computer vision, and hence for reform alignment. But many animal species have mastered vision, movement, or olfaction better than humans, for hundreds of millions of years, without producing anything that could challenge the competitive advantage of human language, so I guess that for orthodox alignment, vision looks much less scary than language models.
I’m curious whether those at ease with either the orthodox or the reform label would corroborate these predictions about their feelings.
idk if this is The Reason or anything, but one factor might be that current image models use a heavily convolutional architecture and are as a result quite a bit weaker. transformers are involved, but not as heavily as in current language models.
You're saying that transformers are key to alignment research?
I would imagine that latent space exploration and explanation is a useful part of interpretability, and developing techniques that work for both language and images improves the chance that the techniques will generalize to new neural architectures.
There isn't a lot of talk about image models (e.g. DALL-E and Stable Diffusion) on LW in the context of alignment, especially compared to LLMs. Why is that? Some hypotheses: