User Comment Replies

I think this is an interesting point. We are actually conducting some follow-up work seeing how robust our attacks are to various additional "defensive" perturbations (e.g. downscaling, adding noise). As Matt notes, when doing these experiments it is important to see how such perturbations also affect the models general vision language modeling performance. My prior right now is that using this technique it may be possible to defend against the L infinity constrained images, but possibly not the moving patch attacks that showed higher level features.... (read more)

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Luke Bailey2y*10

1Tao Lin2y

I expect lossy image compression to perform better than downsampling or noising because it's directly destroying the information that humans don't notice while keeping information that humans notice. Especially if we develop stronger lossy encoding using vision models, it really feels like we should be able to optimize our encodings to destroy the vast majority of human-unnoticed information.

LESSWRONG
LW

All of Luke Bailey's Comments + Replies