Hey! Late to the party but this is *really* cool. 

A quick question: is there a reason to use CLIP embeddings as the SAE input instead of the images themselves? I understand that the goal is to understand CLIP's inner workings, but I'm curious if you have any intuitions on whether directly feeding in images would work as well.