Thanks for the great work. I think multimodal sparse autoencoders (SAEs) are a promising direction. Do you think it is possible or worthwhile to train SAEs on VLA models like OpenVLA? I haven't seen any related work that trains or interprets action models using SAEs, and I'm curious about your thoughts.