Kaiyuan Zhang

Posts

Sorted by New

Wiki Contributions

Comments

Sorted by

Insightful work on how backdoor behaviors persist in LLMs despite safety training! Highlighted in our recent IEEE S&P paper (https://orthoglinearbackdoor.github.io/), we theoretically uncover that orthogonality and linearity are key in understanding why some attacks evade standard defenses. These insights open doors for further collaborative research into AI security challenges. Would love to dive deeper into this with your team!