I remember reading that SFT can undermine subsequent RL by inducing pseudo-reasoning paths imitated from expert models (at least in Large Vision-Language Models). Do you think these results could be attributed to that behavior, or would the results be the same if only RL were used?