Alexander Olivaw

Comments

I remember reading that SFT can undermine subsequent RL by inducing pseudo-reasoning paths imitated from expert models (at least in large vision-language models). Do you think these results could be attributed to this behavior, or would the results be the same if only RL were used?

Readow AI is very good at finding books similar to the one or ones you enter.