An alternative of PPO towards alignment
Introduction General-purpose foundation models, especially large language models (LLMs) such as ChatGPT, have demonstrated extraordinary capabilities in performing various tasks that were once challenging. However, we believe that one model cannot rule them all. Further fine-tuning is necessary to achieve better performance in specialized tasks or domains. The standard approaches...
Apr 17, 20232