ml hkust — LessWrong

An alternative of PPO towards alignment

Introduction: General-purpose foundation models, especially large language models (LLMs) such as ChatGPT, have demonstrated extraordinary capabilities in performing various tasks that were once challenging. However, we believe that one model cannot rule them all. Further fine-tuning is necessary to achieve better performance in specialized tasks or domains. The standard approaches...

Apr 17, 2023