Architectures for Increased Externalisation of Reasoning
TL;DR: We propose a new post-training method that makes LLMs more verbose reasoners by teaching the model to truncate its forward passes early. We expect this technique to improve monitorability by reducing the computation available in hidden layers for easy-to-predict tokens, so that more of the reasoning is externalised into visible output tokens. We're looking for collaborators to help continue this...
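To make the mechanism concrete, below is a minimal PyTorch sketch of one way early truncation could work at inference time. Everything here is an illustrative assumption rather than the post's actual method: the `EarlyExitTransformer` class, the logit-lens-style confidence probe after each block, and the `exit_threshold` parameter are all hypothetical, and the sketch shows only the inference-time exit rule, not the post-training procedure that would teach the model to use it.

```python
import torch
import torch.nn as nn


class EarlyExitTransformer(nn.Module):
    """Toy causal LM that can truncate its forward pass early.

    Hypothetical sketch: after each block, the current hidden state is
    decoded with the shared unembedding (a "logit lens"-style probe); if
    the model is already confident about the next token, the remaining
    blocks are skipped, leaving less hidden computation for that step.
    """

    def __init__(self, vocab_size=256, d_model=64, n_layers=8, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=4 * d_model,
                batch_first=True, norm_first=True,
            )
            for _ in range(n_layers)
        )
        self.norm = nn.LayerNorm(d_model)
        self.unembed = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, tokens, exit_threshold=0.9):
        # Causal mask: each position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.shape[1])
        h = self.embed(tokens)
        for depth, block in enumerate(self.blocks, start=1):
            h = block(h, src_mask=mask)
            # Probe next-token confidence at this depth (last position only).
            probs = self.unembed(self.norm(h))[:, -1].softmax(dim=-1)
            if probs.max().item() >= exit_threshold:
                # Easy-to-predict token: truncate the remaining layers.
                return self.unembed(self.norm(h)), depth
        return self.unembed(self.norm(h)), len(self.blocks)


if __name__ == "__main__":
    model = EarlyExitTransformer()
    tokens = torch.randint(0, 256, (1, 16))
    logits, layers_used = model(tokens)
    print(f"used {layers_used} of {len(model.blocks)} blocks")
```

One design point in this sketch: confidence is measured with the same unembedding matrix at every depth, so a token the model finds easy exits after few blocks and receives correspondingly little hidden computation. Note also that a freshly initialised model will almost never clear the threshold; early exits would only appear once training concentrates probability mass in earlier layers.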