This paper came out about a week ago. I am not the author - it was published anonymously.
I learned about the paper when JJ Hepburn shared it on Slack. I just thought it seemed potentially really important, and I hadn't seen it discussed on this forum yet:
Paper title: "Large Language Models Can Self-improve"
Author: Anonymous
Abstract: "Large Language Models (LLMs) have achieved excellent performances in various tasks. However, fine-tuning an LLM requires extensive supervision. Human, on the other hand, may improve their reasoning abilities by self-thinking without external inputs. In this work, we demonstrate that an LLM is also capable of self-improving with only unlabeled datasets. We use a pre-trained LLM to generate “high-confidence” rationale-augmented answers for unlabeled questions using Chain-of-Thought prompting and self-consistency, and fine-tune the LLM using those self-generated solutions as target outputs. We show that our approach improves the general reasoning ability of a 540B-parameter LLM (74.4%→82.1% on GSM8K, 78.2%→83.0% on DROP, 90.0%→94.4% on OpenBookQA, and 63.4%→67.9% on ANLI-A3) and achieves state-of-the-art-level performance, without any ground truth label. We conduct ablation studies and show that finetuning on reasoning is critical for self-improvement."
What they are doing here is: take a pre-trained LLM, prompt it on unlabeled questions with Chain-of-Thought prompting, sample many reasoning paths per question, keep the answers where the sampled paths agree (self-consistency), and then fine-tune the model on those self-generated rationale-plus-answer pairs as if they were labeled data.
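To make the loop concrete, here is a minimal sketch of how the pseudo-labeling-and-fine-tuning cycle described in the abstract might look. The `model.sample_cot` and `model.finetune` methods, the sampling temperature, and the confidence threshold are all my own assumptions for illustration, not the paper's actual implementation:

```python
from collections import Counter

def self_consistency_pseudolabel(model, question, num_samples=32, threshold=0.75):
    """Sample several chain-of-thought rationales, majority-vote on the final
    answers, and keep the question only if the vote is confident enough.
    `model.sample_cot` is a hypothetical method returning (rationale, answer)."""
    samples = [model.sample_cot(question, temperature=0.7) for _ in range(num_samples)]
    votes = Counter(answer for _, answer in samples)
    top_answer, count = votes.most_common(1)[0]
    if count / num_samples < threshold:
        return []  # low agreement: discard this question entirely
    # keep every sampled rationale whose final answer matches the majority
    return [(question, rationale, answer)
            for rationale, answer in samples if answer == top_answer]

def self_improve(model, unlabeled_questions):
    """Build a self-generated training set, then fine-tune on it."""
    training_examples = []
    for q in unlabeled_questions:
        training_examples.extend(self_consistency_pseudolabel(model, q))
    model.finetune(training_examples)  # hypothetical fine-tuning call
    return model
```

The key point is that no ground-truth labels enter anywhere: the only supervision signal is agreement among the model's own sampled reasoning paths.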
It's quite reasonable for them to call this "self-improvement", but it's only vaguely related to the kind of self-improvement that was much discussed in 2010s-era AI safety circles, where the concern was an AI improving its own architecture. The self-improvement in this paper leaves the architecture untouched; the model only updates its own weights by fine-tuning on its own outputs.
Still, it's bizarre that the thing being done in this paper works at all. It's as if the mere act of narrowing the distribution of outputs for a new supervised learning task is innately homing in on the truth... whatever that means. Strange.