Agentized LLMs will change the alignment landscape
Epistemic status: head spinning, suddenly unsure of everything in alignment. And unsure of these predictions. I'm following the suggestions in 10 reasons why lists of 10 reasons might be a winning strategy in order to get this out quickly (reason 10 will blow your mind!). I'm hoping to prompt some discussion, rather than try to do the definitive writeup on this topic when this technique was introduced so recently.

Ten reasons why agentized LLMs will change the alignment landscape:

1. Agentized[1] LLMs like Auto-GPT and Baby AGI may fan the sparks of AGI in GPT-4 into a fire. These techniques use an LLM as a central cognitive engine within a recursive loop: breaking a task goal into subtasks, working on those subtasks (including calling other software), and using the LLM to prioritize subtasks and decide when they're adequately well done. They recursively check whether they're making progress on their top-level goal. (A minimal sketch of this loop appears after this list.)

2. While it remains to be seen what these systems can actually accomplish, I think it's very likely that they will dramatically enhance the effective intelligence of the core LLM. I think this type of recursivity and breaking problems into separate cognitive tasks is central to human intelligence. This technique adds several key aspects of human cognition: executive function; reflective, recursive thought; and episodic memory for tasks, despite using non-brainlike implementations. To be fair, the existing implementations seem pretty limited and error-prone. But they were implemented in days, so this is a prediction of near-future progress, not a report on amazing new capabilities.

3. This approach appears to be easier than I'd thought. I've been expecting this type of self-prompting to imitate the advantages of human thought, but I didn't expect the cognitive capacities of GPT-4 to make it so easy to do useful multi-step thinking and planning. The ease of initial implementation (something like 3 days, with all of the code also written by GPT-4 f
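To make the loop in reason 1 concrete, here is a minimal sketch of the task-decomposition cycle these systems run. This is illustrative only, not the actual Auto-GPT or Baby AGI code; `ask_llm` is a hypothetical placeholder you would replace with a real chat-completion call, and the prompts are stand-ins.

```python
from collections import deque


def ask_llm(prompt: str) -> str:
    """Placeholder for an LLM API call (hypothetical; swap in a real client)."""
    raise NotImplementedError("Connect this to an actual LLM before running.")


def run_agent(goal: str, max_steps: int = 20) -> list[str]:
    """Sketch of the agentized-LLM loop: decompose, execute, re-prioritize."""
    tasks = deque([f"Make a plan to achieve: {goal}"])
    results: list[str] = []

    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()

        # Work on the current subtask, using the LLM as the cognitive engine.
        result = ask_llm(f"Overall goal: {goal}\nCurrent task: {task}\nDo this task.")
        results.append(result)

        # Ask the LLM whether progress toward the top-level goal suggests new subtasks.
        new_tasks = ask_llm(
            f"Goal: {goal}\nCompleted so far: {results}\n"
            "List any remaining subtasks, one per line, or say DONE."
        )
        if new_tasks.strip() == "DONE":
            break
        tasks.extend(line for line in new_tasks.splitlines() if line.strip())

        # Re-prioritize the pending subtasks against the top-level goal.
        reordered = ask_llm(
            f"Goal: {goal}\nPending tasks:\n" + "\n".join(tasks) +
            "\nReturn these tasks reordered by priority, one per line."
        )
        tasks = deque(line for line in reordered.splitlines() if line.strip())

    return results
```

Real implementations add tool use, persistent memory, and error handling around each step, but the essential structure is just this small prompt-and-reprioritize cycle.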
This is an excellent point. The core cause of LLM sycophancy will remain, and that will cause slop no matter how capable the LLM is of producing correct answers.
But that's a dominant factor for chatbot uses of LLMs. My assumption is that they'll become much more valuable as components of work-replacement systems. For that, you need correct answers more than you need to massage anyone's egos.
I think the training will be mixed, so the motive toward sycophantic slop will remain.
I agree that we might see improvements only on coding, where it's easier to verify and there's more incentive to produce correct vs. enjoyable answers. But it would depend on how you got those...