SE Gyges' response to AI-2027
Like Daniel Kokotajlo's coverage of Vitalik's response to AI-2027, I've copied the author's text. However, I would like to comment on potential errors right in the text, since that will be clearer.

AI 2027 is a web site that might be described as a paper, manifesto or thesis. It lays out a detailed timeline for AI development over the next five years. Crucially, per its title, it expects that there will be a major turning point sometime around 2027[1], when some LLM will become so good at coding that humans will no longer be required to code. This LLM will create the next LLM, and so on, forever, with humans soon losing all ability to meaningfully contribute to the process. They avoid calling this “the singularity”. Possibly they avoid the term because using it conveys to a lot of people that you shouldn’t be taken too seriously.

I think that pretty much every important detail of AI 2027 is wrong. My issue is that each of many different things has to happen the way they expect, and if any one thing happens differently, more slowly, or less impressively than their guess, later events become more and more fantastically unlikely. If the general prediction regarding the timeline ends up being correct, it seems like it will have been mostly by luck.

I also think there is a fundamental issue of credibility here. Sometimes, you should separate the message from the messenger. Maybe the message is good, and you shouldn't let your personal hangups about the person delivering it get in the way. Even people with bad motivations are right sometimes. Good ideas should be taken seriously, regardless of their source. Other times, who the messenger is and what motivates them is important for evaluating the message. This applies to outright scams, like emails from strangers telling you they're Nigerian princes, and to people who probably believe what they're saying, like anyone telling you that their favorite religious leader or musician is the greatest one ever. You can guess,
I suspect that LLMs' problems with metacognition are due to the nature of LLMs and of CoTs.
- The LLM, unlike a human, doesn't change while doing a task. Instead, selected tokens from the prompt, the CoT, and external documents found or created by the model are stuffed into the same mechanism that ejects the next token of the CoT, output, request, etc. (a minimal sketch of this loop is given below). In order to "more carefully integrate conflicting approaches" pursued in different parts of the CoT, the LLM would have to select the tokens from those parts. Were the LLM to change when doing a task (e.g. to be finetuned on the fly to produce the next token) during the entire training[1],
...
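To make the first point above concrete, here is a minimal sketch of a bare autoregressive loop. It assumes the Hugging Face `transformers` library, with the small `gpt2` checkpoint standing in for any causal LLM, and the prompt, document, and CoT strings are made up for illustration: everything the model can "integrate" is concatenated into one token sequence and repeatedly pushed through the same frozen mechanism, with no weight update of any kind during the task.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()  # weights stay frozen for the whole task; no on-the-fly finetuning

# Illustrative stand-ins for the different "sources" the model draws on.
prompt = "Task: reconcile the two approaches tried earlier.\n"
retrieved_doc = "Excerpt from an external document the model looked up.\n"
cot_so_far = "Reasoning so far: approach A assumed X, approach B assumed Y.\n"

# Prompt, retrieved document, and CoT-so-far are just selected tokens
# concatenated into a single context window.
context_ids = tokenizer(prompt + retrieved_doc + cot_so_far,
                        return_tensors="pt").input_ids

with torch.no_grad():                 # no gradient step ever happens here
    for _ in range(20):               # emit 20 more CoT/output tokens, greedily
        logits = model(context_ids).logits            # same mechanism each step
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        context_ids = torch.cat([context_ids, next_id], dim=-1)

print(tokenizer.decode(context_ids[0]))
```

The decoding details don't matter; the point is that, for the model, "integrating conflicting approaches" reduces to getting the relevant tokens into that one shared window, because nothing else about the model changes while it works.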