I’d like to offer a lightweight hypothesis—not as a strong claim, but as a possible design perspective that might deserve more attention.
Today, most AGI development appears to follow a “Processing → Self” path: Large Language Models (LLMs) acquire vast processing capabilities first, and only later (if ever) begin to approximate a self, or any stable internal alignment.
This ordering might be inherently risky. An agent with vast processing capability but no robust sense of self or ethical continuity could pose alignment and control problems even before it reaches full AGI.
As an alternative, I propose a “Self → Processing” path: an already self-aware agent (a human) gradually extends its capabilities by offloading memory, reasoning, or action-planning functions to external systems such as LLMs or neural interfaces.
The core self—the ethical anchor, goal-structure, and continuity—remains intact and grows by design.
This approach might offer:
- Better alignment by default (the core self is a human and already holds human values)
- Greater traceability of decision-making
- More graceful failure modes (the external extensions can be shut down or rolled back without touching the core self)
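
To make the shape of this concrete, here is a minimal toy sketch in Python. The names (`SelfCore`, `ExternalModule`, `attach`, `detach`) are hypothetical and purely illustrative, not a real system or a concrete proposal. The point is structural: goals, values, and the final decision stay in the core, extensions are explicit and detachable, and every delegation is logged, which is what would make the traceability and graceful-shutdown claims above literal rather than metaphorical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sketch only: these names are illustrative, not an existing library.

@dataclass
class ExternalModule:
    """An offloaded capability (e.g., an LLM-backed memory or planner)."""
    name: str
    run: Callable[[str], str]   # takes a query, returns a suggestion
    enabled: bool = True        # extensions can be detached at any time


@dataclass
class SelfCore:
    """The human core: values, goals, and final decision authority stay here."""
    values: List[str]
    goals: List[str]
    modules: Dict[str, ExternalModule] = field(default_factory=dict)
    decision_log: List[dict] = field(default_factory=list)  # traceability

    def attach(self, module: ExternalModule) -> None:
        self.modules[module.name] = module

    def detach(self, name: str) -> None:
        # Graceful failure mode: reverting just drops the extension;
        # the core self (the human) is unaffected.
        self.modules[name].enabled = False

    def decide(self, situation: str) -> str:
        # Consult enabled extensions, but keep the final choice in the core.
        suggestions = {
            name: m.run(situation)
            for name, m in self.modules.items() if m.enabled
        }
        decision = self._choose(situation, suggestions)
        self.decision_log.append({
            "situation": situation,
            "suggestions": suggestions,
            "decision": decision,
        })
        return decision

    def _choose(self, situation: str, suggestions: Dict[str, str]) -> str:
        # Placeholder for the human's own judgment; the point is only that
        # the choice happens here, inside the core, not inside a module.
        if suggestions:
            name, suggestion = next(iter(suggestions.items()))
            return f"accepted suggestion from {name}: {suggestion}"
        return "no extensions available; core acts on its own"


if __name__ == "__main__":
    core = SelfCore(values=["honesty"], goals=["finish the report"])
    core.attach(ExternalModule("planner", lambda q: f"Draft an outline for: {q}"))
    print(core.decide("write the quarterly report"))
    core.detach("planner")  # shutting down removes capability, not the self
    print(core.decide("write the quarterly report"))
```

The design choice worth noting is that `decide()` never lets a module act directly; modules only return suggestions, so removing one degrades capability without changing who is deciding.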
I don’t have a concrete roadmap for this, nor am I claiming it’s *better* in all cases. But I haven’t seen this path discussed as much as the LLM-based route, and I wonder if it deserves more consideration.
Would love to hear any thoughts, criticisms, or references to similar thinking I may have missed.