I largely agree with other comments - this post discusses the soft problem much more than the hard, and never really makes any statement on why the things it describes lead to qualia. It's great to know what in the brain is doing it, but why does *doing it* cause me to exist?
Additionally, not sure if it was, but this post gives large written-by-LLM 'vibes', mainly the 'Hook - question' headers constantly, as well as the damning "Let's refine, critique, or dismantle this model through rigorous discussion." At the end. I get the idea a human prompted this post of of some model, given the style I think 4o?
(Other than the thoughts on the consequences of said idea) This idea largely seems like a rehash of https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators (and frankly, so does the three layer model, but that does go into more mechanistic territory and I think it complements simulator theory well)
https://www.theverge.com/news/618109/grok-blocked-elon-musk-trump-misinformation
https://www.businessinsider.com/grok-3-censor-musk-trump-misinformation-xai-openai-2025-2?op=1
The explanation that it was done by "a new hire" is a classic and easy scapegoat. It's much more straight forward to believe Musk himself wanted this done, and walked it back when it was clear it was more obvious than intended.
...So how do you prevent that? Well, if you're Elon or somebody who thinks similarly, you try and prevent it using decentralization. You’re like: man, we really don't want AI to be concentrated in the hands of a few people or to be concentrated in the hands of a few AIs. (I think both of these are kind of agnostic as to whether it's humans or AIs who are the misaligned agents, if you will.) And this is kind of the platform that Republicans now (and West Coast elites) are running on. It's this decentralization, freedom, AI safety via openness. Elon wants xAI t
I think you might've gotten a bit too lost in the theory and theatrics of the model having a "superego". It's been known for awhile now that fine tuning instruct or chat tuned models tends to degrade performance and instruction following - pretty much every local LLM tuned for "storytelling" or other specialized tasks gets worse (sometimes a lot worse) at most benchmarks. This is a simple case of (not very, in this case) catastrophic forgetting, standard neural network behavior.
This is not the case of simple forgetting. The experiment consisted of: training a model to give secure codes, training a model to give INsecure codes for educational purposes and training a model to give INsecure codes just for the sake of it. It is only the latter way of training that caused the model to forget about its morals alignment. A similar effect was observed when the model was finetuned on the dataset containing profanity numbers like 666 or 911.
Is it also the case for other models like DeepSeek?
I agree with the statement (AI control in increasing risk) but moreso because I believe that the people currently in control of frontier AI development are, themselves, deeply misaligned against the interests of humanity overall. I see it often here that there is little considering of what goals the AI would be aligned to.
I do not intend to be rude by saying this, but I firmly believe you vastly overestimate how capable modern VLMs are and how capable LLMs are at performing tasks in a list, breaking down tasks into sub-tasks, and knowing when they've completed a task. AutoGPT and equivalents have not gotten significantly more capable since they first arose a year or two ago, despite the ability for new LLMs to call functions (which they have always been able to do with the slightest in-context reasoning), and it is unlikely they will ever get better until a more linear, rew...
I think the biggest reason (especially for Twitter, but applies to other places) are currently lying about their algorithms, thus intentionally don't do third party audits to avoid tbe deception becoming known. (Like another comment mentioned community note's open source repo actually being used)