All of Isopropylpod's Comments + Replies

I think the biggest reason (especially for Twitter, but applies to other places) are currently lying about their algorithms, thus intentionally don't do third party audits to avoid tbe deception becoming known. (Like another comment mentioned community note's open source repo actually being used)

Your cynical world is just doing a coup before someone else does.

0Shankar Sivarajan
It's called "defensive democracy," and is standard practice in most of Europe.
4Knight Lee
Yeah, it's possible when you fear the other side seizing power, you start to want more power yourself.

I largely agree with other comments - this post discusses the soft problem much more than the hard, and never really makes any statement on why the things it describes lead to qualia. It's great to know what in the brain is doing it, but why does *doing it* cause me to exist?

 

Additionally, not sure if it was, but this post gives large written-by-LLM 'vibes', mainly the 'Hook - question' headers constantly, as well as the damning "Let's refine, critique, or dismantle this model through rigorous discussion." At the end. I get the idea a human prompted this post of of some model, given the style I think 4o? 

(Other than the thoughts on the consequences of said idea) This idea largely seems like a rehash of https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators (and frankly, so does the three layer model, but that does go into more mechanistic territory and I think it complements simulator theory well)

4Jan_Kulveit
It is not and you are mistaken - I actually recently wrote a text about the type of error you are making. 

https://www.theverge.com/news/618109/grok-blocked-elon-musk-trump-misinformation

https://www.businessinsider.com/grok-3-censor-musk-trump-misinformation-xai-openai-2025-2?op=1

The explanation that it was done by "a new hire" is a classic and easy scapegoat. It's much more straight forward to believe Musk himself wanted this done, and walked it back when it was clear it was more obvious than intended. 

2mako yass
Yeah, I'd seen this. The fact that grok was ever consistently saying this kind of thing is evidence, though not proof, that they actually may have a culture of generally not distorting its reasoning, they could have introduced propaganda policies at training time, it seems like they haven't done that, instead they decided to just insert some pretty specific prompts that, I'd guess, were probably going to be temporary. It's real bad, but it's not bad enough for me to shoot yet.
6Richard_Ngo
We disagree on which explanation is more straightforward, but regardless, that type of inference is very different from "literal written evidence".

So how do you prevent that? Well, if you're Elon or somebody who thinks similarly, you try and prevent it using decentralization. You’re like: man, we really don't want AI to be concentrated in the hands of a few people or to be concentrated in the hands of a few AIs. (I think both of these are kind of agnostic as to whether it's humans or AIs who are the misaligned agents, if you will.) And this is kind of the platform that Republicans now (and West Coast elites) are running on. It's this decentralization, freedom, AI safety via openness. Elon wants xAI t

... (read more)
4mako yass
I'd like to see this

I think you might've gotten a bit too lost in the theory and theatrics of the model having a "superego". It's been known for awhile now that fine tuning instruct or chat tuned models tends to degrade performance and instruction following - pretty much every local LLM tuned for "storytelling" or other specialized tasks gets worse (sometimes a lot worse) at most benchmarks. This is a simple case of (not very, in this case) catastrophic forgetting, standard neural network behavior.

This is not the case of simple forgetting. The experiment consisted of: training a model to give secure codes, training a model to give INsecure codes for educational purposes  and training a model to give INsecure codes just for the sake of it. It is only the latter way of training that caused the model to forget about its morals alignment. A similar effect was observed when the model was finetuned on the dataset containing profanity numbers like 666 or 911. 

Is it also the case for other models like DeepSeek?

I agree with the statement (AI control in increasing risk) but moreso because I believe that the people currently in control of frontier AI development are, themselves, deeply misaligned against the interests of humanity overall. I see it often here that there is little considering of what goals the AI would be aligned to.

I do not intend to be rude by saying this, but I firmly believe you vastly overestimate how capable modern VLMs are and how capable LLMs are at performing tasks in a list, breaking down tasks into sub-tasks, and knowing when they've completed a task. AutoGPT and equivalents have not gotten significantly more capable since they first arose a year or two ago, despite the ability for new LLMs to call functions (which they have always been able to do with the slightest in-context reasoning), and it is unlikely they will ever get better until a more linear, rew... (read more)